Category: SLO’s

SCSM SLO’s 101

I’m frequently asked about SLO’s when I do consulting work, and I realised that many people may not fully understand how SLO’s work. There are key pieces that have to be in place, not only to get SLO’s working as we expect but to do it efficiently, so they do not adversely impact performance in our SCSM environment.

What is an SLO?

Within ITIL, a Service Level Agreement (SLA) is a contract or agreement negotiated between you as a service provider and your customer(s). The SLA describes the service and specifies the responsibilities you will deliver to the customer. A Service Level Objective (SLO) is a single measurable target within that agreement, and it is the piece that SCSM tracks. You might use a single SLA across several services or even customers, depending on your business model.

A simple example of an SLA might be that we agree to resolve a priority 1 rated incident in 4 hours.

A more complicated example might be that we agree to provide 99.99% uptime for a service.

What Components Make Up an SLO within SCSM?

To create an SLO within SCSM we need four components:

  1. A metric to measure
  2. A Queue to apply it to
  3. A calendar that defines our “Work Hours”
  4. A time set against the metric

Creating a Metric in SCSM

A metric, within SCSM, is defined by any two date properties on a class that can have a time difference measured between them.

For example: The Creation time and Resolution time of an Incident or Service Request.
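To make the idea concrete, here is a minimal Python sketch of a metric as a pair of date properties. It is only a conceptual model, not SCSM code: the `Incident` and `Metric` classes and their property names are hypothetical stand-ins.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Incident:
    # Hypothetical stand-in for an SCSM Incident; only the date
    # properties a metric might use are modelled here.
    created_date: datetime
    first_response_date: Optional[datetime] = None
    resolved_date: Optional[datetime] = None


@dataclass
class Metric:
    # A metric is just a pair of date property names on a class.
    title: str
    start_property: str
    end_property: str

    def elapsed(self, item: Incident, now: datetime) -> timedelta:
        start = getattr(item, self.start_property)
        # If the end date is not set yet, the clock is still running.
        end = getattr(item, self.end_property) or now
        return end - start


resolution = Metric("Incident Resolution Time", "created_date", "resolved_date")
ir = Incident(created_date=datetime(2015, 3, 31, 9, 0),
              resolved_date=datetime(2015, 3, 31, 12, 30))
print(resolution.elapsed(ir, now=datetime(2015, 3, 31, 17, 0)))  # 3:30:00
```

A first-response metric would be the same object with `end_property` pointing at the first response date instead.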

The workflow uses the Metric as its point of measurement when displaying or reacting to a warning or breach event.

Out of all the SLO’s I’ve seen, the most common two are IR First Contact and IR Resolution.

Creating a Queue in SCSM

Not all SLO’s apply to all Work Items.

To limit what SLO’s apply to what Work Items, we need to group together a bunch of Work Items that we want to apply the SLO to.

Creating a Queue is a way of being able to group together a given type of work item based on criteria that you choose.

Common examples used for Queues are:

  • Priority based queues (P1, P2, P3 etc.)
  • Category based queues (Server, Desktop, Network etc.)

The most critical thing to watch when creating Queues is to select a class that has the minimum number of relationships required to achieve your goal. Selecting the “Incident (Advanced)” combination class for all Incident-based Queues is the leading cause of SCSM slowdowns that I have seen.
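Conceptually, a Queue is nothing more than a filter over work items. The Python sketch below models that idea only; the `WorkItem` and `Queue` classes are hypothetical and do not reflect how SCSM stores or evaluates queues. The criteria here look only at properties of the work item itself, which is the kind of queue that does not need a combination class.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class WorkItem:
    # Hypothetical, flattened work item; real SCSM objects carry far more.
    id: str
    priority: int
    status: str


@dataclass
class Queue:
    title: str
    criteria: Callable[[WorkItem], bool]

    def members(self, items: List[WorkItem]) -> List[WorkItem]:
        # Membership is simply "every item that satisfies the criteria".
        return [item for item in items if self.criteria(item)]


p1_queue = Queue(
    "Incident Priority 1",
    lambda wi: wi.priority == 1 and wi.status == "Active",
)

items = [
    WorkItem("IR1234", 1, "Active"),
    WorkItem("IR1235", 2, "Active"),
    WorkItem("IR1236", 1, "Resolved"),
]
print([wi.id for wi in p1_queue.members(items)])  # ['IR1234']
```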

Creating a Calendar in SCSM

The calendar is used to ensure that the SLO is only calculated while support staff are at work, and not over weekends or overnight (unless you work in a 24×7 organization).

You can have multiple calendars if you have different support groups working different hours, but for most organizations there is a single support schedule that the entire team works to.
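The effect of a calendar is easiest to see with a small calculation. The sketch below is a simplified Python model, not SCSM’s own algorithm: it assumes a single Monday to Friday, 09:00 to 17:00 schedule, ignores holidays and time zones, and the function name `add_work_time` is made up for illustration.

```python
from datetime import datetime, time, timedelta


def add_work_time(start: datetime, duration: timedelta,
                  day_start: time = time(9, 0), day_end: time = time(17, 0),
                  work_days=(0, 1, 2, 3, 4)) -> datetime:
    """Return the moment at which `duration` of working time has elapsed."""
    current = start
    remaining = duration
    while True:
        open_at = datetime.combine(current.date(), day_start)
        close_at = datetime.combine(current.date(), day_end)
        if current.weekday() in work_days and current < close_at:
            # The clock only runs between opening and closing time.
            current = max(current, open_at)
            available = close_at - current
            if remaining <= available:
                return current + remaining
            remaining -= available
        # Nothing (more) to count today: jump to the next day's opening time.
        current = datetime.combine(current.date() + timedelta(days=1), day_start)


# A 4 hour target logged on a Friday afternoon rolls over the weekend.
print(add_work_time(datetime(2015, 3, 27, 15, 0), timedelta(hours=4)))
# 2015-03-30 11:00:00
```

Two hours are consumed on Friday between 15:00 and 17:00, the weekend is skipped, and the remaining two hours run from 09:00 on Monday.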

Creating an SLO in SCSM

To create an SLO you have to have all of the prerequisites created and available.

The SLO is then just a case of selecting the time to set against the metric type and applying it to a given queue.

Within the SLO creation wizard you will be asked for both a warning time and breach time.

Warning time triggers an event at a given time before the SLO breaches allowing you to have an e-mail sent to the relevant parties to give them fair warning that the Work Item needs to be worked on.

Breach time triggers an event at the time of the breach and can be used to notify management or an escalation team if required.
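The relationship between the two times can be sketched in a few lines of Python. This is an illustrative model only: it assumes the warning is expressed as a lead time before the breach, it compares plain clock times without applying a calendar, and the status labels are placeholders rather than SCSM’s own values.

```python
from datetime import datetime, timedelta


def slo_status(now: datetime, target_end: datetime, warning: timedelta) -> str:
    """Classify an open work item against its SLO clock (labels are illustrative)."""
    if now >= target_end:
        return "Breached"
    # The warning fires a fixed amount of time before the breach.
    if now >= target_end - warning:
        return "Warning"
    return "On track"


target_end = datetime(2015, 3, 31, 14, 0)
warning = timedelta(hours=1)
for hour in (12, 13, 14):
    now = datetime(2015, 3, 31, hour, 0)
    print(now.time(), slo_status(now, target_end, warning))
```

Each status change is the point at which a notification, such as the warning e-mail, could be sent.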

How to (and how not to) Use SLO’s in Day-to-Day Operations?

In this author’s opinion, for MOST organizations, SLO’s are not required and provide nothing more than a false sense of security in reports and a great source of anxiety for support staff.

I only advise customers to implement SLO’s if they have strict, contractually binding service levels that they must achieve under penalty of contract breach or financial fine.

If your organization wishes to use SLO’s purely as a reporting measure, then I suggest you use some advanced reporting features to tease this information out of the data after the fact, rather than placing the stress of the SLO clock on the support staff.

In a future post I will also offer an opinion on why I believe SLO’s for most organizations are terrible and should be killed with fire……   But that’s another post 😉

Service Manager – Incident “Stop the clock”

 

A close friend and fellow SCSM nerd solved a commonly asked question about pausing the SLO clock for Incidents.
This blog post covers the solution he came up with.

Thanks, Shayne, for sharing your Stop the Clock solution.

System Center User Community Newcastle

Stop the clock – Pause SLA

Hey Everyone,

I have been asked a few times if I can post a blog about my “Stop the clock” solution I put in at my previous job. So, here it is!!

There are a few prerequisites.

You need to create the Incident Status you want to be included for status changes, which will trigger the “Stop the clock” workflow. Once these are created, follow the steps below.

1: Create Custom MP.

2: Create Notification Subscription including queues (Incident P1, P2, P3, “Paused”) and what will kick off the workflow (status change from x to y). Important: Create in Custom MP.

3: Create SLO’s in Custom MP – Resolution Time P1, Resolution Time P2, Resolution Time P3, Response Time P1, Response Time P2, Response Time P3.

4: Export Custom MP.

5: Open in XML Editor.

6: Find line – “NotificationSubscription_’GUID ID of…

Two Different SLO Warning and Breached E-Mails for Two Different Metrics

I saw a question pop up on the Social TechNet forums the other day and I realised there is a bit of confusion about how to send a warning or breached e-mail for different metrics such as Response Time and Resolution Time.

Let me explain the scenario:

You want to set SLO’s for Incidents based on two different metrics:

  • How long it takes the analyst to respond to the Affected user
  • How long it takes to resolve the Incident

Your SLO’s are:

  • Priority 1 Incidents – Response Time: Response 2 hours; Warning 1 hour.
  • Priority 1 Incidents – Resolution Time: Resolution 4 hours; Warning 3 hours.

You then want to notify the analyst when the warning time is reached, both for the response SLO and the resolution SLO.

It seems to make sense that you would create two notification templates: One for response SLO warning and one for resolution SLO warning.

However, you only need to create one E-Mail template for Warning and one for Breach, regardless of the SLO, and then use the SLO’s details to make the E-Mail more specific.

Let’s run through the whole process end to end:

First off you need to create the SLO’s for the targets you want.

SLO’s are made up of:

  • A Metric
  • A Queue
  • An SLO

Create a Metric for your Incidents that you want to trigger on.

For Example:

  • Title: Incident Resolution Time
  • Description: The time it takes from the Incident being created to the Incident being resolved.
  • Class: Incident
  • Start Date: Created Date
  • End Date: Resolved Date

Next Create the Queue:

For Example:

  • Title: Incident Priority 1
  • Class: Incident
  • Criteria: Trouble Ticket Priority Equals 1 AND Status Equals Active

Finally create the SLO’s:

For Example:

  • Title: Priority 1 Incident Response Time
  • Description: Please respond to the affected user and enter a comment into the Incident that is marked as “First Response” before the target end date.
  • Class: Incident
  • Queues: Incident Priority 1
  • Service Level Criteria
    • Calendar: Normal Support Hours
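    • Metric: Incident Response Time (a second metric, created the same way as the example metric but with End Date set to First Response Date)
    • Target: 2 Hours
    • Warning: 1 Hour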

AND

For Example:

  • Title: Priority 1 Incident Resolution Time
  • Description: Please resolve this Incident before the Target End Date or escalate to higher support.
  • Class: Incident
  • Queues: Incident Priority 1
  • Service Level Criteria
    • Calendar: Normal Support Hours
    • Metric: Incident Resolution Time
    • Target: 4 Hours
    • Warning: 3 Hours

NOTE: The description fields are important. This will become clear later.

Next, you will need to create the notification template you want to send out.

In this case you will only need 2:

  • Incident SLO Warning
  • Incident SLO Breached

Both of these templates will have to use the “Service Level Instance Time Information” target class. Remember to change the class type drop-down list to “All basic classes”.

For Example – The Warning template might be:

Greetings <AnalystFirstName>,

<IR###> has reached a warning state for <SLODisplayName>.
This work item will breach this SLO on <TargetEndDate>
<SLODescription>

Thanks,

Service Desk

This way whenever the SLO status changes to warning it will show the specifics of the Incident and the specifics of the SLO. This includes the description and display name fields that we can use to provide more detail.

From the input described above the actual E-Mail sent for an Incident would look like:

Greetings Brett,

IR1234 has reached a warning state for Priority 1 Incident Resolution Time.
This work item will breach this SLO on 31/03/2015 14:00
Please resolve this Incident before the Target End Date or escalate to higher support.

Thanks,
Service Desk
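To show why one template is enough, here is a small Python sketch that renders the same warning text for both SLO’s. It only imitates the substitution idea with `string.Template`; the placeholder names and the `render_warning` helper are hypothetical and are not the tokens SCSM inserts into a real notification template.

```python
from string import Template

# One warning template shared by every SLO. Placeholder names are hypothetical.
WARNING_TEMPLATE = Template(
    "Greetings $analyst_first_name,\n"
    "\n"
    "$work_item_id has reached a warning state for $slo_display_name.\n"
    "This work item will breach this SLO on $target_end_date\n"
    "$slo_description\n"
    "\n"
    "Thanks,\n"
    "Service Desk"
)

# The display name and description come from the SLO, not from the template.
slos = [
    {
        "slo_display_name": "Priority 1 Incident Response Time",
        "slo_description": "Please respond to the affected user and enter a "
                           "comment into the Incident that is marked as "
                           "“First Response” before the target end date.",
    },
    {
        "slo_display_name": "Priority 1 Incident Resolution Time",
        "slo_description": "Please resolve this Incident before the Target "
                           "End Date or escalate to higher support.",
    },
]


def render_warning(slo: dict, analyst: str, work_item_id: str, target_end: str) -> str:
    return WARNING_TEMPLATE.substitute(
        analyst_first_name=analyst,
        work_item_id=work_item_id,
        target_end_date=target_end,
        **slo,
    )


for slo in slos:
    print(render_warning(slo, "Brett", "IR1234", "31/03/2015 14:00"))
    print("-" * 40)
```

Because the SLO-specific wording lives in each SLO’s display name and description, adding a third SLO means adding another entry to the list, not another template.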

I hope this covers everything you need.