Where can I find the ACCC Incident Management Process?
Provides details of the standard Incident Management process for ACCC.
Incident Management Process Team
Process Owner: Anthony Marino
Service Desk Managers:
- Client Service Solutions:
- General Service Desk: Fredy Amaya
- Learning Technologies: Jomit Joseph
- Network Engineering: Jelene Crehan
- Network Operations Center: Sidney Hood
- Unified Communications: Alice Wallace
- EAD: Dean Dang
- ACER: Himanshu Sharma
- Security: Esteban Perez
- PPMO: Mathew Willis
Incident Management Definitions
- Incident: An unplanned interruption to, or reduction in the quality of, an IT service. Failure of a configuration item that has not yet affected service is also an incident; for example, failure of one disk from a mirror set.
- Incident Management: The process responsible for managing the lifecycle of all Incidents. The primary purpose of Incident Management is to restore normal IT service operation as quickly as possible.
- Incident Model: A predefined set of steps for handling a particular type of Incident in an agreed way.
- Incident Record: A record containing the details of an Incident. Each Incident Record documents the lifecycle of a single Incident. Often referred to as an incident ticket.
- Incident Status Tracking: Tracking Incidents throughout their lifecycle for proper handling and status reporting using indicators of New, Open, Stalled, Resolved, and (eventually) Closed.
Incident Management Process Roles
- Incident Management Process Owner:
- The person fulfilling this role is accountable for ensuring that the Incident Management process is being performed according to the agreed and documented process and is meeting the aims of the process definition.
- A person fulfilling this role must have visibility at the executive management level and the authority to ensure the process's success across ACCC
- There will be one Incident Management Process Owner for ACCC
- Assist with and ultimately be accountable for the process design and ensure that the Incident Management process is Fit for Purpose
- Provide guidance and approval for policies and standards to be employed throughout the process
- Provide guidance and authorize Key Performance Indicators (KPIs) and Critical Success Factors (CSFs)
- Review KPIs and initiate appropriate action
- Periodically audit the process to ensure compliance with policies and standards
- Act as an escalation point to address any issues with the effectiveness of the process including integration issues between the various processes
- Review and approve opportunities for process enhancements and for improving the efficiency and effectiveness of the process
- Provide guidance and direction to enable the training, business understanding and knowledge required for process personnel
- Ensure proper communication for process information or changes, as appropriate, to ensure awareness
- Ensure the process is adopted within ACCC in accordance with the defined scope of the process
- Promote the process vision, goals and objectives within ACCC and the university
- Ensure an ongoing Service Improvement Program
- Exercise authority over the Incident Manager(s) and other process roles where process activities are concerned
- Enforce Incident Management policies
- Approve proposed changes to the process
- Recommend Service Management supporting technology changes or enhancements to support the process
- Recommend, approve, and ensure process-related training for ACCC employees, and nominate staff to attend
- Support the appropriate Director in requesting additional resources for the Incident Management process
- Service Desk/Incident Manager:
- The Service Desk/Incident Manager is accountable to the Incident Management Process Owner and performs the day-to-day operational and managerial tasks demanded by the process activities.
- A person fulfilling this role must have visibility at the Senior Management level and must have the authority to manage the process within the respective group. Within the ACCC it is recommended that there is at least one Incident Management Process Manager within each group.
- Maintain awareness of the University’s priorities, objectives, and business drivers, and of the role Incident Management plays in enabling those business objectives
- Ensure and promote the correct use of the Incident Management process, policies, procedures, and tools
- Ensure that the Incident Management Key Performance Indicators are met
- Ensure that the Incident Management process operates effectively and efficiently, and that the ACCC staff for whom they are accountable comply with the process
- Produce regular and accurate management reports
- Identify process training needs
- Identify improvement opportunities to ensure that the process and tools are effective and efficient
- Identify the need for additional resources
- Function as a point of escalation for Incident Management Process Analysts and escalate to the process owner
- Work with Vendor Management to ensure Incident Management expectations and requirements are included or considered in contracts.
- Escalate to ACCC group management and the Incident Management Process Owner in the event of a conflict within the process
- Work with external process managers (Problem, Change, etc.) to ensure proper integration of process hand-offs and interactions
- Manage and monitor the lifecycle status of all incidents within their scope of responsibility
- Support and fulfill the duties of the Incident Management Process Owner as directed and required
- Perform other process roles (Incident Analyst) as necessary
- Authorize, coordinate, and manage activities throughout the Incident lifecycle within their group
- Recommend service and process improvements
- Address escalated issues related to the day-to-day management of the process / incidents
- Service Owner:
- The service owner is accountable for the delivery of a specific IT service as a consumable service and represents the service to the clients and customers.
- A person fulfilling this role must have visibility at the executive management level and must have the authority to direct ACCC resources towards the successful delivery of the service.
- Designing request and incident models for the given service, to standardize processing of common requests and incidents.
- Reviewing service performance metrics to ensure service is meeting business needs.
- Working with campus constituents and ACCC resources to define Service Level Agreements that meet the business needs of the service's clients.
- [Additional detail needed here]
- Major Incident Commander:
- The Major Incident Commander is accountable for the lifecycle management of major incidents.
- A person fulfilling this role must have visibility at the Senior Management level and must have the authority to manage the process within ACCC.
- The Major Incident Commander can be either:
- An ACCC Director or designate
- The Service Owner of the affected service
- Assist in determining whether a Major Incident exists, and/or make that decision
- Manage the lifecycle of the Major Incident through resolution
- Ensure the involvement of the proper resources and provide direction on necessary activities
- Ensure and/or perform stakeholder communication as needed
- Ensure hand-off to Problem Management upon resolution and closure of the Major Incident
- Engage and manage all ACCC resources needed to resolve a major incident
- Communicate with and escalate to ACCC Senior Management and key university stakeholders as needed
- Identify and recommend improvements to the major incident management process / procedures to the Incident Management Process Owner
- Service Desk Consultant:
- The Service Desk Consultant role describes the activities of the ACCC staff members performing service desk duties related to Incident Management
- ACCC staff members outside the Service Desk assume the Service Desk Consultant role whenever they log and initially manage incidents themselves.
- Log all relevant Incident details, categorize, and prioritize the incidents
- Understand and follow the process, procedures, work instructions, policies, and supporting tools in the management of incidents
- Ensure that incidents are handled so that they are resolved or escalated within the time frames defined for their priority, and to the satisfaction of the affected clients
- Take ownership of incidents, monitor their status, and communicate that status to clients as needed
- Validate resolution and close all resolved Incidents
- Conduct client satisfaction callbacks/surveys as agreed
- Identify and recommend improvements to the Incident Management Process
- Major Incident Response Team:
- The Major Incident Response Team (MIRT) is a fluid team whose membership is determined by a Major Incident Commander and the conditions of the specific Major Incident.
- Any ACCC staff member may be called to be a member of any given MIRT depending on the specifics of the Major Incident. They will be identified by the Major Incident Commander.
- If the Service Owner of the affected service (or their designate) is not the Major Incident Commander, that person must be part of the MIRT.
- The Communications team must also be part of the MIRT.
- Attend all Major Incident meetings as convened by the Major Incident Commander and as the conditions of the major incident require
- Provide input and advice on the major incident, with the goal of developing and implementing a resolution as quickly as possible
- Perform all assigned actions, in concert with other MIRT members, to investigate, diagnose, and resolve the major incident
- Provide communication to peers and management regarding the actions and status of the major incident
- Communications:
- The Communications role is accountable for keeping clients informed of the status of Major Incidents.
- This role must be familiar with, and have access to, the various communication tools utilized by ACCC.
- Outside business hours, this role is typically filled by the NOC.
- Draft and send client-facing mass communications during a Major Incident.
- Draft response templates for individual incidents that are part of a Major Incident or Service Problem, and make them available to Service Desk Consultants.
Ticket Status Definitions
So that all ACCC staff share the same understanding of what each ticket status means, the following definitions have been established:
- NEW: The initial status of a ticket when a client has contacted ACCC, before an ACCC employee has replied to or manipulated the ticket.
- OPEN: This status indicates that an ACCC employee has responded to the incident (e.g. informing the client that they are attempting to resolve the incident or that they have escalated the incident). The OPEN ticket status indicates that additional work is needed on the part of ACCC employees.
- STALLED: This status is used only when a ticket requires input from either the client who opened it or a vendor that manages or maintains an associated system. In this case, the ACCC employee is no longer “working” on the ticket, and the information needed for resolution is in the hands of someone outside ACCC. Stalling a ticket also puts the time-to-resolution Service Level Agreement (SLA) clock on hold. For that reason, changing a status to STALLED while waiting for a vendor is appropriate only if the associated service’s SLA specifically states that incidents escalated to a vendor fall outside normal SLA resolution times.
- RESOLVED: This status should be used when an ACCC employee believes that the incident is resolved, meaning the client’s service has been restored to the defined SLA levels. When using the RESOLVED status, the ACCC employee must include the standard response above their signature, editing the phone number and email address as appropriate for the respective service desk.
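Because STALLED pauses the time-to-resolution clock, SLA tracking reduces to summing only the periods when the clock is running. The Python sketch below assumes the clock runs while a ticket is NEW or OPEN and counts wall-clock time (not business hours); the transition-history format and the function names are hypothetical, not part of RT.

```python
from datetime import datetime, timedelta

# Statuses named in this article; CLOSED appears under Incident Status Tracking.
STATUSES = {"NEW", "OPEN", "STALLED", "RESOLVED", "CLOSED"}

# Assumption: the resolution clock runs only while a ticket is NEW or OPEN;
# it is on hold while STALLED and stops at RESOLVED or CLOSED.
CLOCK_RUNNING = {"NEW", "OPEN"}


def can_stall(waiting_on: str, sla_excludes_vendor_time: bool) -> bool:
    """STALLED is only for tickets waiting on the client or on a vendor.

    Waiting on a vendor qualifies only when the service's SLA says that
    vendor-escalated incidents fall outside normal resolution times.
    """
    if waiting_on == "client":
        return True
    if waiting_on == "vendor":
        return sla_excludes_vendor_time
    return False


def resolution_clock(history: list[tuple[datetime, str]], now: datetime) -> timedelta:
    """Total time a ticket has spent with the resolution clock running.

    `history` is a hypothetical, time-ordered list of (timestamp, status)
    transitions beginning with the ticket's creation as NEW.
    """
    total = timedelta()
    for i, (start, status) in enumerate(history):
        if status not in STATUSES:
            raise ValueError(f"unknown status: {status}")
        # A status lasts until the next transition, or until `now`.
        end = history[i + 1][0] if i + 1 < len(history) else now
        if status in CLOCK_RUNNING:
            total += end - start
    return total


history = [
    (datetime(2024, 3, 4, 9, 0), "NEW"),
    (datetime(2024, 3, 4, 10, 0), "OPEN"),
    (datetime(2024, 3, 4, 12, 0), "STALLED"),  # waiting on the client
    (datetime(2024, 3, 5, 9, 0), "OPEN"),
    (datetime(2024, 3, 5, 11, 0), "RESOLVED"),
]
print(resolution_clock(history, now=datetime(2024, 3, 6, 9, 0)))  # 5:00:00
```

The overnight STALLED period and everything after RESOLVED are excluded, leaving 1 hour NEW plus 4 hours OPEN.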
Incident Management Process Overview
Incident Management Process Details
1.0 - Incident Identification
Description: Incidents may be identified through many sources, including but not limited to clients, Service Providers, and monitoring of key IT services and service components.
2.0 - Incident Logging
Description: All relevant information relating to the nature of the Incident must be logged so that a full historical record is maintained. At a minimum, the following details are recorded when the Incident is first logged:
- Unique reference number (RT Ticket)
- Date/time recorded
- Name and contact information of the person and/or group recording the Incident
- Name and contact information of client
- Description of symptoms
- Activities already undertaken to resolve the Incident if appropriate
- Service specific details (e.g. MAC Address for Wired/Wireless Network)
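The minimum logging fields can be pictured as a record type with a completeness check. This is a hypothetical Python sketch: the field names mirror the list above and do not reflect RT's actual schema, and the ticket number and addresses are made up.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class IncidentRecord:
    """Minimum details captured when an Incident is first logged.

    Field names are hypothetical and mirror the list above, not RT's schema.
    """
    ticket_id: int          # unique reference number (RT ticket)
    recorded_at: datetime   # date/time recorded
    recorded_by: str        # person and/or group recording, with contact info
    client_name: str
    client_contact: str
    symptoms: str           # description of symptoms
    actions_taken: list[str] = field(default_factory=list)         # if appropriate
    service_details: dict[str, str] = field(default_factory=dict)  # e.g. MAC address

    def missing_fields(self) -> list[str]:
        """Names of required text fields that were left blank."""
        required = ("recorded_by", "client_name", "client_contact", "symptoms")
        return [name for name in required if not getattr(self, name).strip()]


record = IncidentRecord(
    ticket_id=123456,  # hypothetical ticket number
    recorded_at=datetime(2024, 3, 4, 9, 15),
    recorded_by="General Service Desk",
    client_name="",
    client_contact="client@example.edu",
    symptoms="Laptop cannot join the wireless network",
    service_details={"mac_address": "00:11:22:33:44:55"},
)
print(record.missing_fields())  # ['client_name']
```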
3.0 - Incident Categorization & Prioritization
Description: Incidents are categorized to help with reporting, trend analysis, and matching Incidents to Problems, Known Errors, and validated workarounds. Incidents are prioritized by assessing impact and urgency. Priority determines how support staff manage the Incident.
For all RT queues that hold tickets involving client interaction, two Custom Fields are applied and made mandatory for all tickets.
- Classification - This field designates whether the ticket is a Request or an Incident.
- Service - This field provides a drop-down, searchable list of all ACCC services and requires that a single service be selected.
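A sketch of the check these two mandatory fields enforce, written against a plain dictionary rather than RT's API. The function name and the two-entry catalog are illustrative assumptions; the real drop-down lists all ACCC services.

```python
# The two mandatory Custom Fields, modeled on a plain dict for illustration.
CLASSIFICATIONS = {"Request", "Incident"}


def custom_field_errors(ticket: dict, service_catalog: set[str]) -> list[str]:
    """Return problems with the Classification and Service fields."""
    errors = []
    if ticket.get("Classification") not in CLASSIFICATIONS:
        errors.append("Classification must be 'Request' or 'Incident'")
    service = ticket.get("Service")
    # Exactly one service: a single string that appears in the catalog.
    if not isinstance(service, str) or service not in service_catalog:
        errors.append("Service must be a single service from the catalog")
    return errors


# Placeholder catalog for this example only.
catalog = {"Wired Network", "Wireless Network"}
print(custom_field_errors({"Classification": "Incident", "Service": "Wireless Network"}, catalog))  # []
```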
At the highest level, the priority of an incident is determined by its impact on the University and the urgency of restoration. The descriptions below are intended to provide objective guidelines for assessing incident priority. The goal of these definitions is for every individual in ACCC to interpret each standard the same way, removing as much subjectivity as possible. In this way, a Priority 1 (Major Incident) ticket means the same thing to everyone at all levels and in all teams of ACCC. Standardizing these definitions ensures the most efficient use of resources and fairness across the entire support spectrum.
Client Knowledge of Priority
Clients should not request an upgrade or downgrade in priority; they should simply give the analyst the most accurate impact and urgency information they can, and the analyst assigns impact and urgency to the ticket using the definitions provided. End users should not be told the priority of their tickets, since they may not understand how priority is determined. Furthermore, promoting a ticket based on an end user’s ‘emotion’ diverts resources that might otherwise be needed elsewhere.
Standardizing definitions and making them as objective as possible represents the most equitable distribution of resources and promotes a level of fairness for all clients regardless of a client’s status or emotion. If a client is unhappy about the timeliness of an issue, the requested resolution date can be altered without affecting the impact, urgency, or priority.
The below matrix is used to assign a Priority to each incident, based on the level of Urgency and Impact. Utilize the definitions of each level of Urgency and Impact to identify where in the matrix each incident falls. You can download a one-page reference for this process here.
| Priority | Target Resolution Time |
|---|---|
| Elevated | 1 business day |
| Standard | Per Service SLA |
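For Elevated incidents, the one-business-day target can be computed as below. This sketch assumes Monday through Friday are business days and ignores university holidays and business-hour boundaries; the function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def add_business_days(start: datetime, days: int) -> datetime:
    """Move forward `days` weekdays (Mon-Fri), keeping the time of day.

    Assumption: university holidays and business-hour boundaries are ignored.
    """
    current = start
    added = 0
    while added < days:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            added += 1
    return current


def target_resolution(priority: str, opened: datetime) -> Optional[datetime]:
    """Due time for the priorities listed in the table above."""
    if priority == "Elevated":
        return add_business_days(opened, 1)
    if priority == "Standard":
        return None  # set by the individual service's SLA
    raise ValueError(f"no target listed for priority {priority!r}")


# Opened Friday afternoon, due the following Monday at the same time.
print(target_resolution("Elevated", datetime(2024, 3, 8, 15, 0)))  # 2024-03-11 15:00:00
```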
Urgency is the level of business criticality and reflects the time available for resolution:
Highest urgency:
- Imminent threat to public/life safety - OR -
- The incident is currently impeding UIC's ability to meet academic, research, or operational goals - OR -
- Service restoration must be completed immediately or significant loss of reputation, revenue, or productivity will occur.
Intermediate urgency:
- Possible threat to public/life safety - OR -
- The incident threatens UIC's ability to meet academic, research, or operational goals - OR -
- Service restoration must be completed promptly or significant loss of reputation, revenue, or productivity will occur.
Lowest urgency:
- No threat to public/life safety - OR -
- The incident does not threaten UIC's ability to meet academic, research, or operational goals - OR -
- Service restoration should be completed per the defined Service Level Agreement
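Because the criteria within each urgency level are joined by OR, assigning urgency amounts to checking levels from most to least urgent and stopping at the first level with any matching criterion. The Python sketch below assumes that tie-break (the text does not state one explicitly) and uses hypothetical field names and level numbers 1 (highest) to 3 (lowest).

```python
from dataclasses import dataclass


@dataclass
class UrgencyFacts:
    """Answers gathered from the client; field names are hypothetical."""
    safety_threat: str = "none"        # "imminent", "possible", or "none"
    impeding_goals_now: bool = False   # currently impeding academic/research/operational goals
    threatens_goals: bool = False      # threatens those goals
    restore_immediately: bool = False  # significant loss unless restored immediately
    restore_promptly: bool = False     # significant loss unless restored promptly


def urgency_level(f: UrgencyFacts) -> int:
    """Return 1 (highest), 2, or 3 (lowest).

    Criteria within a level are OR-ed; levels are checked from most to least
    urgent, so the most urgent matching level wins (an assumed tie-break).
    """
    if f.safety_threat not in ("imminent", "possible", "none"):
        raise ValueError(f"unknown safety_threat: {f.safety_threat!r}")
    if f.safety_threat == "imminent" or f.impeding_goals_now or f.restore_immediately:
        return 1
    if f.safety_threat == "possible" or f.threatens_goals or f.restore_promptly:
        return 2
    return 3  # restore per the service's defined SLA


print(urgency_level(UrgencyFacts(threatens_goals=True)))  # 2
```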
Impact is the degree of effect on service levels or business processes:
Highest impact:
- Entire campus affected - OR -
- Infrastructure affecting multiple services - OR -
- Widespread degradation or interruption of an Essential service (such as instruction in multiple classrooms)
Intermediate impact:
- Multiple departments, a single entire Unit/College, or more than 50 clients affected - OR -
- Limited degradation or interruption of an Essential service (such as instruction in a single classroom)
Lowest impact:
- A limited number (fewer than 50) of clients affected - OR -
- Degradation or interruption of a Value-Add service (such as Recharge Stations)
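The impact criteria can be evaluated the same way, from the widest scope down. One detail the thresholds leave open: the intermediate level says "greater than 50" clients and the lowest says "less than 50", so exactly 50 is unplaced; this sketch assumes it falls to the lowest level. Parameter names and level numbers are hypothetical.

```python
def impact_level(*, entire_campus: bool = False,
                 multi_service_infrastructure: bool = False,
                 essential_degradation: str = "none",  # "none", "limited", "widespread"
                 departments: int = 0,
                 whole_unit: bool = False,
                 clients: int = 0) -> int:
    """Return 1 (highest), 2, or 3 (lowest) impact.

    Assumption: exactly 50 affected clients is treated as level 3, since the
    text uses "greater than 50" and "less than 50" for the two lower levels.
    """
    if essential_degradation not in ("none", "limited", "widespread"):
        raise ValueError(f"unknown degradation: {essential_degradation!r}")
    if entire_campus or multi_service_infrastructure or essential_degradation == "widespread":
        return 1
    if departments > 1 or whole_unit or clients > 50 or essential_degradation == "limited":
        return 2
    return 3  # fewer than 50 clients, or a Value-Add service, affected


print(impact_level(clients=51), impact_level(clients=50))  # 2 3
```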
For all RT queues that hold tickets involving client interaction, use of the Priority field (not the Final Priority field) has been standardized. ACCC staff follow the Incident Prioritization guidelines above and enter a Priority level from 1 through 5 in the RT ticket, as determined by the Urgency and Impact levels in the Priority Matrix.
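Combining urgency and impact into the RT Priority value is a table lookup. ACCC's actual Priority Matrix is not reproduced in this text, so the mapping below is a PLACEHOLDER using the common "urgency + impact - 1" pattern, which happens to yield exactly the values 1 through 5 for a 3 x 3 grid. The real matrix should be substituted before relying on this sketch.

```python
# PLACEHOLDER: ACCC's actual Priority Matrix is not reproduced in this text.
# The common "urgency + impact - 1" pattern is used only for illustration;
# it maps the 3 x 3 grid of levels onto RT Priority values 1 through 5.
PRIORITY_MATRIX = {(u, i): u + i - 1 for u in (1, 2, 3) for i in (1, 2, 3)}


def rt_priority(urgency: int, impact: int) -> int:
    """Value to enter in RT's Priority field (not Final Priority)."""
    try:
        priority = PRIORITY_MATRIX[(urgency, impact)]
    except KeyError:
        raise ValueError("urgency and impact must each be 1, 2, or 3") from None
    if not 1 <= priority <= 5:
        raise ValueError("RT Priority is standardized to 1 through 5")
    return priority


print(rt_priority(1, 1), rt_priority(2, 3), rt_priority(3, 3))  # 1 4 5
```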
4.0 - Initial Diagnosis
Description: The Service Desk performs initial diagnosis of every Incident reported to it to determine whether it can resolve the Incident or whether the Incident needs escalation.
5.0 - Functional and Hierarchic Escalation
Description: As soon as it becomes clear that the Service Desk is unable to resolve the Incident, it must be escalated to the appropriate support team. High impact and/or urgent Incidents may need to be escalated to management, even if just for notification purposes. Incidents may also be escalated to management if they are taking too long to resolve or if management authorization is needed to resolve the Incident.
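The two kinds of escalation described above are independent: functional escalation moves the Incident to a support team, while hierarchic escalation brings in management, and an Incident may need either, both, or neither. A hypothetical Python sketch of that decision, with illustrative field names:

```python
from dataclasses import dataclass


@dataclass
class EscalationCheck:
    """Facts the Service Desk weighs; field names are hypothetical."""
    desk_can_resolve: bool
    high_impact_or_urgent: bool = False
    taking_too_long: bool = False
    needs_management_authorization: bool = False


def escalations(c: EscalationCheck) -> list[str]:
    """Return the escalations called for (possibly none, possibly both)."""
    actions = []
    if not c.desk_can_resolve:
        actions.append("functional: escalate to the appropriate support team")
    if c.high_impact_or_urgent or c.taking_too_long or c.needs_management_authorization:
        actions.append("hierarchic: escalate to management")
    return actions


print(escalations(EscalationCheck(desk_can_resolve=False, high_impact_or_urgent=True)))
```

Hierarchic escalation here may be for notification only; the sketch does not distinguish notification from authorization.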
6.0 - Investigation & Diagnosis
Description: Investigation and diagnosis are performed by more highly skilled support staff, with a primary focus on resolving the incident. This step may be iterative, involving multiple support groups and possibly multiple escalations. Incident investigation is not concerned with finding a root cause.
7.0 - Resolution & Recovery
Description: Once a resolution or an acceptable workaround has been successfully applied, service recovery actions can be carried out. All events and actions during resolution and recovery should be documented in the Incident record so that a full history is maintained.
8.0 - Incident Closure
Description: The group or person closing the incident should ensure that:
- Details of the actions taken to resolve the Incident are documented
- Classification is complete and accurate
- Resolution/action is agreed with the customer, if possible
- All details applicable to this phase of Incident control are recorded, including confirmation that the customer is satisfied, the time spent on the Incident, and the person, date, and time of closure
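The closure checks can be expressed as a pre-close validation. Key names are hypothetical; because customer agreement is required only "if possible", this sketch leaves the customer-facing items (agreement and satisfaction) to the closer's judgment rather than blocking closure on them.

```python
# Record fields the closer must fill in; key names are hypothetical.
REQUIRED_ON_CLOSE = {
    "resolution_notes": "actions taken to resolve the Incident are documented",
    "time_spent_minutes": "time spent on the Incident is recorded",
    "closed_by": "person closing the Incident is recorded",
    "closed_at": "date and time of closure are recorded",
}


def closure_blockers(ticket: dict) -> list[str]:
    """Return unmet closure conditions; an empty list means ready to close."""
    # A value of 0 (e.g. zero minutes) still counts as recorded.
    blockers = [msg for key, msg in REQUIRED_ON_CLOSE.items()
                if ticket.get(key) in (None, "")]
    if ticket.get("classification_confirmed") is not True:
        blockers.append("classification is complete and accurate")
    return blockers


ready = {"resolution_notes": "Reset VPN profile", "time_spent_minutes": 20,
         "closed_by": "jdoe", "closed_at": "2024-03-05T11:00",
         "classification_confirmed": True}
print(closure_blockers(ready))  # []
```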
9.0 - Major Incident Procedure
Description: Major Incidents follow a dedicated procedure (Document 75015, which is unavailable at this time), with shorter timelines and greater urgency. The decision criteria for Major Incidents are defined, agreed, and documented within our Incident Management prioritization model.
10.0 - Ownership, Monitoring, Tracking, Communication
Description: The Service Desk is responsible for owning and overseeing the resolution of all outstanding Incidents, whatever the initial source, by:
- Regularly monitoring all open Incidents for status, progress towards resolution and service level commitments
- Noting Incidents that move between different specialist support groups, as this may be indicative of uncertainty and, possibly, a dispute between support staff.
- Giving priority to monitoring high-impact Incidents
- Keeping affected clients informed of progress
- Checking for similar Incidents
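Monitoring for Incidents that bounce between support groups can be partly automated. The sketch below counts ownership changes per ticket and lists flagged tickets with the highest priority (lowest number) first, in keeping with giving priority to high-impact Incidents. The threshold of 3 moves, the input format, and the ticket numbers are hypothetical.

```python
def bouncing_incidents(assignments: dict[int, list[str]],
                       priorities: dict[int, int],
                       threshold: int = 3) -> list[int]:
    """Return IDs of tickets reassigned between groups at least `threshold`
    times, highest priority (lowest number) first.

    `assignments` maps ticket ID -> chronological list of owning groups.
    Tickets with no recorded priority are treated as priority 5.
    """
    flagged = []
    for ticket_id, groups in assignments.items():
        # Count transitions where ownership actually changed hands.
        moves = sum(1 for prev, cur in zip(groups, groups[1:]) if prev != cur)
        if moves >= threshold:
            flagged.append(ticket_id)
    return sorted(flagged, key=lambda t: priorities.get(t, 5))


assignments = {
    101: ["Service Desk", "Network Engineering", "Service Desk", "Network Engineering"],
    102: ["Service Desk", "Unified Communications"],
    103: ["Service Desk", "EAD", "ACER", "Security"],
}
priorities = {101: 3, 102: 1, 103: 2}
print(bouncing_incidents(assignments, priorities))  # [103, 101]
```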