Whether you are an IT professional or the beneficiary of IT support, you’ve probably experienced the incident management process, a core process in IT service management (ITSM) and the Information Technology Infrastructure Library (ITIL) framework. This might have been a formal ITIL incident management process flow where you submitted a ticket to fix the copier, or perhaps you took incident management into your own hands and tried turning it off and on again. (A surprisingly effective workaround for almost everything.)
Take a closer look at your ITSM processes and you’ll see that ITIL incident management plays a vital role in the day-to-day operations of organizations large and small. To make your IT support team as effective as possible, implement a clear process flow from incident report to resolution. ITIL offers a thorough framework that you can follow as written or borrow from to create your own IT and incident management processes.
What is incident management?
ITSM and ITIL define an incident as an unplanned interruption or quality reduction of normal service, which can include anything from a broken printer to an app that won’t load (or loads too slowly). Bottom line: incidents mean something is broken or needs fixing.
That is where a good ITIL incident management process comes in. Incident management teams are the frontline support when incidents occur. They are the IT firefighters. Their role is to identify and repair incidents to restore the defined service levels as quickly as possible.
While the IT help desk can conduct incident management via email with the user and other stakeholders, the best incident management teams work through a dedicated process flow with a formal ticketing system.
Incidents vs. problems
Incident management is a close cousin of problem management, another ITSM process, but the two terms are not interchangeable. An incident is an unplanned interruption of normal services. A problem is the underlying cause of the incident or series of incidents.
The distinction matters because incident management focuses on the user level: restoring normal service as quickly as possible. As the IT firefighters, incident management specialists concentrate on putting out fires rather than asking how the fires started. Problem management, by contrast, digs into the root cause of an incident (or a series of incidents) with the goal of preventing future incidents. Keep that division in mind as you develop your problem management protocols.
ITIL incident management process flow
The best incident management teams rely on a clear process with defined steps to work through each incident. The approach may vary slightly between organizations, teams, and how rigidly you follow the ITIL framework, but most follow the same basic path to resolution. Use the following steps to create your own incident management runbook.
1. Incident identification
The catalyst for incident management is when an end user, monitoring system, or IT specialist reports an interruption. Notifications can arrive by email, by phone, in person, or as automated alerts from a monitoring system.
At this point, the help desk should record and identify the incident: Is it an incident or a service request? Service requests are handled differently than incidents and should be handed off to the request fulfillment team (or processed through the request workflow).
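The triage decision above can be expressed as a small routing function. This is a minimal sketch, not a feature of any particular ticketing tool: the `Report` fields, the `is_request` flag (standing in for the agent's judgment), and the queue names are all hypothetical. It also assumes that alerts from a monitoring system always signal an interruption and are therefore treated as incidents.

```python
from dataclasses import dataclass
from enum import Enum


class Channel(Enum):
    EMAIL = "email"
    PHONE = "phone"
    IN_PERSON = "in_person"
    MONITORING = "monitoring"


@dataclass
class Report:
    channel: Channel
    summary: str
    is_request: bool  # hypothetical flag: the help desk's triage decision


def route(report: Report) -> str:
    """Return the (hypothetical) queue a new report should go to."""
    # Assumption: automated monitoring alerts always indicate an interruption.
    if report.channel is Channel.MONITORING:
        return "incident_queue"
    # Service requests bypass incident management entirely.
    if report.is_request:
        return "request_fulfillment_queue"
    return "incident_queue"
```

For example, an emailed request for access to a shared drive would go to `request_fulfillment_queue`, while a phone call about a jammed copier would land in `incident_queue`.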
2. Incident logging
Once the team identifies the incident, they can then log the incident as a ticket with the following information:
- User name and contact info
- Description of the issue
- Date and time of the report
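As an illustration, a ticket record holding these fields might look like the sketch below. The three required fields come from the list above; the auto-incrementing `ticket_id`, the `category` tuple, and the `priority` slot are hypothetical additions that later steps would fill in. A real system would typically get IDs from its database rather than an in-process counter.

```python
import itertools
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional, Tuple

# Hypothetical in-memory ID generator, for illustration only.
_ids = itertools.count(1)


@dataclass
class IncidentTicket:
    # Fields required at logging time.
    user_name: str
    contact: str
    description: str
    # Date and time of the report, captured automatically in UTC.
    reported_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    # Hypothetical fields populated during categorization and prioritization.
    ticket_id: int = field(default_factory=lambda: next(_ids))
    category: Tuple[str, ...] = ()
    priority: Optional[str] = None
```

Recording the timestamp automatically, rather than asking the agent to type it, keeps the report time consistent across channels.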
The more detailed you can be, the better. Robust data collection can help your problem management team identify patterns among incidents and improve root cause analysis.
Additionally, by recording detailed information surrounding each incident, you can create better models and categories to organize your incident data.
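Consistent categories are what make this kind of pattern-spotting possible. Here is a minimal sketch, assuming each logged incident carries a single category string. The threshold of three occurrences for flagging a category to problem management is an arbitrary, hypothetical choice.

```python
from collections import Counter


def recurring_categories(categories, threshold=3):
    """Return (category, count) pairs seen at least `threshold` times, most frequent first."""
    counts = Counter(categories)
    return [(cat, n) for cat, n in counts.most_common() if n >= threshold]
```

Given six tickets where "Hardware/Printer" appears three times, "Network/VPN" twice, and "Software/Email" once, the default threshold flags only the printer category as a candidate problem.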
3. Incident categorization and prioritization
Effective incident categorization streamlines incident logging, reduces redundancy, and speeds up the resolution process. Proper organization allows service staff to make more informed decisions, quickly identifying whether an incident is a known, easily resolved issue or a problem that requires escalation.
Incident categorization is typically multi-level, with three to four levels of hierarchical granularity. For example, your first level may be hardware, and the fourth level may indicate a card failure. Category schemes vary significantly from one organization to the next, but ITIL’s general guidance on multi-level incident categorization can help each business establish or revise a scheme that fits.
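A multi-level scheme is naturally a tree, and a ticket’s category is a path through it. In the sketch below, only "Hardware" (level one) and "Card failure" (level four) come from the example above; every other category name is hypothetical.

```python
# Hypothetical four-level category tree stored as nested dicts.
CATEGORIES = {
    "Hardware": {
        "Server": {
            "Network interface": {"Card failure": {}, "Driver fault": {}},
        },
        "Printer": {"Paper handling": {"Jam": {}}},
    },
    "Software": {
        "Email": {"Client": {"Will not start": {}}},
    },
}


def is_valid_path(path, tree=CATEGORIES):
    """Check that each level in `path` exists directly beneath the previous level."""
    node = tree
    for level in path:
        if level not in node:
            return False
        node = node[level]
    return True
```

Validating paths against a single tree keeps agents from inventing near-duplicate categories, which is what makes the data useful for reporting later.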
After you’ve logged the ticket, the incident needs to be categorized and prioritized to determine how the issue is handled (and by whom). Categorization helps the team sort and model incidents more easily, and it also streamlines prioritization. Incidents are typically assigned a priority of low, medium, or high.
For example, if an incident is categorized as a system outage, this might automatically escalate the incident to a higher priority. A categorization system also makes it easier for the problem management team to track and identify patterns between incidents, improving incident prevention.
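A rule like this can be sketched as a priority floor per category. The rule table below is hypothetical: "System outage" comes from the example above, while "Security" is an added illustration. The rule only ever raises a priority, never lowers it.

```python
# Hypothetical rule table: categories that force a minimum priority.
PRIORITY_FLOOR = {"System outage": "high", "Security": "high"}
RANK = {"low": 0, "medium": 1, "high": 2}


def apply_category_rules(category, priority):
    """Raise `priority` to the floor configured for `category`, if any."""
    floor = PRIORITY_FLOOR.get(category)
    if floor is not None and RANK[floor] > RANK[priority]:
        return floor
    return priority
```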
Based on the categorization, you can determine how best to prioritize any given incident. Prioritization is a vital part of incident management because it directly affects your SLA response adherence. Rank incidents based on their urgency and their impact on end users and the business or organizational operations.
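Ranking by urgency and impact is often implemented as a lookup matrix. The sketch below assumes three levels for each input. The specific cell values are an assumption; each organization tunes its own matrix to match its SLAs.

```python
LEVELS = ("low", "medium", "high")

# Rows are impact, columns are urgency. The mapping is illustrative only.
PRIORITY_MATRIX = {
    "high":   {"high": "high",   "medium": "high",   "low": "medium"},
    "medium": {"high": "high",   "medium": "medium", "low": "low"},
    "low":    {"high": "medium", "medium": "low",    "low": "low"},
}


def priority(impact, urgency):
    """Look up a ticket's priority from its impact and urgency."""
    if impact not in LEVELS or urgency not in LEVELS:
        raise ValueError(f"impact and urgency must be one of {LEVELS}")
    return PRIORITY_MATRIX[impact][urgency]
```

Keeping the matrix as data, rather than as nested `if` statements, makes it easy to review with stakeholders and adjust when SLAs change.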
4. Initial diagnosis
Diagnosis (sometimes referred to as the response stage) often takes the longest. During this step, the team investigates the incident, particularly by describing the problem and running through their standard set of troubleshooting questions, and then develops an initial hypothesis for the issue.
The initial diagnostic phase can quickly bog down IT support in time-consuming research and investigation. Consider drafting a troubleshooting runbook or flowchart to streamline the investigation process and make it easier for your team to identify or eliminate possible causes. By breaking the process down into clear steps, your team will be able to run diagnostics more efficiently and resolve faster.
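A troubleshooting runbook can be written as an ordered list of checks, each paired with a fix. In this minimal sketch the printer runbook, its questions, and the `facts` keys (details gathered from the user or from monitoring) are all hypothetical. If no check fails, the function returns `None`, which signals that the issue should be escalated.

```python
from typing import Callable, List, Optional, Tuple

# (question, check on gathered facts, suggested fix if the check fails)
Step = Tuple[str, Callable[[dict], bool], str]

PRINTER_RUNBOOK: List[Step] = [
    ("Is the device powered on?", lambda f: f.get("powered_on", False), "Power on the device"),
    ("Is it connected to the network?", lambda f: f.get("network_ok", False), "Reconnect the network cable or Wi-Fi"),
    ("Is the print queue clear?", lambda f: f.get("queue_clear", False), "Clear stuck jobs from the queue"),
]


def run_runbook(steps: List[Step], facts: dict) -> Optional[str]:
    """Return the fix for the first failed check, or None if every check passes."""
    for _question, check, fix in steps:
        if not check(facts):
            return fix
    return None  # No step explains the fault, so escalate.
```

Ordering the checks from cheapest to most expensive keeps first-level diagnosis fast, and an unanswered fact counts as a failed check so it gets verified first.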
If they are not able to resolve the ticket based on their hypothesis and available resources, they will escalate the issue to the next level.
5. Functional and hierarchic escalation
At this point, the next level of technical support will continue investigating the issue, relying on their additional expertise or resources to find the right fix for the incident. Most of the time, the first-level support team at the help desk can successfully resolve incidents without the need for escalation.
In instances where escalation is needed, an incident can be escalated by means of functional and/or hierarchic escalation.
- Functional escalation: The process of a ticket/incident being routed to a higher level or more specialized team that can deliver the proper support in order to resolve the incident.
- Hierarchic escalation: A communication and consultation process in which a manager or other person of authority decides whether to assign greater resources to resolve an incident.
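Because the two kinds of escalation are independent, a single ticket can undergo either one or both. This is a minimal sketch of that idea. The tier names are hypothetical, and a real tool would also record who escalated and when.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical support tiers, from the help desk up to external vendor support.
TIERS = ["Tier 1 help desk", "Tier 2 specialists", "Tier 3 engineering", "Vendor support"]


@dataclass
class Escalation:
    tier: int = 0
    notified_managers: List[str] = field(default_factory=list)

    def functional(self) -> str:
        """Route the ticket to the next, more specialized team."""
        if self.tier + 1 >= len(TIERS):
            raise RuntimeError("No higher support tier is available")
        self.tier += 1
        return TIERS[self.tier]

    def hierarchic(self, manager: str) -> None:
        """Involve a manager who can decide whether to assign more resources."""
        if manager not in self.notified_managers:
            self.notified_managers.append(manager)
```

Note that `hierarchic` does not change the tier: bringing in a manager adds authority over resources, not technical expertise.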
6. Investigation and diagnosis
Incident investigation and diagnosis occur during the troubleshooting process. After receiving a ticket, the help desk employee will first identify and test an initial hypothesis based on the most likely cause of the issue.
After the incident is diagnosed, the support staff start working on the solution, such as patching software or replacing hardware.
7. Resolution and recovery
Once the team has nailed down the correct diagnosis, they can get to work fixing the issue. In this stage, the service desk will confirm the service has been properly restored.
8. Incident closure
When the incident is resolved, the service desk confirms the fix and closes the ticket. Be sure to confirm with the user who originally reported the incident that the service has been fully restored before closing the ticket.
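The path from logging to closure can be modeled as a simple state machine that refuses to close a ticket until the reporting user has confirmed the fix. The states and allowed transitions below loosely follow the steps in this article and are an assumption. Real tools often add states such as "on hold" or "reopened".

```python
from enum import Enum, auto


class State(Enum):
    LOGGED = auto()
    CATEGORIZED = auto()
    DIAGNOSING = auto()
    ESCALATED = auto()
    RESOLVED = auto()
    CLOSED = auto()


# Allowed transitions (illustrative, not an ITIL-mandated set).
TRANSITIONS = {
    State.LOGGED: {State.CATEGORIZED},
    State.CATEGORIZED: {State.DIAGNOSING},
    State.DIAGNOSING: {State.ESCALATED, State.RESOLVED},
    State.ESCALATED: {State.DIAGNOSING, State.RESOLVED},
    State.RESOLVED: {State.CLOSED, State.DIAGNOSING},  # reopen if the fix fails
    State.CLOSED: set(),
}


def advance(current: State, target: State, user_confirmed: bool = False) -> State:
    """Move a ticket to `target`, enforcing allowed transitions and user sign-off."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"Cannot move from {current.name} to {target.name}")
    if target is State.CLOSED and not user_confirmed:
        raise ValueError("Confirm with the reporting user before closing")
    return target
```

Allowing `RESOLVED` to move back to `DIAGNOSING` covers the case where the user reports that service has not actually been restored.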
Mapping the incident management process
Unfortunately, incidents are not a rare occurrence for most organizations. In fact, in a recent survey of 400 companies, Dimensional Research found that 32% of organizations experience at least one major incident per month. And when major incidents can cost hundreds of thousands of dollars, you can’t afford a disorganized management process. Instead, optimize your ITSM processes with Lucidchart, kicking gaps, bottlenecks, and weaknesses to the curb.
Lucidchart helps IT support professionals collaborate across the ITSM lifecycle, in incident management and beyond. Work better together to resolve issues with cloud-based visual outlines and documentation that help your IT staff track incident response actions through clear process mapping.
With an intuitive interface, you can quickly build out your process and ensure your team knows exactly what to do and when to do it, bringing value to customers with as few interruptions as possible.
Lucidchart, a cloud-based intelligent diagramming application, is a core component of Lucid Software's Visual Collaboration Suite. This intuitive, cloud-based solution empowers teams to collaborate in real-time to build flowcharts, mockups, UML diagrams, customer journey maps, and more. Lucidchart propels teams forward to build the future faster. Lucid is proud to serve top businesses around the world, including customers such as Google, GE, and NBC Universal, and 99% of the Fortune 500. Lucid partners with industry leaders, including Google, Atlassian, and Microsoft. Since its founding, Lucid has received numerous awards for its products, business, and workplace culture. For more information, visit lucidchart.com.