Major Incident Process and Procedure
Definition:
A major incident is an incident of the highest impact and urgency: it affects a large number of users and disrupts one or more critical services. Given that urgency, a well-coordinated response process is required to accelerate resolution and minimize business impact.
Major Incident Goals:
- Minimize the impact of service interruptions.
- Ensure that a Major Incident Command Center is mobilized to manage a major incident.
- Ensure that stakeholders are well-informed of service interruptions, degradations, and resolutions.
- Conduct a Post-Incident Review (PIR).
  - Analyze the incident and understand what can be done to prevent a similar incident in the future. This review also provides an opportunity to evaluate the incident response process and identify areas for improvement.
- Identify the root cause via the Problem Management process.
Major Incident Identification:
Major Incidents are classified as such via two processes:
- A DoIT Director/Deputy CIO/CIO/CISO declares an incident as “Major”.
- Any DoIT staff can propose an incident as a Major Incident Candidate.
  - Major Incident Candidates can be brought directly to management (Director level or higher) or to the Incident/Problem Manager for expedited review and subsequent declaration of Major Incident.
- Service Owners, with Director approval, can specify rules and/or criteria to immediately classify an incident as “Major”.
  - Note: At this time DoIT does not have such criteria in place.
Major Incident Commander Responsibilities:
- Mobilize Major Incident Command Center (MICC) upon discovery of Major Incident
  - Establish the Incident Commander (IC) and announce who that is to everyone who needs to know.
  - Create MS Teams Chat for MICC
    - Naming convention: Major Incident - [Date] - [Incident Description]
    - Use best judgment to invite relevant parties:
      - Directors and Associate Directors
      - CIO/Deputy CIO/CTO/CISO
      - Situation manager
      - Cybersecurity
      - Technologists/Engineers
      - Help Desk
      - ITSM Problem Manager
      - Communications
      - SNCC
      - Building Manager
      - Administrative Manager/Assistant
- Ask the group to invite others as needed, keeping in mind that too many participants becomes unwieldy, while too few leaves the response without appropriate representation.
- Create and share Google Doc for Major Incident Notes and Documentation
- Verify Outage content is accurate and appropriate
- Establish cadence for future updates
- Request that SNCC use the ‘Send Notification’ function for all Outage updates
  - In collaboration with Communications
- Establish cadence for check-ins with MICC
  - Check-ins can be Video or Chat, as needed
  - Incident Commander will facilitate check-in meetings
  - Check-in agenda template:
    - Technical response update: Situation Manager
    - Help Desk update: HD Associate Director
    - Cybersecurity update: CISO/Incident Response Team Lead
    - Communications update: Comms Lead
    - Identify next steps: Incident Commander
- Campus Outreach
  - DoIT Communications will take the lead in coordinating all campus outreach, including messaging to campus and assigning responsibilities to others.
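
The MICC chat naming convention listed under the Incident Commander responsibilities can be applied consistently with a small helper. The following is a minimal sketch, not part of the official procedure: the function name `micc_chat_name` is hypothetical, and the ISO date format (YYYY-MM-DD) and the single space on each side of the hyphens are assumptions, since the procedure specifies only "[Date]" and "[Incident Description]".

```python
from datetime import date


def micc_chat_name(incident_date: date, description: str) -> str:
    """Build a chat title of the form: Major Incident - [Date] - [Incident Description]."""
    # Collapse stray whitespace so the title stays on one tidy line.
    summary = " ".join(description.split())
    if not summary:
        raise ValueError("incident description must not be empty")
    # Assumption: ISO 8601 date (YYYY-MM-DD); the procedure only says "[Date]".
    return f"Major Incident - {incident_date.isoformat()} - {summary}"


# Example with made-up incident details.
print(micc_chat_name(date(2024, 3, 5), "Campus email  outage"))
# prints "Major Incident - 2024-03-05 - Campus email outage"
```

Using a sortable date format keeps chats from different incidents in chronological order when listed by name; if DoIT prefers another date style, only the formatting line needs to change.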