Major Incident Process and Procedure

Definition:

A major incident is an incident of the highest impact and urgency: it affects a large number of users and disrupts one or more critical services. Because of that urgency, a well-coordinated response process is required to speed resolution and minimize business impact.

Major Incident Goals:

Minimize the impact of service interruptions.
Ensure that a Major Incident Command Center is mobilized to manage a major incident.
Ensure that stakeholders are well-informed of service interruptions, degradations, and resolutions.
Conduct a Post-Incident Review (PIR).
- Analyze the incident to understand what can be done to prevent a similar incident in the future. The review is also an opportunity to evaluate the incident response process and identify areas for improvement.
Identify the root cause via the Problem Management process.

Major Incident Identification:

An incident is classified as a Major Incident in one of two ways:

A DoIT Director/Deputy CIO/CIO/CISO declares an incident as “Major”.
- Any DoIT staff can propose an incident as a Major Incident Candidate.
- Major Incident Candidates can be brought directly to management (Director level or higher) or to the Incident/Problem Manager for expedited review and subsequent declaration of Major Incident.
Service Owners, with Director approval, can specify rules and/or criteria to immediately classify an incident as “Major”.
- Note: At this time DoIT does not have such criteria in place.

Major Incident Commander Responsibilities:

Mobilize the Major Incident Command Center (MICC) upon discovery of a Major Incident
- Establish the Incident Commander (IC) and announce who that is to everyone who needs to know.
- Create an MS Teams chat for the MICC
  - Naming convention: Major Incident- [Date] - [Incident Description]
  - Use best judgment to invite relevant parties:
    - Directors and Associate Directors
    - CIO/Deputy CIO/CTO/CISO
    - Situation manager
    - Cybersecurity
    - Technologists/Engineers
    - Help Desk
    - ITSM Problem Manager
    - Communications
    - SNCC
    - Building Manager
    - Administrative Manager/Assistant
Ask the group to invite others as needed, keeping in mind that too many participants becomes unmanageable while too few leaves the MICC without appropriate representation.
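
The chat naming convention above can be applied consistently with a small helper. This is only an illustrative sketch: the function name `micc_chat_name` is hypothetical, the separator is copied literally from the convention as written, and the ISO date format (YYYY-MM-DD) is an assumption, since the procedure does not specify how the date should be written.

```python
from datetime import date


def micc_chat_name(incident_date: date, description: str) -> str:
    """Build a chat title of the form 'Major Incident- [Date] - [Incident Description]'."""
    # Collapse stray whitespace so copied-and-pasted descriptions stay tidy.
    cleaned = " ".join(description.split())
    if not cleaned:
        raise ValueError("incident description must not be empty")
    # Assumption: ISO 8601 date format; the procedure does not prescribe one.
    return f"Major Incident- {incident_date.isoformat()} - {cleaned}"


if __name__ == "__main__":
    # Hypothetical incident, used only to show the resulting title.
    print(micc_chat_name(date(2025, 1, 16), "Campus VPN outage"))
```

Generating the title from the date and a short description keeps chat names uniform and easy to search later, whichever date format the team settles on.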

Create and share a Google Doc for Major Incident notes and documentation

Verify that Outage content is accurate and appropriate
- Establish a cadence for future updates
- Request that SNCC use the ‘Send Notification’ function for all Outage updates
- Work in collaboration with Communications
Establish a cadence for check-ins with the MICC
- Check-ins can be by video or chat, as needed
- The Incident Commander will facilitate check-in meetings
- Check-in agenda template:
  - Technical response update - Situation Manager
  - Help Desk update - HD Associate Director
  - Cybersecurity update - CISO/Incident Response Team Lead
  - Communications update - Comms Lead
  - Identify next steps - Incident Commander
Campus Outreach
- DoIT Communications will take the lead in coordinating all campus outreach, including messaging to campus and assigning responsibilities to others.
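
The check-in cadence and agenda template described above could be sketched as follows. The agenda topics and owners come from the template in this document; everything else is an assumption made for illustration, including the function names `checkin_times` and `format_agenda`, the 30-minute interval in the usage example, and the output layout. The procedure leaves the actual cadence to the Incident Commander.

```python
from datetime import datetime, timedelta

# Agenda topics and owners, taken from the check-in agenda template.
AGENDA = [
    ("Technical response update", "Situation Manager"),
    ("Help Desk update", "HD Associate Director"),
    ("Cybersecurity update", "CISO/Incident Response Team Lead"),
    ("Communications update", "Comms Lead"),
    ("Identify next steps", "Incident Commander"),
]


def checkin_times(start: datetime, interval_minutes: int, count: int) -> list[datetime]:
    """Return `count` check-in times, the first one interval after `start`."""
    if interval_minutes <= 0 or count < 0:
        raise ValueError("interval must be positive and count non-negative")
    step = timedelta(minutes=interval_minutes)
    return [start + step * (i + 1) for i in range(count)]


def format_agenda(when: datetime) -> str:
    """Render the agenda for one check-in as plain text (hypothetical layout)."""
    lines = [f"MICC check-in {when:%Y-%m-%d %H:%M}"]
    lines += [f"{n}. {topic} ({owner})" for n, (topic, owner) in enumerate(AGENDA, start=1)]
    return "\n".join(lines)


if __name__ == "__main__":
    # Assumption: a 30-minute cadence; the Incident Commander sets the real one.
    for when in checkin_times(datetime(2025, 1, 16, 9, 0), 30, 2):
        print(format_agenda(when))
        print()
```

Posting a generated agenda into the MICC chat ahead of each check-in is one way to keep every update owner aware of when they are expected to report.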

Keywords:

Major Incident Problem Management Change ITSM Operational Framework Situation Response Commander

Doc ID:

147554

Owned by:

Ramsay B. in ITSM

Created:

2025-01-16

Updated:

2025-01-16

Sites:

DoIT Help Desk, DoIT IT Service Management, DoIT Staff, Systems & Network Control Center
