Major Incident Process and Procedure

Definition:

A major incident is an incident of the highest impact and highest urgency: it affects a large number of users and disrupts one or more critical services. Because of that urgency, a well-coordinated response process is required to speed resolution and minimize business impact.

Major Incident Goals:

  • Minimize the impact of service interruptions.

  • Ensure that a Major Incident Command Center is mobilized to manage a major incident.

  • Ensure that stakeholders are well-informed of service interruptions, degradations, and resolutions.

  • Conduct a Post-Incident Review (PIR).

    • Analyze the incident and understand what can be done to prevent a similar incident in the future. This review also provides an opportunity to evaluate the incident response process and identify areas for improvement.

  • Identify the root cause via the Problem Management process.

Major Incident Identification:

An incident is classified as Major in one of two ways:

  • A DoIT Director/Deputy CIO/CIO/CISO declares an incident as “Major”.

    • Any DoIT staff can propose an incident as a Major Incident Candidate. 

    • Major Incident Candidates can be brought directly to management (Director level or higher) or to the Incident/Problem Manager for expedited review and subsequent declaration of Major Incident.

  • Service Owners, with Director approval, can specify rules and/or criteria to immediately classify an incident as “Major”.

    • Note: At this time DoIT does not have such criteria in place.

Major Incident Commander Responsibilities:

  • Mobilize the Major Incident Command Center (MICC) upon discovery of a Major Incident

    • Establish the Incident Commander (IC) and announce who that person is to everyone who needs to know.

    • Create an MS Teams chat for the MICC

      • Naming convention: Major Incident - [Date] - [Incident Description]

      • Use best judgment to invite relevant parties:

        • Directors and Associate Directors

        • CIO/Deputy CIO/CTO/CISO

        • Situation Manager

        • Cybersecurity

        • Technologists/Engineers

        • Help Desk

        • ITSM Problem Manager

        • Communications

        • Systems & Network Control Center (SNCC)

        • Building Manager

        • Administrative Manager/Assistant

  • Ask the group to invite others as needed, keeping in mind that too many participants becomes unwieldy, while too few leaves the response without appropriate representation.

  • Create and share a Google Doc for Major Incident notes and documentation

  • Verify that Outage content is accurate and appropriate

    • Establish a cadence for future updates

    • Request that SNCC use the ‘Send Notification’ function for all Outage updates

    • Carry out these steps in collaboration with Communications

  • Establish a cadence for check-ins with the MICC

    • Check-ins can be held by video or chat, as needed

    • The Incident Commander will facilitate check-in meetings

    • Check-in agenda template:

      • Technical response update: Situation Manager

      • Help Desk update: HD Associate Director

      • Cybersecurity update: CISO/Incident Response Team Lead

      • Communications update: Comms Lead

      • Identify next steps: Incident Commander

  • Campus Outreach

    • DoIT Communications will take the lead in coordinating all campus outreach, including messaging to campus and assigning responsibilities to others.
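
The MICC chat naming convention listed above (Major Incident, then the date, then the incident description) can be applied consistently with a small helper. The following is a minimal sketch, not an official DoIT tool: the function name micc_chat_name, the " - " separator on both sides of the date, the ISO 8601 date format (YYYY-MM-DD), and the example description are all assumptions, since the procedure does not specify them.

```python
from datetime import date

# Hypothetical helper; the procedure only gives the pattern
# "Major Incident, [Date], [Incident Description]".
PREFIX = "Major Incident"


def micc_chat_name(incident_date: date, description: str) -> str:
    """Build an MICC chat name following the documented naming pattern."""
    # Collapse runs of whitespace so the name stays on a single line.
    summary = " ".join(description.split())
    if not summary:
        raise ValueError("incident description must not be empty")
    # Assumption: ISO 8601 date; the procedure does not fix a date format.
    return f"{PREFIX} - {incident_date.isoformat()} - {summary}"


if __name__ == "__main__":
    # Illustrative values only.
    print(micc_chat_name(date(2025, 1, 16), "Campus VPN outage"))
```

Generating the name from the same function each time keeps chat titles sortable by date and easy to find later when assembling the Post-Incident Review.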



Keywords: Major Incident Problem Management Change ITSM Operational Framework Situation Response Commander
Doc ID: 147554
Owned by: Ramsay B. in ITSM
Created: 2025-01-16
Updated: 2025-01-16
Sites: DoIT Help Desk, DoIT IT Service Management, DoIT Staff, Systems & Network Control Center