Operational Excellence Assessment Guidance 07 What Is Your Emergency Operations Practice
An effective emergency operations practice helps you identify and handle anomalies across reliability, security, and performance. It comprises instructions to triage incidents, communicate with stakeholders, and identify root causes so that you can continuously improve workload management.
The team uses a reliable alerting and notification system. They communicate through pre-agreed channels, such as Microsoft Teams. A formal ticket tracking solution, such as an IT service management tool or product backlog, maintains the incident history.
**NOTE - **
Your incident response plan details the roles, responsibilities, and processes for reacting to incidents. It names the incident response manager in charge and the person responsible for overseeing the incident response procedures.
**NOTE - **
The severity of an incident determines how your team responds to it. All team members and business stakeholders share an understanding of the classification criteria and response expectations.
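One way to make the classification criteria and response expectations explicit is to keep them in a small shared table. The following is a minimal sketch only: the level names (`SEV1` to `SEV3`), the thresholds, and the response times are all hypothetical values chosen for illustration, and your team should agree on its own.

```python
from dataclasses import dataclass
from datetime import timedelta
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical severity levels; a lower number is more severe."""
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3


@dataclass(frozen=True)
class ResponseExpectation:
    acknowledge_within: timedelta  # how quickly someone must pick up the incident
    notify_stakeholders: bool      # whether business stakeholders are informed


# Assumed values for illustration only; agree on real ones with stakeholders.
RESPONSE_EXPECTATIONS = {
    Severity.SEV1: ResponseExpectation(timedelta(minutes=15), True),
    Severity.SEV2: ResponseExpectation(timedelta(hours=1), True),
    Severity.SEV3: ResponseExpectation(timedelta(hours=8), False),
}


def classify(users_affected_pct: float, data_at_risk: bool) -> Severity:
    """Map two example criteria to a severity level (thresholds are assumptions)."""
    if data_at_risk or users_affected_pct >= 50:
        return Severity.SEV1
    if users_affected_pct >= 5:
        return Severity.SEV2
    return Severity.SEV3
```

Calling `classify(10, False)` with these assumed thresholds yields `Severity.SEV2`, and the matching entry in `RESPONSE_EXPECTATIONS` tells the responder what is expected of them.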
**NOTE - **
Your team knows when and how to inform decision makers and business stakeholders about severe incidents and how to request decisions or support to deal with them. The team also knows how to contact organizational or external support channels.
**NOTE - **
[X] We understand that there can be different sources for incident reports, but our team handles incidents consistently.
Your team consistently reacts to incidents no matter how they have been reported, such as through tooling, alerting, users, or informal communication.
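Consistent handling can be sketched as funneling every report into the same record type and the same starting state, regardless of where it came from. The names below (`IncidentTicket`, `open_incident`, the source labels, and the `"triage"` status) are hypothetical and do not refer to any particular ticketing product.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Hypothetical source labels mirroring the examples in the text above.
KNOWN_SOURCES = {"tooling", "alerting", "user", "informal"}


@dataclass
class IncidentTicket:
    title: str
    source: str
    reported_at: datetime
    status: str = "triage"  # every incident starts in the same state


def open_incident(title: str, source: str,
                  reported_at: Optional[datetime] = None) -> IncidentTicket:
    """Create a ticket the same way for every source of report."""
    if source not in KNOWN_SOURCES:
        # Assumption: unrecognized sources are recorded as informal reports.
        source = "informal"
    if reported_at is None:
        reported_at = datetime.now(timezone.utc)
    return IncidentTicket(title=title, source=source, reported_at=reported_at)
```

A report raised by an alert and one mentioned in a hallway conversation both end up as an `IncidentTicket` in the `"triage"` state, so the same process applies to both.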
**NOTE - **
The service-level agreements (SLAs) might dictate that you have to issue root cause analysis (RCA) reports within a certain time period after the incident is resolved. The incident owner should create the RCA reports. If that’s not possible, another person who worked closely with the incident owner can create them. This strategy ensures an accurate accounting of the incident.
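The deadline and authorship rule above can be sketched in a few lines. This is an illustration under stated assumptions: the function names are hypothetical, the SLA window is treated as plain calendar time, and the five-day value in the usage note is made up rather than taken from any actual SLA.

```python
from datetime import datetime, timedelta
from typing import Optional


def rca_due_date(resolved_at: datetime, sla_window: timedelta) -> datetime:
    """Return the latest time the RCA report may be issued."""
    return resolved_at + sla_window


def rca_is_overdue(resolved_at: datetime, sla_window: timedelta,
                   now: datetime) -> bool:
    """True when the SLA window has passed without regard to report status."""
    return now > rca_due_date(resolved_at, sla_window)


def pick_rca_author(incident_owner: Optional[str], close_collaborator: str) -> str:
    """Prefer the incident owner; fall back to someone who worked closely with them."""
    return incident_owner if incident_owner else close_collaborator
```

For example, with an assumed window of `timedelta(days=5)`, an incident resolved at noon on 1 March has an RCA due date of noon on 6 March.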
**NOTE - **
By running post-mortem sessions after drilled and real incidents, the team explores what went well and where the process needs improvement. You ensure these sessions do not blame individuals or teams and that they produce an actionable list of organizational or technical follow-ups to improve the procedure.
**NOTE - **