Severity - best practice
This is a guideline on how to set trigger severity. Zabbix supports the following trigger severities:
| SEVERITY | DEFINITION | COLOUR |
|---|---|---|
| Not classified | Unknown severity. | Grey |
| Information | For information purposes. | Light green |
| Warning | Be warned. | Yellow |
| Average | Average problem. | Orange |
| High | Something important has happened. | Red |
| Disaster | Disaster. Financial losses, etc. | Bright red |
More about each definition:
Not classified
Avoid using this severity on Zabbix triggers that are in production. It can be used while implementing or testing new triggers.
Information
Use this severity when you want to be informed that "something has changed" and the change, taken in isolation, has no impact.
It can also be used as a "proactive first warning" severity, for example:
- SSL certificate is expiring (e.g. in 30 days).
No action required.
Warning
Use this severity when a sub-component with limited impact has stopped, or as a "proactive last warning", for example:
- SSL certificate is expiring (e.g. in 7 days).
No action required for SSC "on call".
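The two certificate examples above (Information at about 30 days, Warning at about 7 days) can be sketched as a simple threshold function. This is only an illustration: the function name `certificate_severity` and the constants are hypothetical, the thresholds are the "e.g." values from this guideline, and what should happen once a certificate has actually expired is not covered here.

```python
from typing import Optional

# Example thresholds taken from the "e.g." values in this guideline.
# They are assumptions; adjust them to your own renewal process.
INFORMATION_DAYS = 30  # proactive first warning
WARNING_DAYS = 7       # proactive last warning


def certificate_severity(days_left: int) -> Optional[str]:
    """Return the suggested severity name, or None if no trigger should fire."""
    # Check the tighter threshold first so it takes precedence.
    if days_left <= WARNING_DAYS:
        return "Warning"
    if days_left <= INFORMATION_DAYS:
        return "Information"
    return None
```

Checking the tighter threshold first means a certificate with 5 days left is reported once, as Warning, rather than matching both levels.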
Average
Use this severity when a sub-component with possible high impact is down or does not respond, for example:
- A redundant component is down but traffic continues on the remaining nodes.
- A database replica has fallen behind its primary.
No action required for SSC "on call".
High
Use this severity when an internal service/component is down, when a critical redundant component (in an external service) is down, or when an external service that is not high-impact is down, for example:
- A single-instance internal service is down.
- The last redundant component of a service is degraded.
- A non-critical external website is unavailable.
Action required for SSC "on call". The person on call should act on triggers with this severity: solve the issue if documentation or personal knowledge exists, or escalate the incident to the responsible person/department.
Disaster
Use this severity when a high-impact service/component is down, that is, a critical external service or critical internal infrastructure, for example:
- A user-facing authentication or SSO service is down.
- A core database cluster is unreachable.
- All redundant instances of a critical service have failed.
Immediate action required for SSC "on call". The person on call should act on triggers with this severity: solve the issue if documentation or personal knowledge exists, or escalate the incident to the responsible person/department. Escalation should be done by direct communication with the responsible person/department.
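As a summary, the on-call expectations per severity can be sketched as a small lookup. This is a minimal illustration, not part of any Zabbix tooling: the names `Severity` and `on_call_action` are hypothetical, the numeric values follow the trigger `priority` field of the Zabbix API (0 to 5), and treating "Not classified" as requiring no on-call action is an assumption, since the guideline only says to keep it out of production.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Values mirror the numeric trigger "priority" used by the Zabbix API.
    NOT_CLASSIFIED = 0
    INFORMATION = 1
    WARNING = 2
    AVERAGE = 3
    HIGH = 4
    DISASTER = 5


def on_call_action(severity: Severity) -> str:
    """Return the on-call expectation described in this guideline."""
    if severity >= Severity.DISASTER:
        return "immediate action; escalate by direct communication"
    if severity >= Severity.HIGH:
        return "action; solve or escalate"
    # Information, Warning, Average (and, by assumption, Not classified).
    return "no on-call action"
```

For example, `on_call_action(Severity.AVERAGE)` yields "no on-call action", while `on_call_action(Severity.HIGH)` yields "action; solve or escalate".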