Severity - best practice
This is a guideline on how to set trigger severity. Zabbix supports the following trigger severities:
SEVERITY | DEFINITION | COLOUR |
---|---|---|
Not classified | Unknown severity. | Grey |
Information | For information purposes. | Light green |
Warning | Be warned. | Yellow |
Average | Average problem. | Orange |
High | Something important has happened. | Red |
Disaster | Disaster. Financial losses, etc. | Bright red |
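When severities are set or read programmatically, for example through the Zabbix API, each severity in the table above corresponds to a numeric trigger priority. The sketch below is minimal and assumes the Zabbix API convention for the trigger `priority` property (0 = Not classified through 5 = Disaster). The class and helper names are hypothetical.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Zabbix trigger severities, ordered from lowest to highest.

    Values follow the Zabbix API trigger `priority` property
    (0 = Not classified ... 5 = Disaster).
    """
    NOT_CLASSIFIED = 0
    INFORMATION = 1
    WARNING = 2
    AVERAGE = 3
    HIGH = 4
    DISASTER = 5


def from_label(label: str) -> Severity:
    """Convert a frontend label such as "Not classified" to a Severity."""
    return Severity[label.strip().upper().replace(" ", "_")]


def at_least(priority: int, threshold: Severity) -> bool:
    """Return True if a numeric trigger priority is at or above `threshold`.

    Raises ValueError for numbers outside 0-5.
    """
    return Severity(priority) >= threshold
```

Because `IntEnum` members compare as integers, severities can be filtered by ordering, e.g. `at_least(4, Severity.HIGH)` selects High and Disaster triggers.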
Detailed definitions:
Not classified
Avoid using this severity on Zabbix triggers that are in production. It can be used while implementing or testing new triggers.
Information
Use this severity when you want to be informed that "something has changed" and the change, in isolation, has no impact. Examples:
- The Zabbix server has no connection to (or receives no data from) a host running the Zabbix agent.
- Host is rebooted.
- The OS or application version has changed.
It can also be used as a "proactive first warning" severity, for example:
- SSL certificate is expiring in xxx days.
No action required.
Warning
Use this severity when a sub-component with limited impact has stopped, or as a "proactive last warning". Examples:
- The yyy process is not running on host <host>.
- SSL certificate is expiring in yyy days.
- Cannot connect to SSH on host <host>.
No action required for UDS "on call".
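The Information and Warning examples use the same SSL certificate check at two thresholds: a "proactive first warning" at xxx days and a "proactive last warning" at yyy days. The sketch below shows this two-step escalation. It assumes the Zabbix API priority numbering (1 = Information, 2 = Warning). The function name is hypothetical, and the thresholds stay as parameters because this guideline does not fix xxx and yyy.

```python
from typing import Optional

INFORMATION = 1  # Zabbix trigger priority for Information
WARNING = 2      # Zabbix trigger priority for Warning


def ssl_expiry_priority(days_left: int, first_warning_days: int,
                        last_warning_days: int) -> Optional[int]:
    """Return the trigger priority for a certificate expiring in `days_left` days.

    `first_warning_days` plays the role of "xxx" and `last_warning_days` of
    "yyy" in this guideline; both are site-specific choices.
    Returns None while no warning is due.
    """
    if last_warning_days >= first_warning_days:
        raise ValueError("the last warning threshold must be fewer days "
                         "than the first warning threshold")
    if days_left <= last_warning_days:
        return WARNING       # proactive last warning
    if days_left <= first_warning_days:
        return INFORMATION   # proactive first warning
    return None              # nothing to report yet
```

In Zabbix itself the same effect is usually obtained with two triggers on the same item, one per threshold.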
Average
Use this severity when a sub-component with potentially high impact is down or does not respond. Examples:
- One or more NetApp power supplies are faulty.
- A node cannot take over (NetApp cluster).
No action required for UDS "on call".
High
Use this severity when an internal service or component is down, when a critical redundant component (in an external service) is down, or when an external service that is not high-impact is down.
Examples:
- The NOVA portal is down.
- A node does not respond to ping (isolated).
- One of the two Nova GWs is down.
- www.uninett.no is down (since this is an informational website, it is not a high-impact service).
Action required from UDS "on call". Triggers with this severity should be acted on: solve the issue if documentation or personal knowledge exists, otherwise escalate the incident to the responsible person or department.
Disaster
Use this severity when a high-impact service or component is down, that is, a critical external service or critical internal infrastructure. Examples:
- The Feide service is down.
- The Agora website is down.
- The NOVA platform is down, or NetApp is down.
Immediate action required from UDS "on call". Triggers with this severity should be acted on: solve the issue if documentation or personal knowledge exists, otherwise escalate the incident to the responsible person or department. Escalation should be done by direct communication with the responsible person or department.
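The on-call expectations in this guideline can be summarised as a lookup from severity to the expected response from UDS "on call". The sketch assumes severities are available as the Zabbix API's numeric trigger priority (0 to 5). The function names and the response texts are hypothetical paraphrases of the sections above, not Zabbix configuration.

```python
# Expected response from UDS "on call", keyed by Zabbix trigger priority
# (0 = Not classified, 1 = Information, 2 = Warning, 3 = Average,
#  4 = High, 5 = Disaster).
ON_CALL_RESPONSE = {
    0: "Not used in production (testing only)",
    1: "No action",
    2: "No action",
    3: "No action",
    4: "Act: solve if documented or known, otherwise escalate",
    5: "Act immediately: solve if documented or known, "
       "otherwise escalate by direct contact",
}


def on_call_response(priority: int) -> str:
    """Return the expected on-call response for a trigger priority (0-5)."""
    if priority not in ON_CALL_RESPONSE:
        raise ValueError(f"unknown Zabbix trigger priority: {priority}")
    return ON_CALL_RESPONSE[priority]


def requires_on_call_action(priority: int) -> bool:
    """True for High (4) and Disaster (5), the severities that need on-call action."""
    on_call_response(priority)  # validates the priority, raising on bad input
    return priority >= 4
```

Keeping the mapping in one table makes the cut-off between "no action" and "action" explicit, which is the main decision this guideline asks the trigger author to make.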