
Severity - best practice

This is a guideline on how to set trigger severity. Zabbix supports the following trigger severities:

SEVERITY        DEFINITION                         COLOUR
Not classified  Unknown severity.                  Grey
Information     For information purposes.          Light green
Warning         Be warned.                         Yellow
Average         Average problem.                   Orange
High            Something important has happened.  Red
Disaster        Disaster. Financial losses, etc.   Bright red
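
For scripting against these levels, the list above can be written down as an enumeration. This is only a sketch: the numeric values follow the 0 to 5 trigger priority that Zabbix uses for severities, the colours are copied from the list above rather than read from a Zabbix installation, and the names `Severity` and `SEVERITY_COLOUR` are made up for illustration.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Trigger severities, ordered from least to most severe."""
    NOT_CLASSIFIED = 0
    INFORMATION = 1
    WARNING = 2
    AVERAGE = 3
    HIGH = 4
    DISASTER = 5


# Colours as listed in this guideline (hypothetical lookup table).
SEVERITY_COLOUR = {
    Severity.NOT_CLASSIFIED: "Grey",
    Severity.INFORMATION: "Light green",
    Severity.WARNING: "Yellow",
    Severity.AVERAGE: "Orange",
    Severity.HIGH: "Red",
    Severity.DISASTER: "Bright red",
}

# Because IntEnum members are ordered, severities can be compared directly.
most_severe = max(Severity.WARNING, Severity.HIGH)
```

Using an ordered enumeration means a check such as "High or worse" becomes a plain comparison (`severity >= Severity.HIGH`).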

More about each definition:

Not classified

Avoid using this severity on Zabbix triggers that are in production. It can be used while implementing or testing new triggers.

Information

Use this severity when you want to be informed that "something has changed" and the change, seen in isolation, has no impact. For example:

  • The Zabbix server has no connection to (or receives no data from) a host running the Zabbix agent.
  • Host has been rebooted.
  • OS or application version has changed.

It can also be used as a "proactive first warning" severity. For example:

  • SSL certificate is expiring in xxx days.

No action required.

Warning

Use this severity when a sub-component with limited impact has stopped, or as a "proactive last warning". For example:

  • yyy process is not running on host <host>.
  • SSL certificate is expiring in yyy days.
  • Cannot connect to SSH on host <host>.

No action required for UDS "on call".
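
The two certificate examples (Information at xxx days, Warning at yyy days) describe a two-step proactive warning. A minimal sketch of that idea follows. The guideline does not state the actual thresholds, so they are passed in as parameters; the function name `certificate_severity` and the values 30 and 7 in the usage lines are placeholders for illustration only.

```python
from typing import Optional


def certificate_severity(
    days_left: int,
    first_warning_days: int,
    last_warning_days: int,
) -> Optional[str]:
    """Return the severity for a certificate expiring in `days_left` days.

    first_warning_days: threshold for the "proactive first warning" (Information).
    last_warning_days:  threshold for the "proactive last warning" (Warning).
    Returns None when the certificate is not yet close to expiry.
    """
    if last_warning_days >= first_warning_days:
        raise ValueError("last_warning_days must be smaller than first_warning_days")
    if days_left <= last_warning_days:
        return "Warning"
    if days_left <= first_warning_days:
        return "Information"
    return None


# Placeholder thresholds (30 and 7 days), not taken from this guideline.
early = certificate_severity(20, first_warning_days=30, last_warning_days=7)
late = certificate_severity(5, first_warning_days=30, last_warning_days=7)
```

Checking the stricter threshold first ensures a certificate inside the last-warning window is reported as Warning only, not as both severities.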

Average

Use this severity when a sub-component with possibly high impact is down or does not respond. For example:

  • One or more NetApp power supplies are faulty.
  • Node cannot take over (NetApp Cluster).

No action required for UDS "on call".

High

Use this severity when an internal service or component is down, when a critical redundant component (in an external service) is down, or when an external service that is not high-impact is down. For example:

  • NOVA portal is down.
  • Node does not respond to ping (isolated).
  • 1 of 2 Nova GW is down.
  • www.uninett.no is down (since this is an informational website, it is not a high-impact service).

Action required for UDS "on call". The person on call should act on triggers with this severity: solve the issue if documentation or personal knowledge exists, otherwise escalate the incident to the responsible person/department.

Disaster

Use this severity when a high-impact service or component is down, that is, a critical external service or critical internal infrastructure. For example:

  • Feide service is down.
  • Agora website is down.
  • NOVA platform is down / NetApp is down.

Immediate action required for UDS "on call". The person on call should act on triggers with this severity: solve the issue if documentation or personal knowledge exists, otherwise escalate the incident to the responsible person/department. Escalation should be done by direct communication with the responsible person/department.
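
The on-call expectations per severity can be summarised as a small lookup. This is a sketch for illustration, not an existing tool: `OnCallAction`, `ON_CALL_POLICY` and `on_call_action` are hypothetical names, and the entry for "Not classified" is an assumption, since the guideline only says that severity should not be used in production.

```python
from enum import Enum


class OnCallAction(Enum):
    NONE = "no action required"
    ACT = "solve the issue or escalate"
    ACT_IMMEDIATELY = "solve the issue or escalate immediately by direct contact"


# What UDS "on call" is expected to do for each severity.
ON_CALL_POLICY = {
    "Not classified": OnCallAction.NONE,  # assumption: not stated in the guideline
    "Information": OnCallAction.NONE,
    "Warning": OnCallAction.NONE,
    "Average": OnCallAction.NONE,
    "High": OnCallAction.ACT,
    "Disaster": OnCallAction.ACT_IMMEDIATELY,
}


def on_call_action(severity: str) -> OnCallAction:
    """Look up the expected on-call action; raises KeyError for unknown severities."""
    return ON_CALL_POLICY[severity]


def requires_on_call(severity: str) -> bool:
    """True when the person on call is expected to act on the trigger."""
    return on_call_action(severity) is not OnCallAction.NONE
```

For example, `requires_on_call("Average")` is false while `requires_on_call("High")` is true, which matches the split between the Average and High sections above.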