Email Nagios
Background and Details
Nagios XI can be configured to accept Inbound Email Commands, but that feature relies on POP3/IMAP, and those protocols cannot access the University's Microsoft Exchange to read messages. In 2011, when we were running Nagios Core, we built a custom Perl library to accept Nagios command line interactions from email or SMS. That library reached end-of-life with the migration of the Nagios XI application to RHEL9 servers in 2024. We have recreated the downtime and acknowledge functions with new code.
For each message:
- If the email subject contains "Automated Reply" (case-insensitive), the code logs a debug message and takes no further action. Otherwise, it uses regular expressions to extract the host and service information from the email subject.
- If the subject indicates a problem with a service, formatted like "PROBLEM: [host]/[service] is [status]" where status can be WARNING, CRITICAL, or UNKNOWN, it extracts the host and service names.
- If the subject contains a host and service combination, formatted like "[host]/[service]", it extracts the host and service names.
- If the subject indicates a problem with a host, formatted like "PROBLEM: [host] is [status]" where status can be DOWN or UNREACHABLE, it extracts the host name.
- If the subject contains only a host name, formatted like "[host]", it extracts the host name.
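The subject rules above can be sketched in Python as follows. This is a minimal illustration, not the production code: the function name parse_subject, the pattern names, and the exact regular expressions are assumptions, and it assumes host and service names contain no whitespace.

```python
import re

# Hypothetical patterns for illustration; the production expressions may differ.
SERVICE_PROBLEM = re.compile(
    r"PROBLEM:\s*([\w.-]+)/(\S+)\s+is\s+(?:WARNING|CRITICAL|UNKNOWN)\b")
HOST_SERVICE = re.compile(r"([\w.-]+)/(\S+)")
HOST_PROBLEM = re.compile(r"PROBLEM:\s*([\w.-]+)\s+is\s+(?:DOWN|UNREACHABLE)\b")
HOST_ONLY = re.compile(r"[\w.-]+")


def parse_subject(subject):
    """Return (host, service), with service None for host-only subjects.

    Returns None for automated replies and for subjects that match no rule.
    """
    subject = subject.strip()
    # Auto-replies are skipped entirely (the real code also logs a debug message).
    if "automated reply" in subject.lower():
        return None
    # Service forms first: "PROBLEM: host/service is STATUS", then "host/service".
    for pattern in (SERVICE_PROBLEM, HOST_SERVICE):
        match = pattern.search(subject)
        if match:
            return match.group(1), match.group(2)
    # Host forms: "PROBLEM: host is STATUS", then a bare host name.
    match = HOST_PROBLEM.search(subject)
    if match:
        return match.group(1), None
    if HOST_ONLY.fullmatch(subject):
        return subject, None
    return None
```

The patterns are tried in the same order as the list above, so a subject that names a service is never mistaken for a host-only subject.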
Command Processing:
- Acknowledge Alert: If the email body contains "ack" or "acknowledge", it extracts the comment and sends a POST request to acknowledge the alert in Nagios.
- Schedule Downtime: If the email body contains "downtime", it extracts the downtime details, calculates the start and end times, and sends a POST request to schedule downtime in Nagios.
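As a rough sketch of the command detection described above, the body can be classified before any request is sent to Nagios. The function name classify_body is hypothetical, and this version makes a stricter assumption than "contains": it only looks for the keyword at the start of the body. The POST request itself is omitted because its endpoint and parameters are not described here.

```python
import re

# Assumption: the command keyword is the first word of the message body.
COMMAND = re.compile(r"\s*(acknowledge|ack|downtime)\b[ \t]*(.*)", re.IGNORECASE)


def classify_body(body):
    """Return (command, remainder of the first line), or None if no command is found."""
    match = COMMAND.match(body)
    if match is None:
        return None
    keyword = match.group(1).lower()
    rest = match.group(2).strip()
    # "ack" and "acknowledge" are treated as the same command.
    command = "downtime" if keyword == "downtime" else "acknowledge"
    return command, rest
```

For an acknowledge command the remainder is the comment; for a downtime command it holds the -t, -l, and -c options.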
Formats and Examples
Do not reply to Nagios alert notification messages. Instead, email nagiosxi@doit.wisc.edu with commands formatted according to these examples.
You are encouraged to provide a comment (e.g. more information) after the acknowledge or downtime command. The comment is displayed in the Nagios XI Service Status Detail window.
Subject Formats:
Host Only:
- [host]
- PROBLEM: [host] is [status]
Examples:
- gidney.doit.wisc.edu
- PROBLEM: biocomp2.doit.wisc.edu is DOWN
Service Checks:
- [host]/[service]
- PROBLEM: [host]/[service] is [status]
Examples:
- warhawk.doit.wisc.edu/l_load
- PROBLEM: gidney.doit.wisc.edu/l_disk_app1 is CRITICAL
Downtime Format:
The email body should follow this pattern:
downtime -t [start_time] -l [length] -c [comment]
-t [start_time]: Specifies the start time of the downtime. The format should be dd-MMM@HH:MM (e.g., 13-Feb@05:00).
-l [length]: Specifies the length of the downtime in hours (e.g., 1 for 1 hour).
-c [comment]: Provides a comment or reason for the downtime (e.g., 'Patching').
Example:
downtime -t 13-Feb@05:00 -l 1 -c 'Patching'
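To illustrate how the start and end times might be calculated from this pattern, here is a minimal Python sketch. The function name parse_downtime and the regular expression are assumptions. Because the dd-MMM@HH:MM format carries no year, the caller supplies one here; how the production code picks the year is not stated. The sketch also assumes the options appear in the order -t, -l, -c, accepts decimal lengths, and relies on English month abbreviations (the %b directive is locale-dependent).

```python
import re
from datetime import datetime, timedelta

# Hypothetical pattern: options must appear in the order -t, -l, -c.
DOWNTIME = re.compile(
    r"downtime\s+-t\s+(\S+)\s+-l\s+(\d+(?:\.\d+)?)\s+-c\s+(.+)", re.IGNORECASE)


def parse_downtime(body, year):
    """Return (start, end, comment) for a downtime command body."""
    match = DOWNTIME.search(body)
    if match is None:
        raise ValueError("body does not match the downtime pattern")
    start_text, length_text, comment = match.groups()
    # Append the assumed year so the dd-MMM@HH:MM value becomes a full timestamp.
    start = datetime.strptime(f"{start_text}-{year}", "%d-%b@%H:%M-%Y")
    # The length is given in hours.
    end = start + timedelta(hours=float(length_text))
    # Drop surrounding whitespace and quotes from the comment.
    return start, end, comment.strip().strip("'\"")
```

With the example above and an assumed year of 2025, the downtime would run from 05:00 to 06:00 on 13 February 2025 with the comment Patching.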
Examples:
To acknowledge an alert, which silences it until its next change of state:
Subject: warhawk.doit.wisc.edu/l_load
Body: ack Load testing for next 2 hours
To proactively schedule a 4-hour downtime for gidney:
Subject: gidney.doit.wisc.edu
Body: downtime -t 24-Feb@15:00 -l 4 -c 'Routine patching.'