Email Nagios
Background and Details
Nagios XI can be configured to accept Inbound Email Commands, but that feature relies on POP3/IMAP, and those protocols cannot read messages from the University's Microsoft Exchange. In 2011, when we were running Nagios Core, we built a custom Perl library to accept Nagios command-line interactions sent by email or SMS. That library reached end-of-life when the Nagios XI application migrated to RHEL9 servers in 2024. We have recreated the downtime and acknowledge functions with new code.
For each message, the code first examines the email subject:
- If the subject contains "Automated Reply" (case-insensitive), the code logs a debug message and takes no further action. Otherwise, it uses regular expressions to extract the host and service information from the subject, as follows.
- If the subject indicates a problem with a service, formatted like "PROBLEM: [host]/[service] is [status]" where status can be WARNING, CRITICAL, or UNKNOWN, it extracts the host and service names.
- If the subject contains a host and service combination, formatted like "[host]/[service]", it extracts the host and service names.
- If the subject indicates a problem with a host, formatted like "PROBLEM: [host] is [status]" where status can be DOWN or UNREACHABLE, it extracts the host name.
- If the subject contains only a host name, formatted like "[host]", it extracts the host name.
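The subject rules above can be sketched in Python as an ordered list of patterns, tried from most to least specific. This is only an illustration, not the production code: the names SUBJECT_PATTERNS and parse_subject are hypothetical, the regular expressions are our own approximations, and the sketch assumes host names contain no whitespace or "/".

```python
import re

# Hypothetical patterns, tried in order (most specific first).
# Assumption: host names contain neither whitespace nor "/".
SUBJECT_PATTERNS = [
    # PROBLEM: [host]/[service] is [status]
    re.compile(r"^PROBLEM:\s*(?P<host>[^/\s]+)/(?P<service>.+?)\s+is\s+"
               r"(?:WARNING|CRITICAL|UNKNOWN)\s*$"),
    # PROBLEM: [host] is [status]
    re.compile(r"^PROBLEM:\s*(?P<host>[^/\s]+)\s+is\s+(?:DOWN|UNREACHABLE)\s*$"),
    # [host]/[service]
    re.compile(r"^(?P<host>[^/\s]+)/(?P<service>.+?)\s*$"),
    # [host]
    re.compile(r"^(?P<host>[^/\s]+)\s*$"),
]


def parse_subject(subject):
    """Return (host, service) or None; service is None for host-only subjects."""
    subject = subject.strip()
    # Auto-replies are ignored entirely.
    if "automated reply" in subject.lower():
        return None
    for pattern in SUBJECT_PATTERNS:
        match = pattern.match(subject)
        if match:
            return match.group("host"), match.groupdict().get("service")
    return None
```

In this sketch a subject that matches none of the four formats returns None, the same as an automated reply; how the real code treats unmatched subjects is not described here.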
Command Processing:
- Acknowledge Alert: If the email body contains "ack" or "acknowledge", it extracts the comment and sends a POST request to acknowledge the alert in Nagios.
- Schedule Downtime: If the email body contains "downtime", it extracts the downtime details, calculates the start and end times, and sends a POST request to schedule downtime in Nagios.
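As a rough illustration of the command step, the Python sketch below decides which command a message body carries and returns the remaining text. The function name classify_body is hypothetical, and the sketch assumes the command word starts a line of the body; the real matching rule may be looser. Sending the POST request to Nagios is deliberately left out, since the endpoint and parameters are not described here.

```python
import re

# Assumption: the command word begins a line of the body.
ACK_RE = re.compile(r"^\s*(?:acknowledge|ack)\b[ \t]*(?P<rest>.*)$",
                    re.IGNORECASE | re.MULTILINE)
DOWNTIME_RE = re.compile(r"^\s*downtime\b[ \t]*(?P<rest>.*)$",
                         re.IGNORECASE | re.MULTILINE)


def classify_body(body):
    """Return (command, rest) where command is "ack", "downtime", or None."""
    match = ACK_RE.search(body)
    if match:
        # For an acknowledgement, the rest of the line is the comment.
        return "ack", match.group("rest").strip()
    match = DOWNTIME_RE.search(body)
    if match:
        # For downtime, the rest of the line holds the -t/-l/-c options.
        return "downtime", match.group("rest").strip()
    return None, ""
```

For example, classify_body("ack Load testing for next 2 hours") yields the command "ack" with the comment "Load testing for next 2 hours".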
Formats and Examples
Do not reply to Nagios alert notification messages. Instead, send an email with the command formatted according to these examples.
You are encouraged to provide a comment (e.g. more information) after the acknowledge or downtime command. The comment will be displayed in the Nagios XI Service Status Detail window.
Subject Formats:
Host Only:
- [host]
- PROBLEM: [host] is [status]
Service Checks:
- [host]/[service]
- PROBLEM: [host]/[service] is [status]
Downtime Format:
The email body should follow this pattern:
downtime -t [start_time] -l [length] -c [comment]
-t [start_time]: Specifies the start time of the downtime. The format should be dd-MMM@HH:MM (e.g., 13-Feb@05:00).
-l [length]: Specifies the length of the downtime in hours (e.g., 1 for 1 hour).
-c [comment]: Provides a comment or reason for the downtime (e.g., 'Patching').
For example:
downtime -t 13-Feb@05:00 -l 1 -c 'Patching'
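To make the start and end time calculation concrete, here is a minimal Python sketch that parses a downtime line and derives both times. The function name parse_downtime and its year parameter are hypothetical: the format carries no year, so the caller must supply one, and how the real code chooses it is not stated here. The sketch also assumes the length is a whole number of hours and that month abbreviations are English.

```python
import argparse
import shlex
from datetime import datetime, timedelta


def parse_downtime(command, year):
    """Return (start, end, comment) for a 'downtime -t ... -l ... -c ...' line."""
    tokens = shlex.split(command)  # honours the quotes around the comment
    if not tokens or tokens[0].lower() != "downtime":
        raise ValueError("not a downtime command")

    parser = argparse.ArgumentParser(prog="downtime", add_help=False)
    parser.add_argument("-t", dest="start", required=True)
    parser.add_argument("-l", dest="hours", type=int, required=True)
    # nargs="+" also tolerates an unquoted multi-word comment.
    parser.add_argument("-c", dest="comment", nargs="+", default=[])
    args = parser.parse_args(tokens[1:])

    # The year is appended before parsing so that 29-Feb works in leap years.
    start = datetime.strptime(f"{args.start}-{year}", "%d-%b@%H:%M-%Y")
    end = start + timedelta(hours=args.hours)
    return start, end, " ".join(args.comment)
```

Called with the example line above and the year 2025, this returns a start of 13 February 2025 at 05:00 and an end one hour later. Note that argparse exits the process on malformed options, so real code would likely need its own error handling.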
To acknowledge an alert, which will silence it until the next change of state:
Body: ack Load testing for next 2 hours
To proactively schedule a 4-hour downtime for gidney:
Body: downtime -t 24-Feb@15:00 -l 4 -c 'Routine patching.'