Email Nagios

How to interact with DoIT Nagios via email

Background and Details

Nagios XI offers the ability to configure Inbound Email Commands, but that feature relies on POP3/IMAP, and those protocols cannot be used to read messages from the University's Microsoft Exchange. In 2011, when we were running Nagios Core, we built a custom Perl library to accept Nagios command-line interactions from email or SMS. That library reached end-of-life with the migration of the Nagios XI application to RHEL9 servers in 2024. We have recreated the downtime and acknowledge functions with new code.

For each message, the new code processes the subject line as follows:

  • If the email subject contains "Automated Reply" (case-insensitive), the code logs a debug message and takes no further action. Otherwise, it extracts the host and service information from the email subject using regular expressions.
  • If the subject indicates a problem with a service, formatted like "PROBLEM: [host]/[service] is [status]" where status can be WARNING, CRITICAL, or UNKNOWN, it extracts the host and service names.
  • If the subject contains a host and service combination, formatted like "[host]/[service]", it extracts the host and service names.
  • If the subject indicates a problem with a host, formatted like "PROBLEM: [host] is [status]" where status can be DOWN or UNREACHABLE, it extracts the host name.
  • If the subject contains only a host name, formatted like "[host]", it extracts the host name.
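
The subject-matching rules above can be sketched in Python as follows. This is a minimal illustration, not the production code: the regular expressions and the function name `parse_subject` are assumptions, and host and service names are assumed to contain no whitespace.

```python
import re

# Hypothetical patterns; the production regular expressions are not shown in this document.
SERVICE_PROBLEM = re.compile(
    r"PROBLEM:\s*(?P<host>[^\s/]+)/(?P<service>\S+)\s+is\s+(?:WARNING|CRITICAL|UNKNOWN)\b"
)
HOST_SERVICE = re.compile(r"^\s*(?P<host>[^\s/]+)/(?P<service>\S+)\s*$")
HOST_PROBLEM = re.compile(r"PROBLEM:\s*(?P<host>[^\s/]+)\s+is\s+(?:DOWN|UNREACHABLE)\b")
HOST_ONLY = re.compile(r"^\s*(?P<host>[^\s/]+)\s*$")


def parse_subject(subject):
    """Return (host, service), or None if the subject is ignored or unrecognized.

    service is None for host-only subjects.
    """
    # Automated replies are skipped entirely.
    if "automated reply" in subject.lower():
        return None
    # Try the formats in the order they are listed above.
    for pattern in (SERVICE_PROBLEM, HOST_SERVICE, HOST_PROBLEM, HOST_ONLY):
        match = pattern.search(subject)
        if match:
            groups = match.groupdict()
            return groups["host"], groups.get("service")
    return None
```

For instance, `parse_subject("warhawk.doit.wisc.edu/l_load")` yields the host `warhawk.doit.wisc.edu` and the service `l_load`.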

Command Processing:

  • Acknowledge Alert: If the email body contains "ack" or "acknowledge", it extracts the comment and sends a POST request to acknowledge the alert in Nagios.
  • Schedule Downtime: If the email body contains "downtime", it extracts the downtime details, calculates the start and end times, and sends a POST request to schedule downtime in Nagios.
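
As a rough sketch of how a message body might be sorted into one of these two commands, consider the Python below. The function name `classify_body` and the patterns are hypothetical, and the sketch assumes the command keyword is the first word of the body. The POST request to Nagios is deliberately left out because its details are not covered in this document.

```python
import re

# Assumption: the command keyword is the first word of the message body.
ACK = re.compile(r"\s*(?:acknowledge|ack)\b[ \t]*(?P<comment>[^\r\n]*)", re.IGNORECASE)
DOWNTIME = re.compile(r"\s*downtime\b[ \t]*(?P<args>[^\r\n]*)", re.IGNORECASE)


def classify_body(body):
    """Return (command, remainder); command is 'acknowledge', 'downtime', or None."""
    match = ACK.match(body)
    if match:
        # Everything after the keyword on the same line is the comment.
        return "acknowledge", match.group("comment").strip()
    match = DOWNTIME.match(body)
    if match:
        # The remainder holds the -t, -l, and -c options.
        return "downtime", match.group("args").strip()
    return None, ""
```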

Formats and Examples

Do not reply to Nagios alert notification messages. Instead, send a new email to nagiosxi@doit.wisc.edu with commands formatted according to these examples.

You are encouraged to provide a comment (e.g. more information) after the acknowledge or downtime commands. The comment is displayed in the Nagios XI Service Status Detail window.

Subject Formats:

Host Only:

  1. [host]
  2. PROBLEM: [host] is [status]

Examples:

    1. gidney.doit.wisc.edu
    2. PROBLEM: biocomp2.doit.wisc.edu is DOWN

Service Checks:

  1. [host]/[service]
  2. PROBLEM: [host]/[service] is [status]

Examples:

    1. warhawk.doit.wisc.edu/l_load
    2. PROBLEM: gidney.doit.wisc.edu/l_disk_app1 is CRITICAL

Downtime Format:

The email body should follow this pattern:

downtime -t [start_time] -l [length] -c [comment]

    -t [start_time]: Specifies the start time of the downtime. The format should be dd-MMM@HH:MM (e.g., 13-Feb@05:00).
    -l [length]: Specifies the length of the downtime in hours (e.g., 1 for 1 hour).
    -c [comment]: Provides a comment or reason for the downtime (e.g., 'Patching').

Example:

downtime -t 13-Feb@05:00 -l 1 -c 'Patching'
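
To illustrate how the start and end times could be derived from this pattern, here is a minimal Python sketch. The function name `parse_downtime` and the regular expression are hypothetical. Because the start time format carries no year, the sketch takes the year as an explicit argument; how the production code chooses the year is not stated here.

```python
import re
from datetime import datetime, timedelta

# Hypothetical pattern for: downtime -t [start_time] -l [length] -c [comment]
DOWNTIME_ARGS = re.compile(
    r"downtime\s+-t\s+(?P<start>\S+)\s+-l\s+(?P<hours>\d+)\s+-c\s+(?P<comment>.+)",
    re.IGNORECASE,
)


def parse_downtime(body, year):
    """Return (start, end, comment) for a downtime command body."""
    match = DOWNTIME_ARGS.search(body)
    if match is None:
        raise ValueError("expected: downtime -t [start_time] -l [length] -c [comment]")
    # %b matches abbreviated month names in the current locale ("Feb" in English locales).
    start = datetime.strptime(f"{year}-{match.group('start')}", "%Y-%d-%b@%H:%M")
    # The length is given in whole hours.
    end = start + timedelta(hours=int(match.group("hours")))
    # Drop optional surrounding quotes from the comment.
    comment = match.group("comment").strip().strip("'\"")
    return start, end, comment
```

With the example above and an assumed year of 2025, the sketch gives a window from 05:00 to 06:00 on 13 February 2025 with the comment `Patching`.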

Examples:

To acknowledge an alert, which will silence it until the next change of state:

Subject: warhawk.doit.wisc.edu/l_load

Body: ack Load testing for next 2 hours

To proactively schedule a 4-hour downtime for gidney:

Subject: gidney.doit.wisc.edu

Body: downtime -t 24-Feb@15:00 -l 4 -c 'Routine patching.'
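
If you send these commands from a script rather than a mail client, the messages above could be composed with Python's standard library as sketched below. The function name `build_command_email` and the sender address are hypothetical placeholders. Only the recipient address, subject, and body come from this document.

```python
from email.message import EmailMessage

NAGIOS_ADDRESS = "nagiosxi@doit.wisc.edu"  # address given in this document


def build_command_email(sender, subject, body):
    """Build a plain-text command email addressed to Nagios."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = NAGIOS_ADDRESS
    message["Subject"] = subject
    message.set_content(body)
    return message


# "netid@wisc.edu" is a placeholder sender, not a real mailbox.
ack = build_command_email(
    "netid@wisc.edu",
    "warhawk.doit.wisc.edu/l_load",
    "ack Load testing for next 2 hours",
)
```

Delivering the message (for example with `smtplib.SMTP(...).send_message(ack)`) depends on your local mail relay, which is outside the scope of this document.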



Keywords:
Event Management and Monitoring 
Doc ID:
122519
Owned by:
Sarah M. in Event Management and Monitoring
Created:
2022-11-15
Updated:
2025-03-12
Sites:
Event Management and Monitoring, Systems Engineering