Nagios

Introduction to Nagios.

Nagios (https://nagios.doit.wisc.edu) is an enterprise-class monitoring tool used by DoIT for hosts in our data center.

System administrators use, set up, and modify monitoring in Nagios, and have complete access to it.  Application folks work with their sysadmin or the monitoring team (monitoring@doit.wisc.edu) when monitoring changes are needed, and can get access to the Nagios user interface.

Nagios has host checks and service checks. Host checks indicate whether a host is up or down, and service checks indicate whether a system service (e.g. a process) is OK or not.  Services monitored include:

  • Application services:  disk space, processes, URL monitoring, application log monitoring, etc.
  • Operating System services:  disk space, processes, CPU usage, swap file usage, system log file monitoring, etc.


If you have existing checks for an application, it's relatively easy for Nagios to "hook" into these.
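As a rough illustration of what "hooking" an existing check can look like, the sketch below wraps an arbitrary command and maps its result onto the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function name `run_existing_check` and the rule that any non-zero exit is treated as CRITICAL are assumptions made for this example; they are not a description of how DoIT's checks are actually wired in.

```python
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def run_existing_check(command, timeout=10):
    """Run an existing check command and return (nagios_state, message).

    Assumption for this sketch: the existing check exits 0 on success and
    non-zero on failure, and prints a one-line summary on stdout.
    """
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return UNKNOWN, f"UNKNOWN - check timed out after {timeout}s"
    except OSError as exc:
        # Covers a missing or non-executable command.
        return UNKNOWN, f"UNKNOWN - could not run check: {exc}"

    lines = result.stdout.strip().splitlines()
    detail = lines[0] if lines else "no output"
    if result.returncode == 0:
        return OK, f"OK - {detail}"
    return CRITICAL, f"CRITICAL - {detail} (exit {result.returncode})"
```

A plugin built on this would print the message and exit with the returned state, since Nagios reads the first line of output and the exit code to decide the service status.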

DoIT's instance of Nagios Core has been enhanced in several ways over the standard release, including:

  • Enhanced logfile monitoring allows centrally managed global, group (e.g. OS-specific), and host-specific configurations (e.g. pattern/exception strings).  It can also monitor application logs.  Alerts can be cleared on the host (nclear).  The pattern/exception strings are exactly the same as those Loggle uses.
  • An email interface was added for certain operations: schedule/delete downtime, acknowledge/remove ack, disable/enable a check.
  • A RAM disk is used for certain I/O-intensive files, for speed.
  • Live status/configuration info can be retrieved from Nagios on the command line, which lets us check host/service data in aggregate, e.g. warn when 30% of hosts take longer than 3 ms to respond to a ping.
  • Most notifications are promoted to WiscIT for further filtering and possible promotion to the 24x7 staff.
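The aggregate check mentioned above (warn when 30% of hosts take longer than 3 ms to answer a ping) can be sketched roughly as follows. This is only an illustration: the input is a plain dictionary of hypothetical host names and round-trip times, standing in for whatever the live-status command-line tool actually returns, and the function names are made up for this example.

```python
# Standard Nagios plugin exit codes used by this sketch.
OK, WARNING = 0, 1


def slow_host_fraction(rtts_ms, threshold_ms=3.0):
    """Return the fraction of hosts whose ping round-trip time exceeds threshold_ms."""
    if not rtts_ms:
        return 0.0
    slow = sum(1 for rtt in rtts_ms.values() if rtt > threshold_ms)
    return slow / len(rtts_ms)


def aggregate_ping_state(rtts_ms, threshold_ms=3.0, warn_fraction=0.30):
    """Return (state, message); WARNING once warn_fraction of hosts are slow."""
    fraction = slow_host_fraction(rtts_ms, threshold_ms)
    state = WARNING if fraction >= warn_fraction else OK
    label = "WARNING" if state == WARNING else "OK"
    message = (
        f"{label} - {fraction:.0%} of {len(rtts_ms)} hosts "
        f"slower than {threshold_ms} ms"
    )
    return state, message


# Hypothetical sample data: host name -> ping round-trip time in milliseconds.
sample = {"host-a": 1.0, "host-b": 5.0, "host-c": 2.0, "host-d": 4.0}
state, message = aggregate_ping_state(sample)
print(message)
```

Keeping the thresholds as parameters makes it easy to reuse the same logic for other aggregate conditions, such as a different percentage or latency limit.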


Keywords: nagios pnp4nagios monitor event management
Doc ID: 62723
Owned by: Sarah M. in Event Management and Monitoring
Created: 2016-04-14
Updated: 2025-12-15
Sites: DoIT Staff, Event Management and Monitoring, Systems & Network Control Center, Systems Engineering