DoIT Monitoring Services (Overview)
Service Description
The DoIT Event Management and Monitoring Service is available free of charge to UW-Madison campus customers.
We maintain an instance of the Nagios XI application, which we use to schedule and run check scripts. Each script returns one of four states: OK, WARNING, CRITICAL, or UNKNOWN, represented by the digits 0 through 3 respectively. When a target changes state, we send an alert.
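As an illustration of the state codes and state change alerting described above, the following Python sketch maps the digits 0-3 to their names and reports each point where consecutive check results differ. It is a simplification under stated assumptions, not a description of how Nagios XI works internally; the names State and state_changes are hypothetical.

```python
from enum import IntEnum


class State(IntEnum):
    """The four check states and the digits that represent them."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def state_changes(results):
    """Yield (previous, current) each time consecutive results differ.

    `results` is an iterable of integer return codes (0-3) in the order
    the checks were run. Assumption: every change counts as alert-worthy;
    retries and other scheduling details are ignored in this sketch.
    """
    previous = None
    for code in results:
        current = State(code)  # raises ValueError for codes outside 0-3
        if previous is not None and current != previous:
            yield previous, current
        previous = current
```

For example, the result sequence 0, 0, 2, 2, 0 produces two changes: OK to CRITICAL, then CRITICAL back to OK.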
Monitoring Alerts
Monitoring alerts are directed to WiscIT, which distributes the notifications. See WiscIT Event Notifications for a description of Event Management and Monitoring alerts as directed out of the CMDB.
WiscIT notifications can be flagged for 24x7 monitoring by SNCC systems operators if the service participates in the DoIT Operational Framework (All Sections). The systems operator has relevant contact information for system administrators and technicians, and is also in contact with the DoIT Help Desk. When alarms and notifications are observed, the systems operator may take a number of different actions, depending on the severity of the alert, the customers affected, the instructions given with the alert, and so on. Alarms and the actions taken can be customized for each application or server.
Server Monitoring
DoIT monitors the servers for which it has operational responsibility. This monitoring is done via several applications maintained internally by DoIT's System Engineers, in addition to Nagios XI, which is maintained by the Event Management and Monitoring Team.
These monitors generally break down into two categories:
Server Monitors
We use either or both of two Nagios agents, NCPA and NRPE, which run on the customer host. When deployed by DoIT's System Engineers, these agents are modified to accept only preset, standardized queries, which simplifies their management and administration. Examples include:
- Hardware monitoring (on/off, CPU usage, drive capacity, etc.)
- Process monitoring (checks if certain applications are running)
- Logfile monitoring (scan specified files, like syslog, for certain conditions)
These monitors are directed to the server Configuration Item in the CMDB, and the alerts are available for subscription by interested parties. For the most part, these alerts are deployed and managed in bulk and are not subject to customization.
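To make the logfile monitoring item above more concrete, here is a minimal Python sketch of a scan that counts matching lines and maps the count to a state. It does not reproduce the checks that ship with NCPA or NRPE; the function name scan_log, the pattern, and the thresholds are all hypothetical.

```python
import re


def scan_log(lines, pattern, warn=1, crit=5):
    """Count lines matching `pattern` and return (state, text).

    States follow the 0-3 convention: 0 OK, 1 WARNING, 2 CRITICAL.
    Assumption: `warn` and `crit` are minimum match counts chosen by
    whoever configures the check; they are not DoIT defaults.
    """
    regex = re.compile(pattern)
    hits = sum(1 for line in lines if regex.search(line))
    if hits >= crit:
        return 2, f"CRITICAL - {hits} lines match {pattern!r}"
    if hits >= warn:
        return 1, f"WARNING - {hits} lines match {pattern!r}"
    return 0, f"OK - {hits} lines match {pattern!r}"
```

In practice the lines would come from an open file, for example `scan_log(open("/var/log/syslog"), r"I/O error")`, where the path is only an example.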
See Nagios Server Agent Configuration.
Application Monitors
The server agent clients can accept server process queries on behalf of their customer owners. These might be hardware or process monitors as above, with parameters specific to the interests of the application or service administrators. These alerts should be directed to application or service Configuration Items in the CMDB.
In general, most of the Nagios Plugins should be considered to be available to run from the client, by means of these server agents. Enterprise environment security policies and best practices will usually preclude simple deployment directly from Nagios XI.
Any process can be checked locally by a script that returns a numeric value from 0 to 3 and an optional corresponding line of text. The Event Management and Monitoring Team can offer limited advice and technical assistance in suggesting and writing such scripts. Once a process control script is working, the server's customer owner, working with their system administrator, adds a reference entry for it to the agent configuration file. Nagios XI can then take over scheduling the script and processing its output to distribute state change alerts.
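A local check script of this kind might look like the following Python sketch, which reports how full a filesystem is. It is an illustration only: the function names, the default path, and the 80% and 90% thresholds are assumptions, not DoIT standards.

```python
import shutil

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(used_pct, warn_pct, crit_pct):
    """Map a usage percentage to a state using two thresholds."""
    if used_pct >= crit_pct:
        return CRITICAL
    if used_pct >= warn_pct:
        return WARNING
    return OK


def check_disk(path, warn_pct=80.0, crit_pct=90.0):
    """Return (state, text) describing how full the filesystem at `path` is."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # The check itself could not run, so the state is UNKNOWN.
        return UNKNOWN, f"UNKNOWN - cannot read {path}: {exc}"
    if usage.total == 0:
        return UNKNOWN, f"UNKNOWN - {path} reports zero capacity"
    used_pct = 100.0 * usage.used / usage.total
    state = classify(used_pct, warn_pct, crit_pct)
    label = ("OK", "WARNING", "CRITICAL")[state]
    return state, f"{label} - {path} is {used_pct:.1f}% full"


def main(argv):
    """Print the one-line status text and return the numeric state."""
    path = argv[0] if argv else "/"
    state, text = check_disk(path)
    print(text)
    return state
```

To use this as a standalone script, the file would end by exiting with the returned state, for example `sys.exit(main(sys.argv[1:]))` after `import sys`, so that the process exit code carries the 0-3 value and the printed line carries the text.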
In addition to monitoring, the Nagios XI application can schedule processes to run on the server and respond to alarms (e.g., restart an application that has stopped).
Application Monitoring
We use Nagios Plugins to check the functionality of services by attempting to access them remotely. These probes can be configured to check whether something is available or to look for a specific response. Most commonly, we use check_http to verify that specific text loads on a given web page. See Nagios URL Monitoring.
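The idea behind a remote text check can be sketched in Python as follows. This is not check_http and does not reproduce its options or its exact state mapping; the function names are hypothetical, and treating any non-200 status as CRITICAL is a simplifying assumption.

```python
import urllib.error
import urllib.request


def evaluate(status, body, expected_text):
    """Map an HTTP status and page body to (state, text)."""
    if status != 200:
        return 2, f"CRITICAL - HTTP status {status}"
    if expected_text not in body:
        return 2, f"CRITICAL - text {expected_text!r} not found"
    return 0, f"OK - HTTP 200 and text {expected_text!r} found"


def check_url(url, expected_text, timeout=10):
    """Fetch `url` and verify that `expected_text` appears in the page."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return evaluate(response.status, body, expected_text)
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 404 or 503.
        return evaluate(exc.code, "", expected_text)
    except (urllib.error.URLError, OSError) as exc:
        # DNS failure, refused connection, timeout, and similar problems.
        return 2, f"CRITICAL - cannot reach {url}: {exc}"
    except ValueError as exc:
        # Malformed URL: the check could not be attempted at all.
        return 3, f"UNKNOWN - bad URL {url!r}: {exc}"
```

A call such as `check_url("https://www.example.com/", "Example Domain")` returns the state and message; the URL and text here are placeholders.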
We have customized some of these plugins to the UW enterprise environment for regular use:
- SSL certificate expiration monitoring (checks expiration date)
- Oracle database monitors (requires a limited account for our user to "select 'x' from dual")
- Emulation login monitoring: a special user with restricted permissions can, in some cases, carry out NetID sign-on and verify text at completion of that process. This feature is subject to a number of caveats.
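The certificate expiration check in the list above can be illustrated with a short Python sketch. It is not the customized plugin used by DoIT; the function names and the 30-day and 7-day thresholds are assumptions made for the example.

```python
import socket
import ssl
import time


def classify_expiry(not_after, now, warn_days=30, crit_days=7):
    """Return (state, text) for a certificate 'notAfter' string.

    `not_after` uses the format returned by getpeercert(), for example
    'Jan  1 00:00:00 2030 GMT'. `now` is a Unix timestamp in seconds.
    """
    expires = ssl.cert_time_to_seconds(not_after)
    days_left = (expires - now) / 86400
    if days_left < 0:
        return 2, f"CRITICAL - certificate expired on {not_after}"
    if days_left <= crit_days:
        return 2, f"CRITICAL - certificate expires in {days_left:.0f} days"
    if days_left <= warn_days:
        return 1, f"WARNING - certificate expires in {days_left:.0f} days"
    return 0, f"OK - certificate expires in {days_left:.0f} days"


def check_certificate(host, port=443, timeout=10):
    """Connect to host:port over TLS and classify the certificate expiry."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
    except OSError as exc:
        # Includes ssl.SSLError. With the default context, an already
        # expired certificate fails the handshake and is reported here.
        return 2, f"CRITICAL - TLS connection to {host}:{port} failed: {exc}"
    return classify_expiry(cert["notAfter"], time.time())
```

Keeping the date arithmetic in classify_expiry separate from the network call makes the thresholds easy to test without contacting a server.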
How to Create a New Monitored Event
See Event Monitor Request for DoIT Supported Services (Procedure)