DoIT Monitoring Services (Overview)

Overview of the monitoring capabilities DoIT provides on the servers for which it has operational responsibility.

Event Management and Monitoring

DoIT monitors the servers for which it has operational responsibility. Several applications perform this monitoring and send alarms and notifications, which SNCC systems operators can watch 24x7. The systems operator has contact information for the relevant system administrators and technicians and is also in contact with the DoIT Help Desk. When an alarm or notification is observed, the action the operator takes depends on the severity of the alert, the customers affected, any instructions given with the alert, and similar factors. Alarms and the actions taken can be customized for each application or server. The alarms generally break down into two categories:

Remote Monitoring

Several applications are used to monitor servers that are accessed remotely. These applications use configurable probes that check a service by attempting to access it remotely. A probe can be configured either to check that the service is available or to look for a specific response.
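The two probe modes described above (availability versus a specific response) can be sketched as follows. This is only an illustration, not the tooling DoIT uses: the function names `evaluate_response` and `http_probe`, the choice of HTTP as the probed protocol, and the rule that any 2xx status counts as "available" are all assumptions.

```python
import urllib.error
import urllib.request
from typing import Optional


def evaluate_response(status: int, body: str, expected_text: Optional[str] = None) -> bool:
    """Decide whether a response counts as healthy (hypothetical rule).

    With no expected_text this is a plain availability check: any 2xx status passes.
    With expected_text the body must also contain that string.
    """
    if not 200 <= status < 300:
        return False
    if expected_text is None:
        return True
    return expected_text in body


def http_probe(url: str, expected_text: Optional[str] = None, timeout: float = 5.0) -> bool:
    """Fetch url remotely and evaluate the result; a failed connection is unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return evaluate_response(resp.status, body, expected_text)
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, DNS failures, and HTTP error statuses.
        return False
```

A probe for availability only would call `http_probe(url)`, while a probe that looks for a specific response would pass the expected string as `expected_text`.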

Local Monitoring

Monitoring is also done by applications running locally on the monitored server. These applications can check several things:

  • Hardware monitoring (on/off, CPU usage, drive capacity, etc.)
  • Process monitoring (checks if certain applications are running)
  • Logfile monitoring (scan specified files, like syslog, for certain conditions)
  • SSL certificate expiration monitoring (checks expiration date)
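As a rough illustration of the kinds of local checks listed above, the sketch below shows a drive-capacity check, a logfile scan, and a certificate-expiration calculation. It uses only the Python standard library; the function names, the regular-expression matching, and the idea of measuring expiration in whole days are assumptions made for the example and do not describe the actual monitoring applications.

```python
import re
import shutil
from datetime import datetime, timezone
from typing import Iterable, List


def disk_usage_percent(path: str = "/") -> float:
    """Return the percentage of the filesystem containing path that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def scan_log_lines(lines: Iterable[str], pattern: str) -> List[str]:
    """Return every line (without its trailing newline) that matches the pattern."""
    regex = re.compile(pattern)
    return [line.rstrip("\n") for line in lines if regex.search(line)]


def days_until_expiry(not_after: datetime, now: datetime) -> int:
    """Whole days remaining before a certificate's expiration date."""
    return (not_after - now).days
```

A monitoring job could compare each result against a threshold chosen for that server, for example raising an alarm when `disk_usage_percent` exceeds a configured limit or when `days_until_expiry` drops below a configured number of days.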

In addition to monitoring, these applications can schedule processes to run on the server and respond to alarms (e.g., restart an application that has ended).
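The alarm response mentioned above, restarting an application that has ended, could look something like the following minimal sketch. The function `respond_to_process_alarm` and its two callables are hypothetical; how a real monitoring application detects and restarts a process is not described in this document.

```python
from typing import Callable


def respond_to_process_alarm(is_running: Callable[[], bool], restart: Callable[[], None]) -> str:
    """Restart a process if it has ended and report what happened.

    is_running and restart are supplied by the caller (hypothetical hooks),
    so the same logic can wrap any process check and restart command.
    """
    if is_running():
        return "ok"
    restart()
    # Check again so a failed restart can still be escalated as an alarm.
    return "restarted" if is_running() else "restart_failed"
```

Returning a status string rather than raising an exception keeps the outcome easy to forward as a notification to the systems operator.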

How to Create a New Monitored Event

See Event Monitor Request for DoIT Supported Services (Procedure)

Keywords: monitor event new alarm alert SNCC server application service Event Management and Monitoring
Doc ID: 8454
Owner: Sarah M.
Group: Event Management and Monitoring
Created: 2008-11-03 19:00:00
Updated: 2022-12-24 17:11:35
Sites: Event Management and Monitoring, Systems & Network Control Center, Systems Engineering