FIDO: process watchdog
On the campus FIDO instance, the non-blocking call that fido_snmp makes to Net::SNMP::snmp_dispatcher() has repeatedly been observed never to return. In 2013, I introduced the non-standard module AnyEvent::SNMP, which replaces Net::SNMP's event loop. Many months went by, but the problem recurred.
/etc/cron.d/ns-2m-FIDO-watchdog
*/2 * * * * root /usr/local/fido/bin/fido_watchdog.pl 2>&1 | /usr/bin/logger -p daemon.warning -t fido_watchdog.pl.logger
The fido_watchdog opens the latest FIDO status file and looks for tests that have gone STALE and that have restart instructions. If both conditions are met, the watchdog attempts to restart the stalled processes.
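The selection logic described above can be sketched roughly as follows. This is an illustration only, not the actual fido_watchdog.pl (which is a Perl script). The status file format (one `<test_name> <STATE>` pair per line), the test names other than fido_snmp, and the restart command are all assumptions made for the example.

```python
import subprocess

# Hypothetical restart instructions: test name -> command to run.
# The real instructions and their format are not shown in this document.
RESTART_INSTRUCTIONS = {
    "fido_snmp": ["/bin/echo", "restart fido_snmp"],
}


def parse_status(lines):
    """Parse lines of the assumed form '<test_name> <STATE>' into a dict."""
    states = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or fields[0].startswith("#"):
            continue  # skip blank, short, or comment lines
        states[fields[0]] = fields[1]
    return states


def find_restartable(states, instructions):
    """Return sorted test names that are STALE and have restart instructions."""
    return sorted(
        name for name, state in states.items()
        if state == "STALE" and name in instructions
    )


def restart(names, instructions, dry_run=True):
    """Run (or, when dry_run is set, only report) each restart command."""
    attempted = []
    for name in names:
        command = instructions[name]
        if not dry_run:
            subprocess.run(command, check=False)
        attempted.append((name, command))
    return attempted


if __name__ == "__main__":
    # Made-up status lines; fido_ping and fido_other are hypothetical tests.
    sample = ["fido_snmp STALE", "fido_ping OK", "fido_other STALE"]
    stale = find_restartable(parse_status(sample), RESTART_INSTRUCTIONS)
    for name, command in restart(stale, RESTART_INSTRUCTIONS):
        print(f"would restart {name}: {' '.join(command)}")
```

In this sketch a STALE test without restart instructions (fido_other) is left alone, which matches the two conditions stated above. The dry_run default keeps the example from executing anything unless explicitly asked to.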
As of 2017/08, restart instructions are in place for several tests