FIDO: 'not found' alarms from the fido_snmp test

When interacting with a device via SNMP, the SNMP client receives OID and instance information.  An OID can be thought of as an entry point into information of a certain class.  For example, .1.3.6.1.2.1.2.2 is the OID for the ifTable, which contains information about interfaces.  An instance is an arbitrary number that is mapped to human-readable data such as an interface name.

The bindings between instance numbers and the data they identify are not necessarily persistent.  When the configuration [software or hardware] is changed, or when a device reboots, these bindings can change.  The behavior varies from device to device.  Until caching applications 'relearn' these bindings by repolling, there can be a disconnect between device reality and an application's SNMP instance cache.
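
The disconnect described above can be sketched in a few lines of Python.  This is only an illustration under stated assumptions, not FIDO code: the table contents, the interface names, and the helpers `relearn` and `lookup_status` are all hypothetical.

```python
# Hypothetical illustration of a stale SNMP instance cache (not FIDO code).

def relearn(device_table):
    """Rebuild the name -> instance cache by walking the device's table."""
    return {row["ifDescr"]: instance for instance, row in device_table.items()}

def lookup_status(cache, device_table, name):
    """Look up an interface through the cached instance number."""
    instance = cache.get(name)
    row = device_table.get(instance)
    if row is None:
        # The cached instance no longer exists on the device.
        return "not found"
    return row["ifOperStatus"]

# Instance numbers as learned on the first poll.
before = {1: {"ifDescr": "ge-0/0/0", "ifOperStatus": "up"},
          2: {"ifDescr": "ge-0/0/1", "ifOperStatus": "down"}}
cache = relearn(before)

# After a reboot the device hands out different instance numbers.
after = {5: {"ifDescr": "ge-0/0/0", "ifOperStatus": "up"},
         6: {"ifDescr": "ge-0/0/1", "ifOperStatus": "down"}}

stale_result = lookup_status(cache, after, "ge-0/0/1")
print(stale_result)   # not found

cache = relearn(after)
fresh_result = lookup_status(cache, after, "ge-0/0/1")
print(fresh_result)   # down
```

The interface really is down in both cases; only the repoll lets the cached lookup see it again.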

Many applications that interact with devices via SNMP, including FIDO, cache SNMP instance information to improve performance.  When FIDO can no longer find a cached instance on the device, it lets you know by painting the row a yellow/green color and listing the state as 'not found' instead of 'down'.

To help quickly clear out less useful alarms, the fido_snmp daemon will schedule a repoll of a device if it notices that the sysUpTime counter has reset.
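
As a rough sketch of this kind of check, assuming only that sysUpTime is reported in hundredths of a second and that a reset shows up as a value lower than the previous one: the class `RepollScheduler` and its method names are hypothetical and are not taken from fido_snmp.

```python
# Hypothetical sketch of sysUpTime reset detection (not the fido_snmp code).

def uptime_reset(previous_ticks, current_ticks):
    """Return True when sysUpTime went backwards since the last poll.

    sysUpTime is a 32-bit TimeTicks value (hundredths of a second), so a
    counter wrap after roughly 497 days looks the same as a reboot here.
    """
    if previous_ticks is None:
        return False  # first observation, nothing to compare against
    return current_ticks < previous_ticks

class RepollScheduler:
    def __init__(self):
        self.last_uptime = {}   # device name -> last seen sysUpTime
        self.repoll_queue = []  # devices waiting for a full repoll

    def observe(self, device, ticks):
        """Record a sysUpTime reading and queue a repoll if it reset."""
        if uptime_reset(self.last_uptime.get(device), ticks):
            if device not in self.repoll_queue:
                self.repoll_queue.append(device)
        self.last_uptime[device] = ticks

scheduler = RepollScheduler()
scheduler.observe("sw1", 8_640_000)  # about one day of uptime
scheduler.observe("sw1", 8_670_000)  # five minutes later, still counting up
scheduler.observe("sw1", 1_200)      # twelve seconds: the device rebooted
print(scheduler.repoll_queue)        # ['sw1']
```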


Copy/paste from 2024/01/16 ms-teams chat:

Setting to impact 3 and adding a note "interface has no description" is a toggleable feature.  This default was introduced quite some time ago [more than a year ago] as a way to help the AS3128 NOC differentiate between something that needed escalation and something that did not.

Reporting admin down or "not found" as alarms is also a toggle feature, although that feature's behavior has been basically changed over the last 10+ years.

If AS59 wants to remove either feature, let me know.  The first feature is removed by deleting [or commenting out] the rule below.  For the second feature, I would want to have follow-up conversations about the desired behavior.

From: /home/net/ns-ansible-uwmadison/ansible/files/fido/sitelocal_config, see lines 492-511, rule 100000.

492    ########################
493    #       decent defaults#
494    ########################
495 
496   ################## setting impact
497 
498   # if this interface is so unimportant that it doesn't have a description, its an impact 3 with 15 minute holddown
499   # this rule is essentially required for Juniper LACP bundles
500   100000:
501     fido_impact:
502       reason: interface has no description
503       value: '3'
504     matches:
505       '10':
506         key_match: test
507         equal: ifOperStatus
508       '20':
509         key_match: ___infohash___Descr
510         undefined: ''
511 
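
One way to read rule 100000 is sketched below in Python.  The semantics here are assumptions, not FIDO's documented behavior: `key_match` is treated as an exact field name, `equal` as a string comparison, `undefined` as "field missing or empty", and all entries under `matches` must hold for the rule to apply.  The alarm dictionaries and the helpers `condition_holds` and `apply_rule` are hypothetical.

```python
# Hypothetical reading of rule 100000 (assumed semantics, not FIDO's code).

RULE_100000 = {
    "fido_impact": {"reason": "interface has no description", "value": "3"},
    "matches": {
        "10": {"key_match": "test", "equal": "ifOperStatus"},
        "20": {"key_match": "___infohash___Descr", "undefined": ""},
    },
}

def condition_holds(cond, alarm):
    """Evaluate one match entry against an alarm (assumed semantics)."""
    value = alarm.get(cond["key_match"])
    if "equal" in cond:
        return value == cond["equal"]
    if "undefined" in cond:
        # Assumption: a missing or empty field counts as undefined.
        return value is None or value == ""
    return False

def apply_rule(rule, alarm):
    """Return the impact setting if every match entry holds, else None."""
    if all(condition_holds(c, alarm) for c in rule["matches"].values()):
        return dict(rule["fido_impact"])
    return None

no_descr = {"test": "ifOperStatus"}
with_descr = {"test": "ifOperStatus", "___infohash___Descr": "uplink to core"}

print(apply_rule(RULE_100000, no_descr))    # impact 3 with the reason note
print(apply_rule(RULE_100000, with_descr))  # None, the rule does not apply
```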


As of 2017, it has been observed that the FIDO SNMP daemon can fail in an odd way.  As described in https://jira.doit.wisc.edu/jira/browse/NS-2643, a varied assortment of devices and SNMP tests suddenly returning "not found" may require a manual restart of fido_snmp.  As of 2020/04/03 there is an automatic watchdog to perform this restart if the situation occurs.


Keywords:
FIDO: 'not found' alarms from the fido_snmp test 
Doc ID:
38256
Owned by:
Michael H. in Network Services
Created:
2014-03-08
Updated:
2024-01-16
Sites:
Network Services, Systems & Network Control Center