FIDO: Impact, Time of Day, Hold Down attributes


Alarms that match an attribute rule remain in the FIDO database, but they may be presented to the user differently.  Optionally, alarms can be auto-commented and correlated by normal FIDO operation [such as topology or comment correlation].

Alarms can be matched by fully qualified alarm name, by IP subnet, or by alarm attribute.  Valid operators are 'defined', 'equal', and 'match' [Perl regexp].

You can determine the fully qualified alarm name for an active alarm by examining the FIDO database, accessible via the web or CLI.  For an old alarm, examine the fido_events.log file from the CLI.  Here is an example of a fido_events.log entry.  The key/value pair of importance is 'unmodified_item', which holds the fully qualified alarm name.

Jan 14 10:35:03 nibbler fido.pl[15107]: {"_log_line":{"parent":"20140114.409","length":19,"unmodified_item":"s-ex4200-lab-24f.wiscnet.net jnxOperatingState Fan 1-jnxOperatingState","item":"s-ex4200-lab-24f.wiscnet.net jnxOperatingState Fan 1","action":"Add","_alarm_data":{"info":null,"failures":1,"test":"jnxOperatingState","event_id":"20140114.410","time":1389717284,"status":"OK","file":"snmp","device":"s-ex4200-lab-24f.wiscnet.net","state":"runningAtFullSpeed","iso":4,"snmp_instance":"4.1.1.1","updated":1,"needed_failures":"1","start":1389717284,"start_text":"2014/01/14 10:34:44"},"event":"20140114.410"}}
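
As a rough illustration of pulling the fully qualified alarm name out of such a log line, here is a minimal Python sketch.  It is not part of FIDO.  It assumes each line is a syslog-style prefix followed by a single JSON object that starts at the first '{', as in the entry above.  The function names and the shortened sample line are hypothetical.

```python
import json

def parse_fido_event(line):
    """Decode the JSON payload of one fido_events.log line."""
    # Assumption: the payload starts at the first '{' on the line.
    start = line.index("{")
    return json.loads(line[start:])

def alarm_names(line):
    """Return (unmodified_item, item) from the '_log_line' object."""
    entry = parse_fido_event(line)["_log_line"]
    return entry["unmodified_item"], entry["item"]

# Shortened copy of the log entry shown above.
sample = (
    'Jan 14 10:35:03 nibbler fido.pl[15107]: '
    '{"_log_line":{"unmodified_item":'
    '"s-ex4200-lab-24f.wiscnet.net jnxOperatingState Fan 1-jnxOperatingState",'
    '"item":"s-ex4200-lab-24f.wiscnet.net jnxOperatingState Fan 1",'
    '"action":"Add"}}'
)

if __name__ == "__main__":
    unmodified, item = alarm_names(sample)
    print(unmodified)
```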

  • A <valid> tag can be used to describe when the exception should be considered valid.  As of 2014/01/17, only 'start' and 'end' are valid tags, and both refer to time.
  • <match> and <key_match> can each have multiple values.  An n^2-like search over every value pair is used to find a match, and the evaluation is treated as an OR.
  • All subrules under <matches> must match [treated as an AND].
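
The OR/AND behaviour described above can be sketched as follows.  This is only an illustration under stated assumptions, not FIDO's code: the dictionary layout ('op', 'patterns', 'keys') and all function names are hypothetical, and Python's re module stands in for Perl regexps, which differ in some details.

```python
import re

def value_matches(operator, pattern, value):
    """Apply one operator ('defined', 'equal' or 'match') to one alarm value."""
    if operator == "defined":
        return value is not None
    if value is None:
        return False
    if operator == "equal":
        return str(value) == pattern
    if operator == "match":
        # Stand-in for a Perl regexp match.
        return re.search(pattern, str(value)) is not None
    raise ValueError("unknown operator: %r" % operator)

def subrule_matches(subrule, alarm):
    """OR: any pattern matching any listed key is enough (n^2-like search)."""
    patterns = subrule.get("patterns") or [None]
    return any(
        value_matches(subrule["op"], pattern, alarm.get(key))
        for pattern in patterns
        for key in subrule["keys"]
    )

def rule_matches(subrules, alarm):
    """AND: every subrule under <matches> has to match."""
    return all(subrule_matches(subrule, alarm) for subrule in subrules)

# Hypothetical alarm data, loosely based on the log entry above.
alarm = {
    "device": "s-ex4200-lab-24f.wiscnet.net",
    "test": "jnxOperatingState",
    "state": "runningAtFullSpeed",
}

# One subrule with two patterns and two keys.
lab_rule = [
    {"op": "match", "patterns": ["-lab-", "-lab.uwsys.net"], "keys": ["device", "descr"]},
]

# Three subrules that must all match.
fan_rule = [
    {"op": "equal", "patterns": ["jnxOperatingState"], "keys": ["test"]},
    {"op": "equal", "patterns": ["runningAtFullSpeed"], "keys": ["state"]},
    {"op": "match", "patterns": ["jnxOperatingState Fan"], "keys": ["descr"]},
]

if __name__ == "__main__":
    print(rule_matches(lab_rule, alarm))
    print(rule_matches(fan_rule, alarm))
```

In this sketch the lab rule matches because one pattern hits the 'device' key, while the fan rule fails because the sample alarm has no 'descr' key, so one of its three subrules does not match.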


Examples:

* To ignore alarms by time of day, you can list multiple time ranges, comma separated.  The 'ParseDate' module is used to parse them.
* Holddown time is given in minutes or as an absolute time.  An absolute time takes priority over a minute-based holddown.
* Impact values are:
    1 = high priority
    2 = normal priority
    3 = low priority
    4 = informational
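
To make the comma-separated time-of-day ranges more concrete, here is a small Python sketch.  It does not reproduce the 'ParseDate' module.  It assumes times written like '8am' or '8:50am', a start-inclusive and end-exclusive range, and ranges that may wrap past midnight.  All of these are assumptions, and the function names are hypothetical.

```python
from datetime import datetime, time

def parse_clock(text):
    """Parse '8am' or '8:50am' into a datetime.time."""
    text = text.strip().lower()
    for fmt in ("%I:%M%p", "%I%p"):
        try:
            return datetime.strptime(text, fmt).time()
        except ValueError:
            continue
    raise ValueError("unrecognised time: %r" % text)

def parse_ranges(spec):
    """Split 'a-b, c-d' into a list of (start, end) time pairs."""
    ranges = []
    for part in spec.split(","):
        start, end = part.split("-", 1)
        ranges.append((parse_clock(start), parse_clock(end)))
    return ranges

def in_ranges(spec, moment):
    """True if 'moment' falls inside any range (end treated as exclusive)."""
    for start, end in parse_ranges(spec):
        if start <= end:
            if start <= moment < end:
                return True
        elif moment >= start or moment < end:
            # The range wraps past midnight, e.g. 11pm-1am.
            return True
    return False

if __name__ == "__main__":
    print(in_ranges("8am-8:50am", time(8, 30)))
```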


The following is the content of fido_attributes.config, as checked out from RCS:

IP addresses and subnets are matched against a trie.

<ip>
   <2600:1d00::53:1000>
        <valid>
        start = 8am
        end = 9am
        </valid>
        <fido_holddown>
        time = 5
        reason = uwmadison anycast event
        </fido_holddown>
   </2600:1d00::53:1000>

   <195.160.236.10>
        <fido_impact>
        value = 4
        reason = renesys route reflector peering
        </fido_impact>
   </195.160.236.10>

    <2607:f388:0:2200::/64>
        <fido_help_files>
            reason = MadIX
        </fido_help_files>

        <fido_impact>
            value = 4
            reason = madIX peering
        </fido_impact>
   </2607:f388:0:2200::/64>
</ip>
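
For readers unfamiliar with trie-based IP matching, the following Python sketch shows the general idea with a binary prefix trie.  It is an illustration only: the class and method names are hypothetical, and choosing the longest matching prefix is an assumption about how overlapping entries would be resolved, not something this document states.

```python
import ipaddress

class PrefixTrie:
    """Binary trie keyed on address bits, one sub-trie per IP version."""

    def __init__(self):
        self.root = {}

    def insert(self, prefix, attrs):
        # A bare address becomes a /32 (IPv4) or /128 (IPv6) entry.
        net = ipaddress.ip_network(prefix, strict=False)
        node = self.root.setdefault(net.version, {})
        bits = format(int(net.network_address), "0{}b".format(net.max_prefixlen))
        for bit in bits[:net.prefixlen]:
            node = node.setdefault(bit, {})
        node["attrs"] = attrs

    def lookup(self, address):
        """Return the attributes of the longest matching prefix, or None."""
        addr = ipaddress.ip_address(address)
        node = self.root.get(addr.version)
        if node is None:
            return None
        best = node.get("attrs")
        bits = format(int(addr), "0{}b".format(addr.max_prefixlen))
        for bit in bits:
            node = node.get(bit)
            if node is None:
                break
            if "attrs" in node:
                best = node["attrs"]
        return best

# Entries taken from the <ip> block above; the attribute dicts are simplified.
trie = PrefixTrie()
trie.insert("2600:1d00::53:1000", {"fido_holddown": 5})
trie.insert("195.160.236.10", {"fido_impact": 4})
trie.insert("2607:f388:0:2200::/64", {"fido_impact": 4, "reason": "madIX peering"})

if __name__ == "__main__":
    print(trie.lookup("2607:f388:0:2200::1"))
    print(trie.lookup("195.160.236.11"))
```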

Attribute rules are matched in numerical order of their rule number.

<attributes>
   <100>
        <matches>
        <10>
        match = uwcrock
        key_match = device
        </10>
        </matches>

        <fido_time_of_day>
                time = 8am-8:50am
                reason = tod_test
        </fido_time_of_day>
   </100>


   <1000>
     <valid>
     start = 12am
     end = 1am
     </valid>

     <matches>
        <10>
        match = fido_juniper_cpu
        key_match = test
        </10>
     </matches>

     <fido_holddown>
     time = 5
     reason = Juniper CPU
     comment = $reason
     </fido_holddown>
   </1000>

   # UPS alarms
   <1025>
     <matches>
        <10>
        <match>
        ArgusdcPwrSysBattAlrmIntegerValue Battery On Discharge =
        ArgusdcPwrSysRectAlrmIntegerValue Rect. AC Mains Fail =
        ArgusdcPwrSysRectAlrmIntegerValue Rect. Major Fail Count =
        ArgusdcPwrSysRectAlrmIntegerValue Urgent Rect. AC Mains Fail =
        ArgusdcPwrSysMajorAlarm Major Alarm =
        ArgusdcPwrSysMiscAlrmIntegerValue System Major =
        </match>
        <key_match>
                descr =
        </key_match>
        </10>
     </matches>

     <fido_impact>
     value = 1
     reason = power issue
     </fido_impact>
   </1025>

   <1050>
     <matches>
        <5>
        equal = jnxOperatingState
        key_match = test
        </5>

        <10>
        equal = runningAtFullSpeed
        key_match = state
        </10>

        <20>
        match = jnxOperatingState Fan
        key_match = descr
        </20>

     </matches>

     <fido_holddown>
     time = 10
     reason = Juniper Fan
     comment = $reason
     </fido_holddown>

     <fido_impact>
     value = 4
     reason = Juniper Fan
     </fido_impact>

   </1050>



   # attempt to hit all lab alarms
   <1100>
     <matches>
        <10>
        <match>
           -lab- =
           -lab.uwsys.net =
        </match>
        <key_match>
                device =
                descr =
        </key_match>
        </10>
     </matches>

     <fido_impact>
     value = 3
     reason = tech lab
     </fido_impact>
   </1100>

   <1200>
     <matches>
        <20>
        key_match = ___infohash___Descr
        <match>
                :RI: =
        </match>
        </20>
     </matches>
     <fido_impact>
     value = 3
     reason = management network
     </fido_impact>
   </1200>

   <9000>
     <matches>
        <10>
           <match>
              ^b-.+-hub.uwsys.net-snmp_node = 3
           </match>
           key_match = item
        </10>
     </matches>
     <fido_impact>
     value = 3
     reason = DC plant
     </fido_impact>
   </9000>

   <10000>
     <matches>
        <10>
           <match>
              .+-hub.uwsys.net-snmp_node = 1
           </match>
           key_match = item
        </10>
     </matches>
     <fido_impact>
     value = 1
     reason = hub node
     </fido_impact>
   </10000>

</attributes>
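
Because the rule names are numbers, numerical order is not the same as string order: sorted as strings, '10000' would come before '1025' and '9000'.  The short Python sketch below illustrates the difference.  The function name is hypothetical, and the sketch only shows the ordering; it makes no claim about what FIDO does once a rule matches.

```python
def rules_in_order(attributes):
    """Return (rule_id, rule) pairs sorted by the integer value of the id."""
    return sorted(attributes.items(), key=lambda pair: int(pair[0]))

# Rule ids from the <attributes> block above, with the 'reason' as a placeholder body.
attributes = {
    "10000": "hub node",
    "9000": "DC plant",
    "1200": "management network",
    "1100": "tech lab",
    "1050": "Juniper Fan",
    "1025": "power issue",
    "1000": "Juniper CPU",
    "100": "tod_test",
}

numeric_ids = [rule_id for rule_id, _ in rules_in_order(attributes)]
string_ids = sorted(attributes)

if __name__ == "__main__":
    print(numeric_ids)
    print(string_ids)
```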






Keywords: FIDO: Impact, Time of Day, Hold Down attributes
Doc ID: 38253
Owner: Michael H.
Group: Network Services
Created: 2014-03-08 15:16 CST
Updated: 2018-07-03 11:22 CST
Sites: Network Services, Systems & Network Control Center, University of Wisconsin System Network