FIDO: Correlation

FIDO performs multiple types of correlation.

1) Topology based

* unreachable IPs or nodes based on layer 3 topology [traceroute]
* service IP alarms to ICMP unreachable alarms
* ICMP unreachable alarms to known device interfaces
* LLDP/CDP based correlation, node based or interface based
* port channel/aggregated ethernet members

Correlation can occur either in a module or in fido.pl itself. The fido.pl correlation rules are described in utils/FidoCorrelation.pm.
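
The layer 3 topology case from the list above can be pictured with a small sketch. This is not the code in FidoCorrelation.pm; the data layout, the device names, and the function name correlate_unreachable are all assumptions made for illustration. The idea shown is only that an unreachable node becomes a child of the unreachable hop closest to the root on its traceroute path.

```perl
use strict;
use warnings;

# Hypothetical traceroute data: the hops from the monitoring root to each
# node, nearest hop first. All names are illustrative.
my %path_to = (
    'core-1'   => [],
    'dist-1'   => ['core-1'],
    'access-1' => ['core-1', 'dist-1'],
    'access-2' => ['core-1', 'dist-1'],
);

# For each unreachable node, find the unreachable hop closest to the root on
# its path; that hop becomes the parent alarm. A node with no unreachable
# upstream hop is its own parent (a root cause candidate).
sub correlate_unreachable {
    my ($paths, @down) = @_;
    my %is_down = map { $_ => 1 } @down;
    my %parent_of;
    for my $node (@down) {
        my ($first_down) = grep { $is_down{$_} } @{ $paths->{$node} || [] };
        $parent_of{$node} = defined $first_down ? $first_down : $node;
    }
    return \%parent_of;
}

my $parents = correlate_unreachable(\%path_to, qw(dist-1 access-1 access-2));
print "$_ -> $parents->{$_}\n" for sort keys %$parents;
```

In this sketch both access nodes are attached to dist-1, so an operator would see one parent alarm with two children instead of three unrelated alarms.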



2) Comment based [aka human intervention]

* items that are tagged with the same comment are grouped together
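
A minimal sketch of comment based grouping, assuming each alarm is a hash with a 'comment' field (the field names, the sample alarms, and the function name group_by_comment are hypothetical, not taken from FIDO):

```perl
use strict;
use warnings;

# Hypothetical alarm records; the field names are assumptions.
my @alarms = (
    { id => 1, device => 'r-a-core-1',   comment => 'ticket 1234' },
    { id => 2, device => 'r-b-core-1',   comment => 'ticket 1234' },
    { id => 3, device => 's-c-access-2', comment => '' },
);

# Group alarm ids by identical, non-empty comment text.
# Alarms without a comment are left ungrouped.
sub group_by_comment {
    my (@items) = @_;
    my %groups;
    for my $item (@items) {
        my $comment = $item->{comment};
        next unless defined $comment && length $comment;
        push @{ $groups{$comment} }, $item->{id};
    }
    return \%groups;
}

my $groups = group_by_comment(@alarms);
for my $comment (sort keys %$groups) {
    print "$comment: @{ $groups->{$comment} }\n";
}
```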



3) Alarm attribute based

* As of 2014/03/08, this only applies to alarms that have not already been correlated by topology or by comment.
* Attributes of alarms can be examined, and if a positive match is made, the alarm is added to the evaluated group.  An alarm can be a part of multiple groups; groups are then evaluated in priority order.

Example:


<attributes>
   # correlate not_found alarms for a single device; use case scenarios including FEX removals which can create dozens of alarms
   <100>
     <matches>
        <10>
        defined = 1
        key_match = device
        </10>

        <20>
        equal = not found
        key_match = state
        </20>
     </matches>

     group_eval = "return 'not_found_' . $$item_ptr{'device'}"
   </100>

   # correlate admin down alarms for a single device; use case scenarios are mass interface shutdowns after a repoll cycle
   # that go uncommented for a lengthy period of time by operators
   <110>
     <matches>
        <10>
        defined = 1
        key_match = device
        </10>

        <20>
        equal = admin down
        key_match = state
        </20>
     </matches>
     group_eval = "return 'admin_down_' . $$item_ptr{'device'}"
   </110>

   # group the 'info' column together; this way interfaces that go back to a common node might get grouped.
   <120>
     <matches>
        <10>
        defined = 1
        key_match = info
        </10>

     </matches>
     group_eval = "return 0 if $$item_ptr{'info'} =~ m/^source: /; \
                   return 'info=' . $$item_ptr{'info'}"
   </120>

   # group icmpv4/icmpv6 based on domain info/dns
   <130>
     <matches>
        <10>
        defined = 1
        key_match = descr
        </10>

     </matches>
     group_eval = "return 'descr=' . $$item_ptr{'descr'}"
   </130>



   # group alarms across modules; for example, admin down creates a DOM threshold
   <135>
     <matches>
        <10>
        defined = 1
        key_match = interface
        </10>

        <20>
        defined = 1
        key_match = device
        </20>

     </matches>
     group_eval = "my $device = $$item_ptr{'device'}; \
                   $device = $1 if ($device =~ m/(.+?)\./); \
                   my $interface = lc( NS::StringUtils::sanitizeRRDTarget( $$item_ptr{'interface'} )); \
                   my $group = 'device=' . $device . '_interface=' . $interface; \
                   return $group"
   </135>

   # the rule below groups things together, such as in/out bandwidth alarms on the same node,
   # but it can snarf up other unwanted stuff, so it is low priority
   <220>
     <matches>
        <5>
        defined = 1
        key_match = device
        </5>

        <10>
        defined = 1
        key_match = state
        </10>

        <20>
        defined = 1
        key_match = test
        </20>

     </matches>
     group_eval = "return $$item_ptr{'device'} . '_' . $$item_ptr{'state'} . '_' . $$item_ptr{'test'}"
   </220>

    # try to group alarms for a device together.
    # observed after 2014/06/17 power outage.
    <300>
        <matches>
            <5>
              defined = 1
              key_match = device
            </5>
        </matches>
     group_eval = "return 'device=' . $$item_ptr{'device'}"
    </300>

    # try to group alarms for the same class of devices in a room together.
    # observed after 2014/06/17 power outage.
    <400>
        <matches>
            <5>
              defined = 1
              key_match = device
            </5>
        </matches>
     group_eval = "if ($$item_ptr{'device'} =~ m/^(\w+?-\w+?-\w+?)-\d+?-.+/) { \
                        #print "class/room found = '$1'\n"; \
                        return 'class/room='. $1; \
                   } else { \
                        #print "no class/room for $$item_ptr{'device'}\n"; \
                   }"
    </400>

    # try to group alarms for the same room together.
    # observed after 2014/06/17 power outage.
    #<410>
    #    <matches>
    #        <5>
    #          defined = 1
    #          key_match = device
    #        </5>
    #    </matches>
    # group_eval = "if ($$item_ptr{'device'} =~ m/^\w+?-(\w+?-\w+?)-\d+?-.+/) { \
    #                    #print "room found = '$1'\n"; \
    #                    return 'room='. $1; \
    #               } else { \
    #                    #print "no room for $$item_ptr{'device'}\n"; \
    #               }"
    #</410>


    # try to group alarms for a building together, based only on the descr [PTR]
    #<400>
    # <matches>
    #    <5>
    #    defined = 1
    #    key_match = descr
    #    </5>
    #
    #    <10>
    #    defined = 1
    #    key_match = state
    #    </10>
    #
    #    <20>
    #    defined = 1
    #    key_match = test
    #    </20>
    #
    # </matches>
    # group_eval = "my $possible_building = $$item_ptr{'descr'}; \
    #               $possible_building = $1 if ($possible_building =~ m/^\w+?-(\w+?)-/) ; \
    #               return $possible_building . '_' . $$item_ptr{'state'} . '_' . $$item_ptr{'test'}"
    #</400>

</attributes>
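
To make the rule mechanics concrete, here is a minimal sketch of how stanzas like <100> and <300> might be evaluated against one alarm. It is an illustration under stated assumptions, not the code in FidoCorrelation.pm: the %rules hash layout, the function names rule_matches and groups_for, and the choice to treat a lower stanza number as higher priority are all assumptions. The group_eval strings are copied from the example, and they are run with Perl's string eval so that they can see $item_ptr.

```perl
use strict;
use warnings;

# Rules mirroring stanzas <100> and <300>; the hash layout is an assumption
# about how the parsed configuration might look.
my %rules = (
    100 => {
        matches => {
            10 => { defined => 1,          key_match => 'device' },
            20 => { equal  => 'not found', key_match => 'state'  },
        },
        group_eval => q{return 'not_found_' . $$item_ptr{'device'}},
    },
    300 => {
        matches => {
            5 => { defined => 1, key_match => 'device' },
        },
        group_eval => q{return 'device=' . $$item_ptr{'device'}},
    },
);

# True when every match condition of the rule holds for the alarm.
sub rule_matches {
    my ($rule, $item_ptr) = @_;
    for my $id (sort { $a <=> $b } keys %{ $rule->{matches} }) {
        my $m     = $rule->{matches}{$id};
        my $value = $item_ptr->{ $m->{key_match} };
        return 0 if $m->{defined} && !defined $value;
        return 0 if exists $m->{equal}
            && !(defined $value && $value eq $m->{equal});
    }
    return 1;
}

# Return the group names an alarm joins, lowest stanza number first
# (assumed here to mean highest priority).
sub groups_for {
    my ($item_ptr) = @_;
    my @groups;
    for my $priority (sort { $a <=> $b } keys %rules) {
        my $rule = $rules{$priority};
        next unless rule_matches($rule, $item_ptr);
        # The string eval sees the lexical $item_ptr, as group_eval expects.
        my $group = eval $rule->{group_eval};
        warn "rule $priority failed: $@" if $@;
        push @groups, $group if $group;    # a false value means "no group"
    }
    return @groups;
}

my @joined = groups_for({ device => 'sw1.example.net', state => 'not found' });
print join(', ', @joined), "\n";
```

Skipping false return values matches how stanza <120> uses "return 0" to decline a group, and how stanza <400> returns nothing when its pattern does not match.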




In a 2017/09 wan-routing meeting, while discussing FIDO, we came across the <device_topology_priority> stanza in fido_correlation.config. It currently has two uses:
1) [./bin/update/update_icmp_ips.pl]: when selecting a PTR for an IP that has more than one candidate, the device with the highest priority wins.
2) [./lib/FidoCorrelation.pm]: when doing topology correlation, I try to rely on L2 (CDP/LLDP) or L3 traceroute data, based on distance from the root node.  If there is a break in topology continuity (for example, there is no CDP path from the root to a device, but there is one between two devices), the topology importance dictates which alarm should be the parent and which the child from a correlation perspective.
This value is calculated by $FIDO/bin/update/update_topology_info.pl.
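
Both uses come down to comparing two or more devices by priority. The sketch below illustrates that comparison only; the %priority values, the device names, and the function names are hypothetical, and it assumes that a larger number means higher priority and that the more important device becomes the parent. Neither assumption is confirmed by the configuration shown here.

```perl
use strict;
use warnings;

# Hypothetical values standing in for <device_topology_priority>;
# a larger number is assumed to mean higher priority.
my %priority = (
    'core-rtr-1'  => 100,
    'dist-sw-1'   => 50,
    'access-sw-9' => 10,
);

# Priority of a device, 0 when it is not listed.
sub rank {
    my ($device) = @_;
    return $priority{$device} // 0;
}

# Use 1: pick the device whose name should be used as the PTR for an IP
# when several devices are candidates. Ties are broken by name.
sub pick_ptr_device {
    my (@candidates) = @_;
    my ($best) = sort { rank($b) <=> rank($a) or $a cmp $b } @candidates;
    return $best;
}

# Use 2: with no usable topology path between two alarming devices,
# treat the more important one as the parent (an assumption).
sub parent_and_child {
    my ($x, $y) = @_;
    my @ordered = sort { rank($b) <=> rank($a) or $a cmp $b } ($x, $y);
    return @ordered;
}

print pick_ptr_device('access-sw-9', 'core-rtr-1'), "\n";
my ($parent, $child) = parent_and_child('access-sw-9', 'dist-sw-1');
print "parent=$parent child=$child\n";
```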



Keywords: FIDO: Correlation
Doc ID: 38262
Owner: Michael H.
Group: Network Services
Created: 2014-03-08 18:35 CDT
Updated: 2017-09-06 09:44 CDT
Sites: Network Services, Systems & Network Control Center, University of Wisconsin System Network