FIDO: File Formats
Description of the Status Files written by FIDO tests
Want to make your own test? Here is the format of the status files. All tests, including FIDO proper, write out a status file that contains information about the test itself and the objects related to the test.
<item data>
<failed-object [object name]> # appended with the test type to avoid collisions
candidate # CALCULATED BY FIDO: Only relevant for fido_node alarms. List of contributing alarms
    item_data s-hotspare-b360-17-hotspare-fido_node candidate [s-hotspare-b360-17-hotspare.net.wisc.edu-snmp_node]
correlation # OPTIONAL
    item = [string] # this object is related to an upstream object
    reason = [string] # text description of why this correlation was made
deleteme # OPTIONAL: [boolean] # delete this from the FIDO report file even if I didn't repoll a status update for everything this time around
deleteme_instance # OPTIONAL: [integer] # delete this from the FIDO report file even if I didn't repoll a status update for everything this time around, but only this instance
descr # CALCULATED BY FIDO: [string]: FIDO's calculated description of an item
device # OPTIONAL string. Used by the correlation engine
event_id # CALCULATED BY FIDO: [number]: unique identifier for this event
failures # CALCULATED BY FIDO: [integer]: number of times the update was determined to have failed
file # CALCULATED BY FIDO: [string]: ingress filename for given alarm
help # OPTIONAL: will be displayed in the gui. If you don't supply it, the test filename will be used
    key1 = help file name [string name]
    key2 = help file number [integer]
holddown # CALCULATED BY FIDO:
    matches = [hash] # describes why the item is on holddown
    reason = [string]
    time = [integer] # time in minutes to hold down alarm based on alarm start time
    absolute_time = [string] # time in text to hold down alarm.
impact # OPTIONAL: assumed 2 if not present
    value = [integer]
    reason = OPTIONAL [string]
info # OPTIONAL [string]. gui mouseover or 'wideinfo' shows this information
infohash # OPTIONAL: successor of 'info' key
    examples:
    infohash additional_info [122 accepted, 25 dropped, 1000 threshold]
    infohash icmp_source [fido_icmpv4_manual]
    infohash interface [port-channel109]
    infohash status [No response from '146.151.144.148'
    infohash timestamp [2020/04/01 09:20:00]
    infohash AS [65011]
    infohash Descr [Goodnight Hall/Communicative Disorders]
    infohash IP [192.168.1.1]
interface # OPTIONAL [string] used by the correlation engine
ip # OPTIONAL [ip address] used by the correlation engine and also used for DNS lookups
ipam # OPTIONAL: CALCULATED BY FIDO:
    ipam circuit_id [value undefined]
    ipam connector [value undefined]
    ipam subnet [146.151.144.148/32]
iso # [integer] used for correlation. The lower the value, the more important the item is to the correlation engine. For example, with ip=3 and service=4, service is correlated to ip, not vice versa.
keepme # OPTIONAL: [boolean] # keep this in the FIDO report file even if it isn't in the current status file
    # this is to support objects that aren't polled every cycle
logged # CALCULATED BY FIDO: [boolean]: whether the item has been logged
needed_failures # OPTIONAL [integer]: defaults to one. number of consecutive poll failures before the object shows in the gui
rtt # OPTIONAL [number] supplied by some tests to describe how long it took to verify this particular alarm.
snmp_instance # OPTIONAL [integer]: for debugging
source # CALCULATED BY FIDO.PL: federation information
    source fido-animal.net.wisc.edu [1]
    source fido-cssc.net.wisc.edu [1]
start # CALCULATED BY FIDO.PL: time in ticks alarm was first seen
start_text # CALCULATED BY FIDO.PL: time in string alarm was first seen
state # REQUIRED: [string]: state of the tested object
status # CALCULATED BY FIDO.PL: whether or not the item is stale, ok, etc
suppressed # OPTIONAL: [integer]: when suppression is being used, how many items are not shown because of this parent item. Supplied by fido_snmp.pl
test # REQUIRED: [string], the type of failure, displayed in the 'type' column, often used for correlation
time # REQUIRED: [integer]: time in ticks that the object failed during the current test period
time_of_day # CALCULATED BY FIDO: OPTIONAL
    matches = [hash] # describes why the item is on holddown
    actual_match = [string]
    reason = [string]
    time = [string] # time in text to hold down alarm.
ts_device # OPTIONAL string. Used by the correlation engine
ts_urls # OPTIONAL: key/value of url_key and destination
    ts_urls gnmis [http://stats.net.wisc.edu/cgi-bin/gnmis.fcgi?device=^s-hotspare-b360-17-hotspare.net.wisc.edu]
updated # CALCULATED BY FIDO: [boolean]: This is used to determine if an object should be removed from the report file. The item is removed if the item itself isn't updated but the file it came from has been.
urls # CALCULATED BY FIDO: key/value of url_key and destination
    urls __alarm_info [https://fido-cssc.net.wisc.edu/cgi-bin/fido_alarm_info.cgi?item=2607%3Af388%3A2%3A6002%3A%3A3-icmpv6&test=icmpv6]
    urls rrd [<a href=http://stats.net.wisc.edu/cgi-bin/genstatspage.fcgi?rrdfile=/data/mrtg/icmpv6/ip=2607:f388:2:6002::3_family=ipv6_icmpv6.rrd> cgi: rrd <IMG SRC="http://stats.net.wisc.edu/cgi-bin/gengraph.fcgi?rrdfile=/data/mrtg/icmpv6/ip=2607:f388:2:6002::3_family=ipv6_icmpv6.rrd"></a>]
via_connected # CALCULATED BY FIDO: correlation
    item_data 2607:f388:e:101::10-icmpv6 via_connected subnet [2607:f388:e:101::1/64]
</item data>
grep "added: item_data" output.txt | egrep -v " (correlation|device|help|impact|info|infohash|interface|ip|iso|needed_failures|state|suppressed|test|time|url|start|start_text|status|source|event_id|failures|file|logged|urls|updated|descr|ipam|ts_device|ts_urls|candidate|via_connected|snmp_instance|deleteme|deleteme_instance) "
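A test author may want to sanity-check an item before writing it out. The following is only a minimal sketch, not FIDO's own code: it treats an item as a plain Python dict, and the function names, the dict layout for impact, and the sample values "down" and "http" are hypothetical. It assumes the 'failures' counter is what gets compared with needed_failures, which the description above does not confirm.

```python
# Keys the description above marks REQUIRED for every item.
REQUIRED_ITEM_KEYS = ("state", "test", "time")


def missing_item_keys(item):
    """Return the REQUIRED keys that are absent from an item dict."""
    return [key for key in REQUIRED_ITEM_KEYS if key not in item]


def with_defaults(item):
    """Return a copy of the item with the documented defaults filled in."""
    filled = dict(item)
    filled.setdefault("needed_failures", 1)    # "defaults to one"
    filled.setdefault("impact", {"value": 2})  # "assumed 2 if not present"
    return filled


def shows_in_gui(item):
    """Assumed rule: visible once failures reaches needed_failures."""
    filled = with_defaults(item)
    return filled.get("failures", 0) >= filled["needed_failures"]


def upstream_of(a, b):
    """Pick the item the other is correlated to: the lower iso wins.

    Tie handling is not described in the source; this sketch returns b.
    """
    return a if a["iso"] < b["iso"] else b


# Hypothetical sample items; only the iso values 3 and 4 come from the text.
ip_alarm = {"state": "down", "test": "icmpv4", "time": 1585751633,
            "iso": 3, "failures": 1}
service_alarm = {"test": "http", "iso": 4}

print(missing_item_keys(service_alarm))
print(upstream_of(service_alarm, ip_alarm)["test"])
```

Keeping the defaults in one helper means the visibility check and any later writer see the same values.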
<status>
alldown # OPTIONAL: [integer]: false if everything isn't down, otherwise it will be the number of down items
contributors # OPTIONAL: unique to updateType [scraper], lists health of downstream contributors
    contributors ban-arp state [OK]
    contributors fp-arp state [OK]
    contributors fp-systemEnvironmentals state [OK]
data_files # OPTIONAL, but a really good idea: source data to show in fido_info.cgi
    data_files gMegaFlow_fido_icmp_ips.bin []
    data_files gMegaFlow_fido_icmp_ips_verbose.bin []
items # REQUIRED: [string]: total monitored items, not a count of what's down
    dhcp_leases items [820 rrd files [731 file(s) ignored]]
last_cycle_time # REQUIRED: [integer] (in seconds)
last_poll # OPTIONAL: unique to updateType [scraper], informational/debug
last_update # OPTIONAL: [integer] (in seconds), last time a data source was updated
polled # OPTIONAL [array of polled items]. A list of 'test' types that had been updated during this most recent cycle. For tests that may not be testing every test type all of the time
    fido.status: added: snmprrd polled ADVAopticalIfDiagInputPower [array_string]
    fido.status: added: snmprrd polled ADVAopticalIfDiagOutputPower [array_string]
    fido.status: added: snmprrd polled AlphadataNumberValue [array_string]
    ...
report # REQUIRED: [integer]: always increases by one, make sure we don't miss a report
report_time # REQUIRED: [integer]: time this report was written
restart # OPTIONAL: [string]: used in conjunction with 'user' for fido_watchdog
stale # REQUIRED: [integer]: seconds after last update before this data is stale
start # OPTIONAL: [integer]: ticks when this daemon started
status # CALCULATED BY FIDO: [string]: OK, stale, etc
threads # OPTIONAL: [integer]: how many threads the test is configured to use
updateType # OPTIONAL: [string]: scraper_alarm has a unique update methodology. This is a trigger to fido.pl to behave differently
url # OPTIONAL: [string]: points to file that contains a list of items, display in the FIDO status page
    uc_esx url [/uc_esx.txt]
user # OPTIONAL: [string]: to be used by fido_watchdog
</status>
status_output.txt | egrep -v " (items|last_cycle_time|polled|report|stale|threads|url|status|report_time|last_update|data_files|user|start|restart|contributors|updateType|last_poll) "
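The REQUIRED status keys and the 'stale' and 'report' rules can be checked in a few lines. This is a hedged sketch rather than what fido.pl actually does: the function names are hypothetical, the status block is modelled as a plain dict, and it assumes staleness is measured from report_time (the source only says "seconds after last update"). The stale and last_cycle_time values in the sample are made up for illustration.

```python
# Keys the description above marks REQUIRED in the status block.
REQUIRED_STATUS_KEYS = ("items", "last_cycle_time", "report",
                        "report_time", "stale")


def missing_status_keys(status):
    """Return the REQUIRED keys that are absent from a status dict."""
    return [key for key in REQUIRED_STATUS_KEYS if key not in status]


def is_stale(status, now):
    """Assumed rule: stale once more than `stale` seconds have passed
    since report_time."""
    return now - status["report_time"] > status["stale"]


def missed_reports(previous_report, current_report):
    """Reports skipped between two reads; 'report' should grow by one."""
    return max(current_report - previous_report - 1, 0)


status = {
    "items": "820 rrd files [731 file(s) ignored]",  # example from the text
    "last_cycle_time": 30,      # hypothetical value, in seconds
    "report": 220,
    "report_time": 1585751633,
    "stale": 600,               # hypothetical value, in seconds
}

print(missing_status_keys(status))
print(is_stale(status, now=1585751633 + 601))
print(missed_reports(220, 223))
```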
Description of fido.report.dat
The main report file that is read by the CGI
'comment_correlation' =>
    'comment' => # a comment
        'best' => # the best item for this comment, as determined by FIDO.
                  # best is defined by being the item with the most things
                  # correlated to it. Other tiebreakers are also used.
        'items' => # a list of items with the above comment
            'item with the comment mentioned above'
    comment_correlation 'cr 90035' best [s-bocklabs-254-1-access.local.net.wisc.edu ifOperStatus Gi1/0/1-ifOperStatus]
    comment_correlation 'cr 90035' items 's-bocklabs-254-1-access.local.net.wisc.edu ifOperStatus Gi1/0/1-ifOperStatus' [1]
    comment_correlation 'cr 90035' items 10.151.40.134-icmpv4 [1]
'comments' => # where the comments are kept
    # they are kept separate from the items themselves as we keep comment information for items that get deleted.
    # comment for this item
    comments '$itemName' comment [Problem 15824]
    # old comment data gets turned into a hash array and appended when the comment changes
    comments '$itemName' comment_history [{"discovery":"pending","first_seen":1578300837,"occurrences":1,"related_event_id":"20200106.3575"},{"discovery":"1575917448","first_seen":1578300837,"occurrences":1,"related_event_id":"20200106.3575"},{"comment":"Problem 15824","date":1578317204.17587,"discovery":"1575917448","first_seen":1578300837,"occurrences":1,"related_event_id":"20200106.3575","user":"aamohamed3"}] [array_jsonified]
    # [integer, ticks]; when this comment was last updated
    comments '$itemName' date [1578317204.17587]
    # when the item was first monitored by FIDO
    comments '$itemName' discovery [1578410731]
    # [integer, ticks]; time the alarm was added
    comments '$itemName' first_seen [1578300837]
    # [integer, ticks]; time the alarm was removed
    comments '$itemName' last_seen [1584513560]
    # [integer, ticks]; length in seconds that this alarm has been historically alarming. Only present for items that have been removed at least once
    comments '$itemName' length [6200691]
    # [integer]; number of times the alarm has been seen
    comments '$itemName' occurrences [115]
    # can be used to link to an event in the sql log DB [if applicable]
    comments '$itemName' recurring_event_id [20200303.50091]
    # can be used to link to an event in the sql log DB
    comments '$itemName' related_event_id [20200106.3575]
    # who commented the item
    comments '$itemName' user [aamohamed3]
'correlated' # summary of known network topology correlations as determined by fido.pl
    examples:
    correlated 'r-wa222-12-2-radial.net.wisc.edu ifOperStatus Vl2671-ifOperStatus' 146.151.178.213-icmpv4 [1]
    correlated 's-chmbrln-1327-1-radial.local.net.wisc.edu ifOperStatus Po7-ifOperStatus' 's-chmbrln-1327-1-radial.local.net.wisc.edu ifOperStatus Te1/0/7-ifOperStatus' [1]
    correlated 10.151.129.46-icmpv4 2607:f388:f:1c::2-icmpv6 [1]
    correlated s-hotspare-b360-17-hotspare-fido_node s-hotspare-b360-17-hotspare.net.wisc.edu-snmp_node [1]
    correlated s-hotspare-b360-17-hotspare.net.wisc.edu-snmp_node 146.151.144.148-icmpv4 [1]
'event_id' [string] # next event will be assigned this ID
    event_id [20200401.31941]
'fido_start_time' [integer, ticks]
    fido_start_time [1585746923]
'group_correlated' # hash structure defining correlation groupings based on alarm attributes
    group_correlated 2607:f388:2:6002::3-icmpv6 group [group='ipam descr=DoIT Platform Non Restricted Data']
    group_correlated 2607:f388:2:6002::3-icmpv6 members 2607:f388:2:6002::2-icmpv6 [1]
'last_cycle_time' => [string] # how long it took to create the last fido report
    last_cycle_time [<1]
'report' => [integer] # fido report number, increases by one every time a new report is written.
    report [220]
'report_time' [integer, ticks] # when the report was written
    report_time [1585751633]
}
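The comment_history value is a JSON array, so it can be read with a standard JSON parser once it has been extracted from the report. The sketch below is illustrative only: the helper names are hypothetical, the JSON text is copied from the example above, and it assumes "ticks" are Unix epoch seconds, which the matching dates in the example event IDs suggest but the source does not state.

```python
import json
from datetime import datetime, timezone

# comment_history value copied from the example above.
COMMENT_HISTORY = (
    '[{"discovery":"pending","first_seen":1578300837,"occurrences":1,'
    '"related_event_id":"20200106.3575"},'
    '{"discovery":"1575917448","first_seen":1578300837,"occurrences":1,'
    '"related_event_id":"20200106.3575"},'
    '{"comment":"Problem 15824","date":1578317204.17587,'
    '"discovery":"1575917448","first_seen":1578300837,"occurrences":1,'
    '"related_event_id":"20200106.3575","user":"aamohamed3"}]'
)


def ticks_to_utc(ticks):
    """Interpret ticks as Unix epoch seconds (an assumption) in UTC."""
    return datetime.fromtimestamp(float(ticks), tz=timezone.utc)


def latest_comment(history_json):
    """Return the last history entry that has a 'comment' field, or None."""
    entries = json.loads(history_json)
    commented = [entry for entry in entries if "comment" in entry]
    return commented[-1] if commented else None


entry = latest_comment(COMMENT_HISTORY)
print(entry["comment"], entry["user"], ticks_to_utc(entry["date"]).date())
print(ticks_to_utc(1585751633).isoformat())  # report_time from the example
```

Entries without a 'comment' field are skipped because the first two history records in the example carry only discovery and event data.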