FIDO: File Formats
Description of the Status Files written by FIDO tests
Want to make your own test? Here is the format of the status files.
All tests, including FIDO itself, write a status file that contains information about the test and about the objects related to it.
<item data>
<failed-object [object name]> # appended with the test type to avoid collisions
candidate
# CALCULATED BY FIDO: Only relevant for fido_node alarms. List of contributing alarms
item_data s-hotspare-b360-17-hotspare-fido_node candidate [s-hotspare-b360-17-hotspare.net.wisc.edu-snmp_node]
correlation
# OPTIONAL
item = [string] # this object is related to an upstream object
reason = [string] # text description of why this correlation was made
deleteme
# OPTIONAL: [boolean]
# delete this from the FIDO report file even if I didn't repoll a status update for everything this time around
deleteme_instance
# OPTIONAL: [integer]
# delete this from the FIDO report file even if I didn't repoll a status update for everything this time around, but only this instance
descr
# CALCULATED BY FIDO: [string]: FIDO's calculated description of an item
device
# OPTIONAL string. Used by the correlation engine
event_id
# CALCULATED BY FIDO: [number]: unique identifier for this event
failures
# CALCULATED BY FIDO: [integer]: number of times the update was determined to have failed
file
# CALCULATED BY FIDO: [string]: ingress filename for given alarm
help
# OPTIONAL: will be displayed in the gui. If you don't supply, the test filename will be used
key1 = help file name [string name]
key2 = help file number [integer]
holddown
# CALCULATED BY FIDO:
matches = [hash]
# describes why the item is on holddown
reason = [string]
time = [integer]
# time in minutes to hold down alarm based on alarm start time
absolute_time = [string]
# time in text to hold down alarm.
impact
# OPTIONAL: assumed 2 if not present
value = integer
reason = OPTIONAL [string]
info
# OPTIONAL [string]. gui mouseover or 'wideinfo' shows this information
infohash
# OPTIONAL: successor of 'info' key
examples:
infohash additional_info [122 accepted, 25 dropped, 1000 threshold]
infohash icmp_source [fido_icmpv4_manual]
infohash interface [port-channel109]
infohash status [No response from '146.151.144.148']
infohash timestamp [2020/04/01 09:20:00]
infohash AS [65011]
infohash Descr [Goodnight Hall/Communicative Disorders]
infohash IP [192.168.1.1]
interface
# OPTIONAL [string] used by the correlation engine
ip
# OPTIONAL [ip address] used by the correlation engine and also used for DNS lookups
ipam
# OPTIONAL: CALCULATED BY FIDO:
ipam circuit_id [value undefined]
ipam connector [value undefined]
ipam subnet [146.151.144.148/32]
iso
# [integer] used for correlation. The lower the value, the more important to the correlation engine.
example, ip=3, service=4, so service is correlated to IP, not vice versa.
keepme
# OPTIONAL: [boolean]
# keep this in the FIDO report file even if it isn't in the current status file
# this is to support objects that aren't polled every cycle
logged
# CALCULATED BY FIDO: [boolean]: whether the item has been logged
needed_failures
# OPTIONAL [integer]: defaults to one. number of consecutive poll failures before the object shows in the gui
rtt
# OPTIONAL [number] supplied by some tests to describe how long it took to verify this particular alarm.
snmp_instance
# OPTIONAL [integer]: for debugging
source
# CALCULATED BY FIDO.PL: federation information
source fido-animal.net.wisc.edu [1]
source fido-cssc.net.wisc.edu [1]
start
# CALCULATED BY FIDO.PL: time in ticks alarm was first seen
start_text
# CALCULATED BY FIDO.PL: time in string alarm was first seen
state
# REQUIRED: [string]: state of the tested object
status
# CALCULATED BY FIDO.PL: whether or not the item is stale, ok, etc
suppressed
# OPTIONAL: [integer]: when suppression is being used, how many items are not shown because of this parent item. Supplied by fido_snmp.pl
test
# REQUIRED: [string], the type of failure, displayed in the 'type' column, often used for correlation
time
# REQUIRED: [integer]: time in ticks that the object failed during the current test period
time_of_day
# CALCULATED BY FIDO: OPTIONAL
matches = [hash]
# describes why the item is on holddown
actual_match = [string]
reason = [string]
time = [string]
# time in text to hold down alarm.
ts_device
# OPTIONAL string. Used by the correlation engine
ts_urls
# OPTIONAL: key/value of url_key and destination
ts_urls gnmis [http://stats.net.wisc.edu/cgi-bin/gnmis.fcgi?device=^s-hotspare-b360-17-hotspare.net.wisc.edu]
updated
# CALCULATED BY FIDO: [boolean]:
This is used to determine if an object should be removed from the report file.
The item is removed if the item itself isn't updated but the file it came from has been.
urls
# CALCULATED BY FIDO: key/value of url_key and destination
urls __alarm_info [https://fido-cssc.net.wisc.edu/cgi-bin/fido_alarm_info.cgi?item=2607%3Af388%3A2%3A6002%3A%3A3-icmpv6&test=icmpv6]
urls rrd [<a href=http://stats.net.wisc.edu/cgi-bin/genstatspage.fcgi?rrdfile=/data/mrtg/icmpv6/ip=2607:f388:2:6002::3_family=ipv6_icmpv6.rrd> cgi: rrd <IMG SRC="http://stats.net.wisc.edu/cgi-bin/gengraph.fcgi?rrdfile=/data/mrtg/icmpv6/ip=2607:f388:2:6002::3_family=ipv6_icmpv6.rrd"></a>]
via_connected
# CALCULATED BY FIDO: correlation
item_data 2607:f388:e:101::10-icmpv6 via_connected subnet [2607:f388:e:101::1/64]
</item data>
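The REQUIRED keys and documented defaults above can be checked before a test writes its status file. The following is a minimal sketch, not FIDO's own code. It assumes an item is held as a Python dict keyed by the names above; `normalize_item`, the state value `down` and the sample item are hypothetical. Only the required keys (state, test, time) and the defaults (impact 2, needed_failures 1) come from the list above.

```python
import copy

# Keys marked REQUIRED in the item data list.
REQUIRED_KEYS = ("state", "test", "time")

# Documented defaults: impact is assumed 2, needed_failures defaults to one.
DEFAULTS = {"impact": {"value": 2}, "needed_failures": 1}


def normalize_item(item):
    """Return a copy of item with defaults filled in.

    Raises ValueError if a REQUIRED key is missing.
    """
    missing = [key for key in REQUIRED_KEYS if key not in item]
    if missing:
        raise ValueError("missing required keys: " + ", ".join(missing))
    normalized = copy.deepcopy(DEFAULTS)
    normalized.update(copy.deepcopy(item))
    return normalized


# Hypothetical item; the state value is made up for illustration.
item = {"state": "down", "test": "icmpv4", "time": 1585751633, "ip": "192.168.1.1"}
normalized = normalize_item(item)
print(normalized["impact"], normalized["needed_failures"])
```

Keys that are CALCULATED BY FIDO are deliberately left out of the sketch, since a test does not supply them.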
grep "added: item_data" output.txt | egrep -v " (correlation|device|help|impact|info|infohash|interface|ip|iso|needed_failures|state|suppressed|test|time|url|start|start_text|status|source|event_id|failures|file|logged|urls|updated|descr|ipam|ts_device|ts_urls|candidate|via_connected|snmp_instance|deleteme|deleteme_instance) "
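The same check can be sketched in Python for use inside a script. This is an illustration only: the function name `unknown_key_lines` and the sample lines are hypothetical, and the key list is copied from the command above rather than from FIDO's source.

```python
import re

# Keys copied from the egrep -v pattern above.
KNOWN_KEYS = (
    "correlation|device|help|impact|info|infohash|interface|ip|iso|"
    "needed_failures|state|suppressed|test|time|url|start|start_text|status|"
    "source|event_id|failures|file|logged|urls|updated|descr|ipam|ts_device|"
    "ts_urls|candidate|via_connected|snmp_instance|deleteme|deleteme_instance"
).split("|")

# Same shape as the egrep pattern: a known key surrounded by single spaces.
KNOWN_KEY_RE = re.compile(" (?:" + "|".join(re.escape(k) for k in KNOWN_KEYS) + ") ")


def unknown_key_lines(lines):
    """Return the 'added: item_data' lines that mention no known key."""
    return [
        line
        for line in lines
        if "added: item_data" in line and not KNOWN_KEY_RE.search(line)
    ]


# Hypothetical log lines for illustration.
sample = [
    "added: item_data 10.151.40.134-icmpv4 state [down]",
    "added: item_data 10.151.40.134-icmpv4 rtt [12]",
]
print(unknown_key_lines(sample))
```

Like the shell version, this matches a key anywhere in the line, so a value that happens to contain a known key surrounded by spaces will hide that line.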
<status>
alldown
# OPTIONAL: [integer]: false unless everything is down, in which case it is the number of down items
contributors
# OPTIONAL: unique to updateType [scraper], lists health of downstream contributors
contributors ban-arp state [OK]
contributors fp-arp state [OK]
contributors fp-systemEnvironmentals state [OK]
data_files
# OPTIONAL, but a really good idea: source data to show in fido_info.cgi
data_files gMegaFlow_fido_icmp_ips.bin []
data_files gMegaFlow_fido_icmp_ips_verbose.bin []
items
# REQUIRED: [string] : total monitored items, not a count of what's down
dhcp_leases items [820 rrd files [731 file(s) ignored]]
last_cycle_time
# REQUIRED: [integer] (in seconds)
last_poll
# OPTIONAL: unique to updateType [scraper], informational/debug
last_update
# OPTIONAL: [integer] (in seconds), last time a data source was updated
polled
# OPTIONAL [array of polled items]. A list of 'test' types that were updated during the most recent cycle. Intended for tests that do not check every test type on every cycle
fido.status: added: snmprrd polled ADVAopticalIfDiagInputPower [array_string]
fido.status: added: snmprrd polled ADVAopticalIfDiagOutputPower [array_string]
fido.status: added: snmprrd polled AlphadataNumberValue [array_string]
...
report
# REQUIRED: [integer]: always increases by one, make sure we don't miss a report
report_time
# REQUIRED: [integer]: time this report was written
restart
# OPTIONAL: [string]: used in conjunction with 'user' for fido_watchdog
stale
# REQUIRED: [integer]: seconds after last update before this data is stale
start
# OPTIONAL: [integer]: ticks when this daemon started
status
# CALCULATED BY FIDO: [string]: OK, stale, etc
threads
# OPTIONAL: [integer]: how many threads the test is configured to use
updateType
# OPTIONAL: [string]: scraper_alarm has a unique update methodology. This is a trigger to fido.pl to behave differently
url
# OPTIONAL: [string]: points to a file that contains a list of items, displayed on the FIDO status page
uc_esx url [/uc_esx.txt]
user
# OPTIONAL: [string]: to be used by fido_watchdog
</status>
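A reader of the status section could use the REQUIRED keys to detect a missed report or stale data. The sketch below is an assumption-laden illustration, not how fido.pl does it. It assumes the status is held as a Python dict, that report_time is in Unix epoch seconds, and that staleness is measured from report_time. `check_status` and the sample values are hypothetical.

```python
# Keys marked REQUIRED in the status list.
REQUIRED_STATUS_KEYS = ("items", "last_cycle_time", "report", "report_time", "stale")


def check_status(status, previous_report, now):
    """Return a list of problems found in a status record (empty if none)."""
    problems = [
        "missing required key: " + key
        for key in REQUIRED_STATUS_KEYS
        if key not in status
    ]
    if problems:
        return problems
    # 'report' always increases by one, so any other step means a gap.
    if status["report"] != previous_report + 1:
        problems.append(
            "report jumped from %d to %d" % (previous_report, status["report"])
        )
    # Assumption: age is measured from report_time.
    if now - status["report_time"] > status["stale"]:
        problems.append("stale")
    return problems


# Hypothetical status record for illustration.
status = {
    "items": "820 rrd files",
    "last_cycle_time": 12,
    "report": 221,
    "report_time": 1585751633,
    "stale": 300,
}
print(check_status(status, previous_report=220, now=1585751700))
print(check_status(status, previous_report=219, now=1585752000))
```

The first call reports no problems; the second reports both a report gap and stale data.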
status_output.txt | egrep -v " (items|last_cycle_time|polled|report|stale|threads|url|status|report_time|last_update|data_files|user|start|restart|contributors|updateType|last_poll) "
Description of fido.report.dat
The main report file that is read by the CGI
'comment_correlation' =>
'comment' => # a comment
'best' => # the best item for this comment, as determined by FIDO.
# best is defined by being the item with the most things
# correlated to it. Other tiebreakers are also used.
'items' => # a list of items with the above comment
'item with the comment mentioned above'
comment_correlation 'cr 90035' best [s-bocklabs-254-1-access.local.net.wisc.edu ifOperStatus Gi1/0/1-ifOperStatus]
comment_correlation 'cr 90035' items 's-bocklabs-254-1-access.local.net.wisc.edu ifOperStatus Gi1/0/1-ifOperStatus' [1]
comment_correlation 'cr 90035' items 10.151.40.134-icmpv4 [1]
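The "most things correlated to it" rule for 'best' can be sketched as follows. This is a guess at the shape of the logic, not FIDO's implementation. The real tiebreakers are not documented here, so the alphabetical tiebreak is a stand-in, and both `pick_best` and the correlation data are hypothetical.

```python
def pick_best(items, correlated):
    """Pick the item with the most things correlated to it.

    items: names of the items that share a comment.
    correlated: mapping of item name -> set of item names correlated to it.
    Ties go to the alphabetically first name (an assumed tiebreaker).
    """
    return max(sorted(items), key=lambda name: len(correlated.get(name, ())))


items = [
    "s-bocklabs-254-1-access.local.net.wisc.edu ifOperStatus Gi1/0/1-ifOperStatus",
    "10.151.40.134-icmpv4",
]
# Hypothetical correlation data, chosen so the result matches the 'best' example above.
correlated = {items[0]: {"10.151.40.134-icmpv4"}}
print(pick_best(items, correlated))
```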
'comments' => # where the comments are kept
# they are kept separate from the items themselves as we keep comment information for items that get deleted.
# comment for this item
comments '$itemName' comment [Problem 15824]
# old comment data gets turned into a hash array and appended when the comment changes
comments '$itemName' comment_history [{"discovery":"pending","first_seen":1578300837,"occurrences":1,"related_event_id":"20200106.3575"},{"discovery":"1575917448","first_seen":1578300837,"occurrences":1,"related_event_id":"20200106.3575"},{"comment":"Problem 15824","date":1578317204.17587,"discovery":"1575917448","first_seen":1578300837,"occurrences":1,"related_event_id":"20200106.3575","user":"aamohamed3"}] [array_jsonified]
# [integer, ticks]; When this comment was last updated
comments '$itemName' date [1578317204.17587]
# when the item was first monitored by FIDO
comments '$itemName' discovery [1578410731]
# [integer, ticks]; time the alarm was added
comments '$itemName' first_seen [1578300837]
# [integer, ticks]; time the alarm was removed
comments '$itemName' last_seen [1584513560]
# [integer, ticks]; length in seconds that this alarm has been historically alarming. Only present for items that have been removed at least once
comments '$itemName' length [6200691]
# [integer]; number of times the alarm has been seen
comments '$itemName' occurrences [115]
# can be used to link to an event in the sql log DB [if applicable]
comments '$itemName' recurring_event_id [20200303.50091]
# can be used to link to an event in the sql log DB
comments '$itemName' related_event_id [20200106.3575]
# who commented the item
comments '$itemName' user [aamohamed3]
'correlated'
# summary of known network topology correlations as determined by fido.pl
examples:
correlated 'r-wa222-12-2-radial.net.wisc.edu ifOperStatus Vl2671-ifOperStatus' 146.151.178.213-icmpv4 [1]
correlated 's-chmbrln-1327-1-radial.local.net.wisc.edu ifOperStatus Po7-ifOperStatus' 's-chmbrln-1327-1-radial.local.net.wisc.edu ifOperStatus Te1/0/7-ifOperStatus' [1]
correlated 10.151.129.46-icmpv4 2607:f388:f:1c::2-icmpv6 [1]
correlated s-hotspare-b360-17-hotspare-fido_node s-hotspare-b360-17-hotspare.net.wisc.edu-snmp_node [1]
correlated s-hotspare-b360-17-hotspare.net.wisc.edu-snmp_node 146.151.144.148-icmpv4 [1]
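Lines in the form shown above can be read into a lookup table. This is a minimal sketch that assumes the dump syntax is exactly as in the examples: the word correlated, two names (single-quoted when they contain spaces), then a bracketed value. Treating the first name as the parent and the second as the child is also an assumption, and `parse_correlated` is a hypothetical helper.

```python
import shlex
from collections import defaultdict


def parse_correlated(lines):
    """Map each first name to the set of second names on 'correlated' lines."""
    children = defaultdict(set)
    for line in lines:
        # Drop the trailing bracketed value, e.g. " [1]".
        body = line.rsplit(" [", 1)[0]
        # shlex honours the single quotes around names that contain spaces.
        tokens = shlex.split(body)
        if len(tokens) != 3 or tokens[0] != "correlated":
            continue  # not a line in the expected shape
        _, parent, child = tokens
        children[parent].add(child)
    return dict(children)


lines = [
    "correlated 'r-wa222-12-2-radial.net.wisc.edu ifOperStatus Vl2671-ifOperStatus' 146.151.178.213-icmpv4 [1]",
    "correlated 10.151.129.46-icmpv4 2607:f388:f:1c::2-icmpv6 [1]",
]
table = parse_correlated(lines)
print(table)
```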
'event_id' [string]
# next event will be assigned this ID
event_id [20200401.31941]
'fido_start_time'
[integer, ticks]
fido_start_time [1585746923]
'group_correlated'
# hash structure defining correlation groupings based on alarm attributes
group_correlated 2607:f388:2:6002::3-icmpv6 group [group='ipam descr=DoIT Platform Non Restricted Data']
group_correlated 2607:f388:2:6002::3-icmpv6 members 2607:f388:2:6002::2-icmpv6 [1]
'last_cycle_time' => [string]
# how long it took to create the last fido report
last_cycle_time [<1]
'report' => [integer]
# fido report number, increases by one every time a new report is written.
report [220]
'report_time'
[integer, ticks]
When the report was written
report_time [1585751633]
}
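The tick values in the examples above look like Unix epoch seconds. That is an assumption, not something this page states. Under that assumption, the following sketch converts them to readable times and computes how long FIDO had been running when the example report was written. `ticks_to_utc` is a hypothetical helper.

```python
from datetime import datetime, timezone


def ticks_to_utc(ticks):
    """Interpret ticks as Unix epoch seconds (an assumption) and return UTC time."""
    return datetime.fromtimestamp(ticks, tz=timezone.utc)


# Values taken from the examples above.
fido_start_time = 1585746923
report_time = 1585751633

print(ticks_to_utc(report_time).isoformat())  # 2020-04-01T14:33:53+00:00
print(report_time - fido_start_time)  # 4710 seconds between start and report
```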
