FIDO: Recurring items and comment stickiness
When an alarm is added to the FIDO database, FIDO keeps track of when the item was first and last seen. If the alarm clears but recurs before enough time passes, the alarm is considered to be recurring. The FIDO daemon occasionally purges non-recurring alarms from its database based on matches in fido_attributes.yaml.
In the current implementation, when an alarm is =removed= from the active FIDO database, the recurrence window is calculated from the YAML rules and the result is stored in the "comments" area of the FIDO database. This means that you cannot see the actual recurrence length for a given alarm until after the alarm is removed from the active database. Because of this, it is difficult to use GUI tools to view this data, but it is present in /var/local/fido/reports/archive/, which can be examined with jq or json_dumper. An example:
[@fido-cssc archive]$ json_dumper.pl 2023-01-26/fido.report.bin-2023-01-26-09-22-22.gz | less
10.130.49.3-icmpv4:
  comment_history:
    - discovery: pending
      first_seen: 1674742587
      occurrences: 1
      related_event_id: '20230126.2011'
  discovery: '1591967697'
  first_seen: 1674742587
  last_seen: 1674746362
  length: 401
  occurrences: 2
  recurring:
    matches:
      '10':
        defined: '1'
        key_match: test
    reason: default
    time: '60'
    until: 1674749962
  recurring_event_id: '20230126.3625'
  related_event_id: '20230126.2011'
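The relationship between last_seen, time and until in the record above can be sketched in Python. This is only an illustration, not the fido.pl code: it assumes that time is in minutes (as the "# 1 day" and "# 1 hour" comments in fido_attributes.yaml suggest) and that until is simply last_seen plus that window. The function names and the inclusive boundary are hypothetical.

```python
def recurrence_until(last_seen: int, window_minutes: int) -> int:
    """Epoch second at which the recurrence window closes (assumed formula)."""
    return last_seen + window_minutes * 60


def is_recurring(now: int, last_seen: int, window_minutes: int) -> bool:
    """True if an alarm reappearing at `now` falls inside the window.

    Treating the boundary as inclusive is an assumption.
    """
    return now <= recurrence_until(last_seen, window_minutes)


# Values copied from the archived record above.
last_seen = 1674746362
window = int("60")  # `time` is stored as a string in the record

print(recurrence_until(last_seen, window))  # 1674749962, the `until` value above
```

With these numbers the computed value agrees with the until field in the record, which is consistent with the assumed formula but does not prove that fido.pl computes it this way.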
You =can= take the alarm name and plug it into fido_alarm_info.cgi to see this in the GUI.
Example: https://fido-cssc.net.wisc.edu/cgi-bin/fido_alarm_info.cgi?item=10.130.49.3-icmpv4&test=icmpv4
Despite this caveat, this is more informative than the old method, in which the value was calculated internally within fido.pl and was never visible to the user. We could change fido.pl to calculate and update the recurring value on every loop while the alarm is present, but that would come at the cost of CPU cycles and scalability.
Example from fido_attributes.yaml:
---
attributes:
  130000:
    fido_recurrence:
      reason: daily
      time: 1440 # 1 day
    matches:
      '10':
        key_match: test
        equal:
          PoEAllocatedPower: ''
          fido_campus_errors: ''
          fido_dhcp_usage: ''
          upsAdvBatteryTemperature: ''
  130010:
    fido_recurrence:
      reason: daily
      time: 1440 # 1 day
    matches:
      '10':
        key_match: test
        equal: AlphaalarmState
      '20':
        key_match: descr
        match: Battery Temperature (High|Low)
        match_re: 'true'
  # matching default
  139999:
    fido_recurrence:
      reason: default
      time: 60 # 1 hour
    matches:
      '10':
        key_match: test
        defined: '1'
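One way to read the rules above is sketched below in Python. Everything about the evaluation semantics is an assumption made for illustration and is not taken from fido.pl: rules are tried in ascending numeric order and the first match wins, all entries under matches must hold, equal accepts either a single value or a map of allowed values, match is treated as a regular expression when match_re is 'true', and defined only requires the key to be present. The function names are hypothetical.

```python
import re

# Rules transcribed from the fido_attributes.yaml example (nesting assumed).
RULES = {
    130000: {
        "fido_recurrence": {"reason": "daily", "time": 1440},
        "matches": {
            "10": {"key_match": "test", "equal": {
                "PoEAllocatedPower": "", "fido_campus_errors": "",
                "fido_dhcp_usage": "", "upsAdvBatteryTemperature": ""}},
        },
    },
    130010: {
        "fido_recurrence": {"reason": "daily", "time": 1440},
        "matches": {
            "10": {"key_match": "test", "equal": "AlphaalarmState"},
            "20": {"key_match": "descr",
                   "match": "Battery Temperature (High|Low)",
                   "match_re": "true"},
        },
    },
    139999: {
        "fido_recurrence": {"reason": "default", "time": 60},
        "matches": {"10": {"key_match": "test", "defined": "1"}},
    },
}


def condition_holds(cond, alarm):
    """Evaluate one numbered entry under `matches` (assumed semantics)."""
    value = alarm.get(cond["key_match"])
    if "defined" in cond:
        return value is not None
    if value is None:
        return False
    if "equal" in cond:
        wanted = cond["equal"]
        # A map lists several allowed values; a scalar names exactly one.
        return value in wanted if isinstance(wanted, dict) else value == wanted
    if "match" in cond:
        if cond.get("match_re") == "true":
            return re.search(cond["match"], value) is not None
        return cond["match"] in value
    return False


def recurrence_for(alarm, rules=RULES):
    """Return the fido_recurrence block of the first rule whose conditions all hold."""
    for rule_id in sorted(rules):
        rule = rules[rule_id]
        if all(condition_holds(c, alarm) for c in rule["matches"].values()):
            return rule["fido_recurrence"]
    return None


print(recurrence_for({"test": "icmpv4"}))
print(recurrence_for({"test": "fido_dhcp_usage"}))
```

Under these assumptions an icmpv4 alarm falls through to the default rule with a 60-minute window, which is consistent with the reason and time shown in the archived record earlier on this page.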
Legacy [as of 2023/01/26] Implementation
When an alarm is added to the FIDO database, FIDO keeps track of when the item was first and last seen. If the alarm clears but recurs before enough time passes, the alarm is considered to be recurring. The FIDO daemon occasionally purges non-recurring alarms from its database based on a regular expression match against the alarm name.
These values are configured in fido.yaml:
comments_cleanse:
  '3600':
    .*: ''
  '86400':
    PoEAllocatedPower: ''
    fido_campus_errors: ''
    fido_dhcp_usage: ''
    upsAdvBatteryTemperature: ''
Currently this feature works on the "comments" area of the FIDO database. The "comments" area has no access to alarm attributes after an alarm is cleared. For each iteration of the saved FIDO database, for each comment where an alarm is no longer active, FIDO performs a calculation to see if the comment should be forgotten from a recurrence perspective. This method allows us to change the formula [as needed] for comments_cleanse after an alarm has cleared.
Suppose that in the future we wanted to calculate the comments_cleanse time based on an alarm attribute in the FIDO database or a GNMIS attribute for an associated RRD [if applicable]. We would move the calculation of comment recurrence expiry to the point when the alarm clears from the FIDO database. This would give us access to the full alarm attribute list. The downside to this approach is that once this value is calculated it cannot be changed, since the calculation would be tied to when the alarm was removed. This is probably not a big deal, but it is something to consider.
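The legacy purge pass described above might look roughly like the following Python sketch. It is not the fido.pl code. It assumes that the comments_cleanse keys are windows in seconds, that each pattern is a regular expression searched against the alarm name, and that when several windows match the largest one applies; the configuration shown does not say how ties are resolved. The function names and the sw1-PoEAllocatedPower alarm name are hypothetical.

```python
import re

# comments_cleanse from fido.yaml: window in seconds -> alarm-name patterns.
COMMENTS_CLEANSE = {
    "3600": [".*"],
    "86400": ["PoEAllocatedPower", "fido_campus_errors",
              "fido_dhcp_usage", "upsAdvBatteryTemperature"],
}


def cleanse_seconds(alarm_name, table=COMMENTS_CLEANSE):
    """Largest window whose patterns match the alarm name (assumed rule)."""
    windows = [int(seconds) for seconds, patterns in table.items()
               if any(re.search(p, alarm_name) for p in patterns)]
    return max(windows) if windows else None


def purge_comments(comments, active, now):
    """Drop comments for inactive alarms whose window has expired."""
    kept = {}
    for name, comment in comments.items():
        window = cleanse_seconds(name)
        expired = (name not in active and window is not None
                   and now - comment["last_seen"] > window)
        if not expired:
            kept[name] = comment
    return kept


comments = {
    "10.130.49.3-icmpv4": {"last_seen": 1674746362},
    "sw1-PoEAllocatedPower": {"last_seen": 1674746362},  # hypothetical alarm
}
# Two hours after last_seen: the default 1-hour window has expired,
# the 1-day window has not.
print(sorted(purge_comments(comments, active=set(), now=1674746362 + 7200)))
```

Because the window is looked up by alarm name on every pass, editing comments_cleanse changes the outcome for comments that are already stored, which is the flexibility the paragraph above refers to.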