Issues with pinging campus routers
Network Services has found that a few campus customers ping their LAN's gateway address (often automatically, via tools like Nagios) to verify connectivity to the campus network. This approach has some caveats that you should know about to prevent false alarms.
- Understanding traffic destined through vs to a campus router
Nearly all traffic through a campus network router is processed in hardware ASICs at an extremely fast rate, usually incurring a delay of less than 0.2 ms. This hardware acceleration is how we are able to build a campus network backbone with many 10 Gigabit links.
However, some traffic needs special attention from one of the router's CPUs. The most common example is packets destined to the router itself, such as pings to the router. Other traffic that needs special attention includes packets that require ICMP responses from the router (like destination unreachable), routing protocol communications between routers, and network monitoring and management.
- Traffic to a campus router is rate limited and prioritized
Because the CPU resources on the routers are finite, packets destined to the router's CPU are put into a queue to be serviced. This queue is rate limited to help protect the CPU against network storms and denial of service conditions.
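To illustrate why a rate-limited queue can drop pings even when the network is healthy, here is a minimal sketch of a token-bucket limiter. The token-bucket technique, the class name `TokenBucket`, and the rate and burst values are all assumptions chosen for illustration; the campus routers' actual mechanism and limits are not described here.

```python
class TokenBucket:
    """Toy rate limiter (illustrative only, not the routers' real mechanism)."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate      # tokens added per second (hypothetical value)
        self.burst = burst    # maximum tokens the bucket can hold
        self.tokens = burst   # start with a full bucket
        self.last = 0.0       # time of the previous packet

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last packet, capped at burst.
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True       # packet is handed to the CPU queue
        return False          # packet is dropped before reaching the CPU


# Ten pings arrive at the same instant; the bucket only admits a burst of 3.
bucket = TokenBucket(rate=1.0, burst=3.0)
results = [bucket.allow(0.0) for _ in range(10)]
admitted = sum(results)
```

In this toy model only the first three pings are admitted and the other seven are dropped, even though nothing is wrong with traffic passing through the router.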
In addition to the rate limits, the queues are serviced in a priority order. When push comes to shove, the router would rather service routing protocol updates and other management communication to ensure that normal traffic passing through the router can continue to be hardware accelerated. Responding to pings and traceroutes is not a very high priority in the grand scheme of things.
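The effect of priority-ordered servicing can be sketched with a small priority queue. The traffic classes, their numeric priorities, and the `service` function below are hypothetical and chosen only to mirror the ordering described above; they do not reflect the routers' real configuration.

```python
import heapq
from itertools import count

# Hypothetical priorities (lower number = served first).
PRIORITY = {"routing": 0, "management": 1, "ping": 2}


def service(packets, budget):
    """Serve up to `budget` packets in priority order.

    Returns (served, unserved); arrival order is kept within a class.
    """
    order = count()  # tie-breaker so equal priorities stay in arrival order
    heap = [(PRIORITY[kind], next(order), kind) for kind in packets]
    heapq.heapify(heap)
    served = [heapq.heappop(heap)[2] for _ in range(min(budget, len(heap)))]
    unserved = [kind for _, _, kind in sorted(heap)]
    return served, unserved


# Five packets arrive, but the CPU only has time for three of them.
arrivals = ["ping", "routing", "ping", "management", "routing"]
served, unserved = service(arrivals, budget=3)
```

With a budget of three, both routing packets and the management packet are served, and both pings are left waiting even though one of them arrived first.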
- Campus router CPU priorities
The router's CPU runs a dedicated operating system. Different processes that run on the CPU are assigned a priority. Again, the router would rather service routing protocols and management traffic than respond to pings. These priorities can be changed somewhat in the operating system, but this can be dangerous and is a bit of a dark art.
- There may be more than one router on your LAN
On most campus network LANs routed by DoIT, there are two routers on your LAN, and three IP addresses in use. By our convention, the lowest usable IP address on your LAN is the default gateway that the hosts on the LAN use. This IP address is actually virtual and can move to a different router if need be. The next two IP addresses allow the backup router and the primary router, respectively, to talk to each other. Note: LANs spanning across campus do not have this redundancy feature.
The downside to pinging this gateway address is that it only verifies that a router exists on your LAN. It does not verify which router you are currently using, or that the router is actually servicing traffic through it.
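The addressing convention described in this section can be sketched with Python's standard `ipaddress` module. The function name `conventional_router_addresses` is hypothetical, and `192.0.2.0/24` is a documentation-only example range rather than a real campus LAN.

```python
import ipaddress
from itertools import islice


def conventional_router_addresses(cidr: str) -> dict:
    """Return the three router addresses implied by the convention above.

    Assumes the LAN follows the convention: lowest usable address is the
    virtual gateway, followed by the backup router, then the primary router.
    """
    network = ipaddress.ip_network(cidr, strict=True)
    # hosts() yields usable addresses in ascending order.
    gateway, backup, primary = islice(network.hosts(), 3)
    return {
        "virtual_gateway": str(gateway),
        "backup_router": str(backup),
        "primary_router": str(primary),
    }


addresses = conventional_router_addresses("192.0.2.0/24")
```

For the example range this gives 192.0.2.1 as the virtual gateway, 192.0.2.2 as the backup router, and 192.0.2.3 as the primary router. A reply from 192.0.2.1 alone does not show which of the two physical routers answered.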
- Network Operations Center Monitoring
The Network Operations Center uses tools to monitor the campus network on behalf of our customers. It is the NOC's goal to recognize outage situations and start the process to resolution before the customer notices. Currently the NOC is monitoring over 5,000 devices containing 10,000 IP addresses and about 30,000 ports. Typically our management systems actively go out and probe parts of the network every 90 seconds in addition to receiving near real-time reporting from the network equipment.
When the NOC notices an issue, they determine the scale and relative severity of the issue, prioritize it, and dispatch field technicians and/or network engineers for follow-up. The most severe cases, designated "Impact 1", are posted on a webpage and RSS feed available through the DoIT Helpdesk.
- Known Issues
At this time, the CPU of one campus router is often extremely busy and has a reputation for dropping pings and traceroutes sent to it. However, this does not affect any production traffic through the router. This router is r-cssc-b280c-1-core.net.wisc.edu. It is busy because of the sheer number of interfaces and hosts connected to it and the rate at which our internal tools collect data from it (for example, AANTS NetWatch).
Fiber optic infrastructure projects in Fall 2007 and Spring 2008 will allow about 7/8ths of the core routing to be transitioned off this router, and about 3/8ths of the customers directly routed on it will be migrated to a new router being installed. This should mitigate the high CPU load on this router.
For troubleshooting purposes, it is valuable to ping a host's gateway to see if the host is properly connected to the network. In addition, try pinging hosts on the same LAN, and hosts on a different LAN (for example, www.wisc.edu).
However, Network Services discourages running automatic monitoring tools that routinely ping campus routers, unless the customer is willing to deal with the false alarms that might arise from using such tools.
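For customers who choose to monitor this way anyway, one possible way to cope with occasional dropped pings is to alert only after several consecutive failures. The sketch below is an illustration under assumptions, not a Network Services recommendation: the function name `should_alert` and the threshold of 3 are hypothetical, and it still cannot tell a busy router CPU from a real outage.

```python
def should_alert(results, threshold=3):
    """Return True only if the last `threshold` ping results all failed.

    `results` is a list of booleans, oldest first (True = reply received).
    The threshold of 3 is an arbitrary illustrative value.
    """
    if len(results) < threshold:
        return False  # not enough history to decide
    return not any(results[-threshold:])


# A single dropped ping between replies does not raise an alarm.
isolated_drop = should_alert([True, False, True, True])

# Three failures in a row do.
sustained_loss = should_alert([True, False, False, False])
```

A higher threshold suppresses more false alarms at the cost of slower detection.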