Network microburst detection - pathsolutions voip network troubleshooting blog post

What is a Microburst Link Flood?

You have gigabit or faster links throughout your network, and your 5-minute utilization graphs all show that you have gobs of spare bandwidth on your links. You might think that your network is perfectly healthy because of all this excess bandwidth. In many cases, you might be right. In certain cases, though, your network might be significantly more sluggish than it has to be.

If your network monitoring solution checks link utilization every five minutes, all you see is the average over each five-minute window, which gives you only a very vague understanding of how the link is actually being used.

If you increase the polling frequency to every 15 seconds, you may still not see what’s happening, as most traffic spikes occur within a few seconds and then disappear. It is also not advisable to poll a large number of links on a frequent basis, as it increases network and device loading.

Here is a sample scenario: if your 5-minute graph shows 20% utilization on a gigabit link, you might think that the remaining 80% is available and that it will be a long time before you have to consider adding bandwidth to that link.

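What if that link was being offered 120% of its capacity for 60 seconds, and was then completely quiet for 4 minutes? The link would transmit at 100% utilization during the burst, buffering the excess packets until the buffers were full and then discarding traffic. Averaged over the five-minute window, 60 seconds at 100% followed by 240 seconds of silence still works out to only about 20%.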
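To make the arithmetic concrete, here is a rough simulation of that scenario. It is only a sketch: the function name `simulate_link`, the one-second time step, and the 50 Mbit buffer size are assumptions chosen for illustration, not figures from any real switch.

```python
def simulate_link(offered_mbps, capacity_mbps=1000.0, buffer_mbit=50.0):
    """Simulate a link one second at a time.

    offered_mbps: offered load for each one-second interval, in Mbit/s.
    buffer_mbit is a made-up buffer size used only for illustration.
    Returns (Mbit transmitted in each second, total Mbit dropped).
    """
    queued = 0.0       # Mbit currently waiting in the output buffer
    dropped = 0.0      # Mbit discarded because the buffer was full
    sent_per_sec = []
    for offered in offered_mbps:
        backlog = queued + offered            # everything that wants to go out
        sent = min(backlog, capacity_mbps)    # the link cannot exceed line rate
        leftover = backlog - sent
        queued = min(leftover, buffer_mbit)   # keep whatever fits in the buffer
        dropped += leftover - queued          # discard the rest
        sent_per_sec.append(sent)
    return sent_per_sec, dropped


# 60 seconds offered at 120% of a gigabit link, then 240 seconds of silence.
offered = [1200.0] * 60 + [0.0] * 240
sent, dropped = simulate_link(offered)

five_min_avg = sum(sent) / len(sent) / 1000.0 * 100   # percent of line rate
peak = max(sent) / 1000.0 * 100

print(f"5-minute average utilization: {five_min_avg:.1f}%")
print(f"Peak one-second utilization:  {peak:.1f}%")
print(f"Traffic dropped:              {dropped:.0f} Mbit")
```

With these assumed numbers, the five-minute average comes out at roughly 20% even though the link ran at line rate for a full minute and threw away a large amount of traffic. That is the gap between what the graph shows and what the users feel.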

Standard monitoring solutions would be blind to this issue, yet if you looked at what happened, you would see a lot of buffered packets (dot3StatsDeferredTransmissions), followed by a ton of discarded packets (ifOutDiscards).

Thus, if you see the deferred transmissions counter increase, and also see an increase of discarded packets, you might be suffering from Microburst Link Floods. This is typical on many networks and can indicate that a link is flooded for a short period of time, and packets are discarded as a result.
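The check described above can be sketched in a few lines. This assumes you have already collected two samples of the counters for an interface by whatever means you prefer; the function names and the sample values are hypothetical, and only the counter names come from the MIBs. Both counters are 32-bit SNMP counters, so the sketch allows for a single wrap between samples.

```python
COUNTER32_MODULUS = 2 ** 32  # both counters are 32-bit SNMP Counter32 values


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Increase between two samples, allowing for one counter wrap."""
    return (current - previous) % modulus


def looks_like_microburst(prev_sample, curr_sample):
    """Return True if both the deferred and discard counters increased."""
    deferred = counter_delta(
        prev_sample["dot3StatsDeferredTransmissions"],
        curr_sample["dot3StatsDeferredTransmissions"],
    )
    discards = counter_delta(
        prev_sample["ifOutDiscards"],
        curr_sample["ifOutDiscards"],
    )
    return deferred > 0 and discards > 0


# Hypothetical samples for one interface, taken one polling interval apart.
previous = {"dot3StatsDeferredTransmissions": 10_500, "ifOutDiscards": 320}
current = {"dot3StatsDeferredTransmissions": 18_900, "ifOutDiscards": 4_710}

if looks_like_microburst(previous, current):
    print("Possible microburst link flood on this interface")
```

Because these are cumulative counters, a check like this still catches a burst that happened between polls, which is why it works even when the utilization graph looks flat.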

This is a more accurate way to determine whether a link needs additional bandwidth, as it shows where the network is actually suffering under load.

