Packet Loss and Its Causes
Simply put, packet loss occurs when packets of data being transmitted across a network are dropped before reaching their destination. Every network experiences packet loss on occasion, and it usually has little noticeable effect on network performance. When it does affect performance, you should know how to find the cause and resolve it.
We will look at a few reasons why packet loss happens, standard diagnostics and common fixes.
- Packet Loss Effects
- Why Packet Loss Happens
- Detecting Packet Loss
- Diagnosing Packet Loss
- Fixing Packet Loss
Packet Loss Effects
Packet loss affects different applications in different ways. For browsing and downloading data files, it causes slowdowns. In many cases the slowdown is barely noticeable: on a low-latency link, 10% packet loss might add only about 1 second to a 10-second download. If the loss rate is higher, or latency is high (for example, when browsing a website hosted in another country), the slowdown gets worse. For example, with 10% loss and 500 ms of latency, a normal 10-second download might take over 20 seconds because of the number of packets that must be retransmitted and the round-trip time needed to request each resend.
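To make the arithmetic above concrete, here is a deliberately simple sketch in Python. It is not a model of real TCP behavior. It assumes losses are independent, so each packet needs 1 / (1 - loss) transmissions on average, and it assumes the transfer pauses for one full round trip a fixed number of times. The function name and the `stalls` parameter are hypothetical, and the value of 20 stalls is an illustrative guess.

```python
def estimated_download_time(base_s: float, loss_rate: float,
                            rtt_s: float, stalls: int) -> float:
    """Toy estimate of download time under packet loss (not a TCP model).

    base_s    -- download time with no loss, in seconds
    loss_rate -- fraction of packets lost, 0 <= loss_rate < 1
    rtt_s     -- round-trip latency in seconds
    stalls    -- ASSUMPTION: number of times the transfer pauses for one
                 full round trip while waiting for a resend
    """
    if not 0 <= loss_rate < 1:
        raise ValueError("loss_rate must be in [0, 1)")
    # With independent losses, each packet needs 1 / (1 - loss_rate)
    # transmissions on average.
    time_with_resends = base_s / (1 - loss_rate)
    return time_with_resends + stalls * rtt_s


low_latency = estimated_download_time(10, 0.10, rtt_s=0.01, stalls=20)
high_latency = estimated_download_time(10, 0.10, rtt_s=0.5, stalls=20)
print(f"{low_latency:.1f} s vs {high_latency:.1f} s")
```

Under these assumptions, the low-latency case lands near 11 seconds and the 500 ms case near 21 seconds. The same loss rate hurts far more when every resend has to wait on a long round trip.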
For real-time applications like voice and video, the impact of packet loss can be far more severe. Packet loss of 2% is typically noticeable to a listener or viewer, and conversations become irritating as participants repeatedly ask “what did you say?”
The impact of packet loss differs depending on the protocol and application. TCP is designed to handle packet loss: if a packet is lost, meaning it was not acknowledged, it is retransmitted. UDP has no built-in retransmission capability, so applications that rely on it do not handle packet loss as well. Regardless of the protocol or application, though, too much packet loss is a problem.
Typical examples of packet loss experienced by the end user are performance issues with Voice over IP and video calls. You have likely been on a Skype call or other type of virtual meeting where there was a noticeable performance issue, such as robotic-sounding or missing audio. This was probably the result of packet loss.
Why Packet Loss Happens
One of the most common causes of packet loss is link congestion. Think of rush-hour traffic, when there are too many cars on the road and no one can get anywhere, or of a four-lane highway in a high-volume area that merges down to two lanes. A very similar situation occurs when more packets arrive at a link than it was designed to handle.
Some links are configured to drop packets beyond a certain limit even though they can technically carry more traffic. Let’s say the link from your ISP can technically handle 100 Mbps, but you have only purchased 50 Mbps; the ISP will configure the link so that you can push only 50 Mbps worth of traffic. Anything beyond that 50 Mbps will typically be dropped.
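One common way to enforce a limit like this is a token-bucket policer. The Python sketch below is a minimal illustration of that general technique, not a description of how any particular ISP or device implements it. The class name, the burst size, and the packet size are assumptions chosen for the example.

```python
class TokenBucketPolicer:
    """Minimal token-bucket policer: forwards packets while tokens last."""

    def __init__(self, rate_bps: float, burst_bytes: int):
        self.rate_bytes_per_s = rate_bps / 8   # refill rate in bytes/second
        self.burst_bytes = burst_bytes         # bucket capacity
        self.tokens = float(burst_bytes)       # start with a full bucket
        self.last_time = 0.0

    def allow(self, now_s: float, packet_bytes: int) -> bool:
        """Return True if the packet is forwarded, False if it is dropped."""
        elapsed = now_s - self.last_time
        self.last_time = now_s
        # Refill for the elapsed time, never beyond the bucket capacity.
        self.tokens = min(self.burst_bytes,
                          self.tokens + elapsed * self.rate_bytes_per_s)
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False


# Offer 100 Mbps (10,000 packets of 1,250 bytes in one second)
# to a policer configured for the purchased 50 Mbps.
policer = TokenBucketPolicer(rate_bps=50_000_000, burst_bytes=12_500)
forwarded = sum(policer.allow(i / 10_000, 1_250) for i in range(10_000))
print(f"forwarded {forwarded} of 10000 packets")
```

In this run roughly half of the offered packets are forwarded and the rest are dropped, which is the behavior described above: traffic beyond the purchased rate never makes it through.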
Another cause of network congestion is an ISP intentionally oversubscribing a link. The reasoning is that not all subscribers will use the link simultaneously. During peak periods, however, when many people are using the service at the same time, the resulting congestion causes packet loss.
Oversubscription can happen in enterprise networks as well, depending on the design of the network and the application use. Suppose end users have 1 Gbps connections to a 24-port switch at the edge, and the switch has a 10 Gbps connection upstream. If more than 10 end users flood their links, the upstream trunk becomes fully saturated and packets are dropped.
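The arithmetic behind this example can be sketched in a few lines of Python. The function names are hypothetical, and the figures assume a single 10 Gbps uplink shared by all 24 edge ports.

```python
def oversubscription_ratio(ports: int, port_gbps: float,
                           uplink_gbps: float) -> float:
    """Total edge capacity divided by uplink capacity."""
    return (ports * port_gbps) / uplink_gbps


def max_full_rate_users(port_gbps: float, uplink_gbps: float) -> int:
    """How many users can run at full port speed before the uplink is full."""
    return int(uplink_gbps // port_gbps)


ratio = oversubscription_ratio(ports=24, port_gbps=1, uplink_gbps=10)
users = max_full_rate_users(port_gbps=1, uplink_gbps=10)
print(f"oversubscription {ratio:.1f}:1, uplink full at {users} busy users")
```

A ratio above 1:1 is not a fault in itself. It only becomes a source of packet loss when enough users are busy at the same moment.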
Closely related to network congestion are overburdened devices, that is, devices forced to operate beyond their capacity. Packets transmitted in a network may arrive faster than they can be processed. Devices usually have buffers that hold packets temporarily until they can be processed and sent out, but if a device is overburdened, or its backplane is saturated, the buffers fill up too quickly and excess packets are dropped.
Take, for example, a Cisco ASA firewall designed to handle up to 750 Mbps of throughput. Using such a device as an edge device for an organization pushing more than 750 Mbps will most certainly cause packet loss issues. What typically happens is that the device will perform well enough during off-peak hours, but during peak times, there will be a perceptible drop in performance.
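The buffer behavior described above can be illustrated with a small simulation in Python. This is a sketch of a simple tail-drop queue under assumed numbers. Real devices use more elaborate queueing, and the function name and tick-based timing are inventions for the example.

```python
def simulate_tail_drop(arrivals_per_tick, service_per_tick, buffer_size):
    """Simulate a fixed-size buffer that drops packets arriving when full.

    arrivals_per_tick -- packets arriving in each time slot
    service_per_tick  -- packets the device can process per time slot
    buffer_size       -- packets the buffer can hold
    Returns (forwarded, dropped, still_queued).
    """
    queued = forwarded = dropped = 0
    for arrivals in arrivals_per_tick:
        # Accept as many arrivals as fit; the rest are tail-dropped.
        accepted = min(arrivals, buffer_size - queued)
        queued += accepted
        dropped += arrivals - accepted
        # Process up to the device's capacity for this time slot.
        served = min(service_per_tick, queued)
        queued -= served
        forwarded += served
    return forwarded, dropped, queued


# Off-peak slots of 5 packets, with a two-slot peak of 20 packets.
result = simulate_tail_drop([5, 5, 20, 20, 5, 5],
                            service_per_tick=10, buffer_size=15)
print(result)  # (45, 15, 0)
```

No packets are lost during the off-peak slots. All 15 drops happen during the peak, matching the pattern of a device that performs well until demand exceeds what it can process and buffer.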
Faulty hardware can also cause packet loss. A failing network interface, old CAT3 Ethernet cable, or fiber optic cabling bent to too tight a radius can all cause packet loss.
Wired vs. Wireless networks
The type of network can also be a factor in packet loss. Wireless networks, by their nature, face more impediments than wired networks. Radio frequency interference, weak signal, and distance limitations all cause packet loss on wireless networks, while on a wired network the most common cause is faulty cabling. If a cable is damaged or not terminated properly, it can degrade the electrical signals that carry data through it.
Faulty configuration can also cause packet loss. A standard example is a speed and/or duplex mismatch between two devices. If one device is configured for half-duplex and the other for full-duplex, the mismatch can cause collisions as well as FCS (frame check sequence) errors, resulting in packet loss.
Detecting Packet Loss
Detecting packet loss may be as simple as pinging the remote endpoint and counting how many replies come back. If one or more pings go unanswered, then packets were lost somewhere between the two endpoints.
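As a rough illustration, the loss percentage can be computed from the summary line that ping prints at the end of a run. The Python sketch below assumes the summary looks like the one printed by common Linux and macOS versions of ping ("N packets transmitted, M received" or "M packets received"). Other platforms format this line differently, so the pattern would need adjusting. The function names and sample text are hypothetical.

```python
import re

# ASSUMPTION: summary wording used by common Linux and macOS ping versions.
_SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def loss_percent(sent: int, received: int) -> float:
    """Percentage of pings that were sent but never answered."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return 100.0 * (sent - received) / sent


def parse_ping_summary(output: str) -> float:
    """Extract the sent/received counts from ping output and compute loss."""
    match = _SUMMARY.search(output)
    if match is None:
        raise ValueError("no ping summary line found")
    return loss_percent(int(match.group(1)), int(match.group(2)))


sample = "10 packets transmitted, 8 received, 20% packet loss, time 9013ms"
print(parse_ping_summary(sample))  # 20.0
```

A longer run of pings gives a more trustworthy percentage than a handful, since a single lost reply out of four already reads as 25% loss.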
There are Application Performance Monitoring (APM) solutions on the market that can aid in detecting packet loss.
For example: PathSolutions’ Call Simulator can be used to quickly confirm the presence or absence of packet loss between two endpoints.
Diagnosing Packet Loss
In order to diagnose where the packet loss is coming from, you will need to find out which parts of the connection are stable and show no loss, versus the parts of the network that are experiencing loss. A two-step process is recommended:
- Run a Traceroute to the remote endpoint to determine the router hops to the destination.
- Perform a ping test to each of these router hops to see if the network is stable to a specific location, or if loss starts to occur at or beyond a certain point.
If the path is completely healthy up to the 4th-hop router but you see significant packet loss to every router beyond that hop, you can be confident that the problem lies just beyond the 4th-hop router.
Note: This type of testing may not expose the problem. If the problem is transient and disappears before you are able to complete your tests, the tests will not reveal it. Additionally, the problem may stay hidden because ping packets do not create the same footprint on the network as the data or VoIP traffic that is actually suffering.
PathSolutions’ Call Simulator has the ability to do this test in an automated method, with the exact same footprint on the network as your applications.
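For readers doing the two-step process by hand, the interpretation step can be sketched as follows in Python. The function takes loss percentages you have already measured to each hop, in path order, and returns the first hop from which loss persists all the way to the destination. The function name, the 1% threshold, and the sample figures are assumptions for illustration.

```python
def first_lossy_hop(loss_by_hop, threshold_pct=1.0):
    """Return the 1-based hop where sustained loss begins, or None.

    loss_by_hop   -- measured loss percentage to each hop, in path order
    threshold_pct -- ASSUMPTION: loss at or below this is treated as healthy
    A hop only counts if every later hop also shows loss, because loss
    that does not carry through to later hops is not affecting the path.
    """
    first = None
    for hop_number, loss in enumerate(loss_by_hop, start=1):
        if loss > threshold_pct:
            if first is None:
                first = hop_number
        else:
            first = None  # a healthy later hop clears earlier suspicion
    return first


print(first_lossy_hop([0, 0, 0, 0, 12, 11, 13]))  # 5
print(first_lossy_hop([0, 30, 0, 0]))             # None
```

In the first example the path is clean through hop 4 and lossy from hop 5 onward, so the problem lies just beyond the 4th hop. In the second, hop 2 shows loss but later hops do not, which often means that router is simply slow to answer pings rather than dropping transit traffic.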
Fixing Packet Loss
Resolving packet loss on a network comes down to collecting and analyzing the clues, identifying the cause, and applying the fix for that cause.
Network equipment collects and stores hundreds, sometimes thousands, of error counters that report the conditions and types of packet drops detected on interfaces. These error counters can tell a story of what is occurring on an interface so problems can be remediated.
For example, if you see FCS errors and Alignment errors on an interface but no Collision errors, it indicates a duplex mismatch in which this interface is running full-duplex and the other end is running half-duplex.
If you see FCS errors and no Alignment or Collision Errors, you may have a cabling fault.
If there are Symbol errors, then there is definitely a cabling fault. In this case the Ethernet chipset has been able to repair the problem temporarily and no packet was lost, but the cabling should be checked and repaired before the fault starts producing FCS errors.
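The three rules above can be written down as a small lookup, sketched here in Python. It encodes only the counter combinations described in this section. The function name and the wording of the findings are hypothetical, and real counter names vary by vendor.

```python
def interpret_counters(fcs: int, alignment: int,
                       collisions: int, symbol: int) -> list:
    """Map interface error counters to the likely causes described above."""
    findings = []
    if fcs > 0 and alignment > 0 and collisions == 0:
        findings.append("duplex mismatch: this side full-duplex, "
                        "far side half-duplex")
    elif fcs > 0 and alignment == 0 and collisions == 0:
        findings.append("possible cabling fault")
    if symbol > 0:
        findings.append("cabling fault: check cabling before it "
                        "produces FCS errors")
    return findings


print(interpret_counters(fcs=120, alignment=35, collisions=0, symbol=0))
print(interpret_counters(fcs=40, alignment=0, collisions=0, symbol=0))
print(interpret_counters(fcs=0, alignment=0, collisions=0, symbol=9))
```

An empty list means none of these patterns matched. It does not mean the interface is healthy.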
Links that show less than 5% utilization can still have oversaturation problems. Imagine a 5-lane freeway that is completely empty most of the time. If you checked the status of the freeway every hour on the hour and saw it mostly empty, you would think that there were no issues. However, if a rock concert let out at 2:30 pm and left the freeway massively overloaded for 20 minutes, with lots of people complaining, you would have a capacity issue that the hourly checks never saw. This type of event is called a Microburst Link Flood. If you see a number of Deferred Transmission errors, followed by a number of Outbound Discards, that tells you that packets are being buffered and then discarded.
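The gap between average utilization and short bursts is easy to show with numbers. The Python sketch below uses made-up one-second samples of offered load on a 1 Gbps link: an hour of light traffic with a 10-second burst in the middle. The function name and all of the figures are assumptions for illustration.

```python
def utilization_summary(offered_bps_samples, link_bps):
    """Return (average %, peak %, overloaded sample count) for a link.

    offered_bps_samples -- offered load in bits/second, one value per second
    link_bps            -- link capacity in bits/second
    """
    average_pct = 100 * sum(offered_bps_samples) / len(offered_bps_samples) / link_bps
    peak_pct = 100 * max(offered_bps_samples) / link_bps
    overloaded = sum(1 for s in offered_bps_samples if s > link_bps)
    return average_pct, peak_pct, overloaded


# One hour: 10 Mbps of background traffic, plus 10 seconds at 1.5 Gbps.
samples = [10_000_000] * 1795 + [1_500_000_000] * 10 + [10_000_000] * 1795
average, peak, overloaded = utilization_summary(samples, 1_000_000_000)
print(f"average {average:.1f}%, peak {peak:.0f}%, "
      f"{overloaded} overloaded seconds")
```

The hourly average here comes out well under 5%, yet for 10 seconds the link was offered half again as much traffic as it can carry. An average taken over a long interval hides exactly the moments when packets are being discarded.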
There are many other scenarios that should be considered based on the error counter interpretation.
Depending on the size of your network, manually checking every involved link and device can be tedious or even impossible. Worse, after you spend the time to check every single one, the problem can mysteriously disappear, leaving you no closer to a solution.
PathSolutions TotalView® software offers the ability to discover what each link and device was doing at the time of the event and with enough detail to determine why the packets were lost, all in a few seconds.