Your network infrastructure is very similar to our road infrastructure: Sometimes, your daily commute may be perfect with no traffic jams or problems. At other times, it may be incredibly slow. On a rare occasion, you may not even make it to your destination at all due to an accident or road closure.
The same problems happen on networks: sometimes the network is slow, and at other times packets never reach their destination because of a network impairment.
For data packets, it’s easy to retransmit the data if it was lost along the way, as TCP is designed specifically to retransmit lost data efficiently. As a result, if a file transfer normally takes 10 seconds, but there is 10% loss, it might now take about 11 seconds. In this case, nobody cares or even notices the problem.
With real-time protocols like VoIP, UC, and video, 10% loss would equate to a poor experience. Imagine missing one out of every ten words from a conversation. So much understanding would get lost that most people would give up on the conversation and hang up the call.
Of the latency, jitter, and loss metrics that matter for real-time protocols, loss is the number one problem, as it affects VoIP/UC quality the most dramatically.
What is Packet Loss?
Packet loss is when a transmitted packet does not make it successfully to the destination.
Note: For real-time protocols like VoIP, UC, and video, if packets arrive at their destination out of order, the codec must discard the packet because it is out of time sequence and cannot be spliced back together. Many monitoring and troubleshooting solutions may not view this as a dropped packet, but the codec will consider this a lost packet regardless, as it was not received successfully in proper order. For more information about this problem, read Do Out-of-Order Packets Affect VoIP.
Where Can Packet Loss Occur?
Packet loss can happen virtually anywhere in the network: any interface, any device. This is what makes it hard to find the root cause of the problem, as even small networks can be quite complex, with many different interfaces, switches, and routers that are traversed to reach the destination.
Note: Packet loss can even occur inside the destination device. If a PC runs out of receive buffers, it may drop the packet before it is able to send the packet to the softphone.
Additionally, if a PC with a slower CPU is running VPN software with complex, high-security encryption, it may drop packets because it cannot keep up with the encryption/decryption.
Where Else Can Packet Loss Happen?
Do not overlook the possibility that the problem happened outside of your domain. For example, if a user complains about a call quality problem, it may be that the other party was on a cell phone, or that the problem occurred on the other company’s VoIP/UC phone system.
Loss can also occur between a phone or softphone and a wireless Bluetooth headset. This is not part of the IP network, but is a network regardless and can have packet loss problems. These problems usually relate to one of the following issues:
- RF interference problems: If the work environment has a lot of devices that operate in the 2.4 GHz frequency range, there may be problems, as this frequency is a shared resource and contention for it causes loss. For example, if you have 200 agents working in small cubicles next to each other, all with Bluetooth headsets, this creates a lot of contention for this frequency. Other possible sources are devices that spill over into this frequency range: microwave ovens and high-voltage AC busses that drive elevators or HVAC systems may also interfere with headset sound quality.
- Distance problems: Class 2 Bluetooth is the most frequently implemented variety, and has a nominal maximum range of 33 feet (about 10 meters). As the headset gets closer to the edge of that range, signal strength weakens and Bluetooth packets will be lost.
- Headset low-battery problems: When wireless headsets run low on batteries, they may not transmit with enough signal strength and packets will be lost. As a result, a low battery might lead to sound quality impairments.
What Does Packet Loss Sound Like?
Packet loss affects real-time protocols like VoIP and video by creating drop-outs, clipped words, and video artifacts. Entire phrases might also be missing from the conversation.
If the packet loss gets really bad, one side of the audio stream can completely disconnect causing one-way audio, or if packet loss is bi-directional, the entire call can drop.
Which Side Heard the Problem?
When a user complains about VoIP/UC or video problems, you want to determine who heard the problem. If the reporting user heard the drop-outs, that means they had problems receiving the audio information from the sender. If the reporting user said the other person could not hear them, then it is a problem where their transmission did not make it to the receiver.
In some cases, both parties may have problems, but only one person heard or complained about the problem.
Something else to consider: if you dial into a meeting but say nothing for the entire meeting, you might have a call quality problem that nobody detects. Had you spoken, nobody would have been able to hear you, but since you didn’t say anything, the problem went unnoticed.
What is Acceptable Loss?
Most video and audio codecs can deal with a certain amount of loss, and some are better at handling loss than others.
In general, packet loss should be below 1% when received by the endpoint.
Certain codecs can work better on lossy connections, as they will work to duplicate audio information in successive packets. This means that you may be able to have more than 1% loss and still have acceptable call quality, but only if the lost packets are not sequential.
The problem is that most packet loss tends to be sequential:
Communications might be perfect for 10 minutes with no loss, but then 10 packets in a row get dropped because of congestion on a link for a few seconds during a large file transfer. No codec can repair this problem because too much information was lost.
The above example also shows that packet loss tends to be sporadic.
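To make the point about sequential loss concrete, here is a minimal Python sketch that compares the overall loss percentage of a call with its longest run of consecutive lost packets. The per-packet record (True for received, False for lost), the 20 ms packet interval, and the function name `loss_summary` are assumptions made for illustration, not something a particular phone system provides.

```python
from typing import List, Tuple


def loss_summary(received: List[bool]) -> Tuple[float, int]:
    """Return (overall loss percent, longest run of consecutive lost packets)."""
    if not received:
        return 0.0, 0
    lost = received.count(False)
    longest = current = 0
    for ok in received:
        # Extend the current run on a lost packet, reset it on a received one.
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return 100.0 * lost / len(received), longest


# Assumed scenario: 10 minutes of audio at 50 packets per second (20 ms each),
# perfect except for 10 packets dropped in a row.
call = [True] * 30000
call[15000:15010] = [False] * 10

overall_pct, longest_burst = loss_summary(call)
print(f"overall loss: {overall_pct:.3f}%  longest burst: {longest_burst} packets")
```

In this assumed call, the overall percentage stays far below 1% even though 10 packets in a row were lost, which is why the burst length is worth tracking alongside the average.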
What Causes Loss?
Packet loss can happen for several different reasons:
- Resource limitation: If there is not enough bandwidth on a link, packets are buffered. If the device runs out of buffers, the packets are then discarded. If a device runs low on memory, it may not have resources to move packets to and from the backplane.
- Misconfiguration: If an interface has a duplex mismatch, it may be dropping packets when collisions occur. If a routing table is misconfigured, it may create a routing loop that prevents the packet from reaching its destination. If a device’s buffers are misconfigured so that there are not enough of them to queue packets, packets may be dropped.
- Security configuration: If a firewall or ACL is configured to drop the packet, then it is not permitted to reach its destination.
- Performance limitation: If an interface is configured with a rate limit, it will discard additional packets above and beyond that limit.
- Broken hardware: If a switch or router port has bad hardware or a faulty software driver, or there is a flaw or fault in the cabling, the packet may be dropped.
Many additional reasons are possible depending on your network’s configuration and capabilities.
How Do You Detect VoIP Packet Loss?
Loss can be detected using a variety of methods:
- Packet capture tests: Software like Wireshark can detect loss by looking at the sequence numbers of the RTP packets. If there is a missing packet in the sequence, it will flag that. The drawback of using packet capture is that if the capture is performed in the middle of the conversation, and the loss occurs closer to the destination, the capture may not see or detect any problem.
- Call detail records (CDRs): CDRs on phone systems may show lost packets. Sometimes they show “total packets” and “received packets”, and you subtract one from the other to determine how many packets were lost during the conversation. The drawback to this method is that if the conversation was fine for 10 minutes, yet there was a problem for 10 seconds where quality suffered, the statistics might show that this call only had 0.2% overall loss, yet the user will say that the call was “terrible for the last 10 seconds before they gave up and hung up”.
- RTCP XR reports: If your phone system supports RTCP XR (RFC 3611, sometimes labeled RTP-XR), then the RTCP XR reports will show drops in real time. The drawback here is that not all devices support RTCP XR, and it requires a lot of configuration to get phones to properly report this information. Additionally, it may require licensing to enable the feature.
- Call simulation: If you run a call simulator across a network, it should show loss between endpoints. In many cases, this is the simplest and most reliable way to detect loss, as it can be directly seen during the simulation test. If your call simulator can test to midpoints in the network, it can dramatically speed troubleshooting as you can quickly determine what part of the network is healthy versus not and narrow down the location of the loss.
Proprietary mechanisms like Cisco’s IP SLA may also be employed to detect packet loss across specifically defined segments.
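The sequence-number check described for packet captures can be sketched in a few lines of Python. This is a simplified illustration, not how Wireshark itself is implemented: it assumes you already have the RTP sequence numbers of one stream in arrival order, and the function name `count_rtp_gaps` is hypothetical. RTP sequence numbers are 16 bits wide, so the sketch has to account for wraparound from 65535 back to 0.

```python
from typing import Iterable, Tuple

SEQ_MOD = 1 << 16  # RTP sequence numbers are 16-bit and wrap around


def count_rtp_gaps(seqs: Iterable[int]) -> Tuple[int, int]:
    """Return (missing, late) for sequence numbers listed in arrival order.

    missing: sequence numbers skipped when a newer packet arrived.
    late:    packets that arrived after a newer one (or as duplicates).
             A late packet was already counted as missing when it was skipped.
    """
    missing = late = 0
    expected = None
    for seq in seqs:
        if expected is None:
            expected = (seq + 1) % SEQ_MOD
            continue
        delta = (seq - expected) % SEQ_MOD
        if delta < SEQ_MOD // 2:
            # Packet is at or ahead of what we expected: count any skipped numbers.
            missing += delta
            expected = (seq + 1) % SEQ_MOD
        else:
            # Packet is behind what we expected: out of order or duplicate.
            late += 1
    return missing, late


# Hypothetical capture: 65535 never arrives, and 2 arrives after 3.
print(count_rtp_gaps([65533, 65534, 0, 1, 3, 2, 4]))
```

Because a late packet stays counted as missing, the totals here line up with the codec's view that an out-of-order packet is effectively a lost one.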
How Do You Resolve VoIP Packet Loss?
Finding the specific location where packets are getting lost and fixing the problem can be hard due to the number of interfaces and devices that exist between a pair of IP phones.
A network engineer would need to determine the switch and port where the first phone is connected, and then map out the interfaces and switches that the call would traverse on its way to the remote phone.
Documentation may not be helpful here if STP (Spanning Tree Protocol) is enabled and there are different possible layer-2 paths that may be crossed as well as different routing paths for layer 3.
Once all of the interfaces and devices are mapped out, each of these elements needs to be interrogated for errors to see if it dropped or buffered any packets for any reason. Most switches and routers track a large number of error counters, so each one should be evaluated to make sure that it is not incrementing.
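Checking whether counters are incrementing comes down to comparing two polls of the same counters. The sketch below assumes you have already collected the values by some means (SNMP, CLI scraping, or otherwise) into dictionaries keyed by device, interface, and counter name. The device names and values are made up, and `incrementing_counters` is a hypothetical helper.

```python
from typing import Dict, Tuple

Key = Tuple[str, str, str]  # (device, interface, counter name)
COUNTER32_MOD = 1 << 32     # assumes 32-bit counters


def incrementing_counters(before: Dict[Key, int], after: Dict[Key, int]) -> Dict[Key, int]:
    """Return counters whose value grew between two polls, with the increase."""
    deltas = {}
    for key, new_value in after.items():
        old_value = before.get(key)
        if old_value is None:
            continue  # no baseline for this counter yet
        delta = (new_value - old_value) % COUNTER32_MOD  # tolerate one 32-bit wrap
        if delta:
            deltas[key] = delta
    return deltas


# Hypothetical polls taken a few minutes apart.
poll_1 = {
    ("core-sw1", "Gi1/0/1", "ifInErrors"): 120,
    ("core-sw1", "Gi1/0/1", "ifOutDiscards"): 4000,
    ("edge-rtr1", "Gi0/0", "ifInDiscards"): 7,
}
poll_2 = {
    ("core-sw1", "Gi1/0/1", "ifInErrors"): 120,
    ("core-sw1", "Gi1/0/1", "ifOutDiscards"): 4350,
    ("edge-rtr1", "Gi0/0", "ifInDiscards"): 7,
}

for key, delta in incrementing_counters(poll_1, poll_2).items():
    print(key, "increased by", delta)
```

A counter reset (for example after a reboot) would show up as a large false increase in this sketch, so a real tool would need to check for that as well.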
In addition, if any of these interfaces are configured for QoS, the queueing should be evaluated to see if it is configured properly and the queues are showing proper utilization.
This process may take hours to accomplish on even the smallest networks due to the number of interfaces, devices, error counters, and QoS configurations that need to be evaluated. TotalView's path mapper will identify every link, switch, and router used to connect two IP endpoints and present the historic health, performance, and QoS configuration of every element along that path.
Troubleshooting VoIP Packet Loss Problems
Troubleshooting where loss is coming from can be challenging due to the number of network interfaces and devices that need to be checked.
Troubleshooting loss can be a two-step process:
- Narrow the scope of the problem: You can use a single-ended call simulator (like the Call Simulator included with PathSolutions TotalView®) to perform test calls to midpoints in the network to determine whether loss is present. For example, if you test to the far end of your MPLS link and find no loss to that location, yet see significant loss when you test to the next-hop router and every device beyond that point, the simulator quickly shows that the loss is coming from the network segment between the MPLS link and the far-end router.
- Investigate the involved network devices for packet loss counters, QoS, free RAM, and CPU spikes: Each of the involved devices in this segment should be checked to see if they have any error counters, CPU spikes, utilization problems, or slow interfaces that have incorrectly configured QoS.
Note: If packet loss is 100% and a firewall is involved, a firewall rule misconfiguration should be suspected.
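The first step, narrowing the scope, can be expressed as a small search over midpoint test results. The sketch below assumes you have run test calls to each midpoint in path order and recorded the loss percentage. The hop names, the results, and the function `first_lossy_segment` are hypothetical, and the 1% threshold is borrowed from the acceptable-loss guideline earlier in this article.

```python
from typing import List, Optional, Tuple


def first_lossy_segment(
    results: List[Tuple[str, float]], threshold_pct: float = 1.0
) -> Optional[Tuple[str, str]]:
    """Return (last clean point, first lossy point), or None if every test is clean.

    results must be ordered from the nearest midpoint to the farthest.
    """
    previous = "source"
    for name, loss_pct in results:
        if loss_pct >= threshold_pct:
            return previous, name
        previous = name
    return None


# Hypothetical test calls from one location to successive midpoints.
tests = [
    ("local-router", 0.0),
    ("mpls-far-end", 0.0),
    ("next-hop-router", 4.2),
    ("remote-switch", 4.5),
]

print(first_lossy_segment(tests))
```

The devices and interfaces between the two returned points are the ones to investigate in the second step.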
Since loss is a transient problem that may occur sometimes and not other times, you may not be able to find the source of the problem if you don’t have a historical perspective of the above information.
See also this related blog: Diagnosing and Fixing Packet Loss in Your Network
Remember that Loss Values are Additive
A little bit of loss detected on a single router or switch may not be of much concern, but consider its effect throughout the entire network. If a little bit of loss happens at each step across the network, you may end up with some very large packet loss totals at the remote end of the network.
Additionally, if you reduce loss in the parts of the network that you control, it will help reduce the effect of loss added in parts of the network that you do not control, such as SIP trunks and VPN tunnels.
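As a rough illustration of how per-hop loss accumulates, the following Python sketch combines loss rates along a path. It assumes each hop drops packets independently of the others, which is a simplification, and the per-hop figures are invented. Under that assumption, a packet survives the path only if it survives every hop, so the end-to-end loss is one minus the product of the per-hop delivery rates. For small values this is very close to simply adding the per-hop percentages.

```python
import math
from typing import Sequence


def end_to_end_loss(per_hop_loss: Sequence[float]) -> float:
    """Combine per-hop loss fractions (0.0 to 1.0), assuming independent hops."""
    delivered = math.prod(1.0 - p for p in per_hop_loss)
    return 1.0 - delivered


# Hypothetical path: five hops, each losing 0.5% of packets.
hops = [0.005] * 5
print(f"end-to-end loss: {end_to_end_loss(hops):.2%}")
```

With these assumed numbers, no single hop looks alarming, yet the total comes out just under 2.5%, well above the 1% guideline.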
Overall, loss can be prevented if the right information is brought to bear about your network’s performance.
Preventing Packet Loss
Preventative steps can be taken to avoid creating packet loss in networks:
- Enable QoS on bandwidth-constrained links: If you have network links that are 100 Mbps or slower, QoS should be enabled to give priority to VoIP/UC/video packets. VoIP and UC packets are usually tagged with DSCP decimal 46, Expedited Forwarding or ‘EF’. The QoS policy should respect these tags and always give these packets priority by sending them before any data packets are sent.
- Eliminate all half-duplex links: Half-duplex links continue to exist on many networks. Network professionals often don’t know whether they have any half-duplex links because their network monitoring tools lack visibility into them. There is no reason for half-duplex links to exist in a modern network.
- Track all SNMP error counters across the network: If a switch or router drops a packet, it is typically tracked in one of many different error counters on the device. These counters would typically include FCS Errors, Alignment Errors, Frame Too Longs, MAC Receive Errors, Symbol Errors, Collisions, Carrier Sense Errors, Outbound Errors, Outbound Discards, Inbound Discards, Inbound Errors, and Unknown Protocol errors. If you know when, where, and why the packets are being lost, you can work to fix the problem before users are affected.
- Track router CPU utilization on all VoIP/UC involved routers and firewalls: If a router slows down because its CPU is spiked for a few milliseconds, that can create jitter problems downstream.
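For the QoS item above, it can help to see what the DSCP 46 tag looks like from an application's point of view. The sketch below shows one way a Python program could request the EF marking on a UDP socket. It is an illustration under assumptions: the helper names are hypothetical, the `IP_TOS` socket option behaves differently across operating systems, and the marking only matters if the network is configured to trust and act on it.

```python
import socket

DSCP_EF = 46  # Expedited Forwarding


def dscp_to_tos(dscp: int) -> int:
    """Convert a DSCP value to the IPv4 TOS byte (DSCP is the upper six bits)."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be between 0 and 63")
    return dscp << 2


def open_marked_udp_socket(dscp: int = DSCP_EF) -> socket.socket:
    """Create a UDP socket that asks the OS to mark outgoing packets with dscp."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock


print(f"DSCP {DSCP_EF} corresponds to TOS byte {dscp_to_tos(DSCP_EF):#04x}")
```

In a packet capture, EF-marked packets therefore show a TOS byte of 0xb8, which is a quick way to confirm that voice traffic is actually tagged before you troubleshoot the queueing itself.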