5 Reasons Why Troubleshooting VoIP Problems with Wireshark Doesn’t Work
Let me start out by saying that Wireshark is a fantastic tool. It’s able to:
- Look at individual packets and show you everything contained inside the packet in complete detail.
- Filter and analyze individual conversations to show what the traffic between two endpoints looks like.
If you need to do either of the above, Wireshark is the tool to use. However, if you’re looking for Wireshark—or any other packet analyzer—to resolve your VoIP problem, here are five reasons why you might want to reconsider your choice:
1) Problems are confirmed, but not resolved.
Wireshark cannot tell you anything about the missing pieces of the conversation, only confirm that there were missing packets.
For example: If your capture shows packets 1, 2, 4, and 5, it does nothing to identify where or why packet 3 went missing, only that it was not part of the captured conversation. That packet might have gone missing in any of the switches, routers, or links involved in the conversation, and for any number of reasons, such as buffer drops, collisions, cable faults, or misconfigured QoS.
This confirms that the problem exists but does nothing to solve it.
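The gap in the example above is easy to detect programmatically, which also shows how little the detection tells you. The following is a minimal sketch, assuming the sequence numbers have already been extracted from the capture in capture order; the function name is hypothetical, and the 16-bit wraparound is an assumption that matches RTP sequence numbers.

```python
def missing_sequence_numbers(seen, modulus=65536):
    """Return sequence numbers skipped between consecutive captured packets.

    Assumes `seen` is in capture order with no reordering, and that the
    counter wraps at `modulus` (RTP uses a 16-bit sequence number).
    """
    missing = []
    for prev, cur in zip(seen, seen[1:]):
        gap = (cur - prev) % modulus
        # A gap of 1 means consecutive packets; anything larger means a skip.
        for offset in range(1, gap):
            missing.append((prev + offset) % modulus)
    return missing


captured = [1, 2, 4, 5]
print(missing_sequence_numbers(captured))  # [3]
```

The output names the missing packet, but nothing in the input could say which device dropped it or why.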
2) Narrowing down the location of a problem is hit or miss—mostly miss.
Depending on where you sniff the packets, the conversation might look perfect. If the problem is with QoS across a WAN link later in the call path, you’ll never see it. This means you have to capture traffic at multiple locations and compare the conversations to narrow down where the problem is occurring. This can be very time-consuming, because you must not only set up each capture but also review every one of them to determine where the problem is introduced.
After all that work, it’s more than likely you’ll confirm a problem exists but not be able to identify the exact source or cause of the problem.
3) SPAN ports can mask the problem’s root cause.
SPAN ports have a number of limitations, and these apply across a wide variety of vendors. Most of them arise from the fact that the switch’s hardware is designed to forward frames to their destination via hardware ASICs as fast as possible (wire speed). Having a SPAN port also receive a copy of the traffic going in and out of a monitored port incurs a performance cost on the switch. When switch performance becomes degraded, the production traffic will continue to be forwarded, but the SPAN traffic may be discarded because it is a lower-priority process. This may make you think there is packet loss when there isn’t.
Other SPAN limitations come from the bandwidth of the analyzer port. If a monitored trunk interface is running at 1 Gbps with 100% transmit and 100% receive utilization, the switch will try to send 2 Gbps of data to the 1 Gbps analyzer port and end up dropping 50% of the packets.
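The 50% figure follows directly from the ratio of offered traffic to analyzer port capacity. Here is a small sketch of that arithmetic; the function name is hypothetical, and it deliberately ignores any buffering on the switch, which could absorb short bursts.

```python
def span_drop_fraction(tx_gbps, rx_gbps, analyzer_gbps):
    """Fraction of mirrored traffic that cannot fit on the analyzer port.

    Simplification: no buffering, so everything above the analyzer
    port's line rate is treated as dropped.
    """
    offered = tx_gbps + rx_gbps
    if offered <= analyzer_gbps:
        return 0.0
    return (offered - analyzer_gbps) / offered


# 1 Gbps transmit + 1 Gbps receive mirrored to a 1 Gbps analyzer port
print(span_drop_fraction(1.0, 1.0, 1.0))  # 0.5
```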
Microbursts of 1 Gbps transmit plus 1 Gbps receive are common even on lightly utilized links, because most traffic is bursty. A 1 Gbps port that hits 100% utilization for 10 seconds and is then quiet for 4 minutes and 50 seconds will show as 3.33% utilization on most monitoring platforms that poll every 5 minutes. That figure would not normally concern anyone, yet during those 10 seconds there is a significant chance of SPAN packet loss.
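To see how a poll interval hides a burst, the averaging can be worked through in a few lines. This is an illustrative sketch only; the function and parameter names are made up for the example, and real monitoring platforms derive utilization from counter deltas rather than from known busy and idle periods.

```python
def average_utilization(busy_seconds, busy_pct, poll_seconds, idle_pct=0.0):
    """Average utilization over one poll interval, as a percentage."""
    idle_seconds = poll_seconds - busy_seconds
    return (busy_seconds * busy_pct + idle_seconds * idle_pct) / poll_seconds


# 10 seconds at 100%, then quiet for the rest of a 5-minute (300 s) poll
avg = average_utilization(busy_seconds=10, busy_pct=100.0, poll_seconds=300)
print(f"{avg:.2f}%")  # 3.33%
```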
As a result, you may see missing packets in a capture that weren’t missing from the real conversation.
4) SPAN port latency & jitter cannot be trusted.
Because a SPAN port has to copy both transmitted and received packets down the same pipe, those packets must be serialized into a single stream. As a result of this transmit/receive interleaving, packets may not arrive at the SPAN port with the same timing, or in the same sequence, with which they passed through the monitored port. Queueing configured on the monitored interface can make this worse: it may change the order in which packets are transmitted or received, and the analyzer port will not necessarily reflect that order.
This may lead you to believe that there are latency and/or jitter problems when they do not exist.
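To illustrate how capture-side timing errors turn into apparent jitter, here is a sketch of a running interarrival jitter estimate in the style of RFC 3550 (each new sample moves the estimate by 1/16 of the difference). The stream, the 5 ms network delay, and the 2 ms hold on every other mirrored packet are all hypothetical values chosen for the example, not measurements.

```python
def interarrival_jitter(send_times, arrival_times):
    """Running jitter estimate in the style of RFC 3550.

    Both lists use the same time unit (seconds here), with one entry per
    packet in the same order.
    """
    jitter = 0.0
    for i in range(1, len(send_times)):
        # d: change in transit time between consecutive packets
        d = (arrival_times[i] - arrival_times[i - 1]) - (
            send_times[i] - send_times[i - 1]
        )
        jitter += (abs(d) - jitter) / 16.0
    return jitter


# Hypothetical stream: one packet every 20 ms, constant 5 ms network delay.
send = [i * 0.020 for i in range(50)]
on_wire = [t + 0.005 for t in send]

# The same stream as a SPAN capture might timestamp it, assuming every
# other mirrored packet is held for an extra 2 ms before reaching the analyzer.
span_capture = [t + (0.002 if i % 2 else 0.0) for i, t in enumerate(on_wire)]

print(interarrival_jitter(send, on_wire))       # effectively zero
print(interarrival_jitter(send, span_capture))  # clearly non-zero
```

The second number is produced entirely by the assumed delay on the mirror path, while the call itself had none.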
5) SPAN port packet ordering cannot be trusted.
In addition to the packet loss and timing problems described in reasons 3 and 4, Wireshark may show out-of-order packets in a conversation. These packets may have been in order in the actual conversation and only ended up out of order as they flowed into the SPAN port, because of queueing and lower prioritization during burst loads.
This is why a capture might show a SYN/ACK packet before the SYN packet in a TCP session setup even though that never happened on the wire, leading you to believe that you have out-of-order packets when there are none.
How to Identify the Root Cause of a VoIP Problem
To find the source and cause of a problem, you want to know where packets were missing on the network and why. The switches and routers in your network have hundreds of error counters that track when, where, and why packets were buffered or dropped. Interrogating these error counters along the path that the call took is the best way to find out what happened to the call.
For example: Suppose you investigated all of the links, switches, and routers along a call’s path and found that one interface connecting to a VoIP gateway was dropping 3% of its packets due to collisions and was running at 100 Mbps half-duplex. That would be an easy fix.
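As a rough sketch of what checking counters along a call path could look like, the code below compares two snapshots of per-interface counters and flags interfaces that stand out. Everything here is an assumption for illustration: the `InterfaceSample` shape, the device and interface names, the counter values, and the 1% threshold. How the counters are actually collected (SNMP, CLI, or a monitoring tool) is outside the sketch, and counter wraparound is ignored.

```python
from dataclasses import dataclass


@dataclass
class InterfaceSample:
    """Counters read from one interface at one point in time (hypothetical shape)."""
    name: str
    duplex: str       # "full" or "half"
    packets: int      # total packets handled so far
    drops: int        # packets discarded or errored so far
    collisions: int   # collisions counted so far


def suspect_interfaces(before, after, max_drop_pct=1.0):
    """Compare two snapshots of the same path; return (name, reasons) pairs."""
    findings = []
    for old, new in zip(before, after):
        reasons = []
        delta_packets = new.packets - old.packets
        delta_drops = new.drops - old.drops
        if delta_packets > 0:
            drop_pct = 100.0 * delta_drops / delta_packets
            if drop_pct > max_drop_pct:
                reasons.append(f"{drop_pct:.1f}% drops")
        if new.collisions > old.collisions:
            reasons.append("collisions increasing")
        if new.duplex == "half":
            reasons.append("half-duplex")
        if reasons:
            findings.append((new.name, reasons))
    return findings


# Made-up snapshots taken a few minutes apart along one call path.
before = [
    InterfaceSample("core-sw1 Gi0/1", "full", 1_000_000, 0, 0),
    InterfaceSample("gw1 Fa0/0", "half", 500_000, 100, 40),
]
after = [
    InterfaceSample("core-sw1 Gi0/1", "full", 1_200_000, 0, 0),
    InterfaceSample("gw1 Fa0/0", "half", 600_000, 3_100, 2_900),
]

findings = suspect_interfaces(before, after)
print(findings)
# [('gw1 Fa0/0', ['3.0% drops', 'collisions increasing', 'half-duplex'])]
```

Unlike a packet capture, the result points at a specific interface and a specific cause, which is what makes the fix straightforward.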