It’s a given: if you’re responsible for the stability and performance of a network, no matter the size, you have deployed some kind of monitoring software. As a longtime network and UC geek, I view these types of solutions as IT must-haves. But don’t fall into the trap of thinking that a monitoring solution is all you’ll need to troubleshoot issues: you’ll also need to mind the network gaps.
Monitoring solutions help you identify major failures and outages in your network as well as problems with applications and resources on servers. There are many different solutions that focus on the monitoring side of the house. Some are comprehensive—monitoring servers, network devices, links, gateways, and applications—and some focus on only one or two aspects.
So what is a network gap? It’s the most granular level of your network: the device level, which is made up of routers, switches, cables, hubs, firewalls, and the like. For example, a single half-duplex connection somewhere in your network might be causing all sorts of problems. Monitoring solutions alone can’t find or diagnose those problems because they only monitor the state of the environment; they aren’t designed to troubleshoot the cause. To do that, you need to understand what is happening in the network gap.
When it comes to ensuring the stability and reliability of your network, you’ll need more than monitoring solutions to get the job done. There are a number of tools available that, depending on the issue you are troubleshooting, you can use to address network gaps. At minimum you should have tools that do the following:
- Analyze bandwidth usage
- Analyze packets
- Provide troubleshooting information
Depending on the needs of your organization, you may also want tools that:
- Analyze application performance
- Diagram and map networks
- Monitor security
The following sections take a more in-depth look at each type of tool.
Bandwidth Usage Analysis Tools
Bandwidth usage analysis tools identify who is using bandwidth on a specific network link. Typically, these tools are deployed on critical Internet or WAN links in an organization to determine how these bandwidth-limited resources are being used. There are a number of different technologies that can be deployed to solve this problem:
- Flow-based solutions. With these solutions, routers and switches send “flow records” with usage information back to a centralized flow collector for analysis and reporting. There are various types of flow technology, such as IPFIX, NetFlow, sFlow, and J-Flow.
Benefit: You can get flow records out of many different types of devices and it’s relatively easy to set up.
Drawback: Flow records may report usage “after the fact” (you’ll see the information only after the slowdown is over), so the data may not be as up to date as desired.
- Packet analyzer solutions. These tools receive a copy of all traffic on an analyzer port and then analyze that traffic to determine how a link is being used. Usually you need to set up specialized hardware and configure a SPAN port on the switch, so this approach may not be appropriate for all environments.
Benefit: The analyzer receives a “live feed” of everything that is going over the link, so it can determine what is happening in near real time.
Drawback: The analysis requires specialized port configuration and local hardware.
- Inline hardware solutions. These are essentially hardware boxes that provide usage information. There are currently two primary types of inline tools. One type is associated with firewalls and provides advanced reporting on usage statistics. The other type consists of network compression edge devices that include reporting.
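To make the flow-based approach concrete, here is a minimal Python sketch of what the reporting step of a flow collector might look like: it totals bytes per source address and lists the top talkers on a link. The record layout (source, destination, bytes), the function name top_talkers, and the sample addresses are assumptions for illustration. Real NetFlow, IPFIX, or sFlow records carry many more fields and have to be received and decoded first.

```python
from collections import Counter

# Hypothetical, already-decoded flow records: (source IP, destination IP, bytes).
FLOWS = [
    ("10.0.0.5", "203.0.113.10", 1_200_000),
    ("10.0.0.9", "198.51.100.7", 300_000),
    ("10.0.0.5", "198.51.100.7", 800_000),
    ("10.0.0.7", "203.0.113.10", 50_000),
]


def top_talkers(flows, n=3):
    """Return the n sources that sent the most bytes, largest first."""
    totals = Counter()
    for src, _dst, byte_count in flows:
        totals[src] += byte_count
    return totals.most_common(n)


if __name__ == "__main__":
    for src, total in top_talkers(FLOWS):
        print(f"{src}: {total / 1_000_000:.2f} MB")
```

Grouping by source answers “who is using the bandwidth”; grouping by destination or by port instead would answer “what is it being used for.”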
Packet Analysis Tools
Packet analyzer tools look at the individual packets that traverse a network interface and are typically used to determine if there are problems with individual packets. To use a packet analyzer, you need to set up a SPAN port and then capture the packets from the analyzed port with software such as Wireshark. This can help identify out-of-order packets, packets with incorrect QoS tagging, retransmits (indicating packet loss elsewhere in the network), and all sorts of other problems.
Benefits: Packet analyzers are excellent for locating problems at Open Systems Interconnection (OSI) layers four through seven (transport to application). They are also useful for locating differentiated services code point (DSCP) tagging problems and VoIP codec issues, and they can tell you about conversations that exist between agents and what ports are used.
Drawbacks: Packet analyzers can only see valid, healthy packets. Bad packets get discarded by the switch or router before they reach the packet analyzer, which means you only see part of the picture when assessing your network’s conditions. An analyzer can’t identify where or why packets went missing; it can only confirm that packets are missing. As a result, it’s poor at locating problems at layers one through three (physical to network).
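As a rough illustration of how an analyzer can flag retransmits and out-of-order packets, the Python sketch below labels TCP segments by their sequence numbers. It is a simplified sketch, not how Wireshark implements its analysis: it assumes the segments come from one direction of one connection, are listed in capture order as (sequence number, payload length) pairs, and never wrap around the 32-bit sequence space. The function name classify_segments and the labels are hypothetical.

```python
def classify_segments(segments):
    """Label each (seq, length) segment from one direction of one TCP flow.

    Labels: "in-order", "gap" (data before this segment has not been seen),
    "out-of-order" (arrived after later data), "retransmission" (start already seen).
    """
    highest_end = None   # sequence number just past the furthest byte seen so far
    seen_starts = set()  # starting sequence numbers already observed
    labels = []
    for seq, length in segments:
        if seq in seen_starts:
            label = "retransmission"
        elif highest_end is None or seq == highest_end:
            label = "in-order"
        elif seq > highest_end:
            label = "gap"
        else:
            label = "out-of-order"
        seen_starts.add(seq)
        end = seq + length
        if highest_end is None or end > highest_end:
            highest_end = end
        labels.append(label)
    return labels


if __name__ == "__main__":
    capture = [(1000, 100), (1100, 100), (1300, 100), (1200, 100), (1100, 100)]
    for segment, label in zip(capture, classify_segments(capture)):
        print(segment, label)
```

Note that the sketch confirms that something went missing or arrived late, but, as described above, nothing in the capture says where in the network that happened.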
Network Troubleshooting Tools
Troubleshooting packet loss or misconfigurations in the network can be both difficult and tedious, depending on the size and scope of the network. Manually finding and resolving problems becomes too labor-intensive as networks grow in size and complexity.
You could use a CLI (command-line interface) tool to check or change an individual device’s configuration or view basic statistics related to its health and performance. However, if you don’t have a basic understanding of how your network devices typically perform, you have nothing to compare the CLI’s information against. What’s more, this process takes a great deal of time, as it may require logging into 10 or more devices and trying to find the problem on each one.
My advice is to use the CLI as it was designed to be used: as a helpful tool for performing device configuration. Bring in automation to cover the network level and keep teams from spending weeks trying to track down a single bad cable or VLAN tag misconfiguration in the infrastructure. Solutions in this area include PathSolutions TotalView, Statseeker, and Infoblox NetMRI.
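The kind of automation described here can be sketched in a few lines of Python: instead of logging into each device, scan one collected snapshot of every interface and flag the suspects. This is only an illustration, not how any of the products named above work. The snapshot format, device names, function name find_suspect_interfaces, and the 0.1% error-rate threshold are all assumptions; in practice the counters would be gathered from the devices, for example over SNMP.

```python
# Hypothetical per-interface snapshot, as an automated tool might collect it.
INTERFACES = [
    {"device": "sw-core-1", "port": "Gi1/0/1", "duplex": "full",
     "packets": 9_500_000, "errors": 12},
    {"device": "sw-access-3", "port": "Gi1/0/14", "duplex": "half",
     "packets": 400_000, "errors": 0},
    {"device": "rtr-edge-1", "port": "Gi0/1", "duplex": "full",
     "packets": 2_000_000, "errors": 45_000},
]


def find_suspect_interfaces(interfaces, max_error_rate=0.001):
    """Return (device, port, reasons) for interfaces that look unhealthy."""
    findings = []
    for iface in interfaces:
        reasons = []
        if iface["duplex"] != "full":
            reasons.append("half-duplex link")
        packets = iface["packets"]
        error_rate = iface["errors"] / packets if packets else 0.0
        if error_rate > max_error_rate:
            reasons.append(f"error rate {error_rate:.2%}")
        if reasons:
            findings.append((iface["device"], iface["port"], reasons))
    return findings


if __name__ == "__main__":
    for device, port, reasons in find_suspect_interfaces(INTERFACES):
        print(f"{device} {port}: {', '.join(reasons)}")
```

With the sample data, the half-duplex access port and the error-heavy edge interface are flagged, while the healthy core link is not.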
Application Performance Measurement (APM) Tools
APM tools measure the responsiveness of applications across the infrastructure to ensure that users continuously experience strong functionality. They do this by recreating and monitoring the user experience: making requests and then reporting how fast responses come back from remote resources. Management should use this type of solution to determine how effectively the team is delivering on an SLA. However, APM tools have limited troubleshooting capability because they cannot locate where or why an application’s performance is suffering; they can only confirm that it is having problems.
Benefits: Just like a packet analyzer, this type of tool is great for reporting problems. It lets you collect data to determine whether applications are responding as expected on the network.
Drawbacks: Unfortunately, an APM tool will do little to help you determine where or why a problem is occurring. To find the cause, you will need to dig deeper into your network, applications, firewall, and any number of other locations where errors can be lurking. This can be incredibly time-consuming: it can take days to get to the root cause of an issue while your applications continue to suffer and productivity slows to a crawl.
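The request-and-time approach described above can be sketched in Python as follows. This is a minimal illustration rather than a real APM product: the function names time_request and summarize, the 200 ms SLA threshold, and the sample timings are assumptions. Only the standard library is used.

```python
import statistics
import time
import urllib.request


def time_request(url, timeout=5.0):
    """Fetch url once and return the elapsed time in milliseconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
    return (time.perf_counter() - start) * 1000


def summarize(samples_ms, sla_ms):
    """Compare response-time samples against an SLA threshold in milliseconds."""
    within = sum(1 for sample in samples_ms if sample <= sla_ms)
    return {
        "median_ms": statistics.median(samples_ms),
        "worst_ms": max(samples_ms),
        "pct_within_sla": 100.0 * within / len(samples_ms),
    }


if __name__ == "__main__":
    # Hypothetical timings; a real probe would collect them with something like
    # [time_request(url) for _ in range(10)] against an application URL.
    print(summarize([120, 180, 450, 90], sla_ms=200))
```

The summary shows whether the SLA is being met, but, as noted above, it says nothing about which hop or component made the slow request slow.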
Diagramming and Mapping Tools
A picture is worth a thousand words, and the same is true for networks. These tools provide accurate, up-to-date maps of the organization’s entire network (not just partial point-to-point diagrams), which allow for a complete understanding of an environment. Some recommended solutions include Cacoo, NetBrain, and Lucidchart.
Security Monitoring Tools
Security monitoring tools focus on one of two primary functions: intrusion detection or intrusion prevention.
- Intrusion Detection System (IDS) tools. These tools monitor the environment for misbehaving communications between devices and then alert administrators to the aberrant behavior.
- Intrusion Prevention System (IPS) tools. These tools monitor the environment for misbehaving communications and terminate the communications, thus preventing the activity from continuing.
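The difference between the two comes down to what happens after a match, which the Python sketch below tries to show: the same detection rule is applied in both modes, but only the IPS mode stops the traffic. The rule (a small set of suspicious destination ports), the connection format, and the function name inspect are hypothetical; real IDS and IPS products use far richer signatures and behavioral analysis.

```python
# Hypothetical detection rule: destination ports treated as suspicious.
SUSPICIOUS_PORTS = {23, 4444}


def inspect(connection, mode):
    """Return (allowed, alert) for a connection dict in "ids" or "ips" mode."""
    if mode not in ("ids", "ips"):
        raise ValueError(f"unknown mode: {mode!r}")
    if connection["dst_port"] not in SUSPICIOUS_PORTS:
        return True, None  # nothing matched: pass the traffic, no alert
    alert = f"suspicious traffic {connection['src']} -> port {connection['dst_port']}"
    # An IDS only reports the match; an IPS also terminates the communication.
    allowed = mode == "ids"
    return allowed, alert


if __name__ == "__main__":
    telnet = {"src": "10.0.0.8", "dst_port": 23}
    print(inspect(telnet, "ids"))
    print(inspect(telnet, "ips"))
```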
Mind Network Gaps to Maximize Capabilities
If you are dealing with a smaller network infrastructure, you may be able to get away with a basic monitoring solution. Since your network is less complex, it follows that the time and resources you spend on troubleshooting are minimal. However, as your network grows in breadth and complexity, you will need to move from monitoring the state of your environment to troubleshooting the cause of problems. More tools are required to provide visibility into bandwidth issues, perform deep packet analysis, and troubleshoot the cause of problems. The more complex the network, the more important it is to mind the network gaps in order to maximize your team’s capabilities.