The history of technology is replete with stories of continuous improvements that aim to make the technology blend seamlessly into everyday life.
The Model T car had a hand starter and required a lot of fiddling with the choke and throttle to get it to work. You then had to understand that the transmission had two gears, and an additional set of gears on the transaxle to properly shift and get moving at a decent clip. In many cases, the driver had to be a mechanic to get everything working correctly.
In the near future, you will be able to summon a car to take you autonomously to your destination, with no engineering understanding or work required. The technology that makes this happen will be completely invisible, and vehicles will no longer require a mechanic on board.
The Sad State of Network Performance Monitoring
Why do we lack automation in the world of network performance monitoring and management?
Many existing network monitoring systems require a database to be licensed and configured, then a separate server configured to operate as a collection engine. Sometimes a front-end web server needs to be configured as well, which means three or more VMs to complete the solution. Each of these virtual machines may require multiple CPUs and a large amount of memory to keep it from slowing down. This adds a hidden cost to the deployment that must be borne by the business.
At this point, a seasoned network engineer must start configuring the solution. This is not an automated process, as devices and OIDs must be manually configured to be monitored. On even medium-sized networks, this configuration effort may be enormous and require professional services.
At the end of the configuration effort, there may still be key elements that are not monitored or tracked.
The next problem is that networks don't stay static. They change, sometimes on a daily basis. Most monitoring platforms can't keep up with dynamic changes in the environment, and require many hours of configuration effort to stay current with reality. This part is often overlooked, and the value of the solution tends to diminish as it loses visibility into new and key parts of the network.
Additionally, when an engineer goes to troubleshoot a problem, it takes specific expertise to determine what it means when an interface shows both FCS errors and symbol errors. Sometimes the engineer must resort to many hours or days of research to learn what that combination of errors means and to reach a resolution.
The Wishlist for Network Visibility Automation
If you are responsible for the network and your network devices know of a problem, yet you don't, you have a recipe for professional embarrassment.
If a switch is dropping 12% of its packets due to FCS errors and symbol errors, yet the engineers are unaware of this due to their monitoring software not collecting or analyzing this information, then you have to wait for a user to complain before investigation starts. At this point, you're searching for a needle in a haystack.
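As a rough illustration of the arithmetic behind a figure like 12%, the sketch below turns two polls of an interface's counters into an error percentage. It is a minimal sketch, not how any particular product works: the `InterfaceSample` fields, the sample values, and the assumption that the packet counter excludes errored frames are all hypothetical.

```python
from dataclasses import dataclass

COUNTER32_MAX = 2**32  # SNMP Counter32 values wrap around at 2^32


@dataclass
class InterfaceSample:
    """One poll of an interface's counters (field names are illustrative)."""
    in_packets: int  # frames received successfully (assumed to exclude errors)
    fcs_errors: int  # frames discarded for a bad frame check sequence


def counter_delta(previous: int, current: int, modulus: int = COUNTER32_MAX) -> int:
    """Difference between two counter readings, allowing for a single wrap."""
    return (current - previous) % modulus


def error_percent(before: InterfaceSample, after: InterfaceSample) -> float:
    """Percentage of frames in the polling interval that failed the FCS check."""
    good = counter_delta(before.in_packets, after.in_packets)
    bad = counter_delta(before.fcs_errors, after.fcs_errors)
    total = good + bad
    return 100.0 * bad / total if total else 0.0


# Made-up readings taken one polling interval apart.
before = InterfaceSample(in_packets=1_000_000, fcs_errors=500)
after = InterfaceSample(in_packets=1_088_000, fcs_errors=12_500)
print(f"{error_percent(before, after):.1f}% of frames had FCS errors")
# prints "12.0% of frames had FCS errors"
```

The point is that the raw counters already sit on the device. Unless something collects them and does this subtraction, nobody sees the percentage.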
A more modern solution would have these features:
- Automatically query devices for all of their performance and health metrics in a sensible fashion - don't flood the device with a ton of queries, but do a continuous slow collection.
- Include a heuristics analysis engine that assembles the pieces together to arrive at plain-English answers to detected problems.
- Eliminate the tool's care and feeding:
  - Make the database and supporting libraries fully integrated/compiled into the solution (no more "we need to upgrade the database, but that breaks the collector, so we need to upgrade each collector in sequence")
  - Remove dependencies on specific versions of Java or .NET libraries
- Be incredibly scalable:
  - Additional servers do not need to be deployed
  - Reduce the operational cost by reducing the number of CPU cores and RAM needed
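
The "continuous slow collection" item in the list above can be pictured as spreading queries evenly across a polling cycle rather than firing them all at once. The following is only a sketch under assumed names: `paced_schedule`, `run_cycle`, the device names, and the 60-second cycle are illustrative and not taken from any specific tool.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple

Query = Tuple[str, str]  # (device, metric); both values are placeholders


def paced_schedule(queries: Iterable[Query],
                   cycle_seconds: float) -> Iterator[Tuple[float, Query]]:
    """Yield (offset_seconds, query) pairs spread evenly across one cycle."""
    items = list(queries)
    if not items:
        return
    gap = cycle_seconds / len(items)
    for index, query in enumerate(items):
        yield index * gap, query


def run_cycle(queries: Iterable[Query],
              cycle_seconds: float,
              poll: Callable[[str, str], None],
              sleep: Callable[[float], None] = time.sleep,
              clock: Callable[[], float] = time.monotonic) -> None:
    """Issue each query at its scheduled offset instead of in one burst."""
    start = clock()
    for offset, query in paced_schedule(queries, cycle_seconds):
        wait = start + offset - clock()
        if wait > 0:
            sleep(wait)
        poll(*query)


queries = [
    ("switch-1", "ifInErrors"),
    ("router-1", "ifInErrors"),
    ("switch-1", "ifInOctets"),
    ("router-1", "ifInOctets"),
]
for offset, (device, metric) in paced_schedule(queries, cycle_seconds=60):
    print(f"t+{offset:4.1f}s  {device}  {metric}")
```

With four queries and a 60-second cycle, each device sees one request every 15 seconds instead of a burst at the top of the minute. A real collector would also need timeouts and per-device rate limits, which this sketch leaves out.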
You'd want a solution that tells you exactly what is broken in your environment so you can easily find and fix problems, as well as proactively fix problems before users are even aware of them.
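To make "tells you exactly what is broken" concrete, here is a deliberately tiny rule that maps raw error counts to a plain-English finding. It is a hypothetical sketch: the `diagnose_interface` function, the counter names, and the wording are invented for illustration, and the rule itself is only a common rule of thumb (FCS errors accompanied by symbol errors often indicate a physical-layer fault), not the logic of any real analysis engine.

```python
from typing import Dict, List


def diagnose_interface(name: str, counters: Dict[str, int]) -> List[str]:
    """Turn per-interval error counts into plain-English findings.

    The counter keys ("fcs_errors", "symbol_errors") are illustrative.
    """
    fcs = counters.get("fcs_errors", 0)
    symbol = counters.get("symbol_errors", 0)
    findings: List[str] = []
    if fcs > 0 and symbol > 0:
        # Rule of thumb: both error types together suggest a layer 1 problem.
        findings.append(
            f"{name}: FCS and symbol errors are increasing together, which "
            "usually points to a physical-layer fault. Check the cable, "
            "connectors, and transceiver."
        )
    elif fcs > 0:
        findings.append(
            f"{name}: frames are arriving corrupted without symbol errors. "
            "A duplex mismatch or a fault further upstream is worth checking."
        )
    return findings


for finding in diagnose_interface("Gi1/0/1", {"fcs_errors": 1200, "symbol_errors": 340}):
    print(finding)
```

A real engine would need many such rules plus thresholds and context, but even one rule shows the difference between reporting a counter and reporting a likely cause.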
When we decided to build TotalView, this was the exact same wishlist that we came up with. Rather than building a solution assuming that "engineers can set up monitoring of what they want", we decided to build a solution that automatically monitors and communicates "what does the network know that the supporting engineers should know".
This means that within minutes of deploying TotalView, you can learn more about your network's operation at a deeper and broader level than with any other solution.
Many network problems can be prevented if the right information about your network's performance and configuration is brought to bear.