Network monitoring is a core requirement for the day-to-day operations of any ISP. Good performance monitoring lets you catch issues before they affect customers, respond quickly to outages, and keep historical time series data for tracking changes over time.
With Alpha|Stack, network monitoring comes integrated out of the box. That is in keeping with our core philosophy: everything your business needs to run smoothly should be available in one place, with information shared seamlessly among all relevant parties. Customer support representatives can see charts, graphs, alerts, and customer statuses without filing support requests with the network team, and the Alpha|Stack network monitoring system offers everything your core network team has come to expect, plus a whole lot more than they thought was possible!
Letting support staff view device statistics on the same platform where they do all their other work saves time, shortens calls, and improves customer satisfaction, but we’re here to talk about the technical parts. How does it work?
Meet the Alpha|Stack Poller: GoPoll!
The majority of network monitoring performed today relies on two protocols — SNMP and ICMP — to collect data about network performance. While there are other options out there like TR-069 and IPFIX, SNMP and ICMP have been the default options for a very, very long time.
ICMP allows us to collect data about the packet loss and latency of devices. While a variety of data points can be collected using ICMP, latency and packet loss tend to be the main ones. The Alpha|Stack poller uses ICMP (specifically, via fping) to collect these metrics, which lets us track the history of each device’s performance.
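To make the loss and latency numbers concrete, here is a minimal Go sketch. It is illustrative only and not GoPoll!'s actual code. It assumes the per-target summary format that `fping -C` prints, `host : rtt rtt - rtt`, where each field is a round-trip time in milliseconds and `-` marks a probe that got no reply. The `ProbeStats` type and `parseFpingLine` function are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// ProbeStats summarises one round of ICMP probes against a single host.
type ProbeStats struct {
	Host     string
	Sent     int
	LossPct  float64
	MinMs    float64
	MedianMs float64
	MaxMs    float64
}

// parseFpingLine parses a line shaped like `fping -C N` summary output,
// e.g. "10.0.0.1 : 1.20 1.35 - 1.28", where "-" marks a lost probe.
// Splitting on " : " (not ":") keeps IPv6 addresses intact.
func parseFpingLine(line string) (ProbeStats, error) {
	host, rest, ok := strings.Cut(line, " : ")
	if !ok {
		return ProbeStats{}, fmt.Errorf("unexpected line: %q", line)
	}
	stats := ProbeStats{Host: strings.TrimSpace(host)}
	var rtts []float64
	for _, field := range strings.Fields(rest) {
		stats.Sent++
		if field == "-" {
			continue // no reply for this probe
		}
		v, err := strconv.ParseFloat(field, 64)
		if err != nil {
			return ProbeStats{}, err
		}
		rtts = append(rtts, v)
	}
	if stats.Sent == 0 {
		return ProbeStats{}, fmt.Errorf("no probes in line: %q", line)
	}
	stats.LossPct = 100 * float64(stats.Sent-len(rtts)) / float64(stats.Sent)
	if len(rtts) > 0 {
		sort.Float64s(rtts)
		stats.MinMs = rtts[0]
		stats.MaxMs = rtts[len(rtts)-1]
		mid := len(rtts) / 2
		if len(rtts)%2 == 1 {
			stats.MedianMs = rtts[mid]
		} else {
			stats.MedianMs = (rtts[mid-1] + rtts[mid]) / 2
		}
	}
	return stats, nil
}

func main() {
	s, err := parseFpingLine("10.0.0.1 : 1.20 1.35 - 1.28")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s loss=%.0f%% min=%.2f median=%.2f max=%.2f\n",
		s.Host, s.LossPct, s.MinMs, s.MedianMs, s.MaxMs)
}
```

Here one of four probes was lost (25% loss), and the median of the three replies is 1.28 ms. The min and max are kept alongside the median because the spread between them matters as much as the middle value.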
The other metric we can calculate from this data is jitter. Jitter is the variation in delay between packets, and high or fluctuating jitter can cause problems for real-time communications. Voice is the most noticeable casualty, but jitter can also have a big impact on online gamers and anyone else using real-time applications that depend on a consistent flow of data.
In the Alpha|Stack graph shown above, the grey band shows the jitter over time, while the colored line shows the median latency. Looking only at the colored line, the connection appears very stable, but including the range of response times reveals significant variation at some points.
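Jitter has several working definitions. A simple one is the mean absolute difference between consecutive round-trip times. This sketch assumes that definition rather than taking it from Alpha|Stack; RFC 3550 (RTP), for instance, uses a smoothed interarrival jitter instead. The hypothetical `meanJitter` function applies it to two series with similar medians.

```go
package main

import (
	"fmt"
	"math"
)

// meanJitter returns the mean absolute difference between consecutive
// round-trip times (ms). Lost probes should be dropped before calling.
// Fewer than two samples yields 0.
func meanJitter(rtts []float64) float64 {
	if len(rtts) < 2 {
		return 0
	}
	var sum float64
	for i := 1; i < len(rtts); i++ {
		sum += math.Abs(rtts[i] - rtts[i-1])
	}
	return sum / float64(len(rtts)-1)
}

func main() {
	steady := []float64{20, 21, 20, 21, 20} // median 20 ms
	bursty := []float64{20, 45, 18, 60, 22} // median 22 ms
	fmt.Printf("steady jitter: %.1f ms\n", meanJitter(steady))
	fmt.Printf("bursty jitter: %.1f ms\n", meanJitter(bursty))
}
```

The medians differ by only 2 ms, but the mean jitter is 1 ms for the first series and 33 ms for the second. A median-only line hides exactly this kind of difference.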
SNMP (Simple Network Management Protocol) has been around for a very long time, and the overwhelming majority of ISP-level network devices support it. From a monitoring perspective, SNMP is typically used either to collect data from devices (SNMP polling) or to be notified of events (via SNMP traps). Alpha|Stack relies on SNMP polling to collect data about devices, and Alpha|Stack users can define whatever polling they need. With SNMP polling, it’s up to the device manufacturer which metrics are exposed: some give access to a very large array of data, while others limit what can be collected.
Typically, a device will expose some standard parameters (for example, the throughput and error rate of physical interfaces, CPU usage, or uptime) along with a variety of proprietary information relevant to the device in question. For example, a UPS may expose its remaining battery life, or whether or not it’s currently receiving power from the grid. Alpha|Stack allows users to both collect this data and alert on it.
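Here is a rough illustration of "collect and alert" for the UPS example. The Go sketch below evaluates hypothetical user-defined threshold rules against polled values. `AlertRule` and `Fires` are invented for this example, not Alpha|Stack's rule format. The OIDs come from the standard UPS-MIB (RFC 1628): `upsSecondsOnBattery` is 0 while the unit is on utility power, and `upsEstimatedChargeRemaining` is a percentage. Many UPS vendors expose similar data through their own MIBs instead.

```go
package main

import "fmt"

// AlertRule is a hypothetical user-defined rule: it fires when the polled
// value at OID compares against Threshold using Op ("<" or ">").
type AlertRule struct {
	Name      string
	OID       string
	Op        string
	Threshold float64
}

// Fires reports whether the rule triggers for the given polled values.
// A missing OID never fires here; a real system might alert on that too.
func (r AlertRule) Fires(values map[string]float64) bool {
	v, ok := values[r.OID]
	if !ok {
		return false
	}
	switch r.Op {
	case "<":
		return v < r.Threshold
	case ">":
		return v > r.Threshold
	default:
		return false // unknown operator
	}
}

func main() {
	rules := []AlertRule{
		// upsSecondsOnBattery (UPS-MIB): non-zero means running on battery.
		{Name: "UPS on battery", OID: "1.3.6.1.2.1.33.1.2.2.0", Op: ">", Threshold: 0},
		// upsEstimatedChargeRemaining (UPS-MIB): battery charge in percent.
		{Name: "UPS battery low", OID: "1.3.6.1.2.1.33.1.2.4.0", Op: "<", Threshold: 25},
	}
	polled := map[string]float64{
		"1.3.6.1.2.1.33.1.2.2.0": 120, // on battery for two minutes
		"1.3.6.1.2.1.33.1.2.4.0": 80,  // 80% charge left
	}
	for _, r := range rules {
		if r.Fires(polled) {
			fmt.Println("ALERT:", r.Name)
		}
	}
}
```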
GoPoll! also collects a number of other metrics automatically to drive things like network discovery and re-discovery. Our goal when collecting data is to make as few requests as possible, because requesting and receiving data from the network is the most time-intensive part of network monitoring. In practice, this means using SNMP bulk walks (GETBULK requests, as issued by tools like snmpbulkwalk) where we can, and sending multiple SNMP GET requests in parallel rather than one after another.
A key value for discovery is sysObjectID (1.3.6.1.2.1.1.2.0), which MIB-II (RFC 1213) defines as:
The vendor’s authoritative identification of the network management subsystem contained in the entity.
This value is allocated within the SMI enterprises subtree (1.3.6.1.4.1) and provides an easy and unambiguous means for determining "what kind of box" is being managed. For example, if vendor "ACME, Inc." was assigned the subtree 1.3.6.1.4.1.424242, it could assign the identifier 1.3.6.1.4.1.424242.1.1 to its "ACME-1000 Router".
Once we’ve determined the type of device, we can then collect more specific data from it to aid in providing more detailed information within Alpha|Stack.
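The first step in that identification is simple string work. The arc immediately after the SMI enterprises prefix 1.3.6.1.4.1 is the vendor's IANA Private Enterprise Number. The Go sketch below extracts it and looks up a polling profile. `enterpriseNumber` and the `profiles` map are hypothetical. Enterprise numbers 9 (Cisco) and 2636 (Juniper) are real IANA assignments, while 424242 is the fictional ACME, Inc. from the definition above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// enterprisesPrefix is the SMI enterprises subtree, 1.3.6.1.4.1, plus a dot
// so that e.g. "1.3.6.1.4.10" does not match.
const enterprisesPrefix = "1.3.6.1.4.1."

// enterpriseNumber extracts the Private Enterprise Number from a sysObjectID
// value such as "1.3.6.1.4.1.424242.1.1". It returns false if the OID is
// not under the enterprises subtree.
func enterpriseNumber(sysObjectID string) (uint64, bool) {
	oid := strings.TrimPrefix(sysObjectID, ".") // some tools print a leading dot
	if !strings.HasPrefix(oid, enterprisesPrefix) {
		return 0, false
	}
	arc, _, _ := strings.Cut(strings.TrimPrefix(oid, enterprisesPrefix), ".")
	n, err := strconv.ParseUint(arc, 10, 64)
	if err != nil {
		return 0, false
	}
	return n, true
}

// profiles maps enterprise numbers to hypothetical vendor polling profiles.
var profiles = map[uint64]string{
	9:      "cisco",   // ciscoSystems
	2636:   "juniper", // Juniper Networks
	424242: "acme",    // fictional ACME, Inc. from the MIB example
}

func main() {
	if pen, ok := enterpriseNumber("1.3.6.1.4.1.424242.1.1"); ok {
		fmt.Println(pen, profiles[pen])
	}
}
```

The remaining arcs (here `.1.1`) are chosen by the vendor, so telling individual models apart generally requires a per-vendor table rather than any universal rule.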
Back into Alpha|Stack
Once Alpha|Stack receives data from the poller, it has to process that data before displaying it. The basic metrics collected via SNMP and ICMP are aggregated and stored in an auto-scaling PostgreSQL database for presentation in the user interface.