Network monitoring provides valuable information about your servers’ performance.

To get the most out of it, you need to understand the performance metrics that it reports.

Some metrics apply to any server that delivers data; others apply only to specific server types, such as SQL and Web servers.

Any server has maximum rates at which it can send and receive data. This is often called “bandwidth,” but a more precise term is channel capacity. (Bandwidth, strictly speaking, is an analog property: the width of a frequency range, measured in hertz.) No matter what you do, you won’t send and receive data faster than the channel capacity. It’s rare that you’ll even come close to it.

The highest data rates that you actually achieve are a more meaningful metric. Many factors affect the practical maximum: the other devices on the network, the quality of the transmission medium, the software in use, the block size, the degree of disk fragmentation, and so on. If data rates decrease over time, it’s worth investigating the cause. If the data rate often approaches the channel capacity, the server’s network hardware or connection may be limiting its performance.
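
As a rough illustration of comparing achieved data rates with channel capacity, the sketch below turns two readings of a cumulative byte counter into an average rate and a utilization figure. The function names, the sample counter values, and the 1 Gbit/s capacity are assumptions made for the example, not values reported by any particular monitoring tool.

```python
def throughput_bps(bytes_start: int, bytes_end: int, interval_s: float) -> float:
    """Average data rate, in bits per second, between two byte-counter samples."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return (bytes_end - bytes_start) * 8 / interval_s


def utilization(rate_bps: float, capacity_bps: float) -> float:
    """Fraction of the channel capacity that the measured rate uses."""
    if capacity_bps <= 0:
        raise ValueError("capacity_bps must be positive")
    return rate_bps / capacity_bps


# Hypothetical samples: the counter grew by 750 MB over 10 seconds
# on a link with an assumed capacity of 1 Gbit/s.
rate = throughput_bps(1_000_000_000, 1_750_000_000, 10.0)
print(f"{rate / 1e6:.0f} Mbit/s, {utilization(rate, 1e9):.0%} of capacity")
# prints: 600 Mbit/s, 60% of capacity
```

Tracking this figure over days or weeks is what makes a gradual decline, or a rate that keeps brushing against capacity, visible.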

Latency is a separate issue; it measures how long the server’s response takes to reach the requester. For some applications, such as one-way streaming, latency isn’t important. For real-time voice communication, action games, and remote desktops, keeping latency down is very important. No matter how fast the server’s data rate is, some applications just don’t work right if the latency is more than a small fraction of a second.
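
One way to check latency against what an application can tolerate is to look at the slowest responses as well as the average. The sketch below is only an illustration: the sample values, the 150 ms budget, and the choice of a nearest-rank 95th percentile are all assumptions, not recommendations from any specific tool.

```python
import math


def percentile(samples_ms, pct):
    """Nearest-rank percentile: the smallest sample that has at least
    pct percent of all samples at or below it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


LATENCY_BUDGET_MS = 150  # assumed limit for an interactive application

# Hypothetical round-trip times in milliseconds; one request was very slow.
latencies_ms = [12, 15, 11, 14, 250, 13, 16, 12, 14, 13]

mean_ms = sum(latencies_ms) / len(latencies_ms)
p95_ms = percentile(latencies_ms, 95)
over_budget = [x for x in latencies_ms if x > LATENCY_BUDGET_MS]

print(f"mean={mean_ms:.1f} ms, p95={p95_ms} ms, "
      f"over budget: {len(over_budget)}/{len(latencies_ms)}")
# prints: mean=37.0 ms, p95=250 ms, over budget: 1/10
```

The mean looks comfortable here, yet one request in ten would have been a noticeable stall in a voice call or remote desktop session.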

The count of requests per second indicates how busy the server is. In itself, it doesn’t tell you much; as long as the server is able to handle that many requests per second, it’s fine. If periods of poor performance are associated with more than a certain number of requests per second, a more powerful server may be necessary.
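
To relate load to poor performance, request timestamps can be grouped into one-second buckets and the busy seconds picked out. This is a minimal sketch; the timestamps and the threshold of three requests per second are invented for the example, and real timestamps would typically come from a server log.

```python
from collections import Counter


def requests_per_second(timestamps):
    """Count requests in each whole-second bucket.

    Timestamps are assumed to be non-negative seconds (e.g. Unix time).
    """
    return Counter(int(t) for t in timestamps)


def busy_seconds(counts, threshold):
    """Return the seconds, in order, whose request count exceeds threshold."""
    return sorted(sec for sec, n in counts.items() if n > threshold)


# Hypothetical arrival times, in seconds since the start of the sample.
arrivals = [0.1, 0.5, 0.9, 1.2, 2.0, 2.1, 2.3, 2.8]

counts = requests_per_second(arrivals)
print(dict(counts))             # prints: {0: 3, 1: 1, 2: 4}
print(busy_seconds(counts, 3))  # prints: [2]
```

If the seconds flagged here line up with the periods of slow response, that supports the case for a more powerful server.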

The performance metrics for an SQL server get more technical, and experts debate which ones are the most useful. Here are a few which are often cited:

  • Buffer cache hit ratio. This is the ratio of requested data pages that are found in memory to total requested data pages. The ideal is 100% after initial cache loading, meaning no disk reads at all are required. That’s reachable only with databases that are small enough to fit in memory, or ones where a significant fraction of the data is never used. If the ratio is too low, though, performance will suffer.
  • Lock waits per second. A server sometimes needs to lock resources to resolve concurrent requests. The best performance happens when there are no lock waits. If the number is consistently high, the database may be badly designed. Response will be slow because one request has to wait for another one to finish.
  • Memory usage by ad hoc queries. A database that gets a lot of one-time queries isn’t running efficiently. If their memory usage is high, the cause may be poorly designed software accessing the database.
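
The first two metrics in this list are simple ratios and rates, which the sketch below works through. The function names and the counter values are hypothetical; a real database server exposes these figures through its own performance counters, and the exact names differ by product.

```python
def buffer_cache_hit_ratio(pages_from_memory: int, pages_requested: int) -> float:
    """Fraction of requested data pages that were found in memory."""
    if pages_requested <= 0:
        raise ValueError("pages_requested must be positive")
    return pages_from_memory / pages_requested


def lock_waits_per_second(count_start: int, count_end: int, interval_s: float) -> float:
    """Average lock waits per second between two samples of a cumulative counter."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return (count_end - count_start) / interval_s


# Hypothetical readings: 9,850 of 10,000 page requests served from memory,
# and 60 new lock waits recorded over a 60-second sampling interval.
print(f"hit ratio: {buffer_cache_hit_ratio(9_850, 10_000):.1%}")
# prints: hit ratio: 98.5%
print(f"lock waits/s: {lock_waits_per_second(1_200, 1_260, 60.0):.1f}")
# prints: lock waits/s: 1.0
```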

Website performance is vital to many businesses, and web server metrics can help to diagnose poor response. Again, there are many possible metrics, and these are just a few:

  • Average HTTP response time. This is the average amount of time to deliver a response to an HTTP request. What’s acceptable depends on how interactive the site is. A site that’s delivering static pages should have a response time of a fraction of a second. Responses to AJAX requests within a page need to be especially fast.
  • Average page response time. The response time for individual requests may be fine, but a site with excessively complicated pages could still be too slow. If a site can’t handle the average page request in a second or less, users may consider it slow. If the time gets up to three seconds or more, users will start giving up on the page.
  • Error rate. If a site returns error codes on more than 1% of the HTTP requests it gets, that suggests a problem. There could be dead links or buggy software.
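
The average response time and the error rate can both be derived from the same set of request records, as in the sketch below. The records are invented, and treating every status code of 400 or above as an error is an assumption for the example; some sites may prefer to count only 5xx responses.

```python
from statistics import mean

ERROR_RATE_LIMIT = 0.01  # the 1% guideline mentioned above


def summarize(records):
    """Return (average response time in ms, error rate) for a list of
    (status_code, response_time_ms) records."""
    if not records:
        raise ValueError("need at least one record")
    avg_ms = mean([ms for _, ms in records])
    # Assumption: any 4xx or 5xx status counts as an error.
    errors = sum(1 for status, _ in records if status >= 400)
    return avg_ms, errors / len(records)


# Hypothetical records: (HTTP status code, response time in milliseconds).
records = [(200, 120), (200, 300), (404, 50), (200, 180), (500, 1350)]

avg_ms, error_rate = summarize(records)
print(f"average: {avg_ms:.0f} ms, error rate: {error_rate:.1%}")
# prints: average: 400 ms, error rate: 40.0%
if error_rate > ERROR_RATE_LIMIT:
    print("error rate is above the 1% guideline")
```

With such a small sample the error rate is exaggerated; over thousands of real requests the same calculation gives a figure that can be compared meaningfully with the 1% guideline.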

Network monitoring helps your business to measure its servers’ performance and find ways to improve it. The more you understand about the metrics it reports, the more you can get out of it.