Monitoring systems have been around for as long as there have been online systems.
In my view, they serve two purposes:
- to alert you, as soon as possible, if the monitored application fails
- to identify whether response times are slowing to an unacceptable level
Traditionally, both jobs were done by one and the same monitor. The monitor recorded lots of information about the performance it measured, so you could analyse the data and plot trend graphs to see whether average response times were generally increasing. In essence, it helped a little with predicting performance and a little with capacity planning. Typically these monitors ran every 5 or 10 minutes, which was fine for internal applications. Their main role was spotting performance degradation, because if the system actually went down you'd soon get an irate phone call anyway. What the monitor was really meant to do was flag that performance was degrading so you could investigate the problem while it was happening. If the machine was simply maxed out, there wasn't much you could do about it, but the monitor did give you the chance to find out what was using all the resources, in case there was a problem with the code or database schema that could be fixed.
These days we work in a very different environment. Even internal systems now play a much larger customer-facing role, so performance and availability matter far more than they used to. Most of us have experienced the frustration of dealing with a call centre where the system is slow: long, embarrassed silences while the agent waits for the next screen to pop up with the information you want, or a request to call you back because their system is down at the moment. Either situation can send the customer to a competitor.
With the Web you just don't get a second chance:
- 85% of respondents said they would have reservations about buying from a business with a poor-quality website
- 30% of respondents refuse to give even their favourite websites a second chance if they hit problems
- 72% of respondents said they are unlikely to buy from a website with poor performance
So we need to rethink the way we monitor websites. First, we need to monitor availability so that we are told almost as soon as the website fails, which means checking far more often than every 5 minutes. Simply increasing the frequency of a traditional monitor doesn't work, for two reasons: every check adds load to the website itself, and the amount of analysis and recording a traditional monitor does would put too high a load on the servers running the monitors.
When we looked at this, we decided the only approach was to create an availability monitor whose sole purpose was to check that a website could serve up content. It wouldn't check performance. It wouldn't check that all the images appeared correctly. It wouldn't check that the links on the page worked. It wouldn't record loads of statistics. It would simply make sure that the website responded to a GET request for the base page by serving it up. That way it could run very frequently, say every 15 seconds, without putting undue load on either the website or the monitor servers. It's difficult to go much faster than this: once you issue a GET request, you have to wait a reasonable time for the website to respond before deciding it's down; otherwise you risk false positives when the website has merely slowed down rather than gone down. It's a fine balance between alerting too quickly and waiting too long before deciding the site is down.
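The availability check described above can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not how Heartbeat Monitor itself is built: the names `base_page_is_up`, `HeartbeatState` and `run_heartbeat`, the 10-second timeout and the consecutive-failure counter are all hypothetical. Each check issues a plain GET for the base page and treats anything other than a 2xx response within the timeout as a failure; it records no timing statistics and ignores images and links.

```python
import http.client
import time
import urllib.request
from typing import Callable, Optional


def base_page_is_up(url: str, timeout_s: float = 10.0) -> bool:
    """Return True if a GET for the base page is served with a 2xx status.

    Deliberately minimal (an assumption of this sketch): no timing
    statistics, no image checks, no link checks.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            response.read()  # make sure the page body is actually served
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # URLError and HTTPError (4xx/5xx) subclass OSError, as do refused
        # connections and timeouts; malformed responses raise
        # HTTPException. All of these count as "down".
        return False


class HeartbeatState:
    """Hypothetical helper: count consecutive failures and decide when to alert."""

    def __init__(self, failures_to_alert: int = 1) -> None:
        self.failures_to_alert = failures_to_alert
        self.consecutive_failures = 0

    def record(self, is_up: bool) -> bool:
        """Record one check; return True only on the check that crosses the threshold."""
        if is_up:
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        return self.consecutive_failures == self.failures_to_alert


def run_heartbeat(
    url: str,
    interval_s: float = 15.0,
    timeout_s: float = 10.0,
    failures_to_alert: int = 1,
    alert: Callable[[str], None] = print,
    max_checks: Optional[int] = None,
) -> None:
    """Check url every interval_s seconds, forever unless max_checks is set."""
    state = HeartbeatState(failures_to_alert)
    checks = 0
    while max_checks is None or checks < max_checks:
        started = time.monotonic()
        if state.record(base_page_is_up(url, timeout_s)):
            alert(f"ALERT: {url} failed {state.consecutive_failures} check(s) in a row")
        checks += 1
        # Sleep for whatever is left of the interval, so checks start
        # roughly interval_s apart even when a request is slow.
        time.sleep(max(0.0, interval_s - (time.monotonic() - started)))
```

For example, `run_heartbeat("https://www.example.com/", failures_to_alert=2)` would check every 15 seconds and alert once per outage. The timeout is the knob the text describes; requiring more than one consecutive failure is an extra, assumed way to trade alert speed for fewer false positives, since each additional required failure delays the alert by roughly one interval.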
Pair this availability monitor with one that runs, say, every 15 minutes against the key landing pages on your site and checks performance, functionality, accessibility, code quality, spelling, metadata, PDFs and so on, and you have a fully integrated, familiar monitoring system that addresses today's needs.
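The landing-page monitor does far more work per run than the availability check. As a rough, hypothetical sketch of just two of the checks listed above (performance and metadata), the Python below times a full page download and scans the HTML for a `<title>` and a meta description using the standard library's `html.parser`. The function names, the 3-second threshold and the choice of metadata checks are assumptions for illustration; accessibility, spelling, code-quality and PDF checks would need much more machinery.

```python
import time
import urllib.request
from html.parser import HTMLParser
from typing import List, Tuple


class _MetadataScanner(HTMLParser):
    """Collect the <title> text and note whether a non-empty meta description exists."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.has_description = False
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; values may be None.
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            values = {name: (value or "") for name, value in attrs}
            if values.get("name", "").lower() == "description" and values.get("content", "").strip():
                self.has_description = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def audit_html(html: str) -> List[str]:
    """Return a list of metadata problems found in a page's HTML."""
    scanner = _MetadataScanner()
    scanner.feed(html)
    scanner.close()
    problems = []
    if not scanner.title.strip():
        problems.append("missing or empty <title>")
    if not scanner.has_description:
        problems.append("missing meta description")
    return problems


def check_landing_page(url: str, max_seconds: float = 3.0, timeout_s: float = 30.0) -> Tuple[float, List[str]]:
    """Fetch url, time the full download, and return (seconds, problems)."""
    started = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    elapsed = time.monotonic() - started
    problems = audit_html(html)
    if elapsed > max_seconds:
        problems.append(f"slow: {elapsed:.2f}s > {max_seconds:.2f}s")
    return elapsed, problems
```

Unlike the availability check, this sketch lets network errors propagate, on the assumption that outages are already the fast monitor's job and a failure here is worth a full stack trace in the slower monitor's logs.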
Heartbeat Monitor is in the final stages of testing and should go into beta testing on the first sites this week.