Recently in Monitoring Category

Sitemorse is all about building your Web Confidence - and an important part of the latest version of our software is around web security and giving you the confidence to know you're not unwittingly linking to dangerous sites or 'malware'.

There were red faces in high places recently when a government-backed website designed to champion UK's start-up businesses inadvertently linked users to malware, according to security firm Sophos and the BBC.

Soaring malware loads and social networking scams were described as 'concerning' by Cisco in the company's latest Global Threat Report. Cisco collected more than 105,000 unique malware samples in March, showing a sharp rise over the previous quarter and a 46 per cent rise since January.

At Sitemorse, our job is to make you aware of dependencies, compliance issues and risks. While 'phishing' and malware links are not always the work of a malicious agent (we recently found at least one client linking to a 'suspect' site inadvertently, simply by mistyping a web address), these 'dodgy' sites try hard to mimic their genuine counterparts in the appearance of the URL.

Here are some areas where you might need our help:

• When you embed code that links to third-party websites in your pages (for example, links via advertising banners), you are relying on the third party being responsible for its own security. Suppose the target site that the banner link takes you to were replaced with a link to a malicious piece of software, seemingly carrying your seal of approval?

• Not all website owners have the time and resources to moderate user comments to blog articles or in community areas. Should a link to malware or a phishing website creep in, our tools can immediately alert you to the problem.

Our software guards you in the following ways:

• If we find links on a page that are suspected of leading to a phishing or malware address, or links to a known staging server, we will publish the details on the 'Site Links Inventory and Review' page inside your Version 7 Sitemorse report.

• This is a comprehensive page grouping links by their hostname (the web address), as well as identifying new links that have not been encountered before, links that were listed in the previous report and finally, links we found in the previous report that we did not find this time.

• For links suspected of being phishing or malware we provide the key information you need - including the line of code on which each link appeared and a link to Snapshot to view the page as it was when we tested it.

Using this information you can quickly see which external sites you are dependent upon. To make it even easier we list the links to external sites in three categories: those that are links to sub domains of the URL we ran the test on, those that use a name or URL that is similar to that of your site, and finally, all other external links outside of your domain or organisation.
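To make the three categories concrete, here is a minimal sketch of how such a grouping could be done. It is not the Sitemorse implementation: the function name `classify_link`, the use of a string-similarity ratio to spot look-alike names, and the 0.8 threshold are all assumptions made purely for illustration.

```python
from difflib import SequenceMatcher
from urllib.parse import urlsplit


def _strip_www(host: str) -> str:
    """Drop a leading 'www.' so that it does not inflate the similarity score."""
    return host[4:] if host.startswith("www.") else host


def classify_link(link: str, site_host: str, threshold: float = 0.8) -> str:
    """Return 'subdomain', 'similar' or 'other' for a link found on the site.

    `threshold` is an arbitrary illustrative value, not a Sitemorse setting.
    """
    host = _strip_www((urlsplit(link).hostname or "").lower())
    base = _strip_www(site_host.lower())
    # Category 1: the tested host itself or any sub domain of it.
    if host == base or host.endswith("." + base):
        return "subdomain"
    # Category 2: a different host whose name looks very much like ours.
    if SequenceMatcher(None, host, base).ratio() >= threshold:
        return "similar"
    # Category 3: everything else outside the domain.
    return "other"
```

With `www.example.com` as the tested site, `shop.example.com` lands in the first group, a look-alike such as `www.examp1e.com` in the second, and an unrelated host such as `www.bbc.co.uk` in the third. The second group is the one to look at first, since a near-miss hostname is exactly what a mistyped address or a phishing site tends to produce.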

How does the Sitemorse Version 7 software do it?

• As we scan your site, we run each and every off-site link against the Google Safe Browsing database. This list is used by Firefox and Google Chrome to protect millions of users every day from suspected phishing and malware pages. Our own cache is continually updated from Google so that it always contains the very latest known phishing and malware sites.

• To detect links to staging servers we compare each off-site link with those on a list of known staging and development servers; this list is also kept up-to-date and personalised - we will ask for a list of your internal server URLs during the implementation of your service.
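The two checks above can be sketched in a few lines. This is an illustration under stated assumptions, not the Version 7 code: the real Google Safe Browsing data is not a plain list of hostnames, so the `unsafe_hosts` set here simply stands in for a locally cached copy, and `staging_hosts`, `LinkCollector` and `flag_links` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect (line_number, absolute_url) for every <a href> on a page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                line, _column = self.getpos()  # position of the tag in the source
                self.links.append((line, urljoin(self.page_url, href)))


def flag_links(html, page_url, unsafe_hosts, staging_hosts):
    """Return (line, url, reason) for each off-site link found on a list."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    site_host = urlsplit(page_url).hostname
    findings = []
    for line, url in collector.links:
        host = urlsplit(url).hostname
        if host is None or host == site_host:
            continue  # only off-site links are checked
        if host in unsafe_hosts:  # stand-in for the cached Safe Browsing data
            findings.append((line, url, "suspected phishing/malware"))
        elif host in staging_hosts:  # stand-in for the personalised staging list
            findings.append((line, url, "staging server"))
    return findings
```

Each finding carries the line number of the offending link, which is the kind of detail you need in order to locate and fix the link quickly.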

Embarrassment all round as a government-backed website designed to champion the UK's start-up businesses has inadvertently linked users to malware, according to security firm Sophos and the BBC.

They should have talked to us first, as the new version of Sitemorse software - Sitemorse V7 - protects our clients against just these kinds of issues, scanning links for malware, virus links and 'phishing', as well as making sure links and domains are secure against would-be saboteurs.

StartUp Britain, launched on Monday, apparently linked to a page hosting fake anti-virus programs.

A BBC online news story cited two other recent examples of this type of problem:

  • Last month, the London Stock Exchange hosted booby-trapped adverts that asked visitors to download similar fake security software. 
  • And this week, music streaming service Spotify apologised after 'malverts' were served to some of its users. 

So-called 'malvertising' is a growing problem for businesses, but smart users can help guard against it.

Links:

Sitemorse website

BBC article: Government-backed website in malware scare

 

We are always coming across comments from prospective customers about the vast capabilities of CMS products. Mostly it's about how they don't need to use Sitemorse because their CMS produces error free, accessible and standards compliant pages.  Most of us know that isn't the case and that, however good a CMS is, there will still be problems to resolve that Sitemorse is ideally suited to help with.

Recently I've seen performance mentioned as something that certain CMS vendors claim they can help monitor.  Some CMS vendors (for example, PaperThin and Sitecore) provide built-in reporting for determining time-to-render for various content elements.  Now that seems very strange to me.  The biggest problem I have with the claims is that render time is only a small part of the performance of a Webpage, and something that is running within your infrastructure can't get a "REAL" picture of the performance an end user of your site is experiencing.  You need to be out there on the internet to experience that.  So for me they could, at best, only tell you a small part of the story - the same view you'd get by looking at the performance of your website from your office PC.

The reality is that your website visitors don't live in your Data-centre and so aren't sitting with a high speed LAN connection to your Webservers.  They use a variety of technologies to connect to your website all of which can have an influence on THEIR experience of the performance of your website.

[Image: Website performance.png]

Diagram courtesy of Gomez Inc. 

As you can see, the further you get from your Data-centre (where your CMS is located), the more potentially dodgy technologies - like mobile devices or dial-up connections - come into play, and the more issues arise with regard to both performance and availability.  So, just as a CMS can't offer you the whole solution to your quality and compliance issues, it can't offer you the whole story when it comes to measuring your website's performance.

I spent a day at the eCommerce exhibition and conference last week. One of the companies exhibiting was a leading provider of Website monitoring services with a world wide presence and monitoring points located all around the world.  I decided to have a long discussion with them about some of the doubts I have about the benefits of such a service and was handed over to the technical guru manning the stand.

Now, in the past, I've worked for companies that have focused on Website performance on various levels including companies that resold similar products. I was never really convinced of the benefits of monitoring from multiple points around the globe and even less convinced of the benefits of monitoring from different networks within a country and was very sceptical of the benefits of monitoring from different networks in countries other than where you are hosted.

Why ?

Well for me information is king. And if you're an organisation that provides information to a customer it is imperative that the information that you give is useful and timely. So, in the context of all this remote monitoring from all these different points, how does the data stand up to my tests of useful and timely ? (And let's always keep in mind that most organisations only deal with clients within their own country or those countries which border them)

In my discussions on their stand I created a few scenarios, firstly:- Your site is hosted in the UK and you have a monitoring service that monitors from locations around the world. The data provided tells you that the performance of the requests from their Point of Presence (PoP) on the SingTel network in Singapore is 30% slower than other PoPs in the region.  What are you going to do with this piece of information ?

Their answer to this was that you could decide whether the number of clients you had in Singapore warranted having a mirrored site located in or around Singapore or using Edge Services from someone like Akamai.

Well that's a fair answer. But once I've found out that information and acted on it do I need to continue monitoring from Singapore every 15 minutes ?  Probably not.  And indeed when I know what my typical response times are from these remote locations why do I need to monitor more than a couple of times a day ?  Or maybe just test it for a couple of hours every month ? Or why bother at all ?

So I ventured a scenario that is relevant to the vast majority of companies in the UK.  If they monitor from PoPs on six different UK networks and you're hosted on the PSI network and you are told that the PoP on the BT network reports performance 30% slower than the others - what should you do ?

Now this is where the answers slipped into the Pythonesque. The suggestion was that you could contact your clients and suggest they switch to a different network !  They did agree that this was an unlikely outcome but pointed out that at least you were in a position to show you were aware of the issue and proffer a solution.  Rolling this back to the Singapore situation - if the response time from the PoP on the alternate network in Singapore performs as well as the other PoPs in the region would hosting your site locally benefit local users as much as you thought ?

So how did they stand up to my test ?  Well some of the information provided is of some limited use but it failed the timely criterion as I don't need it repeated incessantly.  And the UK information was totally irrelevant as there is nothing I can do about it.  So it's neither useful nor timely.

What most organisations are looking for is to know how their key pages perform and if the site is down.  For most organisations in the UK you can do this quite adequately from a single PoP on a single network. 

I then asked about the detailed level of information where for every item on the page you get timings for every aspect of the communication between the browser and the Website. So you get DNS lookup time, connection time, time to first byte, content download time etc.  It's kind of useful. But DNS lookup time is mostly going to be very small and certainly less than 100 ms. Likewise connection time. In fact, when he was showing me the system, I saw that it doesn't bother to show any figures less than 10 ms, and looking at one of the example sites they were monitoring, a large percentage of the times fell into this category, leaving only time to first byte and content download time of any significance. So why bother with all the other data ? Why not just present it under exceptional circumstances where the times are significantly higher than the norm ? No sensible answer was forthcoming. But perhaps I just gave him their next big idea for a new feature !
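For what it's worth, "only show me the exceptions" is not hard to express. The following is a rough sketch of the idea, not any vendor's feature: the function name, the use of the median as "the norm", the factor of 3 and the 10 ms floor are all assumptions chosen for illustration.

```python
from statistics import median


def exceptional_phases(history, sample, factor=3.0, floor_ms=10.0):
    """Return only the timing phases in `sample` that are well above their norm.

    `history` is a list of earlier samples, each a dict of phase -> milliseconds.
    A phase is reported when it is at least `floor_ms` and more than `factor`
    times the median of its past values (both cut-offs are illustrative).
    """
    flagged = {}
    for phase, value in sample.items():
        past = [h[phase] for h in history if phase in h]
        if not past:
            continue  # no norm to compare against yet
        norm = median(past)
        if value >= floor_ms and value > factor * norm:
            flagged[phase] = (value, norm)
    return flagged
```

Fed with a history where DNS lookup and connection time sit at a few milliseconds, a new sample is reported only if, say, time to first byte jumps from around 190 ms to 950 ms. The routine sub-10 ms figures never reach the screen.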

The things you can actually have an influence on are your time to first byte (by making sure your servers are performing well under load) and content download time (by making sure you have adequate bandwidth to cope with your peak levels of demand). I'd suggest that any other info is surplus to requirements.

We often talk to customers and prospects about 3rd party code or content on their website.  Whether it's like the SiteMeter product mentioned below, a 3rd party company hosting some of their content, or simply a link to a 3rd party site, when things go wrong they can go disastrously wrong.

Imagine waking up and your site is inaccessible for no apparent reason. If this happens, site owners could spend a ridiculous amount of time trying to figure out what the problem is. Well welcome to that reality. Thousands of site owners experienced this on August 2nd when SiteMeter brought their sites to a halt.

Some very popular websites learned the hard way that placing third-party code in your website's pages is a liability.

SiteMeter, a widely-used "counter and statistics tracker," made some changes to its "back-end system" that made it impossible for visitors using certain versions of Internet Explorer to load pages containing the SiteMeter code. When users would visit any sites using Sitemeter, they would be presented with an error message pop-up:
 

Internet Explorer cannot open the Internet site http://www.sitename.com

 

Operation aborted

SiteMeter has since resolved the issue and in a blog post (http://weblog.sitemeter.com/2008/08/02/sitemeter-ie-issues-resolved/ - now off-line), explained the situation and apologised. That probably, however, didn't do much to quell the anger of webmasters using SiteMeter.

A simple "back-end" change made by a third-party whose JavaScript code you use on your pages, for instance, can render your website unusable to a large percentage of your visitors, as it did for SiteMeter users.

The SiteMeter problem highlights the fact that when you place third-party code in your website's pages, you are potentially making your website's operation wholly dependent on a third party.  Likewise if you use a 3rd party to host some of your content, as ITV did for their CEO's podcast on their financial year end reports.  Sadly the 3rd party relocated the content without telling ITV, resulting in an IE "404 Not found" message - not a great impression to give your potential investors or shareholders.

While some focus on the way companies like SiteMeter respond to incidents like this, pragmatic webmasters will always take control of their own fate and make decisions that reduce their vulnerability.  Those not as tech savvy as some of us were probably hit the hardest as they searched for a solution to a problem that they couldn't readily identify.

Unfortunately, this is a difficult task when it comes to third-party code because it's hard not to use third-party code on your website today. For instance, many of our clients use at least one service like SiteMeter or Google Analytics.

But you can take steps to mitigate your risk of falling victim to a SiteMeter-like problem.

Here are some tips:

  • Don't use any third-party code that isn't absolutely needed. While some types of services (like analytics) are often too expensive to bring in-house, avoid using third-party services that you don't need when they require you to put code in your website's pages. I see a lot of websites that are making use of cutesy "widgets" that are superfluous. That's risky and you need to recognise the risks.
     
  • Compare different services. If you need some solution that requires you to place third-party code in your website's pages, compare the services that provide that solution. Have any experienced major problems in the past? What are the policies of each? Don't hesitate to contact the services and ask them how they maintain the integrity of their code and what testing they do before rolling out new code.
     
  • Consider free versus paid. While SiteMeter does have a premium service in addition to its free service, a lot of webmasters prefer to use free services for obvious reasons.

    Yet a worthwhile piece of legal advice is this - when you have paid a third-party to provide a service, you have a better ability to recover damages in the case that the third-party fails to deliver or harms you because of some negligent act. I'm not litigious but if your business depends on your website, you don't want to be dependent upon third-parties that have no real legal obligations to you.
     
  • Monitor your website. While automatically catching errors generated by buggy JavaScript can be difficult, you should be monitoring your website, as monitoring will pick up most problems that your users see and will allow you to be proactive in resolving issues rather than realising there's a problem only when your revenue falls away.

After some recent questions about how best to configure an alert recipient we added the following article to our Knowledge Base: http://www.sitemorse.com/kb.html?q=1320212048

I've discussed at length the different types of monitoring and why you might consider using them  http://blog.sitemorse.com/2008/06/why-monitor-websites.html and in my posting on monitoring in today's eCommerce market http://blog.sitemorse.com/2008/06/website-monitoring-in-todays-h.html I discussed our new Heartbeat Monitoring service.

I am pleased to say that Heartbeat is now well into the Beta Testing phase and doing well.

Heartbeat is designed to monitor websites for availability.  In today's highly competitive and therefore high pressure market website managers need to know as soon as their websites are down or performing so badly that they appear to be down.  Existing Monitors typically run at intervals of several minutes, leaving a window when the site is down but the monitor hasn't picked it up yet.

So we need to rethink the way we monitor Websites.  Firstly, we need to monitor availability so that we are informed pretty much as soon as the website fails, which means monitoring far more often than, say, every 5 minutes.  The problem with just increasing the frequency of traditional monitors is twofold: the additional load the monitor puts on the website, and the amount of analysis and recording a traditional monitor does, which would put too high a load on the servers running the monitors.

When we looked at this we decided that the only approach was to create an availability monitor whose sole purpose was to check that a website was able to serve up content.  It wouldn't check the performance.  It wouldn't check that all the images appeared OK.  It wouldn't check that links on the page worked.  And it wouldn't record loads of statistics.  It would simply make sure that the website responded to a GET request for the Base Page by serving it up.  That way the monitor could run at very high frequency levels of, say, every 15 seconds without putting an undue load on the website or on the monitor servers.  It's difficult to increase the frequency levels beyond this because, once you issue a GET request, you need to wait a reasonable amount of time for the Website to respond before deciding it's down; otherwise you risk generating false positives when the website has merely slowed down rather than gone down.  It's a fine balance between alerting too quickly and waiting too long before deciding it's down.

When Heartbeat hits a problem it immediately runs another test to ensure that the problem is real and then alerts all those people that are defined to the system as Alert Recipients for a "Site Down" situation via their defined means - either SMS or email.
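The test, re-test and alert cycle described above can be sketched roughly as follows. This is not the Heartbeat code; `fetch_ok`, `heartbeat`, the 10 second timeout and the alert wording are assumptions for illustration, and `notify` stands in for whatever sends the SMS or email.

```python
import http.client
import urllib.request


def fetch_ok(url, timeout=10.0):
    """Return True if a GET for `url` is answered successfully within `timeout`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, http.client.HTTPException):
        # Covers refused connections, DNS failures, timeouts and HTTP errors.
        return False


def heartbeat(url, fetch=fetch_ok, notify=print):
    """Run one heartbeat cycle: test, re-test on failure, alert if both fail."""
    if fetch(url):
        return True
    if fetch(url):  # immediate second test to make sure the problem is real
        return True
    notify(f"Site Down: {url}")
    return False
```

A scheduler would call `heartbeat` for the Base Page every 15 seconds or so. Note that the timeout is the "reasonable amount of time" in the balance described earlier: set it too short and a slow site raises false alarms, set it too long and the alert arrives late.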

Match this with a monitor that runs against the key landing pages on your sites, say, every 15 minutes and checks performance, functionality, accessibility, code quality, spelling, metadata, PDFs etc. and you have a monitoring system that addresses today's needs in a fully integrated and familiar system.

We'll be announcing Heartbeat shortly so keep your eyes peeled for announcement emails, Newsletter items and probably a call from your Account Manager.

Monitoring systems have been around for as long as there have been on-line systems.

In my view they serve two purposes:-

  • to alert you, as soon as possible, if the monitored application fails
  • to identify if response times are slowing to an unacceptable level

Traditionally this has been done by one and the same monitor.  The monitor would record lots of information about the profile of the performance it measured so you could analyse the data and generate trend graphs to see if there was a general increase in average response times.  In essence it helped, a little, to predict performance and helped, a little, in capacity planning. Typically these monitors would run every 5 or 10 minutes, which was fine when monitoring internal applications.  Their main role was identifying performance degradation, because if the system actually went down you'd pretty soon get an irate phone call.  What the monitor was meant to do was identify a situation where performance was degrading so you could investigate the problem as it was happening.  If the problem was that the machine was maxed-out there wasn't much you could do about it, but it gave you the chance to identify what was using all the resources, in case there was a problem with the code or database schema that could be improved.

We work in a different environment these days.  Even internal systems have a much larger customer-facing role and therefore performance and availability are much more important than they were.  Everyone must have experienced the frustration of dealing with a call centre where performance has been slow and there are long periods of embarrassed silence while they wait for the next screen to pop up with the info you want, or they ask you if they can call you back as their system is down at the moment.  Both these circumstances could lose you, the customer, to the competition.

With the Web you just don't get a second chance:

  • 85% of respondents said they would have reservations about buying from a business with a poor quality Website
  • 30% of respondents refuse to give even their favourite Websites a second chance if they hit problems
  • 72% of respondents said they are unlikely to buy from a Website with poor performance

So we need to rethink the way we monitor Websites.  Firstly, we need to monitor availability so that we are informed pretty much as soon as the website fails, which means monitoring far more often than every 5 minutes.  The problem with just increasing the frequency of traditional monitors is twofold: the additional load the monitor puts on the website, and the amount of analysis and recording a traditional monitor does, which would put too high a load on the servers running the monitors.

When we looked at this we decided that the only approach was to create an availability monitor whose sole purpose was to check that a website was able to serve up content.  It wouldn't check the performance.  It wouldn't check that all the images appeared OK.  It wouldn't check that links on the page worked. It wouldn't record loads of statistics.  It would simply make sure that the website responded to a GET request for the Base Page by serving it up.  That way it could run at very high frequency levels of, say, every 15 seconds without putting an undue load on the website or on the monitor servers.  It's difficult to increase the frequency levels beyond this because, once you issue a GET request, you need to wait a reasonable amount of time for the Website to respond before deciding it's down; otherwise you risk generating false positives when the website has merely slowed down rather than gone down.  It's a fine balance between alerting too quickly and waiting too long before deciding it's down.

Match this with a monitor that runs against your key landing pages on the site, say, every 15 minutes and checks performance, functionality, accessibility, code quality, spelling, metadata, PDFs etc. and we have a monitoring system that addresses today's needs in a fully integrated and familiar system.

Heartbeat Monitor is in the final stages of testing and should go into Beta testing on the first sites this week.

There seems to be a growing confusion about how best to monitor Websites.

There are three main categories of monitors out there:

  1. Pure availability monitors
  2. Availability and Performance monitors
  3. Availability, Performance and Quality Assurance monitors

Simply put -

type 1 will do little more than Ping your Web server to make sure it's up and running
type 2 will issue GET requests and make sure your website can serve up content and that the performance is acceptable
type 3 will do what the type 2 monitors do but will also check the quality of the web page to make sure it Functions, is Standards Compliant and meets Accessibility Standards.

Most of the main players in the market offer type 2 - Availability and Performance monitoring.  They'll check a given URL to make sure it responds to requests and that the responses perform acceptably.  ("Acceptable" performance can usually be defined by the customer as a threshold, and alerts are sent if it is exceeded.)  They will also make a lot of noise about how they can test the designated URL from multiple locations in your choice of geographic regions.

The level of detail provided goes all the way down to the DNS lookup time for a GET request for an image on the page.  So I can check the performance of individual  GET requests from multiple locations around the world and then analyse the data and identify any "slow" spots.

I've been working in the Website performance arena since 1996.  I've worked with companies that have resold these services.  I have to tell you that I'm just not convinced by the arguments.

Let's take my example of the level of detail that's available.  I can measure the DNS lookup of individual items on one of my web pages from multiple locations around the world.

The DNS lookup is done by the DNS resolver used by the machine sending the GET request.  It takes longer if this is the first request for the hostname in a while, as the resolver won't have the answer cached.  So when you issue the first GET request for the URL it may take, say, 2 seconds to do the lookup.  That's quite an interesting piece of information, but as the monitor service is going to issue the same request 5-10 mins later the DNS lookup will probably be much quicker the next time as it'll be cached.  (Of course a different resolver may be used this time so it may be 2 seconds again; however, pretty soon all the resolvers nearest the server issuing the GET requests will "recognise" the hostname and resolve it to the IP address almost instantly.)  So this information very quickly becomes meaningless.
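The caching effect is easy to picture with a toy model. The sketch below is not a real DNS client: `TtlCache`, the injected `slow_lookup` function and the injected `clock` are all hypothetical, and the address is a documentation-only placeholder. It simply shows an answer being remembered until its time-to-live (TTL) runs out.

```python
class TtlCache:
    """Toy model of a resolver cache: remember each answer until its TTL expires."""

    def __init__(self, ttl_seconds, slow_lookup, clock):
        self.ttl = ttl_seconds
        self.slow_lookup = slow_lookup  # the expensive, uncached lookup
        self.clock = clock              # returns the current time in seconds
        self.entries = {}               # name -> (address, expiry time)

    def resolve(self, name):
        """Return (address, served_from_cache)."""
        now = self.clock()
        entry = self.entries.get(name)
        if entry is not None and entry[1] > now:
            return entry[0], True       # fast path: answer still cached
        address = self.slow_lookup(name)
        self.entries[name] = (address, now + self.ttl)
        return address, False           # slow path: had to look it up
```

One caveat the model makes visible: whether the monitor's next request is a fast cache hit depends on how the TTL compares with the monitoring interval. If the record expires before the monitor comes round again, the slow lookup is simply repeated, which tells you about the DNS record's settings rather than about your website.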

The data being sent back from your Web server in London to the Monitoring Server located in, say, Los Angeles will not take the same route each time.  Some of these routes may have different latencies or a greater error rate, causing a greater proportion of data packets to be re-sent and thus making some responses take longer than others.  Again, quite an interesting piece of information, but what on earth are you going to do about it ?

In fact I'd argue that testing from multiple parts of the world is full of interesting information but is telling you about things that you have minimal control over.  You can't ring up your ISP and ask them to make sure that data you send out to a request from Ulan Bator bypasses Uzbekistan's routers because they are a bit slower than the ones in Kazakhstan.  (please don't assume that I'm sad enough to have checked these suggested routings !)

Even if you bring this closer to home, what are you going to do if you find that responses are marginally slower from a monitor making calls from a point of presence on the BT network compared to one on the Cable and Wireless network when you're hosted on L3 ?

Analysing things down to this level of granularity means that each monitoring run takes a while and gathers a relatively large amount of data.  So most of the vendors can only offer a monitoring frequency of every 5 minutes.  That number hasn't changed in a decade yet the importance of your website has increased by many orders of magnitude.  This level of delay before being made aware of a problem is becoming less acceptable.
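To put a number on that delay, here is a back-of-envelope calculation. It rests on simplifying assumptions of mine: checks start at fixed intervals, the site fails at a random moment between two checks, and a failed check uses up its full timeout before the alert goes out. The 10 second timeout is purely illustrative.

```python
def detection_delay(interval_s, timeout_s):
    """Return (average, worst_case) seconds between a failure and the alert.

    Assumes checks start every `interval_s` seconds, the failure happens at a
    uniformly random moment between two checks, and the failing check waits
    the full `timeout_s` before the site is declared down.
    """
    average = interval_s / 2 + timeout_s
    worst_case = interval_s + timeout_s
    return average, worst_case


# A 5 minute monitor with a 10 second timeout.
print(detection_delay(300, 10))  # prints (160.0, 310)
```

So with a 5 minute interval you hear about an outage after more than two and a half minutes on average, and after more than five minutes at worst. The interval dominates the result, which is why shortening it matters far more than shaving the timeout.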

The information you get from a website monitor should be about those things that YOU can influence and in a time frame that allows you to maximise your availability.

You need to focus your view on what you should be monitoring for - what's actually important to your business.

It's all very well having a web page with startlingly fast response and download times but if a partially sighted person can't read it, or it doesn't render in Firefox version 3 or the link to an important page doesn't work - well you might as well have not bothered. 

Sitemorse's current monitoring service offers type 3 coverage.  So it'll give you a view of your availability and performance (alerting you by email or SMS if there's a problem) but will also run all the Sitemorse tests against the page and allow you to set thresholds against them and generate alerts.  That way you'll know that the web pages are available, performing well, are functioning, are abiding by coding standards, are accessible, meet your metadata requirements, are free from spelling mistakes etc. etc.

Watch out for announcements over the coming weeks about new monitoring options from Sitemorse.  But just to let you know - if you want to measure your performance from Ulan Bator I'm afraid you'll be disappointed by our announcements !
