Understand Your Web Stats
Tom Arah investigates the art and science of web stats analysis.
Web analysis is essential to help you to understand your site and to keep it growing.
There’s no question that the Web is an amazing publishing medium, but at first sight it seems to have one major drawback. Post your website and it’s immediately accessible to a massive global audience but, by default, you have no feedback on just how many readers you are really attracting: it could be hundreds of thousands or none! If you don’t do something about it, in many ways it’s like publishing into a black hole. And if you don’t know who is reading your site and how, you don’t have the feedback you need to make your site better and so drive the figures higher.
Of course there are solutions. The most obvious is to add a “hit counter” to your pages to record just how many visitors have accessed them. This counting involves work on the server side so, unless for example you’re using FrontPage and your host supports the FrontPage extensions, you’ll need a bit of external help from one of the many free third-party solutions. All it takes is a quick Google search to find a provider and then adding a small snippet of their code to your pages.
But hold on. Things are not as straightforward as they seem. To begin with, all these systems involve hitting an external server to retrieve the graphics that represent the digits on your counter, which can slow down the responsiveness of your site or, worse, result in missing images that look totally unprofessional. Come to think of it, even when they do work, most counters still look dreadful. More to the point, why should your end users be interested in your site’s apparent popularity in the first place, especially as most counter systems let you cheat by setting a starting figure? And if the figure is worryingly low, there’s no doubt that many users will take this as a cue to look elsewhere. To top it all, you’ll usually find that the free service is just a taster for a paid version or, worse, that the counter is accompanied by a small ad so that, by trying to measure your visitors, you actually end up losing them, along with page views and hits!
Despite these disadvantages, it’s so dispiriting to have absolutely no idea about your site’s usage that many authors will still decide to go down the counter route. If you do, don’t assume that all systems are the same. With www.statcounter.com, for example, you have the option of completely invisible tracking. As well as saving your site from ugly counters and irrelevant ads, this has other advantages. In particular it enables surprisingly advanced stats with detailed breakdowns of page loads, unique and repeat visitors, popular pages and referring links; you can even drill down to see how an individual user moves through your site. Best of all, because you have to visit the StatCounter site to view your stats, the service is able to pay for itself by selling advertising on its own site rather than on yours. Another alternative is to join an advertising scheme such as Google AdSense or any number of affiliate schemes: you’re effectively gaining access to an automatic page counter as well as the chance to make some money.
Not all third-party counters are the same.
Really, though, there should be no need to rely on a third-party approach to monitor your stats, as a whole host of information is automatically recorded by your hosting server in its log files. To an extent this data depends on how the server hosting your site is set up, but all the main systems record the date and time (plus offset from GMT), the requesting computer’s IP address (eg 220.127.116.11), the file requested, the status/error code (eg 200 for a success, 404 for a missing file) and the number of bytes involved in each transaction. In addition, the majority of servers also record the address of the referring page and details of the user agent, ie the end user’s operating system and browser version. And, crucially, all this information is recorded for every single request for data that your site receives.
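To make that concrete, here’s a minimal Python sketch of pulling those fields out of a single log line. It assumes the widely used Apache/NCSA “combined” log format (the common format plus referrer and user agent); your host’s server may well use a different layout, such as IIS’s W3C extended format, in which case the pattern would need adjusting. The parse_line function and the sample line are purely illustrative, not taken from any particular analyzer.

```python
import re
from datetime import datetime

# Assumption: Apache/NCSA "combined" format. The referrer and user agent
# fields are optional so that plain "common" format lines also match.
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it doesn't match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    # The timestamp carries its offset from GMT, eg "10/Oct/2004:13:55:36 +0100"
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    # "-" means no body was sent (eg a 304 Not Modified response)
    rec["bytes"] = 0 if rec["bytes"] == "-" else int(rec["bytes"])
    return rec

sample = ('192.0.2.10 - - [10/Oct/2004:13:55:36 +0100] '
          '"GET /articles/webstats.htm HTTP/1.1" 200 15320 '
          '"http://www.google.co.uk/search?q=web+stats" '
          '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"')
print(parse_line(sample)["path"])  # /articles/webstats.htm
```

Every figure discussed below, from bytes transferred to visitor sessions, is ultimately derived from records like this one.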
The result is reams of data (my daily log file is around 50MB) and it would take a very special kind of person to read through it directly. Clearly what is needed is for the data to be mined for the hard and meaningful information it contains, and these days most hosting providers do just that, either emailing the results to their customers or posting them for secure web browsing. Most hosts, but not all, so make sure that you put good web stats reporting near the top of your requirements list when looking for a provider.
Server log files are packed with invaluable information.
Assuming your hosting provider does supply stats, what are the figures to look out for and what exactly do they mean? The largest and most practically important number is the total number of bytes transferred, as it’s this bandwidth requirement that actually costs money. Obviously, though, it’s a figure that depends more on your site’s content, and in particular its mix of text, graphics and media files, than on the amount of traffic you’re generating. Instead, it’s the second-highest figure, the total number of hits, that grabs most attention. It’s this number that newspapers always report when describing the popularity of a site, and I have to admit that I’ve spent many happy hours boring friends with each month’s new figures.
It’s a slightly guilty pleasure, however, as the term “hit” is so misunderstood and misused. To the uninitiated it tends to mean either the number of visitors or the number of pages served, an idea reinforced by the third-party “hit” counters, which can’t actually measure hits at all. In fact each hit is simply a request received by your server. This means that a text-only page counts as one hit while a page containing 10 graphics (and with image-based table navigation that’s not unlikely) counts as 11. The figure isn’t only inflated and hugely variable; it’s also open to fixing. If you want to boost your hit count, all you need to do is add hundreds of 1-pixel transparent GIFs to your pages and your stats will soar.
On the other hand, you could argue that the total number of hits is actually a hopelessly low underestimate of your site’s real usage due to the effect of caching. When your visitor clicks through to a second page on your site, for example, all those navigation GIFs aren’t downloaded again but are instead retrieved from the browser’s cache. With the proxy servers used by larger organizations and major ISPs such as AOL, the effect of caching is even more dramatic: if another user has recently viewed a page from your site, your server won’t be troubled at all. Of course this is a good thing. Caching boosts the apparent responsiveness of your site and cuts your data transfer costs, and without it the Web would grind to a halt; some estimate that 40% of the Web’s total hits are cached in this way. However, it does mean that the number of hits can only be used as a broad indication of the amount of traffic your site generates.
So are there any more accurate and more useful figures to look out for? Compared to the deceptively shifting sands of hits, the total number of page impressions gives a significantly better indication of actual traffic and usage. But again “accurate” isn’t quite the word. On the positive side, each page impression (ie a hit on an HTML, ASP or PHP file etc) only counts once no matter how many graphics are involved, and the effect of caching is inherently lower, so there’s generally less scope for variability. However, there are still complicating factors, such as frames and iframes, that can inflate the total. More fundamentally, some sites choose to split large articles into screen-sized chunks, so the same content could generate one page impression on one site and many on another.
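The gap between hits and page impressions is easy to see in code. This hypothetical sketch takes the requests triggered by a single page view (one page plus ten navigation GIFs) and totals the bytes, hits and page impressions. Which file types count as a “page” is an assumption here; you’d extend PAGE_EXTENSIONS to suit your own site, and treating a bare directory request such as “/” as a page is a further assumption.

```python
# Assumption: these extensions (plus directory requests) count as "pages".
PAGE_EXTENSIONS = (".htm", ".html", ".asp", ".php")

def is_page(path):
    """True if a requested path looks like a page rather than an image etc."""
    path = path.split("?", 1)[0].lower()  # ignore any query string
    return path.endswith("/") or path.endswith(PAGE_EXTENSIONS)

def summarize(requests):
    """requests: (path, bytes) pairs. Returns total hits, page impressions and bytes."""
    hits = len(requests)  # every request received is a hit
    pages = sum(1 for path, _ in requests if is_page(path))
    total_bytes = sum(size for _, size in requests)
    return {"hits": hits, "pages": pages, "bytes": total_bytes}

# One page with ten navigation GIFs: 11 hits but only 1 page impression.
requests = [("/index.html", 8000)] + [(f"/nav/button{i}.gif", 500) for i in range(10)]
print(summarize(requests))  # {'hits': 11, 'pages': 1, 'bytes': 13000}
```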
Rather than hits or page impressions, maybe it’s better to concentrate on visitors. After all, it should be easy to track the log file’s IP address information, and the results certainly aren’t so easily fixed. Again, though, it’s not as straightforward as you might think. The main problem is the dynamic IP allocation that most ISPs employ, which means that the same IP address can represent multiple different visitors at different times. And of course, even where the IP address is fixed, the same individual can, and hopefully will, return repeatedly, so over time each IP address will represent multiple separate visits.
So can any meaningful visitor information be extracted? The bottom line is the total number of unique IP addresses recorded in the stats, each of which represents a distinct host served and at least one separate visitor. In fact even this isn’t completely accurate, as a proportion will be search engine spiders rather than real people, but it’s the lowest headline figure, the least variable and the hardest to fix, so it’s the number potential advertisers are most interested in. (It’s also the figure to ask for to take the wind out of the sails of any web bore banging on about their millions of hits.) By its nature, though, it’s always going to be a serious underestimate of the real number of visitors and visits.
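Counting unique hosts is simply a matter of collecting distinct IP addresses. The sketch below also makes a crude attempt to exclude search engine spiders by looking for telltale words in the user agent string; the SPIDER_HINTS list is an assumption for illustration only, and real analyzers rely on much fuller, regularly updated lists.

```python
# Assumption: a crude substring test for spiders; real analyzers use fuller lists.
SPIDER_HINTS = ("bot", "spider", "crawl", "slurp")

def unique_hosts(records, exclude_spiders=True):
    """records: (ip, user_agent) pairs. Counts distinct IP addresses,
    optionally ignoring requests that look like they came from spiders."""
    ips = set()
    for ip, agent in records:
        if exclude_spiders and any(hint in agent.lower() for hint in SPIDER_HINTS):
            continue
        ips.add(ip)
    return len(ips)

records = [
    ("192.0.2.1", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"),
    ("192.0.2.1", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"),
    ("192.0.2.7", "Mozilla/5.0 (Windows; U; Windows NT 5.1) Gecko/20041107 Firefox/1.0"),
    ("198.51.100.5", "Googlebot/2.1 (+http://www.google.com/bot.html)"),
]
print(unique_hosts(records))                         # 2
print(unique_hosts(records, exclude_spiders=False))  # 3
```

Note that the repeated requests from 192.0.2.1 collapse to a single host, which is exactly why this figure underestimates visits.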
The main traffic stats cover data transfer, hits, pages and visitors.
For this reason another figure, representing the total number of visitors/sessions, is often used. This is determined by clocking the time as well as the IP address for each request and, whenever there is a gap of more than 30 minutes between requests from the same IP address, recording this as a new visit. To my mind this industry-standard half-hour gap is too short, as it’s such common practice to shift-click on a link to explore one browsing avenue in a new window, then to close that window and continue where you left off. Having said that, the fact that the figure is almost certainly an overestimate helps balance the underestimating effect of proxy server caching!
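The 30-minute rule is straightforward to apply once requests are grouped by IP address. This minimal sketch assumes each request has already been reduced to an (IP address, timestamp) pair; making the gap a parameter lets you see how sensitive the visit total is to the threshold.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def count_sessions(requests, gap=timedelta(minutes=30)):
    """requests: iterable of (ip, datetime) pairs. A new session starts
    whenever the same IP address has been idle for more than `gap`."""
    times_by_ip = defaultdict(list)
    for ip, t in requests:
        times_by_ip[ip].append(t)
    sessions = 0
    for times in times_by_ip.values():
        times.sort()
        sessions += 1  # the first request from an IP always opens a session
        for prev, cur in zip(times, times[1:]):
            if cur - prev > gap:
                sessions += 1
    return sessions

t0 = datetime(2004, 10, 10, 9, 0)
requests = [
    ("192.0.2.1", t0),
    ("192.0.2.1", t0 + timedelta(minutes=10)),
    ("192.0.2.1", t0 + timedelta(minutes=55)),  # 45-minute gap: a new session
    ("192.0.2.9", t0 + timedelta(minutes=5)),
]
print(count_sessions(requests))  # 3
```

Lengthening the gap to an hour with count_sessions(requests, gap=timedelta(minutes=60)) merges the first visitor’s two visits into one and gives 2, which is exactly the kind of variation described above.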
In many ways, the more that you look into the main web statistics, the less clear they become, and it’s important to realize that you aren’t dealing with hard, accurate figures. However, when you understand where the main figures for data transfer, hits, page impressions and visitors come from, and therefore what they mean (and what they don’t mean), they do provide an invaluable insight into how your site is performing.
To really get the benefit from web stats, though, you need to go deeper and analyze the log files yourself. To do this you’re first going to need to get your hands on the log files themselves (preferably in ZIP format to save massively on download time), and again this is something to check that your hosting provider offers. You’re also going to need an analyzing package. There is no shortage of options here, ranging from freeware (check out the admirable WebLog Expert Lite from www.weblogexpert.com, which can even work with zipped logs) through to solutions such as WebTrends costing thousands of pounds. The main package I use is Mach5 Analyzer (previously FastStats Analyzer) from www.mach5.com (included on the cover CD), which these days comes in three variations: a free taster version limited to 5000 lines, the main Regular version for $99 and a Gold version for $199, which adds a number of in-depth capabilities as well as the ability to export reports to HTML.
So, bearing in mind that it’s dealing with exactly the same log file data, what more information can a dedicated analyzing package provide beyond the main totals for data, hits, page impressions and visitors (and the averages derived from them)? The answer is an almost unbelievable array of new statistics that really do reveal the inner workings of your site. With Mach5 Analyzer this extra information is organized into five main sections (other analyzers tend to follow broadly similar schemes) accessed through the hierarchical Report Listing panel running down the left of the screen.
The level of detail can be amazing – here the session details per referrer.
The first section is Access Statistics, where you can investigate the bandwidth, hits, page and visitor figures broken down per day, with the hits further broken down per day of the week and hour of the day, so that you can see how your site usage varies over time (in each case the figures are presented as graphs by default to help you visualize underlying trends). You can also view lists of the most requested individual pages, files, images and directories, which is invaluable as it shows you what your visitors are actually interested in rather than what you assume they’re interested in. It also lists the least requested pages, images and so on, which is again useful as these are often orphaned files that need to be tidied up. Crucially, you can also track the usage of any particular page or file, or group of pages or files, by specifying these as part of your analysis project profile before you generate the report.
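The day-of-week and hour-of-day breakdowns and the most-requested lists are essentially tallies over the same records. Here’s a sketch using Python’s Counter, assuming each hit has already been reduced to a (timestamp, path) pair; day names come from a fixed tuple rather than the system locale so the labels are predictable. It illustrates the idea rather than how Mach5 Analyzer computes its reports.

```python
from collections import Counter
from datetime import datetime

DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")  # fixed names avoid locale differences

def access_breakdowns(records, top_n=3):
    """records: (timestamp, path) pairs, one per hit.
    Returns hits per weekday, hits per hour and the top_n most requested paths."""
    by_weekday = Counter(DAYS[t.weekday()] for t, _ in records)
    by_hour = Counter(t.hour for t, _ in records)
    top_paths = Counter(path for _, path in records).most_common(top_n)
    return by_weekday, by_hour, top_paths

records = [
    (datetime(2004, 10, 11, 9, 15), "/index.html"),  # a Monday
    (datetime(2004, 10, 11, 9, 40), "/reviews/photoshop.htm"),
    (datetime(2004, 10, 12, 21, 5), "/index.html"),  # Tuesday
    (datetime(2004, 10, 16, 21, 30), "/index.html"),  # Saturday
]
by_weekday, by_hour, top_paths = access_breakdowns(records, top_n=1)
print(top_paths)  # [('/index.html', 3)]
```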
The second major category is Visitor Information. Here Mach5 Analyzer uses Smart DNS (Domain Name System) lookup and caching to resolve actual domain names from the numerical IP addresses in your log files, so that you can see which domains access your site most regularly. It also uses DNS lookup and its own IP address database to break down your site visitors geographically, though again this should only be seen as a useful indicator rather than as completely accurate (a .co.uk domain is clearly British, but a .com or .net could be based anywhere). More useful are the breakdowns of operating system and especially of web browser (where this information is present in your log files), as these are invaluable when deciding whether the majority of your site visitors are ready for new functionality, such as CSS-based formatting or positioning, that only the more recent browsers support.
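Browser and operating system breakdowns come from matching substrings in the user agent string. The classifier below is a deliberately crude, hypothetical sketch, and the order of the tests matters: Firefox’s user agent also mentions Gecko and Safari’s mentions “like Gecko”, so the more specific names are checked first. Dedicated analyzers use far more thorough, regularly updated databases.

```python
from collections import Counter

# Assumption: simple (substring, label) tables, checked in order; first match wins.
BROWSERS = [("Opera", "Opera"), ("MSIE", "Internet Explorer"), ("Firefox", "Firefox"),
            ("Safari", "Safari"), ("Gecko", "Mozilla/Netscape")]
SYSTEMS = [("Windows", "Windows"), ("Mac", "Mac OS"), ("Linux", "Linux")]

def classify(agent, table):
    """Return the label of the first substring found in the user agent, else 'Other'."""
    for needle, label in table:
        if needle in agent:
            return label
    return "Other"

def agent_breakdown(agents):
    """agents: user agent strings. Returns (browser counts, operating system counts)."""
    browsers = Counter(classify(a, BROWSERS) for a in agents)
    systems = Counter(classify(a, SYSTEMS) for a in agents)
    return browsers, systems

agents = [
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.7.5) Gecko/20041107 Firefox/1.0",
    "Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/125.5 (KHTML, like Gecko) Safari/125.9",
]
browsers, systems = agent_breakdown(agents)
for label, count in browsers.most_common():
    print(f"{label}: {100 * count / len(agents):.0f}%")
```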
The third major new category of information is Referrer Information (again assuming your log files record such information). Here you will find lists of the most common referring domains and referring pages while the Individual Page Referrers statistics list the most linked-to pages on your site and who linked to them (the entries are live so that by clicking on them you can quickly see what other sites are saying about yours). Naturally the most common referrers will be search engines and their performance is covered separately along with lists of the most common keywords and phrases that led users through to your site.
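Referrer analysis boils down to splitting each referring URL into its domain and, for search engines, its query string. This sketch assumes the search phrase lives in a “q” parameter (as with Google) or a “p” parameter (as with Yahoo!); other engines use other names, so treat SEARCH_PARAMS as an assumption to extend. Links from your own domain are skipped as internal navigation rather than counted as external referrers.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Assumption: query parameters that carry the search phrase; extend as needed.
SEARCH_PARAMS = ("q", "p")

def referrer_stats(referrers, own_domain):
    """Count external referring domains and the search phrases found in their URLs."""
    domains, phrases = Counter(), Counter()
    for ref in referrers:
        if not ref or ref == "-":
            continue  # no referrer recorded (typed URL, bookmark etc)
        url = urlparse(ref)
        host = url.netloc.lower()
        if host == own_domain or host.endswith("." + own_domain):
            continue  # internal navigation within your own site
        domains[host] += 1
        query = parse_qs(url.query)  # also decodes "+" and %-escapes
        for param in SEARCH_PARAMS:
            if param in query:
                phrases[query[param][0].lower()] += 1
                break
    return domains, phrases

refs = [
    "http://www.google.co.uk/search?q=Web+Stats&hl=en",
    "http://search.yahoo.com/search?p=web+stats",
    "http://www.example.com/links.html",
    "http://www.designer-info.com/index.html",
    "-",
]
domains, phrases = referrer_stats(refs, "designer-info.com")
print(phrases.most_common())  # [('web stats', 2)]
```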
Most mundane but perhaps most immediately useful is the fourth additional category, Technical Information, which lists any requests that are generating errors. The most likely are the common 404 errors indicating missing pages or page elements. These are usually the result of mistyped links and can be easily fixed, as Mach5 Analyzer also provides a list of 404 page referrers so that you can track them down (again, clicking on an entry opens the offending page in your browser).
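Finding broken links is a matter of filtering the log for 404 status codes and grouping by the referring page. Here’s a minimal sketch, assuming records already reduced to (status, path, referrer) tuples; requests with no referrer (typed or bookmarked URLs) are skipped, as there’s no linking page to correct.

```python
from collections import Counter

def broken_links(records):
    """records: (status, path, referrer) tuples.
    Counts 404 errors per (missing path, referring page) pair."""
    return Counter(
        (path, referrer)
        for status, path, referrer in records
        if status == 404 and referrer not in ("", "-")  # skip requests with no referrer
    )

records = [
    (404, "/img/logo.gif", "http://www.example.com/page.html"),
    (404, "/img/logo.gif", "http://www.example.com/page.html"),
    (404, "/oldpage.htm", "-"),
    (200, "/index.html", "http://www.google.com/"),
]
for (path, referrer), count in broken_links(records).most_common():
    print(f"{count} x {path} linked from {referrer}")
```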
The fifth section of statistics, the HyperLink TreeView, is unique to Mach5 Analyzer and shows the benefits of its live data approach compared with the direct output to HTML that most analyzing software takes. Essentially this lets you graphically view the aggregated traffic flow across your site, showing you which parent pages led through to the currently selected target page and which child pages visitors left it for, along with the percentages of total traffic in each case. Click on any parent or child page, or type a page name into the TreeView Toolbar, and this becomes the new central target page, with the display and percentages updating accordingly.
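Mach5’s own method isn’t documented here, but the basic idea behind a parent/child view can be sketched from session paths: for a chosen target page, count which pages immediately preceded it and which immediately followed it, then turn the counts into percentages. This hypothetical sketch assumes sessions have already been reconstructed (for example with the 30-minute rule) as ordered lists of pages; visits that start on the target page contribute no parent.

```python
from collections import Counter

def link_tree(sessions, target):
    """sessions: lists of pages in the order viewed. Returns the % of arrivals
    at `target` from each parent page and the % of departures to each child page."""
    parents, children = Counter(), Counter()
    for pages in sessions:
        for prev, nxt in zip(pages, pages[1:]):
            if nxt == target:
                parents[prev] += 1
            if prev == target:
                children[nxt] += 1

    def as_percent(counter):
        total = sum(counter.values())
        return {page: round(100 * n / total, 1) for page, n in counter.items()} if total else {}

    return as_percent(parents), as_percent(children)

sessions = [
    ["/index.html", "/reviews.html", "/reviews/photoshop.htm"],
    ["/index.html", "/reviews.html", "/reviews/indesign.htm"],
    ["/articles.html", "/reviews.html", "/reviews/photoshop.htm"],
    ["/index.html", "/reviews.html"],  # visitor left the site from the target page
]
parents, children = link_tree(sessions, "/reviews.html")
print(parents)   # {'/index.html': 75.0, '/articles.html': 25.0}
print(children)  # {'/reviews/photoshop.htm': 66.7, '/reviews/indesign.htm': 33.3}
```

Calling link_tree again with a child page as the new target is the code equivalent of clicking on it to recentre the view.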
Mach5 Analyzer lets you track movement through your site.
Mach5 Analyzer’s HyperLink TreeView provides an amazing amount of detail about the movement of traffic through your site, but if you want even more, you need to upgrade to the Gold version. To begin with, this offers an entirely new category of statistics, Site “Stickiness”, which lists the pages most users arrive at and leave from, together with in-depth analysis of stay length, the number of page views per session, the number of sessions per user and so on. In addition, it lets you add tags to your HyperLink TreeView to monitor, for each node in the view, the percentage of visitors that visit a particular resource, enter on a particular page or arrive from a particular referrer (great for tracking the success of an advertising campaign). You can take things even further with Mach5 Analyzer Gold’s excellent Scenario capability, which lets you track the percentage progress of your users through specific stages of your site, say hitting your home page, then the special offers page and then the buy page.
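Entry and exit pages, pages per session and simple funnel tracking can be sketched along the same lines. In this hypothetical example each session is an ordered list of pages, and funnel reports the percentage of all sessions that reach each stage in order, allowing other pages in between. That “in order, with detours allowed” rule is an assumption for illustration, not a description of how Mach5’s Scenario feature actually works.

```python
from collections import Counter

def stickiness(sessions):
    """Return entry page counts, exit page counts and average pages per session."""
    entries = Counter(s[0] for s in sessions if s)
    exits = Counter(s[-1] for s in sessions if s)
    avg_pages = sum(len(s) for s in sessions) / len(sessions) if sessions else 0.0
    return entries, exits, avg_pages

def funnel(sessions, stages):
    """Percentage of all sessions reaching each stage in order (detours allowed)."""
    reached = [0] * len(stages)
    for pages in sessions:
        stage = 0  # index of the next stage this session needs to hit
        for page in pages:
            if stage < len(stages) and page == stages[stage]:
                reached[stage] += 1
                stage += 1
    total = len(sessions)
    return [round(100 * n / total, 1) if total else 0.0 for n in reached]

sessions = [
    ["/index.html", "/offers.html", "/buy.html"],
    ["/index.html", "/offers.html"],
    ["/index.html", "/about.html"],
    ["/offers.html", "/buy.html"],  # skipped the home page, so never enters the funnel
]
entries, exits, avg_pages = stickiness(sessions)
print(funnel(sessions, ["/index.html", "/offers.html", "/buy.html"]))  # [75.0, 50.0, 25.0]
```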
All told, the sheer scope of web stats analysis is extraordinary, offering everything from the big picture of overall site traffic right down to the fine detail of users’ movement through your site. When you realize just what information is available, and how to go about uncovering it, interpreting it and using it, it becomes clear that the ability to monitor not just who is visiting your site but how they are using it is actually another huge advantage of web publishing. In fact the biggest danger is that you can be so drowned in information that you’ll have little time left to update the site!