Summary

Web Analytics Tutorial

 

Appendix B – Technical Details of Metrics Accuracy

IN THIS APPENDIX
* Limitations of Metrics Accuracy
   Visit Detection
* Proxies, Caches and Firewalls
   Proxies
   Caches
   Controlling Caching
   Firewalls
   Proxy Sharing
* Validity of Agent Data
   User Agents
   Referrers
   Hosts
   Validity of Reports
* Visit Time Issues
   View Time
   Visit Duration
* Advanced Solutions
   Cookies
   Session Keys in URLs
   Client-side applets

Limitations of Metrics Accuracy

Throughout the lessons we have qualified our statements about the validity of metrics, often with references to caches and proxies. Several times we have stated that the numbers given do not necessarily correlate to any real-world values. Because of the way the web was designed, and because of the systems installed to make it more efficient, the values that we track and the metrics that we report may not correspond to the real-world quantities that you want to know. When reading web analytics reports, you should understand both the definitions of the metrics and their limitations with regard to the quantities you are trying to measure.

For example, while we can count hits or requests for advertisements on the server, this may not be the same as the number of impressions the advertisement had. We cannot count the exact number of impressions because, when a visitor browses your site, your server may never get a request for the content she views. The HTTP protocol (which is how web browsers and web servers communicate) involves distinct requests for individual items from a server. This is why a “request” for the home page of your site, for example, can result in multiple requests (or hits) to the server: one for the page and additional requests for each graphic or other item referenced by it.
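The difference between hits and page requests can be illustrated with a minimal sketch. The request paths and the rule for deciding what counts as a page (the `PAGE_EXTENSIONS` tuple and the `is_page` helper) are assumptions made for this example; they are not how Summary classifies requests.

```python
from urllib.parse import urlsplit

# Hypothetical request paths, as a server might log them for ONE view
# of the home page: the page itself plus each item it references.
requests = [
    "/index.html",
    "/images/logo.gif",
    "/images/banner.gif",
    "/images/button.gif",
]

# Assumption for this sketch: only these extensions (or a trailing
# slash) identify a page rather than an embedded item.
PAGE_EXTENSIONS = (".html", ".htm")


def is_page(path: str) -> bool:
    """Return True if the path looks like a page rather than a graphic."""
    clean = urlsplit(path).path  # drop any query string
    return clean.endswith("/") or clean.endswith(PAGE_EXTENSIONS)


hits = len(requests)
page_hits = sum(1 for path in requests if is_page(path))

print(f"hits: {hits}, page hits: {page_hits}")
```

One page view by the visitor produces four hits here but only one page hit, which is why the two units should not be confused.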

In order to improve performance, all modern browsers implement a technique called caching. The images that you get when you go to your home page are often the same ones you get on other pages of the site. The browser detects this and, rather than asking for the image again across the (possibly slow) Internet connection, grabs a copy that it saved on your computer. This makes it much faster to load subsequent pages from the website. In fact, most users depend on this feature to make web surfing manageable and enjoyable. Major Internet Service Providers, to improve performance even more, add caching on their service. This affects requests from all of their users, across sessions, as we will discuss shortly.

However, this means that your web server only sees the request for the graphic the first time it is used. A typical visit from a user may span four or five pages, and load ten graphics (rather than the forty or fifty that would be loaded without caching). Remembering our discussion of units in Lesson 1, you may think “OK, but Page Hits are what’s important for visitor analysis – graphics hits only count for bandwidth, which will be accurate.” The sticky part comes when a visitor navigates to a page that has already been viewed (by hitting the back button, for example). The browser loads the cached version of the page, and you never get any indication that he viewed it again. If you are counting advertising impressions or want to know how many home page views you had, you are not getting a complete count.
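The effect of a browser cache on what the server sees can be sketched with a small simulation. This is a deliberately simplified model: it assumes the browser never re-requests anything it has already fetched, and the page names, image names, and the `server_log` function are all hypothetical.

```python
def server_log(visit, use_cache):
    """Return the list of URLs that actually reach the server.

    `visit` is a list of (page_url, [image_urls]) tuples in viewing order.
    With `use_cache` set, any URL fetched earlier is served locally and
    never appears in the server's log.
    """
    cache = set()
    logged = []
    for page, images in visit:
        for url in [page, *images]:
            if use_cache and url in cache:
                continue  # served from the local copy; no request is made
            logged.append(url)
            cache.add(url)
    return logged


# Eight navigation graphics shared by every page (hypothetical names).
shared = [f"/img/nav{i}.gif" for i in range(8)]

visit = [
    ("/index.html", shared + ["/img/home.gif"]),
    ("/products.html", shared + ["/img/products.gif"]),
    ("/index.html", shared + ["/img/home.gif"]),  # visitor hits the back button
    ("/contact.html", shared),
]

uncached = server_log(visit, use_cache=False)
cached = server_log(visit, use_cache=True)

print(len(uncached), uncached.count("/index.html"))  # 39 requests, 2 home page views
print(len(cached), cached.count("/index.html"))      # 13 requests, 1 home page view
```

In this model the visitor viewed the home page twice, but with caching the server logs it only once, which is exactly the undercount described above.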

Visit Detection

As we mentioned in Lesson 1, there is no absolute way of knowing which requests make up a single visit to your site. Each request made by a browser is an independent transaction. In order to track visits, Summary uses an advanced analytics algorithm that estimates which requests each visitor made. To define a visit, Summary looks for all requests from a given host (or cluster, where appropriate) that are bounded by 30 minutes of inactivity. This means that if a visitor browses your site, then leaves his computer for a half hour and comes back and browses again, it will be counted as two visits. On the other hand, if the user leaves his computer for 25 minutes, it will be counted as a single visit. You can set the number of minutes of inactivity that must bound a visit in the Summary configuration, but no matter what you choose there may be instances where what Summary counts as two visits, a visitor may consider one (or vice versa).
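The inactivity rule can be sketched as follows. This is a minimal illustration of timeout-based visit detection, not Summary's actual algorithm: the `count_visits` function, the host names, and the choice to treat a gap of exactly 30 minutes as a new visit are assumptions, and host clustering is ignored.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def count_visits(requests, timeout=timedelta(minutes=30)):
    """Count visits per host from (host, timestamp) request records.

    A new visit starts with a host's first request, or whenever the gap
    since that host's previous request is at least `timeout`.
    """
    visits = defaultdict(int)
    last_seen = {}
    for host, when in sorted(requests, key=lambda r: r[1]):
        previous = last_seen.get(host)
        if previous is None or when - previous >= timeout:
            visits[host] += 1
        last_seen[host] = when
    return dict(visits)


start = datetime(2002, 4, 16, 9, 0)
requests = [
    ("host-a", start),
    ("host-a", start + timedelta(minutes=5)),
    ("host-a", start + timedelta(minutes=40)),  # 35-minute gap: second visit
    ("host-b", start),
    ("host-b", start + timedelta(minutes=25)),  # 25-minute gap: same visit
]

print(count_visits(requests))
# {'host-a': 2, 'host-b': 1}
```

Passing a different `timeout` changes the counts for the same log data (a 20-minute timeout would split host-b into two visits), which shows why no single setting matches every visitor's own sense of a visit.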

On top of that, because some requests are never received by the server (they are answered from a cache), it is conceivable that a visitor could “browse” your site entirely out of his cache for a half hour or more and register two visits when he never left your site at all. It is also possible that the visitor browsed your site all day from his cache and never made a single request to your server, in which case no visit will be registered in your logs at all. Finally, some users keep windows open to multiple sites and switch between them. This can affect the visit count because there could easily be half-hour gaps between logged requests from such ‘multitasking’ users.




Copyright 2002 by Summary.Net - Updated 16.Apr.2002