Summary

Web Analytics Tutorial

 

Lesson 8 – Examining Subsets of Traffic

IN THIS LESSON
* Introduction
* Filtering Visits
   Filtering by Referrer
   Filtering by Browser
   Request or Content Filters
   Filtering Authenticated Users
   Host Filtering
   Filtering by Cookie
   Multiple Filters
* Filtering Non-visitor Traffic
   Removing Robot Traffic
   Removing Employee Traffic
   Removing Automated Traffic
* Advanced Data Analysis
   What-if Scenarios
   Pivot Tables

Filtering Non-visitor Traffic

Removing Robot Traffic

Some of the traffic in your web logs is not generated by real visitors to your site. To make your reports reflect your visitors’ behavior more accurately, you may want to apply filters that remove particular traffic from the reports rather than isolate it. The most obvious of these unwanted hits is probably traffic from robots. While the Known Robots report can tell you when your site was last crawled by each search engine, beyond that the traffic patterns that robots generate do not really help you reach your business goals. In Summary, you can simply elect to exclude all robot traffic from all your reports (or from a particular subreport in Summary SP). You accomplish this by selecting a check box in the Filtering section of the configuration.
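To make the idea concrete, here is a minimal sketch of what a robot filter does conceptually. It is not Summary's implementation or configuration syntax; the marker list, the hit dictionaries and the function names are all assumptions made for illustration, and a real robot list would be far longer.

```python
# Hypothetical substrings that often appear in crawler user-agent strings.
# A real list would be much longer and would need regular maintenance.
ROBOT_MARKERS = ("googlebot", "slurp", "crawler", "spider")


def is_robot(user_agent: str) -> bool:
    """Return True if the user agent contains any known robot marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in ROBOT_MARKERS)


def remove_robot_hits(hits):
    """Keep only the hits whose user agent does not look like a robot."""
    return [hit for hit in hits if not is_robot(hit.get("user_agent", ""))]


# Two made-up log entries: one browser request and one crawler request.
hits = [
    {"path": "/index.html",
     "user_agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"},
    {"path": "/robots.txt",
     "user_agent": "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"},
]
visitor_hits = remove_robot_hits(hits)
```

After the filter runs, only the browser request remains in `visitor_hits`, which is the effect the check box is meant to have on your reports.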

Removing Employee Traffic

The other large group of traffic that does not represent your customers’ behavior is that of your own employees. Internal traffic can sometimes add up to a significant portion of your total traffic and can skew the analysis. Most companies have a single IP address (such as that of a firewall or proxy) through which all internal traffic travels. You can simply filter traffic to exclude this host. If you do not have a single point of Internet connection, or your web server is located inside your firewall, you can exclude the range of IP addresses assigned to your company or used on your internal network.
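The host and range exclusions described above amount to a membership test on the requesting address. The following sketch uses Python's standard ipaddress module; the addresses are placeholders (192.0.2.10 stands in for a firewall or proxy, 10.0.0.0/8 for an internal network), and the function name is hypothetical.

```python
import ipaddress

# Placeholder values: substitute your own proxy address and internal range.
INTERNAL_NETWORKS = [
    ipaddress.ip_network("192.0.2.10/32"),  # single firewall or proxy host
    ipaddress.ip_network("10.0.0.0/8"),     # internal network range
]


def is_internal(host: str) -> bool:
    """Return True if the logged host is an address inside an internal network."""
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # The log field holds a host name rather than an IP address;
        # this sketch keeps such hits instead of guessing.
        return False
    return any(address in network for network in INTERNAL_NETWORKS)
```

A hit would then be dropped from the reports whenever `is_internal(hit_host)` is true. Note that if your logs record resolved host names instead of addresses, you would need to match on the name instead.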

Some companies do not have particular IP addresses that they can exclude, either because they connect through a dial-up service or because they have many telecommuting employees (or contractors) who use dial-up services that assign a new IP address at each connection. In this case, the easiest solution is to have all employees acquire a cookie from a custom page on your site that tags their browsers as ‘internal traffic.’ Then you can simply add a cookie filter to remove this traffic from your reports.
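A cookie filter of this kind only has to check whether the tagging cookie is present in the request. As a rough sketch, assuming your log records the raw Cookie header and that the custom page sets a cookie named internal_traffic (a made-up name), the test could look like this:

```python
INTERNAL_COOKIE = "internal_traffic"  # hypothetical cookie name set by the custom page


def parse_cookies(cookie_header: str) -> dict:
    """Split a raw Cookie header such as 'a=1; b=2' into a name/value dict."""
    cookies = {}
    for pair in cookie_header.split(";"):
        name, separator, value = pair.strip().partition("=")
        if separator:  # skip empty or malformed fragments without '='
            cookies[name] = value
    return cookies


def has_internal_cookie(cookie_header: str) -> bool:
    """Return True if the request carries the internal-traffic cookie."""
    return INTERNAL_COOKIE in parse_cookies(cookie_header)
```

Because the cookie travels with the browser, this works no matter which IP address the employee connects from, although it misses anyone who clears cookies or uses a different browser.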

Removing Automated Traffic

There may also be automated traffic that you generate internally, or contract to have generated, that you may want to remove. Load balancers, such as BigIP, are a good example of this – they usually “ping” each web server by loading a sample page on a regular basis to determine the response rate. You will want to remove all requests to this page from your reports. The same is true for any other network, service or systems monitoring tools, installed or outsourced, that request documents from your web server. Finally, if you have an on-site search engine, there is likely an indexing robot that crawls your site to index the content. Summary’s robot exclusion filtering, described above, may not automatically remove your internal search engine traffic. Therefore, you should add a filter to remove this traffic based on the user agent of the indexing robot or the host where the indexing server is located.
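The three criteria mentioned here (the requested page, the user agent and the host) can be combined into one test. The sketch below is only an illustration of that logic, not Summary's filter syntax; the page path, user-agent marker and server address are invented placeholders.

```python
# All three values are hypothetical; replace them with your own.
MONITOR_PATHS = {"/healthcheck.html"}    # sample page the load balancer requests
INDEXER_AGENTS = ("sitesearch-indexer",)  # marker in the indexing robot's user agent
INDEXER_HOSTS = {"192.0.2.25"}            # address of the indexing server


def is_automated(hit: dict) -> bool:
    """Return True if a hit matches any of the automated-traffic criteria."""
    if hit.get("path") in MONITOR_PATHS:
        return True
    if hit.get("host") in INDEXER_HOSTS:
        return True
    user_agent = hit.get("user_agent", "").lower()
    return any(marker in user_agent for marker in INDEXER_AGENTS)


def remove_automated_hits(hits):
    """Keep only the hits that match none of the criteria."""
    return [hit for hit in hits if not is_automated(hit)]
```

Matching on either the user agent or the host is usually enough for the indexing robot; checking both, as here, simply guards against one of them changing.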




Copyright 2002 by Summary.Net - Updated 16.Apr.2002