Summary

Web Analytics Tutorial

 

Lesson 8 – Examining Subsets of Traffic

IN THIS LESSON
* Introduction
* Filtering Visits
   Filtering by Referrer
   Filtering by Browser
   Request or Content Filters
   Filtering Authenticated Users
   Host Filtering
   Filtering by Cookie
   Multiple Filters
* Filtering Non-visitor Traffic
   Removing Robot Traffic
   Removing Employee Traffic
   Removing Automated Traffic
* Advanced Data Analysis
   What-if Scenarios
   Pivot Tables

Filtering Visits

While web site analysis reports contain various types of data, they are all derived from a simple set of basic data elements in your log files. When applying filters, you filter on one of these basic data types: Referrers (that start visits), Browsers (or other user agents), Requested page or file, Authenticated Users (if your site requires a login), Host (including domains and the visitor’s country of origin), or a Cookie that the visitor carries. Each of these data types is discussed in detail below.
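To make the six basic data types concrete, the sketch below treats each parsed log entry as a record with one field per type and filters on any field with a wildcard pattern. It is a minimal illustration, not Summary’s implementation: the Hit class, its field names, and filter_hits are hypothetical, and Python’s fnmatch module stands in for the wildcard matcher (* matches any run of characters, ? exactly one).

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase

# Hypothetical parsed log record: one field per basic data type.
@dataclass
class Hit:
    referrer: str    # referrer of the request
    user_agent: str  # browser or other user agent
    request: str     # requested page or file
    user: str        # authenticated user name ("" if none)
    host: str        # visitor's host name
    cookie: str      # cookie the visitor carries ("" if none)

def filter_hits(hits, field, pattern):
    """Keep hits whose chosen field matches a wildcard pattern (* and ?)."""
    return [h for h in hits if fnmatchcase(getattr(h, field), pattern)]

hits = [
    Hit("http://widgetnews.net/story", "Mozilla/4.0", "/index.html", "", "a.aol.com", ""),
    Hit("", "WidgetTool/1.2", "/register.html", "", "b.example.co.uk", ""),
]
print(len(filter_hits(hits, "host", "*.aol.com")))  # → 1
```

The same function covers every data type by changing the field name, which is why the sections below can all be read as variations on one operation.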

Filtering by Referrer

Referrer filtering can be used to see where traffic from a particular search engine, advertisement or other link goes on your site. When you set up a filter for a specific visit-initiating referrer (or a set of them), you can use all the path analysis and related reports covered in Lesson 7 - Determining Visitor Behavior Patterns to analyze the traffic patterns of visitors from particular leads.

Figure 1. Referrer Report. This sample report shows a recent increase in traffic from a widgetnews.net story.

Figure 2. Exit Point Report. These visitors appear unlikely to be buyers.

Referrer filtering can also be used as an investigative tool. For example, suppose the marketing manager of our widgetmanager.com site sees a traffic peak one day and, using the Referrer Report, finds that it comes from a particular news story at widgetnews.net (see Figure 1). She wants to know whether to place an advertisement on widgetnews.net to attract more visitors on a regular basis, so she sets up a filter to look at just the traffic generated by that story. The Exit Point report shows that the significant majority of those visitors ended their visits on a links resource page on widgetmanager.com that lists widget manufacturers (Figure 2). Because widgetmanager.com sells a tool to manage widgets (not widgets themselves), she does not think this will be a lucrative lead source: these visitors are likely shopping for their first widgets and are not yet in the market for a widget manager.
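The key point about referrer filtering is that it selects whole visits, not single hits: a visit passes if the referrer that started it matches, and then every page in that visit feeds the path reports. The sketch below illustrates this under stated assumptions: visits are already grouped into (initiating referrer, ordered page list) pairs, and the story URL and page names are made up for illustration. The Exit Point count is simply the last page of each filtered visit.

```python
from collections import Counter
from fnmatch import fnmatchcase

# Hypothetical visits: the referrer that started the visit, then the pages viewed in order.
visits = [
    ("http://widgetnews.net/story42", ["/index.html", "/links.html"]),
    ("http://widgetnews.net/story42", ["/links.html"]),
    ("http://www.google.com/search?q=widget+manager", ["/index.html", "/buy.html"]),
    ("http://widgetnews.net/story42", ["/index.html", "/products.html", "/links.html"]),
]

def filter_visits_by_referrer(visits, pattern):
    """Keep whole visits whose initiating referrer matches the wildcard pattern."""
    return [v for v in visits if fnmatchcase(v[0], pattern)]

def exit_points(visits):
    """Exit Point report: count the last page of each visit."""
    return Counter(pages[-1] for _, pages in visits if pages)

story = filter_visits_by_referrer(visits, "http://widgetnews.net/*")
print(exit_points(story).most_common(1))  # → [('/links.html', 3)]
```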

Filtering by Browser

Browsers (or other user agents) can be filtered to analyze traffic from particular types of users based on the software or hardware they use. As previously mentioned, looking at the traffic patterns of wireless users can help you decide whether to launch a wireless-specific version of your site, where to focus your design efforts, or how to make an existing wireless interface more effective. Any user agent can represent an important subset of your traffic. For example, widgetmanager.com distributes a custom software tool that customers run on their own computers to help manage their widgets. The program connects to the widgetmanager.com site to register, to check for newer versions of itself, and to download and install updates when they are available.

Figure 3. Page Requests Report. A filtered Page Requests report can highlight usage of custom user agents.

By configuring a filter to look at just the traffic generated by this software program (a particular user agent), the developers can see how often, on average, each copy of the software has been run. They compare the number of hits to ‘register.html,’ which is requested once for each copy, with the number of hits to ‘update_check.html,’ which is requested each time the software runs. In Figure 3, the number of update checks, 17,632, divided by the number of registrations, 4,408, gives an average of four runs per registered copy. The CFO can also compare the registration count against the sales records to see whether there is a significant number of pirated copies or other inconsistencies in the registration system.
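The arithmetic behind Figure 3 is a ratio of two hit counts taken after a user-agent filter. A minimal sketch, assuming hits are (user agent, path) pairs, that the tool identifies itself with a hypothetical "WidgetTool/..." user-agent string, and that both pages sit at the site root:

```python
from fnmatch import fnmatchcase

def runs_per_copy(hits, agent_pattern):
    """hits: (user_agent, requested_path) pairs. Returns update checks per registration."""
    tool = [path for agent, path in hits if fnmatchcase(agent, agent_pattern)]
    registrations = sum(1 for p in tool if p == "/register.html")
    update_checks = sum(1 for p in tool if p == "/update_check.html")
    return update_checks / registrations if registrations else 0.0

# Figure 3's totals, reproduced as synthetic hits.
hits = ([("WidgetTool/1.0", "/register.html")] * 4408
        + [("WidgetTool/1.0", "/update_check.html")] * 17632
        + [("Mozilla/4.0", "/update_check.html")] * 50)  # browser hits are filtered out
print(runs_per_copy(hits, "WidgetTool/*"))  # → 4.0
```

Applying the user-agent filter first keeps stray browser requests for the same pages out of the ratio.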

Request or Content Filters

When building a request filter, you often use a wildcard pattern to find details of requests to specific sections of the site, such as a directory or a file type. This sort of content filtering is similar to the Groups capability in Summary that we discussed in Lesson 5 - Revenue Modeling. How you select content depends a great deal on how your site is organized. For example, if each section of your site (‘products,’ ‘services,’ and ‘company info’) corresponds to a directory in your URLs, you can filter on that directory to cover the section. This kind of zone filtering can also be used to match content with departmental divisions within your company, such as news, sales, and marketing.
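Zone filtering amounts to one wildcard pattern per section. The sketch below assumes hypothetical directory names (/products/, /services/, /company/) and uses fnmatch, in which * also matches "/", so a pattern like "/products/*" covers subdirectories too.

```python
from collections import Counter
from fnmatch import fnmatchcase

# Hypothetical mapping of site zones to wildcard patterns on the request path.
ZONES = {
    "products": "/products/*",
    "services": "/services/*",
    "company info": "/company/*",
}

def hits_by_zone(paths, zones=ZONES):
    """Count requests falling in each zone; unmatched paths go to 'other'."""
    counts = Counter()
    for path in paths:
        zone = next((name for name, pat in zones.items() if fnmatchcase(path, pat)), "other")
        counts[zone] += 1
    return counts

paths = ["/products/widget-mgr.html", "/products/", "/company/about.html", "/index.html"]
print(hits_by_zone(paths))
# → Counter({'products': 2, 'company info': 1, 'other': 1})
```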

Figure 4. Other Requests Report. A filtered Other Requests report can show users’ CGI actions.

Another approach is to filter by content type, based on the extension of each file name. You can look at requests to all *.cgi (or similar extension) files to see how visitors use the dynamic parts of your web site. If you have a separate CGI script for each action a visitor can perform, as in Figure 4, you can use the Other Requests report to see what actions your visitors are taking on the site.
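An extension filter works the same way, except that dynamic requests usually carry query strings that would split the counts. This sketch, with hypothetical script names, strips the query before matching and tallies requests per script, roughly what an Other Requests report shows:

```python
from collections import Counter
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

def cgi_actions(requests, pattern="*.cgi"):
    """Count requests per matching script, ignoring query strings."""
    counts = Counter()
    for url in requests:
        path = urlsplit(url).path               # drop "?query" so parameters don't split counts
        if fnmatchcase(path, pattern):
            counts[path.rsplit("/", 1)[-1]] += 1  # report by script name
    return counts

requests = [
    "/cgi-bin/add_to_cart.cgi?item=7",
    "/cgi-bin/add_to_cart.cgi?item=9",
    "/cgi-bin/checkout.cgi",
    "/products/index.html",
]
print(cgi_actions(requests))  # → Counter({'add_to_cart.cgi': 2, 'checkout.cgi': 1})
```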

You can even use content filtering to match specific reports to specific individuals within your company. If your developers and designers each have particular areas of responsibility, you can create a filter for each person that reports the traffic in his or her area. We talk more about making reports for particular viewers in Appendix A - Making Reports More Usable.

Filtering Authenticated Users

Figure 5. Authorized User Report. Widgetmanager.com’s Authorized User report shows excessive traffic from two accounts.

If your web site has a login-required section, you can filter by authenticated user name (using “?*” to match any non-blank entry) to see how members travel around the site. If your user names include some organizational data, you can use wildcards to collect traffic reports for particular groups of users based on their user names. You can also use a user filter for investigative analysis. The widgetmanager.com site requires a login to purchase and download the widget manager software tool, submit bug reports, or get technical support. In the Authorized User report (Figure 5), the NOC Manager notices an excessive number of hits from two particular users, ‘sudo’ and ‘ptolemy.’ By creating a filter for only these two users, he finds that these “users” have downloaded a large number of copies of the software tool. He suspects that the user names and passwords have been shared or leaked to public lists where people download pirated software, and the CFO confirms that the registration count on the web site exceeds the actual revenue. So the NOC Manager removes the two offending accounts from the system and asks customer support to contact the original registrants (if they provided legitimate contact data) and discuss the issue with them.
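Both uses of a user filter can be sketched in a few lines. This assumes unauthenticated requests carry an empty user name; in Common Log Format an absent user is written as "-", which "?*" would also match, so such entries would need to be normalized to "" first. The /download/ path and the account "alice" are hypothetical.

```python
from collections import Counter
from fnmatch import fnmatchcase

def filter_users(hits, pattern="?*"):
    """Keep (user, path) hits whose user name matches; "?*" needs at least one character."""
    return [(user, path) for user, path in hits if fnmatchcase(user, pattern)]

def downloads_by_user(hits, users, download="/download/*"):
    """Count software downloads for the suspect accounts only."""
    return Counter(u for u, p in hits if u in users and fnmatchcase(p, download))

hits = [
    ("", "/index.html"),                    # anonymous visitor: empty user name
    ("sudo", "/download/widgetmgr.zip"),
    ("sudo", "/download/widgetmgr.zip"),
    ("ptolemy", "/download/widgetmgr.zip"),
    ("alice", "/support/bug_report.html"),
]
print(len(filter_users(hits)))                       # → 4
print(downloads_by_user(hits, {"sudo", "ptolemy"}))  # → Counter({'sudo': 2, 'ptolemy': 1})
```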

Host Filtering

Host filtering, like request filtering, is usually done with wildcard patterns. The only time you would want to look at a single host is to examine unusual access patterns, generally those of troublemakers; this kind of investigative analysis is covered in detail in Lesson 12 - Investigating Troublemakers. Host information includes domain and country information, so you can find patterns of specific users from particular companies, Internet access providers or countries (with some limitations: many international ISPs now use .com or .net domains rather than country-specific ones like .uk or .jp). A common practice is to look at just the traffic from *.aol.com hosts, which tells you how AOL users use your site. This is especially relevant if a section of your site depends on visual information. AOL “compresses” graphics from your site beyond the compression you have already applied, reducing the quality of the images visitors see. If your AOL reports show that AOL users frequently view your product images, for example, you may want to add a note to the site explaining how to disable this “compression” so they see your product shots as they were meant to be seen.
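To judge whether AOL’s image recompression matters for your site, you can filter by host and measure how much of that traffic is image requests. A minimal sketch with made-up host names; the list of image extensions is an assumption to adjust for your site. Host names are lowercased before matching because DNS names are case-insensitive.

```python
from fnmatch import fnmatchcase

IMAGE_PATTERNS = ("*.gif", "*.jpg", "*.jpeg", "*.png")  # assumed image types

def image_share(hits, host_pattern):
    """Fraction of requests from matching hosts that are for images."""
    paths = [path for host, path in hits if fnmatchcase(host.lower(), host_pattern)]
    if not paths:
        return 0.0
    images = sum(1 for p in paths if any(fnmatchcase(p.lower(), pat) for pat in IMAGE_PATTERNS))
    return images / len(paths)

hits = [
    ("cache-01.proxy.aol.com", "/products/shot1.jpg"),
    ("cache-01.proxy.aol.com", "/products/index.html"),
    ("dialup-7.aol.com", "/products/shot2.JPG"),
    ("dialup-7.aol.com", "/company/about.html"),
    ("www.example.co.uk", "/products/shot1.jpg"),
]
print(image_share(hits, "*.aol.com"))  # → 0.5
```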

Filtering by Cookie

Figure 6. Weekly Report. Cookies tracking promotion responders show interesting patterns in repeat traffic.

Cookies can be very useful when you want to track user information, especially across sessions. By adding cookie filters, you can see the traffic patterns of almost any subset of users you choose to follow. Suppose widgetmanager.com runs a Christmas special offering 10% off one of its products. Visitors who click one of its web advertisements are sent to a special web page, and visitors who see the ad in print media can type in that page’s address (this kind of link tracking is also covered in Lesson 4 - Advertising). Every user who goes to that page gets a cookie that tags them as having responded to the promotion. The sales manager then sets up a filter showing the traffic patterns of visitors who carry the promotion cookie. Looking at the Weekly Report for the last six months (Figure 6), she discovers that while most hits from promotion responders came in the week before Christmas, there was also a significant spike in traffic in the week that included February 14th. Apparently this particular widget manager product is not just a popular Christmas gift, but also a romantic Valentine’s gift. Widget managers (at least for existing customers) seem to signify true love! This information allows her to capitalize on the pattern the following year by sending a Valentine’s promotion to customers who purchased the Christmas special.
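A cookie filter followed by a weekly roll-up can be sketched as follows. The cookie name xmas_promo and the sample dates are hypothetical, and weeks are keyed by ISO year and week number, which may differ from the week boundaries a given report uses.

```python
from collections import Counter
from datetime import date

def weekly_hits(hits, cookie_name):
    """Weekly counts of hits from visitors carrying a given cookie.

    hits: (day, cookie_header) pairs, e.g. (date(...), "session=abc; xmas_promo=1").
    Weeks are keyed by ISO (year, week number).
    """
    counts = Counter()
    for day, cookie_header in hits:
        # Collect cookie names from "name=value; name=value" pairs.
        names = {part.split("=", 1)[0].strip() for part in cookie_header.split(";") if part.strip()}
        if cookie_name in names:
            year, week, _ = day.isocalendar()
            counts[(year, week)] += 1
    return counts

hits = [
    (date(2001, 12, 18), "session=abc; xmas_promo=1"),
    (date(2001, 12, 20), "xmas_promo=1"),
    (date(2002, 2, 13), "xmas_promo=1; session=def"),
    (date(2002, 2, 13), "session=ghi"),  # no promo cookie: filtered out
]
for (year, week), n in sorted(weekly_hits(hits, "xmas_promo").items()):
    print(f"{year}-W{week:02d}: {n}")
# 2001-W51: 2
# 2002-W07: 1
```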

Multiple Filters

Figure 7. Page Requests Report. Without enough filters, unimportant data can obscure the important details in reports.

You can apply more than one type of filter at a time. In the browser filtering example, we suggested adding a filter to track traffic from the widget manager software tool. If widgetmanager.com releases frequent software updates, the requests for each update file may fill up the report, making it hard to find the registration page hit count (see Figure 7). By adding a request filter that matches just the registration page and the update check page, the site manager can "clean up" the report so it shows only the information that answers the original questions raised by the developers and the CFO.
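One reasonable way to model multiple filters is to OR the patterns within one filter and AND the filters together, so a hit must pass every filter to appear. This is our own sketch of that idea, not a description of how Summary combines filters; the user-agent string and file names are hypothetical.

```python
from collections import Counter
from fnmatch import fnmatchcase

def all_of(*filters):
    """Combine filters: a hit passes only if every filter accepts it."""
    return lambda hit: all(f(hit) for f in filters)

def field_matches(field, *patterns):
    """Filter accepting hits whose field matches any of the given wildcard patterns."""
    return lambda hit: any(fnmatchcase(hit[field], p) for p in patterns)

tool_only = field_matches("agent", "WidgetTool/*")
key_pages = field_matches("path", "/register.html", "/update_check.html")
combined = all_of(tool_only, key_pages)

hits = [
    {"agent": "WidgetTool/1.0", "path": "/register.html"},
    {"agent": "WidgetTool/1.0", "path": "/updates/widgettool-1.1.zip"},  # update download: dropped
    {"agent": "WidgetTool/1.1", "path": "/update_check.html"},
    {"agent": "Mozilla/4.0", "path": "/register.html"},                  # browser: dropped
]
print(Counter(h["path"] for h in hits if combined(h)))
# → Counter({'/register.html': 1, '/update_check.html': 1})
```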

Copyright 2002 by Summary.Net - Updated 16.Apr.2002