Summary

Web Analytics Tutorial

 

Lesson 12 – Investigating Troublemakers

IN THIS LESSON
* Unusual Access Patterns
   Drilling Down
* Bad Robots
   Denial of Service Attacks
   Worms
   Content Mirroring
* Digging Deeper
* What to Do About Troublemakers
   Responsible Parties
   Validity of Information
   Counter Measures
   Limiting Robots
   Limiting Mirroring Tools

Unusual Access Patterns

Figure 1. Sample Pages Over Time Report. The Hits Over Time column can reveal unusual traffic peaks.

In the course of running and managing a web site or server, you may come across traffic patterns that stand out from the regular traffic. These can appear as spikes in the time reports, such as the Daily Report or Weekly Report; as peaks in the Hits Over Time column of a report like the Pages Over Time report; or as an excessive number of hits from a single host or range of hosts. Often these will be explainable, such as the /ads/promotion.html peak on the promotion date in Figure 1. Other times you won't immediately recognize what caused the change in pattern. Given the depth of information available in web analytics reports, you will generally notice these patterns first on the higher-level reports and will then need to do some investigative processing to find out what is going on.

Figure 2. Sample Hourly Report. An unusual few hours in the Hourly Report could indicate an attempted attack.

When you dig deeper into an unusual access pattern, you may realize that it was an intentional attack by a malicious visitor. People can cause all sorts of problems on your site if they want to. Most commonly, you will be dealing with some sort of Denial of Service attack or a similar brute force attack that causes excessive load on your server (the load is often what makes the attack apparent). As a common example, let us assume that the manager of the widgetmanager.com site notices an extremely large volume of hits for a couple of hours one day in the Hourly Report, Figure 2. By digging into the details of those hits, she can find out where they came from, what period they covered, and sometimes what the troublemaker was trying to do.
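
If you also have access to the raw server logs, the same kind of hourly spike can be spotted with a few lines of script. The following is only a rough sketch, not part of Summary: it assumes timestamps in Common Log Format (for example 16/Apr/2002:14:05:09 -0700), and the function names and the "five times the median hour" threshold are arbitrary choices made for illustration.

```python
from collections import Counter
from statistics import median


def hits_per_hour(timestamps):
    """Count hits per hour from Common Log Format timestamps.

    Assumes each timestamp looks like '16/Apr/2002:14:05:09 -0700'.
    """
    # The first 14 characters are 'dd/Mon/yyyy:HH', which identifies the hour.
    return Counter(ts[:14] for ts in timestamps)


def unusual_hours(counts, factor=5):
    """Return the hours whose hit count exceeds `factor` times the median hour.

    The factor is an illustrative assumption; tune it to your own traffic.
    """
    if not counts:
        return []
    typical = median(counts.values())
    return sorted(hour for hour, n in counts.items() if n > factor * typical)
```

A median is used rather than a mean so that the spike itself does not inflate the baseline it is compared against.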

Drilling Down

To find out more about a particular subset of traffic, you create a filter, as previously covered in Lesson 8 - Examining Subsets of Traffic. If the traffic you want to focus on all comes from a given host, you can build a host filter to watch just that traffic. If it comes from a particular user agent or browser, you can filter by that. (With Summary SP you can apply these filters on a subreport basis.) Time periods are different: Summary only allows a time period to be selected globally. So you will need either to do your investigation when no one else needs the reports and then restore the original period range, or to get a separate copy of Summary to use for investigation.
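
To make the idea of a host or user agent filter concrete, here is a minimal sketch of the same selection done directly on log lines. It is not how Summary implements its filters. It assumes the logs are in the NCSA common or combined log format, and parse_line and filter_records are hypothetical helper names.

```python
import re

# Assumed layout: NCSA common log format, optionally followed by the
# quoted referrer and user agent fields of the combined format.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def filter_records(lines, hosts=None, agent_contains=None):
    """Yield parsed records that match the host set and/or user agent substring."""
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue  # skip lines that are not in the assumed format
        if hosts is not None and record["host"] not in hosts:
            continue
        if agent_contains is not None and agent_contains not in (record["agent"] or ""):
            continue
        yield record
```

For example, filter_records(lines, hosts={"192.168.0.100"}) keeps only the requests made by that one host, and filter_records(lines, agent_contains="ExampleBot") keeps only requests whose user agent string contains that text.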

Figure 3. Sample Hosts Report. The Hosts Report for the period indicates trouble from two hosts.

Because the widgetmanager.com site manager noticed the hit spike on a particular day, she limits the reports to that day in the Summary configuration and reprocesses the logs to get just that traffic. Then, using the Hosts Report, she can see which host or hosts show peak traffic for the selected day. The Hosts Report in Figure 3 shows that two hosts, 192.168.0.100 and 192.168.0.12, have significantly more hits than any other host.
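
The per-host ranking for a single day can be approximated from raw logs as follows. This is a sketch under the assumption that the logs use Common Log Format, where the host is the first field and the timestamp appears in square brackets; top_hosts is a hypothetical name.

```python
from collections import Counter


def top_hosts(lines, day, n=5):
    """Return the n hosts with the most hits on `day`, given as 'dd/Mon/yyyy'."""
    marker = "[" + day + ":"  # e.g. '[16/Apr/2002:' opens every timestamp that day
    counts = Counter(
        line.split(" ", 1)[0]  # the host is the first space-separated field
        for line in lines
        if marker in line
    )
    return counts.most_common(n)
```

Calling top_hosts(lines, "16/Apr/2002") returns (host, hits) pairs in descending order of hits, which is the same ordering a hosts report would present.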

Figure 4. Sample All Requests Report. The two hosts made requests to the secured section of the site.

The site manager then filters traffic to just these hosts. She processes the logs again to build the new reports and looks at the All Requests report to see which items these hosts were requesting. Figure 4 shows that the majority of hits from these two hosts went to the page /members/, a secure access page that allows downloading of the site's commercial software tool. From this she deduces that this was a simple brute force attack by someone without a registered login trying to get into the members section of the site. To verify this, she checks the Authorized User Report for these hosts and discovers that the requests tried many different user names over the course of the day.
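
The same two checks, which pages the suspect hosts requested and how many different user names they tried, can be sketched against raw logs. This assumes Common Log Format and that the site uses HTTP authentication, so the attempted user name appears in the third field of each line; a form-based login would not be logged this way. The name summarize_suspects is hypothetical.

```python
import re
from collections import Counter, defaultdict

# Assumed Common Log Format: host, identity, user, [time], "METHOD path protocol"
REQUEST_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[[^\]]+\] "\S+ (?P<path>\S+)[^"]*"'
)


def summarize_suspects(lines, suspects):
    """Return (hits per path, user names tried per host) for the suspect hosts."""
    paths = Counter()
    users = defaultdict(set)
    for line in lines:
        match = REQUEST_PATTERN.match(line)
        if not match or match["host"] not in suspects:
            continue
        paths[match["path"]] += 1
        if match["user"] != "-":  # '-' means no user name was supplied
            users[match["host"]].add(match["user"])
    return paths, {host: sorted(names) for host, names in users.items()}
```

A long list of distinct user names from a single host, concentrated on one protected path, is the pattern the site manager is looking for here.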

Based on her analysis, she can take precautionary steps to improve the security of the site or try to confront the individuals who were attempting to break in. For example, she might recommend that the developers add a limiting algorithm to the login code so that if a given host fails to log in ten times, the host is blocked from further login attempts for the next hour. The final section of this lesson discusses in detail other actions that you might want to take with the data you have uncovered.
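
One way the developers might approach such a limiting algorithm is sketched below. It is an in-memory illustration only, using the ten-attempt and one-hour figures from the example above. The class and method names are hypothetical, and a real login system would also need persistent storage and thread safety.

```python
import time


class LoginLimiter:
    """Block a host for a while after too many failed logins in a row."""

    def __init__(self, max_failures=10, block_seconds=3600, clock=time.monotonic):
        self.max_failures = max_failures
        self.block_seconds = block_seconds
        self.clock = clock        # injectable so the timing can be tested
        self.failures = {}        # host -> failed attempts since last success or block
        self.blocked_until = {}   # host -> clock value at which the block expires

    def is_blocked(self, host):
        """Return True while the host is inside its block period."""
        until = self.blocked_until.get(host)
        if until is None:
            return False
        if self.clock() >= until:
            del self.blocked_until[host]  # the block has expired
            return False
        return True

    def record_failure(self, host):
        """Count a failed login and start a block once the limit is reached."""
        count = self.failures.get(host, 0) + 1
        if count >= self.max_failures:
            self.blocked_until[host] = self.clock() + self.block_seconds
            count = 0  # start counting afresh once the block ends
        self.failures[host] = count

    def record_success(self, host):
        """Forget earlier failures after a successful login."""
        self.failures.pop(host, None)
```

The login code would call is_blocked(host) before checking a password, then record_failure(host) or record_success(host) depending on the result.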



Copyright 2002 by Summary.Net - Updated 16.Apr.2002