Interfacing to Content Delivery Systems
Some web sites are implemented with content delivery systems that route every
request through a single script or dynamic page. With the default settings, this
makes your reports much less informative. As a simple example, assume a site has
a dynamic page, /navigate.asp, that formats all the content before delivering it
to the user. A typical stream of requests would then look like this:
/navigate.asp?section=home&page=index.xml
/images/logo.gif
/images/banner.gif
/images/home.gif
/navigate.asp?section=products&page=index.xml
/images/products.gif
/navigate.asp?section=products&page=AS001.xml
/images/products/AS001.gif
Figure 6. The Page Requests report only shows the wrapper page by default.
Under Summary’s default configuration, the Page Requests report would look like
Figure 6: every request is attributed to the single page /navigate.asp, which
tells you nothing about how the site was used. You can change this by enabling
the “Include query string in requests” setting on the Options configuration page.
Summary will then list each of the page requests above as a separate entry in
the report, showing which section and page each request delivered.
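The effect of that setting can be illustrated with a small Python sketch. This is not how Summary is implemented; the function name `count_pages` and its `include_query` flag are made up for illustration, and the input is just the page requests from the example stream above.

```python
from collections import Counter

# Page requests from the example stream above (image requests omitted).
requests = [
    "/navigate.asp?section=home&page=index.xml",
    "/navigate.asp?section=products&page=index.xml",
    "/navigate.asp?section=products&page=AS001.xml",
]

def count_pages(requests, include_query):
    """Tally requests, optionally keeping the query string as part of the page."""
    counts = Counter()
    for request in requests:
        # Without the query string, everything collapses onto the wrapper page.
        page = request if include_query else request.split("?", 1)[0]
        counts[page] += 1
    return counts

without_query = count_pages(requests, include_query=False)
with_query = count_pages(requests, include_query=True)
```

Without the query string, all three requests are counted under /navigate.asp; with it, each request gets its own entry.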
Figure 7. The Directory report is not very useful when all pages are processed by a wrapper.
Even with query strings enabled, the Directory report would still look like
Figure 7, with every page listed in the root directory. This is accurate as far
as the requests themselves go, but people reading the report would expect the
last two pages requested to appear in the /products/ directory. You can use
Summary’s Request Aliases to rewrite these requests into something closer to
what report readers expect. For this example, you would set up the alias below
to convert dynamically generated requests like
/navigate.asp?section=products&page=index.xml to
something that resembles a directory structure,
/products/index.xml (see the Summary manual for details on
setting up Aliases):
/navigate.asp?section=*&page=* /$1/$2
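To make the rewrite concrete, here is a minimal Python sketch of what such a wildcard alias does to a request. It is only an illustration, not Summary's implementation: the helper `apply_alias` is a made-up name, and it assumes each * captures a run of characters that $1, $2, and so on refer to in order.

```python
import re

def apply_alias(pattern, replacement, request):
    """Rewrite request if it matches a wildcard pattern; otherwise return it unchanged.

    Each * in pattern captures a run of characters; $1, $2, ... in
    replacement refer to those captures in order.
    """
    # Escape the pattern literally, then turn each escaped * into a capture group.
    regex = re.escape(pattern).replace(r"\*", "(.*?)")
    match = re.fullmatch(regex, request)
    if match is None:
        return request
    # Substitute each $N with the Nth captured group.
    return re.sub(r"\$(\d+)", lambda m: match.group(int(m.group(1))), replacement)

alias = ("/navigate.asp?section=*&page=*", "/$1/$2")
rewritten = apply_alias(*alias, "/navigate.asp?section=products&page=AS001.xml")
untouched = apply_alias(*alias, "/images/logo.gif")
```

Here `rewritten` becomes /products/AS001.xml, while the image request does not match the pattern and passes through unchanged.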
Not all content delivery systems are so simple. Some of the more powerful ones,
such as IBM WebSphere, Vignette Content Suite and BEA WebLogic, use complex
request structures that allow them to deliver personalized content or offer
alternate editions (e.g. different languages, or wireless vs. HTML). Summary’s
Aliases support regular expressions, which are a very powerful tool for
matching text. Using regular expressions, you may be able to construct aliases
that handle even these complex URLs.
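As a rough illustration of the regular-expression approach, the Python sketch below rewrites a more complex request into a directory-style path. The URL layout (/servlet/content with lang, edition and doc parameters) is invented for this example and does not describe any of the products named above. The syntax is Python's `re` module rather than Summary's alias syntax, so check the Summary manual before adapting it.

```python
import re

# Hypothetical URL layout, invented for illustration: a servlet that takes a
# two-letter language, an edition (for example html or wml) and a document path.
COMPLEX_REQUEST = re.compile(
    r"^/servlet/content\?lang=(?P<lang>[a-z]{2})"
    r"&edition=(?P<edition>\w+)"
    r"&doc=(?P<doc>[^&]+)$"
)

def to_directory_style(request):
    """Map a matching request to /<edition>/<lang>/<doc>; leave others alone."""
    return COMPLEX_REQUEST.sub(r"/\g<edition>/\g<lang>/\g<doc>", request)

example = to_directory_style(
    "/servlet/content?lang=fr&edition=wml&doc=products/AS001.xml"
)
```

Named groups keep the replacement readable when the order of the pieces in the report path differs from their order in the URL.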
Sometimes a content delivery system adds more data than you want to analyze.
For example, some systems redirect the user three or four times before settling
on the location of a file (especially when they provide load balancing). These
redirections provide little useful information when you are analyzing your
visitors’ traffic patterns. You can use filters,
covered in Lesson 8 - Examining Subsets of Traffic, to
remove this kind of traffic from your reports.
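The idea behind such a filter can be sketched in Python as follows. Summary's own filters are configured inside the program (see Lesson 8), so this only illustrates the effect. It assumes the logs are in Common Log Format, and the sample lines are made up.

```python
import re

# Assumes Common Log Format, where the status code follows the quoted request.
STATUS = re.compile(r'" (\d{3}) ')

# Redirect responses only; 304 (Not Modified) is deliberately left out.
REDIRECT_CODES = {"301", "302", "303", "307", "308"}

def drop_redirects(lines):
    """Yield only the log lines whose status code is not a redirect."""
    for line in lines:
        match = STATUS.search(line)
        if match and match.group(1) in REDIRECT_CODES:
            continue
        yield line

# Made-up sample lines for illustration.
sample = [
    '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /go?id=7 HTTP/1.0" 302 0',
    '192.0.2.1 - - [10/Oct/2000:13:55:37 -0700] "GET /products/index.xml HTTP/1.0" 200 2326',
]
kept = list(drop_redirects(sample))
```

Only the second sample line survives, so the redirect hop no longer appears as a separate request.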
The content delivery system may also affect parts of the log file line other
than the request, such as the status code, host, or user. Request streams can
also be more complex in the log than they appear to the user. In these cases you
will need to use another tool to pre-process the logs before Summary reads them.
Some common tools for text manipulation are sed, awk, and perl. These are all
standard on most Unix systems, and Perl is also available for most other
platforms. You can find more information on them, and on regular expressions, at
the links provided below:
- Perl regular expressions: The reference page for the regular expression
  language that Summary supports.
- O’Reilly Perl.com: The most common source for information on the Perl
  scripting language. Includes links to downloads for many platforms.
- O’Reilly sed & awk, 2nd Edition: A popular guide for the sed and awk text
  manipulation languages.
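The same kind of pre-processing can be done in any scripting language. The Python sketch below is one hedged example of the idea rather than a recommended tool. It assumes Common Log Format and a hypothetical delivery system that passes the login name as a user= query parameter while leaving the log's user field empty. The names `rewrite_line`, `LINE` and `USER_PARAM` are made up for illustration.

```python
import re

# Assumes Common Log Format: host ident user [time] "request" status bytes
LINE = re.compile(r'^(\S+) (\S+) (\S+) (\[[^\]]+\]) "([^"]*)" (\d{3}) (\S+)$')

# Hypothetical: the delivery system passes the login name as a user= parameter.
USER_PARAM = re.compile(r"[?&]user=([^&\s]+)")

def rewrite_line(line):
    """Copy a user= query parameter into the log line's user field."""
    line = line.rstrip("\n")
    match = LINE.match(line)
    if match is None:
        return line  # pass through anything we cannot parse
    host, ident, user, time, request, status, size = match.groups()
    found = USER_PARAM.search(request)
    if found and user == "-":
        user = found.group(1)
    return f'{host} {ident} {user} {time} "{request}" {status} {size}'

# Made-up sample line for illustration.
before = ('192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] '
          '"GET /navigate.asp?section=home&user=alice HTTP/1.0" 200 2326')
after = rewrite_line(before)
```

In practice you would apply `rewrite_line` to each line of the log file and write the results to a new file for Summary to read.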