Summary.Net Archives
 

Re: [Summary-Talk] Visitor Estimating



Counting human visitors to your site is a vexing problem. Nothing you do
will *ever* get you an accurate number of human visitors to your web
site (at least for any "normal" web site). Despite this, others will
require that you provide an estimate, and will then treat that estimate
as if it were a true count of human visitors. The only comfort is that
everyone else has the same problem.

The best you can do is to get a number that is approximately
proportionate to the number of human visitors. The best way to do that
is to turn on "Ignore known and likely robots" and "Combine proxy
clusters into one host", while turning off "Use session IDs to determine
visits" and then use Visits as your proxy for human visitors.

When I say "approximately proportionate", I mean that when Visits goes
up by 10% you can be reasonably confident that human visitors went up by
something quite close to 10%. What you can never know is whether the
number of human visitors was higher or lower than Visits, or by how much.
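
To make the "approximately proportionate" idea concrete, here is a
minimal sketch of how you might report the change in Visits between two
periods rather than the raw count. The function name and the monthly
totals are made up for illustration; they are not anything Summary
produces for you.

```python
def percent_change(previous: float, current: float) -> float:
    """Return the relative change from previous to current, in percent."""
    if previous <= 0:
        raise ValueError("previous must be a positive count")
    return (current - previous) / previous * 100.0


# Hypothetical Visits totals for two consecutive months.
visits_last_month = 12000
visits_this_month = 13200

change = percent_change(visits_last_month, visits_this_month)
# The trend is the part you can reasonably trust, not the absolute numbers.
print(f"Visits changed by {change:+.1f}%")
```

With these sample numbers the change is +10.0%, which you could read as
"human visitors probably rose by roughly 10%", without claiming to know
how many humans that actually was.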

The settings I suggested above eliminate as many of the correctable
visit counting problems as currently possible. Over time we update
Summary to take into account newly discovered correctable errors, to
keep Visit counts as close to reality as we can. But because of the way
the web works, there will always be some remaining uncorrectable
miscounting. There is currently no way of estimating the magnitude, or
even the sign, of the uncorrectable counting errors for a particular site.

Tests using artificially created traffic suggest that the number of
human visitors will often be somewhere between Visits and Unique Hosts,
though it varies quite a bit. Because several of the factors that
determine where in that range a particular web site falls depend on
details of your visitors' ISPs' network setups, in ways that we don't
currently know how to model, we have no way of knowing where in between
those two numbers it actually falls for your site, or if it is instead
outside of that range. Averaging Visits and Unique Hosts doesn't give
you a more accurate guess.

Likewise, using cookie tracking doesn't particularly improve things.
There will always be some number of people who refuse to accept cookies,
and cookie-based counting tends to significantly undercount them.


When calculating multi-page visits, keep in mind that there
can be zero-page visits. Zero-page visits happen when someone requests
only graphics, i.e. no pages. Also, calculating multi-page visit
percentages when you are not filtering robots is potentially very
inaccurate. There is no reason to believe that the distribution of robot
visit durations will be similar to the distribution of human visit
durations.
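
As a rough sketch of the zero-page point, suppose you already have a
list of page counts per visit, with robots filtered out. That list and
the function name below are assumptions for the example, not a Summary
export format. The idea is to give zero-page visits their own bucket
instead of assuming every visit that isn't single-page is multi-page.

```python
from collections import Counter


def page_count_breakdown(pages_per_visit):
    """Return the share of zero-, single- and multi-page visits, in percent."""
    if not pages_per_visit:
        raise ValueError("need at least one visit")
    buckets = Counter()
    for pages in pages_per_visit:
        if pages == 0:
            buckets["zero"] += 1      # only graphics requested, no pages
        elif pages == 1:
            buckets["single"] += 1
        else:
            buckets["multi"] += 1
    total = len(pages_per_visit)
    return {name: 100.0 * buckets[name] / total
            for name in ("zero", "single", "multi")}


# Hypothetical page counts for eight visits (robots already excluded).
sample = [0, 1, 1, 3, 5, 0, 2, 1]
print(page_count_breakdown(sample))
```

For this sample the multi-page share is 37.5%. Computing it as
"100% minus single-page visits" would give 62.5%, because the two
zero-page visits would be wrongly counted as multi-page.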

Jason

-- 
Jason@xxxxxxxxxxx
--
Dr. Seuss books . . . can be read and enjoyed on several levels. For
example, 'One Fish Two Fish, Red Fish Blue Fish' can be deconstructed
as a searing indictment of the narrow-minded binary counting system.
   -- Peter van der Linden, Expert C Programming, Deep C Secrets