How Many Users Does it Take to Change a Web Site? (SIGCHI Bulletin, May/June 2001)
Well, naturally this is a trick question, but it has recently
become a little trickier. Ever since Nielsen and Landauer's InterCHI
1993 paper, the answer has been "about five". This
is based on a mathematical model that
projects the number of new problems that each additional usability
test subject will find. Five users should find about 85% of usability
problems with significantly diminishing returns thereafter.
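For the record, the model behind the "about five" figure is a simple one: if each test subject independently uncovers a fixed proportion of the problems present (Nielsen and Landauer report a typical value of about 31%), then n subjects find roughly 1 - (1 - 0.31)^n of them. A minimal sketch in Python, assuming that typical value, reproduces the familiar numbers:

    # Cumulative proportion of problems found after n test subjects,
    # assuming each subject independently finds a fixed proportion.
    # The 31% figure is the typical value Nielsen and Landauer report.
    L = 0.31

    def proportion_found(n, rate=L):
        """Proportion of usability problems found by n subjects."""
        return 1 - (1 - rate) ** n

    for n in (1, 3, 5, 10):
        print(f"{n:2d} users: {proportion_found(n):.0%}")
    # 1 user: 31%, 3 users: 67%, 5 users: 84%, 10 users: 98%

Five users gets you to about 85%, and each user after that adds visibly less, which is where the diminishing-returns argument comes from.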
But by the time you read this, Jared Spool and his colleagues
will have presented a short paper at CHI 2001 entitled "Five Users
is Nowhere Near Enough", reporting on a study that failed to find
even half of a web site's predicted usability problems with a
whopping 18 users. So where does that leave us with our little
conundrum?
Let's consider why this kind of discrepancy might exist. Traditionally,
usability testing has been performed using well-defined tasks.
Spool's team, on the other hand, conducted what might best be
described as "goal-directed" tests. In these studies
rather than giving users very specific scenarios, they were asked
simply to buy something they needed (CD's and videos for example).
This meant that users were formulating their own sub-goals and
tasks. It also meant that the detail with which they described
their goals could have varied from something vague like "a
CD my mom will like" to a specific requirement for "the
Erin Brockovich video" (hopefully spelled correctly).
Given these potential variations, it is likely that users would
have been testing different parts of the same web site. Some
users may have required the search facility, others might have
tried browsing to find that CD for mom. The issue here, to use
the conventional software testing term, is coverage. The Nielsen
and Landauer model only holds if users are exposed to the same
aspects of the system under test. Regrettably, given the
complexity of some web pages, we cannot even be sure which aspects
of the system users are being exposed to. Parts of a page that
one user may dismiss as irrelevant might be examined by another
in minute detail because they see in it a vital clue in the pursuit
of their goal.
All of this has some worrying implications for web site usability
testing. On the one hand, insisting that users focus on a well-defined,
detailed task usually only requires about five subjects per test.
On the other, is this a realistic way to determine the overall
usability of a site? As if to pre-empt this question, Jakob Nielsen
has recently published an Alertbox called "Success Rate: The Simplest
Usability Metric". In it he makes the point that the best methods
for usability testing (with small numbers of users) conflict
with the demands of getting an overall measure of a site's usability.
The latter can only be done (with confidence) by measuring the
success rates of larger numbers of users in achieving their goals.
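The metric itself is deliberately simple: the percentage of task attempts that users complete, with partial successes given partial credit (half credit is the convention assumed in the sketch below, which is how Nielsen suggests handling them). A minimal illustration in Python:

    def success_rate(successes, partials, attempts):
        """Share of task attempts completed, counting a partial
        success as half a success (an assumed convention)."""
        return (successes + 0.5 * partials) / attempts

    # E.g. of 20 attempts: 11 succeed outright, 4 partially, 5 fail.
    print(f"{success_rate(11, 4, 20):.0%}")  # 65%

With enough users, a percentage like this can be reported with some statistical confidence, which is precisely what makes it workable as a global measure.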
The consequence of this fits the five-versus-eighteen-plus puzzle
together rather nicely. We can use small numbers of subjects
in detailed and well-focused tests, ensuring that coverage is
the same within a series. On large or complex sites we will need
more test series to improve coverage. However, to get a global
picture of a site's usability, we need to measure the success
rate of real users in real situations.
Footnote
The mathematical model used by Nielsen and Landauer
was devised by Bob Virzi at GTE. (Thanks to Wayne Gray for pointing
out this oversight in the original article.)
R.A. Virzi (1992). 'Refining the test phase
of usability evaluation: How many subjects is enough?' Human
Factors, 34(4), 457-468.
The Author
William Hudson is principal consultant for Syntagm Ltd, based
near Oxford in the UK. His experience ranges from firmware to
desktop applications, but he started by writing interactive software
in the early 1970s. For the past ten years his focus has been
user interface design, object-oriented design and HCI.
Other free articles on user-centred design: www.syntagm.co.uk/design/articles.htm
© 2001-2005
ACM. This is the author's version of the work. It is posted here
by permission of ACM for your personal use. Not for redistribution.
The definitive version was published in SIGCHI Bulletin,
Volume 33, May-June 2001.
http://doi.acm.org/10.1145/967222.967230