How Many Users Does it Take to Change a Web Site?
(SIGCHI Bulletin May/June 2001)
              Well, naturally this is a trick question, but it has recently
                become a little trickier. Ever since Nielsen and Landauer's InterCHI
                1993 paper, the answer has been "about five". This
                is based on a mathematical model that
                projects the number of new problems that each additional usability
                test subject will find. Five users should find about 85% of usability
              problems with significantly diminishing returns thereafter. 
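Nielsen and Landauer's projection follows from a simple discovery model. As a sketch (treating the roughly 31% average problem-discovery rate often quoted from their data as an assumption rather than a figure re-derived here): if a site has N usability problems and each test subject independently uncovers any given problem with probability L, then n subjects are expected to find

\[
\mathrm{Found}(n) = N\bigl(1 - (1 - L)^{n}\bigr),
\qquad
1 - (1 - 0.31)^{5} \approx 0.84
\]

so five users find about 84-85% of the problems, and each additional subject adds a progressively smaller increment.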
              But by the time you read this, Jared Spool and his colleagues
                will have presented a short paper at CHI 2001 entitled Five Users
                is Nowhere Near Enough, reporting on a study that failed to find
                even half of a web site's predicted usability problems with a
                whopping 18 users. So where does that leave us with our little
              conundrum? 
Let's consider why this kind of discrepancy might exist. Traditionally, usability testing has been performed using well-defined tasks. Spool's team, on the other hand, conducted what might best be described as "goal-directed" tests. In these studies, rather than being given very specific scenarios, users were asked simply to buy something they needed (CDs and videos, for example). This meant that users were formulating their own sub-goals and tasks. It also meant that the detail with which they described their goals could have varied from something vague like "a CD my mom will like" to a specific requirement for "the Erin Brockovich video" (hopefully spelled correctly).
              Given these potential variations, it is likely that users would
                have been testing different parts of the same web site. Some
                users may have required the search facility, others might have
                tried browsing to find that CD for mom. The issue here, to use
                the conventional software testing term, is coverage. The Nielsen
                and Landauer model only works because users are exposed to the
                same aspects of the system under test. Regrettably, given the
                complexity of some web pages, we cannot even be sure which aspects
                of the system users are being exposed to. Parts of a page that
                one user may dismiss as irrelevant might be examined by another
                in minute detail as they see a vital clue in the pursuit of their
              goal. 
All of this has some worrying implications for web site usability testing. On the one hand, insisting that users focus on a well-defined, detailed task usually requires only about five subjects per test. On the other, is this a realistic way to determine the overall usability of a site? As if to pre-empt this question, Jakob Nielsen has recently published an Alertbox called Success Rate: The Simplest Usability Metric. In it he makes the point that the best methods for usability testing (with small numbers of users) conflict with the demands of getting an overall measure of a site's usability. The latter can only be done (with confidence) by measuring the success rates of larger numbers of users in achieving their goals.
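The arithmetic behind such a metric is deliberately simple. A minimal worked example with hypothetical numbers, assuming the convention (suggested in that Alertbox) of crediting partially completed tasks at 50%:

\[
\text{success rate} \;=\; \frac{S + 0.5\,P}{A}
\;=\; \frac{35 + 0.5 \times 10}{60} \;\approx\; 67\%
\]

where S is the number of outright successes, P the number of partial successes and A the total number of task attempts. The figure only becomes trustworthy when A is reasonably large, which is exactly Nielsen's point about needing more users for measurement than for testing.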
The consequence of this puts the five-versus-eighteen-plus puzzle
                together rather nicely. We can use small numbers of subjects
                in detailed and well-focused tests, ensuring that coverage is
                the same within a series. On large or complex sites we will need
                more test series to improve coverage. However, to get a global
                picture of a site's usability, we need to measure the success
              rate of real users in real situations. 
              Footnote
              The mathematical model used by Nielsen and Landauer
                was devised by Bob Virzi at GTE. (Thanks to Wayne Gray for pointing
                out this oversight in the original article.) 
              
                 R.A. Virzi (1992). 'Refining the test phase
                  of usability evaluation: How many subjects is enough?' Human
                  Factors, 34(4), 457-468. 
                 
               
              
			  The Author
              William Hudson is principal consultant for Syntagm Ltd, based
                near Oxford in the UK. His experience ranges from firmware to
                desktop applications, but he started by writing interactive software
in the early 1970s. For the past ten years his focus has been
                user interface design, object-oriented design and HCI. 
              Other free articles on user-centred design: www.syntagm.co.uk/design/articles.htm 
© 2001-2005 ACM. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in SIGCHI Bulletin, Volume 33, May-June 2001. http://doi.acm.org/10.1145/967222.967230
              
            