Playing your cards right: Getting the most from card sorting for navigation design
(interactions magazine September/October 2005)
Card sorting is a knowledge elicitation technique often used by information architects, interaction designers and usability professionals to establish or assess the navigation hierarchy of a web site. The process involves asking participants to sort items into meaningful groups. The items are typically menu entries or hyperlinks, while the groups are categories or headings. In open card sorts the number and names of groups are decided by each participant, while in closed card sorts these factors are fixed by the researcher in advance.
Analysis of card sorting results ranges from simple counting of the number of times items were grouped together to the rather intimidating monothetic agglomerative cluster analysis (known simply as cluster analysis in most cases). Unfortunately, no single technique provides everything a researcher needs to know, especially if convincing evidence is needed to persuade colleagues or customers of the effectiveness of a proposed design.
The evidence we need falls into three categories:
- Participants. Are these the right participants for our site? Are they all thinking about the items and their groupings in a similar way? Do they have a clear understanding of the card sorting task itself?
- Items. Are the item names well-understood by participants? Are there alternatives that should be considered – perhaps terms users are more familiar with?
- Groups. For closed card sorts, have we chosen the right number of groups and names for each? For open sorts, are participants largely in agreement about the number of groups needed? How well do participants feel the items fit into their groups?
Happily, the answer to this last question – how well participants feel the items fit into their groups – can also help us with many of the other issues listed. Coupled with a few data collection guidelines and alternative presentations of results, we can collect fairly comprehensive evidence about what is and what is not going to work in our navigation hierarchies.
So let’s examine this last question in some more detail: How well do participants feel the items fit into their groups? It is possible to argue that this question is redundant; that the items must fit into their groups relatively well in any given set of results, because that is how the participant decided to group them. However, practical experience says otherwise. Consider the following example that I use as a practice sorting exercise when teaching: participants are given the names of 14 wines and asked to sort them into 3 groups (full-bodied reds, dry whites and sparkling). Participants are instructed to omit any items they feel do not really belong to any of the groups. The cluster analysis dendrogram shown in figure 1 is a fairly typical set of results for 12 participants.
Figure 1: Cluster Analysis Dendrogram for Wine Example
The dendrogram shows the three groups, connected in the characteristic tree-like structure which gives this form of presentation its name. The vertical connections between branches indicate the strength of the relationship between items, with stronger relationships to the right and weaker to the left. So for example, the relationship between Riesling and White Zinfandel is the strongest in this dendrogram, meaning that those two items appeared in the same group more frequently than any other pair of items. The relationship between Beaujolais and Claret is only slightly less strong, while the weakest relationship between any single item and its group is that of Pinot Grigio.
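For readers who want to see how such a tree is built, the following is a minimal sketch of agglomerative clustering in Python. It assumes average linkage, which the article does not specify, and the proximity values and the name "Champagne" are hypothetical rather than the results behind figure 1.

```python
from itertools import combinations

# Hypothetical summed proximities (higher = grouped together more often).
# Pairs that never shared a group are simply absent and count as 0.
proximity = {
    frozenset(("Claret", "Cabernet Sauvignon")): 11,
    frozenset(("Claret", "Beaujolais")): 10,
    frozenset(("Cabernet Sauvignon", "Beaujolais")): 9,
    frozenset(("Cava", "Champagne")): 12,
    frozenset(("Claret", "Cava")): 1,
}
items = ["Beaujolais", "Cabernet Sauvignon", "Cava", "Champagne", "Claret"]

def similarity(cluster_a, cluster_b):
    """Average linkage (an assumption): mean proximity over all cross-cluster pairs."""
    pairs = [(a, b) for a in cluster_a for b in cluster_b]
    return sum(proximity.get(frozenset(pair), 0) for pair in pairs) / len(pairs)

def agglomerate(items):
    """Repeatedly merge the two most similar clusters, recording each merge."""
    clusters = [(item,) for item in items]
    merges = []
    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2), key=lambda pair: similarity(*pair))
        merges.append((a, b, similarity(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

merges = agglomerate(items)
for a, b, strength in merges:
    print(a, "+", b, "at strength", strength)
```

Each recorded merge corresponds to one vertical connection in a dendrogram, and its strength determines how far to the right that connection would be drawn.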
But for wine lovers, there is something fishy about this result. If you remember, participants were asked to group the wines into three categories, one of which was full-bodied reds. While Beaujolais is a red wine, it certainly cannot be described as full-bodied. (There is also a problem with White Zinfandel that I am not going to deal with here. It was a nasty trick played on participants that will become immediately obvious if you actually try to buy a bottle of the stuff: it is rosé, not white.) So what went wrong? It seems that participants are sometimes reluctant to admit defeat and omit a card, even though they were asked to do so for any card they felt did not fit the groups. So although Beaujolais is not full-bodied, most participants failed to omit the card. Why do I believe this rather than entertaining the possibility that participants were not aware of the difference? Because I asked them to indicate how well each item fitted within the group they chose on a simple three-point scale:
- Fair (1)
- Good (2)
- Perfect (3)
This “quality of fit” measure can be incorporated into the cluster analysis as part of the strength of relationship between items. Figure 2 shows the same card sort results with quality of fit taken into account.
Figure 2: Wine Dendrogram with Quality of Fit
We still have the three groups, but now in a slightly different order (which is not important for our discussion). Notice though, that with quality of fit taken into account, the relationship Beaujolais has with the other members of its group has changed from being the strongest among the red wines to being the weakest. This is because while participants recognized Beaujolais as a red wine, they also knew it was a poor fit for a group labeled “full-bodied”. (In a full-scale exercise, participants would have been invited to create new groups or to annotate the cards, which may have provided a similar result. However, as we will shortly see, quality of fit has other benefits that are worth pursuing.)
I promised earlier that the answer to the “how well do items fit into their groups” question would help us deal with issues surrounding participants and the items themselves. So far, all that we have considered is how the quality of fit measure changes the “proximity matrix” used as the basis of cluster analysis. The matrix shows the strength of relationship between items and is simply the sum of individual matrices such as that shown in figure 3. Without quality of fit, each cell of a participant’s matrix is either 0 (blank in this case) or 1 depending on whether the items in question appeared in the same group. In figure 3 Beaujolais was placed by this participant in the same group as Cabernet Sauvignon, but Beaujolais and Cava were in separate groups (and so on through each possible pairing).
Figure 3: Proximity Matrix for a Single Participant (without Quality of Fit)
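A single participant's matrix of this kind could be built along the following lines. This is only a sketch: the sort shown, the name "Champagne" and the function name are hypothetical, and pairs are stored once, as a sparse dictionary, where the figure shows a full grid.

```python
from itertools import combinations

# Hypothetical sort from one participant: group name -> items placed in it.
sort = {
    "full-bodied reds": ["Beaujolais", "Cabernet Sauvignon", "Claret"],
    "sparkling": ["Cava", "Champagne"],
}

def proximity(sort):
    """Return {(item_a, item_b): 1} for every pair of items that share a group."""
    matrix = {}
    for members in sort.values():
        # Sorting the names gives each pair a single, predictable key.
        for a, b in combinations(sorted(members), 2):
            matrix[(a, b)] = 1
    return matrix

matrix = proximity(sort)
print(matrix.get(("Beaujolais", "Cabernet Sauvignon"), 0))  # same group
print(matrix.get(("Beaujolais", "Cava"), 0))                # separate groups
```

Missing keys stand in for the blank (0) cells, which keeps the structure small when most pairs never share a group.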
With quality of fit taken into account, the sample matrix changes to that shown in figure 4. Quality of fit has been averaged between items. So, for example, Claret and Beaujolais have been placed in the same group, but Claret was given a “perfect” 3 while Beaujolais had a “fair” 1. The average that appears in the matrix is a “good” 2.
Figure 4: Proximity Matrix for a Single Participant (with Quality of Fit)
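The averaging step can be sketched as below. The ratings for Claret and Beaujolais follow the example in the text; the other items' ratings, the name "Champagne" and the function name are assumptions made for illustration.

```python
from itertools import combinations

# One participant's sort: group -> {item: fit rating}, 1 = fair, 2 = good, 3 = perfect.
# Only the Claret and Beaujolais ratings come from the text; the rest are hypothetical.
sort_with_fit = {
    "full-bodied reds": {"Beaujolais": 1, "Cabernet Sauvignon": 3, "Claret": 3},
    "sparkling": {"Cava": 3, "Champagne": 3},
}

def weighted_proximity(sort_with_fit):
    """Each cell holds the mean of the two items' fit ratings, for items sharing a group."""
    matrix = {}
    for ratings in sort_with_fit.values():
        for a, b in combinations(sorted(ratings), 2):
            matrix[(a, b)] = (ratings[a] + ratings[b]) / 2
    return matrix

matrix = weighted_proximity(sort_with_fit)
print(matrix[("Beaujolais", "Claret")])  # mean of a "fair" 1 and a "perfect" 3
```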
But we can also average quality of fit for each item across all participants. This is the small bar chart that appears between the dendrogram and item labels in figure 2, but I have included a larger version as figure 5.
Figure 5: Average Quality of Fit by Item
We can see from this simple analysis that Beaujolais has the lowest quality of fit of any of the items, which should make us a little suspicious of either the group names we have provided users or of their understanding of the item itself. Asking users to annotate the cards would help to clarify this, but if time allows, asking individual users to sort while thinking aloud should also be revealing.
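The per-item average might be computed as in this sketch. The ratings are hypothetical, and averaging only over the participants who actually placed an item (ignoring omitted cards) is an assumption, since the article does not say how omissions are treated.

```python
from collections import defaultdict

# Hypothetical ratings: one dict per participant, item -> fit rating (1-3).
# An item the participant omitted is simply absent from their dict.
participants = [
    {"Beaujolais": 1, "Claret": 3, "Cava": 3},
    {"Beaujolais": 2, "Claret": 3, "Cava": 2},
    {"Claret": 2, "Cava": 3},
]

def average_fit_by_item(participants):
    """Mean fit rating per item, over the participants who placed that item."""
    collected = defaultdict(list)
    for ratings in participants:
        for item, fit in ratings.items():
            collected[item].append(fit)
    return {item: sum(fits) / len(fits) for item, fits in collected.items()}

averages = average_fit_by_item(participants)
lowest = min(averages, key=averages.get)
print(lowest, averages[lowest])
```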
In figure 5 we can see that Pinot Grigio also has a low quality of fit and has the weakest relationship with other members of its group, as shown in both figures 1 and 2. We need to find out why this is so, but the analysis techniques discussed so far have nothing further to reveal in this respect. If we add together all of the proximity matrices from each participant, we get the result shown in figure 6.
Figure 6: Sum of All Participant Matrices for the Wine Example (with Quality of Fit)
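Summing the individual matrices is a cell-by-cell addition. A minimal sketch, assuming each participant's matrix is stored as a dictionary keyed by item pair (the values here are hypothetical, as is the name "Champagne"):

```python
from collections import defaultdict

# Hypothetical per-participant matrices: (item, item) -> averaged quality of fit.
participant_matrices = [
    {("Beaujolais", "Claret"): 2.0, ("Cava", "Champagne"): 3.0},
    {("Beaujolais", "Claret"): 1.5, ("Claret", "Pinot Grigio"): 2.0},
    {("Cava", "Champagne"): 2.5},
]

def sum_matrices(matrices):
    """Add the matrices cell by cell; pairs a participant did not group add nothing."""
    total = defaultdict(float)
    for matrix in matrices:
        for pair, strength in matrix.items():
            total[pair] += strength
    return dict(total)

summed = sum_matrices(participant_matrices)
print(summed[("Beaujolais", "Claret")])
```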
It is not easy to see at first glance, but Pinot Grigio (in the last row and column of the matrix) has often been grouped with red wines such as Claret. The use of a simple spreadsheet chart called a “surface map” (in Microsoft Excel) shows the situation a little more clearly:
Figure 7: Surface Map of the Proximity Matrix (Figure 6)
The surface map shows three main groups: full-bodied reds in the top left corner, sparkling wines in the center and dry whites in the bottom right corner. But there are some odd embellishments at the bottom and right edges. These are the consequence of Pinot Grigio being grouped with red wines as well as white. Discussion with the participants revealed that the term “Pinot” was strongly associated with Pinot Noir, hence the tendency for it to appear in both the red and white wine groups. Careful observers will also notice that Muscat was occasionally grouped with the sparkling wines and, as it happens, for a similar reason to the confusion over Pinot Noir. “Muscat” is very similar to “Muscatel”, a popular sparkling wine. Again, this information is present in the matrix itself, but is not visually obvious.
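Where a chart is not to hand, stray pairings of this kind can also be pulled out of the summed matrix by listing an item's partners in order of strength. The sketch below uses hypothetical values and a hypothetical helper name; it is an alternative view of the same data, not the surface map itself.

```python
# Hypothetical summed proximities, not the article's data: (item, item) -> total.
summed = {
    ("Pinot Grigio", "Riesling"): 14.0,
    ("Claret", "Pinot Grigio"): 6.5,
    ("Cabernet Sauvignon", "Claret"): 25.0,
    ("Cava", "Pinot Grigio"): 0.0,
}

def partners(summed, item):
    """List the other items grouped with `item`, strongest relationship first."""
    found = []
    for (a, b), strength in summed.items():
        if item in (a, b) and strength > 0:
            found.append((b if a == item else a, strength))
    return sorted(found, key=lambda pair: pair[1], reverse=True)

pinot = partners(summed, "Pinot Grigio")
for other, strength in pinot:
    print(other, strength)
```

A red wine appearing in the list for a white one is the textual equivalent of the embellishments at the edges of the surface map.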
Finally, how does quality of fit help us decide whether participants understand the card sorting process itself? The solution here is to average quality of fit for each participant across all items. This gives us an idea of the degree of confidence participants had in the groupings they made – a factor especially important for open card sorts where the number and name of each group is a matter of individual choice. The scattergram shown in figure 8 is for an open card sort with 18 participants. The axes are average quality of fit (vertical) and average group size (horizontal).
Figure 8: Participant Scattergram by Average Quality of Fit and Group Size
The scattergram shows most participants clustered around the centre, but with one in the top right whose results might be worth investigating to see if they should be excluded from the overall analysis as an outlier.
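The two axes of such a scattergram, and a rough first pass at spotting outliers, might be computed as follows. The data is hypothetical, and the rule used to flag a participant (more than 1.5 standard deviations from the mean on either axis) is an assumption for illustration; the article relies on visual inspection.

```python
from statistics import mean, pstdev

# Hypothetical open-sort data, not from the article: each participant's sort is
# a list of groups, and each group is a list of fit ratings (1-3), one per card.
sorts = {
    "P1": [[2, 2, 2], [2, 2, 3]],
    "P2": [[2, 2], [2, 1], [2, 2]],
    "P3": [[2, 3, 2], [2, 1, 2]],
    "P4": [[2, 2], [2, 2, 2, 2]],
    "P5": [[3, 3, 3, 3, 3, 3]],
}

def summarise(groups):
    """Return (average quality of fit, average group size) for one participant."""
    fits = [fit for group in groups for fit in group]
    return mean(fits), mean(len(group) for group in groups)

def flag_outliers(sorts, threshold=1.5):
    """Flag participants far from the mean on either axis (assumed rule)."""
    summary = {p: summarise(groups) for p, groups in sorts.items()}
    flagged = set()
    for axis in (0, 1):
        values = [s[axis] for s in summary.values()]
        centre, spread = mean(values), pstdev(values)
        if spread == 0:
            continue  # everyone identical on this axis, nothing to flag
        flagged |= {p for p, s in summary.items()
                    if abs(s[axis] - centre) / spread > threshold}
    return sorted(flagged)

outliers = flag_outliers(sorts)
print(outliers)
```

A flagged participant is a candidate for closer inspection, not automatic exclusion; the raw sort should still be examined before any results are dropped.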
Naturally, quality of fit in card sorting is not a magic solution and is no substitute for careful planning and qualitative data collection. But coupled with a few data analysis techniques beyond traditional card-counting and cluster analysis, I believe it can help us reach much more robust conclusions in web navigation design.
References
The topics presented here and further extensions to card sorting are discussed at http://www.syntagm.co.uk/design/cardsort.shtml
Tips
Encourage user feedback:
- For one-on-one sessions (single participant and researcher), ask participants to think aloud
- For group sessions, use paper or card for sorting and ask users to make annotations, suggest alternative item and group names and to add groups and items as required
- For on-screen card sorting (web or desktop-based) provide and encourage use of a separate notepad or email facility for participants to make notes and queries
Do not rely on a single method of analysis. Examine the raw data, generate cluster analyses and make sure that unexpected results can be explained.
Exclude results from participants where there is evidence that they did not understand the process or where their results were substantially inconsistent with the majority of participants.
Consider providing a simple trial card sort to give participants practice in the technique. Fruit and vegetables make an easy introduction but may not be appropriate in all cases.
Use open card sorting for exploration and closed for assessment.
The Author
William Hudson is principal consultant for Syntagm Ltd, based near Oxford in the UK. His experience ranges from firmware to desktop applications, but he started by writing interactive software in the early 1970s. For the past ten years his focus has been user interface design, object-oriented design and HCI.
Other free articles on user-centred design: www.syntagm.co.uk/design/articles.htm
© 2001-2005 ACM. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in interactions, Volume 12, Issue 5, September + October 2005. http://dx.doi.org/10.1145/1082369.1082410