Usability study—index-versus-search


This is an extract of the usability report "Index Versus Full-text Search: A Usability Study of User Preference and Performance," authored by Carol Barnum, Earvin Henderson, Al Hood, and Rodney Jordan in 2004. It was previously published by the US Society for Technical Communication (STC) in its magazine Technical Communication, Volume 51, Number 2, May 2004, pp. 185-206; extracted on Goliath.ecnet.com; and published by one of the authors, Al Hood, in the allbusiness.com Business Exchange newsletter, May 1 2004, with thanks. It is replicated here for reference purposes only, and the rights of the original authors listed above are acknowledged.

Index versus full-text search: a usability study of user preference and performance.

By: Hood, Al
Publication: Technical Communication
Date: Saturday, May 1 2004

SUMMARY

* Describes some usability issues resulting from conversion of paper documents to Adobe PDF format

* Reports that users preferred the full-text search tools but got more accurate results with an electronic back-of-the-book index

**********

"Information by itself is not valuable unless it is accessible: Value is created by pathfinders through the information" (Grimstead 2001, p. 13). As technical communicators, content developers, or information architects (depending on the title we use to characterize the work we do), we certainly recognize that access to the information we create, regardless of the medium, is an essential tenet of effective technical communication.

If our specialty is indexing, we understand that a well-composed index is one that matches the thinking process and vocabulary of our users. With books and other print references, such as documentation, the usefulness of the product is frequently dependent on the effectiveness of the index. But does the same hold true when the book is an e-book and the users have access to text-search tools and can quickly search the text for whatever word combination they wish?

Publishers who engage in e-publishing are leading the trend that bypasses the indexer in the belief that readers can function just as well without an index when offered software that allows them to do full-text searches instead. This belief is based on the increasing familiarity users have with text-search tools to look up information and the publishers' desire to better control publishing costs and production schedules.

But, as many who have experienced the results of a search know, search engines frequently produce hundreds of "hits," some of them relevant, many of them not so relevant in the context that the user is looking for. Simple keyword text searching is the type of searching most familiar to most people. Type the desired text string in the dialog box, and the search engine will find every occurrence of the string of characters, regardless of the value or significance to the user.

Search engines are most effective for the typical user when there is a single objective answer to a simple query, as, for instance, is the case with a search for a particular book title on Amazon.com. If you know the name of the book or the author, your search is likely to be fruitful. If you know the topic category--usability, for instance--you are also likely to get a list of books that will be of interest.

More complex queries, however, are another matter. If, for instance, you are searching the Web for index usability, you might get more than 200,000 hits covering topics such as indexing, usability, and indexes of sites on usability. You may not find any studies of index usability, or you may give up exploring the options after examining the first 10 or 20 items in the results list. As well, the order of the list of hits may have no meaning or usefulness to you, and, to narrow your search, you might need to use the dreaded "advanced search" button, which will likely add to your anxiety if you are like most people who know very little about effective advanced searching techniques such as Boolean logic.
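The difference between a plain keyword query and a Boolean query can be sketched in a few lines. The example below is only an illustration under stated assumptions: the four document titles are invented, and the helper names search_any and search_all are hypothetical rather than part of any real search engine. It shows how requiring every term (Boolean AND) narrows the list compared with accepting any term (Boolean OR).

```python
# Hypothetical mini-collection; the titles are invented for illustration only.
DOCS = {
    "doc1": "how to build an index",
    "doc2": "usability testing for web sites",
    "doc3": "a usability study of the book index",
    "doc4": "an index of sites about usability",
}


def words(text):
    """Split a text into a set of lowercase words."""
    return set(text.lower().split())


def search_any(terms, docs=DOCS):
    """Boolean OR: return documents containing at least one query term."""
    wanted = {t.lower() for t in terms}
    return sorted(name for name, text in docs.items() if wanted & words(text))


def search_all(terms, docs=DOCS):
    """Boolean AND: return documents containing every query term."""
    wanted = {t.lower() for t in terms}
    return sorted(name for name, text in docs.items() if wanted <= words(text))


or_hits = search_any(["index", "usability"])
and_hits = search_all(["index", "usability"])
print("OR hits: ", or_hits)
print("AND hits:", and_hits)
```

In this sketch the AND query still returns doc4, an index of sites about usability, alongside the one study of index usability. Narrowing the query reduces the number of hits but does not by itself guarantee relevance.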

In this article, we report on the results of testing two versions of an information product, Usability testing and research: one version, an e-book with an index with the locators hyperlinked to the page reference for each entry; the other version, the same e-book without an index, but with full-text search capabilities. We describe the methodology for testing, the testing results, our conclusions, and implications for future research. Before discussing these issues, however, we summarize the current literature regarding human indexing and information retrieval by machine (search engines).

LITERATURE REVIEW

Human indexing and advances in machine information retrieval

When a business looks to improve its standing in a competitive market, the cost of labor is generally the most significant expense, and therefore receives the focus of management's attention to keep the business competitive. In the area of information retrieval, information science technology is making remarkable advances that promise a cost-effective method to index, classify, and catalog the explosive growth in information available through electronic sources and the Internet. How does this rapidly advancing technology compare with human indexing, and what does the future hold for each? We begin this discussion by reviewing the profession of indexing and the value that a well-prepared index adds to documents and database information retrieval. We will see that an index is much more than just an alphabetical listing of a document's contents and that conformance to a firm set of rules doesn't necessarily make a good index.

Information science experts James D. Anderson and Jose Perez-Carballo (2001a, p. 237) reaffirm what the professional indexing community says about the role of indexers:

   The general consensus among indexers and theoreticians is that human
   indexers perceive (read, view, examine, listen to) a text, interpret
   the message encoded in the text as they understand it (influenced by
   previous experience and current personal knowledge, including their
   interpretations of any instructions given them), and then describe
   their version of the message, plus any important text or document
   features, in accordance to rules and patterns for the type of index
   they are working on.

Not much more detail than that is provided by experts in indexing. The authors cite the work of noted indexing experts Nancy Mulvany, Lois Mai Chan, Robert Fugmann, Dagobert Soergel, Hans Wellisch, and others to support their critical assessment of the profession and the underlying reliance on the indexer's good judgment. The authors also point out that "modern indexing algorithms go well beyond simply generating lists of words, and that indeed, judgments are made based on a wide range of criteria, including those encoded in knowledge bases, reflecting the significance of subject area and cultural understanding of their creators. Nevertheless, effective human indexing relies on a very sophisticated use of human intelligence" (Anderson and Perez-Carballo 2001a, p. 238).

Indeed, sophistication is very much a part of human indexing. A well-composed index is the result of a complex thought process whereby the indexer bridges the author's perspective of the subject to the likely keyword that the user will consider. To make the connection between the author and the user, the indexer may use words or "coined modifications" that are not specifically mentioned in the document but that will be recognizable to a majority of users of the index. Weinberg (1996) points out that about 10% of the average index's entries are "coined modifications, formulated by a human indexer to reflect the text being analyzed."
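A small sketch may make the idea of a coined entry concrete. Everything in it is an assumption made for illustration: the page texts, the index entries, and the helper names full_text_pages and index_pages are all hypothetical. It models an index as a mapping from entries to page locators and compares it with a literal search of the page text.

```python
# Hypothetical page texts and index entries, invented for illustration.
PAGES = {
    12: "participants skimmed pages to narrow down what they wanted",
    47: "users typed a word into the search box and gave up after one try",
}

# A human-built index. The entry "look-up strategies" is a coined term that
# appears on neither page, yet the indexer points it at both.
INDEX = {
    "look-up strategies": [12, 47],
    "search box": [47],
}


def full_text_pages(query, pages=PAGES):
    """Return the pages whose text literally contains the query string."""
    q = query.lower()
    return sorted(n for n, text in pages.items() if q in text.lower())


def index_pages(entry, index=INDEX):
    """Return the page locators listed under an index entry, if any."""
    return index.get(entry.lower(), [])


print(full_text_pages("look-up strategies"))  # the phrase is not in the text
print(index_pages("look-up strategies"))      # the coined entry has locators
```

Under these assumptions, the literal search finds nothing for the coined term because the words never occur in the text, while the index entry leads to both pages.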

According to noted indexing experts Ann P. Bishop, Elizabeth D. Liddy, and Barbara Settel (1991), a reader uses a back-of-book (BOB) index for two purposes:

* To identify and locate particular information within the book

* To get an idea of a book's scope and detail, and the nature of a particular subject

However, indexes are not routinely present, even in books that we would normally associate indexes with, such as in the disciplines of humanities, fine arts, social sciences, and science and technology. The authors found that out of the 659 books examined in these disciplines, 117 (17.8%) had no index (Bishop, Liddy, and Settel 1991, pp. 24-25).

With the advent of machine indexing, some worry that the profession of indexing will disappear. Experts in information science don't believe that machine information retrieval is a passing technology, especially with the youngest of users growing up completely comfortable with the Internet, information-searching techniques, and the desire to use the path of least resistance to get what they are looking for (see Peek and Hane 1998). They are, however, concerned with the dispersion of core information from the relatively few quality information sources to an expanding host of sources with diluted information quality, such that if the text query matches something in the document, then the document must be relevant.

Anderson and Perez-Carballo (2001b, p. 267) believe that both types of information retrieval will play a valuable role in the future. Because of the rising cost of human indexing and the amount of new information created every day, machines are indispensable as information retrieval and indexing tools. However, the day when machines replicate the intellectual processes that human indexers provide isn't foreseeable anytime soon. The question, then, is "Where should the dividing line be for human indexing and information retrieval by machine?"

Research by Anderson and Perez-Carballo shows that users desire both methods (machine and human indexing), depending on what they are trying to find, and that "users find them, on balance, more or less equally effective" (2001a, p. 233). Weinberg (1996) goes further, stating that "complex information systems required human intermediaries."

Most information retrieval experts agree that just because technology can provide a wealth of information in a few keystrokes doesn't mean that we should use it. Rather, we should be careful with how we apply the technology, because the quality of information will degrade while the quantity of information will continue to grow exponentially. However, the same argument can be made about indexing by humans: "What we cannot afford to continue to do is to treat all documents that enter our collections and our IR [information retrieval] databases as if they were all equally important and equally deserving of our expert analysis and indexing. They simply are not, and to continue to do so is to waste precious resources" (Anderson and Perez-Carballo 2001b, p. 274).

Usability testing of indexes and search engines

How to make back-of-book indexes more efficient and effective for users has been an important subject for researchers for a long time. Evidence of this work is apparent in the available standards on indexing, books on indexing, and style guides on indexing (see Milstead 1990). For example, the chapter on indexes in The Chicago manual of style (University of Chicago Press 1993) provides recommendations on the arrangement of entries and subentries, the locator numbering scheme, the information to index, and the use of cross references (that is, see and see also). Most publishers impose a particular standard or style guide for their indexers, but the content of the index is at the discretion of the indexer. Thus, the experience and objectivity (although not total objectivity) of the professional indexer in creating professional indexes will be extremely beneficial to users, given their time constraints, the space constraints of the document, and the complexity of indexing (University of Chicago Press 1993).

Recent works in usability testing and research of BOB indexes include Susan C. Olason's Let's get usable! Usability studies for indexes (2000); an extensive index quality study of 433 books of various genres (that is, history, literature, science, and technology) by Ann P. Bishop, Elizabeth D. Liddy, and Barbara Settel (1991); and Ryan and Henselmeier's study at Macmillan (2000).

Olason's work assesses the impact, including quantitative results, of the following design features of BOB indexes:

* Run-on versus indented style

* Sub-entries beginning with prepositions or conjunctions

* Other access paths readers use to find information in the book

Her results concerning the first two items in this list correlate well with the indexing recommendations of The Chicago manual of style (University of Chicago Press 1993). She shows that users are well served by the selection of an index's main entries that considers the user's familiarity with the subject.

The indexing study of Bishop, Liddy, and Settel (1991) indicates that there is considerable variability in indexing among the books of various disciplines reviewed and that the recommendations of the 1993 edition of The Chicago manual of style are not strictly followed.

In the American Society of Indexers' newsletter, Key words, Ryan and Henselmeier (2000) describe the process they used to conduct a usability test of four books (each book for a different user group). Twenty-two participants were instructed to answer questions by looking up information using the index, the table of contents, glossaries, or whatever method they found most useful. The test observers, all Macmillan indexers, "were surprised at what participants looked up," with several observers commenting that participants "searched for terms they would never have thought of including in the index" (Ryan and Henselmeier 2000, p. 201).

The observers also found that some users liked to find a general area in the book and then narrow down their search by skimming pages, while others didn't want to read pages at all, but instead wanted the index to take them directly to the information. This "surprise" factor--what we learn from users--is a common occurrence in usability testing, pointing up the necessity to test all aspects of an interface, including the index, for information on how the interface matches users' own search and look-up vocabulary and strategies.

When it comes to the Internet, however, the studies are fewer. A search of the literature on Internet text searching brought us repeatedly to the Web site of User Interface Engineering (UIE). In their first reports on usability testing (Spool and colleagues 1997, p. 47), they tested 10 information Web sites and reported that one-third of their users tried "search" as their first strategy for looking for information. Their results showed that more often than not, users were unsuccessful because of two problems:

* They didn't understand the scope of the search.

* They had trouble interpreting the search results.

Since those initial tests, UIE has continued to investigate the use of search as a look-up strategy, reporting the results in its e-mail newsletter, UIEtips, and in articles on its Web site (http://www.uie.com) with titles that include "Why on-site searching stinks," "Are there users who always search?" "Users don't learn to search better," and "People search once, maybe twice." The findings from this research reaffirm that users have great difficulty understanding the dynamics of a full-text search, and as a result, they give up easily because "full-text searches are different from looking something up in an index, but users didn't seem to grasp this" (User Interface Engineering 1997).

For example, when users typed in "tire" on the Car Talk Web site (http://www.cartalk.com), they were surprised by the results that contained "entire," and "I'm tired." The search default was set to find partial word matches, and although users could change the default to search for entire words, no one did. If they misspelled or mistyped the word, they got zero results, but they didn't realize that the problem was spelling. With full-text searching, they received results that were clearly irrelevant. In one study, even a seemingly straightforward search task for "return policy" on Amazon.com resulted in 43 books on the topic but nothing on the return policy at Amazon.com (Ojakaar and Spool 2001).
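The partial-match behavior described above is easy to reproduce. The following is a minimal sketch, not a description of how the Car Talk site was implemented: the snippets are invented (only "entire" and "I'm tired" echo the reported results), and the function names partial_match and whole_word_match are hypothetical. It contrasts a substring search with a whole-word search built on Python's standard re module.

```python
import re

# Hypothetical snippets; only "entire" and "I'm tired" echo the study.
SNIPPETS = [
    "How to rotate a tire",
    "I read the entire manual",
    "I'm tired of this noise",
    "Tires wear faster in summer",
]


def partial_match(query, texts):
    """Substring search: the query may match inside a longer word."""
    q = query.lower()
    return [t for t in texts if q in t.lower()]


def whole_word_match(query, texts):
    """Whole-word search: the query must be bounded by word boundaries."""
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    return [t for t in texts if pattern.search(t)]


print(partial_match("tire", SNIPPETS))     # also matches "entire" and "tired"
print(whole_word_match("tire", SNIPPETS))  # matches the standalone word only
print(partial_match("tyre", SNIPPETS))     # a variant spelling finds nothing
```

In this sketch the whole-word search also skips the plural "Tires," which a user might well have wanted, so neither default is obviously right. A spelling the text does not use returns an empty list with no hint about the cause, much as the study's users experienced.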

In another UIE study, when users searched for information on dinosaurs in Smithsonian magazine online, the first hit they received was about the American steel industry, described as "one of the great American industrial dinosaurs." As UIE concludes, "An index is a more precise tool. No self-respecting human indexer would have referenced the steel industry under 'dinosaurs.' Good indexing is a skill; humans do it better than machines. We anticipate that professional indexers may become more involved in web site design in the future" (User Interface Engineering n.d.).

Reporting on the results of another study, UIE found that when people search Web sites for content, they often use the search engine. However, they find their target content only 34% of the time. Within this group who used search, 47% tried only once. Another 30% tried twice. Fewer than 25% tried more than twice, despite designers' efforts to encourage more search strategies through tips. Although these results were based on tests of e-commerce sites, UIE asserts that "for years, we've been seeing these results on intranets, corporate and institutional information sites, and any other type of site with a search capability" (User Interface Engineering 2001). Jakob Nielsen (2001) reports very similar results from his studies of e-commerce Web sites. He also reports on intranet studies (Nielsen 2002) in which "poor search was the single greatest cause of reduced usability across intranets."

Seth Maislin, an indexer, information developer, and author of "Building search smarts" (2000), states the problem succinctly: "To succeed, search engines must emulate human judgment." Fred Leise, in "Improving usability with a Website index" (2002), says much the same thing in attributing successful indexes to the fact that "a human has looked at and analyzed the text."

As Weinberg, Spool, Maislin, Leise, and others have pointed out, the current capabilities of software that automates the index preparation process cannot take the place of a human indexer in sorting, organizing, and even supplying additional words and concepts to help users locate information they need. When indexing software or a search engine is used in place of a qualified indexer, "The burden of effective searching is often on the user, and the user is rarely as familiar with the site structure as the writers, editors, and programmers" (Maislin 2000).

Algorithms that improve search engines are being written, and research is ongoing to understand other ways to present search options to users to improve the result (see Spink 2002). Companies are aware of the value of an effective search engine for their Web sites. Forrester Research reported that 77% of the firms they surveyed rated search as "extremely important," yet only 24% rated their own Web sites' search capabilities as "extremely useful" (see Hearst and colleagues 2002).

Google.com, the search engine of choice for many because of its ease of use in search queries, explains its search technology on its Web site. With tongue firmly in cheek, Google reports that its patented technology, based on the work of behavioral psychologist B. F. Skinner, is built around low-cost pigeon clusters (PCs) that can
