Facets add value to the search user experience by helping users refine the usual ranked, best-first list of documents on a search page. The quality of faceted search, however, is at risk when search result precision is traded for recall. Users should ideally be able to find all documents relevant for a given query (high recall), and nothing other than these relevant documents (high precision). Balancing precision and recall is a constant challenge for enterprise search practitioners, since it’s notoriously difficult to achieve enough of both at the same time. When a trade-off has to be made, it may seem safer to err on the side of high recall. Then nothing of potential interest is left out, at least. But we shall see that high recall plays tricks with faceted search, forcing us to reconsider this assumption.
By design, facets and facet values summarize the entire search result set, assigning hit counts to each unique facet value. A Brand facet may tell you that the search result contains 15 Canon, 14 Nikon and 7 Sony cameras. If you want to refine your search to target a particular brand, you simply choose the corresponding facet value.
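To make the mechanics concrete, here is a minimal Python sketch of how a Brand facet could be computed and then used for refinement. The `hits` list, the `brand` field and the helper names `facet_counts` and `refine` are assumptions made up for illustration; they are not taken from any particular search engine.

```python
from collections import Counter

# Hypothetical result set: every hit carries a "brand" field.
brands = ["Canon"] * 15 + ["Nikon"] * 14 + ["Sony"] * 7
hits = [{"id": i, "brand": brand} for i, brand in enumerate(brands)]


def facet_counts(hits, field):
    """Count how many hits carry each unique value of `field`."""
    return Counter(hit[field] for hit in hits if field in hit)


def refine(hits, field, value):
    """Keep only the hits whose `field` equals the chosen facet value."""
    return [hit for hit in hits if hit.get(field) == value]


brand_facet = facet_counts(hits, "brand")
for value, count in brand_facet.most_common():
    print(f"{value} ({count})")

sony_only = refine(hits, "brand", "Sony")
```

Note that the counts cover every hit in the result set, regardless of where each hit sits in the ranked list.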
It was while reading a recent book on faceted search by Daniel Tunkelang that I came across the answer to a question that has troubled me for some time. What do facet values and their hit counts really say about the corresponding documents and their relevance to your query? How can you know up-front whether you’re making a good refinement choice or not? It defeats their purpose if facets and facet values are poor indicators of relevance, and I would like to know how facet ranking and presentation affect the user experience. Are facet values with high counts always more relevant than those with low counts?
With ranked retrieval (as opposed to set retrieval), documents are scored according to how well they match the user’s query, and the documents in the search result are ranked on this score, sorted from highest to lowest. A query that favors recall may cast a wide net, allowing the user to search freely through all parts of the document, perhaps with linguistic processing and synonym expansion applied to the query. Such a query will hopefully retrieve most of the interesting documents, but it may also dig up a lot of documents that are not particularly relevant.
It’s usually safe to assume that ordinary relevance ranking will banish the less relevant documents to the dark depths of the search result, well hidden from all but the most insistent users. Faceted search is played by a different set of rules, however, and low ranking documents may very well contribute significantly to the facets seen by the user. Facet values are usually sorted on hit count (if not alphabetically or hierarchically), but this ranking does not necessarily reflect the relevance ranking of the documents themselves. On the contrary, a facet value with a high count is not more relevant if it represents many irrelevant documents, as the case may be with a query that favors recall.
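A small Python sketch illustrates the mismatch. The scores and brands below are invented for the example, and `best_rank` is a hypothetical helper structure rather than anything a search engine reports; the point is only that the facet value with the highest count can be the one whose documents sit at the bottom of the ranked list.

```python
from collections import Counter

# Hypothetical ranked result, best first: (relevance score, brand).
ranked_hits = [
    (0.95, "Sony"), (0.91, "Sony"), (0.88, "Nikon"),
    (0.20, "Canon"), (0.18, "Canon"), (0.15, "Canon"), (0.12, "Canon"),
]

# The usual facet ordering: by hit count, highest first.
counts = Counter(brand for _, brand in ranked_hits)
by_count = [brand for brand, _ in counts.most_common()]

# For comparison: the best (lowest) rank position reached by each brand.
best_rank = {}
for rank, (_, brand) in enumerate(ranked_hits, start=1):
    best_rank.setdefault(brand, rank)

for brand in by_count:
    print(f"{brand}: count={counts[brand]}, best rank={best_rank[brand]}")
```

Here Canon tops the facet on count, even though its best document only appears in fourth place, after every Sony and Nikon hit.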
Knowing about the hidden menace of mindless recall, we can take some measures to ensure a satisfactory faceted search user experience:
- Increase precision by restricting queries to specific document fields, and limit the use of linguistic processing. It may not be necessary to search everywhere with full-blown synonym expansion and phonetic normalization of the query. In short, err on the side of high precision instead of recall if possible.
- Restrict facets to summarize only the most relevant part of the search result. Computing facets from e.g. just the 4000 highest ranked documents may dampen the noise introduced by less relevant documents. In FAST ESP, this technique is referred to as shallow navigators. Others may call it hedging.
- Rank facet values according to a relevance score based on document relevance. I imagine it would be possible to compute a utility measure similar to tf-idf suitable for ranking facets, and that this measure would favor facet values originating mainly from high ranking documents. I admit this is speculation on my part, and mostly off the top of my head, and some readers may even tell me that this feature is available out of the box in their favorite enterprise search software.
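The last two measures can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how FAST ESP or any other product implements them: the hit structure, the `depth` parameter and the function names are hypothetical, and summing relevance scores is just one simple stand-in for the tf-idf-like utility measure speculated about above.

```python
from collections import Counter, defaultdict

# Hypothetical ranked result, best first; scores and brands are invented.
hits = [
    {"score": 0.95, "brand": "Sony"},
    {"score": 0.91, "brand": "Sony"},
    {"score": 0.88, "brand": "Nikon"},
    {"score": 0.20, "brand": "Canon"},
    {"score": 0.18, "brand": "Canon"},
    {"score": 0.15, "brand": "Canon"},
    {"score": 0.12, "brand": "Canon"},
]


def shallow_facet_counts(ranked_hits, field, depth):
    """Count facet values over the top `depth` hits only."""
    return Counter(hit[field] for hit in ranked_hits[:depth] if field in hit)


def score_weighted_facets(ranked_hits, field):
    """Order facet values by the summed relevance score of their hits
    (an assumed utility measure, chosen here for simplicity)."""
    weights = defaultdict(float)
    for hit in ranked_hits:
        if field in hit:
            weights[hit[field]] += hit["score"]
    return sorted(weights.items(), key=lambda item: item[1], reverse=True)


shallow = shallow_facet_counts(hits, "brand", depth=3)
weighted = score_weighted_facets(hits, "brand")
```

With a depth of 3, the Canon value disappears from the facet altogether, while the score-weighted ordering keeps it but places it last. A real deployment would have to tune the depth, or pick a better weighting, against its own content and queries.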
Finally, Faceted Search: The Book is in my opinion a must-read for all enterprise search practitioners. I give it my warmest recommendations.