At the heart of every great enterprise search user experience lies a clear and concise information architecture. Why? Because search without structure is nothing but a disheartening solitary struggle to make sense of chaos. To implement a successful and sustainable search information architecture, you need to identify and design around relevant information entities, categories and facets. Categories in particular have a special role to play in shaping the search information architecture, a role that reaches far into the subsequent concept development and interaction design.
Why am I so geared-up about categories? I believe that design transcends technology, and that a well-designed (and possibly low-tech) enterprise search provides a better return on investment for businesses aiming to provide employees or customers with the best possible information access. And dividing by categories is one of the most effective ways to design coherent and noise-free search results, addressing problems that can’t be solved with relevancy alone.
Entities are the information archetypes we are seeking, pieces of information that documents are merely containers for. In a corporate setting examples of such entities may be employees, customers, projects, case files etc. Categories are natural and meaningful groupings of these entities, ideally satisfying several criteria that we’ll be looking into shortly. Facets are useful dimensions for refining the selection of entities, and groups of facets usually cluster around categories. Following our corporate examples, facets for employees may be years of employment, department, professional skills etc.
When you’re working to discover a suitable information architecture for an enterprise search solution, be on the lookout for categories that satisfy the following 5 criteria (in decreasing order of importance).
1. Matching the User’s Mental Model
As humans we’re masters of categorization. We do it all the time, and it’s how we make sense of the world. When I see a dog, I instantly know that it is also a mammal, that its name is Fido, that it has an owner whose name is Bob and that Bob is also my neighbor. I know this because previous experience has allowed me to form the categories dog, mammal, Fido, dog owner, Bob and neighbor, as well as the meaningful relationships between them, together making up a mental model. (I realize that it may be more correct to call this model an ontology, but I’ll stick to mental model for now.)
Mental models hold true for digital information as well, and search result categories should match the mental model of the user. Matching categories provide an information scent, helping the user anticipate the consequence of choosing to see search results from just one particular category, and building her confidence in the effectiveness of doing so. Card sorting is one way to discover, with help from users, what the natural and meaningful categories should be for a particular domain and enterprise search solution.
One good example of search categories has recently been unofficially launched by Google, as an improvement to their also-quite-recent search options. Categories like images, news, books, and maps are easy to comprehend, and they are a good fit to our mental model of what is out there on the Internet.
2. Effective Disambiguation of Queries
Search terms may be ambiguous, meaning that the user hasn’t provided enough information about the exact intent of the query. Does dolphins refer to Miami Dolphins (an American football team), or marine mammals? In the spirit of HCIR and exploratory search, it makes sense to simply ask the user to clarify the confusion – to disambiguate the query – rather than just guessing. Effective disambiguation is then the starting point for further refinements of the search results.
Bing provides a dynamic set of category refinements for most queries, capturing some of the different word senses for the search terms. The categories are semantically related (mined from search logs, I believe) and therefore give rise to meaningful disambiguation.
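As a rough illustration of the idea (not a description of how Bing does it), the sketch below assumes each hit already carries a category label and offers a clarifying choice only when more than one category holds a meaningful share of the hits. The function name `disambiguation_options` and the `min_share` threshold are hypothetical.

```python
from collections import Counter
from typing import List


def disambiguation_options(hit_categories: List[str], min_share: float = 0.2) -> List[str]:
    """Return the categories worth offering as a clarifying choice.

    hit_categories: one category label per search hit (assumed to exist already).
    min_share: hypothetical cut-off; categories below this share are ignored.
    Returns an empty list when the query does not look ambiguous.
    """
    if not hit_categories:
        return []
    counts = Counter(hit_categories)
    total = len(hit_categories)
    # most_common() yields categories ordered from most to least frequent.
    options = [cat for cat, n in counts.most_common() if n / total >= min_share]
    # Only ask the user to clarify when there is a real choice to make.
    return options if len(options) > 1 else []


# A query like "dolphins" whose hits split between two word senses.
hits = ["sports"] * 6 + ["animals"] * 4
print(disambiguation_options(hits))
```

When the hits fall almost entirely into one category, the function returns an empty list and the search interface can skip the question.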
3. Unique Facet Combinations
Facets describe what is typical about a type of document or information entity, and which attributes its members have in common. All cars have a make, a model, a color, a mileage etc. All employees have a title, a department, a supervisor etc. These attributes represent facets of what it means to be a car or an employee, and they are effective tools for search results refinement and clustering.
Entities described by a distinct set of facets should be separated from other entities described by different sets of facets. Using categories is a natural way to achieve this separation in a search user interface. Cars don’t have a department like employees do, and employees don’t have a color like cars do (it would be highly unethical to filter employees by color, at least). What happens then, if cars and employees are mixed together in the search result, and the user refines by department? All cars disappear, contrary to the impression that cars and employees alike are governed by the same department facet.
It’s better to avoid this possible confusion by separating the categories, and to show unique facets only when the user has chosen a particular category. Common facets can be shown all along, of course.
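The rule above can be sketched in a few lines. This is a minimal illustration, assuming a hand-written mapping from categories to facet names. The mapping `CATEGORY_FACETS`, the function `visible_facets`, and the shared `location` facet are all hypothetical and chosen only to give the two categories something in common.

```python
from typing import Dict, List, Optional, Set

# Hypothetical facet sets per category; "location" is an assumed common facet.
CATEGORY_FACETS: Dict[str, Set[str]] = {
    "cars": {"make", "model", "color", "mileage", "location"},
    "employees": {"title", "department", "supervisor", "location"},
}


def visible_facets(selected_category: Optional[str] = None) -> List[str]:
    """Facets to show: only the common ones until a category is chosen."""
    if selected_category is None:
        # Facets shared by every category are safe to show on mixed results.
        return sorted(set.intersection(*CATEGORY_FACETS.values()))
    # Inside a category, show its full facet set (common plus unique).
    return sorted(CATEGORY_FACETS[selected_category])


print(visible_facets())             # mixed results: common facets only
print(visible_facets("employees"))  # department appears only here
```

With this arrangement the department facet never appears next to cars, so refining by it cannot make cars silently disappear.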
4. No Need for Cross-Comparison Between Categories
Categories, like any form of tabbed navigation, have at least one major drawback. By physically separating entities in the user interface, comparison across categories becomes difficult. If the user needs to compare features or prices between different makes and models of washing machines, do not organize these items in separate tabs. Items must be shown side-by-side if effective comparisons are to be made. Otherwise the user is forced to pogo-stick between tabs or dialogs.
The same applies to categories in search results. Do not place entities that need to be compared into separate tabs, like employees working in different departments or customers from different countries. God says that the only right use of tabs in user interfaces is for alternative views of the same information. That makes it sound like tabs should be great for alternative list views and map views in classifieds search results. I won’t dive into that problem right now, and I hope to write more about it some other time.
5. Even Distribution Between Categories
The utility of search result refinements (including categories) increases with an even distribution of documents/entities across categories. With an even distribution, each category holds roughly an equal number of items. That makes each category an effective segmentation of the result set. Two categories with a 99%-1% distribution are not very effective. If you select the first category you still have 99% of the search results to sort through. A 50%-50% split (or any other even distribution) is generally preferable, provided that this constraint does not violate the previously mentioned criteria of good match with mental models, effective query disambiguation, unique facet combinations and no need for cross-comparison between categories.
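One way to put a number on "even" is normalized Shannon entropy: 1.0 means a perfectly even split and values near 0 mean one category dominates. This measure is my suggestion for illustration rather than something the criteria above prescribe, and the function name `evenness` is hypothetical.

```python
import math
from typing import Dict


def evenness(counts: Dict[str, int]) -> float:
    """Normalized Shannon entropy of the item counts per category (0.0 to 1.0)."""
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0
    # Empty categories contribute nothing to the entropy sum.
    shares = [c / total for c in counts.values() if c > 0]
    entropy = -sum(p * math.log(p) for p in shares)
    # Divide by the maximum possible entropy for this number of categories.
    return entropy / math.log(len(counts))


print(evenness({"cars": 50, "employees": 50}))  # even split, close to 1.0
print(evenness({"cars": 99, "employees": 1}))   # lopsided split, roughly 0.08
```

A low score is a hint that a category split does little to narrow the result set, though as noted above it should not override the four earlier criteria.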











Ontologies vs. mental models
Great blog post, Vegard! Just a small comment on the topic of mental models vs ontologies: You are perfectly right to call the categories in your mind a mental model rather than an ontology.
While an ontology (along the lines of Gruber’s definition) is a formal, explicit specification of some conceptualization that allows for categorisation, and (if it is any good) some degree of reasoning, our mental models are implicit, flexible and can be re-arranged in an ad-hoc manner at any given time. Within our mental models we can “pogo-stick” between our internal tabs in a ridiculously effortless manner. Which is, for example, why we understand jokes (if the joke is any good).
This, needless to say, is why categorisation is so hard and why it so often fails. Which underlines the importance of your first point: Get into your users’ heads and find out how they organise their world.
Great blog post Vegard! I love your thinking about thinking.
Great article Vegard. The idea of identifying structure and scents is crucial in planning for an enterprise search application.
However, applying the model to your data is very often a tough job. How do you tag all those documents, intranet pages, articles and other pieces of information with the right category(ies) without relying on too much manual human effort, and without burdening information producers with the boring task of filling in a ton of metadata (which, according to experience, will not be done in practice)?
For instance, how do you come up with the disambiguation question for “Dolphins”? Or “Apple”? Manual classification can be a very time-consuming endeavor, and even building the rules for automatic classification is most often beyond the client’s budget and expectations.
I’ve found that, for domains with mainly unstructured data, it works best to build only a small category tree for categories which can easily and reliably be deduced from the data itself or from metadata such as data source, file path etc. For digging deeper, rely on identifying meaningful entities of information, extract those automatically from the text and offer them as facets or tag clouds. Structured, normalized data sources are of course much easier to deal with.
@JanHøydahl
Excellent insights on working with unstructured data. Keep ambitions to a minimum, apply a few fixed top-level categories, and expose more sophisticated (and possibly unreliable) metadata as facets within each category. This is how I perceive Google’s new search options.
The dolphins/apple example is a tougher one, and I expect it will require some kind of clustering or semantic analysis to produce categories suitable to query disambiguation. It’s certainly not a manual effort. I believe Bing does something along these lines.
I try to avoid the model problem and the need for manual tagging by insisting on keeping the model as simple as possible. Better to make something that actually works, rather than something that only looks good on paper. Reality doesn’t always live up to our expectations.
@TillCLech
Please do me a favor and write a blog post on ontologies and their role in search. I need to learn from you.
Thanks for the excellent explanation!