Search is a wicked problem, with no apparent universal solution in sight. Different technologies and approaches to search exist side by side, serving a multitude of business goals and user needs. In my work with search user experience I find it important to understand the particular strengths and weaknesses of search concepts like Best Bet and Faceted Search, since part of my job is to correctly align goals and needs with available technology. A poor choice of technologies or design patterns could very well cripple the entire user experience. But how do I know which concepts will work?
To answer my own question, I’ve spent some time over the last few days putting together a topology of search concepts, which I’m thrilled to share with you now. I’ve made a scatter plot of a handful of search technologies and patterns, with the purpose of revealing some basic structure and similarities. The main elements of this topology are the set of search concepts, the 2 dimensions of the scatter plot, and the descriptions of each quadrant. You can see the result so far in the illustration at the top of this page.
This is still work in progress, so I’m eager to get feedback from you. I may have left things out, plotted things the wrong way, or described things poorly. Whatever you have to say, please leave a comment on this post.
Here is how I define the 2 dimensions of the scatter plot for the purpose of this topology:
- Algorithm vs. User Powered – In the absence of a better name, positioning along the algorithm vs. user powered dimension reflects to what extent human or machine intelligence is responsible for retrieving precise and accurate information in response to the query.
- Information Accessibility – By implementing a search concept for a given information space, information accessibility is a measure of how much easier it becomes to find any document of interest within that space. If the time it takes (and/or the number of steps required) to retrieve a particular document goes down, the general information accessibility goes up. Read more about accessibility in information retrieval.
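To make the accessibility dimension a little more concrete, here is a minimal Python sketch. It assumes accessibility can be approximated by comparing the average number of steps needed to reach a set of documents with and without a given search concept. The function name `accessibility_gain` and all the step counts are hypothetical and only for illustration; this is not the measure used in the paper on accessibility in information retrieval.

```python
from statistics import mean


def accessibility_gain(baseline_steps, concept_steps):
    """Ratio of average steps without the concept to average steps with it.

    A value above 1.0 means documents became easier to reach,
    i.e. information accessibility went up.
    """
    return mean(baseline_steps) / mean(concept_steps)


# Hypothetical step counts for reaching the same five documents,
# first without and then with some search concept in place.
baseline = [8, 10, 6, 12, 9]
with_concept = [4, 5, 3, 6, 2]

gain = accessibility_gain(baseline, with_concept)
print(f"Accessibility gain: {gain:.2f}x")
```

Time to retrieval could be swapped in for step counts in the same way.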
The scatter plot is divided into 4 quadrants, so that the concepts positioned within each quadrant share a common set of characteristics related to business value, user experience and technological capabilities. Here is how I describe these quadrants:
- Simple Search – When you know what you want, and you can express that need with a few keywords, the simple approach to search fits the bill. It’s by no means trivial to create simple experiences, but the general information accessibility is quite low for these concepts. Google, FAST/Microsoft, Exalead and Lucene are some of the champions of simple search.
- Superficial Search – The superficial approach to search leverages user behavior to feel the pulse of communities, and to surface popular and current material. Superficial search is often very efficient when you don’t need to dig deep into the information space. Amazon, Twitter, PostRank and Twingly are some of the champions of superficial search.
- Ingenious Search – With a model-knows-best approach, ingenious search relies on sophisticated algorithms to determine user intent and content semantics. Clever algorithms can be said to be cost effective, but are comparatively more difficult to implement and execute well. Autonomy, Powerset, Grokker and Wolfram Alpha are some of the champions of ingenious search.
- Diligent Search – The diligent approach to search favors human intellectual effort over clever algorithms. Given an initial search result, users are asked to further disambiguate their queries in order to effectively explore the information space. Endeca, eBay, Freebase and Apache Solr are some of the champions of diligent search.
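The quadrant layout can also be sketched in a few lines of Python. This is only an illustration under stated assumptions: both axes are scaled to the range -1.0 to +1.0, the quadrant boundaries sit at zero, and the coordinates given to each concept are made up for the example rather than read off the actual plot. The names `Concept` and `quadrant` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    name: str
    user_powered: float   # -1.0 = algorithm powered, +1.0 = user powered
    accessibility: float  # -1.0 = low, +1.0 = high information accessibility


def quadrant(concept):
    """Map a concept to one of the four quadrants (boundaries at zero, by assumption)."""
    if concept.accessibility >= 0:
        return "Diligent Search" if concept.user_powered >= 0 else "Ingenious Search"
    return "Superficial Search" if concept.user_powered >= 0 else "Simple Search"


# Hypothetical coordinates, chosen only to land in the intended quadrants.
concepts = [
    Concept("Keyword search", -0.6, -0.5),
    Concept("Collaborative filtering", 0.5, -0.4),
    Concept("Question answering", -0.7, 0.6),
    Concept("Faceted search", 0.7, 0.6),
]

for c in concepts:
    print(f"{c.name}: {quadrant(c)}")
```

Note that nothing in the sketch ranks one quadrant above another; it only groups concepts that sit in the same region of the plot.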
I won’t be cross if you say my matrix looks a bit like Gartner’s Magic Quadrant, but there’s at least one big difference. Contrary to Gartner’s quadrants, this matrix does not suggest any better-than/worse-than relationships between the data points. Each quadrant is just different from the others, and the search technologies and patterns found within simply serve different purposes, having their own particular strengths and weaknesses.
Whether you agree or disagree with my way of mapping the world of search concepts, please share your thoughts. I’ll make sure to give credit to everybody who contributed significantly when I publish the final result. Cheers!
You may also download a poster with both the diagram and the quadrant descriptions.
How does this taxonomy address issues of interaction as applied to information seeking? How would you classify systems that make it easier to construct queries, to refine queries, to make sense of the results (obtained through whatever algorithms are available), etc. These interaction issues are important because they affect how usable (and therefore effective) systems that support information seeking truly are.
@Gene Golovchinsky
Thanks for the feedback!
If I understand you correctly, I believe the interaction issues you’re describing belong in the upper right quadrant, together with faceted search. I’ve named this quadrant “Diligent Search”, due to the combination of high information accessibility and user-powered retrieval. Does it sound alright to you?
I know you’re an expert on collaborative search. Where do you think that may fit into this taxonomy?
I am not sure that “user-powered” captures what’s going on. Perhaps “Empowered user” is a better way to look at it. Users can be empowered by making the latent structure of the data more apparent, they can be empowered by making it easier to express their information needs, by making it easier to do all this iteratively. Facets are one example; there are many others. But to empower users, we typically resort to some algorithmic processing. It seems more appropriate to think of the horizontal axis of your model as the degree of interaction the system accords the user. The vertical axis seems to reflect the depth of processing of the data to facilitate these possible interactions. Does that make sense?
To answer your question, collaboration seems to be orthogonal to this, although there may be some coupling of interactions between people and the underlying system, if that system represents people’s actions explicitly.
Yes, it makes sense (I think). In the upper right quadrant (diligent search) I’m trying to show that users can be responsible both for making latent structures of the data more apparent, and for iteratively expressing their information needs. That would include both faceted search and systems like Freebase, where users are the main driving force. This is in contrast to systems where algorithms are used for data mining, clustering and NLP, which I would place in the upper left quadrant (ingenious search).
So maybe “empowered user” is a different dimension than user-powered, since a question answering service like Wolfram Alpha would be empowering users much like faceted search would. What do you think about that?
I’m not quite sure I get what you mean with depth of processing of the data for the vertical axis. But I’m going to give it some thought.
And finally, collaborative search:
Why do you feel that it’s orthogonal to accessibility and algorithm/user-powered? Can’t you place it in the diligent quadrant?
A good collection. Here are some additions:
Contextual Search – Search in a specific space or with useful contextual information
Vertical Search – a little narrower than contextual search
Semantic Search – Searching against well coded information
My quest for a better search:
http://dorai.wordpress.com/2007/04/25/searching-for-a-better-search/
Perhaps the reason I am finding it difficult to categorize collaborative search in this framework is that the term “collaborative search” can mean many different things. If you’re referring to something like SearchTogether, an interface-mediated search with data synchronization, then, yes, it probably belongs somewhere on the right side, although it’s not clear to me where on the vertical continuum to place it.
But there are ways of implementing collaboration that might be classified closer to the middle left of your chart, because the system winds up doing quite a bit to mediate the collaboration.
@Gene Golovchinsky
I can see why it’s difficult to categorize “collaborative search” as one thing. Interface-mediated search could probably fit in with faceted search and the other approaches in the upper right quadrant, which give the user direct manipulative control over the search results. Algorithmically-mediated search sounds more like a black-boxed approach, and less “user powered”.
Suddenly I realize that “Algorithm vs. User Mediated” could be a better name for the horizontal axis. You can say that every search involves a process of resolving an information need, and this resolution process can be mediated by algorithms, users (alone or in groups), or a mix of both. Is that what you had in mind?
@Dorai Thodla
Thanks for the suggestions. The concepts you mention are perhaps a bit too broad for this model, but I’m sure they can be broken down into smaller concepts, which would be easier to place.
@Vegard I tried to make a stab at an explanation in this blog post. Maybe it will make more sense.
I wonder where you would place “serendipitous” vs. more “deterministic” approaches.
Also how about structured vs. unstructured, i.e. where structured is assumed to be anchored by a taxonomy/vocabulary (someone referred to semantic search). I’m referring to the difference between searching on the content itself vs. the meta of the content.
@William Mougayar
I think deterministic vs. serendipitous is more of an orthogonal dimension to the ones that I used for the plot. Serendipity may be a characteristic property of any of the four search approaches. What kind of analysis would you get from changing the axes, do you think?
Unstructured vs. structured is another viable option for choosing axes. This is still work in progress. However, I’m quite pleased with the resulting quadrants and how they can be used as a framework for understanding the potential impact of search technology in terms of business goals and user needs. Thanks for the feedback!
I am not sure “accessibility” is a good term for the vertical axis here. It seems that the upper quadrants here represent those systems that go beyond simple search (where results tend to be just a one-dimensional list) by trying to provide users with a better information seeking experience, through a richer presentation (e.g. hierarchical or multi-dimensional data) or further interactions with the result set.
Hi Jay!
Your comment is very perceptive. Finding the “right” dimension for the vertical axis seems to be difficult, and “accessibility” is just the one that makes the most sense for me so far. I agree that rich presentation/interaction is another way to describe what is happening as search technology is used to dig deeper into the information space, either algorithmically or through user-system communication.
In your opinion, what would be a better interpretation of “accessibility”, and what kind of categorization would that lead to (if different from the one I have proposed)?
BTW, the authors of the paper on accessibility in information retrieval tell me that they have come to prefer the term retrievability over accessibility (because of the obvious confusion with general web accessibility): http://www.dcs.gla.ac.uk/publications/PAPERS/8984/fp0120-azzopardi.pdf
Based on the general conception and the interpretation in your linked papers, “accessibility” tends to be in line with concepts like “searchability” and “retrievability”, which measure how well a particular document can be accessed under a given IR model. In this regard, a traditional model that simply presents a list of search results is not necessarily weaker than a more sophisticated presentation model.
I cannot find a good term to better describe your vertical axis. It could be something along the lines of “information richness” or “information synthesis”. Basically, as you have described, those are the systems that try to present “information rich” results to better satisfy user’s information seeking need.
I have spent a lot of time thinking about this in positioning my business, True Knowledge.
Here are some other concepts/axes for you to consider:
Structured versus Unstructured
For me this is about the knowledge source. Is it unstructured natural language like what appears in web pages, or structured data that computers can process and reason with? Most of the main search engines use unstructured web pages as their primary knowledge source but also have databases of structured knowledge they use for certain types of response.
Open Domain search versus Vertical search
Google, Bing, Ask etc. are open domain – as is True Knowledge. Many other search companies specialise in a narrow area and are only interested in information that falls within that area.
Statistical versus Logical
Most search engines use statistical techniques to turn up results. Others (True Knowledge and Wolfram Alpha) generate responses using calculation and logical steps for which statistics is not part of the process.
Keyword versus Question Answering
Natural language questions are the natural way that humans request information. The statistical techniques used by search engines have taught users to present most search queries using keywords.
Hi William,
Thanks for sharing your insight! Here’s my immediate reaction:
* Structured versus Unstructured – I think this could replace the dimension I have referred to as “information accessibility”. Ingenious and Diligent search concepts would require more structure to content, query (or both) than required by simple keyword-based web search and even (superficial) collaborative filtering. Several readers have objected to my use of information accessibility, and I’m seriously reconsidering my choice.
* Open Domain versus Vertical – An interesting dimension if my goal was to classify search services or vendors. It relates more to business goals than search concepts in general. Both Bing and Google can be positioned in several of the 4 quadrants.
* Statistical versus Logical – An interesting dimension once again, this time if my goal was to classify search technologies. I prefer to treat a concept like Question Answering as one, not focusing on the underlying technology, which may be statistical or logical.
* Keyword versus Question Answering – This is on the other hand closer again to search user experience concepts, but it’s too narrow for this particular analysis.
The question you’re bringing up is very interesting, though. Natural language is undeniably our natural way of requesting information from other humans. But computers have only been around for a short while (in evolution time), and there is still no established “natural” way of communicating with a machine. As you say, search engines have taught us to present searches as keywords, and that is in my opinion the most natural way for us to request information from them – today. Tomorrow may bring us something entirely different, like gestural/natural interfaces, but that is all open to speculation.
Thanks for the interesting approach. I think your dimensions are important and insightful.
One measure that seems to be missing is whether or not the search is based on the object itself (self-referential) or based on information about the object (meta-data). For example, Google searches are primarily self-referential — their search results are collections of web pages that were themselves the objects of the search. Google then adds PageRank, a meta-data element to enhance the sorting of the results.
Now think about searching for people, restaurants, movies, songs, etc. If you want to find a song to listen to, it is difficult to search the song itself, so now you have to rely on meta-data. Consider these three examples of companies using meta-data.
Netflix (www.netflix.com) relates the ratings of all users to predict the rating of an individual user.
At Nanocrowd (www.nanocrowd.com), we also search for movies using meta-data, but we apply semantic analysis of viewer comments. By analyzing what people say about movies, we can organize, summarize, rate, and find similar movies.
Other companies, like Pandora (www.pandora.com) hire people to study songs and add their own meta-data.
These three types of meta-data searches are clearly in your “ingenious” quadrant, but I think identifying whether or not search methods are self-referential is important to classifying search. For example, why is Google so bad at finding a book to read or movie to watch? Why is Bing unable to tell you what song you would like? No matter how refined their algorithms get, they are not working with the right data to find popular media.
How would you introduce this concept of self-referential search vs. meta-data search as an element of your classification?
Hi Roderic,
thanks for calling my work interesting and insightful!
I think your dichotomy is interesting and insightful as well, and it’s definitely worth a separate discussion. It could be that self-referential versus meta-data search is a necessary extension to this framework. I can only hope to scratch the surface here and now.
I agree that meta-data search (non self-referential) belongs in the Ingenious quadrant. The purpose of these concepts seems to be finding answers, not simply links to documents. Answers would in this case be the objects (people, songs and films) referred to by meta-data. These objects map better to the user’s mental model (made up of people, songs and films) than self-referential documents.
Would you agree that (semantic) structure is the key to ingenious meta-data search, as opposed to unstructured self-referential web search?
I think “Structured versus Unstructured” (really, the degree of structure) is a more exact description of the dimension you’re reaching for – degree of structure in the information being searched, the query, or both.
Self-referential versus meta-data approaches seem to be more related to recommendations than search – intertwined, to be sure, but not exactly the same set of user goals being addressed.
Thank you, John! My sentiments exactly
Hi Vegard and John,
Thanks again for the discussion!
I certainly don’t think the idea of self-referential vs. meta-data search is the most important element of search topology, but I think it plays a greater role than the notion of unstructured vs. structured data. Whether you are building a search engine or a recommendation engine, I believe this issue remains important.
Let me try to clarify my thoughts…
There are many objects that don’t lend themselves to self-referential search. Objects like movies, songs, buildings, and pottery, for example. For these types of objects there is both structured meta data (ratings, genres, actors, size, shape,…) and unstructured meta data (reviews, comments, descriptions, rants,…). Analysis of this meta data is vital to any type of search for these objects.
At Netflix, they use structured meta data for their search engine (genres, ratings, actors, directors, etc.) and for their recommendation engine (ratings).
At Nanocrowd, we work strictly with unstructured commentary for our analysis. Based on that unstructured meta data, we create new meta data (nanogenres, ratings, most-like objects, and nutshells). So far we have primarily used this data as a recommendation engine, but we envision tools that will help you to predict if you will like an object or to search for one based on actor, director, words, etc.
Of course, people have already commented on how they are scraping our structured meta objects to create new methods for understanding the objects themselves. Reminds me of the cycle of life…
Does this help?
Roderic
I think so – the concept of self-referential, meaning having to do with the identity of the object, vs metadata, meaning having to do with placing it in some larger context – is clear, and an important distinction in both search and recommendations, as you say.
If you think about it in terms of a hypothetical ontology, you’d certainly expect to find more semantic structure in the first than in the second.
Ironically, there’s a lot of confusion around what the term “meta-data” means, though.