24 December 2010

The Two Kinds of Search: Locating Items vs. Locating Information

The advent of search engines has changed the world twice. What Google now rules, having toppled Yahoo and Alta Vista, Lycos and Excite, is web search. We all know it, we all use it, and unquestionably the web would be a nightmare without it.

The second time search changed the world was when it came to the personal data we keep on our own computers, or privately, online. Perhaps Apple was first, with Spotlight in OS X, which gave the ability to search essentially everything on your hard disk; it is fabulous. Soon after, Google offered its own Desktop Search, and Microsoft added similar capabilities to Windows, starting with Vista. (I might actually argue that Palm was first: a really striking feature of even the earliest Palm Pilots was integrated search across all your data—contacts, notes, to-do lists, calendar and more. Amazingly, I know of no smart phone that does it as well, even today. On the iPhone, for example, you can’t search on photo names; it’s infuriating.) Google, of course, also offers search across Gmail, and encourages people never to discard a message.

But the real distinction I want to make is not between these two search revolutions, but rather between searching for a particular item—one that you know exists and simply need to locate—and searching for information on a topic with no special knowledge of where, or in what form, that information exists.

These are quite different activities, and the former (locating a specific item) is generally harder. This is counter-intuitive. How can searching for something on your own hard disk possibly be harder than finding something among the countless billions of pages that form the web?

The answer, of course, is that if it is just information you want, there’s a good chance it exists in many forms, in different places. Subject to things like authority, credibility and verifiability, any source (or sources) will do; whereas by definition, when you are searching for a specific item, success requires finding precisely that item.

The problem with searching email

In my experience, email is the hardest information for most people to search successfully. Quite a lot of the time, full-text search barely helps, because it fails to narrow down the information enough, or unwittingly excludes the item you are searching for. Arguably, people are now worse off in this respect than before the advent of local data search, because we are often lulled into a false sense of security: we believe, contrary to our regular experience, that we will be able to find things, and so feel freed of the burden of organizing our information.

The first is a problem because by the time we come to search for email, we often have only a hazy idea of anything that search might be able to latch onto that really distinguishes that one key email. We might know who it’s from (Jo, say), but perhaps not which email address she used, or whether her first or full name is in the message anywhere. And if it’s someone you swap mails with a lot, this could still be hundreds or thousands of messages.

We will usually have some idea about the date. But even that might be fairly hazy (or in my case, plain wrong).

The hardest problem is that of choosing the other search terms. Most modern search is essentially literal in character—while it will perform stemming (making run and running equivalent; perhaps even ran) and handle simple spelling variants, it is deliberately not semantic; that is, if you search on ‘run’, the search makes no attempt to match ‘sprint’, for example. (There are search engines that are semantic, but mostly this approach is not favoured, and not found to be helpful for internet search.)
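
To make the point concrete, here is a minimal sketch of literal search with stemming. Everything in it is an assumption made for illustration: the `stem` function is a deliberately crude suffix-stripper (not any real search engine's stemmer), and the three sample emails are invented.

```python
import re

# Hypothetical lookup for irregular forms; real stemmers handle far more.
IRREGULAR = {"ran": "run"}

def stem(word):
    """Crude illustrative stemmer: strips '-ing' and a plural '-s'."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    if word.endswith("ing") and len(word) > 5:
        word = word[:-3]
        # Collapse a doubled final letter: 'runn' -> 'run'.
        if len(word) >= 2 and word[-1] == word[-2]:
            word = word[:-1]
    elif word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        word = word[:-1]
    return word

def tokens(text):
    """Return the set of stemmed words in a piece of text."""
    return {stem(w) for w in re.findall(r"[a-z']+", text.lower())}

def matches(query, text):
    """Literal match after stemming; no attempt at synonyms."""
    return stem(query) in tokens(text)

# Invented sample emails, keyed by a short id.
emails = {
    "a": "I went running along the river this morning.",
    "b": "She ran the whole way home.",
    "c": "We were sprinting for the train.",
}

hits = sorted(key for key, body in emails.items() if matches("run", body))
print(hits)  # ['a', 'b']
```

Searching on ‘run’ finds the emails that say ‘running’ and ‘ran’, but the one about sprinting is invisible to it, even though it is about the same thing.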

In the context of searching the web, this exactness doesn’t matter for reasons that are directly linked to the fact that you are not looking for a particular item: you just want information. You might by-pass sources of information that use a different word, but as long as some useful sources do use it, that isn’t really problematical. The link structure of the web also works to your advantage here, with one page leading to another.

None of this applies in email. If you search on ‘run’ but the email only talks about sprinting, the mail won’t match, and there is unlikely to be another linking to it. So, because of their precision, standard search approaches exclude emails that you want to find. At the same time, any email that does match your terms will be included. If any ranking of results is shown at all, it will probably be unhelpful, so you end up feeling as if you need to search within the search results. But you cannot, not for technical reasons but because you don’t know how to refine the search further; and if you do refine it, you’re almost as likely to exclude that which you seek as that which you do not.

The Genius of Tagging

One crucial difference between organizing with tags and organizing in folders is that you don’t have to choose a single location for something. If you tag regularly, you find that it imposes a very low overhead (much lower, I find, than choosing which folder to put a message into) and that you quickly develop a standard vocabulary of tags that you use without really thinking about it. A good bias to have is “when in doubt, add the tag”, i.e. if you think there’s any chance at all it might be useful, add it. Tags are cheap.

But the true genius of tagging is not the items that carry the tag; it’s the ones that don’t. This is a case of the dog that doesn’t bark in the night.

For by tagging a particular set of messages with “running” (say), I am also (implicitly) not tagging all the other messages, including the ones that include words like run, with running. That difference is critical. When you look for a set of items that you have tagged with a given word, you are looking at things for which that word (that tag) is important, rather than incidental. This is the genius of tagging. It not only identifies that which you tag as (potentially) relevant; it also implicitly marks everything else as probably not relevant to that tag.
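
A small sketch may help show the difference. The `Message` class, the four sample messages and both lookup functions are hypothetical, invented purely for illustration, and the full-text search is simplified to a plain substring test.

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    subject: str
    body: str
    tags: set = field(default_factory=set)

# Invented mailbox: two messages are really about running, two are not.
messages = [
    Message("Sunday 10k", "Want to run the river loop on Sunday?", {"running", "jo"}),
    Message("Track session", "Sprint intervals at the track, 7pm.", {"running"}),
    Message("Late", "Sorry, running late for the meeting.", {"work"}),
    Message("Build", "The nightly test run failed again.", {"work"}),
]

def full_text(term):
    """Subjects of messages whose body contains the term (substring match)."""
    term = term.lower()
    return [m.subject for m in messages if term in m.body.lower()]

def tagged(tag):
    """Subjects of messages that were explicitly given the tag."""
    return [m.subject for m in messages if tag in m.tags]

print(full_text("run"))   # ['Sunday 10k', 'Late', 'Build']
print(tagged("running"))  # ['Sunday 10k', 'Track session']
```

The text search pulls in two messages where ‘run’ is incidental and misses the track session entirely; the tag lookup returns exactly the two messages for which running matters, precisely because the others were never given the tag.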

When looking for a particular item, that can make the difference between success and failure.
