31 December 2010

The FluidDB People App from @paparent

@paparent, author of the very useful FluidDB Explorer, has produced a simple app that shows the locations of FluidDB people on a Google Map. It’s inspired by the Djangopeople application, but so far has only the map.

If you’d like to add yourself (the more the better) you just need to add three tags into FluidDB—two to give your latitude and longitude, and one to tell @paparent’s app about you.

To do this all you need is a FluidDB account and some way of writing to FluidDB. It’s pretty easy using the command line from my fdb.py library, so I thought I’d quickly document how to do it.

The People application expects you to tag your user object in FluidDB with numeric tags in your namespace called people/latitude and people/longitude. So for me (njr), I had to do this:

fdb tag -a 'Object for the user named njr' people/latitude=55.8817504514
fdb tag -a 'Object for the user named njr' people/longitude=-3.10550451279

(You can see the tags on my object here, assuming you’re using a browser that isn’t Internet Explorer.)

The application collects users from the FluidDB object with the about tag collection:peopleapp (UUID 62397fb0-6f96-404a-a8fa-ee3758cfa7f2), looking for tags called peopleapp on it. So for me, that meant doing the following:

fdb tag -a 'collection:peopleapp' peopleapp

And that’s it.
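For anyone who wants to script this, here is a minimal sketch that simply builds the three command lines shown above for a given user. The function name people_app_commands is my own invention for illustration; it only assembles strings and does not call fdb.py or FluidDB itself. It assumes the about tag of a user object always has the form 'Object for the user named USERNAME', as described in the notes below.

```python
import shlex


def people_app_commands(username, latitude, longitude):
    """Return the three fdb command lines needed to join the People app.

    Hypothetical helper: it only formats strings in the shape used in
    this post; it does not talk to FluidDB.
    """
    about = f"Object for the user named {username}"
    quoted = shlex.quote(about)  # wraps the about tag in single quotes
    return [
        f"fdb tag -a {quoted} people/latitude={latitude}",
        f"fdb tag -a {quoted} people/longitude={longitude}",
        "fdb tag -a 'collection:peopleapp' peopleapp",
    ]


if __name__ == "__main__":
    for line in people_app_commands("njr", 55.8817504514, -3.10550451279):
        print(line)
```

The commands still need to be run with fdb.py configured with your own FluidDB credentials, since the tags are written into your namespace.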

Notes

A few things might not be clear, so I’ll explain them briefly.

My library, fdb.py, automatically creates namespaces and tags as required, so there was no need to create the njr/people namespace or any of the tags: when you write to them, fdb.py assumes you’d like them to exist.
I’ve left my username off the tags because fdb.py assumes that the tags you refer to are yours unless you stick a / at the start. So when I say people/latitude, fdb.py adds in njr/ at the start for me. If I want to refer to (say) paparent’s latitude in fdb.py, I say /paparent/people/latitude. (This is slightly non-standard in FluidDB, but very convenient.)
The final tag command doesn’t set a value: this is fine, as FluidDB doesn’t require tags to have values (or, more accurately, allows null values, which is what this command sets). @paparent’s people app doesn’t require a value on the tag, so this is enough.
In case it isn’t clear, the -a 'Object for the user named njr' is specifying the object to be tagged by its about tag. FluidDB creates an object for each user, and always gives it an about tag of this form.
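The tag-naming rule in the notes above can be sketched in a few lines. This is an illustration of the rule as described here, not fdb.py’s actual code, and the function name resolve_tag_path is hypothetical.

```python
def resolve_tag_path(tag, username):
    """Expand an fdb.py-style tag name into a full FluidDB tag path.

    Illustrative only: a leading '/' marks an absolute path (someone
    else's tag, or your own written in full); anything else is taken
    to live in the given user's namespace.
    """
    if tag.startswith("/"):
        return tag[1:]  # absolute: drop the leading slash
    return f"{username}/{tag}"  # relative: prefix the user's namespace


if __name__ == "__main__":
    print(resolve_tag_path("people/latitude", "njr"))
    print(resolve_tag_path("/paparent/people/latitude", "njr"))
```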

So: go add yourself!

UPDATE

@paparent has added instructions for how to add yourself to the map using his FluidDB Explorer and Holger Durer (@HDurer) has written a blog post about how to use his emacs mode fluiddb.el to add yourself to the map without having to leave the comfort of emacs. How wonderful is that?

29 December 2010

Abouttag (library/web app/visualizations): Update

I’ve made a few changes to the abouttag library (for generating canonical about tags according to various conventions documented here), and to the online app, available at abouttag.com.

Interface

I’ve added a simple interface to abouttag.com so that you can now just put in an about tag or an object ID, or choose a class of Thing and type in key information about that thing. The website can then generate the about tag for you or take you to a (live) visualization of that object in FluidDB, with its tags and values. The diagrams have been updated slightly, to make them reflect more precisely the way tags are referenced in FluidDB itself.

Here’s an example that should work with almost any modern web browser except Internet Explorer (Firefox, Chrome, Safari, Opera, iPhone, Android etc. should all be fine): planet:Mercury.

Library

The library has been expanded slightly to include twitter users, following the conventions adopted by the Tickery web application.

25 December 2010

God Rest Ye Social Bookmarkers

24 December 2010

What Would Wikipedia Do? (WWWD)

Readers of this blog will know that some of us have long been interested in the subject of about tags and how to choose them in FluidDB. The question is important, because FluidDB’s information sharing paradigm is predicated on the idea that different users and applications share data through tagging common objects, and the about tag is the primary way to decide which object that should be. This post discusses a new idea Terry (@terrycojones) and I cooked up about a week ago. At this stage, I’m not proposing it as a convention, but simply discussing it as a possibility and soliciting comments.

WWWD / WDWD?

The kernel of the idea is to follow Wikipedia’s taxonomy by setting FluidDB about tags to the relative path of the Wikipedia page for the entity of interest.

For example, the (English-language) page for the Eiffel Tower has the page title “Eiffel Tower” and the URL

http://en.wikipedia.org/wiki/Eiffel_Tower

Under the putative convention under discussion, which I will tentatively call wwwd-1 for the rest of this post, the about tag would therefore be

Eiffel_Tower

i.e., if you wanted to tag the Eiffel Tower in FluidDB, the object you would use is the one whose about tag is Eiffel_Tower.

Arguably, with this example, it’s more a case of “What does Wikipedia do?” than “What would Wikipedia do?”, but we’ll come to that later.

So at its simplest, this convention says that to find the wwwd-1 about tag for an object that already exists in the English language edition of Wikipedia, you set the about tag to everything that follows http://en.wikipedia.org/wiki/ in the relevant page URL (without a trailing slash). [1]
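As a minimal sketch, the rule just described amounts to stripping a fixed prefix from the page URL. The function name wwwd1_about is hypothetical, and the sketch assumes the URL is already the canonical English-language page URL (it makes no attempt to follow redirects or handle query strings).

```python
WIKI_PREFIX = "http://en.wikipedia.org/wiki/"


def wwwd1_about(url):
    """Return the putative wwwd-1 about tag for an English Wikipedia URL.

    Illustrative helper: everything after the /wiki/ prefix, with any
    trailing slash removed.
    """
    if not url.startswith(WIKI_PREFIX):
        raise ValueError(f"not an English Wikipedia page URL: {url}")
    return url[len(WIKI_PREFIX):].rstrip("/")


if __name__ == "__main__":
    print(wwwd1_about("http://en.wikipedia.org/wiki/Eiffel_Tower"))
```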

The Case For wwwd-1

This convention has a number of things going for it, as well as a number of drawbacks. Let’s take the positives first.

1. Pragmatically, it’s easy. Wikipedia exists, is free, and is available to anyone with an internet connection (censorship aside). It is huge, beautiful and widely read, and it represents the work of countless thousands of (mostly) dedicated and intelligent humans, who have already created a rich taxonomy that—imperfect as it must be—has already disambiguated millions of terms. Following its lead is nothing if not pragmatic.

2. The Wikipedia URLs allow obvious things to have obvious, minimal URLs. Generalizing wildly, the most common pattern for URLs in wikipedia is the name of the entity with spaces replaced by underscores, punctuation %-encoded and articles stripped, in title case. Whether you like underscores or not (and I don’t), this is pretty simple, and is described (loosely) here. For example, here are some examples of about tags wwwd-1 would include:

Nelson_Mandela (the former South African President)

Gandhi (the Indian leader)

Terry_Jones (the Monty Python comedian)

Terry_Jones_(baseball) (the retired baseball player)

Earth (the planet)

Mercury_(planet) (the planet)

Mercury_(element) (the element)

Mercury_(mythology) (the God)

San_Andreas_Fault (the geological fault running through California)

Prisoner’s_Dilemma (the prisoner’s dilemma)

Carbon (the element)

Apple (the fruit)

Apple_Inc (the company)

Ardbeg (the malt whisky distillery on Islay)

Pi (the ratio of a circle’s circumference to its diameter)

Pi_(letter) (the sixteenth letter of the Greek alphabet)

Milky_Way (the galaxy)

The_Milky_Way_(1969_film) (the film by Luis Buñuel)

Milky_Way_(song) (the song by Syd Barrett)

Milky_Way_bar (a type of chocolate bar)

Beetle (the insect)

Volkswagen_Beetle (the car)

The_Beatles (a rather over-rated band from Liverpool)

Mozart (the composer)

The_Road_to_Wigan_Pier (the book by George Orwell)

I have glossed over some issues here, but this is not a bad set of about tags.

3. They allow us to avoid reinventing wheels. At the time I am drafting this (17:43 GMT on 24th December 2010) this Wikipedia entry tells me that there are 3,511,257 entries in the English-language edition. (As I proof-read the post, on 28th December, rather impressively the total appears to have increased to 3,514,459.) In the scheme of things, that isn’t all that many; if FluidDB has any success at all, we will end up with orders of magnitude more objects with about tags than that. But most of our entries will have natural about tags that will be very straightforward to determine. For example, URLs will (modulo canonicalization) be their own about tags. I think there is an excellent chance that we can do better than Wikipedia for certain well-defined classes of numerous objects, such as books. But the huge value in Wikipedia’s taxonomy is that it deals with a very high proportion of the most noteworthy subjects in the world (almost by definition).

4. Even where entries do not exist in Wikipedia, they give us a kind of template to work from.

Disadvantages and Issues

While many of the benefits of the putative wwwd-1 convention are clear and impressive, there are a number of challenges, disadvantages and issues.

1. Splitting, Changing and Disambiguation. Although the URLs for longer-established Wikipedia pages are fairly stable now, they can and do change. This is not particularly problematical within Wikipedia, because there is only really one article on each page, controlled, in some cases, through reasoned debate and collective refinement, and in other cases, through edit wars. Presumably most people find articles by search most of the time anyway, and while it is clearly unfortunate if URLs change and therefore invalidate (or at least redirect) bookmarks, it is not very serious. The situation is very different in FluidDB: every user has his or her own tags, which can be attached to any object in the system. If the mainstream community collectively decides to move from (say) the object whose about tag is “Mercury” to represent the planet, to the object with the about tag “Mercury_(planet)”, it is no small or simple, automatic or guaranteed matter to get all the data moved. [2]

2. Non-uniformity. I have personally already used my Miró software to upload a structured form of data gathered from various Wikipedia pages to FluidDB, including (funnily enough) every element in the periodic table, and every planet (and dwarf planet) in the solar system. I documented the conventions that I used as planet-1 and element-1 and they were very simple: data for Earth was stored on the object with about tag planet:Earth, data on the planet Mercury was stored on the object with the about tag planet:Mercury, and data on the element Mercury was stored on the object with about tag element:Mercury. All of this was very straightforward, and once the scheme is known, it is easy to write code to generate the about tag for the object, knowing only the name of the planet or element. In contrast, using the wwwd-1 convention, life is fairly easy for individual users wanting to add a tag to a single, particular object, but much harder for anyone wanting to upload data automatically. The non-uniformity means that you need to look up Wikipedia before knowing what the about tag will be. This introduces a level of complexity not to be underestimated, and is, in my view, the single largest problem with the idea of wwwd-1.
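The contrast can be made concrete with a small sketch. The first function follows the planet-1 and element-1 pattern described above (class name, colon, entity name); the function names are hypothetical and this is not the abouttag library’s code. The second uses a hand-built table holding only examples from the list earlier in this post, standing in for the Wikipedia lookup that wwwd-1 would really require.

```python
def class_about(kind, name):
    """planet-1 / element-1 style: the about tag follows from the name alone."""
    return f"{kind}:{name}"


# Under wwwd-1 there is no such rule; each entry has to be looked up.
# This tiny table holds only examples listed earlier in the post.
WWWD1_EXAMPLES = {
    ("planet", "Earth"): "Earth",
    ("planet", "Mercury"): "Mercury_(planet)",
    ("element", "Mercury"): "Mercury_(element)",
    ("element", "Carbon"): "Carbon",
}


def wwwd1_lookup(kind, name):
    """Stand-in for consulting Wikipedia; raises KeyError if not in the table."""
    return WWWD1_EXAMPLES[(kind, name)]


if __name__ == "__main__":
    for kind, name in WWWD1_EXAMPLES:
        print(class_about(kind, name), "vs", wwwd1_lookup(kind, name))
```

Note how Earth and Carbon need no qualifier under wwwd-1 while both Mercury entries do: that is exactly the non-uniformity that makes automatic uploading hard.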

3. Language Issues. For English speakers, there is a clear attraction to using unqualified English terms as the about tags in FluidDB. But what of the myriad other languages? One approach is to say all languages will be on the same footing, and they should all simply follow Wikipedia’s relative URLs in their own language. Mostly, different languages will use different objects for storing data about the same thing, but sometimes they will coincide; when they do coincide, the two languages will sometimes refer to the same real-world entity and sometimes to different ones. This is problematical in some situations, and less so, or not at all, in others. Another option is to follow Wikipedia more closely and include a language code in the about tag. So we might have en:Earth, rather than Earth, and fr:Terre in French. If we fail to specify language, then English speakers will be likely to put information concerning cafés (informal restaurants) on the same object as French speakers will put information about coffee (the one with the about tag Cafe). More happily, if we avoid the prefix, information about Johnny Hallyday will end up on the same object for both English and French speakers; indeed, this will be a particularly common pattern for entries about individual people, since their names are normally not translated, at least within languages with common or similar alphabets. (Mao, for example, would map to the about tag Mao_Zedong in both, even though his name originates in a completely different alphabet.)

4. Priority, Cultural Sensitivity and Longevity. One of the very attractive things about wwwd-1 is that the most common/likely meaning of a given term tends to win the battle for the cleanest, simplest name. This is deliberate. The disambiguation guidelines say (at the time of writing):

Although an ambiguous term may refer to more than one topic, it is often the case that one of these topics is highly likely – much more likely than any other, and more likely than all the others combined – to be the subject being sought when a reader enters that ambiguous term in the Search box. If there is such a topic, then it is called the primary topic for that term. If a primary topic exists, the ambiguous term should be the title of, or redirect to, the article on that topic.

This is entirely sane and defensible. But it does mean that human judgement is required, and leads to possible charges of cultural imperialism and so forth. Terry Jones, of Monty Python fame, is surely the most famous Terry Jones and most likely target of any search in Wikipedia today. But if Fluidinfo were to take off and become the next Google, perhaps it would be a different Terry Jones who people would expect to find occupying the object with the about tag Terry Jones.

5. Ugliness and Humans vs. The Machines: wwwd-1T. It’s all very well to say that Wikipedia URLs are quite nice, simple and readable, but they are still designed for machines rather than human beings. After all, why Eiffel_Tower rather than Eiffel Tower? The answer is that best practice dictates that spaces in URLs usually be “%-encoded”, so that, in a URL, the preferred (safe) form would be Eiffel%20Tower, which is clearly worse, not better, than Eiffel_Tower from most perspectives. The same goes for many other punctuation symbols. On the other hand, there is nothing to say that we need to use the URL, which is in any case largely automatically constructed from the page title, which does have actual spaces. So a variation of the proposal, which I actually prefer, is to use the Wikipedia page title, rather than the relative URL, as the about tag. Then we actually do put information on the Eiffel Tower on the FluidDB object with the about tag Eiffel Tower. We permit any unicode text in about tags anyway, so we might as well take advantage of that and use spaces as required. That way, when we look at the about tag, it is maximally readable, and says exactly what we want it to say; and we don’t have to read through percent encodings, underscores and all that other nasty computer stuff. I will call this alternative putative convention wwwd-1T.
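A rough sketch of going from the relative URL to a title-style about tag follows. It is only an approximation: it assumes every underscore in the URL stands for a space in the title, which would be wrong for any title that genuinely contains an underscore, so the safest source for wwwd-1T remains the page title itself. The function name wwwd1t_about is hypothetical.

```python
from urllib.parse import unquote


def wwwd1t_about(relative_url):
    """Approximate the Wikipedia page title from its relative URL.

    Undoes %-encoding, then turns underscores back into spaces
    (an assumption, as noted above).
    """
    return unquote(relative_url).replace("_", " ")


if __name__ == "__main__":
    for rel in ("Eiffel_Tower", "Mercury_(planet)", "Prisoner%27s_Dilemma"):
        print(wwwd1t_about(rel))
```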

6. Standardization. There remain some questions about the precise about tag to use, even if following either wwwd-1 or wwwd-1T, and this arises somewhat inevitably from the fact that ordinarily URLs are not taken to be case sensitive. [3] So while the fairly strong Wikipedia convention seems to be to use Title Case, in fact this is not always followed, and nothing breaks when it is not. A case in point concerns the San Andreas Fault. While I said, above, that wwwd-1 would give us San_Andreas_Fault, at the time of writing, a search on San Andreas Fault returns a URL ending with San_andreas_fault, though the page title is San Andreas Fault. FluidDB is case-sensitive, so we would need to decide. From a readability perspective, Title Case is clearly preferable; but title case is intrinsically ambiguous other than in the context of a fixed algorithm, and there will inevitably be failures as a result of getting the case wrong. A more pragmatic suggestion might be to use all lower case, as I suggested in the book-1 convention. I will call this third variant wwwd-1L for now.
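The all-lower-case variant is trivial to sketch. Whether wwwd-1L would be applied to the relative URL or to the page title is not settled above, so the hypothetical function below just lower-cases whatever string it is given; the point is only that the differently-cased forms collapse to one about tag.

```python
def wwwd1l_about(name):
    """Lower-case a wwwd-1 or wwwd-1T style name (illustrative only)."""
    return name.lower()


if __name__ == "__main__":
    # Both capitalizations seen for the San Andreas Fault give the same result.
    print(wwwd1l_about("San_Andreas_Fault"))
    print(wwwd1l_about("San_andreas_fault"))
```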

Clearly, this is a complicated issue. To my mind there are significant merits in the idea, particularly in the wwwd-1T and wwwd-1L variants, but there are also significant problems. More than anything else, I think the convention is pretty good for human end users, but pretty difficult for machines (or to put it a different way, for application writers). If it were adopted, I think it would be best adopted as a default convention for the case when there is no other established/better convention, and we would be better to establish alternative conventions like book-1 for classes of objects that are numerous, mostly not included in Wikipedia and capable of being generated automatically from readily available information. In cases where we departed from the relevant wwwd convention, it would be particularly fantastic if we (collectively) added some kind of pointer from the object that would be used under that convention to the object we actually use, in cases where the Wikipedia page exists. For example, we would point from The_Road_to_Wigan_Pier to book: the road to wigan pier (george orwell). (Needless to say, we can look forward to a time when not only do we have a pointer from our FluidDB objects to the corresponding Wikipedia page (or pages), where such exists, but Wikipedia has pointers back to FluidDB. If only there were some way for us to add such pointers in Wikipedia . . .)

I’d be fascinated to gather views, in the comments or elsewhere.

[1]	To be clear, the putative wwwd-1 convention is not a convention for tagging Wikipedia pages. To do that, you simply use the full page URL (preferably canonicalized using the url-1 convention). The proposed convention is simply based on the Wikipedia page naming convention (practice).

[2]	When I say ‘the community decides to move’ I am not referring to any kind of formal procedure: there are no such procedures with FluidDB. I can put my information wherever I like, and that is unlikely to change. I really mean the conventions in use, whether agreed in some way or simply used by the bulk of FluidDB users.

[3]	More particularly, Wikipedia does not interpret URLs in a case-sensitive manner.

The Two Kinds of Search: Locating Items vs. Locating Information

The advent of search engines has changed the world twice. What Google now rules, having toppled Yahoo and Alta Vista, Lycos and Excite, is web search. We all know it, we all use it, and unquestionably the web would be a nightmare without it.

The second time search changed the world was when it came to the personal data we keep on our own computers, or privately, online. Perhaps Apple was first, with Spotlight in OS X, which gave the ability to search essentially everything on your hard disk; it is fabulous. Soon after, Google offered its own Desktop Search, and Microsoft added similar capabilities to Windows, starting with Vista. (I might actually argue that Palm was first: a really striking feature of even the earliest Palm Pilots was integrated search across all your data—contacts, notes, to-do lists, calendar and more. Amazingly, I know of no smart phone that does it as well, even today. On the iPhone, for example, you can’t search on photo names; it’s infuriating.) Google, of course, also offers search across Gmail, and encourages people never to discard a message.

But the real distinction I want to make is not between these two search revolutions, but rather between searching for a particular item—one that you know exists and simply need to locate—and searching for information on a topic with no special knowledge of where, or in what form, that information exists.

These are quite different activities, and the former, locating a specific item, is generally harder. This is counter-intuitive. How can searching for something on your own hard disk possibly be harder than finding something on the countless billions of pages that form the web?

The answer, of course, is that if it is just information you want, there’s a good chance it exists in many forms, in different places. Subject to things like authority, credibility and verifiability, any source (or sources) will do; whereas by definition, when you are searching for a specific item, success requires finding precisely that item.

The problem with searching email

In my experience, email is the hardest information for most people to search successfully. Quite a lot of the time, full-text search barely helps, because it fails to narrow down the information enough, or unwittingly excludes the item you are searching for. Arguably, people are now worse off in this respect than before the advent of local data search, because we are often lulled into a false sense of security, believing that, contrary to our regular experience, we will be able to find it, and thus being freed of the burden to organize our information.

The first is a problem because by the time we come to search for email, we often have only a hazy idea of anything that search might be able to latch onto that really distinguishes that one key email. We might know who it’s from (Jo, say), but perhaps not which email address she used, or whether her first or full name is in the message anywhere. And if it’s someone you swap mails with a lot, this could still be hundreds or thousands of messages.

We will usually have some idea about the date. But even that might be fairly hazy (or in my case, plain wrong).

The hardest problem is that of choosing the other search terms. Most modern search is essentially literal in character—while it will perform stemming (making run and running equivalent; perhaps even ran) and handle simple spelling variants, it is deliberately not semantic; that is, if you search on ‘run’, the search makes no attempt to match ‘sprint’, for example. (There are search engines that are semantic, but mostly this approach is not favoured, and not found to be helpful for internet search.)

In the context of searching the web, this exactness doesn’t matter for reasons that are directly linked to the fact that you are not looking for a particular item: you just want information. You might by-pass sources of information that use a different word, but as long as some useful sources do use it, that isn’t really problematical. The link structure of the web also works to your advantage here, with one page leading to another.

None of this applies in email. If you search on ‘run’ but the email only talks about sprinting, the mail won’t match, and there is unlikely to be another linking to it. So standard search approaches exclude emails that you want to find because of their precision. At the same time, any email that does match your terms will be included. If any ranking of results is shown at all, it will probably be unhelpful, so you end up feeling as if you need to search within the search results. But you cannot, not for technical reasons but because you don’t know how to refine the search further, and if you do, you’re almost as likely to exclude that which you seek as that which you do not.

The Genius of Tagging

One crucial difference between organizing with tags and organizing in folders is that you don’t have to choose a single location for something. If you tag regularly, you find that it imposes a very low overhead (much lower, I find, than choosing which folder to put a message into) and that you quickly develop a standard vocabulary of tags that you use without really thinking about it. A good bias to have is “when in doubt, add the tag”, i.e. if you think there’s any chance at all it might be useful, add it. Tags are cheap.

But the true genius of tagging is not the items that contain the tag; it’s the ones that don’t. This is a case of the dog that doesn’t bark in the night.

For by tagging a particular set of messages with “running” (say), I am also (implicitly) not tagging all the other messages, including the ones that include words like run, with running. That difference is critical. When you look for a set of items that you have tagged with a given word, you are looking at things for which that word (that tag) is important, rather than incidental. This is the genius of tagging. It not only identifies that which you tag as (potentially) relevant; it also implicitly marks everything else as probably not relevant to that tag.
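A toy illustration of the difference, using three made-up messages. The data and the two function names are invented for this sketch, and the full-text search is deliberately simplified to whole-word matching with no stemming.

```python
# Three invented messages: two are about running, one merely uses the word.
MESSAGES = [
    {"id": 1, "text": "Fancy a sprint round the park on Sunday?",
     "tags": {"running"}},
    {"id": 2, "text": "Can you run the report before the meeting?",
     "tags": {"work"}},
    {"id": 3, "text": "I will run the 10k in April if my knee holds up",
     "tags": {"running"}},
]


def full_text_search(messages, word):
    """Literal search: ids of messages whose text contains the word."""
    word = word.lower()
    return [m["id"] for m in messages if word in m["text"].lower().split()]


def tag_search(messages, tag):
    """Tag search: ids of messages the user chose to mark with the tag."""
    return [m["id"] for m in messages if tag in m["tags"]]


if __name__ == "__main__":
    print(full_text_search(MESSAGES, "run"))  # finds 2 and 3, misses 1
    print(tag_search(MESSAGES, "running"))    # finds 1 and 3, skips 2
```

The literal search both misses the message about sprinting and drags in the one about running a report; the tag search gets both right, precisely because message 2 was never tagged.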

When looking for a particular item, that can make the difference between success and failure.

23 December 2010

A Translation of Yahoo!'s "What’s Next for Delicious?" Blog Post

A translation of the Yahoo! blog post on Delicious. [With apologies to John Gruber, who (as far as I know) invented this ‘translation’ format.]

What’s Next for Delicious?

Many of you have read the news stories about Delicious that began appearing yesterday. We’re genuinely sorry to have these stories appear with so little context for our loyal users.

[Shit. This wasn’t supposed to happen. This will affect how much we can get for Delicious.]

While we can’t answer each of your questions individually, we wanted to address what we can at this stage and we promise to keep you posted as future plans get finalized.

[We’re not going to risk another SNAFU like this.]

Is Delicious being shut down? And should I be worried about my data?

No, we are not shutting down Delicious. While we have determined that there is not a strategic fit at Yahoo!, we believe there is a ideal home for Delicious outside of the company where it can be resourced to the level where it can be competitive.

[Technically, it’s still running. True, we laid off the entire Delicious team, and everyone knows that once you get rid of the developers, a system is all but useless, but we’re going to highlight the technicality.]

[Yes, you should be extremely worried about your data.] [1]

What is Yahoo! going to do with Delicious?

We’re actively thinking about the future of Delicious and we believe there is a home outside the company that would make more sense for the service and our users.

[We’re sunsetting it with extreme prejudice.]

We’re in the process of exploring a variety of options and talking to companies right now. And we’ll share our plans with you as soon as we can.

[But not Joshua. Anyone but Joshua. Who does he think he is, carping from the sidelines as we kill his creation with that unique Yahoo! combination of neglect and active destruction? It’ll go to the highest bidder; or to the second highest bidder, if Joshua is the highest bidder.]

What if I want to get my bookmarks out of Delicious right away?

As noted above, there’s no reason to panic.

[There is every reason to panic. But we really don’t want anyone to dump their bookmarks out of Delicious right now. We desperately wanted to keep a lid on this so that the rats wouldn’t desert the sinking ship, thus compromising what anyone might pay for said ship. Please don’t go.]

We are maintaining Delicious and encourage you to keep using it.

[We’ll keep the power on in the hope that some of you idiots don’t notice and keep using it till we can sell it. You’d better keep using it too, or we’ll be out of pocket big-time when it comes to the sale.]

That said, we have export options if you so choose.

[Run! For crying out loud, if you have any sense, grab your bookmarks and run.]

Additionally, many services provide the ability to import Delicious links and tags.

[Actually, that’s spot on.]

We can only imagine how upsetting the news coverage over the past 24 hours has been to many of you.

[“We’re just shutting down delicious, not selling your children to gypsies. Get the fuck over it.” — @fakecarolbartz]

Speaking for our team, we were very disappointed by the way that this appeared in the press. We’ll let you know more as things develop.

[Speaking for Yahoo! (not the Delicious team, obviously; they have their pink slips): It’s so not fair. Yahoo! used to have the mojo. Now people treat us like we don’t even get the internet. As if. Just because Carol doesn’t have a flickr account doesn’t mean she still uses 35mm film and a fountain pen, you know. Yahoo!s are people too, and it really hurts when people like Thomas Hawk come up with crap like this:

Do you even realize what you have with Flickr? It’s the largest well organized library of images in the world. Not only that, it has a very strong social networking component. In fact, Flickr may represent (if managed correctly) your single biggest opportunity to launch a much larger and more lucrative social network (and stock photography agency as well). Have you spent any time in any Flickr groups? They are addicting. People live in them. They play games in them. All kinds of activity goes on in them every day. And if you took the time to really explore the social side of Flickr, you’d learn this, and figure out a way to grow it. (Quoted by Charles Arthur at Guardian Technology)

Tom Hawk is full of shit. Flickr’s next, and you can be certain there’ll be no leaks this time. I’m sure Ballmer will give us a billion for it. Well, half a billion anyway. Hell, he could do that out of his own petty cash. Ballmer would be perfect for Delicious. He’d probably bring it up to date using Silverlight and ActiveX and make your bookmarks dance like Clippy.]

[1]

But in all seriousness, no one who takes any care should lose anything. First, having made this public commitment, Yahoo! would probably face an even bigger backlash if it deleted the data now. Secondly, Delicious has always had some of the best export options around, and just about every other bookmarking site on the web will import Delicious’s exports. Just go here and save the resulting XML file and you’ll be safe. Better still, import the result to Pinboard or another site of your choice.

21 December 2010

Del.icio.us Exporting And Alternatives: An Update

A few days ago, I blogged about some ways to get data out of del.icio.us and into FluidDB, and also about the fact that I was working on a kind-of old-style del.icio.us clone.

Things have moved on a little since then, so I thought I’d update.

First, although bad, the situation doesn’t look as dire as it did. By all accounts, the del.icio.us staff are gone, but Yahoo has made a very public statement that our bookmarks are safe, for the time being, that the service will continue to operate, and that its intention is to sell or otherwise migrate del.icio.us somewhere else, rather than simply to stop it. Given that del.icio.us has always had excellent export options, supported by (as far as I know) all of its competitors, there is certainly no reason why anyone aware of the situation should lose any significant amount of data.

Another way in which the situation has moved on, for me, is that I’ve discovered and signed up for Pinboard and started using that. Pinboard is the first alternative to del.icio.us that has felt like its developer was on the same wavelength as Joshua Schachter (who created del.icio.us). So far, I’m impressed with it. Although I don’t particularly like the aesthetic, I do like the minimalism. Functionally it looks strong and technically it appears credible. Despite some heavy breathing, it appears to have stood up well to a deluge of sign-ups and imports, and clearly has an energy and momentum in a useful direction; something that hasn’t been true of del.icio.us for far too long. It also has interesting and potentially useful extra features both in production and on its (commendably public) roadmap. I definitely wish Maciej Ceglowski and Peter Gadjokov, who run the site, all the best and hope that the Pinboard site has a great future. Right now, it looks to me like the best alternative to del.icio.us on the net, and a better medium-term bet than del.icio.us itself.

None of this is to suggest that I don’t still think it’s an excellent idea for people to import their bookmarks into FluidDB, as discussed in previous posts; FluidDB is a completely different kind of system, allowing things it is most unlikely Pinboard will ever even wish to support. But to be clear, FluidDB alone is not a del.icio.us competitor, and could only be so if a client were developed. In fact, FluidDB’s ideal role is as a secondary store for bookmarks from any and all bookmarking sites, so if you use something else (or even just store bookmarks in your browser), that’s something to consider.

Finally, what of my plans? Well, I still fully intend to release a site that will include old-style del.icio.us functionality (not unlike Pinboard’s) with some new twists (mostly rather unlike Pinboard’s twists). The main focus of that site, however, will be content creation, with bookmarking as an organizational paradigm that can be extended to existing web resources, rather than a site developed with the primary goal of supplanting del.icio.us (or Pinboard, for that matter). Inevitably, even if things go supremely well, there will be a bedding down period, and it will be a while before I recommend anyone to use it as a primary repository. But now that the urgency has gone, I’ve decided to wait at least a month before opening it up in any form. I have all my bookmarks in there (as well as in del.icio.us, Pinboard and FluidDB!), and am really enjoying being able to tune it to operate in the way I think it really should. I remain hopeful that some time fairly early in 2011, some others will come in and give it a try. But for now, my advice is:

Export your bookmarks from del.icio.us regularly; (this is good practice regardless; I extract mine several times a week);

Try Pinboard if you haven’t given it a spin and want something in the mould of old-style delicious; (there’s a modest sign-up fee, currently $8.88, and rising in direct proportion to the number of users.)

If you’re so inclined, get a FluidDB account and import your bookmarks in there too, for the powerful query and extended tagging capabilities, the rich permissions and the cool (and useful) visualization that this brings.

Watch this space. I don’t know when, but some time fairly early in 2011 I hope to offer something else for people to try; I have high hopes for it.

20 December 2010

A Conversation with Eliza about Carol Bartz and Delicious

[Eliza is the built-in psychotherapist in Emacs, my editor of choice. I thought I’d see what Eliza thought about the delicious situation.]

I am the psychotherapist. Please, describe your problems. Each time you are finished talking, type RET twice.

I think Carol Bartz hates delicious

Why do you say bartz hates delicious?

She plans to kill it

This is not a healthy way of thinking.

It certainly isn’t

What makes you believe it certainly is not?

Well, delicious is wonderful.

Earlier you said bartz hates delicious?

She laid off all its staff, apparently

Maybe your plans have something to do with this.

I don’t think so.

Is it because of your life that you say you do not think so?

I don’t think Carol Bartz knows my plans.

You mentioned that bartz hates delicious?

Yes. And apparently she doesn’t have a flickr account either.

Is it because of the people you hang around with that you are going through all this?

Why not?

delicious is an online service

Are you sure that delicious is an online service?

Yes

I see... Well, do you really think this is so?

That it’s an online service, or that Carol Bartz doesn’t get the internet and plans to kill delicious?

Is it because it’s an online service or that carol bartz does not get the internet and plans to kill delicious that you came to me?

Because Carol Bartz doesn’t get the internet and plans to kill delicious

Does it bother you that carol bartz does not get the internet and plans to kill delicious?

Yes

Yes, hmmm... so, what makes you think this is so?

Yahoo bought two of the most important properties on the internet and Carol doesn’t understand them.

Why do you say bartz hates delicious?

She sacked all the staff and plans to get rid of it

18 December 2010

Del.icio.us and I: Reflections and Alternative Alternatives

I’ve been captivated by del.icio.us since the moment I found the site in early 2004. I don’t remember how I found it. It was so minimal it wasn’t even clear what the site was for, but as soon as I figured it out, I was hooked. I think del.icio.us was (and remains) far more important and innovative than is generally recognized.

Hierarchical Storage vs. Tags¶

The trouble with hierarchical folders is that, in practice, they force me to choose a single place to put something. This was a serious problem for email, and remains so, to a lesser extent, for files. Since I have currently about 2,000 bookmarks, it’s also a problem for them.

The problem is perhaps clearest with email. All of the people I know who are good at finding old emails, without exception, chose, fairly early in their lives, a single organizational paradigm. I know some people who store email strictly by date. I know others who store it strictly by sender. And I know still others who store it by subject (though they are generally less successful at retrieval than the people who use one of the first two methods). I never decided, and I have always struggled to find old emails. It has always felt to me as if I need to put emails in multiple places, to reflect the fact that I will probably only half remember one detail when I’m searching for an email, and it’s very hard to predict what that thing will be.

This is the problem that del.icio.us solved for bookmarks with tags. By allowing me to attach as many relevant tags as I like to a bookmark, I almost never have any trouble finding it. Whereas I find it very difficult to anticipate the single category I will need to retrieve it in the future, it is remarkably easy to attach a handful of tags that will almost certainly mean that when I come back to look for it, I will find my bookmark quickly. As a result, since I started using del.icio.us, I have almost never struggled to find a website I’ve saved and tagged. It is remarkable, and it works for emails (and could work for files) as well.

(Search isn’t as good.)¶

(Many people claim that the advent of full text search has eliminated the need for organization. I disagree. While there is no question that Spotlight, on the Mac, full-text search in gmail, and equivalent solutions elsewhere, have been enormously positive, I find that I still struggle to find email, particularly, because I tend to get thousands of results when I search. The brilliance of tags is that not only can I identify bookmarks (or emails) of interest by using a tag; I also exclude all the items to which I didn’t attach that tag. This turns out to be almost more important.)
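The point about exclusion can be made concrete with a toy tag index. This is only an illustrative sketch in Python, with made-up bookmarks and tag names; it says nothing about how del.icio.us is actually implemented.

```python
from collections import defaultdict

# Hypothetical bookmarks: URL -> the tags attached to it.
BOOKMARKS = {
    "http://example.com/paper-cranes": {"origami", "howto"},
    "http://example.com/folding-maths": {"origami", "maths"},
    "http://example.com/emacs-tips": {"emacs", "howto"},
}


def build_index(bookmarks):
    """Invert URL -> tags into tag -> set of URLs."""
    index = defaultdict(set)
    for url, tags in bookmarks.items():
        for tag in tags:
            index[tag].add(url)
    return index


def find(index, *tags):
    """Return the URLs that carry every one of the given tags."""
    if not tags:
        return set()
    sets = [index.get(tag, set()) for tag in tags]
    return set.intersection(*sets)


index = build_index(BOOKMARKS)
print(sorted(find(index, "origami")))           # everything tagged origami
print(sorted(find(index, "origami", "howto")))  # each extra tag narrows the result
```

Each additional tag can only shrink the result set, which is exactly the exclusion effect: anything not tagged is never in the way.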

for:alex with love¶

Before the site even supported for: tags, I started tagging sites that I thought would be of interest to my son, Alex, with an alex tag. (Did this mess up the tagsonomy? Not obviously.) And he would periodically go to del.icio.us/njr/alex and find the sites I’d saved for him. I save origami sites for my mother, who folds, at del.icio.us/njr/origami. It works.

But del.icio.us then made this even better by introducing for: tags. I can now actually send bookmarks to Alex simply by using a for:alexradcliffe tag. When he goes to the site, he sees them. It’s mar.vello.us.

Love at First Site¶

Since adopting del.icio.us, I have used it more-or-less daily and have found it so spectacularly useful that I have built a number of aspects of my digital life around it. One of these is that I have a dense home page, for all of my browsers, that is built by extracting everything I’ve tagged with home and structuring the links into a single page that has all my most important sites. I have over a hundred links on this page, and it serves most of my common internet needs, both on computers and (reformatted) on my phone. (Read this to see it, and get the code by following the instructions here, if you’d like your own.)

Christmas Carol¶

Joshua Schachter, the banker-turned-internet-entrepreneur who built del.icio.us to solve his own need to organize and share bookmarks, sold delicious to Yahoo a few years ago. He stayed a while but quit when it was clear that Yahoo didn’t get delicious. It was apparently Joshua we have to thank for the tags in flickr as well, for (as I understand it) he spoke to Caterina and suggested that flickr needed tags. Flickr, of course, is also owned by Yahoo, and, from my perspective is the only other part of Yahoo that deserves any kind of future. But John Gruber at Daring Fireball reports that Carol Bartz, who unfortunately doesn’t get the internet, fired the entire del.icio.us team a couple of days ago as part of her plan to dump del.icio.us. (I think almost every change that team has made to del.icio.us since Joshua left has been retrograde; but I’m still not celebrating.) Charles Arthur, who does get the internet, nailed the Yahoo fiasco in the Guardian’s Technology Blog yesterday:

The trouble with all this? It’s on the internet, so Carol Bartz isn’t going to see it. If only there were some way to make it physical so she could read it . . .

Maybe she’ll dump flickr next; Charles Arthur reports that she doesn’t even have a flickr account.

Enter Terry Jones (@terrycojones; not the Python)¶

For much of the eighties and nineties I did research in the somewhat obscure and (then) emerging field of genetic algorithms. At conferences, I tended to spend time with Terry Jones, who worked directly with John Holland, the MacArthur genius who founded the field. Terry and I both went on to other things and we lost touch. But he was interesting, and I looked him up on the internet one day. He had a very eclectic home page that, among other things, included a set of papers he had written about computer storage mechanisms. He tried and failed numerous times to get these published. I read them and was captivated by the brilliance and beauty of the ideas in them.

In the papers, Terry discussed search and an embryonic form of tagging as the two core organizing principles that he thought should be the basis for computational storage and retrieval. This was before del.icio.us, and before search assumed the prominence that it now enjoys. Terry’s tags (which he then called attributes) were more complex than del.icio.us tags, in that they carried values. So while in most tagging systems you can attach tags as labels to objects, in Terry’s mind you should be able to attach any information to anything using a tag. So at the simplest level, you could attach a rating to something (I rate Fugitive Pieces, by Anne Michaels, 10). Or you could go further and attach an image or a webpage or anything at all, to anything else. It was extremely innovative, and the fact that he couldn’t get the papers published says a lot more about peer review than it does about the papers. (The papers are available here, here and here; the last was eventually published [1].)

Terry tried twice to build versions of his idea, but struggled and basically failed. I got in touch with him after reading his papers, and enthused, and I think he said I was essentially the first person who had ever liked his work in this area. He sounded quite depressed.

A bit later, another friend of his, Russell Manley (@rustlem) told him about del.icio.us, and this was the spur that made him try a third time to build his vision. This time, he sold his flat, created a company (Fluidinfo, in which I have invested and to which I am an advisor) and went for it. The result is Fluidinfo Inc, and its main product, FluidDB.

FluidDB¶

I think of FluidDB as like del.icio.us on steroids (though Terry doesn’t like that description of it). Seen through my permanent lens of del.icio.us, you can get to FluidDB through a series of generalizations of a social bookmarking site.

First, instead of just URLs, in FluidDB you can tag anything. FluidDB contains objects, and the objects can represent anything at all. They are identified by a special tag (the about tag, fluiddb/about) that says what the object is about. So I have bookmarks for websites in FluidDB, which are stored on objects whose about is the URL to which they refer. For example, I have a bookmark for an entry in this blog describing how to tag books from the Guardian’s 1000 books everyone must read in FluidDB. In a modern, standards-compliant browser (essentially anything except Internet Explorer), you can see an image (generated live from FluidDB) showing my tags on that object by clicking this link. (FluidDB is completely compatible with Internet Explorer, but my graphical image generator for FluidDB is not.) Here’s a static snapshot of the same thing.

The next thing you add when transforming a social bookmarking site into FluidDB is the ability (but no requirement) for tags to have values. For example, this image shows the FluidDB object corresponding to Mars, and an application called Miró has added a bunch of tags to it, with information about Mars. (If you had a FluidDB account, which you could, you could add your own information about Mars to the same object.) Again, here’s a static snapshot of the object.
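A toy model may help fix the first two generalizations in mind. The sketch below is plain in-memory Python, not FluidDB or any client library; the class name, method names, URL and the format of the book’s about value are all assumptions made for illustration.

```python
class FluidObject:
    """A toy stand-in for a FluidDB object: an about value plus tags."""

    def __init__(self, about):
        self.about = about
        self.tags = {}  # full tag path, e.g. "njr/rating" -> value (or None)

    def tag(self, user, name, value=None):
        """Attach a tag in the user's namespace; None means a bare label."""
        self.tags[user + "/" + name] = value

    def has(self, path):
        """True if the object carries the tag, whether or not it has a value."""
        return path in self.tags


# A bookmark: the about value is the URL and the tag is a bare label.
page = FluidObject("http://example.com/some-page")  # hypothetical URL
page.tag("njr", "fluiddb")

# A rating: the same mechanism, but this time the tag carries a value.
book = FluidObject("book:fugitive pieces (anne michaels)")  # assumed about format
book.tag("njr", "rating", 10)

print(page.has("njr/fluiddb"), book.tags["njr/rating"])
```

The only difference between a del.icio.us-style label and a richer attribute is whether the tag has a value attached.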

The third thing you have to add to get FluidDB is a fine-grained permissions system. In del.icio.us, almost everything is shared, though there is the ability to mark a bookmark as private (which means that only you can see it.)

In FluidDB, every tag has its own permissions, with separate controls for reading and writing and an access-control list. For each of your tags, you can choose who can read them and who can write them, either including or excluding people, or making them completely private or completely public. It’s very powerful (see Permissions Worth Getting Excited About and The Permissions Sketch for more details).
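One way to picture per-tag permissions is as a default plus a list of exceptions for each action. The following is a minimal sketch under that assumption; the class and field names are invented here, and FluidDB’s real permission model may differ in its details.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """The rule for one action: a default plus a set of named exceptions."""
    open_by_default: bool
    exceptions: set = field(default_factory=set)

    def allows(self, user):
        # Users in the exception set get the opposite of the default.
        return self.open_by_default != (user in self.exceptions)


@dataclass
class TagPermissions:
    """Separate read and write rules for a single tag."""
    read: Rule
    write: Rule


# Hypothetical tag njr/rating: anyone may read it, only njr may write it.
rating = TagPermissions(
    read=Rule(open_by_default=True),
    write=Rule(open_by_default=False, exceptions={"njr"}),
)

print(rating.read.allows("paparent"), rating.write.allows("paparent"))
```

A completely private tag is then just a closed default with only the owner as an exception, and a completely public one is an open default with no exceptions.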

The fourth thing you add to produce FluidDB is a simple but rich query language. For example, you can find all the planets heavier than Earth with the FluidDB query

miro/planets/Mass > 1.0

There are lots of tools around that let you issue queries against FluidDB (though it doesn’t really have its own interface yet). I have a command line tool that talks to FluidDB, and the command

fdb show -q 'miro/planets/Mass > 1.0' /miro/planets/Name

results in this output.

4 objects matched
Object e06bea33-a000-4294-a7b2-d3245f1481ca:
  /miro/planets/Name = "Saturn"
Object e9b022e6-c770-44ad-abaa-1a2cde9a3224:
  /miro/planets/Name = "Uranus"
Object 2994f561-8efe-4e13-9374-bf3f9436eac6:
  /miro/planets/Name = "Jupiter"
Object 72144788-a59e-4819-a9c9-6b8577e2695b:
  /miro/planets/Name = "Neptune"

You can see them in a modern browser by following the hyperlinks on the miro/planets/db-next-record-about tag on the live version of the image. You can also use a more point-and-click tool like @paparent’s FluidDB Explorer by visiting http://explorer.fluidinfo.com/fluiddb/ and typing miro/planets/Mass > 1.0 into the query box at the top right. (Don’t omit the .0; FluidDB is distressingly strict at the moment, though I am promised it will change.)
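To show what the query is doing, here is the same comparison run over a small in-memory list. This is only a sketch: the masses are approximate values in Earth masses typed in by hand, not data read from FluidDB, and the function name is made up.

```python
# Approximate planetary masses in Earth masses, entered by hand for
# illustration; these are not the values stored by Miró in FluidDB.
PLANETS = [
    {"miro/planets/Name": "Mercury", "miro/planets/Mass": 0.055},
    {"miro/planets/Name": "Venus", "miro/planets/Mass": 0.815},
    {"miro/planets/Name": "Earth", "miro/planets/Mass": 1.0},
    {"miro/planets/Name": "Mars", "miro/planets/Mass": 0.107},
    {"miro/planets/Name": "Jupiter", "miro/planets/Mass": 317.8},
    {"miro/planets/Name": "Saturn", "miro/planets/Mass": 95.2},
    {"miro/planets/Name": "Uranus", "miro/planets/Mass": 14.5},
    {"miro/planets/Name": "Neptune", "miro/planets/Mass": 17.1},
]


def query_greater_than(objects, tag, threshold):
    """Return the objects that carry `tag` with a value above `threshold`."""
    return [obj for obj in objects if tag in obj and obj[tag] > threshold]


matches = query_greater_than(PLANETS, "miro/planets/Mass", 1.0)
print(len(matches), "objects matched")
for obj in matches:
    print("  /miro/planets/Name =", obj["miro/planets/Name"])
```

Earth itself is excluded because the comparison is strict, which is why only the four giant planets come back.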

In a bookmarking context, this allows you to do queries like Show me all the pages that Terry and Russell have tagged with the tag fluiddb that I haven’t. This is considerably more flexible than del.icio.us or other bookmarking tools.
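Underneath, a query of that kind is just set algebra: intersect the pages each of the others has tagged, then subtract your own. A minimal sketch, with hypothetical user names and URLs and no connection to FluidDB itself:

```python
# Hypothetical data: for each user, the URLs they have tagged "fluiddb".
TAGGED_FLUIDDB = {
    "terry": {"http://example.com/a", "http://example.com/b", "http://example.com/c"},
    "russell": {"http://example.com/b", "http://example.com/c"},
    "njr": {"http://example.com/c"},
}


def tagged_by_all_but_not(tagged, others, me):
    """URLs that every user in `others` has tagged and that `me` has not.

    `others` must name at least one user.
    """
    common = set.intersection(*(tagged[user] for user in others))
    return common - tagged.get(me, set())


print(sorted(tagged_by_all_but_not(TAGGED_FLUIDDB, ["terry", "russell"], "njr")))
```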

The last major thing you add to a social bookmarking site to get FluidDB is an API to allow applications to talk to it. Of course, del.icio.us has a (rather good) simple API, but you can’t do very much with it because part of del.icio.us’s excellence is that it can’t actually do very much. By definition, anything that you can do in FluidDB you can do through the API because the API is the only supported way to access FluidDB at all (though, as I say, there are lots of libraries and applications built on FluidDB that use the API). Technically, the API is a RESTful, pure HTTP API that uses JSON when necessary for exchanging data. It is documented here.
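As a rough illustration of what “pure HTTP with JSON” means in practice, the sketch below builds a query URL and decodes a reply, without making any network call. The host name, the /objects path, the query parameter and the shape of the JSON reply are all assumptions for the example; the API documentation is the authority on the real details.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "http://fluiddb.example.com"  # placeholder host, not the real endpoint


def query_url(query, base_url=BASE_URL):
    """Build a GET URL that carries the query as a URL-encoded parameter."""
    params = urlencode({"query": query})
    return base_url + "/objects?" + params


def parse_ids(body):
    """Pull the list of object IDs out of an (assumed) JSON reply."""
    return json.loads(body)["ids"]


url = query_url("miro/planets/Mass > 1.0")
print(url)
print(parse_qs(urlsplit(url).query))  # decodes back to the original query

# An assumed reply shape, using one of the object IDs shown earlier.
sample_reply = '{"ids": ["e06bea33-a000-4294-a7b2-d3245f1481ca"]}'
print(parse_ids(sample_reply))
```

The point is only that any language with an HTTP client and a JSON parser can talk to an API of this style.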

FluidDB as a new, more powerful alternative to del.icio.us¶

FluidDB has the potential to be a very interesting alternative to del.icio.us. Or perhaps a more accurate statement would be, FluidDB should be considered as a very serious and flexible place to rehouse data currently in del.icio.us. The power and flexibility of its information architecture can allow users to store more kinds of information, about more things, and to query and recombine that information in more flexible ways. (In fact, Joshua Schachter is an investor in Fluidinfo, though I don’t know whether he would endorse anything I’m saying here.)

Today, however, there are some important limitations worth noting.

The main limitation is that there is no application like del.icio.us for FluidDB. There are (at least) two ways to import data from del.icio.us to FluidDB, preserving everything in the export, but until some work is done building a del.icio.us-like client application, it will be awkward to use the data and to add new bookmarks. I’m confident that over the coming weeks and months, applications will be built that will provide basic social bookmarking using FluidDB, but until that time, FluidDB is only really a suitable alternative for technical users.

There are two other minor limitations today. The first is that although FluidDB has a much more powerful permissions system than del.icio.us, its design (for rather fundamental and deliberate reasons) does not make it very easy to support private bookmarks in the ‘natural’ way. To be clear, it is entirely possible to have completely private bookmarks in FluidDB: but you need to organize your data in a slightly more complex structure to achieve this and native FluidDB queries on private data will have to look slightly different from native queries on public bookmarks. Such complexity can easily be hidden from the user by an application, and again, I suspect that applications that do this will appear. But they don’t exist yet.

If you do want to import information from del.icio.us to FluidDB, there are (at least) two published ways to do so.

Some months ago, I published some python code to github that takes a very direct approach, creating a FluidDB tag for each of your del.icio.us tags and attaching them to objects whose fluiddb/about tag is the URL for the bookmark. At the moment, that script doesn’t upload any information about private bookmarks to FluidDB, but it could obviously do so, and I imagine I’ll add that capability some time over the next few weeks when I decide what I think the best way to do it is. I think this approach mirrors del.icio.us most directly and naturally and is a good choice if you want to use FluidDB primarily for social bookmarking, perhaps expanding to take in tags with values. It requires you to install two python packages, both available on github. You can find information in this blog entry.
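The direct mapping described here can be sketched in a few lines. This is not the script on github, just an illustration of the idea: it reads a tiny hand-written sample in the style of a del.icio.us XML export (the attribute names href, tag and shared are assumed from that format), skips private bookmarks, and works out which tags would go on which about URL.

```python
import xml.etree.ElementTree as ET

# A hand-written sample in the style of a del.icio.us XML export;
# the URLs and tags are made up.
SAMPLE_EXPORT = """\
<posts user="njr">
  <post href="http://example.com/a" description="Page A" tag="fluiddb python" shared="yes"/>
  <post href="http://example.com/b" description="Page B" tag="secret" shared="no"/>
</posts>
"""


def plan_upload(xml_text, user):
    """Map each public bookmark's URL to the tag paths to attach to it."""
    plan = {}
    for post in ET.fromstring(xml_text).iter("post"):
        if post.get("shared") == "no":
            continue  # private bookmark: nothing is uploaded for it
        tags = post.get("tag", "").split()
        plan[post.get("href")] = [user + "/" + tag for tag in tags]
    return plan


for about, tag_paths in plan_upload(SAMPLE_EXPORT, "njr").items():
    print(about, tag_paths)
```

Each key in the plan is the fluiddb/about value of the object to tag, and each entry in its list is a tag in the user’s namespace.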

But there's a simpler and better alternative from Nicholas Tollervey (@ntoll), who works at Fluidinfo. He has written a single script that just asks you a few questions before it does the upload. It also uploads all the information, rather than just the tags, as mine does.

I imagine we’ll simplify both approaches over the coming weeks.

This diagram shows the object for a webpage Nicholas and I have both bookmarked

I found this bookmark by using the FluidDB query

has njr/fluidinfo and has ntoll/delicious/tags/fluidinfo

I had a single del.icio.us tag for this, whereas Nicholas had several. Nicholas has chosen to prefix all his delicious tag names with delicious/tags, which is why they are so long, but by default both his and my script put all the main data in the top level, so that a del.icio.us tag njr/fluidinfo becomes a FluidDB tag njr/fluidinfo, and the title and notes attributes are stored using FluidDB tags title and notes respectively. Like my script, Nicholas's currently doesn't tag anything in the case of private bookmarks, but his script does create all FluidDB tags that you use, even if some of them are only used for private bookmarks. So if the existence of a particular tag you have is secret, that would be a reason not to use his script. Nicholas's code is available from github and also (perhaps more easily) from the python package index PyPI, at this location. This allows you to install it with setuptools or easy_install etc. I imagine he'll blog about it soon.

An “old del.icio.us” alternative¶

As will be clear by now, del.icio.us has been pretty influential in my life. A few weeks ago I had an idea for a web site not initially very similar to del.icio.us, which I have been developing slowly. I don’t want to go into details now, but it’s in the general area of checklists, allowing users to find, create, share and use checklists of various kinds. It’s a social application on exactly the del.icio.us model (by which I mean, it has no explicit ratings and is useful even if no one else uses it). More generally, you can think of it as a kind of del.icio.us for user-created and re-mixed content, specifically around checklists for now.

A week ago today I realised that since I was effectively building almost everything you need for del.icio.us, I could easily extend my new website to include an actual del.icio.us alternative, i.e. I could allow users to save bookmarks for websites as well as for content they create in the application itself. I’m generally nervous about extending small ideas and making them more complex, but this seemed like a very minor extension, and I have become increasingly nervous about the future of del.icio.us ever since Yahoo acquired it. News of Yahoo’s intention to divest itself of del.icio.us prompted me to stop wondering and start implementing on Thursday night. I hope to make a limited service available in the next few days. (I’ll update this post and create a new one announcing it when I do.)

Its functionality will be limited at first, but will include (does include, in fact) the ability to import all bookmarks and tags from del.icio.us (private and public) and maintain their state, and all the most basic functionality (creating, editing, deleting bookmarks). There will also be limited social functionality (looking at tags across users etc.) and, of course, the ability to export your data in the same XML format as the one del.icio.us uses.

Over time, I’ll try to make it ever more like the pre-Yahoo del.icio.us, as far as I can remember that. And I think it’s very likely that I will also offer users the option of duplicating bookmarks in FluidDB, to allow for richer sharing, and to provide exactly the del.icio.us-like FluidDB client that I would love myself.

Of course, I realise there are a dozen or more del.icio.us alternatives already up and running, and they are clearly a more stable, safer bet. But I hope that at least a few people will think this approach is interesting enough to try. I will probably add an option to hide all the non-bookmarking functionality from my site, though I don’t think it will really distract much anyway. (For now, I’ll probably turn it all off until it’s ready, next year.)

Carol Bartz may not be bringing much Christmas cheer at Yahoo, or to del.icio.us users, but I hope that my embryonic del.icio.us replacement and FluidDB can be part of an ecosystem of alternatives that will end up being more empowering for users and will allow hard-core del.icio.us fans to have a future more like the del.icio.us of old.

Maybe, it will be glor.io.us.

[1]	New Approaches to Information Management: Attribute-Centric Data Systems, R. Baeza-Yates, T. Jones, and G. Rawlins. Seventh International Symposium on String Processing and Information Retrieval. SPIRE 2000 pp. 17-27.