31 December 2010

The FluidDB People App from @paparent

@paparent, author of the very useful FluidDB Explorer, has produced a simple app that shows the locations of FluidDB people on a Google Map. It’s inspired by the Djangopeople application, but so far has only the map.

If you’d like to add yourself (the more the better) you just need to add three tags into FluidDB—two to give your latitude and longitude, and one to tell @paparent’s app about you.

To do this all you need is a FluidDB account and some way of writing to FluidDB. It’s pretty easy using the command line from my fdb.py library, so I thought I’d quickly document how to do it.

The People application expects you to tag your user object in FluidDB with numeric tags in your namespace called people/latitude and people/longitude. So for me (njr), I had to do this:

fdb tag -a 'Object for the user named njr' people/latitude=55.8817504514
fdb tag -a 'Object for the user named njr' people/longitude=-3.10550451279

(You can see the tags on my object here, assuming you’re using a browser that isn’t Internet Explorer.)

The application collects users on the FluidDB objects with the about tag collection:peopleapp (UUID 62397fb0-6f96-404a-a8fa-ee3758cfa7f2) and looks for tags called peopleapp. So for me, that meant doing the following:

fdb tag -a 'collection:peopleapp' peopleapp

And that’s it.

Notes

A few things might not be clear, so I’ll explain them briefly.

  1. My library, fdb.py, automatically creates namespaces and tags as required, so there was no need to create the njr/people namespace or any of the tags: when you write to them, fdb.py assumes you’d like them to exist.
  2. I’ve left my username off the tags because fdb.py assumes that the tags you refer to are yours unless you stick a / at the start. So when I say people/latitude, fdb.py adds in njr/ at the start for me. If I want to refer to (say) paparent’s latitude in fdb.py, I say /paparent/people/latitude. (This is slightly non-standard in FluidDB, but very convenient.)
  3. The final tag command doesn’t set a value: this is fine, as FluidDB doesn’t require tags to have values (or, more accurately, allows null values, which is what this command sets). @paparent’s people app doesn’t require a value on the tag, so this is enough.
  4. In case it isn’t clear, the -a 'Object for the user named njr' is specifying the object to be tagged by its about tag. FluidDB creates an object for each user, and always gives it an about tag of this form.
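
As a rough illustration of notes 2 and 4, here is a minimal Python sketch of the naming rules described above. The function names resolve_tag and user_about are hypothetical, invented for this example; they are not part of fdb.py, which may well implement the rule differently.

```python
def resolve_tag(path, user):
    """Expand a tag path using the rule from note 2 (illustrative only).

    A leading '/' marks an absolute path, which is used as given;
    anything else is taken to live under the current user's namespace.
    """
    if path.startswith("/"):
        return path[1:]
    return f"{user}/{path}"


def user_about(username):
    """Build the about tag FluidDB gives a user's object (form from note 4)."""
    return f"Object for the user named {username}"


print(resolve_tag("people/latitude", "njr"))
print(resolve_tag("/paparent/people/latitude", "njr"))
print(user_about("njr"))
```

The first call yields the tag in my own namespace and the second leaves paparent’s tag alone, which is exactly the convenience described in note 2.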

So: go add yourself!

UPDATE

@paparent has added instructions for how to add yourself to the map using his FluidDB Explorer and Holger Durer (@HDurer) has written a blog post about how to use his emacs mode fluiddb.el to add yourself to the map without having to leave the comfort of emacs. How wonderful is that?

29 December 2010

Abouttag (library/web app/visualizations): Update

I’ve made a few changes to the abouttag library (for generating canonical about tags according to various conventions documented here) and to the online app, available at abouttag.com.

Interface

I’ve added a simple interface to abouttag.com so that you can now just put in an about tag or an object ID, or choose a class of Thing and type in key information about that thing. The website can then generate the about tag for you or take you to a (live) visualization of that object in FluidDB, with its tags and values. The diagrams have been updated slightly, to make them reflect more precisely the way tags are referenced in FluidDB itself.

Here’s an example that should work with almost any modern web browser except Internet Explorer (Firefox, Chrome, Safari, Opera, iPhone, Android etc. should all be fine): planet:Mercury.

Library

The library has been expanded slightly to include Twitter users, following the conventions adopted by the Tickery web application.

25 December 2010

God Rest Ye Social Bookmarkers

’Tis the season to be merry and last night, our local troupe, the Bartz Carol singers were out in force. Although I’m not confident I caught the words correctly, my impression is they were singing some slightly non-traditional carols. This was what I think I heard.

God rest ye social bookmarkers
Let nothing you dismay
For Carol Bartz our nemesis
Was brought to clear the way
To save us all from Yahoo’s power
When we were gone astray
While tagging our bookmarks of the web
Of the web
While tagging our bookmarks of the web.

At pinboard.in in India
A saviour site was born
And brought to life by Maciej
Upon one blessed morn
The which the other social sites
did tattle and scorn
While tagging our bookmarks of the web
Of the web
While tagging our bookmarks of the web.

From Joshua, the father,
A tweet it emanates
That Yahoo talks to everyone
except he who creates
The Tasty and Delicious things
that Yahoo itself hates
While tagging our bookmarks of the web
Of the web
While tagging our bookmarks of the web.

Fear not then all ye bookmarkers
Let nothing you afright
Your export options still remain
Your bookmarks are alright.
Export them now and import them
Wherever you see light
While tagging our bookmarks of the web
Of the web
While tagging our bookmarks of the web.

I think there was more, but by this time the Bartz Carol singers were moving away, the stars were coming out and flickring, and I couldn’t really catch the rest.

I might have been mistaken anyway; perhaps it was all just a dream.

24 December 2010

What Would Wikipedia Do? (WWWD)

Readers of this blog will know that some of us have long been interested in the subject of about tags and how to choose them in FluidDB. The question is important, because FluidDB’s information sharing paradigm is predicated on the idea that different users and applications share data through tagging common objects, and the about tag is the primary way to decide which object that should be. This post discusses a new idea Terry (@terrycojones) and I cooked up about a week ago. At this stage, I’m not proposing it as a convention, but simply discussing it as a possibility and soliciting comments.

WWWD / WDWD?

The kernel of the idea is to follow Wikipedia’s taxonomy by setting FluidDB about tags to the relative path for the Wikipedia page for the entity of interest.

For example, the (English-language) page for the Eiffel Tower has the page title “Eiffel Tower” and the URL

http://en.wikipedia.org/wiki/Eiffel_Tower

The putative convention under discussion, which I will tentatively call wwwd-1 for the rest of this post, would therefore be

Eiffel_Tower

i.e., if you wanted to tag the Eiffel Tower in FluidDB, the object you would use is the one whose about tag is Eiffel_Tower.

Arguably, with this example, it’s more a case of “What does Wikipedia do?” than “What would Wikipedia do?”, but we’ll come to that later.

So at its simplest, this convention says that to find the wwwd-1 about tag for an object that already exists in the English language edition of Wikipedia, you set the about tag to everything that follows http://en.wikipedia.org/wiki/ in the relevant page URL (without a trailing slash). [1]
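
To make the rule concrete, here is a minimal Python sketch of it. The function name wwwd1_about is hypothetical, and the sketch only covers pages in the English-language edition, as the rule above does; it makes no attempt to follow redirects or to check that the page exists.

```python
from urllib.parse import urlsplit

WIKI_HOST = "en.wikipedia.org"
WIKI_PREFIX = "/wiki/"


def wwwd1_about(url):
    """Return the putative wwwd-1 about tag for an English Wikipedia URL.

    The about tag is everything after http://en.wikipedia.org/wiki/,
    without a trailing slash. Any %-encoding in the URL is left as-is.
    """
    parts = urlsplit(url)
    if parts.netloc != WIKI_HOST or not parts.path.startswith(WIKI_PREFIX):
        raise ValueError(f"not an English Wikipedia page URL: {url}")
    about = parts.path[len(WIKI_PREFIX):].rstrip("/")
    if not about:
        raise ValueError(f"no page name in URL: {url}")
    return about


print(wwwd1_about("http://en.wikipedia.org/wiki/Eiffel_Tower"))
```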

The Case For wwwd-1

This convention has a number of things going for it, as well as a number of drawbacks. Let’s take the positives first.

1. Pragmatically, it’s easy. Wikipedia exists, is free, and is available to anyone with an internet connection (censorship aside). It is huge, beautiful and widely read, and it represents the work of countless thousands of (mostly) dedicated and intelligent humans, who have already created a rich taxonomy that—imperfect as it must be—has already disambiguated millions of terms. Following its lead is nothing if not pragmatic.

2. The Wikipedia URLs allow obvious things to have obvious, minimal URLs. Generalizing wildly, the most common pattern for URLs in wikipedia is the name of the entity with spaces replaced by underscores, punctuation %-encoded and articles stripped, in title case. Whether you like underscores or not (and I don’t), this is pretty simple, and is described (loosely) here. For example, here are some examples of about tags wwwd-1 would include:

I have glossed over some issues here, but this is not a bad set of about tags.

3. They allow us to avoid reinventing wheels. At the time I am drafting this (17:43 GMT on 24th December 2010) this Wikipedia entry tells me that there are 3,511,257 entries in the English-language edition. (As I proof-read the post, on 28th December, rather impressively the total appears to have increased to 3,514,459.) In the scheme of things, that isn’t all that many; if FluidDB has any success at all, we will end up with orders of magnitude more objects with about tags than that. But most of our entries will have natural about tags that will be very straightforward to determine. For example, URLs will (modulo canonicalization) be their own about tags. I think there is an excellent chance that we can do better than Wikipedia for certain well-defined classes of numerous objects, such as books. But the huge value in Wikipedia’s taxonomy is that it deals with a very high proportion of the most noteworthy subjects in the world (almost by definition).

4. Even where entries do not exist in Wikipedia, they give us a kind of template to work from.

Disadvantages and Issues

While many of the benefits of the putative wwwd-1 convention are clear and impressive, there are a number of challenges, disadvantages and issues.

1. Splitting, Changing and Disambiguation. Although the URLs for longer-established Wikipedia pages are fairly stable now, they can and do change. This is not particularly problematical within Wikipedia, because there is only really one article on each page, controlled, in some cases, through reasoned debate and collective refinement, and in other cases, through edit wars. Presumably most people find articles by search most of the time anyway, and while it is clearly unfortunate if URLs change and therefore invalidate (or at least redirect) bookmarks, it is not very serious. The situation is very different in FluidDB: every user has his or her own tags, which can be attached to any object in the system. If the mainstream community collectively decides to move from (say) the object whose about tag is “Mercury” to represent the planet, to the object with the about tag “Mercury_(planet)”, it is no small or simple, automatic or guaranteed matter to get all the data moved. [2]

2. Non-uniformity. I have personally already used my Miró software to upload a structured form of data gathered from various Wikipedia pages to FluidDB, including (funnily enough) every element in the periodic table, and every planet (and dwarf planet) in the solar system. I documented the conventions that I used as planet-1 and element-1 and they were very simple: data for Earth was stored on the object with about tag planet:Earth, data on the planet Mercury was stored on the object with the about tag planet:Mercury, and data on the element Mercury was stored on the object with about tag element:Mercury. All of this was very straightforward, and once the scheme is known, it is easy to write code to generate the about tag for the object, knowing only the name of the planet or element. In contrast, using the wwwd-1 convention, life is fairly easy for individual users wanting to add a tag to a single, particular object, but much harder for anyone wanting to upload data automatically. The non-uniformity means that you need to look up Wikipedia before knowing what the about tag will be. This introduces a level of complexity not to be underestimated, and is, in my view, the single largest problem with the idea of wwwd-1.

3. Language Issues. For English speakers, there is a clear attraction to using unqualified English terms as the about tags in FluidDB. But what of the myriad other languages? One approach is to say all languages will be on the same footing, and they should all simply follow Wikipedia’s relative URLs in their own language. Mostly, different languages will use different objects for storing data about the same thing, but sometimes they will coincide; when they do coincide, the two languages will sometimes refer to the same real-world entity and sometimes to different ones. This is problematical in some situations, and less so, or not at all, in others. Another option is to follow Wikipedia more closely and include a language code in the about tag. So we might have en:Earth, rather than Earth, and fr:Terre in French. If we fail to specify language, then English speakers will be likely to put information concerning cafés (informal restaurants) on the same object as French speakers will put information about coffee (the one with the about tag Cafe). More happily, if we avoid the prefix, information about Johnny Hallyday will end up on the same object for both English and French speakers; indeed, this will be a particularly common pattern for entries about individual people, since their names are normally not translated, at least within languages with common or similar alphabets. (Mao, for example, would map to the about tag Mao_Zedong in both, even though his name originates in a completely different alphabet.)

4. Priority, Cultural Sensitivity and Longevity. One of the very attractive things about wwwd-1 is that the most common/likely meaning of a given term tends to win the battle for the cleanest, simplest name. This is deliberate. The disambiguation guidelines say (at the time of writing):

Although an ambiguous term may refer to more than one topic, it is often the case that one of these topics is highly likely – much more likely than any other, and more likely than all the others combined – to be the subject being sought when a reader enters that ambiguous term in the Search box. If there is such a topic, then it is called the primary topic for that term. If a primary topic exists, the ambiguous term should be the title of, or redirect to, the article on that topic.

This is entirely sane and defensible. But it does mean that human judgement is required, and leads to possible charges of cultural imperialism and so forth. Terry Jones, of Monty Python fame, is surely the most famous Terry Jones and most likely target of any search in Wikipedia today. But if Fluidinfo were to take off and become the next Google, perhaps it would be a different Terry Jones who people would expect to find occupying the object with the about tag Terry Jones.

5. Ugliness and Humans vs. The Machines: wwwd-1T. It’s all very well to say that Wikipedia URLs are quite nice, simple and readable, but they are still designed for machines rather than human beings. After all, why Eiffel_Tower rather than Eiffel Tower? The answer is that best practice dictates that spaces in URLs usually be “%-encoded” as %20, so that, in a URL, the preferred (safe) form would be Eiffel%20Tower, which is clearly worse, not better, than Eiffel_Tower from most perspectives. The same goes for many other punctuation symbols. On the other hand, there is nothing to say that we need to use the URL, which is in any case largely automatically constructed from the page title, which does have actual spaces. So a variation of the proposal, which I actually prefer, is to use the Wikipedia page title, rather than the relative URL, as the about tag. Then we actually do put information on the Eiffel Tower on the FluidDB object with the about tag Eiffel Tower. We permit any unicode text in about tags anyway, so we might as well take advantage of that and use spaces as required. That way, when we look at the about tag, it is maximally readable, and says exactly what we want it to say; and we don’t have to read through percent encodings, underscores and all that other nasty computer stuff. I will call this alternative putative convention wwwd-1T.

6. Standardization. There remain some questions about the precise about tag to use, even if following either wwwd-1 or wwwd-1T, and this arises somewhat inevitably from the fact that ordinarily URLs are not taken to be case sensitive. [3] So while the fairly strong Wikipedia convention seems to be to use Title Case, in fact this is not always followed, and nothing breaks when it is not. A case in point concerns the San Andreas Fault. While I said, above, that wwwd-1 would give us San_Andreas_Fault, at the time of writing, a search on San Andreas Fault returns a URL ending with San_andreas_fault, though the page title is San Andreas Fault. FluidDB is case-sensitive, so we would need to decide. From a readability perspective, Title Case is clearly preferable; but title case is intrinsically ambiguous other than in the context of a fixed algorithm, and there will inevitably be failures as a result of getting the case wrong. A more pragmatic suggestion might be to use all lower case, as I suggested in the book-1 convention. I will call this third variant wwwd-1L for now.
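
The relationship between the three variants can be sketched in a few lines of Python. The function names are hypothetical, and two simplifying assumptions are baked in: that the page title can be recovered from the relative URL by undoing %-encoding and turning underscores into spaces, and that wwwd-1L is simply the lower-cased title. Neither is a settled part of any convention.

```python
from urllib.parse import quote, unquote


def wwwd1t_about(relative_url):
    """Approximate the page title (wwwd-1T) from a wwwd-1 relative URL.

    Assumption: titles differ from relative URLs only by %-encoding
    and by underscores standing in for spaces.
    """
    return unquote(relative_url).replace("_", " ")


def wwwd1l_about(relative_url):
    """Lower-case variant (wwwd-1L), which sidesteps title-case ambiguity."""
    return wwwd1t_about(relative_url).lower()


# What a %-encoded space looks like, for comparison with the underscore form.
print(quote("Eiffel Tower"))

print(wwwd1t_about("Eiffel_Tower"))

# Both capitalizations of the San Andreas Fault URL collapse to one about tag.
print(wwwd1l_about("San_Andreas_Fault") == wwwd1l_about("San_andreas_fault"))
```

The last line shows the attraction of the lower-case variant: the two spellings of the San Andreas Fault URL end up on the same object.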

Clearly, this is a complicated issue. To my mind there are significant merits in the idea, particularly in the wwwd-1T and wwwd-1L variants, but there are also significant problems. More than anything else, I think the convention is pretty good for human end users, but pretty difficult for machines (or to put it a different way, for application writers). If it were adopted, I think it would be best adopted as a default convention for the case when there is no other established/better convention, and we would be better to establish alternative conventions like book-1 for classes of objects that are numerous, mostly not included in Wikipedia and capable of being generated automatically from readily available information. In cases where we departed from the relevant wwwd convention, it would be particularly fantastic if we (collectively) added some kind of pointer from the object that would be used under that convention to the page we actually use, in cases where the Wikipedia page exists. For example, we would point from The_Road_to_Wigan_Pier to book: the road to wigan pier (george orwell). (Needless to say, we can look forward to a time when not only do we have a pointer from our FluidDB objects to the corresponding Wikipedia page (or pages), where such exists, but Wikipedia has pointers back to FluidDB. If only there were some way for us to add such pointers in Wikipedia . . .)
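
The point about machines is easiest to see in code. Under class-based conventions such as planet-1 and element-1, discussed under Non-uniformity above, the about tag is a pure function of the class and the name, so an uploader needs no lookup at all. The sketch below uses a hypothetical helper, class_about, which is not taken from the abouttag library.

```python
def class_about(kind, name):
    """Build a class-prefixed about tag in the style of planet-1/element-1."""
    return f"{kind}:{name}"


# The same name yields distinct, predictable objects for different classes.
for kind in ("planet", "element"):
    print(class_about(kind, "Mercury"))
```

Under wwwd-1 there is no equivalent function: whether the planet lives at Mercury or Mercury_(planet) can only be discovered by consulting Wikipedia.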

I’d be fascinated to gather views, in the comments or elsewhere.

[1]To be clear, the putative wwwd-1 convention is not a convention for tagging Wikipedia pages. To do that, you simply use the full page URL (preferably canonicalized using the url-1 convention). The proposed convention is simply based on the Wikipedia page naming convention (practice).
[2]When I say ‘the community decides to move’ I am not referring to any kind of formal procedure: there are no such procedures with FluidDB. I can put my information wherever I like, and that is unlikely to change. I really mean the conventions in use, whether agreed in some way or simply used by the bulk of FluidDB users.
[3]More particularly, Wikipedia does not interpret URLs in a case-sensitive manner.

The Two Kinds of Search: Locating Items vs. Locating Information

The advent of search engines has changed the world twice. What Google now rules, having toppled Yahoo and Alta Vista, Lycos and Excite, is web search. We all know it, we all use it, and unquestionably the web would be a nightmare without it.

The second time search changed the world was when it came to the personal data we keep on our own computers, or privately, online. Perhaps Apple was first, with Spotlight in OS X, which gave the ability to search essentially everything on your hard disk; it is fabulous. Soon after, Google offered its own Desktop Search, and Microsoft added similar capabilities to Windows, starting with Vista. (I might actually argue that Palm was first: a really striking feature of even the earliest Palm Pilots was integrated search across all your data—contacts, notes, to-do lists, calendar and more. Amazingly, I know of no smart phone that does it as well, even today. On the iPhone, for example, you can’t search on photo names; it’s infuriating.) Google, of course, also offers search across Gmail, and encourages people never to discard a message.

But the real distinction I want to make is not between these two search revolutions, but rather between searching for a particular item—one that you know exists and simply need to locate—and searching for information on a topic with no special knowledge of where, or in what form, that information exists.

These are quite different activities, and the former—locating a specific item—is generally harder. This is counter-intuitive. How can searching for something on your own hard disk possibly be harder than finding something on the countless billions of pages that form the web?

The answer, of course, is that if it is just information you want, there’s a good chance it exists in many forms, in different places. Subject to things like authority, credibility and verifiability, any source (or sources) will do; whereas by definition, when you are searching for a specific item, success requires finding precisely that item.

The problem with searching email

In my experience, email is the hardest information for most people to search successfully. Quite a lot of the time, full-text search barely helps, because it fails to narrow down the information enough, or unwittingly excludes the item you are searching for. Arguably, people are now worse off in this respect than before the advent of local data search, because we are often lulled into a false sense of security, believing that, contrary to our regular experience, we will be able to find it, and thus being freed of the burden to organize our information.

The first is a problem because by the time we come to search for email, we often have only a hazy idea of anything that search might be able to latch onto that really distinguishes that one key email. We might know who it’s from (Jo), but perhaps not which email address she used, or whether her first or full name is in the message anywhere. And if it’s someone you swap mails with a lot, this could still be hundreds or thousands of messages.

We will usually have some idea about the date. But even that might be fairly hazy (or in my case, plain wrong).

The hardest problem is that of choosing the other search terms. Most modern search is essentially literal in character—while it will perform stemming (making run and running equivalent; perhaps even ran) and handle simple spelling variants, it is deliberately not semantic; that is, if you search on ‘run’, the search makes no attempt to match ‘sprint’, for example. (There are search engines that are semantic, but mostly this approach is not favoured, and not found to be helpful for internet search.)

In the context of searching the web, this exactness doesn’t matter for reasons that are directly linked to the fact that you are not looking for a particular item: you just want information. You might by-pass sources of information that use a different word, but as long as some useful sources do use it, that isn’t really problematical. The link structure of the web also works to your advantage here, with one page leading to another.

None of this applies in email. If you search on ‘run’ but the email only talks about sprinting, the mail won’t match, and there is unlikely to be another linking to it. So standard search approaches exclude emails that you want to find because of their precision. At the same time, any email that does match your terms will be included. If any ranking of results is shown at all, it will probably be unhelpful, so you end up feeling as if you need to search within the search results. But you cannot, not for technical reasons but because you don’t know how to refine it further, and if you do, you’re almost as likely to exclude that which you seek as that which you do not.

The Genius of Tagging

One crucial difference between organizing with tags and organizing in folders is that you don’t have to choose a single location for something. If you tag regularly, you find that it imposes a very low overhead (much lower, I find, than choosing which folder to put a message into) and that you quickly develop a standard vocabulary of tags that you use without really thinking about it. A good bias to have is “when in doubt, add the tag”, i.e. if you think there’s any chance at all it might be useful, add it. Tags are cheap.

But the true genius of tagging is not the items that carry the tag; it’s the ones that don’t. This is a case of the dog that doesn’t bark in the night.

For by tagging a particular set of messages with “running” (say), I am also (implicitly) not tagging all the other messages—including the ones that include words like run—with running. That difference is critical. When you look for a set of items that you have tagged with a given word, you are looking at things for which that word (that tag) is important, rather than incidental. This is the genius of tagging. It not only identifies that which you tag as (potentially) relevant; it also implicitly marks everything else as probably not relevant to that tag.

When looking for a particular item, that can make the difference between success and failure.
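
A toy example makes the difference visible. The three messages and their tags below are invented purely for illustration, and the literal search is a deliberately crude whole-word match with no stemming; real mail search is more sophisticated, but it shares the same literal character.

```python
import re

# Invented sample data: message text plus the tags the owner chose to add.
messages = [
    {"id": 1, "text": "Training plan: sprint intervals on Tuesday", "tags": {"running"}},
    {"id": 2, "text": "Can you run the quarterly report again?", "tags": set()},
    {"id": 3, "text": "Entry form for the 10k run attached", "tags": {"running", "races"}},
]


def literal_search(msgs, term):
    """Return ids of messages whose text contains the term as a whole word."""
    term = term.lower()
    return [m["id"] for m in msgs if term in re.findall(r"\w+", m["text"].lower())]


def tag_search(msgs, tag):
    """Return ids of messages the owner explicitly tagged with the given tag."""
    return [m["id"] for m in msgs if tag in m["tags"]]


print(literal_search(messages, "run"))
print(tag_search(messages, "running"))
```

The literal search picks up message 2, where the word is incidental, and misses message 1, which never uses it; the tag search returns messages 1 and 3, which are precisely the ones marked as being about running.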

23 December 2010

A Translation of Yahoo!'s "What’s Next for Delicious?" Blog Post

A translation of the Yahoo! blog post on Delicious. [With apologies to John Gruber, who (as far as I know) invented this ‘translation’ format.]

What’s Next for Delicious?

Many of you have read the news stories about Delicious that began appearing yesterday. We’re genuinely sorry to have these stories appear with so little context for our loyal users.

[Shit. This wasn’t supposed to happen. This will affect how much we can get for Delicious.]

While we can’t answer each of your questions individually, we wanted to address what we can at this stage and we promise to keep you posted as future plans get finalized.

[We’re not going to risk another SNAFU like this.]

Is Delicious being shut down? And should I be worried about my data?

  • No, we are not shutting down Delicious. While we have determined that there is not a strategic fit at Yahoo!, we believe there is a ideal home for Delicious outside of the company where it can be resourced to the level where it can be competitive.
  • [Technically, it’s still running. True, we laid off the entire Delicious team, and everyone knows that once you get rid of the developers, a system is all but useless, but we’re going to highlight the technicality.]

[Yes, you should be extremely worried about your data.] [1]

What is Yahoo! going to do with Delicious?

  • We’re actively thinking about the future of Delicious and we believe there is a home outside the company that would make more sense for the service and our users.
  • [We’re sunsetting it with extreme prejudice.]
  • We’re in the process of exploring a variety of options and talking to companies right now. And we’ll share our plans with you as soon as we can.
  • [But not Joshua. Anyone but Joshua. Who does he think he is, carping from the sidelines as we kill his creation with that unique Yahoo! combination of neglect and active destruction? It’ll go to the highest bidder; or to the second highest bidder, if Joshua is the highest bidder.]

What if I want to get my bookmarks out of Delicious right away?

  • As noted above, there’s no reason to panic.
  • [There is every reason to panic. But we really don’t want you to dump your bookmarks out of Delicious right now. We desperately wanted to keep a lid on this so that the rats wouldn’t desert the sinking ship, thus compromising what anyone might pay for said ship. Please don’t go.]
  • We are maintaining Delicious and encourage you to keep using it.
  • [We’ll keep the power on in the hope that some of you idiots don’t notice and keep using it till we can sell it. You’d better keep using it too, or we’ll be out of pocket big-time when it comes to the sale.]
  • That said, we have export options if you so choose.
  • [Run! For crying out loud, if you have any sense, grab your bookmarks and run.]
  • Additionally, many services provide the ability to import Delicious links and tags.
  • [Actually, that’s spot on.]

We can only imagine how upsetting the news coverage over the past 24 hours has been to many of you

[“We’re just shutting down delicious, not selling your children to gypsies. Get the fuck over it.”@fakecarolbartz]

Speaking for our team, we were very disappointed by the way that this appeared in the press. We’ll let you know more as things develop.

[Speaking for Yahoo! (not the Delicious team, obviously; they have their pink slips): It’s so not fair. Yahoo! used to have the mojo. Now people treat us like we don’t even get the internet. As if. Just because Carol doesn’t have a flickr account doesn’t mean she still uses 35mm film and a fountain pen, you know. Yahoo!s are people too, and it really hurts when people like Thomas Hawk come up with crap like this:

Do you even realize what you have with Flickr? It’s the largest well organized library of images in the world. Not only that, it has a very strong social networking component. In fact, Flickr may represent (if managed correctly) your single biggest opportunity to launch a much larger and more lucrative social network (and stock photography agency as well). Have you spent any time in any Flickr groups? They are addicting. People live in them. They play games in them. All kinds of activity goes on in them every day. And if you took the time to really explore the social side of Flickr, you’d learn this, and figure out a way to grow it. (Quoted by Charles Arthur at Guardian Technology)

Tom Hawk is full of shit. Flickr’s next, and you can be certain there’ll be no leaks this time. I’m sure Ballmer will give us a billion for it. Well, half a billion anyway. Hell, he could do that out of his own petty cash. Ballmer would be perfect for Delicious. He’d probably bring it up to date using Silverlight and ActiveX and make your bookmarks dance like Clippy.]

[1]But in all seriousness, no one who takes any care should lose anything. First, having made this public commitment, Yahoo! would probably face an even bigger backlash if it deleted the data now. Secondly, Delicious has always had some of the best export options around, and just about every other bookmarking site on the web will import Delicious’s exports. Just go here and save the resulting XML file and you’ll be safe. Better still, import the result to Pinboard or another site of your choice.
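
Once you have the export saved, it is worth checking that you can actually read it. The sketch below assumes the file is the XML form of export: a posts element containing post elements with href, description and space-separated tag attributes. Those names are an assumption on my part rather than something stated in this post, so check them against your own file; the sample data is invented.

```python
import xml.etree.ElementTree as ET

# Invented sample in the ASSUMED export layout (element and attribute
# names should be verified against a real export before relying on them).
SAMPLE = """<posts user="example">
  <post href="http://fluidinfo.com/" description="Fluidinfo"
        tag="fluiddb database" time="2010-12-21T10:00:00Z"/>
  <post href="http://pinboard.in/" description="Pinboard"
        tag="bookmarking" time="2010-12-21T10:05:00Z"/>
</posts>"""


def parse_export(xml_text):
    """Turn an exported bookmarks document into a list of plain dicts."""
    root = ET.fromstring(xml_text)
    bookmarks = []
    for post in root.iter("post"):
        bookmarks.append({
            "href": post.get("href"),
            "description": post.get("description", ""),
            # Tags are assumed to be space-separated in a single attribute.
            "tags": post.get("tag", "").split(),
        })
    return bookmarks


for bookmark in parse_export(SAMPLE):
    print(bookmark["href"], bookmark["tags"])
```

To run it against a real export, read the saved file’s contents and pass them to parse_export in place of SAMPLE.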

21 December 2010

Del.icio.us Exporting And Alternatives: An Update

A few days ago, I blogged about some ways to get data out of del.icio.us and into FluidDB, and also about the fact that I was working on a kind-of old-style del.icio.us clone.

Things have moved on a little since then, so I thought I’d update.

First, although bad, the situation doesn’t look as dire as it did. By all accounts, the del.icio.us staff are gone, but Yahoo has made a very public statement that our bookmarks are safe for the time being, that the service will continue to operate, and that its intention is to sell or otherwise migrate del.icio.us somewhere else, rather than simply to stop it. Given that del.icio.us has always had excellent export options, supported by (as far as I know) all of its competitors, there is certainly no reason why anyone aware of the situation should lose any significant amount of data.

Another way in which the situation has moved on, for me, is that I’ve discovered and signed up for Pinboard and started using that. Pinboard is the first alternative to del.icio.us that has felt like its developer was on the same wavelength as Joshua Schachter (who created del.icio.us). So far, I’m impressed with it. Although I don’t particularly like the aesthetic, I do like the minimalism. Functionally it looks strong and technically it appears credible. Despite some heavy breathing, it appears to have stood up well to a deluge of sign-ups and imports, and clearly has an energy and momentum in a useful direction; something that hasn’t been true of del.icio.us for far too long. It also has interesting and potentially useful extra features both in production and on its (commendably public) roadmap. I definitely wish Maciej Ceglowski and Peter Gadjokov, who run the site, all the best and hope that the Pinboard site has a great future. Right now, it looks to me like the best alternative to del.icio.us on the net, and a better medium-term bet than del.icio.us itself.

None of this is to suggest that I don’t still think it’s an excellent idea for people to import their bookmarks into FluidDB, as discussed in previous posts; FluidDB is a completely different kind of system, allowing things it is most unlikely Pinboard will ever even wish to support. But to be clear, FluidDB alone is not a del.icio.us competitor, and could only be so if a client were developed. In fact, FluidDB’s ideal role is as a secondary store for bookmarks from any and all bookmarking sites, so if you use something else (or even just store bookmarks in your browser), that’s something to consider.

Finally, what of my plans? Well, I still fully intend to release a site that will include old-style del.icio.us functionality (not unlike Pinboard’s) with some new twists (mostly rather unlike Pinboard’s twists). The main focus of that site, however, will be content creation, with bookmarking as an organizational paradigm that can be extended to existing web resources, rather than a site developed with the primary goal of supplanting del.icio.us (or Pinboard, for that matter). Inevitably, even if things go supremely well, there will be a bedding down period, and it will be a while before I recommend anyone to use it as a primary repository. But now that the urgency has gone, I’ve decided to wait at least a month before opening it up in any form. I have all my bookmarks in there (as well as in del.icio.us, Pinboard and FluidDB!), and am really enjoying being able to tune it to operate in the way I think it really should. I remain hopeful that some time fairly early in 2011, some others will come in and give it a try. But for now, my advice is:

  1. Export your bookmarks from del.icio.us regularly; (this is good practice regardless; I extract mine several times a week);
  2. Try Pinboard if you haven’t given it a spin and want something in the mould of old-style delicious; (there’s a modest sign-up fee, currently $8.88, and rising in direct proportion to the number of users.)
  3. If you’re so inclined, get a FluidDB account and import your bookmarks in there too, for the powerful query and extended tagging capabilities, the rich permissions and the cool (and useful) visualization that this brings.
  4. Watch this space. I don’t know when, but some time fairly early in 2011 I hope to offer something else for people to try; I have high hopes for it.

20 December 2010

A Conversation with Eliza about Carol Bartz and Delicious

[Eliza is the built-in psychotherapist in Emacs, my editor of choice. I thought I’d see what Eliza thought about the delicious situation.]

I am the psychotherapist. Please, describe your problems. Each time you are finished talking, type RET twice.

I think Carol Bartz hates delicious

Why do you say bartz hates delicious?

She plans to kill it

This is not a healthy way of thinking.

It certainly isn’t

What makes you believe it certainly is not?

Well, delicious is wonderful.

Earlier you said bartz hates delicious?

She laid off all its staff, apparently

Maybe your plans have something to do with this.

I don’t think so.

Is it because of your life that you say you do not think so?

I don’t think Carol Bartz knows my plans.

You mentioned that bartz hates delicious?

Yes. And apparently she doesn’t have a flickr account either.

Is it because of the people you hang around with that you are going through all this?

No

Why not?

delicious is an online service

Are you sure that delicious is an online service?

Yes

I see... Well, do you really think this is so?

That it’s an online service, or that Carol Bartz doesn’t get the internet and plans to kill delicious?

Is it because it’s an online service or that carol bartz does not get the internet and plans to kill delicious that you came to me?

Because Carol Bartz doesn’t get the internet and plans to kill delicious

Does it bother you that carol bartz does not get the internet and plans to kill delicious?

Yes

Yes, hmmm... so, what makes you think this is so?

Yahoo bought two of the most important properties on the internet and Carol doesn’t understand them.

Why do you say bartz hates delicious?

She sacked all the staff and plans to get rid of it

18 December 2010

Del.icio.us and I: Reflections and Alternative Alternatives

I’ve been captivated by del.icio.us from the moment I found the site in early 2004. I don’t remember how I found it, but it captivated me immediately. It was so minimal it wasn’t even clear what the site was for, but as soon as I figured it out, I was hooked. I think del.icio.us was (and remains) far more important and innovative than is generally recognized.

The Social Site for Solipsists

An aspect of del.icio.us that is rarely discussed is that while it is the grandfather of all social sites, unlike most others, del.icio.us is ridiculously useful even if you are its only user. When I save a bookmark on del.icio.us, and tag it, I do so entirely selfishly. I get two intense benefits from storing bookmarks on del.icio.us, even if no one else uses it. The first is that I can access my bookmarks equally easily from different browsers and different machines. The second is that I can organize them using tags, which are dramatically more useful than folders.

Hierarchical Storage vs. Tags

The trouble with hierarchical folders is that, in practice, they force me to choose a single place to put something. This was a serious problem for email, and remains so, to a lesser extent, for files. Since I have currently about 2,000 bookmarks, it’s also a problem for them.

The problem is perhaps clearest with email. All of the people I know who are good at finding old emails, without exception, chose, fairly early in their lives, a single organizational paradigm. I know some people who store email strictly by date. I know others who store it strictly by sender. And I know still others who store it by subject (though they are generally less successful at retrieval than the people who use one of the first two methods). I never decided, and I have always struggled to find old emails. It has always felt to me as if I need to put emails in multiple places, to reflect the fact that I will probably only half remember one detail when I’m searching for an email, and it’s very hard to predict what that thing will be.

This is the problem that del.icio.us solved for bookmarks with tags. By allowing me to attach as many relevant tags as I like to a bookmark, I almost never have any trouble finding it. Whereas I find it very difficult to anticipate the single category I will need to retrieve it in the future, it is remarkably easy to attach a handful of tags that will almost certainly mean that when I come back to look for it, I will find my bookmark quickly. As a result, since I started using del.icio.us, I have almost never struggled to find a website I’ve saved and tagged. It is remarkable, and it works for emails (and could work for files) as well.

(Search isn’t as good.)

(Many people claim that the advent of full text search has eliminated the need for organization. I disagree. While there is no question that Spotlight, on the Mac, full-text search in gmail, and equivalent solutions elsewhere, have been enormously positive, I find that I still struggle to find email, particularly, because I tend to get thousands of results when I search. The brilliance of tags is that not only can I identify bookmarks (or emails) of interest by using a tag; I also exclude all the items to which I didn’t attach that tag. This turns out to be almost more important.)
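To make the point about exclusion concrete, here is a minimal Python sketch of retrieval by tag intersection. The bookmarks, URLs and tag names are all invented for illustration, and this is not how del.icio.us itself is implemented; it just shows that every extra tag both selects and excludes.

```python
# Hypothetical bookmarks: URL -> set of tags (all invented for illustration).
bookmarks = {
    "http://example.com/origami-cranes": {"origami", "paper", "howto"},
    "http://example.com/python-tips": {"python", "howto"},
    "http://example.com/paper-planes": {"paper", "howto", "alex"},
}


def find(bookmarks, *tags):
    """Return, sorted, the URLs that carry every one of the given tags."""
    wanted = set(tags)
    # wanted <= have is the subset test: all wanted tags must be present.
    return sorted(url for url, have in bookmarks.items() if wanted <= have)


print(find(bookmarks, "howto"))           # one tag: a broad match
print(find(bookmarks, "howto", "paper"))  # two tags: a narrower match
```

Adding the second tag drops every bookmark that lacks it, which is the part full-text search tends not to give you.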

The Social Solipsist

The cross-browser/cross-machine accessibility of bookmarks and the organizational power of the tag are the two most important benefits that del.icio.us brings for me, but that is not to suggest that the social aspect is unimportant. It is also remarkable.

With del.icio.us, of course, bookmarks are public by default. (In fact, I’m pretty sure there were no private bookmarks initially.) Anyone can go to del.icio.us/njr and see all but a handful of my bookmarks. And anyone interested to see the bookmarks I have tagged with Fluidinfo, need only visit del.icio.us/njr/fluidinfo to see them. (Notice the zen-like, RESTful, perfect URLs.)

But there’s more. To see what anyone has tagged with fluidinfo, I can go to del.icio.us/tag/fluidinfo. And here something truly remarkable happens.

Despite the complete lack of any organizing principle or oversight, there is rich structure in the tags. The tagsonomy, or folksonomy, as it is called, simply emerges, and is useful. When Google fails me, I often go to del.icio.us, and look up what I want using a few tags. The results are often better than Google’s, because everything in del.icio.us tagged with a given word has been chosen by someone, in the vast majority of cases, for his or her own selfish (or at least, non-altruistic) reasons. There is no voting on del.icio.us. It is not like reddit, or digg; there is only saving and tagging. And though you might think that this would lead to chaos, it doesn’t. It is true both that words are ambiguous, and that there are many words with similar meanings. But this hardly seems to matter. If you look at a tag with multiple meanings, you may find bookmarks for sites covering each meaning, but that isn’t a big problem, and you can search on tag intersections anyway. It’s also true that the first tag you search on might not be the one most people use. That also turns out to be largely unproblematical. The tagsonomy that emerges from millions of selfish actions is surprisingly clean, regular, and useful. It is almost mirac.ulo.us.

for:alex with love

Before the site even supported for: tags, I started tagging sites that I thought would be of interest to my son, Alex, with an alex tag. (Did this mess up the tagsonomy? Not obviously.) And he would periodically go to del.icio.us/njr/alex and find the sites I’d saved for him. I save origami sites for my mother, who folds, at del.icio.us/njr/origami. It works.

But del.icio.us then made this even better by introducing for: tags. I can now actually send bookmarks to Alex simply by using a for:alexradcliffe tag. When he goes to the site, he sees them. It’s mar.vello.us.

Love at First Site

Since adopting del.icio.us, I have used it more-or-less daily and have found it so spectacularly useful that I have built a number of aspects of my digital life around it. One of these is that I have a dense home page, for all of my browsers, that is built by extracting everything I’ve tagged with home and structuring them into a dense page that has all my most important sites. I have over a hundred links on this single dense page, and it serves most of my common internet needs, both on computers and (reformatted) on my phone. (Read this to see it, and get the code by following the instructions here, if you’d like your own.)

Christmas Carol

Joshua Schachter, the banker-turned-internet-entrepreneur who built del.icio.us to solve his own need to organize and share bookmarks, sold delicious to Yahoo a few years ago. He stayed a while but quit when it was clear that Yahoo didn’t get delicious. It was apparently Joshua we have to thank for the tags in flickr as well, for (as I understand it) he spoke to Caterina and suggested that flickr needed tags. Flickr, of course, is also owned by Yahoo, and, from my perspective is the only other part of Yahoo that deserves any kind of future. But John Gruber at Daring Fireball reports that Carol Bartz, who unfortunately doesn’t get the internet, fired the entire del.icio.us team a couple of days ago as part of her plan to dump del.icio.us. (I think almost every change that team has made to del.icio.us since Joshua left has been retrograde; but I’m still not celebrating.) Charles Arthur, who does get the internet, nailed the Yahoo fiasco in the Guardian’s Technology Blog yesterday:

The trouble with all this? It’s on the internet, so Carol Bartz isn’t going to see it. If only there were some way to make it physical so she could read it . . .

Maybe she’ll dump flickr next; Charles Arthur reports that she doesn’t even have a flickr account.

Enter Terry Jones (@terrycojones; not the Python)

For much of the eighties and nineties I did research in the somewhat obscure and (then) emerging field of genetic algorithms. At conferences, I tended to spend time with Terry Jones, who worked directly with John Holland, the MacArthur genius who founded the field. Terry and I both went on to other things and we lost touch. But he was interesting, and I looked him up on the internet one day. He had a very eclectic home page that, among other things, included a set of papers he had written about computer storage mechanisms. He tried and failed numerous times to get these published. I read them and was captivated by the brilliance and beauty of the ideas in them.

In the papers, Terry discussed search and an embryonic form of tagging as the two core organizing principles that he thought should be the basis for computation storage and retrieval. This was before del.icio.us, and before search assumed the prominence that it now enjoys. Terry’s tags (which he then called attributes) were more complex than del.icio.us tags, in that they carried values. So while in most tagging systems you can attach tags as labels to objects, in Terry’s mind you should be able to attach any information to anything using a tag. So at the simplest level, you could attach a rating to something (I rate Fugitive Pieces, by Anne Michaels, 10). Or you could go further and attach an image or a webpage or anything at all, to anything else. It was extremely innovative, and the fact that he couldn’t get them published says a lot more about peer review than it does about the papers. (The papers are available here, here and here; the last was eventually published [1].)

Terry tried twice to build versions of his idea, but struggled and basically failed. I got in touch with him after reading his papers, and enthused, and I think he said I was essentially the first person who had ever liked his work in this area. He sounded quite depressed.

A bit later, another friend of his, Russell Manley (@rustlem), told him about del.icio.us, and this was the spur that made him try a third time to build his vision. This time, he sold his flat, created a company (Fluidinfo, in which I have invested and to which I am an advisor) and went for it. The result is Fluidinfo Inc, and its main product, FluidDB.

FluidDB

I think of FluidDB as like del.icio.us on steroids (though Terry doesn’t like that description of it). Seen through my permanent lens of del.icio.us, you can get to FluidDB through a series of generalizations of a social bookmarking site.

First, instead of just URLs, in FluidDB you can tag anything. FluidDB contains objects, and the objects can represent anything at all. They are identified by a special tag (the about tag, fluiddb/about) that can be used to identify the object. So I have bookmarks for websites in FluidDB, which are stored on objects whose about is the URL to which they refer. For example, I have a bookmark for an entry in this blog describing how to tag books from the Guardian’s 1000 books everyone must read in FluidDB. In a modern, standards-compliant browser (essentially anything except Internet Explorer), you can see an image (generated live from FluidDB) showing my tags on that object by clicking this link. (FluidDB is completely compatible with Internet Explorer, but my graphical image generator for FluidDB is not.) Here’s a static snapshot of the same thing.

blogpost.png

The next thing you add when transforming a social bookmarking site into FluidDB is the ability (but no requirement) for tags to have values. For example, this image shows the FluidDB object corresponding to Mars, and an application called Miró has added a bunch of tags to it, with information about Mars. (If you had a FluidDB account, which you could, you could add your own information about Mars to the same object.) Again, here’s a static snapshot of the object.

Mars.png

The third thing you have to add to get FluidDB is a fine-grained permissions system. In del.icio.us, almost everything is shared, though there is the ability to mark a bookmark as private (which means that only you can see it.)

In FluidDB, every tag has its own permissions, with separate controls for reading and writing and an access-control list. For each of your tags, you can choose who can read them and who can write them, either including or excluding people, or making them completely private or completely public. It’s very powerful (see Permissions Worth Getting Excited About and The Permissions Sketch for more details).
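One way to picture "including or excluding people" is as a policy plus a list of exceptions for each action on each tag. The Python sketch below is a simplified model for illustration only, not FluidDB's implementation; the class name, field names and the example users are all invented.

```python
from dataclasses import dataclass, field


@dataclass
class Permission:
    """Who may perform one action (say, read or write) on one tag.

    policy "open":   everyone is allowed except the users in `exceptions`.
    policy "closed": no one is allowed except the users in `exceptions`.
    """
    policy: str = "open"
    exceptions: set = field(default_factory=set)

    def allows(self, user):
        listed = user in self.exceptions
        return (not listed) if self.policy == "open" else listed


# Hypothetical settings for a tag called njr/rating:
read = Permission("open", {"spammer"})   # public, with one user excluded
write = Permission("closed", {"njr"})    # only the owner may write

print(read.allows("terrycojones"), write.allows("terrycojones"))
```

A completely private tag is then a closed policy whose only exception is the owner, and a completely public one is an open policy with no exceptions.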

The fourth thing you add to produce FluidDB is a simple but rich query language. For example, you can find all the planets heavier than Earth with the FluidDB query

miro/planets/Mass > 1.0

There are lots of tools around that let you issue queries against FluidDB (though it doesn’t really have its own interface yet). I have a command line tool that talks to FluidDB, and the command

fdb show -q 'miro/planets/Mass > 1.0' /miro/planets/Name

results in this output.

4 objects matched
Object e06bea33-a000-4294-a7b2-d3245f1481ca:
  /miro/planets/Name = "Saturn"
Object e9b022e6-c770-44ad-abaa-1a2cde9a3224:
  /miro/planets/Name = "Uranus"
Object 2994f561-8efe-4e13-9374-bf3f9436eac6:
  /miro/planets/Name = "Jupiter"
Object 72144788-a59e-4819-a9c9-6b8577e2695b:
  /miro/planets/Name = "Neptune"

You can see them in a modern browser by following the hyperlinks on the miro/planets/db-next-record-about tag on the live version of the image. You can also use a more point-and-click tool like @paparent’s FluidDB Explorer by visiting http://explorer.fluidinfo.com/fluiddb/ and typing miro/planets/Mass > 1.0 into the query box at the top right. (Don’t omit the .0; FluidDB is distressingly strict at the moment, though I am promised it will change.)
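For readers who think more easily in code, here is a small in-memory Python sketch of what a query like miro/planets/Mass > 1.0 is asking for. It does not talk to FluidDB at all: the objects are plain dictionaries, and the masses (in Earth masses) are approximate values typed in by hand, which may differ from what miro actually stores.

```python
# Approximate planetary masses in Earth masses (typed in by hand for
# illustration; not read from FluidDB).
objects = [
    {"miro/planets/Name": "Mercury", "miro/planets/Mass": 0.055},
    {"miro/planets/Name": "Venus", "miro/planets/Mass": 0.815},
    {"miro/planets/Name": "Earth", "miro/planets/Mass": 1.0},
    {"miro/planets/Name": "Mars", "miro/planets/Mass": 0.107},
    {"miro/planets/Name": "Jupiter", "miro/planets/Mass": 317.8},
    {"miro/planets/Name": "Saturn", "miro/planets/Mass": 95.2},
    {"miro/planets/Name": "Uranus", "miro/planets/Mass": 14.5},
    {"miro/planets/Name": "Neptune", "miro/planets/Mass": 17.1},
]


def query_greater(objects, tag, threshold):
    """Objects that carry `tag` with a value strictly above `threshold`."""
    return [o for o in objects if tag in o and o[tag] > threshold]


heavier = query_greater(objects, "miro/planets/Mass", 1.0)
names = sorted(o["miro/planets/Name"] for o in heavier)
print(len(heavier), "objects matched:", names)
```

The second step, pulling out miro/planets/Name for each match, mirrors the tag named at the end of the fdb show command.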

In a bookmarking context, this allows you to do queries like Show me all the pages that Terry and Russell have tagged with the tag fluiddb that I haven’t. This is considerably more flexible than del.icio.us or other bookmarking tools.
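In set terms, that query is an intersection followed by a difference. The sketch below uses made-up data, and the tag paths terrycojones/fluiddb, rustlem/fluiddb and njr/fluiddb are assumptions about what each person's tag might be called. In the query syntax used elsewhere in this post it would presumably read something like has terrycojones/fluiddb and has rustlem/fluiddb except has njr/fluiddb.

```python
# Hypothetical data: for each tag path, the set of pages (about values)
# that carry it. All URLs and tag paths are invented for illustration.
tagged = {
    "terrycojones/fluiddb": {"http://a.example/", "http://b.example/", "http://c.example/"},
    "rustlem/fluiddb": {"http://b.example/", "http://c.example/", "http://d.example/"},
    "njr/fluiddb": {"http://c.example/"},
}

# Pages both Terry and Russell have tagged (intersection) ...
both = tagged["terrycojones/fluiddb"] & tagged["rustlem/fluiddb"]
# ... minus the ones I have already tagged (difference).
result = both - tagged["njr/fluiddb"]

print(sorted(result))
```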

The last major thing you add to a social bookmarking site to get FluidDB is an API to allow applications to talk to it. Of course, del.icio.us has a (rather good) simple API, but you can’t do very much with it because part of del.icio.us’s excellence is that it can’t actually do very much. By definition, anything that you can do in FluidDB you can do through the API because the API is the only supported way to access FluidDB at all (though, as I say, there are lots of libraries and applications built on FluidDB that use the API). Technically, the API is a RESTful, pure HTTP API that uses JSON when necessary for exchanging data. It is documented here.
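As a rough illustration of what "pure HTTP" means in practice, this Python sketch builds the URL for a query request without sending it. The base URL and the /objects endpoint with a query parameter are assumptions that should be checked against the API documentation; the helper name is invented.

```python
from urllib.parse import urlencode

# Assumed base URL of the main FluidDB instance; check the API documentation.
BASE_URL = "http://fluiddb.fluidinfo.com"


def objects_query_url(query, base_url=BASE_URL):
    """Build the URL for a GET request asking which objects match `query`.

    Assumes an /objects endpoint that takes the query as a URL parameter.
    """
    params = urlencode({"query": query})  # percent-encodes '/', '>' and spaces
    return base_url + "/objects?" + params


url = objects_query_url("miro/planets/Mass > 1.0")
print(url)
```

The only point being made is that a query is an ordinary GET on an ordinary URL, so any HTTP client in any language can issue it.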

FluidDB as a new, more powerful alternative to del.icio.us

FluidDB has the potential to be a very interesting alternative to del.icio.us. Or perhaps a more accurate statement would be, FluidDB should be considered as a very serious and flexible place to rehouse data currently in del.icio.us. The power and flexibility of its information architecture can allow users to store more kinds of information, about more things, and to query and recombine that information in more flexible ways. (In fact, Joshua Schachter is an investor in Fluidinfo, though I don’t know whether he would endorse anything I’m saying here.)

Today, however, there are some important limitations worth noting.

The main limitation is that there is no application like del.icio.us for FluidDB. There are (at least) two ways to import data from del.icio.us to FluidDB, preserving everything in the export, but until some work is done building a del.icio.us-like client application, it will be awkward to use the data and to add new bookmarks. I’m confident that over the coming weeks and months, applications will be built that will provide basic social bookmarking using FluidDB, but until that time, FluidDB is only really a suitable alternative for technical users.

There are two other minor limitations today. The first is that although FluidDB has a much more powerful permissions system than del.icio.us, its design (for rather fundamental and deliberate reasons) does not make it very easy to support private bookmarks in the ‘natural’ way. To be clear, it is entirely possible to have completely private bookmarks in FluidDB: but you need to organize your data in a slightly more complex structure to achieve this and native FluidDB queries on private data will have to look slightly different from native queries on public bookmarks. Such complexity can easily be hidden from the user by an application, and again, I suspect that applications that do this will appear. But they don’t exist yet.

If you do want to import information from del.icio.us to FluidDB, there are (at least) two published ways to do so.

Some months ago, I published some python code to github that takes a very direct approach, creating a FluidDB tag for each of your del.icio.us tags and attaching them to objects whose fluiddb/about tag is the URL for the bookmark. At the moment, that script doesn’t upload any information about private bookmarks to FluidDB, but it could obviously do so, and I imagine I’ll add that capability some time over the next few weeks when I decide what I think the best way to do it is. I think this approach mirrors del.icio.us most directly and naturally and is a good choice if you want to use FluidDB primarily for social bookmarking, perhaps expanding to take in tags with values. It requires you to install two python packages, both available on github. You can find information in this blog entry.

But there's a simpler and better alternative from Nicholas Tollervey (@ntoll), who works at Fluidinfo. He has written a single script that just asks you a few questions before it does the upload. It also uploads all the information, rather than just the tags, as mine does.

I imagine we’ll simplify both approaches over the coming weeks.

This diagram shows the object for a webpage Nicholas and I have both bookmarked

shared-bookmark.png

I found this bookmark by using the FluidDB query

has njr/fluidinfo and has ntoll/delicious/tags/fluidinfo

I had a single del.icio.us tag for this, whereas Nicholas had several. Nicholas has chosen to prefix all his delicious tag names with delicious/tags, which is why they are so long, but by default both his and my script put all the main data in the top level, so that a del.icio.us tag njr/fluidinfo becomes a FluidDB tag njr/fluidinfo, and the title and notes attributes are stored using FluidDB tags title and notes respectively. Like my script, Nicholas's currently doesn't tag anything in the case of private bookmarks, but his script does create all FluidDB tags that you use, even if some of them are only used for private bookmarks. So if the existence of a particular tag you have is secret, that would be a reason not to use his script. Nicholas's code is available from github and also (perhaps more easily) from the python package index PyPI, at this location. This allows you to install it with setuptools or easy_install etc. I imagine he'll blog about it soon.
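The mapping just described can be sketched in a few lines of Python. This is neither my script nor Nicholas's, just an illustration under stated assumptions: that the del.icio.us export is XML with one post element per bookmark, carrying href, description, extended and tag attributes, with shared="no" marking private bookmarks. The sample data and the function name are invented, and nothing is uploaded anywhere.

```python
import xml.etree.ElementTree as ET

# Invented sample in the assumed del.icio.us export layout.
SAMPLE = """<posts user="njr">
  <post href="http://fluidinfo.com/" description="Fluidinfo"
        extended="Makers of FluidDB" tag="fluidinfo fluiddb"/>
  <post href="http://example.com/secret" description="Private"
        extended="" tag="secret" shared="no"/>
</posts>"""


def to_fluiddb_tags(xml_text, user):
    """Map each public bookmark to {about URL: {tag path: value}}."""
    result = {}
    for post in ET.fromstring(xml_text).iter("post"):
        if post.get("shared") == "no":
            continue  # private bookmark: nothing is tagged
        # Each del.icio.us tag becomes a valueless tag in the user's namespace.
        tags = {f"{user}/{name}": None for name in post.get("tag", "").split()}
        # The bookmark's title and notes become tags with values.
        tags[f"{user}/title"] = post.get("description", "")
        tags[f"{user}/notes"] = post.get("extended", "")
        result[post.get("href")] = tags
    return result


mapped = to_fluiddb_tags(SAMPLE, "njr")
print(mapped)
```

Note that a del.icio.us tag literally called title or notes would collide with the two value-carrying tags in this sketch, which is one argument for a prefix such as the delicious/tags one Nicholas uses.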

An “old del.icio.us” alternative

As will be clear by now, del.icio.us has been pretty influential in my life. A few weeks ago I had an idea for a web site not initially very similar to del.icio.us, which I have been developing slowly. I don’t want to go into details now, but it’s in the general area of checklists, allowing users to find, create, share and use checklists of various kinds. It’s a social application on exactly the del.icio.us model (by which I mean, it has no explicit ratings and is useful even if no one else uses it). More generally, you can think of it as a kind of del.icio.us for user-created and re-mixed content, specifically around checklists for now.

A week ago today I realised that since I was effectively building almost everything you need for del.icio.us, I could easily extend my new website to include an actual del.icio.us alternative, i.e. I could allow users to save bookmarks for websites as well as for content they create in the application itself. I’m generally nervous about extending small ideas and making them more complex, but this seemed like a very minor extension, and I have become increasingly nervous about the future of del.icio.us ever since Yahoo acquired it. News of Yahoo’s intention to divest itself of del.icio.us prompted me to stop wondering and start implementing on Thursday night. I hope to make a limited service available in the next few days. (I’ll update this post and create a new one announcing it when I do.)

Its functionality will be limited at first, but will include (does include, in fact) the ability to import all bookmarks and tags from del.icio.us (private and public) and maintain their state, and all the most basic functionality (creating, editing, deleting bookmarks). There will also be limited social functionality (looking at tags across users etc.) and, of course, the ability to export your data in the same XML format as the one del.icio.us uses.

Over time, I’ll try to make it ever more like the pre-Yahoo del.icio.us, as far as I can remember that. And I think it’s very likely that I will also offer users the option of duplicating bookmarks in FluidDB, to allow for richer sharing, and to provide exactly the del.icio.us-like FluidDB client that I would love myself.

Of course, I realise there are a dozen or more del.icio.us alternatives already up and running, and they are clearly a more stable, safer bet. But I hope that at least a few people will think this approach is interesting enough to try. I will probably add an option to hide all the non-bookmarking functionality from my site, though I don’t think it will really distract much anyway. (I’ll probably turn it all off until it’s ready, next year, for now anyway.)

Carol Bartz may not be bringing much Christmas cheer at Yahoo, or to del.icio.us users, but I hope that my embryonic del.icio.us replacement and FluidDB can be part of an ecosystem of alternatives that will end up being more empowering for users and will allow hard-core del.icio.us fans to have a future more like the del.icio.us of old.

Maybe, it will be glor.io.us.

[1]New Approaches to Information Management: Attribute-Centric Data Systems, R. Baeza-Yates, T. Jones, and G. Rawlins. Seventh International Symposium on String Processing and Information Retrieval. SPIRE 2000 pp. 17-27.

06 October 2010

Ratings and FluidDB

As Terry (@terrycojones) will endlessly tell you, there aren’t really any rules in FluidDB. Yet this blog is largely dedicated to trying to establish and promote useful (voluntary) conventions that I believe will make FluidDB more useful for everyone. Today, I want to talk about ratings and suggest some conventions for those.

Why Conventions Matter

You may have heard that NASA lost its $125m Mars Climate Orbiter spacecraft as a result of using imperial units in some places and metric units in other places. If not, read all about it in New Scientist (Schoolkid blunder brought down Mars probe). While conversion between systems is usually possible, life is much simpler, and less error-prone, if everyone uses the same system.

Why Good Conventions Matter

It’s sometimes argued that as long as there is a convention, it doesn’t really matter what it is. I don’t really agree with that. Even though I personally think in imperial units (feet, inches, pounds, chains, furlongs, perches, tons, pints, acres, Fahrenheit etc.), and maintain they are excellent human-scale units for everyday measurement, I would never dream of calculating with them. Clearly, the SI system (metric units) is far superior for calculation. This rather wonderful video of American Chopper mechanics calculating in inches illustrates the problem. (It really is worth watching, and is only two-and-a-half minutes.)

http://www.wimp.com/metricsystem/

Summary Recommendations

I’ll explain them below, but in descending order of importance, my suggestions are as follows:

  1. If you rate things in FluidDB, do it with a tag called rating.

  2. Make your ratings numeric, with 0 as the worst rating and 10 as the best rating. (I’m sure most people will use integers; in principle, floating point values should be just fine too, but apparently FluidDB’s query language is broken with respect to type coercion in inequalities for now, so it’s probably best to stick with integers.)

  3. Declare your tag as a numeric rating from 0 to 10 by tagging the object for your tag with a top-level tag in your namespace called convention with the value fluiddb:rating:0-10.

    If you use the fdb utility, and your FluidDB username is njr (which it isn’t) you could do this as follows:

    fdb tag -a "Object for the attribute njr/rating" convention="fluiddb:rating:0-10"

    This works because the object for the rating tag for user username (always) has the about tag

    Object for the attribute username/rating

    and the fdb tag ... command above will create a tag called convention in your namespace (assuming you use your credentials with fdb, and you don’t already have such a tag) and then tag the relevant FluidDB object with it.

    If you don’t use fdb you should be able to do this with any other FluidDB library or tool. You may first need to find the ID of the object by using the FluidDB query

    fluiddb/about = "Object for the attribute username/rating"
    

    (replacing username with your username), and then using that ID to specify the object you want to tag.
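If you are scripting this, the two strings involved are easy to build. The helper names below are invented for illustration; only the string formats come from the text above.

```python
def tag_object_about(username, tag):
    """The about value of the object that represents a user's tag."""
    return f"Object for the attribute {username}/{tag}"


def about_query(about):
    """A FluidDB query that finds the object with the given about value."""
    return f'fluiddb/about = "{about}"'


about = tag_object_about("njr", "rating")
print(about)
print(about_query(about))
```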

Why These Three Suggestions?

1. Why rating?

One of the core motivating ideas for FluidDB is that different users should be able to share information by placing tags on objects representing things that they both know something about, or have an interest in. Many of the posts here on the AboutTag blog are concerned with conventions for choosing which FluidDB object to use for a particular entity. For example, the suggested object for Lewis Carroll’s book “Alice’s Adventures in Wonderland” is the one with the about tag

book:alices adventures in wonderland (lewis Carroll)

which has ID

03c8ce35-aa5e-4b58-b3ab-ddda55642b15

(There’s a list of The Guardian’s “1,000 Novels everyone must read” and the about tags for their FluidDB objects here.)

One of Terry’s favourite examples involves looking for things that he hasn’t read but someone else he knows has rated above some value. So if he were looking for things I have rated above 7 that he hasn’t read, he might use a query such as:

njr/rating > 7 except has terrycojones/has-read

It rather goes without saying that in order to write such a query, he has to know what tag I use for rating things. Clearly, he could maintain a list of what tags each of his friends uses for rating, but obviously his (and everyone’s) life will be simpler if we all just agree that we’ll use the same tag for ratings. Of course, there’s nothing magic about rating except that Terry has already used it in many of his examples, but for English-speakers at least, rating seems like a reasonable choice.
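Here is a minimal in-memory Python sketch of what that query computes, with the tag names passed in as parameters to underline why a shared name matters. The objects, about values and ratings are all made up, and nothing here talks to FluidDB.

```python
# Hypothetical objects keyed by about value, each with a dict of tag -> value.
objects = {
    "book:one": {"njr/rating": 9},
    "book:two": {"njr/rating": 8, "terrycojones/has-read": True},
    "book:three": {"njr/rating": 5},
    "book:four": {"terrycojones/has-read": True},
}


def rated_above_unread(objects, rating_tag, threshold, read_tag):
    """About values with rating_tag > threshold, except those carrying read_tag."""
    return sorted(
        about
        for about, tags in objects.items()
        if rating_tag in tags
        and tags[rating_tag] > threshold
        and read_tag not in tags
    )


print(rated_above_unread(objects, "njr/rating", 7, "terrycojones/has-read"))
```

If every friend used a different name for the rating tag, the second argument would have to be looked up per person; with a shared convention it is always username/rating.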

2. Why 0–10, numeric

I feel pretty strongly that rating should be numeric because ratings are ordinal, i.e. the core idea of a rating scale is that the different ratings go from worst to best in a well-defined sequence. Numbers are generally the best choice for ordinal values. Among other virtues, they allow us to compute statistics (mean, mode, median, standard deviation, min, max ...) and are much more international than words.

Again, the choice of 0 to 10 is arbitrary, but is at least one of a number of scales already in common use—perhaps even the most common.

Clearly, lots of other schemes are in widespread use, including

  • star ratings — often either 1–5 (e.g. Apple App store), or 0–5 (Guardian Film ratings) but sometimes 0–4, sometimes 0–3, occasionally 0–10 or 1–10 (e.g. IMDb). (In fact, IMDb computes averages of its star ratings and reports them numerically.)

  • phrase ratings, such as
    • “Highly recommended”, “Recommended”, “Neutral”, “Disliked”, “Highly disliked” (suggested by @jkakar),
    • “Excellent”, “Good”, “Fair”, “Poor”, “Very Poor”
  • Percentages

  • Grades A+ to C– or A+ to E–.

and if people want to use these, obviously there’s nothing I can (or would want to) do to stop them. At least in these cases, there are straightforward ways to translate, but I maintain that all of these are really just proxies for numeric scales, and that there is a natural way to map each of them onto a 0–10 scale.
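One simple reading of “natural” here is a linear rescaling, with ordered labels mapped by their position in the sequence. The sketch below assumes exactly that; the function names are illustrative, and it is one possible mapping rather than a proposed standard.

```python
def rescale(value, low, high):
    """Linearly map value from the range [low, high] onto [0, 10]."""
    if not low <= value <= high:
        raise ValueError(f"{value} is outside {low}-{high}")
    return 10 * (value - low) / (high - low)


def from_ordered(label, worst_to_best):
    """Map a label from an ordered scale onto 0-10 by its position."""
    return rescale(worst_to_best.index(label), 0, len(worst_to_best) - 1)


# One of the phrase scales listed above, ordered from worst to best.
PHRASES = ["Very Poor", "Poor", "Fair", "Good", "Excellent"]

print(rescale(4, 1, 5))               # 4 stars on a 1-5 scale -> 7.5
print(rescale(3, 0, 4))               # 3 stars on a 0-4 scale -> 7.5
print(rescale(85, 0, 100))            # 85%                    -> 8.5
print(from_ordered("Good", PHRASES))  # second-best of five    -> 7.5
```

Note that under this assumption one star on a 1–5 scale maps to 0, which may or may not be what a given rater intends; that is part of why an explicit declaration of the scale is useful.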

If there’s a clamour, we can certainly introduce other conventions (e.g. fluiddb:star-rating:1-5) and suggested mappings between them, but for now, I suggest just going with 0–10.

3. Why Tag the Tag?

If everyone ends up using a rating tag called rating and having the values run from 0–10, there won’t be much need for people to declare their conventions. Given the open nature of the system, however, and the unlikelihood of conventions being mandated, an opt-in declaration will help.

Obviously, for Terry to be able to formulate the query

njr/rating > 7 except has terrycojones/has-read

he needs to know the name of the tag and the range of values it takes on, and by my declaring my rating tag to be a 0–10 rating tag, he can have more confidence that I at least intend it to be used that way. (He can potentially check the range of values I’ve actually used as well, but if he sees only zeros and ones, it’s hard for him to know whether I’m in fact using a 0–1 scale or am simply hard to please.)

The Conventions

Just as I have started to collate conventions for about tags, I intend, before too long, to start collating conventions for tags such as rating; has-read will probably be next. For now, I’ve simply instantiated an object in FluidDB with the about tag fluiddb:rating:0-10 and persuaded miro to tag it with a description. Here’s how fdb sees it:

fdb show -a fluiddb:rating:0-10 /miro/description /id
Object with about="fluiddb:rating:0-10":
  /miro/description = "FluidDB numeric rating scale from 0 (worst) to 10 (best). Usually used with tags called rating."
  /id = "5fb6dc31-addd-4a75-a08b-4decec269ff5"

Happy rating!

24 September 2010

Seeing What's In FluidDB

After a hiatus, I’m back to doing lots of things with FluidDB.

One of the things is building a service to visualize some of the data in FluidDB. For example, try this link in a modern browser:

http://abouttag.appspot.com/about/butterfly/planet:Earth

You should see something that looks similar (but perhaps not identical) to the image below:

_images/butterflyPlanetEarth.png

The live version should differ from the image above in these respects:

  1. It will be generated on the fly by querying FluidDB. As a result, it may have different tags. Or values. Or both.
  2. Some of the tags should be clickable. In particular, the tag /miro/planetes/db-next-record-about="planet:Mars" should take you to the next planet.
  3. You can scale the image without pixelation. If you zoom in or out with the browser, everything should just work.

When I say you need a modern browser, what I really mean is that it doesn’t work in Internet Explorer; I suspect it won’t even work in Internet Explorer 9. The reason for this is that the image will be in SVG (scalable vector graphics), served as XHTML, which is not supported by any version of Internet Explorer, though all recent versions of Firefox, Chrome, Safari and Opera (including mobile versions) are just fine. Although Internet Explorer 9 does supposedly support SVG, I believe it still won’t support XHTML, so this will continue not to work. (I’d go to HTML5, but it isn’t ready for this either.)
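For anyone unfamiliar with the combination, here is a minimal sketch of what “SVG served as XHTML” means: an XHTML document with an inline svg element, in which a piece of text is wrapped in a link. This is not the code the service uses; the function svg_page and the layout numbers are invented for illustration. A page like this would normally be sent with the content type application/xhtml+xml.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"


def svg_page(label, href):
    """Return an XHTML document embedding an SVG with one clickable label.

    Hypothetical helper: a toy stand-in, not the service's drawing code.
    """
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="{XHTML_NS}">
  <head><title>{escape(label)}</title></head>
  <body>
    <svg xmlns="{SVG_NS}" xmlns:xlink="{XLINK_NS}"
         viewBox="0 0 300 60" width="100%">
      <a xlink:href={quoteattr(href)}>
        <text x="10" y="35" font-size="16">{escape(label)}</text>
      </a>
    </svg>
  </body>
</html>"""


url = "http://abouttag.appspot.com/about/butterfly/planet:Earth"
page = svg_page("planet:Earth", url)

# Parsing raises ParseError if the document is not well-formed XML.
root = ET.fromstring(page.encode("utf-8"))
print(root.tag)
```

Because the drawing is described by a viewBox rather than by pixels, the browser can rescale it freely, which is why zooming works without pixelation.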

Effectively, what I’ve built is a web service that will draw a picture of any object in FluidDB. You can specify the object using its about tag or its object ID, and there are two styles of drawings. The one above is what I call the “Butterfly” style. There’s also a “daisy” style, exemplified by the image below, which you can get to (live) by clicking on this link:

http://abouttag.appspot.com/about/daisy/element:Hydrogen

_images/daisyElementHydrogen.png

Again, in the live version, the link to Helium works.

Using the Service

At the moment, the only interface is the address bar in your browser. When drawing an image, you have three choices to make:

  • whether to specify the object by about tag or id
  • whether to show it as a daisy or as a butterfly
  • whether to embed it in HTML (specifically, XHTML), or just serve the SVG. The latter will actually work fine in most modern browsers too, and is more convenient if you want to embed the result. Maybe that will even work in IE9; who knows?
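Three two-way choices give 2 × 2 × 2 = 8 combinations. As a quick sketch, they can be enumerated like this (the labels are descriptive names chosen for illustration, not the URL components the service uses):

```python
from itertools import product

SPECIFIERS = ("about tag", "object ID")
STYLES = ("butterfly", "daisy")
OUTPUTS = ("XHTML page", "raw SVG")

# Every combination of output format, object specifier and drawing style.
variations = list(product(OUTPUTS, SPECIFIERS, STYLES))

for number, (output, specifier, style) in enumerate(variations, start=1):
    print(f"{number}. {style} drawing, object given by {specifier}, "
          f"served as {output}")
```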

We’ll take Alice In Wonderland as an example and show the eight variations.

The about tag for the book Alice’s Adventures in Wonderland, by Lewis Carroll, is book:alices adventures in wonderland (lewis carroll). (See http://abouttag.blogspot.com/2010/03/how-to-tag-books-in-fluiddb.html and http://abouttag.blogspot.com/2010/03/perfect-about-tag-books-in-fluiddb.html for a description of conventions for about tags for books in FluidDB.) And its object has ID 03c8ce35-aa5e-4b58-b3ab-ddda55642b15.

The four options for producing HTML are:

(You can entitize the book title if you like, but it works in most cases without. Having said that, my blogging system screwed up without them, so the actual links are entitized here.)

The first and third look like this (at the time of blogging).

_images/butterflyBookAlice.png

The four options for producing the raw SVG are as follows:

That’s it.

The code seems to be reasonably reliable, but there are a few things to note. Specifically:

  • Very long tags lead to big pictures
  • Very large numbers of tags can lead to the Google Application Engine (on which it is running) timing out. (Those cases are rare.) If it times out on an object that you think it should manage, just refresh and it probably will.
  • Putting in about tags that are URLs can be problematic. This is a problem with URL escaping, which I will address sometime.

I suppose the one other thing is the hyperlinking. Klee and Miró, which are the bits of software doing the drawing, try to spot things that they think are hyperlinks and put them in. This seems to work OK, but the algorithms are approximate and may get it wrong occasionally.

If you play, let me know what you think. Of course, lots more bells and whistles can, and probably will, be added. And with incredible speed, Pierre André has already added it to FDBExplorer (http://fluiddbexplorer.appspot.com), from where you can pop up a visualization in a window.

Final note: There is another part to the app. If you just go to the front page (http://abouttag.appspot.com), you will see a form that you can type a title and author into to find the recommended about tag for that book. Unfortunately, Google has changed something and this doesn’t work if you’re not logged in, but if you have a Google account, you can log in with it (Google handles the authentication for all apps on appspot.com, which is a Google domain). It will remember all the books you put in.

The point of the system is to do the standardization of form so that it’s more likely different people will record information on the same object. For more details of that, see the article on the abouttag library at http://abouttag.blogspot.com/2010/03/about-tag-conventions-in-fluiddb.html.

A few more conventions have emerged, which I will add to that list, but it’s still somewhat useful.
