06 October 2010

Ratings and FluidDB

As Terry (@terrycojones) will endlessly tell you, there aren’t really any rules in FluidDB. Yet this blog is largely dedicated to trying to establish and promote useful (voluntary) conventions that I believe will make FluidDB more useful for everyone. Today, I want to talk about ratings and suggest some conventions for those.

Why Conventions Matter

You may have heard that NASA lost its $125m Mars Climate Orbiter spacecraft as a result of using imperial units in some places and metric units in other places. If not, read all about it in New Scientist (Schoolkid blunder brought down Mars probe). While conversion between systems is usually possible, life is much simpler, and less error-prone, if everyone uses the same system.

Why Good Conventions Matter

It’s sometimes argued that as long as there is a convention, it doesn’t really matter what it is. I don’t really agree with that. Even though I personally think in imperial units (feet, inches, pounds, chains, furlongs, perches, tons, pints, acres, Fahrenheit etc.), and maintain they are excellent human-scale units for everyday measurement, I would never dream of calculating with them. Clearly, the SI system (metric units) is far superior for calculation. This rather wonderful video of American Chopper mechanics calculating in inches illustrates the problem. (It really is worth watching, and is only two-and-a-half minutes.)

http://www.wimp.com/metricsystem/

Summary Recommendations

I’ll explain them below, but in descending order of importance, my suggestions are as follows:

  1. If you rate things in FluidDB, do it with a tag called rating.

  2. Make your ratings numeric, with 0 as the worst rating and 10 as the best rating. (I’m sure most people will use integers; in principle, floating point values should be just fine too, but apparently FluidDB’s query language is broken with respect to type coercion in inequalities for now, so it’s probably best to stick with integers.)

  3. Declare your tag as a numeric rating from 0 to 10 by tagging the object for your tag with a top-level tag in your namespace called convention with the value fluiddb:rating:0-10.

    If you use the fdb utility, and your FluidDB username is njr (which it isn’t) you could do this as follows:

    fdb tag -a "Object for the attribute njr/rating" convention="fluiddb:rating:0-10"

    This works because the object for the rating tag for user username (always) has the about tag

    Object for the attribute username/rating

    and the fdb tag ... command above will create a tag called convention in your namespace (assuming you use your credentials with fdb, and you don’t already have such a tag) and then tag the relevant FluidDB object with it.

    If you don’t use fdb you should be able to do this with any other FluidDB library or tool. You may first need to find the ID of the object by using the FluidDB query

    fluiddb/about = "Object for the attribute username/rating"
    

    (replacing username with your username), and then using that ID to specify the object you want to tag.
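For anyone scripting this rather than typing it, the two strings involved in step 3 can be built mechanically. The following is only a sketch in Python: the function names are made up for illustration and are not part of fdb or any FluidDB library, and nothing here talks to FluidDB itself. It just produces the about tag and the query shown above.

```python
# Hypothetical helper names; not part of fdb or any FluidDB client library.

CONVENTION_VALUE = "fluiddb:rating:0-10"  # the value suggested for the convention tag


def tag_object_about(username: str, tag: str = "rating") -> str:
    """Return the about tag of the object that represents username/tag."""
    return f"Object for the attribute {username}/{tag}"


def tag_object_query(username: str, tag: str = "rating") -> str:
    """Return the FluidDB query that finds the object for username/tag."""
    return f'fluiddb/about = "{tag_object_about(username, tag)}"'


if __name__ == "__main__":
    print(tag_object_about("njr"))
    print(tag_object_query("njr"))
```

The query string can then be handed to whichever FluidDB tool or library you use to look up the object ID, and the object tagged with your convention tag set to the value in CONVENTION_VALUE.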

Why These Three Suggestions?

1. Why rating?

One of the core motivating ideas for FluidDB is that different users should be able to share information by placing tags on objects representing things that they both know something about, or have an interest in. Many of the posts here on the AboutTag blog are concerned with conventions for choosing which FluidDB object to use for a particular entity. For example, the suggested object for Lewis Carroll’s book “Alice’s Adventures in Wonderland” is the one with the about tag

book:alices adventures in wonderland (lewis Carroll)

which has ID

03c8ce35-aa5e-4b58-b3ab-ddda55642b15

(There’s a list of The Guardian’s “1,000 Novels everyone must read” and the about tags for their FluidDB objects here.)

One of Terry’s favourite examples involves looking for things that he hasn’t read but someone else he knows has rated above some value. So if he were looking for things I have rated above 7 that he hasn’t read, he might use a query such as:

njr/rating > 7 except has terrycojones/has-read

It rather goes without saying that, in order to write such a query, he has to know what tag I use for rating things. Clearly, he could maintain a list of what tags each of his friends uses for rating, but obviously his (and everyone’s) life will be simpler if we all just agree that we’ll use the same tag for ratings. Of course, there’s nothing magic about rating except that Terry has already used it in many of his examples, but for English-speakers at least, rating seems like a reasonable choice.
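To make the benefit concrete: if everyone uses rating (and, say, has-read), the query above can be generated for any pair of users from their usernames alone. Here is a minimal sketch in Python, assuming those two tag names; the function name is invented for illustration, and the restriction to integer thresholds simply follows the advice above to stick with integers for now.

```python
# Hypothetical helper; it only builds the query text and does not contact FluidDB.

def unread_rated_above(rater: str, reader: str, threshold: int,
                       rating_tag: str = "rating",
                       read_tag: str = "has-read") -> str:
    """Build a query for things `rater` rated above `threshold`
    that `reader` has not tagged as read."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(threshold, bool) or not isinstance(threshold, int):
        raise TypeError("threshold should be an integer")
    if not 0 <= threshold <= 10:
        raise ValueError("threshold should lie on the 0-10 scale")
    return f"{rater}/{rating_tag} > {threshold} except has {reader}/{read_tag}"


if __name__ == "__main__":
    print(unread_rated_above("njr", "terrycojones", 7))
```

Without a shared tag name, the rating_tag default would have to become a per-friend lookup table, which is exactly the bookkeeping the convention avoids.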

2. Why 0–10, numeric

I feel pretty strongly that rating should be numeric because ratings are ordinal, i.e. the core idea of a rating scale is that the different ratings go from worst to best in a well-defined sequence. Numbers are generally the best choice for ordinal values. Among other virtues, they allow us to compute statistics (mean, mode, median, standard deviation, min, max ...) and are much more international than words.

Again, the choice of 0 to 10 is arbitrary, but is at least one of a number of scales already in common use—perhaps even the most common.

Clearly, lots of other schemes are in widespread use, including

  • star ratings — often either 1–5 (e.g. Apple App store), or 0–5 (Guardian Film ratings) but sometimes 0–4, sometimes 0–3, occasionally 0–10 or 1–10 (e.g. IMDb). (In fact, IMDb computes averages of its star ratings and reports them numerically.)

  • phrase ratings, such as
    • “Highly recommended”, “Recommended”, “Neutral”, “Disliked”, “Highly disliked” (suggested by @jkakar),
    • “Excellent”, “Good”, “Fair”, “Poor”, “Very Poor”
  • Percentages

  • Grades A+ to C– or A+ to E–.

and if people want to use these, obviously there’s nothing I can (or would want to) do to stop them. At least in these cases, there are straightforward ways to translate, but I maintain that all of these are really just proxies for numeric scales, and that there is a natural way to map each of them onto a 0–10 scale.

If there’s a clamour, we can certainly introduce other conventions (e.g. fluiddb:star-rating:1-5) and suggested mappings between them, but for now, I suggest just going with 0–10.
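As an illustration of what such mappings might look like, here is a small Python sketch. It assumes a plain linear rescaling, which is one possible reading of "natural" rather than an agreed convention, and it treats phrase or grade scales as evenly spaced steps from worst to best. The function names are made up for the example.

```python
# Illustrative only: linear rescaling is an assumption, not an established convention.

def to_0_10(value: float, low: float, high: float) -> float:
    """Linearly map a rating on the scale [low, high] onto 0-10."""
    if high <= low:
        raise ValueError("high must be greater than low")
    if not low <= value <= high:
        raise ValueError("value lies outside the stated scale")
    return 10 * (value - low) / (high - low)


def label_to_0_10(label: str, labels_worst_to_best: list) -> float:
    """Map a phrase or grade onto 0-10, treating the labels as evenly spaced."""
    position = labels_worst_to_best.index(label)  # raises ValueError if unknown
    return to_0_10(position, 0, len(labels_worst_to_best) - 1)


PHRASES = ["Very Poor", "Poor", "Fair", "Good", "Excellent"]

if __name__ == "__main__":
    print(to_0_10(3, 1, 5))                 # 3 stars on a 1-5 scale -> 5.0
    print(to_0_10(85, 0, 100))              # 85% -> 8.5
    print(label_to_0_10("Good", PHRASES))   # -> 7.5
```

The results are floating point, so anyone following the advice to store integers would still need to choose a rounding rule; that choice is deliberately left out of the sketch.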

3. Why Tag the Tag?

If everyone ends up using a rating tag called rating and having the values run from 0–10, there won’t be much need for people to declare their conventions. Given the open nature of the system, however, and the unlikelihood of conventions being mandated, an opt-in declaration will help.

Obviously, for Terry to be able to formulate the query

njr/rating > 7 except has terrycojones/has-read

he needs to know the name of the tag and the range of values it takes on, and by my declaring my rating tag to be a 0–10 rating tag, he can have more confidence that I at least intend it to be used that way. (He can potentially check the range of values I’ve actually used as well, but if he sees only zeros and ones, it’s hard for him to know whether I’m in fact using a 0–1 scale or am simply hard to please.)

The Conventions

Just as I have started to collate conventions for about tags, I intend, before too long, to start collating conventions for tags such as rating; has-read will probably be next. For now, I’ve simply instantiated an object in FluidDB with the about tag fluiddb:rating:0-10 and persuaded miro to tag it with a description. Here’s how fdb sees it:

fdb show -a fluiddb:rating:0-10 /miro/description /id
Object with about="fluiddb:rating:0-10":
  /miro/description = "FluidDB numeric rating scale from 0 (worst) to 10 (best). Usually used with tags called rating."
  /id = "5fb6dc31-addd-4a75-a08b-4decec269ff5"

Happy rating!
