About Tag: about


15 January 2012

Updates to the Web Apps Shell-Fish and About Tag

I just pushed new versions of Shell-Fish (the online version of Fish) and the About Tag app.

There are some quite significant changes, which I will summarize briefly.

Shell-Fish has been updated to the latest version 4.26. It has sat at 4.00.0 for a ridiculously long time, and that became self-sustaining because non-trivial changes were needed to the web version to make it use the new one.

The About Tag app now includes a clone of Shell-Fish. Click the bottom tag on the logo to use this.

The search functionality in About Tag has been overhauled and remodelled on the much better version used at yet another of my sites, Art of Tagging, which is hosted on the intriguing PythonAnywhere platform. Changes include:

When you click a link, the diagram now appears inline

If you re-click, it refreshes

If you’re signed in, you can tag items and the diagram updates.

The changes are quite major, and I haven’t tested them as thoroughly as I would wish, so I’ve probably broken some things. Sorry. Let me know and I’ll try to fix them.

The three sites all use the same model of requiring your Fluidinfo credentials; unfortunately they don’t share databases, so if you want to use all three, you need to put them in three times. This is crazy; all three sites should merge, but that will take time.

Although the search functionality on the About Tag site mostly works well, Google’s hard 10-second time limit on queries means that queries that match a lot of objects tend to time out. The version at Art of Tagging doesn’t suffer from this limit, but unfortunately that site is quite often down for maintenance.
There are many small changes to Shell-Fish but two really major bits of new functionality:
Aliases and syncing now work. If you create aliases on a local version, type sync in Shell-Fish and they will appear there, and vice-versa. Among other things, this makes it easy to use sequences and access them from both sites.
You can drop the -i and -a in almost all cases. So (in the most common case), rather than:
fish> tag -a "artist:melody gardot" rating=10
you can simply use:
fish> tag "artist:melody gardot" rating=10
though the old form will, of course, continue to work, along with -q and the new -@ (for anonymous objects).
My process of turning everything from XHTML to HTML5 continues. Shell-Fish and the About Tag web app are now pure HTML5, like this blog. Previously, they were all XHTML, which worked for everything under the Sun except Microsoft browsers; at least now Internet Explorer 9 should be able to use some of the functionality. Users of Internet Explorer 1–8, and therefore of Windows XP, remain out of luck (in more ways than I can possibly enumerate, most of which are rather more significant than not being able to use these web sites).

27 December 2011

The British Library Catalogue / British National Bibliography

I have added to Fluidinfo information on approximately 2.5 million books drawn from the roughly 3 million records in the British National Bibliography, which documents the British Library’s Catalogue.

As ever, I have used the book-u convention (implemented using the Python abouttag library) to select about tags for the objects, and have tagged the books in Fluidinfo under the book user. Data specific to the British National Bibliography (BNB) is stored in the namespace book/bnb, while more generic data (derived from the information contained in the Bibliography) is stored directly in the book namespace.

Here is an example of a book that has been augmented with data from the British National Bibliography. The book is George Orwell’s Animal Farm, and it is illustrated using the About Tag visualizer. (If you can’t see the picture below, upgrade to the latest version of your browser or see here for information on why you might be having trouble.) The green tags are the new ones.

Notice that, because of the careful normalization inherent in the book-u convention, where the book is already in Fluidinfo, the new data has generally been added to the existing object corresponding to that book, as in the case above.

The core data that should almost always be present is:

the about tag fluiddb/about, normalized using the book-u convention:

book:animal farm (george orwell)
the book/author tag, containing the best author information I was able to extract, in this case

George Orwell

Where there is more than one author, they are generally shown separated by commas, with the last joined with an and (with no Oxford Comma). For example, The Feynman Lectures on Physics, by Feynman, Leighton and Sands has
$ fish show -a 'book:the feynman lectures on physics (richard p feynman; robert b leighton; matthew l sands)' /book/author

Object with about="book:the feynman lectures on physics (richard p feynman; robert b leighton; matthew l sands)":
  /book/author = "Richard P. Feynman, Robert B. Leighton and Matthew L. Sands"
or, graphically:

The book/author tag has had a lot of processing done to it, as described below.
the book/title field, which is usually almost identical to that in the BNB data. In this case it is:

Animal farm

I have not altered the capitalization, which is therefore generally consistent with some entry in the BNB database (though I would really prefer it were in Title Case).

the book/source tag shows where the base data was taken from. This tag’s value is a set of strings, each of which corresponds to an entry in one of the 17 files from which the BNB data was extracted. The entries consist of

the name of the file (always BNBrdfdcNN.xml) where NN runs from 01 to 17

a dash -

the datestamp on that file (always 20101115 at present)

the digit zero (0) and a # sign

the record number in the file, starting from 1, with six digits.

Since multiple bibliographic entries can correspond to the same work, there is sometimes more than one of these.
the book/r tag is a pseudo-random floating point value with 0.0 ≤ book/r < 1.0.
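
The author-joining rule described above for book/author (commas between names, the last joined with "and", no Oxford comma) can be sketched in a few lines of Python. This is only an illustration of the rule; the function name join_authors is hypothetical and this is not the code that was used to process the BNB data.

```python
def join_authors(authors):
    """Join author names with commas, and the last one with 'and'.

    Illustrative sketch only: no Oxford comma is used, matching the
    book/author examples shown above.
    """
    authors = list(authors)
    if len(authors) <= 1:
        # Zero or one author: nothing to join.
        return "".join(authors)
    return ", ".join(authors[:-1]) + " and " + authors[-1]


print(join_authors(["George Orwell"]))
print(join_authors(["Richard P. Feynman", "Robert B. Leighton", "Matthew L. Sands"]))
```

The second call reproduces the value shown for The Feynman Lectures on Physics.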

Some of the raw data has also been added, with almost no cleaning up, under the book/bnb namespace. The BNB data uses the Dublin Core metadata standard, and includes:

bnb/creator, which is the person or organization primarily responsible for the creation of the work. This is sometimes blank, and is stored as a single string value.

bnb/contributors, which is a list of contributors, sometimes including the creator and sometimes not.

bnb/dewey is the set of Dewey Decimal classifications found on the records corresponding to this book.

bnb/isbn is the set of international standard book numbers found on the records corresponding to this book.

bnb/id is the set of British Library IDs found on the records corresponding to this book. (I’m not entirely clear what this identifier is, but it appears to be important and well populated.)

Other information is available in the data (including classification information), and I will probably extract this and add it at a later time.

Finding, Inspecting and Tagging Books in Fluidinfo

There are multiple ways of retrieving book data from Fluidinfo and of tagging it.

Probably the easiest and most general method is to go to http://artoftagging.com and do a search that involves a book and some keywords from the title and/or author. A list of results should come back and you can see a visualization of any of them by clicking the link. If you have a Fluidinfo account, you can create an account at artoftagging.com and then save your Fluidinfo details there. Once logged in, you will then be able to add your own tags to any object you find.
If you just want to construct the about tag for a book, you can do that using the online version of the Fluidinfo Shell, Fish. Once there, type, for example:
fish> about book "Animal Farm" "George Orwell"
book:animal farm (george orwell)

fish> about book "The Feynman Lectures on Physics" "Richard P. Feynman" "Robert B. Leighton" "Matthew L. Sands"
book:the feynman lectures on physics (richard p feynman; robert b leighton; matthew l sands)
(The quotes tell Fish that “Animal Farm” is the title and “George Orwell” a single author.) Alternatively, you can download and install Fish on your own machine. (It is available from Github.) You can then type the same commands, after fish, e.g.:
$ fish about book "Animal Farm" "George Orwell"
book:animal farm (george orwell)
You can then use any Fluidinfo tool, including the new Object Browser, to work with that object, signing in with Twitter if you like.
Another easy way of finding an about tag for a book is to find it on Amazon (US or UK, for now) and use the az-fish bookmarklet available at the top of the online Fish (drag it to your browser’s toolbar). The bookmarklet will take the item on the current Amazon page and issue the appropriate Fish command to find the about tag. (You don’t need to log into Fish or Fluidinfo to do this.)
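
To give a feel for what the about command is doing, here is a rough Python sketch that reproduces the book about tags shown above. It is an approximation inferred from those examples (lower-case everything, drop punctuation, collapse white space, join authors with semicolons), not the abouttag library's actual implementation; the names normalize_sketch and book_about_sketch are hypothetical, and the real library may treat edge cases such as apostrophes differently.

```python
import re


def normalize_sketch(text):
    """Lower-case, turn hyphens into spaces, drop other punctuation."""
    text = text.lower().replace("-", " ")
    # \w is Unicode-aware for str patterns, so accented letters survive.
    text = re.sub(r"[^\w\s]", "", text)
    # Collapse runs of white space into single spaces.
    return " ".join(text.split())


def book_about_sketch(title, *authors):
    """Build a book-u style about tag from a title and one or more authors."""
    names = "; ".join(normalize_sketch(author) for author in authors)
    return "book:%s (%s)" % (normalize_sketch(title), names)


print(book_about_sketch("Animal Farm", "George Orwell"))
print(book_about_sketch("The Feynman Lectures on Physics",
                        "Richard P. Feynman", "Robert B. Leighton",
                        "Matthew L. Sands"))
```

For anything that matters, use Fish or the abouttag library itself, since those define the convention.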

The Hierarchy of Books: Works and Manifestations

The International Federation of Library Associations (IFLA) describes a hierarchy of four kinds of “book” entities in its report Functional Requirements for Bibliographic Records. These are:

works

expressions

manifestations

items.

Quoting from that report:

“The entities defined as work (a distinct intellectual or artistic creation) and expression (the intellectual or artistic realization of a work) reflect intellectual or artistic content. The entities defined as manifestation (the physical embodiment of an expression of a work) and item (a single exemplar of a manifestation), on the other hand, reflect physical form.”

Loosely, a work is the conceptual book, usually described by the combination of a title and author—Animal Farm by George Orwell.

The report describes an expression of a work as “the intellectual or artistic realization of a work in the form of alpha-numeric, musical, or choreographic notation, sound, image, object, movement, etc., or any combination of such forms.” Thus George Orwell’s Animal Farm can be translated into different languages, laid out differently, typeset on pages, or in digital form, or recorded as spoken words, and these correspond to different expressions of that same book. There may also be different editions, printings etc., which may have slightly different content. Again, these are different expressions of the same conceptual work. (Occasionally, expressions may encompass several works, such as in the case of compendia.)

Moving down the hierarchy, a manifestation is a particular rendering of a work into physical form — “the physical embodiment of an expression of a work.” Note that “[A]s an entity, manifestation represents all the physical objects that bear the same characteristics, in respect to both intellectual content and physical form.” Thus, all the copies of the same printing of the same edition of Animal Farm that are essentially indistinguishable collectively correspond to a manifestation of George Orwell’s Animal Farm.

Finally, an item is an individual copy of a book: “a single exemplar of a manifestation.”

The entries in the British Library’s catalogue correspond literally to items but conceptually to manifestations, whereas the objects to which I have attached the data in Fluidinfo correspond to works. This is why the c. 3 million records reduce to c. 2.5 million Fluidinfo objects, and why some of the objects have multiple ISBNs etc. It is entirely possible to create further objects at the level of manifestations (and even items, if someone really wants to do so), and even more so at the level of expressions, but I have not done this yet.
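
The reduction from records to works can be pictured as a grouping step: every catalogue record is mapped to the about tag of its work, and set-valued data such as ISBNs is collected per work. The Python below is a minimal sketch of that idea under stated assumptions; the function name group_by_work and the placeholder ISBN strings are invented for illustration and are not taken from the BNB data or from the upload code.

```python
from collections import defaultdict


def group_by_work(records):
    """Collect the ISBNs of all records that share a work-level about tag.

    `records` is an iterable of (about_tag, isbn) pairs; the result maps
    each about tag to the set of ISBNs seen for that work.
    """
    works = defaultdict(set)
    for about, isbn in records:
        works[about].add(isbn)
    return dict(works)


# Placeholder data: the ISBN strings are made up for illustration.
records = [
    ("book:animal farm (george orwell)", "ISBN-PLACEHOLDER-1"),
    ("book:animal farm (george orwell)", "ISBN-PLACEHOLDER-2"),
    ("book:white teeth (zadie smith)", "ISBN-PLACEHOLDER-3"),
]
works = group_by_work(records)
print(len(records), "records ->", len(works), "works")
```

Three records collapse to two works here, and the Animal Farm object ends up with two ISBNs, which is the same effect described above on a much smaller scale.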

The reason I have concentrated on works rather than manifestations is that this seems much the most important level to represent in a system like Fluidinfo: with important exceptions, when people want to rate or comment on a book, it is most often the work, rather than the manifestation, that they are interested in. Moreover, collecting together information about the different ISBNs associated with a single work is positively helpful. That is not to say that there isn’t a case for creating other objects at the level of expressions or manifestations.

Further Work

There is a great deal more that can be usefully done with the fabulous data from the British Library. While I am not committing to doing these, tasks on my list include:

Authors. Creating an object corresponding to each creator/author/contributor. I plan to use about tags of the form author:normalized name (birth-year) for these, e.g. author:George Orwell (1903). The required data is largely available in the BNB dataset. I would then plan to add a book/related-authors tag to each book, pointing to its authors’ objects and, on the author objects, corresponding sets of book/related-books tags pointing back to their works.
Upload Checking. Checking that everything uploaded OK. I count 2,558,738 unique books (as works) in the BNB dataset, and I appeared to upload all of these successfully (getting HTTP 204 statuses back from Fluidinfo). However, when I count objects having a book/r tag, I get only 2,468,661, a shortfall of 90,077.

Whether this indicates a problem or not is unclear, as if I count the number of books with a book/source but no book/r, with the query
has book/source except has book/r
it reports 18,921 such books, but as far as I can tell, all those it finds in fact have a book/r, so it appears that Fluidinfo is having some difficulty executing some queries correctly at the moment.
About Tag Checking. I had to use some fairly hairy code to coerce the BNB data into the correct form to generate canonical about tags in the book-u convention, and it has definitely failed in some cases. For example, I have seen at least one example where the surname of an author in the BNB data preceded the forename but without a comma, so that forename and surname will have been reversed. To the extent that I can detect these problems, I will try to fix them.

Recent additions. I believe the British Library has issued updates with recent additions (since November 2010); I certainly plan to get that data and import it in a similar fashion, and then to set up a CRON job to do that regularly. In this way, I hope the dataset will be living and always current.

Categorizations. The BNB data includes subject categories for the records, which I have not imported thus far. I will do so.

Year information. There is information about publication dates in the BNB data, but it is not in a very structured form. If I am able to extract it with a satisfactory degree of reliability, I will get this too. Obviously, different manifestations will have different publication dates, so this will probably be a set-valued tag.

Enjoy the data, and let me know if you find problems.

I expect I will write a number of other posts on issues associated with this data.

About Tag Goes HTML5 with Embedded SVG: Browser Requirements

This page uses HTML5 and an embedded Scalable Vector Graphics (SVG) diagram, and acts as a test. From this point on (27th December 2011) my plan is to use HTML5 with embedded SVG as the default format for posts, and therefore if you use a browser (or feed reader) that does not support this, you will miss out.

You should see an elegant diagram below. If you do not, it probably means that you are not using a modern, standards-compliant HTML5 browser.

I have tested this page on the following:

A Macbook Pro running OS X 10.7.2 (Lion) with Safari 5.1.2, Chrome 16.0.912.63, Firefox 8.0.1, Opera 11.6
A Mac Pro running OS X 10.6.8 (Snow Leopard) with the same browsers.
An iPad running iOS 5.0.1 (in Safari, naturally)
An iPhone running iOS 5.0.1 (again, Safari, of course)
A Sony Vaio running Windows 7 with Internet Explorer 9
A prehistoric Dell Latitude laptop running Chrome 16 and Firefox 9

and everything looks good. It does *not* work with Internet Explorer 8, but then, what does? It also does not work with many older versions of Firefox, Chrome, Safari or Opera. So this is a good time to upgrade. I don't know which Android or Linux browsers it works with, but I would guess it will work with some.

I imagine the diagrams will not show up in most feed readers, and that is unfortunate, but I think this is a time to push forward, so that is what I am doing.

Among the other marvellous benefits of SVG, it is scalable (just like the S says), which means that if you zoom the browser (generally command-plus on macs, and control-plus on Windows) the diagram will scale too, without looking terrible. Wonder of wonders.

07 July 2011

About Tags In Fish

I’ve added a new command to fish (and updated the online version, Shell-Fish, accordingly) to allow easy construction of standardized about tags using the conventions from the abouttag library. It makes use of a new abouttag function, available in the new generic.py file in the abouttag library, which takes the object type as its first parameter, and the usual parameters as a variable parameter list.
The new fish command is abouttag, though it can also be abbreviated to about, and its general form is:

fish abouttag <object type> <object specifiers>

The object type is something like book, album or fi-user and the object specifiers are the key parameters used to describe that object, in the same order as they are used in the corresponding function from the abouttag library.
The easiest way to illustrate and define these is with examples. The following examples are taken from a Unix system; on Windows, use double quotes rather than single around parameters. In the online version (Shell-Fish), and on Unix, single or double quotes work. In the online version, you don’t need the fish prefix (though it does work).
I should note that part of the motivation for adding this functionality is a desire to allow the command to be used to specify objects without knowing the exact form of their about tags. In Unix-like systems (Linux, Mac OS X, Solaris etc.), this is possible by using left quotes, which can be placed inside double quotes. Thus, the following, slightly ungainly command (using all three forms of quote) works, at least in bash:

$ fish show -F -a "`fish abouttag book 'Gödel, Escher, Bach: An Eternal Golden Braid' 'Douglas R. Hofstader'`" njr/rating
Object with about="book:gödel escher bach an eternal golden braid (douglas r hofstader)":
  njr/rating = 10

I will leave it to the reader to judge whether this is easier than using cut and paste. For those who don’t know about left quotes in Unix shells, a command enclosed in left quotes within another command is evaluated before its enclosing command; its output replaces the left-quoted phrase on the original command line. So in the case above, we first run the command

fish abouttag book 'Gödel, Escher, Bach: An Eternal Golden Braid' 'Douglas R. Hofstader'

which generates

book:gödel escher bach an eternal golden braid (douglas r hofstader)

as its output. In effect, the outer command is then transformed to

fish show -F -a "book:gödel escher bach an eternal golden braid (douglas r hofstader)" njr/rating

I hope to extend shell-fish, the on-line version of fish, to support left quotes, but that may take a little while.
The following examples are taken from the fish documentation, which is available online from http://fluiddb.fluidinfo.com/about/fish/fish/index.html.

Books and related items using the book-u convention (book, author)

$ fish abouttag book 'Gödel, Escher, Bach: An Eternal Golden Braid' 'Douglas R. Hofstader'
book:gödel escher bach an eternal golden braid (douglas r hofstader)

$ fish abouttag book 'The Feynman Lectures on Physics' 'Richard P. Feynman' 'Robert B. Leighton' 'Matthew Sands'
book:the feynman lectures on physics (richard p feynman; robert b leighton; matthew sands)

$ fish abouttag book 'The Oxford English Dictionary: second edition, volume 3', 'John Simpson', 'Edmund Weiner'
book:the oxford english dictionary second edition volume 3 (john simpson; edmund weiner)

$ fish abouttag author 'Douglas R. Hofstadter' 1945 2  15
author:douglas r hofstadter (1945-02-15)

Music-related items (track, album, artist, isrc-recording)

$ fish abouttag track 'Bamboulé' 'Bensusan and Malherbe'
track:bamboulé (bensusan and malherbe)

$ fish abouttag album 'Solilaï' 'Pierre Bensusan'
album:solilaï (pierre bensusan)

$ fish abouttag artist 'Crosby, Stills, Nash & Young'
artist:crosby stills nash & young

$ fish abouttag isrc-recording 'US-PR3-73-00012'
isrc:USPR37300012

URLs and URIs (URI, URL)

$ fish abouttag uri FluidDB.fluidinfo.com
http://fluiddb.fluidinfo.com

$ fish abouttag url https://FluidDB.fluidinfo.com/one/two/
https://fluiddb.fluidinfo.com/one/two

$ fish abouttag URI http://fluiddb.fluidinfo.com/one/two/
http://fluiddb.fluidinfo.com/one/two

$ fish abouttag URL 'http://test.com/one/two/?referrer=http://a.b/c'
http://test.com/one/two/?referrer=http://a.b/c
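
The four URI/URL examples above are consistent with a simple normalization: add http:// when no scheme is given, lower-case the host, and drop a trailing slash at the very end. The Python sketch below reproduces exactly those four outputs using the standard urllib.parse module. It is inferred from the examples only; the function name normalize_url_sketch is hypothetical, and the abouttag library may apply further rules that these examples do not exercise.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url_sketch(url):
    """Approximate the URL about-tag normalization seen in the examples."""
    if "://" not in url:
        # Assumption: a bare host name defaults to the http scheme.
        url = "http://" + url
    parts = urlsplit(url)
    normalized = urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                             parts.path, parts.query, parts.fragment))
    # Only a slash at the very end is removed, so a slash that precedes
    # a query string (as in the last example) is left alone.
    return normalized.rstrip("/")


for example in ("FluidDB.fluidinfo.com",
                "https://FluidDB.fluidinfo.com/one/two/",
                "http://test.com/one/two/?referrer=http://a.b/c"):
    print(normalize_url_sketch(example))
```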

Fluidinfo objects (fi-user, fi-namespace, fi-tag)

$ fish abouttag fi-user njr
Object for the user named njr

$ fish abouttag fi-namespace njr/misc
Object for the namespace njr/misc

$ fish abouttag fi-ns njr/private
Object for the namespace njr/private

$ fish abouttag fi-tag terrycojones/private/rating
Object for the attribute terrycojones/private/rating

Database components (db-table, db-field)

$ fish abouttag db-table 'elements'
table:elements

$ fish abouttag db-field 'name' 'elements'
field:name in table:elements

Miscellaneous (planet, element)

$ fish abouttag planet 'Mars'
planet:Mars

$ fish abouttag element 'Helium'
element:Helium
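
The general shape of the command (an object type followed by a variable list of specifiers) maps naturally onto a small dispatch table. The Python below is a sketch of that shape for a few of the simpler object types above, with output formats copied from the examples. It is not the code in generic.py: abouttag_sketch, BUILDERS and the helper names are hypothetical, and only the formats visible in the examples are covered.

```python
def planet(name):
    return "planet:%s" % name


def element(name):
    return "element:%s" % name


def db_table(table):
    return "table:%s" % table


def db_field(field, table):
    return "field:%s in table:%s" % (field, table)


def isrc_recording(code):
    # The example drops the hyphens from the ISRC and keeps upper case.
    return "isrc:%s" % code.replace("-", "").upper()


# Map each object type to the function that builds its about tag.
BUILDERS = {
    "planet": planet,
    "element": element,
    "db-table": db_table,
    "db-field": db_field,
    "isrc-recording": isrc_recording,
}


def abouttag_sketch(kind, *specifiers):
    """Dispatch on the object type; specifiers are passed through in order."""
    try:
        builder = BUILDERS[kind.lower()]
    except KeyError:
        raise ValueError("unknown object type: %r" % kind)
    return builder(*specifiers)


print(abouttag_sketch("db-field", "name", "elements"))
print(abouttag_sketch("isrc-recording", "US-PR3-73-00012"))
```

A table like this keeps the command line and the library in step: adding a new object type means adding one function and one dictionary entry.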

24 May 2011

A Search Engine for Fluidinfo

I wrote an extremely simple search front-end for Fluidinfo which you can access at http://abouttag.appspot.com/search.

It is extremely simple. You type one or more search terms into the box and it “searches” Fluidinfo about tags for those terms.

For example, here’s what happens if you type in solitude:

and here’s what happens if you type in marquez book:

Here’s what you need to know:

All this does is turn the search terms into a values query on Fluidinfo that ANDs them together (after white-space stripping). So the query parts for these two searches become
fluiddb/about matches "solitude"
and
fluiddb/about matches "marquez" AND fluiddb/about matches "book"
respectively.
I don’t fully understand Fluidinfo’s string matching, which is based on Lucene, but it is fairly search-engine like. I think the following is true:

case is ignored in matching

punctuation is discarded

only whole-words match

accented characters match themselves (case insensitively) and not their non-accented counterparts, and vice versa. So café matches CAFÉ but not cafe and CAFE matches cafe but not CAFÉ. (This was broken when this was originally posted, but is fixed now.)

If we’re lucky, Manuel (@ceronman) or Esteve (@esteve) might add clarification in the comments, which I will promote to here if appropriate.

Consequences of the above include:

You can’t search on prefixes like film:, because the punctuation is discarded (though you can search on film and it will match things containing film:)

There is no stemming or substring matching, so soli won’t match solitude etc.

At the moment a maximum of 100 results are returned and there is no paging implemented; I plan to add that soon.

Result order is essentially random. If I implement paging, I will probably sort them. My first thought is to sort them as shortest-to-longest, with an alphabetical subsort to break ties. (Comments?)

Various links are returned for each matching object.

The main link points to the raw Fluidinfo object, accessed through /about. This will show you its tags as a JSON dump.

The object’s ID is shown underneath, and that links to the raw object in Fluidinfo, this time through /objects.

Links to both the butterfly and daisy visualizations from http://abouttag.com are provided.

Finally, a link to the object in P A Parent’s Fluidinfo Explorer is given.
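
As a rough illustration of the matching rules listed earlier (case ignored, punctuation discarded, whole words only, accented characters distinct from unaccented ones), here is a small Python model. It only mimics those four stated rules; it is not how Fluidinfo or Lucene actually tokenize text, and the names words and matches_sketch are hypothetical.

```python
import re


def words(text):
    """Split text into a set of lower-cased words, discarding punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def matches_sketch(about, term):
    """True if every word of the term appears as a whole word in the about tag."""
    about_words = words(about)
    return all(word in about_words for word in words(term))


print(matches_sketch("film:casablanca", "film:"))   # punctuation is discarded
print(matches_sketch("solitude", "soli"))           # no substring matching
print(matches_sketch("CAFÉ", "café"))               # case is ignored
print(matches_sketch("CAFÉ", "cafe"))               # accents are significant
```

Under this model the four calls print True, False, True and False respectively, in line with the rules and consequences described above.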

I thought about adding a curl link too, that would show the syntax for accessing the object with curl (cURL, if you prefer), but I couldn’t really think of a neat way of doing it; a link to a one-line page seems over the top and I hate pop-ups. I suppose some kind of javascript manipulation to show the curl text below would be a possibility. Let me know if you would find this useful.

Like the rest of the About Tag site, the application is built on Google’s App Engine. Unfortunately, this implements a time-out after 10 seconds on all HTTP requests, and even more unfortunately, some searches in Fluidinfo take more than 10 seconds. If you see a time-out, that’s probably what’s happening. This is usually because too many results are being returned. Unfortunately, Fluidinfo does not implement any form of paging or limiting of results at the moment, so the only way round this is to write a more specific query that will have fewer results.

For example, at the moment, when I search on book, it consistently times out; if I instead search on book orwell, it consistently works.

There’s not much I can do about this: the Fluidinfo team is working hard on making Fluidinfo faster and is (I believe) actively considering implementing some kind of paging mechanism.

At the moment, only the about tag (fluiddb/about) is searched, (which is, I suppose, appropriate for this blog/site). It would be very easy for me to provide other interfaces. One obvious thing would be to allow the user to select the tag searched, and another would be to allow a full Fluidinfo query to be typed. If there’s interest, I can do these.

If you want to jump straight to results, you can just add a ?q=terms to the end of the search URL (http://abouttag.appspot.com/search). For example, http://abouttag.appspot.com/search?q=george+orwell will reveal what Fluidinfo knows about the great man. Use + to separate search terms in the URL or, if you prefer, use percent encoding.
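
Both halves of this (the ?q= URL and the query the terms are turned into) are easy to sketch in Python with the standard library. The function names search_url and about_query are hypothetical, and this is not the code behind the search page; the conjunction is written as AND simply because that is how the example query earlier in this post writes it.

```python
from urllib.parse import urlencode

SEARCH_URL = "http://abouttag.appspot.com/search"


def search_url(terms):
    """Build a direct-to-results URL; spaces between terms become '+'."""
    cleaned = " ".join(terms.split())  # strip and collapse white space
    return SEARCH_URL + "?" + urlencode({"q": cleaned})


def about_query(terms):
    """Build the query that ANDs together one 'matches' clause per term."""
    clauses = ['fluiddb/about matches "%s"' % term for term in terms.split()]
    return " AND ".join(clauses)


print(search_url("george orwell"))
print(about_query("marquez book"))
```

urlencode also percent-encodes any characters that are not safe in a URL, which covers the percent-encoding alternative mentioned above.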

This was implemented extremely quickly, and has only been tested very briefly. Let me know if you find problems, whether you find it useful, if you’d like any of the other versions etc.

27 April 2011

Terry's Query

Regular readers of this blog will know that I have a special interest in conventions for Fluidinfo tags and values (especially the about tag) that some might consider borderline obsessive. Ironically, much of this focus comes from thinking about the canonical query that Terry (@terrycojones) always used to introduce Fluidinfo with. It usually went something like this:

Find me all the books that Russell (@rustlem) has rated more than 8 that I haven’t read.

This typically gets translated into a query like

rustlem/rating > 8 except terrycojones/has-read
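
Read as set algebra, this query takes the set of objects Russell has rated above 8 and subtracts the set of objects Terry has tagged as read. A minimal Python model of that reading, with invented placeholder data and a hypothetical function name terrys_query, looks like this:

```python
def terrys_query(ratings, has_read, threshold=8):
    """Objects rated above the threshold, except those already read.

    `ratings` maps an object's about tag to a rating; `has_read` is a
    collection of about tags carrying the has-read tag.
    """
    highly_rated = {about for about, rating in ratings.items()
                    if rating > threshold}
    return highly_rated - set(has_read)


# Placeholder data, invented purely for illustration.
ratings = {"object-a": 10, "object-b": 9, "object-c": 6}
has_read = ["object-b"]
print(terrys_query(ratings, has_read))
```

Note that nothing in this model (or in the query) says the objects are books, which is the first of the problems discussed below.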

I would argue

that this isn’t really the right query

that it doesn’t necessarily bring back the information Terry would really want

that for it to work requires some more conventions

Let’s take these in turn.

1. The Wrong Query: Specifying the Kind of Object

At the simplest level, Terry’s English-language query specifies books, but his Fluidinfo query doesn’t. So as a bare minimum, Terry needs to add something to restrict the query to books.

My own favoured approach (as exemplified by both the book-1 and book-u conventions) is to prefix the about tag with book:. In principle, this would allow a query to pull out books only. An approximation would be to use the query:

(fluiddb/about matches "book:" and rustlem/rating > 8)
except terrycojones/has-read

though the way that the match operator works today actually throws away punctuation, so this really just restricts it to about tags containing book, rather than those that contain “book:”. I hope that in time we’ll get some alternative string matching operators so that we could, for example, use a regular expression such as

(fluiddb/about =~ /^book:.*$/ and rustlem/rating > 8)
except terrycojones/has-read

or perhaps an operator such as starts-with:

(fluiddb/about starts-with "book:" and rustlem/rating > 8)
except terrycojones/has-read
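
Until operators like these exist, one workaround is to run the looser matches query and then filter the returned about tags on the client. The Python sketch below shows both a starts-with test and the regular-expression equivalent; the function names are hypothetical and the film about tag in the sample data is invented to show a false match on the word "book".

```python
import re

BOOK_PATTERN = re.compile(r"^book:.*$")


def only_books(about_tags):
    """Keep about tags that start with the literal prefix 'book:'."""
    return [about for about in about_tags if about.startswith("book:")]


def only_books_regex(about_tags):
    """The same filter expressed with the regular expression from the text."""
    return [about for about in about_tags if BOOK_PATTERN.match(about)]


results = [
    "book:white teeth (zadie smith)",
    "film:the jungle book",  # invented example: contains the word 'book'
    "book:nineteen eighty four (george orwell)",
]
print(only_books(results))
```

Both functions drop the film entry, which a punctuation-blind match on book would have let through.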

Of course, there are plenty of other ways we might identify books. These include:

Looking for a tag that somehow indicates the kind of object, if we know such a tag and trust it to be reasonably thorough; for example, I have started an embryonic classification project using the tag njr/index/class. It hasn’t got very far yet, but a lot of Twitter users have their njr/index/class set to “twitter-user” and a few books (ironically, not the ones starting book: yet, but some others) have a class of book. I hope, over time, to expand this to cover most objects with about tags in Fluidinfo.

(I keep wondering whether “kind” would have been a better word than “class”. But if I had chosen “kind”, I suspect I’d be worrying that “class” would have been better. At least I avoided “type”!)
An interesting variation of the above would be to have some kind of group-writable tag for class. I’ve been talking to Terry and some of the other Fluidinfo people about the idea of a wiki user for Fluidinfo. Lots of people would have write access to some or all tags/namespaces under the wiki user. If this happened, the idea would be that when you want to put personal data into Fluidinfo, you would obviously use your own namespace, but if you were wanting to put factual/reference data in, you might do that using the wiki user. This would potentially solve the problem of knowing which tag to look for to find core information like a book’s author and title, a film’s director and year etc. As in Wikipedia, there would be potential for spam and other abuse, but at least in Fluidinfo we have permissions and policies, so we could be permissive in allowing people to write to the wiki namespaces but harsh in removing access for abusers. I’m not totally convinced by this idea, but I can see many virtues in it.

If there were such a wiki user, and it had a class tag, the query might become:
(wiki/class = "book" and rustlem/rating > 8)
except terrycojones/has-read
If we happen to know that Russell chooses to tag his books (only) with a particular tag such as has-read, we can add a clause like that to the query:
(has rustlem/has-read and rustlem/rating > 8)
except terrycojones/has-read
though in practice, it might be more likely that has-read would also be used for things like online articles.
If there were a recognized authoritative book user in Fluidinfo (perhaps isbn.org or amazon.com), we might be able to use the presence of one of its tags as an indicator. For example, if isbn.org were known to tag all (or enough) books with tags including isbn.org/book/title and isbn.org/book/author then we would be able to use something like
(has isbn.org/book/title and rustlem/rating > 8)
except terrycojones/has-read
But this obviously depends on the existence of such an authoritative user.
If we had wildcarding on namespaces, we might also accept something like
(has */book/title and rustlem/rating > 8)
except terrycojones/has-read
which would certainly add to the breadth of results, but might introduce quite a lot of extra noise, especially once the system has a lot of users.

2. Pulling Back the Right Information

The second interesting issue with Terry’s query is how you get the right information back. In the original Fluidinfo API, all the query could return for you is the object ID; you’d then have to browse the object and look at its tags to understand what it is.

The current API offers richer alternatives, allowing objects to be specified by about tag and allowing particular tags to be requested for an object matching a query, including the about tag.

If the about tag itself contains the key information (in the case of a book, the title and author, normalized) then the query can pretty much work just by specifying the objects of interest and requesting the about tag. You can actually do this today. For example, the following fdb command (in which I’ve used njr instead of rustlem, since Russell hasn’t quite got around to rating anything in Fluidinfo yet) actually works:

fdb show -q '(fluiddb/about matches "book" and njr/rating > 8) except has terrycojones/has-read' /about
4 objects matched
Object 8cd6fd0a-40e1-4889-ac3f-2b3dbb6f861d:
  /fluiddb/about = "book:through the looking glass and what alice found there (lewis carroll)"
Object 1c3b1874-0413-4607-97db-74cb9c92dcbf:
  /fluiddb/about = "book:fugitive pieces (anne michels)"
Object c64aeced-1505-4bb3-ab8a-0ce4c6a70ba3:
  /fluiddb/about = "book:white teeth (zadie smith)"
Object a78d77ce-a055-40e3-97a9-de4223858bd8:
  /fluiddb/about = "book:nineteen eighty four (george orwell)"

It should be noted, however, that it works as well as it does partly because all the books I have rated have about tags in my preferred (book-1) convention, so that the title and author are obvious and the class (book) is in the about tag.

I would argue that this is pretty good and meaningful, though of course we would also want information to be stored in separate tags for the title, author and other useful things like year etc. This is true for most, if not all, of the books matched here, but you have to know that the title and author information is under the username miro. Again, trusted authoritative users with known conventions will help here, as might wildcarding on namespaces.

The wiki user would also help. It turns out that three of the four books this query matches have title and author information under the miro/books namespace, so if you add these tags to the fdb request, you get most of what you want.

fdb show -q '(fluiddb/about matches "book" and njr/rating > 8) except has terrycojones/has-read' /about /miro/books/title /miro/books/author
4 objects matched

Object 8cd6fd0a-40e1-4889-ac3f-2b3dbb6f861d:
  /fluiddb/about = "book:through the looking glass and what alice found there (lewis carroll)"
  /miro/books/title = "Through the Looking-Glass, and What Alice Found There"
  /miro/books/author = "Lewis Carroll"

Object 1c3b1874-0413-4607-97db-74cb9c92dcbf:
  /fluiddb/about = "book:fugitive pieces (anne michels)"
  (tag /miro/books/title not present)
  (tag /miro/books/author not present)

Object c64aeced-1505-4bb3-ab8a-0ce4c6a70ba3:
  /fluiddb/about = "book:white teeth (zadie smith)"
  /miro/books/title = "White Teeth"
  /miro/books/author = "Zadie Smith"

Object a78d77ce-a055-40e3-97a9-de4223858bd8:
  /fluiddb/about = "book:nineteen eighty four (george orwell)"
  /miro/books/title = "Nineteen Eighty-four"
  /miro/books/author = "George Orwell"
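
If a listing like the one above is going to be consumed by a script rather than read by eye, it is easy enough to turn it into a dictionary. The following is a minimal Python sketch that assumes exactly the text layout shown above and string-valued tags only; it is not part of fdb, and parse_fdb_show is a name I have made up for illustration. Tags reported as not present are stored as None.

```python
import re

OBJECT_RE = re.compile(r'^Object ([0-9a-f-]+):$')
VALUE_RE = re.compile(r'^\s+(/\S+) = "(.*)"$')
MISSING_RE = re.compile(r'^\s+\(tag (/\S+) not present\)$')


def parse_fdb_show(text):
    """Map each object ID to a dict of {tag path: value or None}."""
    objects = {}
    current = None
    for line in text.splitlines():
        m = OBJECT_RE.match(line)
        if m:
            current = objects.setdefault(m.group(1), {})
            continue
        if current is None:
            continue  # still in the preamble ("N objects matched" etc.)
        m = VALUE_RE.match(line)
        if m:
            current[m.group(1)] = m.group(2)
            continue
        m = MISSING_RE.match(line)
        if m:
            current[m.group(1)] = None  # tag explicitly reported as absent
    return objects


sample = '''4 objects matched

Object 1c3b1874-0413-4607-97db-74cb9c92dcbf:
  /fluiddb/about = "book:fugitive pieces (anne michels)"
  (tag /miro/books/title not present)

Object c64aeced-1505-4bb3-ab8a-0ce4c6a70ba3:
  /fluiddb/about = "book:white teeth (zadie smith)"
  /miro/books/title = "White Teeth"
'''

books = parse_fdb_show(sample)
for object_id, tags in books.items():
    print(object_id, tags.get('/miro/books/title'))
```

With the result in this form, falling back to the about tag when a title tag is missing is a one-line job.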

If the wiki user existed, this might be even more natural:

fdb show -q '(wiki/class matches "book" and njr/rating > 8) except has terrycojones/has-read' /about /wiki/title /wiki/author
4 objects matched

Object 8cd6fd0a-40e1-4889-ac3f-2b3dbb6f861d:
  /fluiddb/about = "book:through the looking glass and what alice found there (lewis carroll)"
  /wiki/title = "Through the Looking-Glass, and What Alice Found There"
  /wiki/author = "Lewis Carroll"

Object 1c3b1874-0413-4607-97db-74cb9c92dcbf:
  /fluiddb/about = "book:fugitive pieces (anne michels)"
  /wiki/title = "Fugitive Pieces"
  /wiki/author = "Anne Michels"

Object c64aeced-1505-4bb3-ab8a-0ce4c6a70ba3:
  /fluiddb/about = "book:white teeth (zadie smith)"
  /wiki/title = "White Teeth"
  /wiki/author = "Zadie Smith"

Object a78d77ce-a055-40e3-97a9-de4223858bd8:
  /fluiddb/about = "book:nineteen eighty four (george orwell)"
  /wiki/title = "Nineteen Eighty-four"
  /wiki/author = "George Orwell"

(Note: the above query does not work; there is no wiki user at present.)

3. Tag Conventions¶

As well as the considerations around object class and pulling back the desired information, there is a final requirement of knowing what tags actually to include in the query, and what they mean. I’ve written about this before, but will recap briefly.

Essentially, we need to know

what the tag is called

what range of values it uses and which end of the scale is better.

For a particular friend, this is not too hard to find out, but it is much easier, and much more useful, if essentially everyone uses the same conventions. If they do, we can imagine that a future API might support queries like

(fluiddb/about starts-with "book:" and */rating > 8)
except terrycojones/has-read

or perhaps

(fluiddb/about starts-with "book:" and [njr, rustlem, ntoll]/rating > 8)
except terrycojones/has-read

or even

(fluiddb/about starts-with "book:" and mean(*/rating) > 8)
except terrycojones/has-read

if we simply trust that everyone is using a 0–10 scale for ratings. If we are worried about individuals skewing the system by using ratings on a 0–1,000,000 scale, we can even imagine filtering those out by adding some query complexity, though I concede this will always be somewhat painful.
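
To make that last point concrete, here is a small Python sketch of what such a filtered mean might compute for a single object. It runs purely on a local dictionary; nothing like mean(*/rating) exists in the Fluidinfo query language today, and the function name, the sample users and the 0–10 bounds are all assumptions for illustration.

```python
def mean_rating(ratings, low=0, high=10):
    """Mean of the ratings lying in [low, high]; None if there are none.

    `ratings` maps a username to that user's rating for one object.
    Values outside the assumed 0-10 scale are ignored, not rescaled.
    """
    in_range = [r for r in ratings.values() if low <= r <= high]
    if not in_range:
        return None
    return sum(in_range) / len(in_range)


# Hypothetical ratings for one book; 'skewer' is on a 0-1,000,000 scale.
ratings = {'njr': 9, 'rustlem': 10, 'ntoll': 8, 'skewer': 950000}

average = mean_rating(ratings)
print(average, average > 8)
```

Part of the pain is that a filter like this cannot spot a user on a bigger scale whose rating happens to fall between 0 and 10.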

The Challenge¶

It’s the same old themes, but here I’m trying to illustrate just how directly all the Fluidinfo themes I keep harping on about relate to some of the most fundamental motivations for building this system in the first place. For those who believe all the stuff about conventions is a distraction that doesn’t matter, my challenge is: how else are we going to allow Terry’s query actually to work in Fluidinfo?

One possible answer lies within applications. Clearly, if a FluidBook application appears, that has its own conventions (whatever they may be) for storing information in Fluidinfo, it may be easy to get consistency within the data from that particular application. But Terry’s real dream involves allowing users to choose which applications to use and for data to be shared seamlessly across those applications. Again, it’s obvious that a few applications can agree conventions among themselves, but the most universal way of supporting interoperability is if conventions just exist at the level of Fluidinfo itself.

A wiki user might also provide another way forward.

05 April 2011

Pretty Good Uniqueness

Software developers are neurotic about uniqueness: no two files may share the same path, no two users the same ID. That’s probably good: we like money and email to go to the right person.

Over in the Real World™, people are more relaxed. We tolerate quite a lot of ambiguity, relying partly on context to remove it, and partly on clarification when necessary–“Paris, France, not Paris, Texas”. We even tolerate a certain level of confusion and error as a reasonable price to pay for not always having to refer to each other by National Insurance number.

Terry Jones (not the Python, nor the Qu’ran burning pastor, but @terrycojones, the unorthodox visionary behind Fluidinfo) frequently says that he wants to make working with information in computers more like working with information in the Real World™. It’s a useful goal.

Almost from the first moment I heard about Fluidinfo, with its model of information sharing based on tagging common objects, I’ve been interested in (some might say obsessed with) the question of how to map Real-World™ objects and concepts (like Paris, Animal Farm, The Eiffel Tower, Existential Philosophy and the ring on my finger) to Fluidinfo objects, romantically identified, as they are, by 128-bit integers (hubristically so-called ‘universally unique identifiers’ [UUIDs]) such as 6387ab3f-e3d5-4ca9-bd13-ae3ffd9c1830.

Fluidinfo’s about tag (fluiddb/about, to give it its full name) was created specifically to make it easier to decide where to put information in Fluidinfo. Every object in Fluidinfo, when it’s created, can optionally have this about tag set to a unicode string and Fluidinfo guarantees that about tags are unique, i.e. that no two different objects will ever share an about tag. As a result, you can directly address objects in Fluidinfo by specifying an about tag. For example, http://fluiddb.fluidinfo.com/about/Paris is the URL for the Fluidinfo object with the about tag “Paris” (UUID 17ecdfbc-c148-41d3-b898-0b5396ebe6cc, since you ask).

Fluidinfo, by Terry’s very specific design, does not force anyone to use about tags in any specific way. Any Fluidinfo user can attach any information to any Fluidinfo object she likes. If user jacqui decides to attach information about Paris, Texas to the Paris object above, and gemma chooses to use it to store information about Paris, France, that is entirely fine. It’s even fine if Fluidinfo user anarchist decides to store information about Birmingham (Alabama), or existential philosophy, or her entire record collection on the same object. There will be no one from Fluidinfo complaining or banning or undoing (though it’s possible that those with acute hearing may perceive a quiet “tsk, tsk” sound emanating from the author of this blog).

I believe, however, that most Fluidinfo users will want there to be conventions for about tags that will encourage information about the same thing to be stored on a well-defined common object, and for information about different things to be stored on different objects. Of course, we won’t always get those conventions right first time, and they will evolve over time, but my feeling is that a few hours of thinking can avoid many, many hours of trial and error. The question is: what should those conventions be?

My feeling is that what we need to aim for is “pretty good uniqueness”, a concept that might be compared loosely to “pretty good privacy” or “probably approximately correct” learning. I don’t have a formal definition, nor even a very good rule of thumb, but I think we need to aim for a set of about tag conventions that are easy to use and which mean that collisions are very rare, but that we should not aim for absolute uniqueness, as to do so would lead inexorably to conventions that are much less appealing to humans. In other words, we should aim to make about tag conventions lie in a sweet spot somewhere between the computer programmer’s “absolute, guaranteed, uniqueness in all circumstances” and the Real-World™, human-style “let’s not worry about it too much and just deal with collisions when they occur”.

The nearest I have to a rule of thumb is that when you’re uploading a reasonably large quantity of data to Fluidinfo (say, some tens of thousands of objects), most of the time, you should not encounter a conflict. I’m not sure how to quantify this. If 1% of items have conflicting about tags, I’m pretty clear that this is much too high a collision rate. And I’m pretty clear that 1-in-a-billion is OK. My guess is that it is probably good enough to aim for collision rates below about 1-in-a-million. But that’s just a feeling.
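
To put rough numbers on this, the expected number of conflicts in an upload is just the number of objects multiplied by the collision rate (treating each object as an independent trial, which is itself a simplifying assumption). The sketch below does the arithmetic for an illustrative upload of 50,000 objects; the figure of 50,000 is mine, chosen only to stand in for “some tens of thousands”.

```python
def expected_collisions(n_objects, collision_rate):
    """Expected number of conflicting about tags in one upload."""
    return n_objects * collision_rate


n = 50_000  # illustrative stand-in for "some tens of thousands of objects"
rates = [('1%', 1e-2), ('1-in-a-million', 1e-6), ('1-in-a-billion', 1e-9)]
for label, rate in rates:
    print(f'{label:>15}: {expected_collisions(n, rate):g} expected conflicts')
```

At 1% that is about 500 conflicts per upload, which is clearly hopeless. At 1-in-a-million it is 0.05, or roughly one conflict in every twenty such uploads, which fits the “most of the time” test.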

This can be made more concrete with some examples. One convention I suggested that seems to be being used quite widely and successfully is for books (as works, rather than individual editions, printings etc.). The basic form of this is to combine a ‘book:’ prefix with a normalized title and author. The normalization aims to remove ambiguity with case, punctuation etc., to make it more likely that different people will arrive at the same about tag, without significantly affecting uniqueness or legibility. So an example about tag for a book is:

book:nineteen eighty four (george orwell)

Notice that the (troublesome) hyphen that we would normally include when writing “nineteen eighty-four” has been removed, as have capitals (there’s a library available to do the standardization, which can be used in Python or online).

[The original version of the convention (book-1) also removed all accents from letters in an effort to reduce further the likelihood of minor variations; however, when Nicholas Tollervey (@ntoll) and Terry started publishing large volumes of book data that included some non-European names it became clear that this convention sometimes went a normalization too far, so the (so-far undocumented) book-u variant convention was born, in which letters are mapped to lower case, but accents are preserved. (This is supported in the python library, but not yet in the web app.)]
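
For readers who want a feel for what the normalization involves, here is a deliberately simplified Python sketch. It is not the code in the library mentioned above, and it covers only the rules described here (lower case, accents stripped or preserved, punctuation removed, spacing regularized). The function names and the preserve_accents flag are my own inventions for illustration, as is the Victor Hugo example.

```python
import re
import unicodedata


def strip_accents(text):
    """Remove combining marks, so that 'é' becomes 'e'."""
    decomposed = unicodedata.normalize('NFKD', text)
    return ''.join(c for c in decomposed if not unicodedata.combining(c))


def normalize(text, preserve_accents=False):
    """Simplified NACO-like normalization (an approximation only)."""
    text = text.lower()
    if preserve_accents:
        text = unicodedata.normalize('NFC', text)  # keep accented letters whole
    else:
        text = strip_accents(text)
    text = re.sub(r"['’‘]", '', text)       # apostrophes vanish: don't -> dont
    text = re.sub(r'[^\w\s&]', ' ', text)      # other punctuation becomes a space
    return ' '.join(text.split())               # regularize spacing


def book_about(title, author, preserve_accents=False):
    """Build a 'book:title (author)' about tag."""
    return 'book:%s (%s)' % (normalize(title, preserve_accents),
                             normalize(author, preserve_accents))


print(book_about('Nineteen Eighty-Four', 'George Orwell'))
print(book_about('Les Misérables', 'Victor Hugo'))                         # book-1 style
print(book_about('Les Misérables', 'Victor Hugo', preserve_accents=True))  # book-u style
```

The only difference between the two styles in this sketch is whether the accent-stripping step runs, which is why a single flag is enough.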

These conventions for about tags for books seem to me to hit the sweet spot I was talking about. Book titles, alone, are definitely not sufficiently unique in two different respects: first, it is not uncommon for different authors to write books with the same title; secondly, book titles (alone) are frequently shared with other (non-book) items, like films, people, places etc. However, by combining a prefix (book:) that specifies the class of object, together with the title and the author (all normalized), we get something that feels, for practical purposes, like pretty good uniqueness. I would be surprised if there are not examples of pairs of books that share both author and title, but I suspect those are so rare that they will cause us little trouble and (personally) I feel quite content to do some ad hoc disambiguation to handle those cases.

Indeed, the pattern of a class prefix, a main identifier, and a disambiguator, feels like a useful pattern for many kinds of Real-World™ entities to me. I’ve been discussing films, for example, with Michael Hawkes, in the comments on another blog post, and there it seems that using either film:title (year) or film:title (director) will probably work well. Again, there might be cases in which two directors sharing a name produce films of the same name, or in which two films of the same name are produced in the same year, but these seem likely to be so rare that ad hoc disambiguation of those cases might be acceptable. It is also, of course, not a coincidence that in the real world films are often identified by title and year or title and director. Michael and I both lean toward year as probably the better disambiguator, so I suspect I will soon be proposing film:title (year) as a convention; though American readers might prefer a “movie:” prefix.

For me, the other great virtue of this style of about tag is that it is very easy to construct the canonical about tag using only information that the user might reasonably expect to have at hand, rather than depending on some kind of external lookup. To labour the point, if I want to tag a book, I probably know the title and author, and can certainly find that information in the book. With a film, I concede, it would be less unusual to know the title but not the year or director, but even there, this data is easily available from multiple sources, crucially including from the film itself.

Perhaps unsurprisingly, there are those who feel that the whole notion of trying to organize, specify, or guide conventions is objectionably authoritarian and/or pointless, and that it would be much better simply to see what emerges organically. (Terry has been known to accuse me of “fascist librarian” tendencies, though I’m sure he means it in the nicest possible way.) Terry and I both studied so-called genetic algorithms, in which evolutionary processes are simulated on computers to tackle search and optimization tasks, and we are both impressed with the power of evolutionary mechanisms. I, however, fear that Fluidinfo doesn’t have the luxury of evolutionary timescales to succeed, and therefore tend to favour trying to help evolution along a little. If you don’t, just ignore all this, do your own thing, and pay no attention to the annoying tsking from Scotland.

19 January 2011

The Music of FluidDB I: Albums, Tracks and Songs

I have been thinking for a while about what conventions to use for tagging various kinds of musical entities in FluidDB. The kinds of things I have in mind include recordings of music, pieces of music (compositions), artists and composers. My firmest conclusion so far is that it’s complicated and I can’t tackle it all in one go.

In particular, classical music feels very complicated to me, with a common situation for a classical “record” being recordings of several pieces with somewhat variable names, often by different composers, being played often by a somewhat fluid and ambiguous collection of musicians.

In this post, therefore, I’m going to try to tackle what feels like a simpler problem by restricting myself to considering non-classical music and three kinds of entities—albums, tracks and songs.

My basic suggestion is to adopt conventions very similar to those I have been championing for books, in the form of the book-1 convention.

Books (Recap)¶

Recall that the book-1 convention for about tags for books in English has the following basic components:

the prefix book:

the title of the book, normalized using NACO-like conventions, which standardize to lower case, remove most punctuation and accents and regularize spacing;

the author, again normalized in a NACO-like manner, in parentheses.

For example, Alice in Wonderland, by Lewis Carroll, uses the about tag

book:alice in wonderland (lewis carroll)

So far this convention seems to have worked quite well. Its virtues include:

it is simple to construct with only easily available information (the stuff you can see if you have the book or a normal reference to it)

it is unique for almost all books

it is clearly identified as a book (and thus disambiguated from a film, for example).

The next stage beyond a single-author book is multi-author books, and there the convention is simply to list the authors, in the order they appear on the book, separated by semicolons. For example, The Feynman Lectures on Physics, by Richard P. Feynman, Robert B. Leighton and Matthew Sands uses the about tag:

book:the feynman lectures on physics (richard p feynman; robert b leighton; matthew sands)

Albums, Tracks and Songs¶

Recorded non-classical music consists primarily of albums—a named collection of tracks, normally purchased together—and individual tracks, sometimes known as singles or songs.

At the simplest level, the conventions I am going to propose for about tags for albums and tracks are very similar to those for books but using the prefixes album: and track:. So the album, The Dark Side of the Moon, by Pink Floyd, is

album:the dark side of the moon (pink floyd)

and the track The Great Gig in the Sky, from that same album, is

track:the great gig in the sky (pink floyd)

But there are a number of points to discuss.

Albums¶

The suggested about tag for albums is fairly straightforward. The main complication/ambiguity I can see concerns multi-volume sets. So, on vinyl, for example, Neil Young’s Decade has three disks; and it is a double CD. This is quite an easy case: I think we ignore the ‘disk’ number entirely and just regard double and triple albums as albums. So all of Decade is:

album:decade (neil young)

For multi-volume collections that are normally sold separately, simply include the volume number. So, for example, The Tatum Group Masterpieces Volume 1, by Art Tatum, Benny Carter, Louis Bellson, becomes

album:the tatum group masterpieces volume 1 (art tatum; benny carter; louis bellson)

The NACO-like normalization conventions were described in this post and are implemented in the abouttag library.

The handling of artists is in principle quite simple, though in practice slightly hard to automate completely. My suggestion is that whenever there is a list of musicians, as with authors, they are simply separated with semicolons (and a space); any ampersands or ands are removed. In the case of groups, the group name is simply used. The interesting and slightly troubling cases are those where a group combines with a person. The most common case of this is exemplified by Diana Ross and the Supremes. My suggestion is that such cases are left intact, other than normalization, using ‘and’ rather than ampersand (&). So the album “Reflections” becomes

album:reflections (diana ross and the supremes)

There are probably awkward corner cases, but I think this handles most.
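
The rules above can be sketched in Python as follows. This is a simplified illustration and not the abouttag library’s code: the normalize function only approximates the NACO-like rules, and album_about and normalize_artist are names I have made up. Note that the artists argument is assumed to have been split by a human already, with one entry per person or a single entry for a group.

```python
import re
import unicodedata


def normalize(text):
    """Simplified NACO-like normalization (an approximation only)."""
    text = unicodedata.normalize('NFKD', text.lower())
    text = ''.join(c for c in text if not unicodedata.combining(c))
    text = re.sub(r"['’‘]", '', text)       # drop apostrophes
    text = re.sub(r'[^\w\s&]', ' ', text)      # other punctuation -> space
    return ' '.join(text.split())               # regularize spacing


def normalize_artist(name):
    """Normalize one artist or group, writing '&' as 'and'."""
    return normalize(name.replace('&', ' and '))


def album_about(title, artists):
    """Build an 'album:title (artist; artist)' about tag.

    `artists` is a list that has already been split by hand: one entry
    per person, or a single entry for a group or a person-plus-group.
    """
    names = '; '.join(normalize_artist(a) for a in artists)
    return 'album:%s (%s)' % (normalize(title), names)


print(album_about('Reflections', ['Diana Ross & the Supremes']))
print(album_about('A Matter Of Time', ['Gordon Giltrap', 'Martin Taylor']))
print(album_about('Musiques / Solilaï', ['Pierre Bensusan']))
```

Leaving the splitting to the caller is deliberate: deciding whether a credit is one group or several people is exactly the part that resists automation.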

The biggest problem I foresee is that it will be hard to automate the construction of the standard form of an artist from something like iTunes metadata because the input (from Gracenote) doesn’t separate out a list of artists in any remotely consistent way, so I think standardizing them will require a degree of human intervention. This is not, however, in any way particular to this suggested convention; it’s fundamentally to do with the fact that some artists are identified as a list of people, and others have a group name, and telling these apart is hard, even without complications such as the band Alice Cooper!

Here are a few examples of the sorts of album about tags I’m suggesting:

The Black Balloon, by John Renbourn album:the black balloon (john renbourn)

The Composer, by Thelonious Monk album:the composer (thelonious monk)

Fleetwood Mac, by Fleetwood Mac album:fleetwood mac (fleetwood mac)

Wu Wei, by Pierre Bensusan album:wu wei (pierre bensusan)

The Tatum Group Masterpieces Volume 1, by Art Tatum, Benny Carter, Louis Bellson album:the tatum group masterpieces volume 1 (art tatum; benny carter; louis bellson)

Ms. Right, by Duck Baker album:ms right (duck baker)

‘Round About Midnight, by The Miles Davis Quintet album:round about midnight (the miles davis quintet)

A Matter Of Time, by Gordon Giltrap & Martin Taylor album:a matter of time (gordon giltrap; martin taylor)

Musiques / Solilaï, by Pierre Bensusan album:musiques solilai (pierre bensusan)

Live Au New Morning, by Bensusan & Malherbe album:live au new morning (bensusan; malherbe)

Eye To The Telescope, by KT Tunstall album:eye to the telescope (k t tunstall)

Grace & Danger, by John Martyn album:grace & danger (john martyn)

Alas, I Cannot Swim, by Laura Marling album:alas i cannot swim (laura marling)

Lady In Autumn: The Best Of The Verve Years, by Billie Holiday album:lady in autumn the best of the verve years (billie holiday)

Tracks¶

I was originally minded to suggest using song: as the prefix for individual album tracks, notwithstanding the fact that this is slightly inappropriate for instrumental pieces. This was until I realised that we will certainly want to have entries for songs themselves (independent of artist) in FluidDB. Given this, I think we have little choice but to fall back to track, which is perhaps more appropriate anyway.

I think there are a couple of points to be made about tracks. The first is that I do not propose to tie them to albums. Thus if an artist records a track (piece/song), I suggest that in the common case we don’t distinguish between different records. When you talk about Billie Holiday’s recording of God Bless the Child, you actually talk about all her records of that song, in the general case.

track:god bless the child (billie holiday)

Similarly, if, as is quite common, a track is qualified by (live) or [live], I suggest that be omitted in the standard case.

The other reasonably common complication, particularly for folk music, is the medley. In this case, my suggestion is just to hand the track name to the NACO-like normalization routine and use what it produces. In most cases, this works fine.
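
Here is a minimal Python sketch of those two rules: dropping a (live) or [live] qualifier, and passing medleys straight through normalization. As before, normalize is only my approximation of the NACO-like routine, not the abouttag library’s code, and track_about and LIVE_RE are hypothetical names.

```python
import re
import unicodedata

# Matches a "(live)" or "[live]" qualifier, in any mix of upper and lower case.
LIVE_RE = re.compile(r'\s*[\(\[]\s*live\s*[\)\]]', re.IGNORECASE)


def normalize(text):
    """Simplified NACO-like normalization (an approximation only)."""
    text = unicodedata.normalize('NFKD', text.lower())
    text = ''.join(c for c in text if not unicodedata.combining(c))
    text = re.sub(r"['’‘]", '', text)       # drop apostrophes
    text = re.sub(r'[^\w\s&]', ' ', text)      # other punctuation -> space
    return ' '.join(text.split())               # regularize spacing


def track_about(title, artists):
    """Build a 'track:title (artist; artist)' about tag, ignoring 'live'."""
    title = LIVE_RE.sub('', title)
    names = '; '.join(normalize(a.replace('&', ' and ')) for a in artists)
    return 'track:%s (%s)' % (normalize(title), names)


print(track_about('God Bless the Child [Live]', ['Billie Holiday']))
print(track_about('Medley: The Mist Covered Mountains of Home / The Orphan / Tarboulton',
                  ['John Renbourn']))
```

Because the album never enters into it, every recording an artist makes of a piece lands on the same object.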

To try to illustrate lots of common cases, here is a fairly long list of examples:

Rhythm-a-Ning, by Thelonious Monk track:rhythm a ning (thelonious monk)

Round Midnight, by Thelonious Monk track:round midnight (thelonious monk)

Straight, No Chaser, by Thelonious Monk track:straight no chaser (thelonious monk)

Bourrée I and II, by John Renbourn track:bourree i and ii (john renbourn)

Medley: The Mist Covered Mountains of Home / The Orphan / Tarboulton, by John Renbourn track:medley the mist covered mountains of home the orphan tarboulton (john renbourn)

Monday Morning, by Fleetwood Mac track:monday morning (fleetwood mac)

Poussière d’Amants, by Pierre Bensusan track:poussiere damants (pierre bensusan)

Doherty’s - Return to Milltown - Tommy People’s, by Tony McManus track:dohertys return to milltown tommy peoples (tony mcmanus)

Jackie Coleman’s - The Milliner’s Daughter - Rakish Paddy - Connor Dunn’s, by Tony McManus track:jackie colemans the milliners daughter rakish paddy connor dunns (tony mcmanus)

Blues in C, by Art Tatum, Benny Carter, Louis Bellson track:blues in c (art tatum; benny carter; louis bellson)

S’Wonderful, by Art Tatum, Benny Carter, Louis Bellson track:swonderful (art tatum; benny carter; louis bellson)

Makin’ Whoopee, by Art Tatum, Benny Carter, Louis Bellson track:makin whoopee (art tatum; benny carter; louis bellson)

(I’m Left With the) Blues in my Heart, by Art Tatum, Benny Carter, Louis Bellson track:im left with the blues in my heart (art tatum; benny carter; louis bellson)

The Nine Maidens a. Clarsach b. The Nine Maidens c. The Fiddler, by John Renbourn track:the nine maidens a clarsach b the nine maidens c the fiddler (john renbourn)

Ms. Right, by Duck Baker track:ms right (duck baker)

‘Round Midnight, by The Miles Davis Quintet track:round midnight (the miles davis quintet)

Ah-Leu-Cha, by The Miles Davis Quintet track:ah leu cha (the miles davis quintet)

Across The Pond, by Gordon Giltrap & Martin Taylor track:across the pond (gordon giltrap; martin taylor)

G & T Blues, by Gordon Giltrap & Martin Taylor track:g & t blues (gordon giltrap; martin taylor)

Abide With Me / Old Gloryland, by Stefan Grossman & John Renbourn track:abide with me old gloryland (stefan grossman; john renbourn)

Badhra, by Anouar Brahem, John Surman, Dave Holland track:badhra (anouar brahem; john surman; dave holland)

Biodag Aig Mac Thomais/The Nine Pint Coggie/The Spike Island Lasses, by Tony McManus track:biodag aig mac thomais the nine pint coggie the spike island lasses (tony mcmanus)

Three Pieces By O’Carolan;The Lamentation Of Owen Roe O’Neill; Lord Inchiquin; Mrs Power (O’Carlan’s Concerto), by John Renbourn track:three pieces by ocarolan the lamentation of owen roe oneill lord inchiquin mrs power ocarlans concerto (john renbourn)

Heman Dubh, by Pierre Bensusan track:heman dubh (pierre bensusan)

Le Voyage pour L’Irelande, by Pierre Bensusan track:le voyage pour lirelande (pierre bensusan)

50 Ways To Leave Your Lover, by Paul Simon track:50 ways to leave your lover (paul simon)

La Danse Du Capricorne 1, by Pierre Bensusan track:la danse du capricorne 1 (pierre bensusan)

Reels - “The Pure Drop”/”The Flax In Bloom”, by Pierre Bensusan track:reels "the pure drop" "the flax in bloom" (pierre bensusan)

Mille Vallées, by Bensusan & Malherbe track:mille vallees (bensusan; malherbe)

Bamboo Shoot (Improvisation), by Bensusan & Malherbe track:bamboo shoot improvisation (bensusan; malherbe)

Black Horse And The Cherry Tree, by KT Tunstall track:black horse and the cherry tree (k t tunstall)

Universe & U, by KT Tunstall track:universe & u (k t tunstall)

Sigmund Freud’s Impersonation Of Albert Einstein In America, by Randy Newman track:sigmund freuds impersonation of albert einstein in america (randy newman)

Mr. President (Have Pity On The Working Man), by Randy Newman track:mr president have pity on the working man (randy newman)

I Love L.A., by Randy Newman track:i love l a (randy newman)

The Blues, by Randy Newman track:the blues (randy newman)

Through-Us-All, by Isaac Guillory track:through us all (isaac guillory)

A Terrible Pickle, by Dean Friedman track:a terrible pickle (dean friedman)

Money, by Pink Floyd track:money (pink floyd)

Take Five, by Dave Brubeck Quartet track:take five (dave brubeck quartet)

Pirates (So Long Lonely Avenue), by Rickie Lee Jones track:pirates so long lonely avenue (rickie lee jones)

The Returns, by Rickie Lee Jones track:the returns (rickie lee jones)

Chuck E’s In Love, by Rickie Lee Jones track:chuck es in love (rickie lee jones)

Harry’s House/Centerpiece, by Joni Mitchell track:harrys house centerpiece (joni mitchell)

I’s A Muggin’ (Rap), by Joni Mitchell track:is a muggin rap (joni mitchell)

Miles Beyond, by Mahavishnu Orchestra track:miles beyond (mahavishnu orchestra)

A Surfer Courted Me, by Martha Tilston and the Woods track:a surfer courted me (martha tilston and the woods)

Lookin’ On, by John Martyn track:lookin on (john martyn)

The Captain And The Hourglass, by Laura Marling track:the captain and the hourglass (laura marling)

Le Chien Sur Les Genoux de la Devineresse, by Anouar Brahem, Barbaros erkose, Kudsi Erguner & Lassad Hosni track:le chien sur les genoux de la devineresse (anouar brahem; barbaros erkose; kudsi erguner; lassad hosni)

A Prayer, by Madeleine Peyroux track:a prayer (madeleine peyroux)

Was I?, by Madeleine Peyroux track:was i (madeleine peyroux)

(I Got A Man Crazy For Me) He’s Funny That Way, by Billie Holiday track:i got a man crazy for me hes funny that way (billie holiday)

Lover Man (Oh, Where Can You Be?), by Billie Holiday track:lover man oh where can you be (billie holiday)

St. Louis Blues, by Billie Holiday track:st louis blues (billie holiday)

Songs¶

[UPDATE 2011/01/19: I have modified this recommendation since it was first posted, after thinking more about the lack of consistency in how composers are identified.]

I have given less thought to songs (as distinct from tracks, or recordings of songs), but the obvious convention would seem to be to use the song: prefix, followed by the normalized song title, followed by the composer or composers in brackets, again in whatever order they are normally listed. The only real complication I can see there is the fairly common case in which music and lyrics are given separate credits. In that case, I think I suggest simply listing the music composer ahead of the lyrics composer.

The slightly subtle question concerns how to standardize the composer’s name. In the case of artists (and authors) my normal recommendation is to start from the name as it appears on the work, so John Martyn, J. D. Salinger etc. This works well because you just have to look at the work to see how it is written; and for this reason, there’s a well-defined, standard place to look (the work).

Composers are more awkward, because it is much less clear where to look. If you own a record, the easy thing to do is to look at the sleeve, or the liner notes, or sometimes on the record (or CD) itself. But the same song can be recorded many times and the composer won’t always be displayed consistently. You could also look at the sheet music. Or in Wikipedia. In short, there is no consistency. A quick look through the first half dozen makes it clear there’s not even consistency on a single CD in many cases.

In this case, therefore, my recommendation is to use surnames only. So in a simple case, Summertime by George Gershwin, is

song:summertime (gershwin)

The Lennon/McCartney partnership would produce, for example

song:hey jude (lennon; mccartney)

A case in which lyrics and music are credited separately would be Officer Krupke, from West Side Story, by Leonard Bernstein (music) and Stephen Sondheim (lyrics). So this would be:

song:officer krupke (bernstein; sondheim)

The reason I’ve gone for surname only is that it seems to involve very little loss of precision (it will be rare indeed for two songs with the same title to have different composers with the same surname but different forenames), and to use the smallest amount of information that is commonly available. I think this is probably a fairly good convention.
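
As a rough illustration, the surname-only convention might be sketched in Python like this. Nothing here comes from the abouttag library: normalize is my simplified stand-in for the NACO-like rules, song_about is a made-up name, and taking the last word of a name as the surname is a naive assumption that will fail for multi-word surnames.

```python
import re
import unicodedata


def normalize(text):
    """Simplified NACO-like normalization (an approximation only)."""
    text = unicodedata.normalize('NFKD', text.lower())
    text = ''.join(c for c in text if not unicodedata.combining(c))
    text = re.sub(r"['’‘]", '', text)       # drop apostrophes
    text = re.sub(r'[^\w\s&]', ' ', text)      # other punctuation -> space
    return ' '.join(text.split())               # regularize spacing


def surname(name):
    """Naive surname: the last word of the name (assumption, see above)."""
    return name.split()[-1]


def song_about(title, composers):
    """Build a 'song:title (surname; surname)' about tag.

    `composers` should be in the order they are normally listed, with
    the music composer ahead of the lyricist when credited separately.
    """
    names = '; '.join(normalize(surname(c)) for c in composers)
    return 'song:%s (%s)' % (normalize(title), names)


print(song_about('Summertime', ['George Gershwin']))
print(song_about('Hey Jude', ['John Lennon', 'Paul McCartney']))
```

The ordering is left to the caller because only a person looking at the credits can tell who wrote the music and who wrote the words.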

Comments Invited¶

As ever, I’d be interested in thoughts from anyone, in the blog comments or directly. I haven’t pushed an updated version of the abouttag library containing these to github yet, but will probably do so in a few days unless there is significant push-back.