Comments on About Tag: The Guardian 1000 Novels Everyone Must Read

The "canonical" (persistent) identifier ...

2010-01-28T21:13:43.857+00:00

The "canonical" (persistent) identifier in the publishing industry is increasingly the Digital Object Identifier (DOI); see http://doi.org

The fascinating thing for me about FluidDB is that a few of us active in the DOI/Handle System community over the last decade++ have imagined associating metadata with objects in this fashion, but the infrastructure has been too heavy. With FluidDB, it isn't!

How does FluidDB help with the recommendation side...

2009-09-14T18:26:47.494+01:00

How does FluidDB help with the recommendation side of this ("which of your friends seem to like the same novels as you")? The article's great as far as it goes, but I'm intrigued how "this is exactly the kind of thing FluidDB was built for" is the case, and how it could help me to answer similar questions.

Yup, I encountered that "abstract tag" c...

2009-08-31T02:42:04.151+01:00

Yup, I encountered that "abstract tag" construct right after posting my previous comment (I'm bitsucker, btw). I agree that this would be a natural place to identify the datatype for a tag.

But I still think this just begs the question: if different apps design their own mechanisms for describing their data-types, third-party mashups will be forced to develop and maintain "interop glue" for each new convention.

Perhaps this is just me leaning towards the "librarian end" myself, but my intuition is that some sort of standard for schema metadata would be hugely beneficial. Whether it is designed into the DB itself, or adopted by convention "after market," someone will need to establish a canonical way to encode datatypes -- to say, for example, "this is a value from 0-10 indicating the owner's opinion of the quality of the tagged object." Without this, the data will necessarily be unstructured soup.

If you agree with that, then I further suggest that it'd be preferable to at least offer a suggested "canonical encoding" for this metadata. Even an RFC with a starting proposal for this could help to herd the community towards an eventual standard. You have the opportunity now, in the beginning, to guide the users in how they construct and annotate their data. If you wait and hope that the community will converge on its own standard, it may be much less likely to succeed.

I should confess, of course, that I'm just getting started in reading about your project and its community; I may be unaware of some discussion that's already been shared in this area. If so, please set me straight! And congrats on coming this far with a very exciting project; I hope it lives up to its full potential.

bitsucker: I agree with most of this. I'm no...

2009-08-30T22:43:04.512+01:00

bitsucker: I agree with most of this. I'm not sure it requires an architectural change though. There is an object corresponding to each "abstract tag" in FluidDB (an abstract tag, in my terminology, being roughly the set of tags sharing a name). So if you use a tag bitsucker/rating, there is an object corresponding to the (abstract) tag bitsucker/rating, and you could attach a schema specifier on there.

There's a broad range of views on the importance of taxonomies in FluidDB. Personally, I veer towards the "librarian end", but others are more relaxed.

Anyway, thanks for the comment...we shall see how it evolves.

Though I'm sure this isn't novel, I'm ...

2009-08-30T18:39:28.560+01:00

Though I'm sure this isn't novel, I'm compelled to highlight that the challenge discussed in these comments -- the lack of (and need for) a truly canonical identifier for books -- applies to just about every kind of entity you might want to annotate in a db. This has been the source of my primary skepticism about the fluiddb concept since I first started thinking about it a few weeks ago: how can an open data-store be socially useful without standardized formulae for identifying the records?

In fact, this same problem arises for ALL fields in the DB: both in the tag names (why "rating" instead of "score?") and in their values (I might pick a rating scale 1-5, like many popular review websites, and pollute your rating data).

I anticipate that the social value offered by something like fluiddb will depend heavily on the availability (and adoption) of a supporting system for specifying and referring to SCHEMA descriptions. Something like XSD, allowing applications to describe the conventions employed for storing specific types of data. This may also require an additional data axis in the db, to allow a specific tag to be annotated with its datatype from a given schema.

Without some sort of metadata store like this, I fear your "open db" may quickly devolve into an unmanageable, unstructured mess with no more interoperability than today's "open web."
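To make the rating-scale clash above concrete, here is a minimal sketch of what a schema description for a rating tag might carry, and how a reader could use it to bring a 1-5 rating onto a 0-10 scale. The names `RatingSchema` and `rescale` are hypothetical and are not part of FluidDB; how such a description would actually be stored alongside a tag is left open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RatingSchema:
    """Hypothetical description of a numeric rating tag (not a FluidDB type)."""
    minimum: float
    maximum: float
    description: str


def rescale(value: float, source: RatingSchema, target: RatingSchema) -> float:
    """Map a rating linearly from the source scale onto the target scale."""
    if not source.minimum <= value <= source.maximum:
        raise ValueError(f"{value} is outside the declared range of the source schema")
    fraction = (value - source.minimum) / (source.maximum - source.minimum)
    return target.minimum + fraction * (target.maximum - target.minimum)


# Two conventions that would otherwise pollute each other's data.
five_star = RatingSchema(1, 5, "owner's opinion of quality, 1-5")
ten_point = RatingSchema(0, 10, "owner's opinion of quality, 0-10")

converted = rescale(3, five_star, ten_point)
```

A linear mapping is only one possible choice; the point is that without the declared range, a consumer cannot tell a 3 out of 5 from a 3 out of 10.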

Comment as much as you like Owen: it's all goo...

2009-08-27T09:44:31.970+01:00

Comment as much as you like Owen: it's all good stuff.

But detailed replies will have to wait: I need to work.

I'll check out Library Thing though.

Leaning towards

about="The Name of the Book//The Book's Author"

as a temporary convention till the librarians of the world get their act together :-)

Thanks for all the input.

Final comment! (I promise) Obviously you weren'...

2009-08-27T09:36:01.108+01:00

Final comment! (I promise) Obviously you weren't setting out to solve all these problems, but rather do some stuff with FluidDB and add some interesting social functions to this list of books!

You asked about automatic grabbing of ISBNs. Most library systems support an m2m interface. You could look at the WorldCat API (http://www.oclc.org/productworks/worldcatapi.htm), or possibly the LibraryThing API (http://www.librarything.com/services/webservices.php), or finally COPAC (catalogue from major research Unis in the UK), which can return records in an XML format - http://copac.ac.uk/development-blog/tag/api/

I think you'd find LibraryThing is the best match to the kind of thing you want to do, although not sure about the T&C on their API (generally the guy behind LibraryThing seems pretty open to doing interesting stuff, but he has got a business to run!)

Oh - forgot the LibraryThing URL http://www.librar...

2009-08-27T09:27:39.415+01:00

Oh - forgot the LibraryThing URL http://www.librarything.com/

Also, you might be interested in this paper from Rob Styles et al on RDF representations of book data. Specifically the bits on creating URIs as identifiers might possibly be of relevance:

http://dynamicorange.com/uploads/Semantic%20Marcup.pdf
(Rob Styles works for http://www.talis.com who do lots of semantic web/linked data stuff but also do library systems)

Owen. Yes; it;s more of a minefield than I realis...

2009-08-27T09:27:28.587+01:00

Owen. Yes; it's more of a minefield than I realised. Sanghyeon Seo pointed me at FRBR too. I'll take a look.

I wonder whether it's too fanciful to think that FluidDB could actually help with its 256-bit immutable object IDs.

Maybe we do just establish a convention for describing a work and then use the FluidDB ID as the reference code. Clearly there'd be a major problem defining the canonical form for a work, but starting with title and author in an agreed (algorithmic) encoding would be a start. Obviously, accents, punctuation, editions etc. would all be issues.

But that's probably simplistic.

I stepped into a minefield!

Thanks for the input.
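As a rough illustration of the "agreed (algorithmic) encoding" of title and author floated above, the sketch below folds accents, drops punctuation, lowercases and collapses whitespace, then joins the two parts with the `//` separator proposed elsewhere in this thread. The function names and every normalization rule here are assumptions for illustration only; no such convention has been agreed, and editions are not addressed at all.

```python
import re
import unicodedata


def canonical_part(text: str) -> str:
    """Fold accents, drop punctuation, lowercase and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    # Remove the combining marks left behind by decomposition (accents etc.).
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Keep only word characters and whitespace.
    cleaned = re.sub(r"[^\w\s]", "", no_accents.lower())
    return " ".join(cleaned.split())


def about_string(title: str, author: str) -> str:
    """Build a hypothetical about value of the form 'title//author'."""
    return f"{canonical_part(title)}//{canonical_part(author)}"


example = about_string("Les Misérables", "Victor Hugo")
```

Two people typing the same title with different accents, capitalisation or punctuation would then arrive at the same string, which is the property the about tag needs.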

Libraries and associated bodies have been dealing ...

2009-08-27T09:17:26.518+01:00

Libraries and associated bodies have been dealing with managing book data for quite a while! The current thinking in this area is something called 'FRBR' - which defines different levels of description in terms of the following entities:

Work
Expression
Manifestation
Item

(see http://en.wikipedia.org/wiki/Functional_Requirements_for_Bibliographic_Records)

In this model, the Guardian list is a list of works, whereas ISBNs are part of a specific manifestation (as would be other details specific to a published edition).

However, the problem is that there is no agreed identifier for works.

You could look at:

LibraryThing - this has an API, and groups books into 'works' bringing together the various editions
FictionFinder - from OCLC (a library organisation), which presents 'work' level records for books (user interface at http://fictionfinder.oclc.org/ and some more information at http://www.oclc.org/research/projects/frbr/fictionfinder.htm)
Open Library - http://openlibrary.org/about/frbrization

To be honest it's tricky stuff, but there are people around doing a lot of work on it

Incidentally, I hope people have noticed that Flui...

2009-08-27T08:24:05.800+01:00

Incidentally, I hope people have noticed that FluidDB has allocated an ID for HHGTTG that includes a 42 hex pair, as well as an (admittedly non-byte-aligned) 2a, which is 42 in hex.

It's scary.

Holger: These are all good points. I'll tak...

2009-08-27T08:02:41.300+01:00

Holger:

These are all good points. I'll take them in turn.

1. I agree (now) that the ISBN number isn't as good as I thought, for exactly the reason you state, namely that there is a many-to-one mapping from isbn numbers to (conceptual) books. I hadn't really realised this; at least not clearly enough. That's a huge problem with my scheme.

2. I know about ISBN-10 vs. ISBN-13, but that worries me less. But you're right, it's an issue.

3. Formatting. Mea culpa. The truth is, I took the format from Amazon and assumed it was standard (though if I'd thought about it, I'd have realised it wasn't). In general, I'm a big believer in software accepting separators in numbers because it makes them so much easier for humans. But I should have found the standard. You're right, there's definitely a case for making the tag without any separators, though in fact I think I'm more likely to go for separators in standard places, assuming there is a universal standard for ISBN-13 formatting. (I don't mean universally used; just universally applicable).

Lots to think about and all good points. It's early days, and fortunately (since these aren't global standards and it's so early) it's easy for me to edit my recommendations.

The fundamental problem you point out (the non-uniqueness of ISBN numbers) is a big deal, and I may decide to try to find something better. Of course, it could be that this "something better" is a FluidDB object ID, and that the about tag really should be set to some title/author combination in standard form. (And yes, I realise all the problems with that "standard form"...)

Thanks for the input. I'll think a little before putting up the other 990, which will take me a couple of days anyway, I think.

What I am unlikely to be deflected from is the attempt to find a meaningful about tag that identifies books, because in my view, that is pretty much necessary in order for cross-user queries to work reliably and take off.

David --- thanks!

2009-08-27T07:51:52.061+01:00

David --- thanks!

Thank you for this article describing your view of...

2009-08-27T07:35:39.859+01:00

Thank you for this article describing your view of tagging the universe.

Unfortunately this example also shows the difficulties of this endeavour, namely the discoverability of objects via their about tag.
In this case you used the ISBN number which, while I cannot offer any better attribute of books, is not very suitable.
Firstly, your formatting included one hyphen. Why one? ISBN numbers are highly structured (read up on it on Wikipedia) and hyphens are usually used as separators between the parts of an ISBN but not everywhere and not consistently. So maybe it would have been better to just drop all hyphens and make it easier for others to discover the object even if they don't know how exactly you entered it.

Secondly, there are two kinds of ISBN -- 10 digit and 13 digit. IIRC I have seen books that had both printed on them. Now, as I understand it, you can convert a 10-digit version to a 13-digit one by prefixing it with "978" and recalculating the check digit, but will users trying to find the object for their book know that?

Thirdly, and most importantly, ISBNs identify a publisher's version of the work. I hope we can agree that Mr Adams's classic work is basically the same (at least for the purpose of general review that you outlined), no matter if you have the British version, the US version, a version part of an omnibus edition, etc. But all these will have different ISBNs (case in point: my HHGTTG copy's number is 0-671-52721-5).

So this raises the question of whether you should have made the ISBN a regular tag (list valued so one can associate more than one ISBN with a work).
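The first two points above (a separator-free form, and ISBN-10 versus ISBN-13) can be sketched in a few lines. This is only an illustration: the function names are made up for this example, and it assumes the input is already a valid ISBN-10. The conversion prefixes "978" to the first nine digits and recomputes the check digit with the ISBN-13 weighting (alternating 1 and 3).

```python
def normalize_isbn(raw: str) -> str:
    """Drop hyphens and spaces so differently formatted ISBNs compare equal."""
    return raw.replace("-", "").replace(" ", "").upper()


def isbn13_check_digit(first12: str) -> str:
    """Compute the ISBN-13 check digit from the first twelve digits."""
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert an ISBN-10 (assumed valid) to its ISBN-13 form."""
    digits = normalize_isbn(isbn10)
    if len(digits) != 10:
        raise ValueError("expected a 10-character ISBN")
    # The old ISBN-10 check character is discarded and a new one computed.
    body = "978" + digits[:9]
    return body + isbn13_check_digit(body)


converted = isbn10_to_isbn13("0-671-52721-5")  # the copy mentioned above
```

Normalizing on both write and lookup would let users find an object however they typed the hyphens, though it does nothing about the third point: different editions still carry different ISBNs.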

Thanks for that Nick, very useful. The most intere...

2009-08-27T01:41:03.077+01:00

Thanks for that Nick, very useful. The most interesting part for me was your attempt to set down some conventions for tag semantics and value ranges. I sense a disturbance in the (evolutionist) force, a glitch in the biological matrix, etc, etc. As the speaker said: order, order, order!