Books have held a special place in the affections and history of the FluidDB team from the very beginning and featured in a remarkably high proportion of Terry’s early motivational examples for FluidDB. Perhaps coincidentally, but perhaps not, at least five of the people involved also took a personal interest in a list of 1,000 novels that everyone must read, which The Guardian newspaper published in January 2009.
So it is with some pleasure that I finally published a dataset to FluidDB last night containing those 1,000 novels, with an object for each. At last, @terrycojones, @barshirtcliff, @rustlem, @anamosterin and I have objects on which to hang our various terrycojones/has-read and barshirtcliff/rating tags.
The first book in the Guardian 1,000 (sorted alphabetically by author) is The Face of Another by Kobo Abe. This is tagged as follows:
> fdb tags -a "book:the face of another (kobo abe)"
Object with about="book:the face of another (kobo abe)" (id 0fe6c95d-9e6b-45c4-b228-b8fef5c42bff):
/miro/class="record"
/fluiddb/about="book:the face of another (kobo abe)"
/miro/books/surname="Abe"
/miro/books/title="The Face of Another"
/miro/books/forename="Kobo"
/miro/books/guardian-1000=True
/miro/books/year=1964
/miro/books/author="Kobo Abe"
or, graphically:
and the last is The Debacle by Émile Zola:
> fdb tags -a "book:the debacle (emile zola)"
Object with about="book:the debacle (emile zola)" (id 49cee61c-85e2-4bce-9380-0c07dc58dc86):
/miro/class="record"
/fluiddb/about="book:the debacle (emile zola)"
/miro/books/surname="Zola"
/miro/books/title="The Debacle"
/miro/books/forename="Émile"
/miro/books/guardian-1000=True
/miro/books/year=1892
/miro/books/author="Émile Zola"
[UPDATE: I should probably have mentioned when I posted this originally
that the tags command is not present in the version of fdb on github.
(Apologies for this.) I will add it, but it will take a little while.
I have two different implementations of fdb: the standalone one on github
and one that is a fully integrated part of my Miró software.
Command lines starting with > in this posting come from the Miró
version; command lines starting with $ are from the standalone
version on github. There is actually some method to this madness,
but also some scope for confusion, for which apologies. Implemented in 1.26, now on github.]
Conventions
The convention for about tags that I’ve used is the one described in this previous posting. For what it’s worth, I’m warming to this convention, not least because (unlike with the ISBN) if you know the title and author of the book, it is fairly trivial to construct the almost-always unique about tag. The Python library used to generate the about tags is available on github.
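To make the convention concrete, here is a minimal Python sketch that reproduces the about tags shown in this post. It is not the library mentioned above: the function names `strip_accents` and `book_about_tag` are hypothetical, and the rules (lower-casing, removing accents, collapsing whitespace) are only those that can be inferred from the examples here. The real convention may well handle punctuation and other cases that this sketch ignores.

```python
import unicodedata


def strip_accents(text):
    """Remove combining marks, so that 'Émile' becomes 'Emile'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def book_about_tag(title, author):
    """Build 'book:<title> (<author>)' in lower case without accents.

    Assumption: only the normalization visible in this post's examples
    is applied; punctuation and other special cases are left untouched.
    """
    def norm(text):
        # Lower-case, strip accents, and collapse runs of whitespace.
        return " ".join(strip_accents(text).lower().split())

    return "book:%s (%s)" % (norm(title), norm(author))


print(book_about_tag("La Bête Humaine", "Émile Zola"))
# book:la bete humaine (emile zola)
print(book_about_tag("The Face of Another", "Kobo Abe"))
# book:the face of another (kobo abe)
```

The point of the normalization is that someone who types the title without accents or with different capitalization still arrives at the same about tag.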
Finding books in this list is fairly easy if you have any kind of access to FluidDB. Using fdb, there are various options:
To find specifically the books in the Guardian 1,000, simply look for objects with a miro/books/guardian-1000 tag set to True.
$ fdb count -q 'has miro/books/guardian-1000'
1000 objects matched
Total: 1000 objects
More generally, to find books in the published books table, look for objects with (for example) a miro/books/title tag. At the moment, this is the same set, since I haven’t added anything else; but I will!
$ fdb count -q 'has miro/books/author'
1000 objects matched
Total: 1000 objects
To find a specific book, you can query the title, the author, or both, or use the about tag; because of the normalization on the about tag, that will often be the easiest and most reliable way of retrieving a particular book. Here, for example, are three queries, all of which find (inter alia) Zola’s La Bête Humaine:
$ fdb show -a 'book:la bete humaine (emile zola)' /about /miro/books/title /miro/books/author
Object with about="book:la bete humaine (emile zola)":
/fluiddb/about = "book:la bete humaine (emile zola)"
/miro/books/title = "La Bête Humaine"
/miro/books/author = "Émile Zola"

$ fdb show -q 'miro/books/title="La Bête Humaine"' \
    /about /miro/books/title /miro/books/author
1 object matched
Object 58560935-d600-4921-a7d4-389e7bd068b5:
/fluiddb/about = "book:la bete humaine (emile zola)"
/miro/books/title = "La Bête Humaine"
/miro/books/author = "Émile Zola"

$ fdb show -q 'miro/books/surname="Zola"' /about /miro/books/title /miro/books/author
4 objects matched
Object 58560935-d600-4921-a7d4-389e7bd068b5:
/fluiddb/about = "book:la bete humaine (emile zola)"
/miro/books/title = "La Bête Humaine"
/miro/books/author = "Émile Zola"
Object 49cee61c-85e2-4bce-9380-0c07dc58dc86:
/fluiddb/about = "book:the debacle (emile zola)"
/miro/books/title = "The Debacle"
/miro/books/author = "Émile Zola"
Object 57590e71-beff-4fab-9a4c-b23b9574dbb3:
/fluiddb/about = "book:germinal (emile zola)"
/miro/books/title = "Germinal"
/miro/books/author = "Émile Zola"
Object c5bba025-9c4a-4c3b-81df-1b7e1bed4653:
/fluiddb/about = "book:therese raquin (emile zola)"
/miro/books/title = "Therese Raquin"
/miro/books/author = "Émile Zola"
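For readers without access to FluidDB, the difference between a "has" query and an equality query can be illustrated with a toy in-memory model in Python. Nothing here talks to FluidDB or uses fdb. Each object is just a dictionary of tag names to values, holding three records copied from the output in this post, and the helpers `has`, `equals` and `select` are hypothetical names invented for this sketch.

```python
# Toy stand-in for FluidDB objects: each object maps tag names to values.
# Values are copied from the fdb output in this post.
OBJECTS = [
    {
        "fluiddb/about": "book:the face of another (kobo abe)",
        "miro/books/title": "The Face of Another",
        "miro/books/surname": "Abe",
        "miro/books/guardian-1000": True,
    },
    {
        "fluiddb/about": "book:la bete humaine (emile zola)",
        "miro/books/title": "La Bête Humaine",
        "miro/books/surname": "Zola",
        "miro/books/guardian-1000": True,
    },
    {
        "fluiddb/about": "book:the debacle (emile zola)",
        "miro/books/title": "The Debacle",
        "miro/books/surname": "Zola",
        "miro/books/guardian-1000": True,
    },
]


def has(tag):
    """Predicate: the object carries the tag, whatever its value."""
    return lambda obj: tag in obj


def equals(tag, value):
    """Predicate: the object carries the tag with exactly this value."""
    return lambda obj: obj.get(tag) == value


def select(predicate):
    """Return the about tags of all objects matching the predicate."""
    return [obj["fluiddb/about"] for obj in OBJECTS if predicate(obj)]


print(len(select(has("miro/books/guardian-1000"))))  # 3
print(select(equals("miro/books/surname", "Zola")))
# ['book:la bete humaine (emile zola)', 'book:the debacle (emile zola)']
print(select(equals("fluiddb/about", "book:la bete humaine (emile zola)")))
# ['book:la bete humaine (emile zola)']
```

As in the real queries, matching on the surname returns every Zola novel in the set, whereas matching on the about tag picks out exactly one object.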
The Table Structure
Like the elements from the periodic table and the planets datasets that I published previously, this dataset was published as a table straight from Miró. Unlike those datasets, though, I haven’t added any record numbers, about-tag links or id links, as it seems to me that record order is completely immaterial here and I expect to update the books dataset regularly. (I’ll probably add the Orange Prize winners next, and perhaps Booker winners; adding the Orange Prize winners will at least get Anne Michaels in.)
Other Notes
The raw data was scraped from the Guardian websites at the time of publication, as described here, though they have more recently produced a definitive list themselves. I corrected a few errors, but largely the data is just as it appeared at the time.
In compiling this list, I have converted the publication year into a number, i.e. the miro/books/year tag is numeric. This required me to decide on an approach for the small number of cases in which the Guardian-supplied year was actually a range of dates; I simply took the earliest year, which obviously leads to a small loss of information. The 25 records affected were:
title | author | year |
---|---|---|
The New York Trilogy | Paul Auster | 1985-86 |
La Comédie Humaine | Honoré de Balzac | 1830-1848 |
Epileptic | David B | 1996-2003 |
Bleak House | Charles Dickens | 1852-53 |
Little Dorrit | Charles Dickens | 1855-57 |
The Count of Monte Cristo | Alexandre Dumas | 1844-45 |
Parade’s End | Ford Madox Ford | 1924-28 |
To the Ends of the Earth trilogy | William Golding | 1980-89 |
The Earthsea series | Ursula K Le Guin | 1968-1990 |
L’Histoire de Gil Blas de Santillane | Alain-René Lesage | 1715-1735 |
The Chronicles of Narnia | CS Lewis | 1950-56 |
Cairo trilogy | Naguib Mahfouz | 1956-57 |
The Fortunes of War novels | Olivia Manning | 1960-80 |
The Man Without Qualities | Robert Musil | 1930-32 |
U.S.A. | John Dos Passos | 1930-36 |
A Dance to the Music of Time | Anthony Powell | 1951-75 |
The Discworld series | Terry Pratchett | 1983- |
Remembrance of Things Past | Marcel Proust | 1913-27 |
His Dark Materials | Philip Pullman | 1995-2000 |
Gargantua and Pantagruel | François Rabelais | 1532-34 |
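The "take the earliest year" rule used for these ranges can be sketched in a few lines of Python. This is an illustration rather than the code actually used to build the dataset. The function name `earliest_year` is hypothetical, and it assumes every year string begins with a four-digit year, as all the entries in the table above do.

```python
import re


def earliest_year(raw):
    """Return the first four-digit year in a string such as '1985-86'.

    Assumption: the string starts with a four-digit year; whatever
    follows it (a range end, or the open-ended '1983-') is discarded.
    """
    match = re.match(r"\s*(\d{4})", raw)
    if match is None:
        raise ValueError("no leading four-digit year in %r" % raw)
    return int(match.group(1))


for raw in ["1964", "1985-86", "1830-1848", "1983-"]:
    print(raw, "->", earliest_year(raw))
# 1964 -> 1964
# 1985-86 -> 1985
# 1830-1848 -> 1830
# 1983- -> 1983
```

Storing the result as an integer is what makes numeric comparisons on miro/books/year possible, at the cost of losing the end of each range.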
Happy tagging!