The perfect progress indicator (rarely seen in the wild).
The software installer progress bar
The software uninstaller progress bar
The media-download progress bar.
The software update progress bar.
No wonder so many cop out and use one of these:
* [Why non-Microsoft? Before going all HTML5 and SVG on this blog I made
a point of testing things with Internet Explorer 9 and was delighted
to find that everything seemed to work fine there. Unfortunately,
I didn’t think to test SMIL-based SVG animation, which works (you guessed it)
on Chrome, Safari, Firefox, Opera, iPhone, iPad and even newer Androids,
but not, in fact, Internet Explorer 9.]
I have added to Fluidinfo information on approximately 2.5 million books
drawn from the roughly 3 million records in the
British National Bibliography,
which documents the British Library’s Catalogue.
As ever, I have used the book-u convention
(implemented using the Python
abouttag library)
to select about tags for the objects, and have tagged the books
in Fluidinfo under the book user.
Data specific to the British National Biography (BNB) is stored in the
namespace book/bnb, while more generic data (derived from
the information contained in the Bibliography) is stored
directly in the book namespace.
Here is an example of a book that has been augmented with data
from the British National Library.
The book is George Orwell’s Animal Farm,
and it is illustrated using the
About Tag
visualizer.
(If you can’t see the picture below, upgrade to the latest version of
your browser or see
here
for information on why you might be having trouble.)
The green tags are the new ones.
Notice that, because of the careful normalization inherent in the
book-u convention, where the book is already in Fluidinfo,
the new data has generally been added to the existing object
corresponding to that book, as in the case above.
The core data that should almost always be present is:
the about tag fluiddb/about, normalized using the
book-u convention:
book:animalfarm(georgeorwell)
the book/author tag, containing the best author information
I was able to extract, in this case
GeorgeOrwell
Where there is more than one author, they are generally shown
separated by commas, with the last joined with an and (with
no Oxford Comma).
For example, The Feynman Lectures on Physics,
by Feynman, Leighton and Sands has
$ fish show -a 'book:the feynman lectures on physics (richard p feynman; robert b leighton; matthew l sands)'
/book/author
Object with about="book:the feynman lectures on physics (richard p feynman; robert b leighton; matthew l sands)":
/book/author = "Richard P. Feynman, Robert B. Leighton and Matthew L. Sands"
or, graphically:
The book/author tag has had a lot of processing done to it,
as described below.
the book/title field, which is usually almost identical to that
in the BNB data. In this case it is:
Animalfarm
I have not altered the capitalization, which is therefore
generally consistent with some entry in the BNB database
(though I would really prefer it were in Title Case).
the book/source tag shows where the base data was taken from.
This tag’s value is a set of strings, each of which corresponds
an entry in one of the 17 files from which the BNB data was extracted.
The entries consist of
the name of the file (always BNBrdfdcNN.xml)
where NN runs from 01 to 17
a dash -
the datestamp on that file (always 20101115 at present)
the digit zero (0) and a # sign
the record number in the file, starting from 1, with
six digits.
Since multiple bibliographic entries can correspond to the
same work, there is sometimes more than one of these.
the book/r tag is a pseudo-random floating point value with
0.0 ≤ book/r < 1.0.
Some of the raw data has also been added, with almost no cleaning up,
under the book/bnb namespace. The BNB data uses
the Dublin Core metadata standard,
and includes:
bnb/creator, which is the person or organization primarily responsible
for the creation of the work. This is sometimes blank, and is stored
as a single string value.
bnb/contributors, which is a list of contributors, sometimes
including the creator and sometimes not.
bnb/dewey is the set of Dewey Decimal classifications found on the
records corresponding to this book.
bnb/isbn is the set of international standard book numbers found on the
records corresponding to this book.
bnb/id is the set of British Library IDs found on the
records corresponding to this book. (I’m not entirely clear what
this identifier is, but it appears to be important and well populated.)
Other information is available in the data (including classification
information), and I will probably extract this and add it at a later time.
Finding, Inspecting and Tagging Books in Fluidinfo¶
There are multiple ways of retrieving book data from Fluidinfo
and of tagging it.
Probably the easiest and most general method is to go to
http://artoftagging.com and do a search that involves a
book and some keywords from the title and/or author.
A list of results should come back and you can see a visualization
of any of them by clicking the link
If you have a Fluidinfo account, you can create an account at
artoftagging.com and then save your Fluidinfo details there.
Once logged in, you will then be able to add your own tags to
any object you find.
If you just want to construct the about tag for a book,
you can do that using the
online version of
the Fluidinfo Shell, Fish.
Once there, type, for example:
fish> about book "Animal Farm" "George Orwell"
book:animal farm (george orwell)
fish> about book "The Feynman Lectures on Physics' 'Richard P. Feynman"
"Robert B. Leighton" "Matthew L. Sands"
book:the feynman lectures on physics (richard p feynman; robert b leighton; matthew l sands)
(The quotes tell Fish that “Animal Farm” is the title and “George Orwell”
a single author.)
Alternatively, you can download and install Fish on your own machine.
(It is available from Github.)
You can then type the same commands, after fish, e.g.:
$ fish> about book "Animal Farm" "George Orwell"
book:animal farm (george orwell)
You can then use any Fluidinfo tool, including the new
Object Browser,
to work with that object, signing in with Twitter if you like.
Another easy way of finding an about tag for a book
is to find it on Amazon (US or UK, for now) and use the
az-fish bookmarklet available at the top of
the online Fish
(drag it to your browser’ toolbar).
The bookmarklet will take the item on the current Amazon page
and issue the appropriate Fish command to find the about tag.
(You don’t need to log into Fish or Fluidinfo to do this.)
“The entities defined as work (a distinct intellectual or artistic
creation) and expression (the intellectual or artistic realization
of a work) reflect intellectual or artistic content. The entities
defined as manifestation (the physical embodiment of an expression
of a work) and item (a single exemplar of a manifestation), on the
other hand, reflect physical form.”
Loosely, a work is the conceptual book, usually described by the
combination of a title and author—Animal Farm by George Orwell.
The report describes an expression of a work as “the intellectual or artistic
realization of a work in the form of alpha-numeric, musical, or
choreographic notation, sound, image, object, movement, etc., or any
combination of such forms.” Thus George Orwell’s Animal Farm
can be translated into different languages, laid out differently,
typeset on pages, or in digital form, or recorded as spoken words,
and these correspond to different expressions of that same book.
There may also be different editions, printings etc., which may have
slightly different content. Again, these are different expressions
of the same conceptual work. (Occasionally, expressions may encompass
several works, such as in the case of compendia.)
Moving down the hierarchy, a manifestation is a particular rendering
of a work into physical form — “the physical embodiment of an
expression of a work.” Note that “[A]s an entity, manifestation
represents all the physical objects that bear the same
characteristics, in respect to both intellectual content and physical
form.” Thus, all the copies of the same printing of the same edition
of Animal Farm that are essentially indistinguishable collectively
correspond to a manifestation of George Orwell’s Animal Farm.
Finally, an item is an individual copy of a book: “a single exemplar
of a manifestation.”
The entries in the British Library’s catalogue correspond literally to
items, but conceptually to manifestations,
but the objects to which I have attached the data in Fluidinfo correspond
to works. This is why the c. 3 million records reduce to c. 2.5 million
Fluidinfo objects, and why some of the objects have multiple ISBNs etc.
It is entirely possible to create further objects at the level of
manifestations (and even items, if someone really wants to do so),
and even more so at the level of expressions, but I have not done this
yet.
The reason I have concentrated on works rather than manifestations is
that this seems much the most important level to represent in a system
like Fluidinfo: with important exceptions, when people want to rate or
comment on a book, it is most often the work, rather than the
manifestation, that they are interested in. Moreover, collecting
together information about the different ISBNs associated with a
single work is positively helpful. That is not to say that there
isn’t a case for creating other objects at the level of expressions
or manifestations.
There is a great deal more that can be usefully done with the
fabulous data from the British Library. While I am not
committing to doing these, tasks on my list list include:
Authors. Creating an object corresponding to each
creator/author/contributor. I plan to use about tags of the
form author:normalizedname(birth-year) for these,
e.g. author:GeorgeOrwell(1903). The required data is
largely available in the BNB dataset. I would then plan to add a
book/related-authors tag to each book, pointing to its
authors’ objects and, on the author objects, corresponding sets of
book/related-books tags pointing back to their works.
Upload Checking. Checking the everything uploaded OK. I
count 2,558,738 unique books (as works) in the BNB dataset, and I
appeared to upload all of these successfully (getting HTTP 204
statuses back from Fluidinfo). However, when I count objects
having a book/r tag, I get only 2,468,661, a shortfall of
90,077.
Whether this indicates a problem or not is unclear, as if I count
the number of books with a book/source but no book/r,
with the query
has book/source except has book/r
it reports 18,921 such books, but as far as I can tell, all those it
finds in fact have a book/r, so it appears that Fluidinfo is
having some difficulty executing some queries correctly at the moment.
About Tag Checking. I had to use some fairly hairy code to
coerce the BNB data into the correct form to generate canonical
about tags in the book-u convention, and it has definitely
failed in some cases. For example, I have seen at least one example
where the surname of an author in the BNB data preceded the
forename but without a comma, so that forename and surname will
have been reversed. To the extent that I can detect these problems,
I will try to fix them.
Recent additions. I believe the British Library has issued
updates with recent additions (since November 2010); I certainly
plan to get that data and import it in a similar fashion, and
then to set up a CRON job to do that regularly.
In this way, I hope the dataset will be living and always current.
Categorizations. The BNB data includes subject categories for the
records, which I have not imported thus far. I will do so.
Year information. There is information about publication dates
in the BNB data, but it is not in a very structured form.
If I am able to extract it with a satisfactory degree of reliability,
I will get this too. Obviously, different manifestations will have
different publication dates, so this will probably be a set-valued tag.
Enjoy the data, and let me know if you find problems.
I expect I will write a number of other posts on issues associated with
this data.
This page uses HTML5 and an embedded Scalable Vector Graphics (SVG) diagram,
and acts as a test.
From this point on (27th December 2011) my plan is to use HTML5 with embedded
SVG as the default format for posts, and therefore if you use a browser
(or feed reader) that does not support this, you will miss out.
You should see an elegant diagram below.
If you not, it probably means that you are not using a modern, standards-compliant HTML5 Browser.
I have tested this page on the following:
A Macbook Pro running OS X 10.7.2 (Lion) with Safari 5.1.2, Chrome 16.0.912.63, Firefox 8.0.1, Opera 11.6
A Mac Pro running OS X 10.6.8 (Snow Leopard) with the same browsers.
An iPad running iOS 5.0.1 (in Safari, naturally)
An iPhone running iOS 5.0.1 (again, Safari, of course)
A Sony Vaio running Windows 7 with Internet Explorer 9
A prehistoric Dell latitude laptop running Chrome 16 and Firefox 9
and everything looks good. It does *not* work with Internet Explorer 8,
but then, what does?
It also does not work with many older versions of Firefox, Chrome,
Safari or Opera. So this is a good time to upgrade.
I don't know which Android or Linux growers it works with, but I would guess it will work with some.
I imagine the diagrams will not show up in most feed readers, and that is unfortunate, but I think this is a time to push forward, so that is what I am doing.
Among the other marvellous benefits of SVG, it is scalable (just like the S
says), which means that if you zoom the browser (generally command-plus on
macs, and control-plus on Windows) the diagram will scale too, without
looking terrible.
Wonder of wonders.
It has been an unconsionably long time since I last pushed a version
of Fish to Github. The reason for this is mostly that, while
adding various features, I broke a few things and wanted to get them
all fixed before inflicting them on people. I believe they are
now fixed (but do disabuse me of this notion if you discover otherwise.)
There is much that is new, though almost all changes are backwards compatible.
Everything is documented in the new green-themed documentation,
available (as usual) in Fluidinfo itself at
You can omit the -a or -i on the tag, untag, show,
get and tags commands. If you do so, Fish will use the first
argument instead, assuming that if it looks like a UUID, it is a
UUID, and if not, that it isn’t. For these purposes, UUIDs
must be expressed with lower-case hex digits and include the dashes
in 88888888-4444-4444-4444-cccccccccccc formation.
So now the following both work where before they would have generated
errors:
$ fish show Paris rating /about /id
Object with about="Paris":
/njr/rating = 10
/fluiddb/about = "Paris"
/id = "17ecdfbc-c148-41d3-b898-0b5396ebe6cc"
$ fish show 17ecdfbc-c148-41d3-b898-0b5396ebe6cc rating /about /id
Object 17ecdfbc-c148-41d3-b898-0b5396ebe6cc:
/njr/rating = 10
/fluiddb/about = "Paris"
/id = "17ecdfbc-c148-41d3-b898-0b5396ebe6cc"
Needless to say, the longer -a and -i forms work,
and are useful if you want to tag multiple objects in one command
(since they may be repeated).
The tags command now displays the tags in alphabetical order,
except for the about tag, which is always listed first.
Fish now supports simple aliases, which effectively allow you to
add commands to Fish. A simple example is:
fish alias eiffel 'show -a "Eiffel Tower"'
which allows commands like eiffelrating to be used in place of
show-a"EiffelTower"rating.
Aliases are stored in Fluidinfo, with private tags on objects
whose about tag is the name of the alias. For example, with the
alias definition above, the object with about tag paris has a tag
njr/.fish/alias added to it with its value set to the expansion text
for the alias:
$ fish alias paris
paris:
njr/.fish/alias = "show -a "Paris""
(Obviously, the quoting here is slightly unfortunate; I will fix that
some time.)
Aliases are also cached locally in the file-system; the cache is
updated from Fluidinfo using the new sync command or whenever
Fish is entered in interactive mode (by typing Fish).
Because aliases are stored in Fluidinfo, they can be shared between
multiple copies of Fish, and also with the online version
Shell-Fish.
The cache can be viewed with showcache.
Support for sequences has been added. Sequences provide a
convenient way of storing a numbered collection of items that are
added to over time. They are described in a previous blog post
(Sequences in Fluidinfo).
Briefly, in the simplest case, a sequence of remarks is defined
by saying:
$ fish mkseq remark
This creates two new aliases:
remark is used to add a new remark, using the alias
$ fish alias
remark:
njr/.fish/alias = "seq /njr/remark"
remarks is used to look at (or search) remarks, using the alias
remarks:
njr/.fish/alias = "listseq /njr/remark"
Thus if we say:
$ fish mkseq remark
Next remark number: 0
$ fish remark "Isn't this a remarkable first remark"
0: Isn't this a remarkable first remark
2011-12-18
$ fish remark "...and this only slightly less remarkable"
1: ...and this only slightly less remarkable
2011-12-18
then we will see:
$ fish remarks
0: Isn't this a remarkable first remark
2011-12-18
1: ...and this only slightly less remarkable
2011-12-18
By default, sequences are public, but you can easily make them
private by specifying a tag in a private namespace (typically
private); you can also specify the plural form. To set up the
sequence as private, and use myremarks to list and search
remarks, you would instead say:
Non-primitive types are now shown more sensibly (by show and tags).
Previously, Fish would attempt to print even non-primitive types,
with sometimes unfortunately consequences both in terms of the volume
of output and its effects on terminals. For non-primitive types,
output is now shown as below:
$ fish show fish /fish/index.html
Object with about="fish":
/fish/index.html = <Non-primitive value of type text/html (size 8907)>
Display of set-valued tags is also improved, e.g.:
The Fish API has been updated to take account of the renaming of
FluidDB to Fluidinfo, and various tests have been changed to
use more esoteric unicode characters.
Some operations are faster (because more use is made of the /values
endpoint).
Finally, when Fish starts it checks the environment for the presence
of the variable FISHUSER. If this is defined, the credentials in
the startup file identified by the string specified in FISHUSER
will be used, rather than the default ones. (This is mainly helpful
if you want to use Fish with different Fluidinfo accounts in different
shells concurrently.) Thus, if FISHUSER is set to foo (on UNIX),
the credentials from ~/.fluidDBcredentials.foo will be used, rather
than those in ~/.fluidDBcredentials.
So, obviously, there are quite a lot of changes, and though I’ve been
using it for a while, some things might have broken. (I fixed some
bugs yesterday; always dangerous!)
The twin evils that the abouttag.py library and this blog exist
to fight are fragmentation and overloading.
Fragmentation occurs in Fluidinfo when different users store
information about the same thing on different objects, while
overloading occurs when people store information about
different things on the same object. In general, both of these
are undesirable. Fragmentation reduces data sharing and makes
it harder to extract information from the system, whereas overloading
creates ambiguity and confusion.
One of the more common uses for Fluidinfo is for tagging web pages,
and it is very natural to use the URL as the about tag,
as almost everyone does. There is not much
of a problem with overloading in this case (except to the extent that
URLs point to web pages that change over time), but there is definitely
fragmentation.
I would distinguish between two kinds of fragmentation in the case of URLs.
Different representations of the same URL.
Perhaps the most obvious example is the trailing slash on many URLs.
Punctilious persons with good knowledge of W3C standards
(and in particular RFC3986)
prefer the inclusion of a trailing slash on URLs (and more generally,
on URIs)
where appropriate, and thus prefer
http://fluidinfo.com/
to the more colloquial
http://fluidinfo.com
Technically, these are different URLs, but web servers so routinely
and uniformly redirect the latter to the former that they can be
considered for all practical purposes the same.
It seems highly desirable for any convention for about tags for URLs
to map these two forms, along with other similar representational variants,
to a common about tag.
Different URLs that may or may not represent the same web page.
The most obvious example of this is the www. that used to be
de rigeur and is now commonly (but not reliably) redundant.
Most right-thinking webmasters (webmistresses?) routinely redirect
these to the same place,
there is no general guarantee that the www. form
(http://www.fluidinfo.com/)
and the bare form (http://fluidinfo.com/) will produce the same
page, nor even that they should both work.
Standardizing this would therefore seem to be a normalization too far.
Fluidinfo is far from the only system with an interest in developing
a canonical or normalized form for URLs. Search engines and social
bookmarking sites (such as Pinboard
and Delicious) work better if different
URLs representing the same resource are collapsed, and as mentioned
above, there is even a standard
(RFC3986)
for how to perform the canonicalization.
The relevant Wikipedia
page describes six normalizations that preserve URL semantics.
These are:
Converting the scheme and host to lower case.
(HTTP:// → http:// and FLUIDINFO.COM → fluidinfo.com).
Capitalizing letters in escape sequences (%3a → %3A)
Decoding percent-encoded octets of unreserved characters
(%7E → ~)
Adding a trailing slash where appropriate
(http://fluidinfo.com → http://fluidinfo.com/)
Removing the default port (http://fluidinfo.com:80/ →
http://fluidinfo.com/)
Happily, libraries to perform these normalizations already
exist and are freely for a number of programming languages,
including Python.
As noted above, Jehiah Czebotar’s
urlnorm.py performs
the task admirably in Python,
so in the version of abouttag.py
that I just pushed to Github (version 0.6) I have made added a new
convention, uri-2, corresponding to this behaviour and have made
that the default.
So now:
This is different from the old behaviour, which can be obtained by
explicitly adding a convention argument of ‘uri-1’:
>>> URI(u'http://fluidinfo.com',convention=u'uri-1')u'http://fluidinfo.com'# note no trailing slash>>> URI(u'HTTP://FLUIDINFO.com',convention=u'uri-1')u'http://fluidinfo.com'# Same downcasing, but again no trailing slash>>> URI(u'http://fluidinfo.com:80',convention=u'uri-1')u'http://fluidinfo.com:80'# uri-1 didn't strip default ports>>> URI(u'http://fluidinfo.com/a/./b/?arg=%7Ealice',convention='uri-1')u'http://fluidinfo.com/a/./b/?arg=%7Ealice'# nor did it undo unnecessary %-encoding or strip . & .. path segments.
Both the new and the old versions perform one additional normalization,
which is to add a leading http:// if no scheme is present in the input.
This is not because there is not a distinction between a domain
and a URL, but rather because by calling the URI function the user is
clearly indicating that this is a URI, which requires a scheme, and
http:// is clearly the appropriate default scheme:
The reader may be wondering why I did not adhere to the RFC previously,
and issued forth older versions of the abouttag library with the
altogether inferior behaviour of uri-1.
Ignorance, pure and simple.
Perhaps the biggest difference between the way in which “real” people
use computers and geeks use computers is this:
Real people use Graphical User Interfaces because they find them
intuitive and efficient.
Geeks generally prefer the command line,
which they find easier, more precise and faster than GUIs.
For geeks, GUIs tend to get in the way, limit and interfere.
Here’s a GUI for my files:
It has all the advantages of clickability, a set of icons for actions,
and a few extra things hidden on menus and right-click (“content”) menus.
But it is very limited.
Here, in contrast, is a command line, in its spare, minimalist glory:
If you don’t know what to type, the command line is intimidating,
unhelpful and limiting. But if you do know, you can do almost
anything: far from limiting, the command line is open and alive with
virtually unbounded possibilities. Instead of having to nagivate
menus and finders and buttons and icons, the command line allows you
to access almost anything the machine can do, all from one place,
just by typing.
There are two major things that stop real people from benefitting
from the command line and its liberating possibilities:
People don’t know the commands or the right syntax for them.
Just like a command line, Siri has the potential to allow me to access
anything my phone can do with no navigation: if I want to call Alex, I
say “call Alex”. If I want to find out the height of Mount Everest, I
just say “How high is Mount Everest?”. If I want to send a Tweet, I
say “text Bird” and it will send a Tweet for me. (OK, this last one
is a hack: if I say “Tweet this”, Siri knows exactly what I mean, but
refuses saying “Sorry, Nicholas, I can’t help you with Twitter.” But
I can send a tweet as a text message by saying “text Bird” because I
have the Twitter short-code listed under Bird Parker. “Why not under
Twitter?”, you ask? Because Siri still refuses if I list it under
Twitter! Go figure!)
Of course, unlike the command line, I don’t have to get the syntax right
with Siri. I just issue commands in plain English, and a reasonable
proportion of the time it “understands” me.
Before Siri, the nearest thing to a syntax-free command line for real
people was Google—a little box into which you can type anything in the
reasonable hope that the search will return some relevant
information. But Google is largely a one-trick pony, and even though
it’s a good trick, it’s nothing like as powerful as when the software
makes an attempt to understand the command and has the ability to take
actions. (By offering a list of the top dozen or so “hits”, Google
also hedges its bets, getting the user to pick the best-looking
“answer”: quite apart from speech recognition and “comprehension”,
Siri goes for broke by putting all its money on a single interpretation of
what you said, only asking for clarification occasionally.)
Horace Dediu, his podcast Getting to Know You makes the case that the significance
of Siri is that it allows Apple to learn much more about its users,
allowing a new level of lock-in, power and service. That’s an
interesting and important perspective that may prove to be right.
But after a day with Siri, I think the more direct and immediate
consequence is exactly that Siri could bring all the power of the
command line to the masses.
Actually, most geeks don’t really like typing either, and
have myriad ways to reduce typing, from globbing
(wild-card expansion) to command-line completion;
but the basic point stands.