30 December 2011

Lies, Damned Lies and Progress Indicators

Here are some you may have met. (Click start, in a modern non-Microsoft* browser, on each one.)

The perfect progress indicator (rarely seen in the wild).

START DONE predicted actual

The software installer progress bar

START DONE predicted actual

The software uninstaller progress bar

START DONE predicted actual

The media-download progress bar.

START DONE predicted actual

The software update progress bar.

START DONE predicted actual

No wonder so many cop out and use one of these:

http://fluiddb.fluidinfo.com/about/abouttag/njr/image/red-spinner.gif

* [Why non-Microsoft? Before going all HTML5 and SVG on this blog I made a point of testing things with Internet Explorer 9 and was delighted to find that everything seemed to work fine there. Unfortunately, I didn’t think to test SMIL-based SVG animation, which works (you guessed it) on Chrome, Safari, Firefox, Opera, iPhone, iPad and even newer Androids, but not, in fact, Internet Explorer 9.]

27 December 2011

The British Library Catalogue / British National Bibliography

I have added to Fluidinfo information on approximately 2.5 million books drawn from the roughly 3 million records in the British National Bibliography, which documents the British Library’s Catalogue.

As ever, I have used the book-u convention (implemented using the Python abouttag library) to select about tags for the objects, and have tagged the books in Fluidinfo under the book user. Data specific to the British National Bibliography (BNB) is stored in the namespace book/bnb, while more generic data (derived from the information contained in the Bibliography) is stored directly in the book namespace.

Here is an example of a book that has been augmented with data from the British National Bibliography. The book is George Orwell’s Animal Farm, and it is illustrated using the About Tag visualizer. (If you can’t see the picture below, upgrade to the latest version of your browser or see here for information on why you might be having trouble.) The green tags are the new ones.

fluidinfo 1529c459-f3f2-45e1-90f4-3ff3040ad6df
  alice/comment="So disappointing."
  alice/has-read
  alice/likes=False
  alice/rating=2
  bert/comment="What a book: I love it!"
  bert/has-read
  bert/rating=8
  book/author="George Orwell"
  book/bnb/contributors={}
  book/bnb/creator="Orwell, George, 1903-1950"
  book/bnb/id={"GB9689279", "GBB005647", "GB8416414", "GBA0Y6010", "GB9330497", "GB7301513"}
  book/dewey={"823.912", "823/.912", "823/.9/1"}
  book/isbn={"070898200X", "185715150X", "0582275245", "0582434475", "0435121650", "978141...}
  book/r=0.170613118849
  book/source={"BNBrdfdc13.xml-201011150#088316", "BNBrdfdc13.xml-201011150#029879", "BNBrdf...}
  book/title="Animal farm"
  fluiddb/about="book:animal farm (george orwell)"
  girafind/books/author={"George Orwell"}
  girafind/books/language="["$_english"]"
  girafind/books/title="Animal Farm"
  miro/books/author="George Orwell"
  miro/books/forename="George"
  miro/books/guardian-1000
  miro/books/surname="Orwell"
  miro/books/title="Animal Farm"
  miro/books/year=1945
  miro/class="record"
  njr/index/about
  njr/rating=10
  otoburb/has-read
  otoburb/rating=8

Notice that, because of the careful normalization inherent in the book-u convention, where the book is already in Fluidinfo, the new data has generally been added to the existing object corresponding to that book, as in the case above.

The core data that should almost always be present is:

  • the about tag fluiddb/about, normalized using the book-u convention:

    book:animal farm (george orwell)

  • the book/author tag, containing the best author information I was able to extract, in this case

    George Orwell

    Where there is more than one author, they are generally shown separated by commas, with the last joined with an and (with no Oxford Comma). For example, The Feynman Lectures on Physics, by Feynman, Leighton and Sands has

    $ fish show -a 'book:the feynman lectures on physics (richard p feynman; robert b leighton; matthew l sands)' /book/author
    
    Object with about="book:the feynman lectures on physics (richard p feynman; robert b leighton; matthew l sands)":
      /book/author = "Richard P. Feynman, Robert B. Leighton and Matthew L. Sands"

    or, graphically:

    fluidinfo aeaa654c-35b0-4b00-866b-c7deda8959c4
      book/author="Richard P. Feynman, Robert B. Leighton and Matthew L. Sands"
      book/bnb/contributors={"Leighton, Robert B.", "Sands, Matthew L. (Matthew Linzee)"}
      book/bnb/creator="Feynman, Richard P. (Richard Phillips), 1918-1988."
      book/bnb/id={"GBA901036", "GBA645628"}
      book/dewey={"530"}
      book/isbn={"0805390499", "0805390669"}
      book/r=0.893236319082
      book/source={"BNBrdfdc16.xml-201011150#096269", "BNBrdfdc14.xml-201011150#132062"}
      book/title="The Feynman lectures on physics"
      fluiddb/about="book:the feynman lectures on physics (richard p feynman; robert b leighton; m..."

    The book/author tag has had a lot of processing done to it, as described below.

  • the book/title field, which is usually almost identical to that in the BNB data. In this case it is:

    Animal farm

    I have not altered the capitalization, which is therefore generally consistent with some entry in the BNB database (though I would really prefer it were in Title Case).

  • the book/source tag shows where the base data was taken from. This tag’s value is a set of strings, each of which corresponds to an entry in one of the 17 files from which the BNB data was extracted. The entries consist of

    • the name of the file (always BNBrdfdcNN.xml) where NN runs from 01 to 17
    • a dash -
    • the datestamp on that file (always 20101115 at present)
    • the digit zero (0) and a # sign
    • the record number in the file, starting from 1, with six digits.

    Since multiple bibliographic entries can correspond to the same work, there is sometimes more than one of these.

  • the book/r tag is a pseudo-random floating point value with 0.0 ≤ book/r < 1.0.
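
The layout of a book/source entry, as described above, is regular enough to unpick with a regular expression. The following is only a sketch of how one might do that in Python; parse_source and the keys of the dictionary it returns are names I have made up for illustration, not part of any Fluidinfo tool.

```python
import re

# Entries look like "BNBrdfdc13.xml-201011150#088316":
# file name, a dash, an 8-digit datestamp, the digit 0, a '#',
# then a six-digit record number.
SOURCE_RE = re.compile(
    r"(?P<file>BNBrdfdc(?P<nn>\d{2})\.xml)-(?P<date>\d{8})0#(?P<record>\d{6})"
)


def parse_source(entry):
    """Split one book/source string into its parts (hypothetical helper)."""
    m = SOURCE_RE.fullmatch(entry)
    if m is None:
        raise ValueError("not a book/source entry: %r" % entry)
    return {
        "file": m.group("file"),
        "file_number": int(m.group("nn")),
        "datestamp": m.group("date"),
        "record": int(m.group("record")),
    }


info = parse_source("BNBrdfdc13.xml-201011150#088316")
```

Here info["file"] is the source file and info["record"] is the record number within it as an integer.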

Some of the raw data has also been added, with almost no cleaning up, under the book/bnb namespace. The BNB data uses the Dublin Core metadata standard, and includes:

  • book/bnb/creator, which is the person or organization primarily responsible for the creation of the work. This is sometimes blank, and is stored as a single string value.
  • book/bnb/contributors, which is a list of contributors, sometimes including the creator and sometimes not.
  • book/dewey is the set of Dewey Decimal classifications found on the records corresponding to this book. (As the examples above show, this one is stored directly under book rather than under book/bnb.)
  • book/isbn is the set of international standard book numbers found on the records corresponding to this book. (This, too, is stored directly under book.)
  • book/bnb/id is the set of British Library IDs found on the records corresponding to this book. (I’m not entirely clear what this identifier is, but it appears to be important and well populated.)

Other information is available in the data (including classification information), and I will probably extract this and add it at a later time.

Finding, Inspecting and Tagging Books in Fluidinfo

There are multiple ways of retrieving book data from Fluidinfo and of tagging it.

  • Probably the easiest and most general method is to go to http://artoftagging.com and do a search that involves a book and some keywords from the title and/or author. A list of results should come back and you can see a visualization of any of them by clicking the link. If you have a Fluidinfo account, you can create an account at artoftagging.com and then save your Fluidinfo details there. Once logged in, you will then be able to add your own tags to any object you find.

  • If you just want to construct the about tag for a book, you can do that using the online version of the Fluidinfo Shell, Fish. Once there, type, for example:

    fish> about book "Animal Farm" "George Orwell"
    book:animal farm (george orwell)
    
    fish> about book "The Feynman Lectures on Physics" "Richard P. Feynman" "Robert B. Leighton" "Matthew L. Sands"
    book:the feynman lectures on physics (richard p feynman; robert b leighton; matthew l sands)

    (The quotes tell Fish that “Animal Farm” is the title and “George Orwell” a single author.) Alternatively, you can download and install Fish on your own machine. (It is available from Github.) You can then type the same commands, after fish, e.g.:

    $ fish about book "Animal Farm" "George Orwell"
    book:animal farm (george orwell)

    You can then use any Fluidinfo tool, including the new Object Browser, to work with that object, signing in with Twitter if you like.

  • Another easy way of finding an about tag for a book is to find it on Amazon (US or UK, for now) and use the az-fish bookmarklet available at the top of the online Fish (drag it to your browser’s toolbar). The bookmarklet will take the item on the current Amazon page and issue the appropriate Fish command to find the about tag. (You don’t need to log into Fish or Fluidinfo to do this.)
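
To give a feel for what the about command is doing, here is a deliberately simplified Python sketch that reproduces the two examples above. It is not the book-u implementation from the abouttag library, which deals with many cases this ignores; book_about and _normalize are hypothetical names, and the rules (lower-case everything, drop punctuation, collapse whitespace, join authors with a semicolon) are only my approximation of the convention.

```python
import re


def _normalize(text):
    """Lower-case, drop punctuation and collapse runs of whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def book_about(title, *authors):
    """Build a book-u style about tag (simplified, illustrative only)."""
    return "book:%s (%s)" % (
        _normalize(title),
        "; ".join(_normalize(author) for author in authors),
    )


orwell = book_about("Animal Farm", "George Orwell")
feynman = book_about(
    "The Feynman Lectures on Physics",
    "Richard P. Feynman",
    "Robert B. Leighton",
    "Matthew L. Sands",
)
```

For real use, prefer Fish or the abouttag library itself, so that your about tags agree with everyone else’s.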

The Hierarchy of Books: Works and Manifestations

The International Federation of Library Associations (IFLA) describes a hierarchy of four kinds of “book” entities in its report Functional Requirements for Bibliographic Records. These are:

  • works
  • expressions
  • manifestations
  • items.

Quoting from that report:

“The entities defined as work (a distinct intellectual or artistic creation) and expression (the intellectual or artistic realization of a work) reflect intellectual or artistic content. The entities defined as manifestation (the physical embodiment of an expression of a work) and item (a single exemplar of a manifestation), on the other hand, reflect physical form.”

Loosely, a work is the conceptual book, usually described by the combination of a title and author—Animal Farm by George Orwell.

The report describes an expression of a work as “the intellectual or artistic realization of a work in the form of alpha-numeric, musical, or choreographic notation, sound, image, object, movement, etc., or any combination of such forms.” Thus George Orwell’s Animal Farm can be translated into different languages, laid out differently, typeset on pages, or in digital form, or recorded as spoken words, and these correspond to different expressions of that same book. There may also be different editions, printings etc., which may have slightly different content. Again, these are different expressions of the same conceptual work. (Occasionally, expressions may encompass several works, such as in the case of compendia.)

Moving down the hierarchy, a manifestation is a particular rendering of a work into physical form — “the physical embodiment of an expression of a work.” Note that “[A]s an entity, manifestation represents all the physical objects that bear the same characteristics, in respect to both intellectual content and physical form.” Thus, all the copies of the same printing of the same edition of Animal Farm that are essentially indistinguishable collectively correspond to a manifestation of George Orwell’s Animal Farm.

Finally, an item is an individual copy of a book: “a single exemplar of a manifestation.”

The entries in the British Library’s catalogue correspond literally to items and conceptually to manifestations, whereas the objects to which I have attached the data in Fluidinfo correspond to works. This is why the c. 3 million records reduce to c. 2.5 million Fluidinfo objects, and why some of the objects have multiple ISBNs etc. It is entirely possible to create further objects at the level of manifestations (and even items, if someone really wants to do so), and even more so at the level of expressions, but I have not done this yet.

The reason I have concentrated on works rather than manifestations is that this seems much the most important level to represent in a system like Fluidinfo: with important exceptions, when people want to rate or comment on a book, it is most often the work, rather than the manifestation, that they are interested in. Moreover, collecting together information about the different ISBNs associated with a single work is positively helpful. That is not to say that there isn’t a case for creating other objects at the level of expressions or manifestations.
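
The collapse from catalogue records to works can be pictured as a simple grouping on the about tag. The Python below is a sketch of that idea, not the code I actually ran: group_by_work is a hypothetical name, and although the ISBNs and source strings are taken from the Animal Farm and Feynman examples above, the pairing of each ISBN with a particular source record is arbitrary and for illustration only.

```python
from collections import defaultdict


def group_by_work(records):
    """Collect the ISBNs and source records that share an about tag."""
    works = defaultdict(lambda: {"isbn": set(), "source": set()})
    for about, isbn, source in records:
        works[about]["isbn"].add(isbn)
        works[about]["source"].add(source)
    return dict(works)


# (about tag, ISBN, source) triples; the ISBN/source pairing is made up.
records = [
    ("book:animal farm (george orwell)",
     "070898200X", "BNBrdfdc13.xml-201011150#088316"),
    ("book:animal farm (george orwell)",
     "185715150X", "BNBrdfdc13.xml-201011150#029879"),
    ("book:the feynman lectures on physics "
     "(richard p feynman; robert b leighton; matthew l sands)",
     "0805390499", "BNBrdfdc16.xml-201011150#096269"),
]

works = group_by_work(records)
```

Three records become two works here, and the Animal Farm object ends up with a set of two ISBNs, which is the same effect that gives the set-valued book/isbn and book/source tags.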

Further Work

There is a great deal more that can be usefully done with the fabulous data from the British Library. While I am not committing to doing these, tasks on my list include:

  • Authors. Creating an object corresponding to each creator/author/contributor. I plan to use about tags of the form author:normalized name (birth-year) for these, e.g. author:George Orwell (1903). The required data is largely available in the BNB dataset. I would then plan to add a book/related-authors tag to each book, pointing to its authors’ objects and, on the author objects, corresponding sets of book/related-books tags pointing back to their works.

  • Upload Checking. Checking that everything uploaded OK. I count 2,558,738 unique books (as works) in the BNB dataset, and I appeared to upload all of these successfully (getting HTTP 204 statuses back from Fluidinfo). However, when I count objects having a book/r tag, I get only 2,468,661, a shortfall of 90,077.

    Whether this indicates a problem or not is unclear, as if I count the number of books with a book/source but no book/r, with the query

    has book/source except has book/r

    it reports 18,921 such books, but as far as I can tell, all those it finds in fact have a book/r, so it appears that Fluidinfo is having some difficulty executing some queries correctly at the moment.

  • About Tag Checking. I had to use some fairly hairy code to coerce the BNB data into the correct form to generate canonical about tags in the book-u convention, and it has definitely failed in some cases. For example, I have seen at least one example where the surname of an author in the BNB data preceded the forename but without a comma, so that forename and surname will have been reversed. To the extent that I can detect these problems, I will try to fix them.

  • Recent additions. I believe the British Library has issued updates with recent additions (since November 2010); I certainly plan to get that data and import it in a similar fashion, and then to set up a CRON job to do that regularly. In this way, I hope the dataset will be living and always current.

  • Categorizations. The BNB data includes subject categories for the records, which I have not imported thus far. I will do so.

  • Year information. There is information about publication dates in the BNB data, but it is not in a very structured form. If I am able to extract it with a satisfactory degree of reliability, I will get this too. Obviously, different manifestations will have different publication dates, so this will probably be a set-valued tag.
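
As a rough illustration of the Authors item above, here is how an author about tag might be derived from a book/bnb/creator string in Python. This is a sketch under assumptions: author_about is a hypothetical name, the pattern only covers creator strings shaped like the two examples shown earlier, and the output follows the author:George Orwell (1903) example without applying any further normalization.

```python
import re

# "Surname, Forenames (Expanded Forenames), 1903-1950." where the
# parenthetical, the dates and the trailing full stop are all optional.
CREATOR_RE = re.compile(
    r"(?P<surname>[^,]+),\s*"
    r"(?P<forenames>[^,(]+?)\s*"
    r"(?:\([^)]*\)\s*)?"
    r"(?:,\s*(?P<birth>\d{4})-(?:\d{4})?)?"
    r"\.?"
)


def author_about(creator):
    """Return an author about tag, or None if the string is not understood."""
    m = CREATOR_RE.fullmatch(creator.strip())
    if m is None:
        return None
    name = "%s %s" % (m.group("forenames"), m.group("surname"))
    if m.group("birth"):
        return "author:%s (%s)" % (name, m.group("birth"))
    return "author:%s" % name


orwell = author_about("Orwell, George, 1903-1950")
feynman = author_about("Feynman, Richard P. (Richard Phillips), 1918-1988.")
```

A creator with no comma between surname and forename, the problem case mentioned under About Tag Checking, does not match the pattern, so author_about returns None and the record can be set aside for inspection rather than silently reversed.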

Enjoy the data, and let me know if you find problems.

I expect I will write a number of other posts on issues associated with this data.

About Tag Goes HTML5 with Embedded SVG: Browser Requirements

This page uses HTML5 and an embedded Scalable Vector Graphics (SVG) diagram, and acts as a test. From this point on (27th December 2011) my plan is to use HTML5 with embedded SVG as the default format for posts, and therefore if you use a browser (or feed reader) that does not support this, you will miss out.

You should see an elegant diagram below. If you do not, it probably means that you are not using a modern, standards-compliant HTML5 browser.

fluidinfo aeaa654c-35b0-4b00-866b-c7deda8959c4
  book/author="Richard P. Feynman, Robert B. Leighton and Matthew L. Sands"
  book/bnb/contributors={"Leighton, Robert B.", "Sands, Matthew L. (Matthew Linzee)"}
  book/bnb/creator="Feynman, Richard P. (Richard Phillips), 1918-1988."
  book/bnb/id={"GBA901036", "GBA645628"}
  book/dewey={"530"}
  book/isbn={"0805390499", "0805390669"}
  book/r=0.893236319082
  book/source={"BNBrdfdc16.xml-201011150#096269", "BNBrdfdc14.xml-201011150#132062"}
  book/title="The Feynman lectures on physics"
  fluiddb/about="book:the feynman lectures on physics (richard p feynman; robert b leighton; m..."

I have tested this page on the following:

  • A Macbook Pro running OS X 10.7.2 (Lion) with Safari 5.1.2, Chrome 16.0.912.63, Firefox 8.0.1, Opera 11.6
  • A Mac Pro running OS X 10.6.8 (Snow Leopard) with the same browsers.
  • An iPad running iOS 5.0.1 (in Safari, naturally)
  • An iPhone running iOS 5.0.1 (again, Safari, of course)
  • A Sony Vaio running Windows 7 with Internet Explorer 9
  • A prehistoric Dell latitude laptop running Chrome 16 and Firefox 9

and everything looks good. It does *not* work with Internet Explorer 8, but then, what does? It also does not work with many older versions of Firefox, Chrome, Safari or Opera. So this is a good time to upgrade. I don't know which Android or Linux browsers it works with, but I would guess it will work with some.

I imagine the diagrams will not show up in most feed readers, and that is unfortunate, but I think this is a time to push forward, so that is what I am doing.

Among the other marvellous benefits of SVG, it is scalable (just like the S says), which means that if you zoom the browser (generally command-plus on macs, and control-plus on Windows) the diagram will scale too, without looking terrible. Wonder of wonders.

19 December 2011

Fish 4.18 Released: Aliases, Sequences, Duck Typing and More

It has been an unconscionably long time since I last pushed a version of Fish to Github. The reason for this is mostly that, while adding various features, I broke a few things and wanted to get them all fixed before inflicting them on people. I believe they are now fixed (but do disabuse me of this notion if you discover otherwise).

There is much that is new, though almost all changes are backwards compatible. Everything is documented in the new green-themed documentation, available (as usual) in Fluidinfo itself at

http://fluiddb.fluidinfo.com/about/fish/fish/index.html

The main changes are as follows:

  • You can omit the -a or -i on the tag, untag, show, get and tags commands. If you do so, Fish will use the first argument instead, assuming that if it looks like a UUID, it is a UUID, and if not, that it isn’t. For these purposes, UUIDs must be expressed with lower-case hex digits and include the dashes in 88888888-4444-4444-4444-cccccccccccc formation. So now the following both work where before they would have generated errors:

    $ fish show Paris rating /about /id
    Object with about="Paris":
      /njr/rating = 10
      /fluiddb/about = "Paris"
      /id = "17ecdfbc-c148-41d3-b898-0b5396ebe6cc"
    
    $ fish show 17ecdfbc-c148-41d3-b898-0b5396ebe6cc rating /about /id
    Object 17ecdfbc-c148-41d3-b898-0b5396ebe6cc:
      /njr/rating = 10
      /fluiddb/about = "Paris"
      /id = "17ecdfbc-c148-41d3-b898-0b5396ebe6cc"

    Needless to say, the longer -a and -i forms work, and are useful if you want to tag multiple objects in one command (since they may be repeated).

  • The tags command now displays the tags in alphabetical order, except for the about tag, which is always listed first.

  • Fish now supports simple aliases, which effectively allow you to add commands to Fish. A simple example is:

    fish alias eiffel 'show -a "Eiffel Tower"'

    which allows commands like eiffel rating to be used in place of show -a "Eiffel Tower" rating.

    Aliases are stored in Fluidinfo, with private tags on objects whose about tag is the name of the alias. For example, if an alias paris is defined as show -a "Paris", the object with about tag paris has a tag njr/.fish/alias added to it, with its value set to the expansion text for the alias:

    $ fish alias paris
    paris:
      njr/.fish/alias = "show -a "Paris""

    (Obviously, the quoting here is slightly unfortunate; I will fix that some time.)

    Aliases are also cached locally in the file-system; the cache is updated from Fluidinfo using the new sync command or whenever Fish is entered in interactive mode (by typing Fish).

    Because aliases are stored in Fluidinfo, they can be shared between multiple copies of Fish, and also with the online version Shell-Fish.

    The cache can be viewed with showcache.

  • Support for sequences has been added. Sequences provide a convenient way of storing a numbered collection of items that are added to over time. They are described in a previous blog post (Sequences in Fluidinfo).

    Briefly, in the simplest case, a sequence of remarks is defined by saying:

    $ fish mkseq remark

    This creates two new aliases:

    • remark is used to add a new remark, using the alias

      $ fish alias
      remark:
        njr/.fish/alias = "seq /njr/remark"

    • remarks is used to look at (or search) remarks, using the alias

      remarks:
        njr/.fish/alias = "listseq /njr/remark"

    Thus if we say:

    $ fish mkseq remark
    Next remark number: 0
    
    $ fish remark "Isn't this a remarkable first remark"
    0: Isn't this a remarkable first remark
    2011-12-18
    
    $ fish remark "...and this only slightly less remarkable"
    1: ...and this only slightly less remarkable
    2011-12-18

    then we will see:

    $ fish remarks
    0: Isn't this a remarkable first remark
    2011-12-18
    
    1: ...and this only slightly less remarkable
    2011-12-18

    By default, sequences are public, but you can easily make them private by specifying a tag in a private namespace (typically private); you can also specify the plural form. To set up the sequence as private, and use myremarks to list and search remarks, you would instead say:

    $ fish mkseq remark myremarks private/remark

    For more details, see the previous blog post or the documentation.

  • Non-primitive types are now shown more sensibly (by show and tags). Previously, Fish would attempt to print even non-primitive types, with sometimes unfortunate consequences both in terms of the volume of output and its effects on terminals. For non-primitive types, output is now shown as below:

    $ fish show fish /fish/index.html
    Object with about="fish":
      /fish/index.html = <Non-primitive value of type text/html (size 8907)>
  • Display of set-valued tags is also improved, e.g.:

    $ fish tags 'artist:led zeppelin'
    Object with about="artist:led zeppelin":
      /fluiddb/about = "artist:led zeppelin"
      /musicbrainz.org/artist
      /musicbrainz.org/artist/end-date = "1980-09-25"
      /musicbrainz.org/artist/members = {
        "Jimmy Page"
        "John Bonham"
        "John Paul Jones"
        "Robert Plant"
      }
      /musicbrainz.org/artist/name = "Led Zeppelin"
      /musicbrainz.org/artist/sort-name = "Led Zeppelin"
      /musicbrainz.org/artist/start-date = "1968-01-01"
      /musicbrainz.org/artist/type = "group"
      /musicbrainz.org/mbid = "678d88b2-87b0-403b-b63d-5da7465aecc3"
  • The Fish API has been updated to take account of the renaming of FluidDB to Fluidinfo, and various tests have been changed to use more esoteric unicode characters.

  • Some operations are faster (because more use is made of the /values endpoint).

  • Finally, when Fish starts it checks the environment for the presence of the variable FISHUSER. If this is defined, the credentials in the startup file identified by the string specified in FISHUSER will be used, rather than the default ones. (This is mainly helpful if you want to use Fish with different Fluidinfo accounts in different shells concurrently.) Thus, if FISHUSER is set to foo (on UNIX), the credentials from ~/.fluidDBcredentials.foo will be used, rather than those in ~/.fluidDBcredentials.
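
Two of the rules in the list above are precise enough to sketch in a few lines of Python: the test for whether a first argument "looks like a UUID", and the choice of credentials file when FISHUSER is set. This is not Fish's own source code; looks_like_uuid and credentials_path are hypothetical names, and the sketch only encodes the behaviour as described in this post.

```python
import os
import re

# Lower-case hex digits with dashes in 8-4-4-4-12 formation.
UUID_RE = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")


def looks_like_uuid(arg):
    """True if arg should be treated as an object ID rather than an about tag."""
    return UUID_RE.fullmatch(arg) is not None


def credentials_path(environ=None):
    """Pick the credentials file, honouring FISHUSER if it is defined."""
    environ = os.environ if environ is None else environ
    path = os.path.expanduser("~/.fluidDBcredentials")
    user = environ.get("FISHUSER")
    return "%s.%s" % (path, user) if user else path


is_id = looks_like_uuid("17ecdfbc-c148-41d3-b898-0b5396ebe6cc")
is_about = not looks_like_uuid("Paris")
```

Because the test insists on lower-case hex digits, an upper-case UUID would be treated as an about tag; that is when the explicit -i form earns its keep.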

So, obviously, there are quite a lot of changes, and though I’ve been using it for a while, some things might have broken. (I fixed some bugs yesterday; always dangerous!)

15 December 2011

Fragmentation and URL Normalization

I have updated the abouttag.py library to use a new, better convention for normalizing URLs. The two main changes people will notice are:

  1. URLs that represent directories will now include, rather than exclude, a trailing slash:

    http://fluidinfo.com/

    rather than

    http://fluidinfo.com
  2. There is now a dependency on the excellent urlnorm.py, by Jehiah Czebotar.

The Issue: Fragmentation

The twin evils that the abouttag.py library and this blog exist to fight are fragmentation and overloading.

Fragmentation occurs in Fluidinfo when different users store information about the same thing on different objects, while overloading occurs when people store information about different things on the same object. In general, both of these are undesirable. Fragmentation reduces data sharing and makes it harder to extract information from the system, whereas overloading creates ambiguity and confusion.

One of the more common uses for Fluidinfo is for tagging web pages, and it is very natural to use the URL as the about tag, as almost everyone does. There is not much of a problem with overloading in this case (except to the extent that URLs point to web pages that change over time), but there is definitely fragmentation.

I would distinguish between two kinds of fragmentation in the case of URLs.

  1. Different representations of the same URL. Perhaps the most obvious example is the trailing slash on many URLs. Punctilious persons with good knowledge of web standards (and in particular RFC 3986) prefer the inclusion of a trailing slash on URLs (and more generally, on URIs) where appropriate, and thus prefer

    http://fluidinfo.com/

    to the more colloquial

    http://fluidinfo.com

    Technically, these are different URLs, but web servers so routinely and uniformly redirect the latter to the former that they can be considered for all practical purposes the same. It seems highly desirable for any convention for about tags for URLs to map these two forms, along with other similar representational variants, to a common about tag.

  2. Different URLs that may or may not represent the same web page. The most obvious example of this is the www. that used to be de rigueur and is now commonly (but not reliably) redundant. Although most right-thinking webmasters (webmistresses?) routinely redirect these to the same place, there is no general guarantee that the www. form (http://www.fluidinfo.com/) and the bare form (http://fluidinfo.com/) will produce the same page, nor even that they should both work.

    Standardizing this would therefore seem to be a normalization too far.

The Old and New Behaviour of abouttag.py

Fluidinfo is far from the only system with an interest in developing a canonical or normalized form for URLs. Search engines and social bookmarking sites (such as Pinboard and Delicious) work better if different URLs representing the same resource are collapsed, and as mentioned above, there is even a standard (RFC3986) for how to perform the canonicalization.

The relevant Wikipedia page describes six normalizations that preserve URL semantics. These are:

  • Converting the scheme and host to lower case (HTTP:// → http:// and FLUIDINFO.COM → fluidinfo.com)
  • Capitalizing letters in escape sequences (%3a → %3A)
  • Decoding percent-encoded octets of unreserved characters (%7E → ~)
  • Adding a trailing slash where appropriate (http://fluidinfo.com → http://fluidinfo.com/)
  • Removing the default port (http://fluidinfo.com:80/ → http://fluidinfo.com/)
  • Removing dot-segments (http://fluidinfo.com/accounts/./new/ → http://fluidinfo.com/accounts/new/)
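
To make the six steps concrete, here is a rough sketch of them using only the Python 3 standard library. It is not how urlnorm.py or abouttag.py is implemented; normalize and its helpers are hypothetical names, and the sketch makes simplifying assumptions: it ignores user-info and IPv6 hosts, and only adds a trailing slash when the path is empty.

```python
import re
from urllib.parse import urlsplit, urlunsplit

UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)
DEFAULT_PORTS = {"http": 80, "https": 443}
ESCAPE_RE = re.compile(r"%([0-9A-Fa-f]{2})")


def _fix_escape(match):
    """Decode an escape for an unreserved character; otherwise upper-case it."""
    char = chr(int(match.group(1), 16))
    return char if char in UNRESERVED else "%" + match.group(1).upper()


def _remove_dot_segments(path):
    """Resolve '.' and '..' segments, keeping an implied trailing slash."""
    segments = path.split("/")
    out = []
    for i, seg in enumerate(segments):
        if seg in (".", ".."):
            if seg == ".." and len(out) > 1:
                out.pop()
            if i == len(segments) - 1:
                out.append("")
        else:
            out.append(seg)
    return "/".join(out)


def normalize(url):
    """Apply the six semantics-preserving normalizations to an absolute URL."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    netloc = host
    # Keep the port only if it is not the default for the scheme.
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc = "%s:%d" % (host, parts.port)
    # An empty path becomes "/" (the trailing-slash rule).
    path = _remove_dot_segments(ESCAPE_RE.sub(_fix_escape, parts.path)) or "/"
    query = ESCAPE_RE.sub(_fix_escape, parts.query)
    return urlunsplit((scheme, netloc, path, query, parts.fragment))


example = normalize("HTTP://FLUIDINFO.com:80")
```

Here example comes out as http://fluidinfo.com/, covering the lower-casing, default-port and trailing-slash steps in one go.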

Happily, libraries to perform these normalizations already exist and are freely available for a number of programming languages, including Python. As noted above, Jehiah Czebotar’s urlnorm.py performs the task admirably in Python, so in the version of abouttag.py that I just pushed to Github (version 0.6) I have added a new convention, uri-2, corresponding to this behaviour and have made that the default. So now:

>>> from abouttag.uri import URI

>>> URI(u'http://fluidinfo.com')
u'http://fluidinfo.com/'

>>> URI(u'HTTP://FLUIDINFO.com:80')
u'http://fluidinfo.com/'

>>> URI(u'http://fluidinfo.com/a/./b/?arg=%7Ealice')
u'http://fluidinfo.com/a/b/?arg=~alice'

This is different from the old behaviour, which can be obtained by explicitly adding a convention argument of ‘uri-1’:

>>> URI(u'http://fluidinfo.com', convention=u'uri-1')
u'http://fluidinfo.com'
# note no trailing slash

>>> URI(u'HTTP://FLUIDINFO.com', convention=u'uri-1')
u'http://fluidinfo.com'
# Same downcasing, but again no trailing slash

>>> URI(u'http://fluidinfo.com:80', convention=u'uri-1')
u'http://fluidinfo.com:80'
# uri-1 didn't strip default ports

>>> URI(u'http://fluidinfo.com/a/./b/?arg=%7Ealice', convention='uri-1')
u'http://fluidinfo.com/a/./b/?arg=%7Ealice'
# nor did it undo unnecessary %-encoding or strip . & .. path segments.

Both the new and the old versions perform one additional normalization, which is to add a leading http:// if no scheme is present in the input. This is not because there is not a distinction between a domain and a URL, but rather because by calling the URI function the user is clearly indicating that this is a URI, which requires a scheme, and http:// is clearly the appropriate default scheme:

>>> URI(u'fluidinfo.com')
u'http://fluidinfo.com/'

Why...?

The reader may be wondering why I did not adhere to the RFC previously, and issued forth older versions of the abouttag library with the altogether inferior behaviour of uri-1. Ignorance, pure and simple.

10 December 2011

Siri: The Command Line for Everyone Else

Perhaps the biggest difference between the way in which “real” people use computers and geeks use computers is this:

Real people use Graphical User Interfaces because they find them intuitive and efficient.

Geeks generally prefer the command line, which they find easier, more precise and faster than GUIs. For geeks, GUIs tend to get in the way, limit and interfere.

Here’s a GUI for my files:

Finder

It has all the advantages of clickability, a set of icons for actions, and a few extra things hidden on menus and right-click (“context”) menus. But it is very limited.

Here, in contrast, is a command line, in its spare, minimalist glory:

CommandLine

If you don’t know what to type, the command line is intimidating, unhelpful and limiting. But if you do know, you can do almost anything: far from limiting, the command line is open and alive with virtually unbounded possibilities. Instead of having to navigate menus and finders and buttons and icons, the command line allows you to access almost anything the machine can do, all from one place, just by typing.

There are two major things that stop real people from benefitting from the command line and its liberating possibilities:

  1. People don’t know the commands or the right syntax for them.
  2. People don’t like typing. [1]

Enter Siri

Just like a command line, Siri has the potential to allow me to access anything my phone can do with no navigation: if I want to call Alex, I say “call Alex”. If I want to find out the height of Mount Everest, I just say “How high is Mount Everest?”. If I want to send a Tweet, I say “text Bird” and it will send a Tweet for me. (OK, this last one is a hack: if I say “Tweet this”, Siri knows exactly what I mean, but refuses saying “Sorry, Nicholas, I can’t help you with Twitter.” But I can send a tweet as a text message by saying “text Bird” because I have the Twitter short-code listed under Bird Parker. “Why not under Twitter?”, you ask? Because Siri still refuses if I list it under Twitter! Go figure!)

Of course, unlike the command line, I don’t have to get the syntax right with Siri. I just issue commands in plain English, and a reasonable proportion of the time it “understands” me.

Before Siri, the nearest thing to a syntax-free command line for real people was Google—a little box into which you can type anything in the reasonable hope that the search will return some relevant information. But Google is largely a one-trick pony, and even though it’s a good trick, it’s nothing like as powerful as when the software makes an attempt to understand the command and has the ability to take actions. (By offering a list of the top dozen or so “hits”, Google also hedges its bets, getting the user to pick the best-looking “answer”: quite apart from speech recognition and “comprehension”, Siri goes for broke by putting all its money on a single interpretation of what you said, only asking for clarification occasionally.)

Horace Dediu, in his podcast Getting to Know You, makes the case that the significance of Siri is that it allows Apple to learn much more about its users, allowing a new level of lock-in, power and service. That’s an interesting and important perspective that may prove to be right. But after a day with Siri, I think the more direct and immediate consequence is exactly that Siri could bring all the power of the command line to the masses.

[1] Actually, most geeks don’t really like typing either, and have myriad ways to reduce typing, from globbing (wild-card expansion) to command-line completion; but the basic point stands.
