02 December 2013

An Evening With Ray Bradbury

Yesterday, I came across this rather wonderful 2001 talk from Ray Bradbury, from University of California Television.

He appears to be talking to an audience of aspiring writers, and while the talk is wide-ranging, his two key recommendations seem to be

  1. Write lots of short stories (one a week, anyway) and delay starting novels
  2. Each night, for the next thousand nights, read one short story, one essay and one poem.

This is interesting advice, and the programme is attractive.

He makes various specific suggestions, mostly for authors, which I thought I’d capture and note here:

Short Story Authors

  • H G Wells
  • Roald Dahl
  • Guy de Maupassant
  • John Cheever
  • Richard Matheson
  • Nigel Kneale
  • John Collier
  • Edith Wharton
  • Eudora Welty
  • Washington Irving
  • (Herman) Melville
  • Edgar Allan Poe
  • Katherine Anne Porter
  • Nathaniel Hawthorne
  • G K Chesterton

Poetry

  • Shakespeare
  • Alexander Pope
  • (Robert) Frost

Nothing modern: “it’s all crap”. [I don’t agree, but then I’m not Ray Bradbury.]

Essays

  • Aldous Huxley
  • Loren Eiseley
  • George Bernard Shaw (especially George Bernard Shaw vs. G K Chesterton (debates))
  • The great philosophers

I may have missed one or two, but I think these lists contain most of his specific recommendations.

06 June 2012

Amir Khella: Dead Wrong on Voice Memos for iPhone

Amir Khella, a User Experience consultant and producer of the excellent Keynotopia prototyping templates for iOS and other interfaces, produced an interesting post called Don't Violate Design Laws — Even If You're Apple. While he's an expert in UX and I'm not, I think he's dead wrong in this case.

He starts by saying ". . . when I started using the voice recording app that comes with the iPhone, I was frustrated by its horrible usability".

Really? Voice recorder has horrible usability? My own experience is that it's a simple and very well designed app that is extremely easy to use. I've also heard others specifically talking about how good and easy to use it is (e.g. Andy Ihnatko, on The Ihnatko Almanac...).

So what's Amir's complaint? He continues

"the application dedicates the largest screen real-estate to a giant microphone screenshot that does nothing, and places the functional buttons of the app in the two bottom corners, occupying less than 5% of the screen space."

Well, that's true, but I think Amir makes two or three serious errors here:

  1. He argues from a rather special and unusual situation (wanting to operate his phone while driving—illegal where I am, in the UK, and ill-advised anywhere) for a general change to the interface
  2. He implicitly assumes that making a target "easier to reach" is an unalloyed good and the fundamental goal of user interaction, while ignoring the (in my view) serious negatives of this approach (primarily accidental engagement and loss of work)
  3. He argues for a design that would probably be very unexpected and surprising for people, and which arguably violates design norms, which would be likely to lead to loss of work.

I think to call the user experience horrible is manifest hyperbole: the interface has a prominent, unmistakable, easy-to-hit, highly recognisable button that does exactly what you expect, and just one other button to access a list of voice memos. The app makes it unmistakably clear when you are recording with a big red banner saying "Recording" (with the elapsed time) and the swapping of the record button to the universal pause button, and the memo list button to the universal stop button.

Admittedly, when paused the banner is still red, and it would perhaps be clearer if it changed to black or grey at this time, but I think it's pretty good.

Is the record button small and hard to hit? No; it's a fairly standard size for a touch target on the iPhone. It could certainly be twice the area without looking ridiculous, but few buttons on Apple apps are larger than that: there are full-screen-width buttons for some functions, but even critical things like "Answer call" vs. "Reject call" are only half-screen width.

Amir is talking about usability in the special and unusual situation where he doesn't want to look at the screen.

What would happen if Apple actually adopted Amir's suggestion?

First, I think people would find themselves starting and stopping recordings frequently when they didn't mean to. People touch their phones all the time—intentionally, absent-mindedly and accidentally. Having a situation where almost any touch on the interface activated or deactivated the recording would be a nightmare, especially if the whole screen didn't look like a huge button. I am very confident that would drive people nuts.

Secondly, on phones, the screen dims quickly to save battery life, which is a good thing. As a result, people are used to touching the screen to undim it when they want to see information. (The screen will even turn off while recording.) In fact, when the screen is dimmed, touching the pause button does not pause the app; it first undims the screen and then requires a second tap to pause the recording. (I didn't know this; I had to try it to find out. But the point is, it's not confusing or problematical at all: you get very clear visual feedback and press it again. And I bet lots of people try not to press an active area when undimming the screen.)

All this might work the same way if the whole screen (or most of it) were active, but the point is that you'd still have to look in order to see what state the app is in.

Finally, I would even dispute Amir's contention that the main area of the screen "does nothing". Sure, it's not an active area, but the whole design of the app screams at you "I am a recording app. This is a microphone. I have a big red record button and a VU meter to show you what I can hear." Yes, it's a bit skeuomorphic, but more defensibly so than the faux-leather in the calendar app.

In fact, the only real usability problem I would cite with the app is knowing how to finalize/stop recording if (as I suspect is common) people don't notice the stop button and hit pause instead. But then again, there are only two buttons, so given that the one on the left is obviously record, I think most people are going to figure out that hitting the other one will probably do the job (as it does).

Maybe Amir just wrote it as link bait (in which case, I was caught). But I think he protests too much.

15 March 2012

Book: Getting Started with Fluidinfo

I wrote a book, with my friend and colleague Nicholas Tollervey (@ntoll). It’s published by O’Reilly Media and is available both as a printed tome and a DRM-free, multi-format electronic book direct from O’Reilly. If you use the code AUTHD at checkout, you can get a discount, as described here:

http://fluiddb.fluidinfo.com/about/book:getting%20started%20with%20fluidinfo%20(nicholas%20j%20radcliffe;%20nicholas%20h%20tollervey)/njr/image/flier.png

The book is also available from Amazon.com, Amazon (UK), Waterstones, Barnes & Noble and all good booksellers, even local ones staffed by real people who love books. Ordering direct from O’Reilly is probably quickest and lets you use the discount code.

All O’Reilly “animal” books come to be known by the species on their covers: the rather striking animal on our cover is “a jellyfish-like animal of the genus Stephalia”, and you can read all about it in the book’s Colophon. The image is from Lydekker’s Royal Natural History.

For avoidance of doubt, the animal appearing on this work is real. Any resemblance to fictitious persons, animals or deities, is purely coincidental.

It is acceptable to refer to the book as “The Jellyfish Book” or “The Stephalia Book”, but definitely not “The Flying Spaghetti Monster Book”.

12 February 2012

xmltextnorm: A Simple XML Text Normalizer to Support Diff for Docbook, XHTML, HTML etc.

The difficulty with diffing DocBook Sources

As you may know, Nicholas Tollervey (@ntoll) and I have been working on an O’Reilly book called Getting Started with Fluidinfo, which goes for printing next week.

When you write for O’Reilly, you have a choice of using DocBook, an XML format, or AsciiDoc as your source format. We chose DocBook, which is what O’Reilly uses for production. For the purposes of this article, DocBook is like a richer, more powerful version of HTML or XHTML that can be transformed to produce output in multiple formats, including PDF and ePub.

One frustration with that process, for me, is the lack of a good way of viewing changes made between versions. The two main options O’Reilly suggests are either to use some kind of diff tool on the sources, or to use something like pdfdiff to look at differences in the formatted output.

Unfortunately, pdfdiff doesn’t work well if text moves between pages, which it tends to do with all but the most trivial editing.

Line-based tools like Unix diff or graphical equivalents (opendiff, xdiff etc.) are to some extent inherently unsuitable because they focus on lines, and line breaks have no special significance in DocBook, just as they don’t in HTML: they are just whitespace. Either, like me, you break the lines in convenient places (in my case, usually using Emacs’s fill function, M-Q), or you use long lines. Neither is very satisfactory for a line-based diff tool, because the first strategy makes small changes look large, and the second hides changes in long lines that are hard to read in the diff output.

A third option, in principle, is to use an XML diff tool. There are some, the most interesting of which looks to be diffxml, but unfortunately the output from those appears not to be primarily targeted at humans.

A partial solution: xmltextnorm.py

Today I did what I should have done at the start of the process, and wrote a simple script to normalize the text in an XML source file in such a way that line-based diff tools will be more useful. The idea is that diffing the normalized text from two XML source files should produce a meaningful diff (of the text) which is relatively insensitive to changes to the source files that won’t affect the formatted PDF, HTML or whatever.

Of course, this is only part of the story: if you want to see changes to the XML markup, you’ll need something else entirely, but I found that using this tool on each chapter of the book, I was able to see very quickly exactly what changes our copy editor had made, something I had been completely unable to do before.
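To make the idea concrete, here is a minimal sketch of one way such a normalization could work. It is not the code of xmltextnorm.py: the function name normalize_xml_text and the one-sentence-per-line output format are assumptions made purely for illustration. The sketch discards the markup, collapses all runs of whitespace, and writes one sentence per line, so that re-filling a paragraph in the source produces no diff at all and a small edit touches only one line.

```python
import re
import xml.etree.ElementTree as ET


def normalize_xml_text(xml_source):
    """Return the text of an XML document with one sentence per line.

    Hypothetical helper for illustration; the real script may normalize
    the text in a different way.
    """
    root = ET.fromstring(xml_source)
    # Gather all text, ignoring the markup, and collapse every run of
    # whitespace (including line breaks) to a single space.
    text = " ".join("".join(root.itertext()).split())
    # Break after sentence-ending punctuation so that an edit to one
    # sentence changes only one line of the output.
    sentences = re.split(r"(?<=[.!?]) ", text)
    return "\n".join(sentences) + "\n"


if __name__ == "__main__":
    filled_one_way = "<chapter><para>One fish.\n  Two fish.</para>\n<para>Red fish.</para></chapter>"
    filled_another = "<chapter><para>One fish. Two\nfish.</para> <para>Red fish.</para></chapter>"
    # Both sources normalize to the same text, so diff reports nothing.
    print(normalize_xml_text(filled_one_way) == normalize_xml_text(filled_another))
```

Running an ordinary diff over two such normalized files then shows textual changes only, which is the behaviour described above.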

The script is a very short Python program (requiring Python 2.7, or an older Python with a modern version of ElementTree), and it is available either direct from Fluidinfo at:

or from Github.

Usage is simple. The command line has the form

python xmltextnorm.py [infile.xml [outfile.txt]]

which will cause xmltextnorm to write the normalized text from infile.xml to outfile.txt.

If you don’t specify an outfile, it will use the same path as infile, changing the extension to .txt. If you don’t specify either, it will read from stdin and write to stdout, just like a well-behaved Unix utility.
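The argument handling just described can be sketched as follows. This is an illustration of the stated rules only, not the script's actual code; the function name resolve_paths and the use of None to stand for stdin or stdout are assumptions.

```python
import os


def resolve_paths(argv):
    """Return (infile, outfile) for a command line like the one above.

    None for either value means "use stdin" or "use stdout".
    Hypothetical helper, written only to illustrate the rules.
    """
    args = argv[1:]
    if not args:
        # No arguments: behave as a filter.
        return None, None
    infile = args[0]
    if len(args) >= 2:
        return infile, args[1]
    # Only infile given: same path, with the extension changed to .txt.
    base, _extension = os.path.splitext(infile)
    return infile, base + ".txt"


if __name__ == "__main__":
    print(resolve_paths(["xmltextnorm.py", "ch01.xml"]))
```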

There’s actually nothing DocBook-specific about it, and I suspect it will be just as useful for looking at textual changes to HTML (as long as it is well-formed XML) or similar.

It’s MIT licensed.

Entities

One point worth noting is that ElementTree doesn’t like non-standard XML entities (reasonably enough). So there’s a dictionary called ENTITIES near the start of the code that allows you to specify any non-standard entities used in your XML input, and something to translate them to. (It doesn’t really matter what you translate them to.) I’ve included &mdash; and &hellip; since they occur in our book, but you can add others if you need to do so.
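As a rough illustration of why such a dictionary helps, the sketch below substitutes entity references in the raw source before handing it to ElementTree. Only the name ENTITIES comes from the script; the replacement strings, the function replace_entities and the simple string-substitution approach are assumptions, and the real code may work differently.

```python
import xml.etree.ElementTree as ET

# Entity name -> replacement text. The replacements here are arbitrary
# choices for illustration; any text will do for diffing purposes.
ENTITIES = {
    "mdash": "--",
    "hellip": "...",
}


def replace_entities(xml_source, entities=ENTITIES):
    """Replace each &name; reference with its plain-text translation."""
    for name, replacement in entities.items():
        xml_source = xml_source.replace("&%s;" % name, replacement)
    return xml_source


if __name__ == "__main__":
    source = "<para>Wait&hellip; what&mdash;now?</para>"
    # Parsing the untouched source would fail with an undefined-entity
    # error; after substitution it parses cleanly.
    root = ET.fromstring(replace_entities(source))
    print(root.text)
```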

25 January 2012

Fish Fixes and setuptools

I have a few pieces of good news. And one piece of not-so-good news.

  1. I’ve pushed Fish 4.33 to Github. This version includes two main bug fixes and some useful reorganization. The bug fixes are:

    • One test was failing under Python 2.7 (but not 2.6 or 2.5). That turns out to be because I was lazily doing some json encoding “by hand” and failing to percent-encode non-ASCII characters. httplib in the earlier versions didn’t seem to mind, but in Python 2.7 it does. I am now using a proper json serializer and things are better.

    • It transpires that starting Fish interactively was failing for new users. This is because the interactive version of Fish (the one you get if you just type fish <return>) was doing a sync, and that sync assumes that you have a tag called .fish/alias. New users tend not to have that. Obviously, the solution is to create it, which Fish now does. (It makes it private, too. You can change it to public if you prefer.

      fish perms public .fish/alias

      will do the job.)

    My thanks to Rodrigo Barnes (@rodrigobarnes) for noticing and pointing out both problems.

  2. I have switched the default/preferred HTTP library used by Fish from httplib2, which had always caused me problems, to Kenneth Reitz’s requests. I don’t know whether httplib2 is just broken or whether it’s a case of user error, but I have never managed to persuade it to work reliably. It took under 15 minutes to swap it out and replace it with requests, and the process fixed all the problems I knew about with httplib2, so that was pretty fantastic.

    I have talked to various people about requests and the most common response has been “It’s awesome”. Do not misunderstand me when I say, I disagree. My experience is merely that requests is well-designed, functional, pythonic and easy-to-use. In those respects it is like most Python libraries. I suspect that the main reason requests generates reactions of awe is that all the other main Python libraries for handling HTTP requests are sub-par [1]. As Kenneth Reitz himself says (specifically of urllib2):

    Things shouldn’t be this way. Not in Python.

    Requests is excellent and I strongly recommend you use it, with one caveat:

    Version 0.9.2, which I used to make the change, works perfectly. Use that.

    I have not yet been able to make version 1.0.1 work with Fish. It is entirely possible that is because I have done something stupid. But at this point, using 1.0.1 will not move you forward.

    The way I have upgraded Fish is first to try to import requests and then to check that the version is less than 1.0.0. If requests is unavailable, or you have 1.0.0 or newer, Fish falls back (silently) to httplib2. To be fair, that mostly works; but requests works better.

    [“What does that mean?”, you ask. Well, I have two problems with httplib2. First, it consistently fails with request bodies over slightly less than 64KB. This is a problem with some file uploads to Fluidinfo. Secondly, when performing bulk uploads, particularly if I use multiple threads, I get occasional failures with httplib2. Retrying sometimes, but not always, works. (This was a major pain uploading, for example, the 2.5 million items in the British National Bibliography. The upload spent about 50% of the time uploading the first 98%, and the other 50% retrying the troublesome 2%.) I have yet to see any random failures with requests.]

  3. I finally got round to packaging up Fish with setuptools. So if you download it now, you should be able to do a standard

    python setup.py install

    in the package directory and standard, good things will happen. These include installing the fish script, so if you used an alias or specially added it to your path before, this is no longer necessary.

    I also cleaned up the import structure a bit to make this work.
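The library selection described in item 2 (use requests if it imports and is older than 1.0.0, otherwise fall back silently to httplib2) can be sketched as below. This is an illustration of the stated rule, not Fish's source: the names version_tuple, choose_http_library and HTTP_LIBRARY are invented for the example.

```python
import re


def version_tuple(version):
    """Turn a string such as '0.9.2' into a comparable tuple (0, 9, 2)."""
    parts = []
    for piece in version.split("."):
        # Keep only the leading digits of each piece; treat none as 0.
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def choose_http_library(requests_version):
    """Pick a library name given the installed requests version, or None."""
    if requests_version is not None and version_tuple(requests_version) < (1, 0, 0):
        return "requests"
    # requests is missing or is 1.0.0 or newer: fall back silently.
    return "httplib2"


try:
    import requests
    REQUESTS_VERSION = requests.__version__
except ImportError:
    REQUESTS_VERSION = None

HTTP_LIBRARY = choose_http_library(REQUESTS_VERSION)

if __name__ == "__main__":
    print(HTTP_LIBRARY)
```

Comparing tuples of integers rather than raw strings avoids the trap where "0.10.0" sorts before "0.9.2" as text.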

So much for the good news.

The bad news is that I can’t upload it to PyPI, which had been my intention, because there is already a package called fish on PyPI.

Obviously, I could rename Fish (for a second time), but that seems undesirable when it is about to appear in print in the form of an O’Reilly book (blog post upcoming). So I guess we’ll have to do things the old way.

Needless to say, in the process of all these lovely upgrades, it is possible I broke something. Please let me know if I did and I will endeavour to fix it.

[1]Funny term, “subpar”. Every golfer aspires to shoot under par: in golf, less is definitely more. But I didn’t feel using the term “super-par” would really have conveyed my meaning.

22 January 2012

Tags, The Tagosphere and Every Thing: Towards an rcp for Fish

In the first of this series of articles we discussed the analogy between the Unix File System and Fluidinfo. The Fluidinfo Shell, Fish, is largely a working through of that analogy. A new insight from that article, upon which we will now expand, is that the objects in Fluidinfo can usefully be viewed as analogues of host computers in a computer network.

In the second article, we discussed how to augment Fish with Unix-like copy (cp) and move (mv) commands for Fluidinfo’s abstract tags and namespaces.

In this third, and—with luck—final part of the trilogy, we will tackle the more complex question of how to copy and move data within and among objects and concrete tags in Fluidinfo. Just as we leant on the behaviour of Unix’s cp and mv commands when considering similar commands for abstract tags in Fluidinfo, we will look to rcp, rsync and ftp as sources of wisdom, guidance and precedent as we try to design similar commands for concrete tags in Fluidinfo.

Recapitulation

In the previous article in this series, we concluded that we could sensibly model cp and mv commands for namespaces and abstract tags in Fluidinfo on their Unix counterparts provided we resolve three issues:

  1. How should we handle ambiguity, given that a tag and a namespace can share the same name (and path) in Fluidinfo?
  2. How should we handle permissions on copied and moved tags, given the rather different way permissions work in Fluidinfo and Unix?
  3. How destructive should overwriting behaviour be? If we copy a (whole, abstract) tag “on top” of another, should all old values of that tag be destroyed, or should the result be a union in which the new values take precedence, but the old values remain on objects that were not tagged with the “source” tag?
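The two overwrite behaviours in the third question can be made concrete with a small model. In this sketch an abstract tag is represented as a plain dictionary mapping an object's about tag to the value on that object; that representation and the function copy_tag are assumptions for illustration and have nothing to do with how Fluidinfo stores data.

```python
def copy_tag(source, dest, destructive=False):
    """Return the destination tag's values after copying source over it.

    source and dest map about tags to tag values (a toy model).
    """
    if destructive:
        # All old values of the destination tag are destroyed.
        return dict(source)
    # Union: old values survive on objects that the source tag does not
    # cover, and new values take precedence wherever both exist.
    merged = dict(dest)
    merged.update(source)
    return merged


if __name__ == "__main__":
    rating = {"paris": 9, "rome": 8}
    star_rating = {"paris": 5, "oslo": 7}
    print(copy_tag(rating, star_rating, destructive=True))
    print(copy_tag(rating, star_rating))
```

In the union case the value on oslo survives the copy; in the destructive case it is lost.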

Our candidate solutions are (respectively):

  1. Where paths are ambiguous for either source or destination items, we will demand disambiguation by adding a trailing slash (/) to denote a namespace or a trailing dot (.) to denote a tag.
  2. We propose a reverse Facebook principle, whereby whenever the degree of access granted to data is unclear, we should grant the smallest set of rights that the user might reasonably expect, the logic being that while overly restrictive permissions can easily be relaxed, unintentional release of private information is potentially irreversible and more harmful.
  3. We didn’t really resolve overwrite behaviour beyond a mealy-mouthed suggestion that we might need to provide options. While we should do that, this doesn’t answer the question, because unless we force the user to make an explicit choice on a case-by-case basis, we will have to decide on the default behaviour, and that will be the main behaviour people will experience (unless we make such a poor choice that it is normally overridden). I am currently leaning toward the less destructive behaviour as the default.
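The disambiguation rule in the first solution is simple enough to sketch directly. The function name classify_path and the "unspecified" label are assumptions for illustration; Fish may handle this differently.

```python
def classify_path(path):
    """Interpret a trailing slash or dot as an explicit disambiguator.

    Returns (kind, path_without_marker), where kind is "namespace",
    "tag" or "unspecified".
    """
    if path.endswith("/"):
        return "namespace", path[:-1]
    if path.endswith("."):
        return "tag", path[:-1]
    # No marker: the caller must check whether the path is ambiguous and,
    # if so, demand that the user add one.
    return "unspecified", path


if __name__ == "__main__":
    for example in ("alice/private/", "alice/private.", "alice/private"):
        print(example, classify_path(example))
```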

The Jobs to be Done

Thus far, we have considered moving and copying whole tags (abstract tags together with all their concrete instances). But at least as important, and probably more common in practice, will be the need to move or copy just some concrete tags. Concentrating first on the movement and copying within Fluidinfo (as opposed to transfers between Fluidinfo and the local file system), some examples of things Alice might wish to do include:

  • Copy one or more tag(s) from one object to another. For example:

    • Copy Alice’s description tag from paris to city:paris.
    • Move Alice’s description tag from paris to city:paris.
    • Move Alice’s rating tag from paris to city:paris while moving it into her private namespace (on the destination object).
  • Rename or duplicate tags on an object or a set of objects. For example:

    • Rename Alice’s rating tag on city:paris as star-rating
    • Copy Alice’s rating tag on city:paris to star-rating
    • Move Alice’s rating tag on city:paris to private/rating
    • Perform any of the above operations on all objects Alice has tagged with alice/personal.
  • Systematically move tags from objects using one about tag convention to those using another. For example:

    • Move all Alice’s rating tags on objects she has tagged with alice/city to corresponding objects with the same about tag, but preceded by city: (e.g. from paris to city:paris).

Alice might also be interested in exchanging information between her local file system and Fluidinfo. (In this case, we would probably support only copying, not movement, in much the same way as there is no rmv command, and indeed even drag and drop usually copies rather than moves when source and destination are different volumes or hosts.) Examples of upload and download to and from Fluidinfo might include

  • Copy a local set of files to an object.

    • Copy the files and directories in Alice’s ~alice/blog directory as corresponding tags and namespaces on the Alice's Blog object in Fluidinfo.
    • The same operation, but now taking the files from ~alice/blogs/drinkingblog/html and placing them in a new alice/blog namespace on the same nominated object
  • Conversely, Alice might wish to download a blog, stored as a collection of tags and namespaces in Fluidinfo as a set of local files—the precise inverse of the cases above.

  • Download tags from a number of objects in Fluidinfo to different parts of the local file system.

    • Alice might have a description tag on objects corresponding to each element of the periodic table and wish to download them either to different local directories or different files within a directory. For example, she might want the description from the element element:mercury to go to elements/mercury/description.txt or to elements/mercury-description on the local file system.
  • Again, conversely, Alice might wish to upload a directory structure, taking one or more of the upper levels of the directory structure either as object specifiers or tags in some systematic way.

It is unlikely that the first version of any rcp analogue in Fish will support all of these fully, but it is useful to think about what our aspirations might be.

Still another case might be to map between the contents of a local file (e.g. a CSV file) and a set of objects in Fluidinfo. However, I currently regard that as a separate kind of task, and not really suitable for analogues of cp and mv.

Lessons from rcp, rsync, scp etc.

I have argued that Fluidinfo objects (which represent things) can usefully be viewed as analogues of hosts (computers) with Unix File systems. That immediately suggests that we might usefully look to rcp and its variants (rsync, scp and possibly ftp) for inspiration.

The basic addressing scheme in rcp simply identifies a remote file as

host:path/to/file.ext

For example, to copy a file drink.me from a host wonderland, Alice can say:

rcp wonderland:drink.me .

In fact, in normal use, rcp can replace cp, so long as your file names don’t contain colons. In the case of Fluidinfo, we won’t introduce a different command at this stage but will simply extend cp (though later we will propose adding an rcp for different purposes).

Remembering that we are mapping hosts to Fluidinfo objects, which are usually identified by their about tags, this suggests we might naïvely consider a directly similar rcp-like cp command for Fluidinfo. The first use case we suggested for Alice was copying her description from the object for paris to the object for city:paris. So that might be:

cp paris:description city:paris:description

Unfortunately, in this particular case the “solution” doesn’t work well because the target about tag includes a colon. Obviously, in principle, we could escape the colon in some way. If we did the ‘obvious’ thing of escaping with a backslash, and were typing into an interactive Fish shell (rather than from the command line), that would mean we would need to type:

cp paris:description city\:paris:description

which is already tedious. However, if we were working from a Unix shell, which itself uses backslash for escaping, we would need to use either

cp paris:description city\\:paris:description

or perhaps

cp paris:description 'city\:paris':description

This is OK in principle, but would get very tiresome (and be rather error-prone) in practice, particularly since colons are widely used in about tags.
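For what it is worth, parsing the backslash-escaped form is not the hard part; the burden falls on the person typing. A minimal sketch of the parsing, assuming a hypothetical helper split_host_path and the escaping convention just described:

```python
import re


def split_host_path(spec):
    """Split 'about:tag' at the first colon not preceded by a backslash.

    Escaped colons (written as a backslash followed by a colon) are
    treated as part of the about tag and unescaped in the result.
    """
    parts = re.split(r"(?<!\\):", spec, maxsplit=1)
    if len(parts) != 2:
        raise ValueError("no unescaped colon in %r" % spec)
    about, tag = parts
    return about.replace("\\:", ":"), tag


if __name__ == "__main__":
    print(split_host_path("paris:description"))
    print(split_host_path(r"city\:paris:description"))
```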

Another option we might consider is to use Fluidinfo’s full path convention. Recall that the value for alice/description on paris is available at

http://fluiddb.fluidinfo.com/about/paris/alice/description

So it might seem promising to base things on that. Our cp command would then look something like:

cp /about/paris/alice/description /about/city:paris/alice/description

Unfortunately, that doesn’t look too promising either. First, Fish already uses a leading slash to introduce absolute paths to other users’ tags (allowing Alice to omit alice/ from her own tags, a significant convenience that it would be awkward to retain using this scheme).

Perhaps worse, however, is that one of the most common kinds of about tags is a URL. That would cause real pain getting an about tag through the shell with the slashes protected as part of a path such as above. Imagine having to say:

cp /about/http:\\/\\/example.com\\/foo\\/bar/description /about/http:\\/\\/example.com\\/foo\\/bas

This is not really a problem for the API, where the about tag typically gets passed into some function separately and escaped automatically by the machine, but it would be a significant inconvenience for a Fish user.

An alternative that might work better is to swap the host and the path. A colon still wouldn’t make a good separator since colon is a legal character in a tag name, and it would be confusing to reverse rcp’s convention anyway. We could, however, use @, which would perhaps seem more natural in that order. So Alice might say:

cp description@paris description@city:paris

I think I remember using that as an alternative form with some version of rcp, but I can’t see it in the documentation for any version I’m currently using, so I may be mistaken. Nevertheless, the form does feel reasonably natural from both @’s use in email addresses and its more general use to mean “at”. Although the @ is used in reasonably common about tags (particularly Twitter names), it doesn’t really cause a problem, since it is not a valid character in a tag name. It would be fine to say:

cp chess@@RedQueen chess@@WhiteQueen

since the second @, in each case, is clearly part of the object identifier. This feels like the first contender.
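Because @ cannot appear in a tag name, splitting at the first @ is unambiguous, however many more appear in the about tag. A minimal sketch, with parse_tag_spec as a hypothetical helper name:

```python
def parse_tag_spec(spec):
    """Split 'tag@about' at the first @.

    Everything after the first @ belongs to the about tag, so further
    @ characters (as in Twitter names) need no escaping. Returns
    (tag, about), with about set to None when the spec contains no @.
    """
    tag, separator, about = spec.partition("@")
    if not separator:
        return tag, None
    return tag, about


if __name__ == "__main__":
    for spec in ("description@city:paris", "chess@@RedQueen", "description"):
        print(spec, parse_tag_spec(spec))
```

Note that the colon in city:paris needs no special treatment here either, which is the main attraction over the rcp-style order.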

Another possibility we might mention is specifying objects using -a, -i or -q, as is possible with other Fish commands such as tag. The issue with that is that it only works well if all the tags are on the same object. We could have either repeated flags or differently named flags for source and destination, but if we want to allow Unix-like

mv src1 src2 ... srcN dest

that will be less good. But it might be useful for some cases, particularly when the source and destination objects are the same.

Let’s try some other examples from our shopping list, keeping in mind the possibility of using -a, -q etc. as well as the @ convention:

  • Move Alice’s description tag from paris to city:paris.

    mv description@paris description@city:paris

    or potentially:

    mv description@paris @city:paris
  • Move Alice’s rating tag from paris to city:paris while moving it into her private namespace (on the destination object).

    mv rating@paris private/rating@city:paris
  • Rename Alice’s rating tag on city:paris as star-rating

    mv rating@city:paris star-rating@city:paris
    mv -a city:paris rating star-rating
    mv -q 'fluiddb/about matches "city:paris"' rating star-rating

    We could also allow a trailing @ to mean ‘the same as last time’ on the second or subsequent use, so that

    mv rating@city:paris star-rating@

    might be acceptable, or even just

    mv rating@city:paris star-rating
  • Copy Alice’s rating tag on city:paris to star-rating

    cp rating@city:paris star-rating@city:paris
    cp rating@city:paris star-rating@
    cp rating@city:paris star-rating
  • Move Alice’s rating tag on city:paris to private/rating

    mv rating@city:paris private/rating@city:paris
    mv rating@city:paris private/rating@
    mv rating@city:paris private/rating
    mv -a 'city:paris' rating private/rating
  • Perform any of the above operations on all objects Alice has tagged with alice/personal. For example, let’s take the moving of rating to private/star-rating

    mv -q 'has alice/personal' rating private/star-rating
  • We could also do the same thing with namespaces, though I didn’t list that as an example. But it seems fairly clear that

    cp -q 'has alice/personal' private secret

    or perhaps

    cp -q 'has alice/personal' -r private secret

    would mean copy all the tags under private to a new secret namespace (if it doesn’t exist) or to under secret if it does. [The -r would mean recurse, as with cp, but whether it should be required I’m not sure.]

  • Move all Alice’s rating tags on objects she has tagged with city to corresponding objects with the same about tag, but preceded by city: (e.g. from paris to city:paris).

    This one is completely different and is probably something we need to admit defeat on for now. Though it is not the same, it reminds me of a Unix command I’ve always thought was missing, which is a pattern-based bulk rename (and bulk copy). I actually have one, to allow systematic renaming of files using commands like:

    bmv foo@.txt bar@.txt

    in which the @ is a wildcard and on the right has the same value as that on the left (like a tagged regular expression). So if a directory had food.txt and fool.txt, this would rename food.txt as bard.txt, and fool.txt as barl.txt. Systematic construction of about tags could follow a similar pattern, but useful though it would be, is probably out of scope here.

    I can, however, imagine allowing a more manual version. For example, we might allow braces and a syntax like:

    cp rating@{foo,bar,baz} star-rating@{Foo,Bar,Baz}

    to move and rename tags from things like rating on foo to star-rating on Foo etc.
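The pattern-based rename mentioned in the last item is easy to sketch, and the same matching could in principle be applied to about tags rather than file names. The function bulk_rename_plan below is a hypothetical illustration, not the bmv command itself; it assumes a single @ wildcard that matches one or more characters, and it only computes the renames rather than performing them.

```python
import re


def bulk_rename_plan(source_pattern, dest_pattern, names):
    """Return (old, new) pairs for names matching source_pattern.

    The @ in source_pattern is a wildcard; whatever it matches is
    substituted for the @ in dest_pattern.
    """
    if "@" not in source_pattern or "@" not in dest_pattern:
        raise ValueError("both patterns need an @ wildcard")
    head, _, tail = source_pattern.partition("@")
    regex = re.compile(re.escape(head) + "(.+)" + re.escape(tail) + r"\Z")
    plan = []
    for name in names:
        match = regex.match(name)
        if match:
            plan.append((name, dest_pattern.replace("@", match.group(1), 1)))
    return plan


if __name__ == "__main__":
    files = ["food.txt", "fool.txt", "other.txt"]
    for old, new in bulk_rename_plan("foo@.txt", "bar@.txt", files):
        print(old, "->", new)
```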

Although perhaps not pretty, this @ syntax, combined with -a, -i and -q flags (for when the same object is to be used both sides of the cp or mv) seems to get us a long way. Adding in paired lists in the {a,b,c} form would go a step further.
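To check that these conventions hang together, here is a minimal sketch of how a pair of cp or mv arguments might be resolved. All the function names are hypothetical, and the sketch makes two assumptions: a destination with a trailing @ or no @ at all inherits the source object, and a destination with an empty tag part (as in @city:paris) inherits the source tag.

```python
def resolve_pair(source_spec, dest_spec):
    """Return ((src_tag, src_about), (dest_tag, dest_about))."""
    src_tag, _, src_about = source_spec.partition("@")
    dest_tag, _, dest_about = dest_spec.partition("@")
    if not dest_about:
        # 'star-rating@' or plain 'star-rating': same object as the source.
        dest_about = src_about
    if not dest_tag:
        # '@city:paris': same tag name as the source.
        dest_tag = src_tag
    return (src_tag, src_about), (dest_tag, dest_about)


def expand_braces(spec):
    """Expand 'tag@{a,b,c}' into [(tag, a), (tag, b), (tag, c)]."""
    tag, _, about = spec.partition("@")
    if about.startswith("{") and about.endswith("}"):
        return [(tag, item) for item in about[1:-1].split(",")]
    return [(tag, about)]


def paired_copies(source_spec, dest_spec):
    """Pair the nth source object with the nth destination object."""
    sources = expand_braces(source_spec)
    dests = expand_braces(dest_spec)
    if len(sources) != len(dests):
        raise ValueError("source and destination lists differ in length")
    return list(zip(sources, dests))


if __name__ == "__main__":
    print(resolve_pair("rating@city:paris", "star-rating@"))
    print(resolve_pair("description@paris", "@city:paris"))
    print(paired_copies("rating@{foo,bar,baz}", "star-rating@{Foo,Bar,Baz}"))
```

The brace form in this sketch does not cope with about tags that themselves contain commas or braces; a real implementation would need some escaping rule for those.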

Upload and Download

Ironically, although we modelled our extended cp on rcp, we have concluded that we don’t really need the “r” but can just use cp for all copying and movement within Fluidinfo. Curiously, this leaves open the possibility of using rcp to perform upload and download. The idea would be that paths without an @ in the rcp command would be taken as paths on the local file system while paths that include an @ refer to tags on an object in Fluidinfo.
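That rule for telling local paths from Fluidinfo tags can be sketched in a few lines. The names classify_rcp_argument and transfer_direction are hypothetical, and the sketch ignores the (real) possibility of local file names that contain an @.

```python
def classify_rcp_argument(argument):
    """Treat arguments containing @ as Fluidinfo tags, others as local paths."""
    if "@" in argument:
        tag, _, about = argument.partition("@")
        return {"where": "fluidinfo", "tag": tag, "about": about}
    return {"where": "local", "path": argument}


def transfer_direction(source, dest):
    """Return 'upload' or 'download' for a source and destination pair."""
    kinds = (classify_rcp_argument(source)["where"],
             classify_rcp_argument(dest)["where"])
    if kinds == ("local", "fluidinfo"):
        return "upload"
    if kinds == ("fluidinfo", "local"):
        return "download"
    # Both local or both in Fluidinfo: that is a job for plain cp.
    raise ValueError("rcp needs exactly one Fluidinfo argument")


if __name__ == "__main__":
    print(transfer_direction("blog", "blog@Alice's Blog"))
    print(transfer_direction("blog@Alice's Blog", "blog"))
```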

In case it’s not clear, the key point is that since Fluidinfo tags can have arbitrary MIME types, they can be used to store files: the analogy we have been pursuing is not merely structural. For example, the documentation for Fish is stored in Fluidinfo, as a collection of tags on the object with the about tag fish, with its index.html file at

To be completely clear, this is not a URL pointing to a static HTML file: this is the fish user’s tag index.html on the object with the about tag fish, being retrieved directly from the Fluidinfo database.

A partial visualization of the tags on that object is shown below.

http://fluiddb.fluidinfo.com/about/abouttag/njr/image/at103-tree.gif

The point of commands for uploading to Fluidinfo is to make it easy to publish a tree of files such as this.

Let’s see how the example use cases for upload and download might work in this case. I am going to assume that we have no difficult cases with ambiguous paths in Fluidinfo, and that if we did, we would either require disambiguation or the commands would fail. (This would obviously be an issue only on download.)

  • Copy a local set of files to an object.
    • Copy the files and directories in Alice’s ~alice/blog directory as corresponding tags and namespaces on the Alice's Blog object in Fluidinfo.

      rcp blog blog@"Alice's Blog"
    • The same operation, but now taking the files from ~alice/blogs/drinkingblog/html and placing them in a new alice/blog namespace on the nominated object

      rcp ~alice/blogs/drinkingblog/html blog@"Alice's blog"
  • Conversely, Alice might wish to download a blog, stored as a collection of tags and namespaces in Fluidinfo as a set of local files, the precise inverse of the cases above.

    rcp blog@"Alice's Blog" blog
    rcp blog@"Alice's blog" ~alice/blogs/drinkingblog/html
  • Download tags from a number of objects in Fluidinfo to different parts of the local file system.

    • Alice might have a description tag on objects corresponding to each element of the periodic table and wish to download them either to different local directories or different files within a directory. For example, she might want the description from the element element:mercury to go to elements/mercury/description.txt or to elements/mercury-description on the local file system.

      This is a more interesting and difficult case, and again is fundamentally about constructing paths semi-programmatically. Although it is not too hard to think of ways it might be done (perhaps with regular expressions), we might be moving beyond what it’s reasonable to expect Fish to do. That goes for the final example too.

These examples suggest that modelling a Fish rcp command on a modified Unix-like rcp could work well. It’s almost as if we are considering our host machine as part of Fluidinfo, with its file system representing a concrete tag hierarchy on an anonymous local object. There are certainly other issues to consider, including MIME types and files for potential omission (dot files? backup files ending in ~, symbolic links etc.), but we seem to have a promising way forward.
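The rule that separates the two kinds of path in the proposed rcp is simple enough to sketch in Python. The function name parse_rcp_path is hypothetical, and I am assuming that the first @ in an argument separates the tag path from the about tag, ignoring the -a, -i and -q variants. By the time a command sees its arguments the shell has removed the quotes, which shlex.split imitates here.

```python
import shlex


def parse_rcp_path(path):
    """Classify one rcp argument.

    No '@'  -> ("local", path) on the local file system.
    Has '@' -> ("fluidinfo", tag_path, about), split at the first '@'.
    """
    if "@" not in path:
        return ("local", path)
    tag_path, about = path.split("@", 1)
    return ("fluidinfo", tag_path, about)


# Imitate the shell's word splitting and quote removal for one example.
argv = shlex.split('''rcp blog blog@"Alice's Blog"''')
for arg in argv[1:]:
    print(parse_rcp_path(arg))
```

A real implementation would still need to decide what to do when a local file name itself contains an @. That is one of the holes this sketch does not fill.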

I was going to consider ftp as an alternative source of inspiration, but its model seems more cumbersome in general, and better suited to situations where you are considering only one host, whereas in Fluidinfo it is normal to consider many objects, so I think we can probably forget ftp.

Summary

Perhaps surprisingly, we have largely got there.

Over the last three articles, we have probed and extended the analogy between Fluidinfo and the Unix file system and come up with plausible syntaxes for building mv and cp commands that give us pretty powerful ways of moving and copying data within Fluidinfo, drawing on ideas from cp, rcp, mv and rsync as well as the current Fish.

Despite borrowing from rcp in designing Fish’s cp command, we have left rcp itself free and can now use it as the basis for moving data between a local file system and Fluidinfo, using almost identical conventions save for the interpretation of a path that does not contain an @ (“the same object as before” for cp, and “the local file system” for rcp).

There are definitely holes to fill, but it feels like there is a reasonable outline spec. If people see problems, or have better ideas or comments, do let me know.

Otherwise, all that remains is the trivial task of implementation. How hard could that be?

19 January 2012

Movement and Copying in Fluidinfo

In the previous article, we discussed the analogy between the Unix File System and Fluidinfo’s tag hierarchy. This analogy forms the basis and inspiration for the Fluidinfo Shell, Fish. But a file system without move and copy commands would be a sad and contemptible thing, and at the moment Fish, like Fluidinfo, is impoverished by the lack of such basic functionality as cp and mv. Here we will try to design such functionality, building on the analogy.

Copying and Moving

In Unix we can

  • copy files with the cp command
  • copy directories (and their contents) with cp -R
  • move files to a different location with mv
  • move directories (and their contents) to a different location with mv
  • rename files, also with the mv command
  • rename directories with mv
  • delete files with rm (“remove”)
  • delete empty directories with the rmdir command and delete directories together with their contents with the rm -r command.

In general, the functionality of mv is conceptually equivalent to copying and then removing an item.

We can also copy files and directories between different machines using the rcp and rsync commands, which are both similar to cp but understand a host: prefix. An alternative to these commands is the ftp command, which operates in a very different manner, and uses different mechanisms, to ultimately similar effect.

Fish, today, offers no commands for moving, renaming or copying tags or namespaces, but does provide an rm command that performs the combined functions of rm and rmdir (also requiring -r and -f flags in some cases). It is worth noting, briefly, that in some sense Fish’s rm goes much further than rm on Unix, in that the command

fish rm rating

removes not only the abstract rating tag, but every occurrence of that tag in Fluidinfo, potentially on millions of objects. This is why Fish requires the -f flag to force the removal of a tag that is in use. In the previous article, we argued that Fluidinfo’s objects are the natural analogues of computers in a network. From that perspective, if we think of rcp as a more powerful, remote version of cp, Fish’s rm command is more like a remote rm (presumably rrm) that allows you to remove all files with a given path on all hosts simultaneously. It’s as if you could say something like:

rrm -f *:~/.bashrc

to remove the .bashrc in your home directory on every machine on which you have an account. Indeed, if Linus Torvalds were not merely Linux’s creator but the superuser on all copies of the OS, with such a command he could remove everything on all Linux hosts with

rrm -rf *:/

Let’s think about the Unix commands cp and mv, and their possible generalizations to the realm of Fluidinfo. Recall that, when we want to be precise, we need to distinguish between two different senses of the word “tag”. Ordinarily when we attach a tag, possibly with a value, to an object, we create what we might variously call a “tag instance” or a “concrete tag”. Fluidinfo, however, maintains a user’s tag hierarchy independent of whether tags are actually in use. When discussing tags in this sense, independent of objects, I call them abstract or platonic tags. These are quite real and can persist even when the tag is not in use. The diagram below shows the abstract tag hierarchy for Alice, on the right, and her file system, on the left. Note carefully that in her private namespace Alice has both a tag and a namespace called moments.

http://fluiddb.fluidinfo.com/about/abouttag/njr/image/at102-tags.png

In what follows, assume Alice’s current working directory on Unix is her home directory /home/alice, and for concreteness, assume that her shell is the Bourne Again Shell, Bash.

Copying a file or a tag

On Unix, Alice can copy her has-drunk file into her private directory by saying any of the following:

cp has-drunk private/has-drunk
cp has-drunk private/
cp has-drunk private

Obvious though it is, let us spell out what this means. After running one of these commands, Alice will have two files, /home/alice/has-drunk and /home/alice/private/has-drunk, where previously she had only one, and each will contain a separate copy of the same data.

We could plausibly adopt any or all of these in Fish to copy Alice’s has-drunk tag to her private namespace. But what would that do? I think the most obvious action would be to create a new abstract tag called alice/private/has-drunk and then tag all the objects currently tagged with alice/has-drunk with the new alice/private/has-drunk tag, copying their values if any. We would need to consider quite carefully how to handle permissions when performing such copying. The case is rather different from Unix, because in Unix permissions are hierarchical in the sense that a file with public read permission in a fully private directory cannot be read. This is an important detail. On Unix, let’s assume that Alice’s private/things file has open read permissions (644):

alice$ ls -l private/things
-rw-r--r--  1 alice  staff   0 17 Jan 18:57 things

and that she then locks her private directory so that only she can look at it.

alice$ chmod 700 private

If Bert now tries to look at alice/private/things he will find that he cannot:

bert$ cat ~alice/private/things
cat: /home/alice/private/things: Permission denied

In particular, on Unix this means that if Alice moves a non-private file to a private directory (by which I mean, one with neither read nor execute permission) it becomes unreadable.

In Fluidinfo, the permissions hierarchy is consulted only when new tags and namespaces are created. So if Alice creates a new tag in her private namespace, it will default to being private; if we copy the permissions of a tag when copying the tag, its permissions will be unaltered, and potentially different from if we created the tag afresh in the new location.

The correct behaviour is not clear, and either way there is potential for surprising the user in unpleasant ways, most obviously by making public data that the user intended to be private. We have seen above how by copying permissions with tags we could violate a (reasonable) assumption that moving a tag into a private namespace would make it private. If we fail, however, to copy permissions, copying a private tag to a non-private namespace would result in a non-private tag, which might also be a nasty surprise.

My first inclination here is to do a “reverse Facebook” by, when in doubt, setting the permissions on the destination to the more restrictive of the two possibilities, on the assumption that revealing data that Alice wanted to keep private is both worse and less correctable than making data more private than intended, given the inability to make people unsee (or even, uncopy) things. Needless to say, we could also have options to allow the user to choose what behaviour she wants.

Q1. How should permissions behave when tags and namespaces are copied or moved? Should we go for:

  1. The permission of the destination is copied from the source?
  2. We follow mv = rm + cp + rm and create the new tag or namespace in the new location according to default rules?
  3. Maximum privacy: apply the more restrictive of the permissions suggested by 1. and 2. (or, if necessary, their most restrictive intersection)?
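To show what the “maximum privacy” option might mean in practice, here is a small Python sketch. It assumes a deliberately simplified permission model, a policy of "open" or "closed" plus a set of exceptions, and the names Permission and most_restrictive are hypothetical. The result allows a user only if both inputs would.

```python
from typing import FrozenSet, NamedTuple


class Permission(NamedTuple):
    policy: str                 # "open" or "closed" (assumed model)
    exceptions: FrozenSet[str]  # users who are exceptions to the policy

    def allows(self, user):
        listed = user in self.exceptions
        # Open: everyone except those listed. Closed: only those listed.
        return not listed if self.policy == "open" else listed


def most_restrictive(a, b):
    """Return a permission that allows a user only if both a and b do."""
    if a.policy == "open" and b.policy == "open":
        return Permission("open", a.exceptions | b.exceptions)
    if a.policy == "closed" and b.policy == "closed":
        return Permission("closed", a.exceptions & b.exceptions)
    open_p, closed_p = (a, b) if a.policy == "open" else (b, a)
    return Permission("closed", closed_p.exceptions - open_p.exceptions)


copied_from_source = Permission("open", frozenset())             # option 1
default_at_dest = Permission("closed", frozenset({"alice"}))     # option 2
result = most_restrictive(copied_from_source, default_at_dest)   # option 3
print(result.allows("alice"), result.allows("bert"))
```

In this example a public tag copied into a private namespace ends up readable only by alice, which is the cautious outcome argued for above.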

Moving or renaming a file or a tag

Going back to Unix, Alice can move her has-drunk file from her home directory to her private directory with any of these commands:

mv has-drunk private/has-drunk
mv has-drunk private/
mv has-drunk private

Again, we could plausibly adopt all of these forms in Fish to move Alice’s has-drunk tag to her private namespace. In this case, there seems no real issue about what should happen. We can’t sensibly “move” the abstract tag but not the concrete ones. Using our rule of thumb that

move = copy to new location + remove the original

or

mv src dest = cp src dest; rm src

this reinforces the case for making cp copy all the concrete tags as well as the abstract tag.
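The rule of thumb can be sketched on a toy in-memory model. This is not how Fish or Fluidinfo store anything: the TagStore class and its methods are hypothetical, with abstract tags held in a set and concrete tags in a dictionary keyed by (about tag, tag path).

```python
class TagStore:
    """Toy model: abstract tags plus concrete (about, path) -> value entries."""

    def __init__(self):
        self.abstract = set()
        self.values = {}

    def tag(self, about, path, value=None):
        self.abstract.add(path)
        self.values[(about, path)] = value

    def cp(self, src, dest):
        # Copy the abstract tag and every concrete instance with its value.
        self.abstract.add(dest)
        for (about, path), value in list(self.values.items()):
            if path == src:
                self.values[(about, dest)] = value

    def rm(self, path):
        # Remove the abstract tag and all of its concrete instances.
        self.abstract.discard(path)
        for key in [k for k in self.values if k[1] == path]:
            del self.values[key]

    def mv(self, src, dest):
        if src == dest:
            return  # otherwise cp followed by rm would destroy the tag
        self.cp(src, dest)
        self.rm(src)


store = TagStore()
store.tag("drink me", "alice/has-drunk", True)
store.mv("alice/has-drunk", "alice/private/has-drunk")
print(sorted(store.abstract), store.values)
```

The guard in mv matters: without it, moving a tag onto itself would copy nothing new and then remove everything.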

Renaming really raises no extra issues: just as in Unix Alice can rename her has-drunk file to drunk with a simple

mv has-drunk drunk

she should be able to rename her has-drunk abstract tag as drunk with the same command in Fish, and in the process rename all its concrete instances.

Copying and Moving and Renaming Directories

What about copying a directory in Unix? We use cp for that, but now we need to use -R to force the directory and all its contents to be copied recursively: without this -R, we can’t even copy an empty directory such as things:

$ cp things thangs
cp: things is a directory (not copied).

But with -R, we can copy a directory hierarchy as easily as a file. Let’s suppose Alice wants a duplicate of her private directory in her things directory. She can use any of

cp -R private things
cp -R private things/
cp -R private things/private
cp -R private things/private/

and the result will be a duplicate:

$ ls -RF things
private/

things/private:
moments/      things          thoughts/

things/private/moments:

things/private/thoughts:

Move works essentially the same way and needs no example. Again, so far there seems to be no reason why we shouldn’t build analogous functionality in Fish for copying and moving namespaces and their contents. We would certainly allow the -R flag, but might not require it, and would certainly allow -r to be used as a synonym. As with copying simple files, and following our mv = cp + rm dictum, concrete tags in the hierarchy would be copied, together with their values, on all objects to which they are attached.

Clobbering

Now consider the following commands on Unix, in the context of the same directory structure shown in the original figure:

cp has-drunk private/things
mv has-drunk private/things

The destination, private/things is a file that already exists: it will be clobbered (overwritten) by both cp and mv. The same would be true if Alice copied her private/moments/things file to her private directory with any of

cp private/moments/things private
cp private/moments/things private/
cp private/moments/things private/things

or their mv counterparts. So in Unix, the rule is

When the destination specified is a directory, move or copy the source into that directory. If there was already a file with that name in the directory, delete it first.

When the destination specified is a file, first remove that file if it exists, then copy or move the source to that destination.

Except that this isn’t quite true: you can’t clobber a file with a directory. So

$ cp -R things private/things
cp: private/things: Not a directory
cp: private/things: Not a directory
cp: private/things: Not a directory
cp: private/things: Not a directory

(the four failures being as each of the entities in private fails to be copied), and

$ mv things private/things
mv: rename things to private/things: Not a directory

Why can’t a hulking great directory clobber a puny file? I don’t know. Unix has many wonderful attributes, but consistency is not foremost among them. To my surprise, even adding -f cannot persuade the system to do it. Whether Fish should copy this apparently anomalous behaviour is not completely clear to me: logic suggests not, but fidelity to Unix conventions suggests maybe so. The point may be moot anyway, as there’s a good chance I will require a -f to clobber even a tag, just as I do with rm, if it is in use. This is because, whereas on Unix clobbering a single file removes a single entity, however big, in Fluidinfo a single abstract tag could have a million instances or more, and I feel that requiring a -f flag to encourage the user to confirm her intent before engaging in such (potentially) widespread destruction is not unreasonable.

These minor exceptions notwithstanding, the way files get clobbered suggests that we might extend our recipe to include the rule:

If dest is a file:
   mv src dest = rm -f dest; cp -R src dest; rm -r src

Does that make sense for tags in Fish?

This, I think, is an interesting question. We could certainly make Fish remove the abstract destination tag and all its concrete tags before moving or copying another tag. But it also seems reasonable to consider the possibility of replacing those concrete tags present in the source, but not those absent in the source.

To make this real: suppose Alice says (in Fish)

$ cp drunk has-drunk

and at the time she does the state of her has-drunk and drunk tags is as follows:

$ fish show -q 'has alice/has-drunk' /about
2 objects matched
Object d440c5cf-9680-4748-b70e-56f07f35ca09:
  /fluiddb/about = "drink me (not poison)"
Object ec430756-e110-4bc4-b882-544afda1cce8:
  /fluiddb/about = "drink me"

$ fish show -q 'has alice/drunk' /about
2 objects matched
Object d440c5cf-9680-4748-b70e-56f07f35ca09:
  /fluiddb/about = "drink me (not poison)"
Object 49126b6d-18bd-457f-af55-a251cf400fc9:
  /fluiddb/about = "drink me not"

or, diagrammatically:

http://fluiddb.fluidinfo.com/about/abouttag/njr/image/at102-has-drunk.png

It seems clear that the value of the has-drunk tag on "drink me (not poison)" (no value) should be replaced with the value of the drunk tag (true), and that a new has-drunk tag should be placed on "drink me not", also with the value true. It is less clear, however, that the has-drunk tag on "drink me" needs to be deleted. We will be moving on to discuss selective copying and moving later anyway, but we have certainly formed a question:

Q2. If a tag is clobbered by a mv or cp command, should all of its instances be clobbered, or only those necessary to make way for the tag values from the source?
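The two possible answers to Q2 can be compared on the example above, in which the values of the drunk tag are copied over has-drunk. This Python sketch models each tag as a dictionary from about tag to value. The function names are hypothetical, and I am assuming that has-drunk has no value (None) on "drink me", which the listing does not show.

```python
def clobber_all(src, dest):
    """Remove every instance of dest, then copy all of src's instances."""
    return dict(src)


def clobber_where_needed(src, dest):
    """Overwrite dest only on objects where src has an instance."""
    merged = dict(dest)
    merged.update(src)
    return merged


has_drunk = {"drink me (not poison)": None, "drink me": None}
drunk = {"drink me (not poison)": True, "drink me not": True}

print(clobber_all(drunk, has_drunk))
print(clobber_where_needed(drunk, has_drunk))
```

The first result has two entries and the second three. The only difference is whether the has-drunk tag on "drink me" survives, which is exactly the point Q2 leaves open.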

Shared Paths

Where things get more interesting is when the source or destination is ambiguous, because it specifies both a tag and a namespace. This can’t occur in Unix, because each path resolves unambiguously to either a file or a directory. Let’s think through the cases with the aid of our diagram of Alice’s tag structure above, which I’ll repeat for easier reference (sod the cost!).

http://fluiddb.fluidinfo.com/about/abouttag/njr/image/at102-alicetags.png

What would Alice reasonably expect to happen if she issued the following commands?

mv private/moments private/thoughts
mv moments private
mv moments private/moments
mv private/moments /alice

There is ambiguity everywhere. Both the source and the destination can be ambiguous, so we have to consider all of:

  • unambiguous source, unambiguous destination
  • unambiguous source, ambiguous destination
  • ambiguous source, unambiguous destination
  • ambiguous source, ambiguous destination

Only the first of these is straightforward.

In the case of the existing ls and rm commands in Fish, I have taken the view that an ambiguous path refers to both the possible targets, but while this seems unobjectionable in the case of ls, it clearly leads to the possibility of removing more than the user intended. I plan to reconsider that in the light of these ruminations on cp and mv.

It seems to me that probably a better way forward than taking an ambiguous specification as referring to both its targets is to demand disambiguation. The question is: how would that be achieved?

I have made a point, in the examples above, of listing subtly different alternative forms for some commands, e.g.

mv has-drunk private
mv has-drunk private/

Over recent years (particularly with the rise of tab completion in shells) it has become increasingly common to allow directories to be specified including a trailing slash, and no harm derives from this practice. The question is whether we can exploit this trend as a way of disambiguating between tags and namespaces.

In Unix shells and commands, in most cases, the inclusion or omission of the slash makes no difference to behaviour, though I am aware of at least one case where this is not so—rsync.

To quote from the man page for rsync (on Mac OS 10.6.8):

A trailing slash on the source changes this behavior to avoid creating an additional directory level at the destination. You can think of a trailing / on a source as meaning “copy the contents of this directory” as opposed to “copy the directory by name”, but in both cases the attributes of the containing directory are transferred to the containing directory on the destination. In other words, each of the following commands copies the files in the same way, including their setting of the attributes of /dest/foo:

I personally find this behaviour bizarre and it always makes me slightly nervous when using rsync, which is otherwise superior in every way to rcp. However, the idea of using a trailing slash to specify the namespace (cf. directory) rather than the tag is different: it is not changing the behaviour of the command according to which of two unambiguous specifications of a directory (namespace) is used, but rather using the slash to disambiguate; this seems less objectionable.

The question then would be, how would we unambiguously specify the tag? It would seem very ill advised to require that namespaces should always be specified with a trailing slash, so we cannot sensibly say that a path not ending in a slash will be taken to be a tag: that way madness lies.

My temptation is to use a trailing dot. I like this as a solution partly because I can recall no case, in nearly thirty years of Unix use, of ever meeting a file (other than the directories . and ..) whose name ended in a dot. I also feel that, while files do not always have extensions, and directory names may include them, by convention most filenames do contain a period and most directory names do not. Admittedly, this is not true of tags, but for me, at least, some association between dots-in-tag-names and “file-ness” survives through the analogy on which Fish is built. If we adopt this idea, our problems all but disappear. We can imagine:

$ mv private/moments private/thoughts
Error: private/moments is ambiguous; use private/moments. or private/moments/

$ mv private/moments. private/thoughts
# moves the tag private/moments to the tag private/thoughts/moments

$ mv private/moments/ private/thoughts
# moves the namespace private/moments to the namespace private/thoughts/moments

$ mv private/moments/ private/moments. private/thoughts
# moves private/moments. to private/thoughts/moments.
# and private/moments/ to private/thoughts/moments/

If this were adopted in Fish, I think there would be an overwhelming case for making rm work the same way; the current behaviour of ls might stay the same, as it is not destructive, or might change in the interest of slavish consistency.
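The trailing-dot and trailing-slash convention could be resolved along the following lines. This is only a Python sketch: the function name resolve is hypothetical, and the two sets are my reading of the relevant part of Alice’s hierarchy, in which private/moments is both a tag and a namespace.

```python
def resolve(path, tags, namespaces):
    """Return ("tag", name) or ("namespace", name) for a path.

    A trailing '/' demands a namespace and a trailing '.' demands a tag;
    a bare path must be unambiguous.
    """
    if path.endswith("/"):
        name = path[:-1]
        if name in namespaces:
            return ("namespace", name)
        raise LookupError(f"no namespace {name}")
    if path.endswith("."):
        name = path[:-1]
        if name in tags:
            return ("tag", name)
        raise LookupError(f"no tag {name}")
    is_tag, is_namespace = path in tags, path in namespaces
    if is_tag and is_namespace:
        raise ValueError(f"{path} is ambiguous; use {path}. or {path}/")
    if is_tag:
        return ("tag", path)
    if is_namespace:
        return ("namespace", path)
    raise LookupError(f"no tag or namespace {path}")


tags = {"private/moments", "private/things"}
namespaces = {"private", "private/moments", "private/thoughts"}

print(resolve("private/moments.", tags, namespaces))
print(resolve("private/moments/", tags, namespaces))
print(resolve("private/thoughts", tags, namespaces))
```

Calling resolve("private/moments", tags, namespaces) raises a ValueError whose message mirrors the error shown in the first example above. One consequence of the convention is that a tag whose own name ends in a dot would need writing with two dots.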

That concludes our discussion of cp and mv for abstract tags in Fish. In the third and perhaps final part of this “trilogy”, we will discuss moving and copying tags and namespaces in the context of the object hierarchy, i.e., how we might copy or move tags from one object to another, or within or among objects.
