25 January 2012

Fish Fixes and setuptools

I have a few pieces of good news. And one piece of not-so-good news.

  1. I’ve pushed Fish 4.33 to GitHub. This version includes two main bug fixes and some useful reorganization. The bug fixes are:

    • One test was failing under Python 2.7 (but not 2.6 or 2.5). That turns out to be because I was lazily doing some JSON encoding “by hand” and failing to percent-encode non-ASCII characters. httplib in the earlier versions didn’t seem to mind, but in Python 2.7 it does. I am now using a proper JSON serializer and things are better.

    • It transpires that starting Fish interactively was failing for new users. This is because the interactive version of Fish (the one you get if you just type fish <return>) was doing a sync, and that sync assumes that you have a tag called .fish/alias. New users tend not to have that. Obviously, the solution is to create it, which Fish now does. (It makes it private, too. You can change it to public if you prefer.

      fish perms public .fish/alias

      will do the job.)

    My thanks to Rodrigo Barnes (@rodrigobarnes) for noticing and pointing out both problems.

  2. I have switched the default/preferred HTTP library used by Fish from httplib2, which had always caused me problems, to Kenneth Reitz’s requests. I don’t know whether httplib2 is just broken or whether it’s a case of user error, but I have never managed to persuade it to work reliably. It took under 15 minutes to swap it out and replace it with requests, and the process fixed all the problems I knew about with httplib2, so that was pretty fantastic.

    I have talked to various people about requests and the most common response has been “It’s awesome”. Do not misunderstand me when I say I disagree. My experience is merely that requests is well-designed, functional, pythonic and easy to use. In those respects it is like most Python libraries. I suspect that the main reason requests generates reactions of awe is that all the other main Python libraries for handling HTTP requests are sub-par [1]. As Kenneth Reitz himself says (specifically of urllib2):

    Things shouldn’t be this way. Not in Python.

    Requests is excellent and I strongly recommend you use it, with one caveat:

    Version 0.9.2, which I used to make the change, works perfectly. Use that.

    I have not yet been able to make version 1.0.1 work with Fish. It is entirely possible that is because I have done something stupid. But at this point, using 1.0.1 will not move you forward.

    The way I have upgraded Fish is first to try to import requests and then to check that the version is less than 1.0.0. If requests is unavailable, or you have 1.0.0 or newer, Fish falls back (silently) to httplib2. To be fair, that mostly works; but requests works better.

    [“What does that mean?”, you ask. Well, I have two problems with httplib2. First, it consistently fails with request bodies over slightly less than 64KB, which is a problem for some file uploads to Fluidinfo. Secondly, when performing bulk uploads, particularly if I use multiple threads, I get occasional failures with httplib2. Retrying sometimes, but not always, works. (This was a major pain when uploading, for example, the 2.5 million items in the British National Bibliography. The upload spent about 50% of the time uploading the first 98%, and the other 50% retrying the troublesome 2%.) I have yet to see any random failures with requests.]

  3. I finally got round to packaging up Fish with setuptools. So if you download it now, you should be able to do a standard

    python setup.py install

    in the package directory and standard, good things will happen. These include installing the fish script, so if you used an alias or specially added it to your path before, this is no longer necessary.

    I also cleaned up the import structure a bit to make this work.
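The first bug fix in item 1 comes down to building JSON by string concatenation instead of using a serializer. The post does not show Fish’s code, so this is only an illustrative sketch in modern Python: hand_rolled_json is a hypothetical stand-in for the lazy approach, while json.dumps is the standard-library serializer.

```python
import json

def hand_rolled_json(value):
    # Hypothetical "by hand" encoder: it just wraps the string in quotes,
    # so embedded quotes and non-ASCII characters pass through unescaped.
    return '"' + value + '"'

def proper_json(value):
    # json.dumps escapes quotes and, with the default ensure_ascii=True,
    # turns every non-ASCII character into a \uXXXX escape.
    return json.dumps(value)
```

With this, proper_json("café") yields a pure-ASCII string that any HTTP layer will accept, whereas hand_rolled_json passes the é through raw and produces invalid JSON for a string containing a double quote.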
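The requests-or-httplib2 fallback described in item 2 can be sketched as follows. This is a minimal illustration, not Fish’s actual code: the function names are hypothetical, it assumes the installed requests exposes its version as requests.__version__, and it uses the 1.0.0 cut-off stated above. It returns the name of the library it would use rather than importing httplib2, so it runs even when neither library is installed.

```python
import re

# Cut-off taken from the post: requests releases from 1.0.0 on are not used.
FIRST_UNSUPPORTED = (1, 0, 0)

def parse_version(text):
    """Turn '0.9.2' into (0, 9, 2), using the leading digits of each part."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    parts.extend([0] * (3 - len(parts)))
    return tuple(parts)

def requests_is_usable(version_text):
    """True if this requests version is below the unsupported cut-off."""
    return parse_version(version_text) < FIRST_UNSUPPORTED

def choose_http_library():
    """Return 'requests' if a usable version is importable, else 'httplib2'."""
    try:
        import requests
    except ImportError:
        return "httplib2"  # requests missing: fall back silently
    version = getattr(requests, "__version__", None)
    if version and requests_is_usable(version):
        return "requests"
    return "httplib2"  # requests too new (or unversioned): fall back silently
```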

So much for the good news.

The bad news is that I can’t upload it to PyPI, which had been my intention, because there is already a package called fish on PyPI.

Obviously, I could rename Fish (for a second time), but that seems undesirable when it is about to appear in print in the form of an O’Reilly book (blog post upcoming). So I guess we’ll have to do things the old way.

Needless to say, in the process of all these lovely upgrades, it is possible I broke something. Please let me know if I did and I will endeavour to fix it.

[1] Funny term, “sub-par”. Every golfer aspires to shoot under par: in golf, less is definitely more. But I didn’t feel using the term “super-par” would really have conveyed my meaning.

22 January 2012

Tags, The Tagosphere and Every Thing: Towards an rcp for Fish

In the first of this series of articles we discussed the analogy between the Unix File System and Fluidinfo. The Fluidinfo Shell, Fish, is largely a working through of that analogy. A new insight from that article, upon which we will now expand, is that the objects in Fluidinfo can usefully be viewed as analogues of host computers in a computer network.

In the second article, we discussed how to augment Fish with Unix-like copy (cp) and move (mv) commands for Fluidinfo’s abstract tags and namespaces.

In this third, and—with luck—final part of the trilogy, we will tackle the more complex question of how to copy and move data within and among objects and concrete tags in Fluidinfo. Just as we leant on the behaviour of Unix’s cp and mv commands when considering similar commands for abstract tags in Fluidinfo, we will look to rcp, rsync and ftp as sources of wisdom, guidance and precedent as we try to design similar commands for concrete tags in Fluidinfo.

Recapitulation

In the previous article in this series, we concluded that we could sensibly model cp and mv commands for namespaces and abstract tags in Fluidinfo on their Unix counterparts provided we resolve three issues:

  1. How should we handle ambiguity, given that a tag and a namespace can share the same name (and path) in Fluidinfo?
  2. How should we handle permissions on copied and moved tags, given the rather different way permissions work in Fluidinfo and Unix?
  3. How destructive should overwriting behaviour be? If we copy a (whole, abstract) tag “on top” of another, should all old values of that tag be destroyed, or should the result be a union in which the new values take precedence, but the old values remain on objects that were not tagged with the “source” tag?

Our candidate solutions are (respectively):

  1. Where paths are ambiguous for either source or destination items, we will demand disambiguation by adding a trailing slash (/) to denote a namespace or a trailing dot (.) to denote a tag.
  2. We propose a reverse Facebook principle, whereby whenever the degree of access granted to data is unclear, we should grant the smallest set of rights that the user might reasonably expect, the logic being that while overly restrictive permissions can easily be relaxed, unintentional release of private information is potentially irreversible and more harmful.
  3. We didn’t really resolve overwrite behaviour beyond a mealy-mouthed suggestion that we might need to provide options. While we should do that, this doesn’t answer the question, because unless we force the user to make an explicit choice on a case-by-case basis, we will have to decide on the default behaviour, and that will be the main behaviour people will experience (unless we make such a poor choice that it is normally overridden). I am currently leaning toward the less destructive behaviour as the default.

The Jobs to be Done

Thus far, we have considered moving and copying whole tags (abstract tags together with all their concrete instances). But at least as important, and probably more common in practice, will be the need to move or copy just some concrete tags. Concentrating first on the movement and copying within Fluidinfo (as opposed to transfers between Fluidinfo and the local file system), some examples of things Alice might wish to do include:

  • Copy one or more tag(s) from one object to another. For example:

    • Copy Alice’s description tag from paris to city:paris.
    • Move Alice’s description tag from paris to city:paris.
    • Move Alice’s rating tag from paris to city:paris while moving it into her private namespace (on the destination object).
  • Rename or duplicate tags on an object or a set of objects. For example:

    • Rename Alice’s rating tag on city:paris as star-rating
    • Copy Alice’s rating tag on city:paris to star-rating
    • Move Alice’s rating tag on city:paris to private/rating
    • Perform any of the above operations on all objects Alice has tagged with alice/personal.
  • Systematically move tags from objects using one about tag convention to those using another. For example:

    • Move all Alice’s rating tags on objects she has tagged with alice/city to corresponding objects with the same about tag, but preceded by city: (e.g. from paris to city:paris).

Alice might also be interested in exchanging information between her local file system and Fluidinfo. (In this case, we would probably support only copying, not movement, in much the same way as there is no rmv command, and indeed even drag and drop usually copies rather than moves when source and destination are different volumes or hosts.) Examples of upload and download to and from Fluidinfo might include:

  • Copy a local set of files to an object.

    • Copy the files and directories in Alice’s ~alice/blog directory as corresponding tags and namespaces on the Alice's Blog object in Fluidinfo.
    • The same operation, but now taking the files from ~alice/blogs/drinkingblog/html and placing them in a new alice/blog namespace on the same nominated object
  • Conversely, Alice might wish to download a blog, stored as a collection of tags and namespaces in Fluidinfo as a set of local files—the precise inverse of the cases above.

  • Download tags from a number of objects in Fluidinfo to different parts of the local file system.

    • Alice might have a description tag on objects corresponding to each element of the periodic table and wish to download them either to different local directories or different files within a directory. For example, she might want the description from the element element:mercury to go to elements/mercury/description.txt or to elements/mercury-description on the local file system.
  • Again, conversely, Alice might wish to upload a directory structure, taking one or more of the upper levels of the directory structure either as object specifiers or tags in some systematic way.

It is unlikely that the first version of any rcp analogue in Fish will support all of these fully, but it is useful to think about what our aspirations might be.

Still another case might be to map between the contents of a local file (e.g. a CSV file) and a set of objects in Fluidinfo. However, I currently regard that as a separate kind of task, and not really suitable for analogues of cp and mv.

Lessons from rcp, rsync, scp etc.

I have argued that Fluidinfo objects (which represent things) can usefully be viewed as analogues of hosts (computers) with Unix File systems. That immediately suggests that we might usefully look to rcp and its variants (rsync, scp and possibly ftp) for inspiration.

The basic addressing scheme in rcp simply identifies a remote file as

host:path/to/file.ext

For example, to copy a file drink.me from a host wonderland to her current directory, Alice can say:

rcp wonderland:drink.me .

In fact, in normal use, rcp can replace cp, so long as your file names don’t contain colons. In the case of Fluidinfo, we won’t introduce a different command at this stage but will simply extend cp (though later we will propose adding an rcp for different purposes).

Remembering that we are mapping hosts to Fluidinfo objects, which are usually identified by their about tags, this suggests we might naïvely consider a directly similar rcp-like cp command for Fluidinfo. The first use case we suggested for Alice was copying her description from the object for paris to the object for city:paris. So that might be:

cp paris:description city:paris:description

Unfortunately, in this particular case the “solution” doesn’t work well because the target about tag includes a colon. Obviously, in principle, we could escape the colon in some way. If we did the ‘obvious’ thing of escaping with a backslash, and were typing into an interactive Fish shell (rather than from the command line), that would mean we would need to type:

cp paris:description city\:paris:description

which is already tedious. However, if we were working from a Unix shell, which itself uses backslash for escaping, we would need to use either

cp paris:description city\\:paris:description

or perhaps

cp paris:description 'city\:paris':description

This is OK in principle, but would get very tiresome (and be rather error-prone) in practice, particularly since colons are widely used in about tags.
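To see what the backslash convention would demand of a parser, here is a minimal sketch in Python. The function name is hypothetical; it splits an argument at the first colon not preceded by a backslash, treating the text before it as the object’s about tag and everything after it as the tag path (which may itself contain colons). By contrast, a naive spec.split(":", 1) would turn city:paris:description into the object city and the tag paris:description, which is exactly the ambiguity above.

```python
def split_object_spec(spec):
    """Split 'about:tag' at the first unescaped colon.

    A backslash escapes the next character in the about part.
    Returns (about, tag); about is None when there is no unescaped colon.
    """
    about_chars = []
    i = 0
    while i < len(spec):
        ch = spec[i]
        if ch == "\\" and i + 1 < len(spec):
            about_chars.append(spec[i + 1])  # keep the escaped character literally
            i += 2
        elif ch == ":":
            return "".join(about_chars), spec[i + 1:]  # tag part kept verbatim
        else:
            about_chars.append(ch)
            i += 1
    return None, spec  # no unescaped colon: the whole argument is a plain path
```

Note that the Python literal "city\\:paris:description" corresponds to what Alice would type in an interactive Fish shell; from Bash she would need yet another layer of escaping.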

Another option we might consider is to use Fluidinfo’s full path convention. Recall that the value for alice/description on paris is available at

http://fluiddb.fluidinfo.com/about/paris/alice/description

So it might seem promising to base things on that. Our cp command would then look something like:

cp /about/paris/alice/description /about/city:paris/alice/description

Unfortunately, that doesn’t look too promising either. First, Fish already uses a leading slash to introduce absolute paths to other users’ tags (allowing Alice to omit alice/ from her own tags, a significant convenience that it would be awkward to retain under this scheme).

Perhaps worse, however, is that one of the most common kinds of about tags is a URL. That would cause real pain getting an about tag through the shell with the slashes protected as part of a path such as above. Imagine having to say:

cp /about/http:\\/\\/example.com\\/foo\\/bar/description /about/http:\\/\\/example.com\\/foo\\/bas

This is not really a problem for the API, where the about tag typically gets passed into some function separately and escaped automatically by the machine, but it would be a significant inconvenience for a Fish user.
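In program code the escaping is mechanical. The sketch below builds the kind of URL shown earlier with Python’s standard urllib.parse.quote; it assumes (this is not taken from the post) that the about value is percent-encoded as a single path segment, and it leaves the tag path unencoded for simplicity. The function name is hypothetical.

```python
from urllib.parse import quote

BASE = "http://fluiddb.fluidinfo.com"

def about_tag_url(about, tag_path):
    """URL for tag_path on the object with the given about tag.

    safe="" makes quote() encode '/' and ':' too, so a URL used as an
    about tag stays a single path segment.
    """
    return "%s/about/%s/%s" % (BASE, quote(about, safe=""), tag_path)
```

For example, about_tag_url("paris", "alice/description") reproduces the URL given above, while an about tag such as http://example.com/foo/bar becomes http%3A%2F%2Fexample.com%2Ffoo%2Fbar with no effort from the user.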

An alternative that might work better is to swap the host and the path. A colon still wouldn’t make a good separator since colon is a legal character in a tag name, and it would be confusing to reverse rcp’s convention anyway. We could, however, use @, which would perhaps seem more natural in that order. So Alice might say:

cp description@paris description@city:paris

I think I remember using that as an alternative form with some version of rcp, but I can’t see it in the documentation for any version I’m currently using, so I may be mistaken. Nevertheless, the form does feel reasonably natural from both @’s use in email addresses and its more general use to mean “at”. Although the @ is used in reasonably common about tags (particularly Twitter names), it doesn’t really cause a problem, since it is not a valid character in a tag name. It would be fine to say:

cp chess@@RedQueen chess@@WhiteQueen

since the second @, in each case, is clearly part of the object identifier. This feels like the first contender.
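Splitting at the first @ is all a parser needs, precisely because @ cannot occur in a tag name. Here is a minimal sketch with hypothetical function names; resolve_destination also anticipates the shorthand forms the article goes on to consider (@city:paris, star-rating@ and a bare star-rating), filling any missing part of the destination from the source.

```python
def parse_spec(spec):
    """Split 'tag@object' at the first @; object is None if there is no @."""
    tag, at, obj = spec.partition("@")
    return tag, (obj if at else None)

def resolve_destination(src, dest):
    """Fill an empty tag or object on the destination from the source."""
    src_tag, src_obj = parse_spec(src)
    dest_tag, dest_obj = parse_spec(dest)
    # '' (trailing @) and None (no @ at all) both mean "same as the source".
    return dest_tag or src_tag, dest_obj or src_obj
```

So parse_spec("chess@@RedQueen") gives the tag chess and the object @RedQueen, and resolve_destination("description@paris", "@city:paris") keeps the tag name while switching objects.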

Another possibility we might mention is specifying objects using -a, -i or -q, as is possible with other Fish commands such as tag. The issue with that is that it only works well if all the tags are on the same object. We could have either repeated flags or differently named flags for source and destination, but if we want to allow the Unix-like

mv src1 src2 ... srcN dest

that will be less good. But it might be useful for some cases, particularly when the source and destination objects are the same.

Let’s try some other examples from our shopping list, keeping in mind the possibility of using -a, -q etc. as well as the @ convention:

  • Move Alice’s description tag from paris to city:paris.

    mv description@paris description@city:paris

    or potentially:

    mv description@paris @city:paris
  • Move Alice’s rating tag from paris to city:paris while moving it into her private namespace (on the destination object).

    mv rating@paris private/rating@city:paris
  • Rename Alice’s rating tag on city:paris as star-rating

    mv rating@city:paris star-rating@city:paris
    mv -a city:paris rating star-rating
    mv -q 'fluiddb/about matches "city:paris"' rating star-rating

    We could also allow a trailing @ to mean ‘the same as last time’ on the second or subsequent use, so that

    mv rating@city:paris star-rating@

    might be acceptable, or even just

    mv rating@city:paris star-rating
  • Copy Alice’s rating tag on city:paris to star-rating

    cp rating@city:paris star-rating@city:paris
    cp rating@city:paris star-rating@
    cp rating@city:paris star-rating
  • Move Alice’s rating tag on city:paris to private/rating

    mv rating@city:paris private/rating@city:paris
    mv rating@city:paris private/rating@
    mv rating@city:paris private/rating
    mv -a 'city:paris' rating private/rating
  • Perform any of the above operations on all objects Alice has tagged with alice/personal. For example, let’s take the moving of rating to private/star-rating

    mv -q 'has alice/personal' rating private/star-rating
  • We could also do the same thing with namespaces, though I didn’t list that as an example. But it seems fairly clear that

    cp -q 'has alice/personal' private secret

    or perhaps

    cp -q 'has alice/personal' -r private secret

    would mean copying all the tags under private, on the objects Alice has tagged with alice/personal, to a new secret namespace (if it doesn’t exist) or into secret if it does. [The -r would mean recurse, as with cp, but whether it should be required I’m not sure.]

  • Move all Alice’s rating tags on objects she has tagged with city to corresponding objects with the same about tag, but preceded by city: (e.g. from paris to city:paris).

    This one is completely different and is probably something we need to admit defeat on for now. Though it is not the same, it reminds me of a Unix command I’ve always thought was missing, which is a pattern-based bulk rename (and bulk copy). I actually have one, which allows systematic renaming of files using commands like:

    bmv foo@.txt bar@.txt

    in which the @ is a wildcard and, on the right, takes the same value that it matched on the left (like a tagged regular expression). So if a directory had food.txt and fool.txt, this would rename food.txt as bard.txt, and fool.txt as barl.txt. Systematic construction of about tags could follow a similar pattern but, useful though it would be, is probably out of scope here.

    I can, however, imagine allowing a more manual version. For example, we might allow braces and a syntax like:

    cp rating@{foo,bar,baz} star-rating@{Foo,Bar,Baz}

    to move and rename tags from things like rating on foo to star-rating on Foo etc.
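The bmv behaviour described above amounts to turning the source pattern into a regular expression with a single capture group. This is a sketch assuming exactly one @ wildcard per pattern; bmv is the author’s own command, and this is not its code.

```python
import re

def bmv_target(name, src_pattern, dest_pattern):
    """New name for `name` under a one-wildcard bulk rename, or None if no match."""
    before, _, after = src_pattern.partition("@")
    # Escape the literal parts so that, e.g., '.' in '.txt' matches only a dot.
    match = re.fullmatch(re.escape(before) + "(.*)" + re.escape(after), name)
    if match is None:
        return None
    # Substitute whatever the wildcard matched into the destination pattern.
    return dest_pattern.replace("@", match.group(1), 1)
```

The same idea applied to about tags (paris to city:paris, say) would need a pattern over about tags rather than file names, which is why it looks out of scope for now.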

Although perhaps not pretty, this @ syntax, combined with -a, -i and -q flags (for when the same object is to be used on both sides of the cp or mv), seems to get us a long way. Adding in paired lists in the {a,b,c} form would go a step further.
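The paired-list idea could be handled by expanding each brace group and zipping the results. This is a minimal sketch, assuming at most one {a,b,c} group per argument and lists of equal length on both sides; expand_braces and pair_arguments are hypothetical names.

```python
import re

BRACE_GROUP = re.compile(r"\{([^{}]*)\}")

def expand_braces(spec):
    """Expand one {a,b,c} group into a list of specs; no group gives [spec]."""
    match = BRACE_GROUP.search(spec)
    if match is None:
        return [spec]
    head, tail = spec[:match.start()], spec[match.end():]
    return [head + item + tail for item in match.group(1).split(",")]

def pair_arguments(src_spec, dest_spec):
    """Pair the i-th expanded source with the i-th expanded destination."""
    sources, dests = expand_braces(src_spec), expand_braces(dest_spec)
    if len(sources) != len(dests):
        raise ValueError("brace lists on each side must have the same length")
    return list(zip(sources, dests))
```

Each resulting pair can then be handled exactly like a single cp or mv with the @ syntax.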

Upload and Download

Ironically, although we modelled our extended cp on rcp, we have concluded that we don’t really need the “r” but can just use cp for all copying and movement within Fluidinfo. Curiously, this leaves open the possibility of using rcp to perform upload and download. The idea would be that paths without an @ in the rcp command would be taken as paths on the local file system while paths that include an @ refer to tags on an object in Fluidinfo.

In case it’s not clear, the key point is that since Fluidinfo tags can have arbitrary MIME types, they can be used to store files: the analogy we have been pursuing is not merely structural. For example, the documentation for Fish is stored in Fluidinfo, as a collection of tags on the object with the about tag fish, with its index.html file at

To be completely clear, this is not a URL pointing to a static HTML file: this is the fish user’s tag index.html on the object with the about tag fish, being retrieved directly from the Fluidinfo database.

A partial visualization of the tags on that object is shown below.

http://fluiddb.fluidinfo.com/about/abouttag/njr/image/at103-tree.gif

The point of commands for uploading to Fluidinfo is to make it easy to publish a tree of files such as this.

Let’s see how the example use cases for upload and download might work in this case. I am going to assume that we have no difficult cases with ambiguous paths in Fluidinfo, and that if we did, we would either require disambiguation or the commands would fail. (This would obviously be an issue only on download.)

  • Copy a local set of files to an object.
    • Copy the files and directories in Alice’s ~alice/blog directory as corresponding tags and namespaces on the Alice's Blog object in Fluidinfo.

      rcp blog blog@"Alice's Blog"
    • The same operation, but now taking the files from ~alice/blogs/drinkingblog/html and placing them in a new alice/blog namespace on the nominated object

      rcp ~alice/blogs/drinkingblog/html blog@"Alice's Blog"
  • Conversely, Alice might wish to download a blog, stored as a collection of tags and namespaces in Fluidinfo as a set of local files, the precise inverse of the cases above.

    rcp blog@"Alice's Blog" blog
    rcp blog@"Alice's Blog" ~alice/blogs/drinkingblog/html
  • Download tags from a number of objects in Fluidinfo to different parts of the local file system.

    • Alice might have a description tag on objects corresponding to each element of the periodic table and wish to download them either to different local directories or different files within a directory. For example, she might want the description from the element element:mercury to go to elements/mercury/description.txt or to elements/mercury-description on the local file system.

      This is a more interesting and difficult case, and again is fundamentally about constructing paths semi-programmatically. Although it is not too hard to think of ways it might be done (perhaps with regular expressions), we might be moving beyond what it’s reasonable to expect Fish to do. That goes for the final example too.

These examples suggest that modelling a Fish rcp command on a modified Unix-like rcp could work well. It’s almost as if we are considering our host machine as part of Fluidinfo, with its file system representing a concrete tag hierarchy on an anonymous local object. There are certainly other issues to consider, including MIME types and files for potential omission (dot files? backup files ending in ~, symbolic links etc.), but we seem to have a promising way forward.
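As a rough illustration of the upload direction (say, rcp blog blog@"Alice's Blog"), the sketch below walks a local directory and plans the tag path and MIME type for each file, without talking to Fluidinfo at all. The omission policy (dot files, names ending in ~, symbolic links) is one hypothetical answer to the questions just raised; MIME types are guessed from file extensions with Python’s mimetypes module, and the function names are illustrative only.

```python
import mimetypes
import os

def should_skip(name):
    """Hypothetical omission policy: dot files and backups ending in ~."""
    return name.startswith(".") or name.endswith("~")

def plan_entry(relative_path, tag_prefix):
    """Map a relative local path to (tag_path, mime_type), or None to skip it."""
    parts = relative_path.replace(os.sep, "/").split("/")
    if any(should_skip(part) for part in parts):
        return None
    mime_type, _encoding = mimetypes.guess_type(parts[-1])
    return "/".join([tag_prefix] + parts), mime_type or "application/octet-stream"

def plan_upload(local_root, tag_prefix):
    """List (tag_path, mime_type, local_file) for every file under local_root."""
    plan = []
    for dirpath, _dirnames, filenames in os.walk(local_root):
        for name in filenames:
            local_file = os.path.join(dirpath, name)
            if os.path.islink(local_file):
                continue  # symbolic links are skipped in this sketch
            entry = plan_entry(os.path.relpath(local_file, local_root), tag_prefix)
            if entry is not None:
                plan.append(entry + (local_file,))
    return sorted(plan)
```

Download would be the inverse mapping, from tag paths back to local paths, with the ambiguity checks mentioned above applied before any file is written.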

I was going to consider ftp as an alternative source of inspiration, but its model seems more cumbersome in general, and better suited to situations where you are considering only one host, whereas in Fluidinfo it is normal to consider many objects, so I think we can probably forget ftp.

Summary

Perhaps surprisingly, we have largely got there.

Over the last three articles, we have probed and extended the analogy between Fluidinfo and the Unix file system and come up with plausible syntaxes for building mv and cp commands that give us pretty powerful ways of moving and copying data within Fluidinfo, drawing on ideas from cp, rcp, mv and rsync as well as the current Fish.

Despite borrowing from rcp in designing Fish’s cp command, we had left rcp itself free and can now use it as the basis for moving data between a local file system and Fluidinfo, using almost identical conventions save for the interpretation of a path that does not contain an @ (“the same object as before” for cp, and “the local file system” for rcp).

There are definitely holes to fill, but it feels like there is a reasonable outline spec. If people see problems, or have better ideas or comments, do let me know.

Otherwise, all that remains is the trivial task of implementation. How hard could that be?

19 January 2012

Movement and Copying in Fluidinfo

In the previous article, we discussed the analogy between the Unix File System and Fluidinfo’s tag hierarchy. This analogy forms the basis and inspiration for the Fluidinfo Shell, Fish. But a file system without move and copy commands would be a sad and contemptible thing, and at the moment Fish, like Fluidinfo, is impoverished by the lack of such basic functionality as cp and mv. Here we will try to design such functionality, building on the analogy.

Copying and Moving

In Unix we can

  • copy files with the cp command
  • copy directories (and their contents) with cp -R
  • move files to a different location with mv
  • move directories (and their contents) to a different location with mv
  • rename files, also with the mv command
  • rename directories with mv
  • delete files with rm (“remove”)
  • delete empty directories with the rmdir command and delete directories together with their contents with the rm -r command.

In general, the functionality of mv is conceptually equivalent to copying and then removing an item.

We can also copy files and directories between different machines using the rcp and rsync commands, which are both similar to cp but understand a host: prefix. An alternative to these commands is the ftp command, which operates in a very different manner, and uses different mechanisms, to ultimately similar effect.

Fish, today, offers no commands for moving, renaming or copying tags or namespaces, but does provide an rm command that performs the combined functions of rm and rmdir (also requiring the -r and -f flags in some cases). It is worth noting, briefly, that in some sense Fish’s rm goes much further than rm on Unix, in that the command

fish rm rating

removes not only the abstract rating tag, but every occurrence of that tag in Fluidinfo, potentially on millions of objects. This is why Fish requires the -f flag to force the removal of a tag that is in use. In the previous article, we argued that Fluidinfo’s objects are the natural analogues of computers in a network. From that perspective, if we think of rcp as a more powerful, remote version of cp, Fish’s rm command is more like a remote rm (presumably rrm) that allows you to remove all files with a given path on all hosts simultaneously. It’s as if you could say something like:

rrm -f *:~/.bashrc

to remove the .bashrc in your home directory on every machine on which you have an account. Indeed, if Linus Torvalds were not merely Linux’s creator but the superuser on all copies of the OS, with such a command he could remove everything on all Linux hosts with

rrm -rf *:/

Let’s think about the Unix commands cp and mv, and their possible generalizations to the realm of Fluidinfo. Recall that, when we want to be precise, we need to distinguish between two different senses of the word “tag”. Ordinarily when we attach a tag, possibly with a value, to an object, we create what we might variously call a “tag instance” or a “concrete tag”. Fluidinfo, however, maintains a user’s tag hierarchy independent of whether tags are actually in use. When discussing tags in this sense, independent of objects, I call them abstract or platonic tags. These are quite real and can persist even when the tag is not in use. The diagram below shows the abstract tag hierarchy for Alice, on the right, and her file system, on the left. Note carefully that in her private namespace Alice has both a tag and a namespace called moments.

http://fluiddb.fluidinfo.com/about/abouttag/njr/image/at102-tags.png

In what follows, assume Alice’s current working directory on Unix is her home directory /home/alice, and for concreteness, assume that her shell is the Bourne Again Shell, Bash.

Copying a file or a tag

On Unix, Alice can copy her has-drunk file into her private directory by saying any of the following:

cp has-drunk private/has-drunk
cp has-drunk private/
cp has-drunk private

Obvious though it is, let us spell out what this means. After running one of these commands, Alice will have two files, /home/alice/has-drunk and /home/alice/private/has-drunk, where previously she had only one, and each will contain a separate copy of the same data.

We could plausibly adopt any or all of these in Fish to copy Alice’s has-drunk tag to her private namespace. But what would that do? I think the most obvious action would be to create a new abstract tag called alice/private/has-drunk and then tag all the objects currently tagged with alice/has-drunk with the new alice/private/has-drunk tag, copying their values if any. We would need to consider quite carefully how to handle permissions when performing such copying. The case is rather different from Unix, because in Unix permissions are hierarchical in the sense that a file with public read permission in a fully private directory cannot be read. This is an important detail. On Unix, let’s assume that Alice’s private/things file has open read permissions (644):

alice$ ls -l private/things
-rw-r--r--  1 alice  staff   0 17 Jan 18:57 things

and that she then locks her private directory so that only she can look at it.

alice$ chmod 700 private

If Bert now tries to look at ~alice/private/things he will find that he cannot:

bert$ cat ~alice/private/things
cat: /home/alice/private/things: Permission denied

In particular, on Unix this means that if Alice moves a non-private file to a private directory (by which I mean, one with neither read nor execute permission) it becomes unreadable.

In Fluidinfo, the permissions hierarchy is consulted only when new tags and namespaces are created. So if Alice creates a new tag in her private namespace, it will default to being private; but if we copy a tag’s permissions along with the tag, its permissions will be unaltered, and potentially different from those it would have had if we had created the tag afresh in the new location.

The correct behaviour is not clear, and either way there is potential for surprising the user in unpleasant ways, most obviously by making public data that the user intended to be private. We have seen above how by copying permissions with tags we could violate a (reasonable) assumption that moving a tag into a private namespace would make it private. If we fail, however, to copy permissions, copying a private tag to a non-private namespace would result in a non-private tag, which might also be a nasty surprise.

My first inclination here is to do a “reverse Facebook” by, when in doubt, setting the permissions on the destination to the more restrictive of the two possibilities, on the assumption that revealing data that Alice wanted to keep private is both worse and less correctable than making data more private than intended, given the inability to make people unsee (or even, uncopy) things. Needless to say, we could also have options to allow the user to choose what behaviour she wants.

Q1. How should permissions behave when tags and namespaces are copied or moved? Should we go for:

  1. The permissions of the destination are copied from the source?
  2. We follow mv = cp + rm and create the new tag or namespace in the new location according to default rules?
  3. Maximum privacy: apply the more restrictive of the permissions suggested by options 1 and 2 (or, if necessary, their most restrictive intersection).
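Option 3 can be stated precisely if we assume Fluidinfo’s model of a permission as an open or closed policy plus a list of exceptions. The sketch below (hypothetical names, one action such as read only) computes the most restrictive combination: a user gets access only if both candidate permissions would allow it.

```python
from dataclasses import dataclass
from typing import FrozenSet

@dataclass(frozen=True)
class Permission:
    """Hypothetical model of one permission: a policy plus exceptions."""
    policy: str                               # "open" or "closed"
    exceptions: FrozenSet[str] = frozenset()

    def allows(self, user: str) -> bool:
        if self.policy == "open":
            return user not in self.exceptions  # everyone except the exceptions
        return user in self.exceptions          # only the exceptions

def most_restrictive(a: Permission, b: Permission) -> Permission:
    """Permission granting access only to users allowed by both a and b."""
    if a.policy == "closed" and b.policy == "closed":
        return Permission("closed", a.exceptions & b.exceptions)
    if a.policy == "open" and b.policy == "open":
        return Permission("open", a.exceptions | b.exceptions)
    closed, opened = (a, b) if a.policy == "closed" else (b, a)
    return Permission("closed", closed.exceptions - opened.exceptions)
```

For example, combining an open permission that excludes bert with a closed one listing alice and bert leaves only alice able to read the copied tag.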

Moving or renaming a file or a tag

Going back to Unix, Alice can move her has-drunk file from her home directory to her private directory with any of these commands:

mv has-drunk private/has-drunk
mv has-drunk private/
mv has-drunk private

Again, we could plausibly adopt all of these forms in Fish to move Alice’s has-drunk tag to her private namespace. In this case, there seems no real issue about what should happen. We can’t sensibly “move” the abstract tag but not the concrete ones. Using our rule of thumb that

move = copy to new location + remove the original

or

mv src dest = cp src dest; rm src

this reinforces the case for making cp copy all the concrete tags as well as the abstract tag.

Renaming really raises no extra issues: just as in Unix Alice can rename her has-drunk file to drunk with a simple

mv has-drunk drunk

she should be able to rename her has-drunk abstract tag as drunk with the same command in Fish, and in the process rename all its concrete instances.
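A minimal sketch of the rule of thumb mv src dest = cp src dest; rm src, assuming a hypothetical in-memory model in which each abstract tag path maps to its concrete instances (object id to value). It is not how Fish or Fluidinfo store tags. The functions cp, rm and mv are illustrative, and the object ids are placeholders.

```python
from typing import Dict

# Hypothetical model: abstract tag path -> {object id: value}.
# Each entry in the inner dict is one concrete tag.
Store = Dict[str, Dict[str, object]]


def cp(store: Store, src: str, dest: str) -> None:
    """Copy the abstract tag src, with all its concrete instances, to dest.
    Any existing dest tag is simply overwritten in this sketch."""
    if src not in store:
        raise KeyError(f"{src}: no such tag")
    store[dest] = dict(store[src])


def rm(store: Store, path: str) -> None:
    """Remove an abstract tag together with all of its concrete instances."""
    del store[path]


def mv(store: Store, src: str, dest: str) -> None:
    """Move or rename by the rule of thumb: mv src dest = cp src dest; rm src."""
    if src == dest:
        return  # without this guard, rm would delete the tag we just "copied"
    cp(store, src, dest)
    rm(store, src)


store = {"alice/has-drunk": {"obj-1": True, "obj-2": None}}
mv(store, "alice/has-drunk", "alice/drunk")
print(store)  # {'alice/drunk': {'obj-1': True, 'obj-2': None}}
```

Renaming falls out for free. Because cp copies the inner dict, every concrete instance travels with the abstract tag.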

Copying and Moving and Renaming Directories

What about copying a directory in Unix? We use cp for that, but now we need to use -R to force the directory and all its contents to be copied recursively: without this -R, we can’t even copy an empty directory such as things:

$ cp things thangs
cp: things is a directory (not copied).

But with -R, we can copy a directory hierarchy as easily as a file. Let’s suppose Alice wants a duplicate of her private directory in her things directory. She can use any of

cp -R private things
cp -R private things/
cp -R private things/private
cp -R private things/private/

and the result will be a duplicate:

$ ls -RF things
private/

things/private:
moments/      things          thoughts/

things/private/moments:

things/private/thoughts:

Move works essentially the same way and needs no example. Again, so far there seems to be no reason why we shouldn’t build analogous functionality in Fish for copying and moving namespaces and their contents. We would certainly allow the -R flag, but might not require it, and would certainly allow -r to be used as a synonym. As with copying simple files, and following our mv = cp + rm dictum, concrete tags in the hierarchy would be copied, together with their values, on all objects to which they are attached.
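Under the same kind of hypothetical model (full tag path to instances), a recursive namespace copy is a prefix rewrite over tag paths. This sketch assumes namespaces exist only implicitly as path prefixes, so empty namespaces cannot be represented. It also treats dest_ns as the full target path, as in cp -R private things/private. cp_r and mv_r are illustrative names.

```python
from typing import Dict

# Hypothetical model: full tag path -> {object id: value}. Namespaces exist only
# implicitly, as path prefixes, so empty namespaces are not represented.
Store = Dict[str, Dict[str, object]]


def cp_r(store: Store, src_ns: str, dest_ns: str) -> None:
    """Copy every tag under src_ns (with all its instances) to the same
    relative path under dest_ns, like cp -R src_ns dest_ns."""
    prefix = src_ns.rstrip("/") + "/"
    target = dest_ns.rstrip("/") + "/"
    if target.startswith(prefix):
        raise ValueError(f"cannot copy {src_ns} into itself")
    for path in [p for p in store if p.startswith(prefix)]:
        store[target + path[len(prefix):]] = dict(store[path])


def mv_r(store: Store, src_ns: str, dest_ns: str) -> None:
    """mv = cp -R + rm -r, applied to a whole namespace."""
    cp_r(store, src_ns, dest_ns)
    prefix = src_ns.rstrip("/") + "/"
    for path in [p for p in store if p.startswith(prefix)]:
        del store[path]


store = {
    "alice/private/moments/note": {"obj-1": "hello"},
    "alice/private/things": {"obj-2": None},
}
cp_r(store, "alice/private", "alice/things/private")
print(sorted(store))
```

The guard against copying a namespace into itself mirrors what Unix does. Without it, mv_r would delete the copies it had just made.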

Clobbering

Now consider the following commands on Unix, in the context of the same directory structure shown in the original figure:

cp has-drunk private/things
mv has-drunk private/things

The destination, private/things, is a file that already exists: it will be clobbered (overwritten) by both cp and mv. The same would be true if Alice copied her private/moments/things file to her private directory with any of

cp private/moments/things private
cp private/moments/things private/
cp private/moments/things private/things

or their mv counterparts. So in Unix, the rule is

When the destination specified is a directory, move or copy the source into that directory. If there was already a file with that name in the directory, delete it first.

When the destination specified is a file, first remove that file if it exists, then copy or move the source to that destination.

Except that this isn’t quite true: you can’t clobber a file with a directory. So

$ cp -R things private/things
cp: private/things: Not a directory
cp: private/things: Not a directory
cp: private/things: Not a directory
cp: private/things: Not a directory

(the four failures being as each of the entities in private fails to be copied), and

$ mv things private/things
mv: rename things to private/things: Not a directory

Why can’t a hulking great directory clobber a puny file? I don’t know. Unix has many wonderful attributes, but consistency is not foremost among them. To my surprise, even adding -f cannot persuade the system to do it. Whether Fish should copy this apparently anomalous behaviour is not completely clear to me: logic suggests not, but fidelity to Unix conventions suggests maybe so. The point may be moot anyway, as there’s a good chance I will require a -f to clobber even a tag, just as I do with rm, if it is in use. The reason is that whereas on Unix clobbering a single file removes a single entity, however big, in Fluidinfo a single abstract tag could have a million instances or more. I feel that requiring a -f flag, to encourage the user to confirm her intent before engaging in such (potentially) widespread destruction, is not unreasonable.

These minor exceptions notwithstanding, the way files get clobbered suggests that we might extend our recipe to include the rule:

If dest is a file:
   mv src dest = rm -f dest; cp -R src dest; rm -r src
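Here is a sketch of that recipe for tags, including the possible requirement of -f to clobber a tag that is in use. It uses a hypothetical model (abstract tag path to {object id: value}). The force argument stands in for an -f flag, and mv_clobbering and ClobberError are illustrative names, not Fish code.

```python
from typing import Dict

# Hypothetical model: abstract tag path -> {object id: value}.
Store = Dict[str, Dict[str, object]]


class ClobberError(Exception):
    """Raised when a move would destroy a tag that is in use and force is off."""


def mv_clobbering(store: Store, src: str, dest: str, force: bool = False) -> None:
    """mv src dest = rm -f dest; cp src dest; rm src.

    `force` stands in for a possible -f flag: clobbering a destination tag that
    still has concrete instances is refused unless it is set."""
    if src == dest:
        return
    if src not in store:
        raise KeyError(f"{src}: no such tag")
    in_use = len(store.get(dest, {}))
    if in_use and not force:
        raise ClobberError(f"{dest} has {in_use} instance(s); use -f to clobber it")
    store.pop(dest, None)           # rm -f dest: abstract tag and every instance
    store[dest] = dict(store[src])  # cp src dest
    del store[src]                  # rm src


store = {"alice/drunk": {"obj-1": True}, "alice/has-drunk": {"obj-2": None}}
try:
    mv_clobbering(store, "alice/drunk", "alice/has-drunk")
except ClobberError as err:
    print(err)  # alice/has-drunk has 1 instance(s); use -f to clobber it
mv_clobbering(store, "alice/drunk", "alice/has-drunk", force=True)
print(store)  # {'alice/has-drunk': {'obj-1': True}}
```

The check happens before anything is removed, so a refused move leaves the store untouched.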

Does that make sense for tags in Fish?

This, I think, is an interesting question. We could certainly make Fish remove the abstract destination tag and all its concrete tags before moving or copying another tag. But it also seems reasonable to consider the possibility of replacing those concrete tags present in the source, but not those absent in the source.

To make this real: suppose Alice says (in Fish)

$ cp drunk has-drunk

and at the time she does so, the state of her has-drunk and drunk tags is as follows:

$ fish show -q 'has alice/has-drunk' /about
2 objects matched
Object d440c5cf-9680-4748-b70e-56f07f35ca09:
  /fluiddb/about = "drink me (not poison)"
Object ec430756-e110-4bc4-b882-544afda1cce8:
  /fluiddb/about = "drink me"

$ fish show -q 'has alice/drunk' /about
2 objects matched
Object d440c5cf-9680-4748-b70e-56f07f35ca09:
  /fluiddb/about = "drink me (not poison)"
Object 49126b6d-18bd-457f-af55-a251cf400fc9:
  /fluiddb/about = "drink me not"

or, diagrammatically:

http://fluiddb.fluidinfo.com/about/abouttag/njr/image/at102-has-drunk.png

It seems clear that the value of the has-drunk tag on "drink me (not poison)" (no value) should be replaced with the value of the drunk tag (true), and that a new has-drunk tag should be placed on "drink me not", also with the value true. It is less clear, however, that the has-drunk tag on "drink me" needs to be deleted. We will be moving on to discuss selective copying and moving later anyway, but we have certainly formed a question:

Q2. If a tag is clobbered by a mv or cp command, should all of its instances be clobbered, or only those necessary to make way for the tag values from the source?
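The two answers to Q2 differ only in what happens to destination instances that the source does not cover. This sketch uses the object ids from the example above, with drunk as the source and has-drunk as the destination. The value of has-drunk on "drink me" is not stated, so None is assumed. clobber_all and clobber_selectively are hypothetical names.

```python
from typing import Dict

Instances = Dict[str, object]  # object id -> value; None stands for "no value"

POISON = "d440c5cf-9680-4748-b70e-56f07f35ca09"    # "drink me (not poison)"
DRINK_ME = "ec430756-e110-4bc4-b882-544afda1cce8"  # "drink me"
NOT = "49126b6d-18bd-457f-af55-a251cf400fc9"       # "drink me not"

drunk = {POISON: True, NOT: True}             # source
has_drunk = {POISON: None, DRINK_ME: None}    # destination (value on DRINK_ME assumed)


def clobber_all(src: Instances, dest: Instances) -> Instances:
    """Remove every instance of the destination tag, then copy the source's."""
    return dict(src)  # dest is discarded wholesale


def clobber_selectively(src: Instances, dest: Instances) -> Instances:
    """Replace only the instances the source needs; keep the rest of dest."""
    merged = dict(dest)
    merged.update(src)
    return merged


print(clobber_all(drunk, has_drunk))          # "drink me" loses its has-drunk tag
print(clobber_selectively(drunk, has_drunk))  # "drink me" keeps its has-drunk tag
```

Both strategies agree on "drink me (not poison)" and "drink me not". They differ only on "drink me", which is exactly the case Q2 asks about.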

Shared Paths

Where things get more interesting is when the source or destination is ambiguous, because it specifies both a tag and a namespace. This can’t occur in Unix, because each path resolves unambiguously to either a file or a directory. Let’s think through the cases with the aid of our diagram of Alice’s tag structure above, which I’ll repeat for easier reference (sod the cost!)

http://fluiddb.fluidinfo.com/about/abouttag/njr/image/at102-alicetags.png

What would Alice reasonably expect to happen if she issued the following commands?

mv private/moments private/thoughts
mv moments private
mv moments private/moments
mv private/moments /alice

There is ambiguity everywhere. Both the source and the destination can be ambiguous, so we have to consider all of:

  • unambiguous source, unambiguous destination
  • unambiguous source, ambiguous destination
  • ambiguous source, unambiguous destination
  • ambiguous source, ambiguous destination

Only the first of these is straightforward.

In the case of the existing ls and rm commands in Fish, I have taken the view that an ambiguous path refers to both the possible targets. While this seems unobjectionable in the case of ls, it clearly leads to the possibility of removing more than the user intended. I plan to reconsider that in the light of these ruminations on cp and mv.

It seems to me that probably a better way forward than taking an ambiguous specification as referring to both its targets is to demand disambiguation. The question is: how would that be achieved?

I have made a point, in the examples above, of listing subtly different alternative forms for some commands, e.g.

mv has-drunk private
mv has-drunk private/

Over recent years (particularly with the rise of tab completion in shells) it has become increasingly common to allow directories to be specified with a trailing slash, and no harm derives from this practice. The question is whether we can exploit this trend as a way of disambiguating between tags and namespaces.

In Unix shells and commands, in most cases, the inclusion or omission of the slash makes no difference to behaviour, though I am aware of at least one case where this is not so—rsync.

To quote from the man page for rsync (on Mac OS 10.6.8):

A trailing slash on the source changes this behavior to avoid creating an additional directory level at the destination. You can think of a trailing / on a source as meaning “copy the contents of this directory” as opposed to “copy the directory by name”, but in both cases the attributes of the containing directory are transferred to the containing directory on the destination. In other words, each of the following commands copies the files in the same way, including their setting of the attributes of /dest/foo:

rsync -av /src/foo /dest
rsync -av /src/foo/ /dest/foo

I personally find this behaviour bizarre, and it always makes me slightly nervous when using rsync, which is otherwise superior in every way to rcp. However, the idea of using a trailing slash to specify the namespace (cf. directory) rather than the tag is different. It would not change the behaviour of the command according to which of two unambiguous specifications of a directory (namespace) is used. Instead, it would use the slash to disambiguate, and this seems less objectionable.

The question then would be: how would we unambiguously specify the tag? It would seem very ill-advised to require that namespaces should always be specified with a trailing slash, so we cannot sensibly say that a path not ending in a slash will be taken to be a tag: that way madness lies.

My temptation is to use a trailing dot. I like this as a solution partly because I can recall no case, in nearly thirty years of Unix use, of ever meeting a file (other than the directories . and ..) whose name ended in a dot. I also feel that, while files do not always have extensions, and directory names may include them, by convention most filenames do contain a period and most directory names do not. Admittedly, this is not true of tags, but for me, at least, some association between dots-in-tag-names and “file-ness” survives through the analogy on which Fish is built. If we adopt this idea, our problems all but disappear. We can imagine:

$ mv private/moments private/thoughts
Error: private/moments is ambiguous; use private/moments. or private/moments/

$ mv private/moments. private/thoughts
# moves the tag private/moments to the tag private/thoughts/moments

$ mv private/moments/ private/thoughts
# moves the namespace private/moments to the namespace private/thoughts/moments

$ mv private/moments/ private/moments. private/thoughts
# moves private/moments. to private/thoughts/moments.
# and private/moments/ to private/thoughts/moments/
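A sketch of how the proposed convention could resolve an existing source path: a trailing "." selects the tag, a trailing "/" selects the namespace, and a bare path shared by both is rejected with the error shown above. The function resolve_source and its set-of-paths inputs are hypothetical. A real implementation would query Fluidinfo instead, and destinations that do not yet exist would need separate handling.

```python
from typing import Set, Tuple


def resolve_source(path: str, tags: Set[str], namespaces: Set[str]) -> Tuple[str, str]:
    """Resolve a path to ("tag", p) or ("namespace", p) under the proposed
    convention: trailing "." means tag, trailing "/" means namespace."""
    if path.endswith("/"):
        name = path.rstrip("/")
        if name not in namespaces:
            raise ValueError(f"{name}/: no such namespace")
        return ("namespace", name)
    if path.endswith("."):
        name = path[:-1]
        if name not in tags:
            raise ValueError(f"{name}.: no such tag")
        return ("tag", name)
    is_tag, is_ns = path in tags, path in namespaces
    if is_tag and is_ns:
        raise ValueError(f"{path} is ambiguous; use {path}. or {path}/")
    if is_tag:
        return ("tag", path)
    if is_ns:
        return ("namespace", path)
    raise ValueError(f"{path}: no such tag or namespace")


tags = {"private/moments", "has-drunk"}
namespaces = {"private", "private/moments", "private/thoughts"}
print(resolve_source("private/moments.", tags, namespaces))  # ('tag', 'private/moments')
try:
    resolve_source("private/moments", tags, namespaces)
except ValueError as err:
    print(f"Error: {err}")  # Error: private/moments is ambiguous; use private/moments. or private/moments/
```

Unambiguous paths need no decoration, so users only pay the cost of a trailing character in the rare shared-path case.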

If this were adopted in Fish, I think there would be an overwhelming case for making rm work the same way; the current behaviour of ls might stay the same, as it is not destructive, or might change in the interest of slavish consistency.

That concludes our discussion of cp and mv for abstract tags in Fish. In the third and perhaps final part of this “trilogy”, we will discuss moving and copying tags and namespaces in the context of the object hierarchy, i.e., how we might copy or move tags from one object to another, or within or among objects.

17 January 2012

Like ℵ₀ File Systems for Everyone

If the ideal blog post is but a screenful or two, this one fails rather badly. So badly, in fact, that I’ve decided to split it into three, each of which will itself be amply proportioned. This post discusses the analogy between the Unix File System and Fluidinfo, as a way of preparing the ground for the following posts. The second post will build on that to discuss options for syntax of potential cp and mv commands in Fish. The final post in the series will discuss upload and download, or maybe rcp, rsync and ftp; as well as possibly fsync, i.e., commands for copying parts of a local file system to or from Fluidinfo.

Everyone loves the Unix File System, even through the expletives we utter at those moments when our ardour is temporarily diminished.

The Fluidinfo Shell, Fish, is largely built on the idea of mapping Fluidinfo namespaces to Unix directories, tags to files, and tag values to the contents of files. Fish’s commands are mostly constructed with close and deliberate reference to corresponding Unix commands.

http://fluiddb.fluidinfo.com/about/abouttag/njr/image/at101-FSvsFluidinfo.png

Fluidinfo and Unix diverge in several small ways and one large one when viewed through the lens of this analogy.

  • In Unix, a path (such as /Users/njr/foo) resolves either to a file or to a directory, but not both; in Fluidinfo, a tag and a namespace may share a path. This difference causes a number of complications in mapping Unix commands to Fluidinfo.
  • In Unix, full paths begin with a slash (/) and the Unix kernel works only with such full paths. The shell maintains a notion of a current working directory and allows the user to refer to files relative to that. The Fluidinfo Shell, Fish, does something similar in the context of Fluidinfo, but effectively clamps the working directory to the user’s home namespace (njr, in my case).
  • To a first approximation, Unix provides only a single hierarchical file structure for each user: Fluidinfo provides an infinite number of them, one for every possible about tag text and one for each UUID. (There are 2¹²⁸, or some 340 undecillion ≈ 3.4 × 10³⁸, UUIDs.) It is this infinite number of potential about tags that motivates the title of this piece: ℵ₀ (“aleph zero”) is the smallest infinite cardinal, the number of counting numbers. Non-mathematicians may be unaware, and skeptical, of the idea that there can be infinities of different sizes, but thanks to the remarkable work of Georg Cantor, we can be quite confident that this is so. Infinite sets require some care, and have surprising properties: for example, ℵ₀ is not only the number of positive whole numbers, but also the number of whole numbers (positive and negative). Even the number of rational numbers (fractions) is the same. It is only when we add in the irrationals to form the real numbers, which are vastly more numerous than the rationals, that we get a larger cardinality, 2^ℵ₀ (which equals ℵ₁ only if the continuum hypothesis holds). Needless to say, while Fluidinfo does in principle offer ℵ₀ tag hierarchies to each user, just as real-world Turing machines are limited by the inability of Maxell to supply an infinite tape, the maximum possible size of Fluidinfo itself is limited by the physical properties of the universe, not to mention the finite capacity of even Amazon’s Simple Storage Service, S3.
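A quick check of the UUID arithmetic in Python. Like the text, it counts all 128-bit values. On the short scale an undecillion is 10³⁶, which is why 2¹²⁸ comes to about 340 of them.

```python
n = 2 ** 128          # number of distinct 128-bit values
print(n)              # 340282366920938463463374607431768211456
print(f"{n:.1e}")     # 3.4e+38
print(n // 10 ** 36)  # 340, i.e. about 340 undecillion (short scale)
```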

It is this last difference—a tag hierarchy for each Fluidinfo object—that is most significant. One way of thinking about it is this: on Unix (at least considering a single host, with a single file system, in isolation) there is a single directory hierarchy containing everything, as illustrated on the left in the diagram below.

In contrast, we can think of Fluidinfo as having a tag hierarchy, with (potentially) different tag values for the same tag, on each of a very large number of objects.

Some tags in the tag hierarchy are present on many objects, others on only one, or even none (since there is a notion of an abstract tag in Fluidinfo, separate from any concrete instances of it actually attached to objects).

The tag hierarchies on all the objects can be viewed as a single structure, only part of which is present on any object. I’ve attempted to show this with dark tags and directories for instantiated parts of the Fluidinfo hierarchy on each object, with parts instantiated elsewhere left pale in the diagram below. I’ve illustrated two whiskies (which use the same parts of the overall hierarchy) and one city, which uses mostly different tags.

http://fluiddb.fluidinfo.com/about/abouttag/njr/image/at101-OneFSvsMany.png

We might call this way of thinking about the Fluidinfo tag structure the many tag hierarchies view. Sometimes this seems like the natural way to think about Fluidinfo’s tag hierarchy, with the tag structure being in some sense subordinate to the objects. The objects might then be likened to different hosts in a computer network, and tags could be thought of as having a single value in each separate hierarchy, just as the contents of a single file are well-defined and unambiguous on a specific computer.

At other times, it seems more natural to think of the tag hierarchy as the primary entity, with tags having different values on different objects. I’ve tried to illustrate this below, with different collections of objects hanging off tags in a single hierarchy.

http://fluiddb.fluidinfo.com/about/abouttag/njr/image/at101-OneFSforFI.gif

We might call this the single tag hierarchy view.

Neither of these views is really more correct than the other: the underlying storage mechanism looks little like either, though it is perhaps slightly closer to the second view, in the sense that all the values for a given tag are stored together in the database.
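The two views are the same data indexed in opposite orders, which a couple of nested dictionaries make plain. In this sketch, the many-hierarchies view is object → tag → value and the single-hierarchy view is tag → object → value. Swapping the two levels converts one into the other. The data and the transpose function are hypothetical illustrations, not Fluidinfo's storage format.

```python
from collections import defaultdict
from typing import Dict

Nested = Dict[str, Dict[str, object]]


def transpose(nested: Nested) -> Nested:
    """Swap outer and inner keys: {a: {b: v}} -> {b: {a: v}}.

    Applied to the many-hierarchies view (object -> tag -> value) it yields the
    single-hierarchy view (tag -> object -> value), and vice versa."""
    out: Dict[str, Dict[str, object]] = defaultdict(dict)
    for outer, inner in nested.items():
        for key, value in inner.items():
            out[key][outer] = value
    return dict(out)


# Hypothetical data: two whiskies sharing a tag, and a city using a different one.
by_object = {
    "Ardbeg": {"alice/whisky/rating": 9},
    "Talisker": {"alice/whisky/rating": 8},
    "Paris": {"alice/city/visited": True},
}
by_tag = transpose(by_object)
print(by_tag["alice/whisky/rating"])  # {'Ardbeg': 9, 'Talisker': 8}
```

An object with no tags at all would vanish from the tag-first view, which is one small way in which the two views are not perfectly symmetrical.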

Fluidinfo’s addressing mechanism, however, is very like the “many tag hierarchies” view. After all, the address to access Alice’s whisky/rating tag on Ardbeg is

http://fluiddb.fluidinfo.com/about/Ardbeg/alice/whisky/rating

Ignoring the base URL (http://fluiddb.fluidinfo.com) we then have something remarkably familiar from Unix, and comparable either to a host or a volume (/about/Ardbeg), followed (after a separating slash) by the tag path (/alice/whisky/rating). Notice how similar this is, for example, to the way we address files across systems using Unix’s remote copy, rcp. If Alice wanted to copy a file from the host ardbeg on her local area network, she might say

rcp ardbeg:whisky/rating ./whisky/ardbeg-rating

or:

rcp ardbeg:/Users/alice/whisky/rating ./whisky/ardbeg-rating

or even, if she were on host talisker, and wished to be more explicit

rcp ardbeg:/Users/alice/whisky/rating talisker:/Users/alice/whisky/ardbeg-rating

Similarly, if she has multiple file systems mounted on her machine, Alice might say something like:

cp /Volumes/whiskydata/whisky/ardbeg/rating /Users/alice/whisky/ardbeg-rating

where now /Volumes/whiskydata is a disk (or file system).

We could also imagine an alternative (hypothetical, since rcp does not actually accept it) way of specifying the previous rcp command:

rcp /Users/alice/whisky/rating@ardbeg ./whisky/ardbeg-rating

This would be entirely equivalent, but looks slightly more like the “single tag hierarchy” view, at least in the sense that the disambiguating host comes after, rather than before, the tag path.
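The host-first and host-last spellings carry the same two pieces of information, and so does a Fluidinfo about-tag URL. This sketch builds the URL in the form shown above and splits both rcp-style spellings into (host, path) pairs. Percent-encoding the about value with urllib.parse.quote is an assumption for about values that are not URL-safe. The function names are hypothetical.

```python
from urllib.parse import quote

BASE = "http://fluiddb.fluidinfo.com"


def tag_url(about: str, tag_path: str) -> str:
    """URL for tag_path on the object with the given about tag: the about value
    plays the host/volume role, the tag path the file-path role."""
    return f"{BASE}/about/{quote(about, safe='')}/{tag_path.strip('/')}"


def split_host_first(spec: str):
    """'ardbeg:/Users/alice/whisky/rating' -> ('ardbeg', '/Users/alice/whisky/rating')"""
    host, _, path = spec.partition(":")
    return host, path


def split_host_last(spec: str):
    """'/Users/alice/whisky/rating@ardbeg' -> ('ardbeg', '/Users/alice/whisky/rating')"""
    path, _, host = spec.rpartition("@")
    return host, path


print(tag_url("Ardbeg", "alice/whisky/rating"))
# http://fluiddb.fluidinfo.com/about/Ardbeg/alice/whisky/rating
print(split_host_first("ardbeg:/Users/alice/whisky/rating")
      == split_host_last("/Users/alice/whisky/rating@ardbeg"))  # True
```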

However we lay things out, it feels fairly clear that there is a good case for extending our analogy to include either volumes or, more likely, hosts as the file-system analogues of Fluidinfo’s objects.

http://fluiddb.fluidinfo.com/about/abouttag/njr/image/at101-ExtendedFSvsFI.png

That concludes the introduction to this series. In the next part, I’ll discuss how these alternative views might help us to decide the best way to implement mv and cp commands in Fish.

15 January 2012

Updates to the Web Apps Shell-Fish and About Tag

I just pushed new versions of Shell-Fish (the online version of Fish) and the About Tag app.

There are some quite significant changes, which I will summarize briefly.

  • Shell-Fish has been updated to the latest version, 4.26. It had sat at 4.00.0 for a ridiculously long time, a situation that became self-sustaining because non-trivial changes were needed to the web version to make it use the new one.

  • The About Tag app now includes a clone of Shell-Fish. Click the bottom tag on the logo to use this.

  • The search functionality in About Tag has been overhauled and remodelled on the much better version used at yet another of my sites, Art of Tagging, which is hosted on the intriguing PythonAnywhere platform. Changes include:

    • When you click a link, the diagram now appears inline
    • If you re-click, it refreshes
    • If you’re signed in, you can tag items and the diagram updates.
  • The changes are quite major, and I haven’t tested them as thoroughly as I would wish, so I’ve probably broken something. Sorry. Let me know and I’ll try to fix it.

  • The three sites all use the same model of requiring your Fluidinfo credentials; unfortunately they don’t share databases, so if you want to use all three, you need to enter them three times. This is crazy; all three sites should merge, but that will take time.

  • Although the search functionality on the About Tag site mostly works well, Google’s hard 10-second time limit on queries means that queries that match a lot of objects tend to time out. The Art of Tagging version doesn’t suffer from this limit, but unfortunately that site is quite often down for maintenance.

  • There are many small changes to Shell-Fish but two really major bits of new functionality:

    • Aliases and syncing now work. If you create aliases on a local version, type sync in Shell-Fish and they will appear there, and vice-versa. Among other things, this makes it easy to use sequences and access them from both sites.

    • You can drop the -i and -a in almost all cases. So (in the most common case), rather than:

      fish> tag -a "artist:melody gardot" rating=10

      you can simply use:

      fish> tag "artist:melody gardot" rating=10

      though the old form will, of course, continue to work, along with -q and the new -@ (for anonymous objects).

  • My process of converting everything from XHTML to HTML5 continues. Shell-Fish and the About Tag web app are now pure HTML5, like this blog. Previously, they were all XHTML, which worked for everything under the Sun except Microsoft browsers; at least now Internet Explorer 9 should be able to use some of the functionality. Users of Internet Explorer 1–8, and therefore of Windows XP, remain out of luck (in more ways than I can possibly enumerate, most of which are rather more significant than not being able to use these web sites).

12 January 2012

Dial @ for Anonymous

I have added a new flag to Fish in version 4.25, which I have just pushed to the Fish repository on Github. The feature allows you to ask Fluidinfo to usher into existence a brand new, tagless object with no about tag, and is accessed using the new -@ flag as follows:

$ fish tag -@ private/note="Dial @ for Anonymous"

Tagged object d679b99d-fe5e-43e9-88dc-47334df776c7 with private/note = "Dial @ for Anonymous"

Although Fish usually exhibits an almost monastic silence when it successfully completes a non-reporting task you have set it, in this case, there seems quite a high likelihood that you will want to know the ID of the object it has created, so it tells you.

The new flag also works with the tag command’s recently added -f flag to accept input from a file or from standard input. In particular, this provides a convenient way of attaching a multi-line note to an anonymous object:

$ fish tag -@f private/note
Everyone needs a little privacy sometimes.
And you don't get much more private than a private tag
on an anonymous object.
^D

Tagged object afcf1f23-7c4b-44ea-a251-1bc99e959436 with private/note = "Everyone needs a little privacy sometimes.
And you don't get much more private than a private tag
on an anonymous object.
"

It will not have escaped the notice of the attentive reader of this About Tag blog that its author has spent perhaps five years proselytizing on behalf of Fluidinfo’s about tag (fluiddb/about) as the one true way to identify an object in Fluidinfo, benefiting as it does from the system-guaranteed properties of uniqueness and persistence. Nothing has changed: I still believe that data that is to have any social aspect, i.e., which might ever be shared with someone else, or associated with someone else’s data, is almost always better placed on an object with an about tag.

For personal data, however, particularly when private, I think the situation reverses. If the data I am writing is not intended to be in any way social, it may be much better to put it on an anonymous object. Doing so also provides a convenient way of guaranteeing that an object has no other tags, at least at creation time.

I plan shortly to revamp the sequence command to allow it to use anonymous objects, which probably fit better with some of its use cases.

09 January 2012

From Sets to Lists

The Fluidinfo API has just been updated from version 1.13 to version 1.14. There is a single change to the API: where previously it supported sets of strings as a primitive type for tag values, it now instead supports lists of strings. The differences are:

  • in lists, the order of elements is significant, whereas in sets it is not
  • in lists, repetition is allowed, whereas in sets an element is either present or absent.

In mathematics, sets are conventionally denoted using braces {...} whereas lists (sequences) are more often denoted using either square brackets [...] or parentheses (...), and this is also true in some more mathematical programming languages, such as Python.
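The same distinction, shown with Python's own set and list types, using the airport values from the Fish example. The set is printed sorted only because sets have no inherent order to display.

```python
airports = ["Orly", "Charles de Gaulle", "Orly"]

as_set = set(airports)    # old (API 1.13) behaviour: unordered, no duplicates
as_list = list(airports)  # new (API 1.14) behaviour: ordered, duplicates kept

print(sorted(as_set))  # ['Charles de Gaulle', 'Orly']
print(as_list)         # ['Orly', 'Charles de Gaulle', 'Orly']
```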

The following pair of interactions with Fish illustrates the difference. Here is the old behaviour, with a slightly older version of Fish:

$ fish --version
fish 4.20

$ fish show fluidinfo /fluiddb/version /fluiddb/release-date
Object with about="fluidinfo":
  /fluiddb/version = "1.13"
  /fluiddb/release-date = "2011-12-02T02:28:17Z"

$ fish tag Paris airports='{"Orly", "Charles de Gaulle", "Orly"}'

$ fish show Paris airports
Object with about="Paris":
  /njr/airports = {
    "Charles de Gaulle",
    "Orly"
  }

Notice how the set specified as {"Orly", "Charles de Gaulle", "Orly"} has been deduplicated and reordered, and that Fish uses braces to denote the value.

Here is the new behaviour, with a new version of Fish that I’ve just pushed to Github:

$ fish --version
fish 4.24

$ fish show fluidinfo /fluiddb/version /fluiddb/release-date
Object with about="fluidinfo":
  /fluiddb/version = "1.14"
  /fluiddb/release-date = "2012-01-10T00:34:00Z"

$ fish tag Paris airports='["Orly", "Charles de Gaulle", "Orly"]'

$ fish show Paris airports
Object with about="Paris":
  /njr/airports = [
    "Orly",
    "Charles de Gaulle",
    "Orly"
  ]

In this case, order has been preserved, and duplicates in the list have been retained.

For the moment, with an eye to backwards compatibility, Fish allows you to use either braces or square brackets when specifying a compound tag value, but always reports the result using square brackets.

For people interfacing with Fluidinfo directly through the API, there is no syntactic change since compound values were already sent to and returned from the API using JSON lists. The difference is behavioural, with order and duplicates now being preserved in lists sent to Fluidinfo where previously they were not.

Existing compound values will be returned in an unspecified order.
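Here is a sketch of input handling that accepts either braces or square brackets, as described above, while always producing a list that is sent to the API as a JSON list. This is not Fish's actual parser. parse_compound is a hypothetical name, and the requirement that every element be a string reflects the API's lists of strings.

```python
import json


def parse_compound(text: str) -> list:
    """Parse a compound value written with either [...] or legacy {...} syntax.
    Braces are accepted for backwards compatibility but yield an ordered list."""
    text = text.strip()
    if text.startswith("{") and text.endswith("}"):
        text = "[" + text[1:-1] + "]"  # reinterpret legacy set syntax as a list
    value = json.loads(text)
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        raise ValueError("compound values must be lists of strings")
    return value


airports = parse_compound('{"Orly", "Charles de Gaulle", "Orly"}')
print(json.dumps(airports))  # ["Orly", "Charles de Gaulle", "Orly"]
```

Only the outermost braces are rewritten, so braces inside the strings themselves are left alone.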

Fish 4.23

I’ve just pushed a new version, 4.23, of Fish to Github.

It contains three changes:

  • It now shows non-primitive tag values when they are textual, where before it simply showed their MIME type and size.
  • It allows standard input (stdin) to be used to specify long string values for tags, which are then written, by default, with MIME type text/plain.
  • Where previously it used {...} to denote tag values that are sets of strings (both for input and output), it now uses [...] instead, reflecting a forthcoming change to the API. The change is that Fluidinfo’s primitive compound values will transmogrify from unordered sets of strings to ordered lists of strings. I will blog separately about that when the release occurs. (The change to Fish is backwards compatible, and braces will still be allowed on input.)

I’ll explain a little more about the first two changes.

Non-primitive text types

In the early days of Fluidinfo, almost all tag values were primitive, i.e. they were either numbers, strings, booleans, sets of strings or valueless (NULL, if you prefer). However, the API has always supported tag values with arbitrary MIME types.

Fish largely ignored the MIME type in the early days, and since almost all values were primitive, this was not a great problem. However, as Fluidinfo has developed, ever more non-primitive tag values are used, and the result has been that using Fish’s show, tags and get commands has caused binary data to be dumped to terminals with at best meaningless, and at worst destructive consequences.

As a result, I previously changed Fish to show only information about the size and MIME type for non-primitive tag values. For example:

$ fish show abouttag image/red-spinner.gif
Object with about="abouttag":
  /njr/image/red-spinner.gif = <Non-primitive value of type image/gif (size 3208)>

While this is good for MIME types such as GIF, it is less useful for textual MIME types. For example, instead of using primitive strings for Fluidinfo’s record of its API version and release date, the team decided to use MIME type text/plain so that these show up nicely in browsers, as you can verify by visiting:

But with the old Fish behaviour, this meant that the result was the following, less than helpful, output:

$ fish show fluidinfo /fluiddb/version /fluiddb/release-date
Object with about="fluidinfo":
  /fluiddb/version = <Non-primitive value of type text/plain (size 4)>
  /fluiddb/release-date = <Non-primitive value of type text/plain (size 20)>

So with the change in 4.23, Fish now provides more helpful output whenever it recognizes a MIME type as textual:

$ fish show fluidinfo /fluiddb/version /fluiddb/release-date
Object with about="fluidinfo":
  /fluiddb/version = "1.13"
  /fluiddb/release-date = "2011-12-02T02:28:17Z"
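Here is a sketch of the display rule: show the value as a string when the MIME type looks textual, and fall back to the size-and-type summary otherwise. Which types count as textual is an assumption here: anything under text/ plus a few application types. Decoding as UTF-8 is also an assumption. is_textual and format_value are hypothetical names, not Fish's code.

```python
# Assumption: these application/* types are treated as textual too.
TEXTUAL_APPLICATION_TYPES = {"application/json", "application/xml", "application/javascript"}


def is_textual(mime_type: str) -> bool:
    """True for text/* and a few known textual application types."""
    base = mime_type.split(";")[0].strip().lower()  # drop parameters such as charset
    return base.startswith("text/") or base in TEXTUAL_APPLICATION_TYPES


def format_value(data: bytes, mime_type: str) -> str:
    """Render a non-primitive tag value the way the output above shows it."""
    if is_textual(mime_type):
        try:
            return '"%s"' % data.decode("utf-8")
        except UnicodeDecodeError:
            pass  # not valid UTF-8 after all: fall back to the summary
    return f"<Non-primitive value of type {mime_type} (size {len(data)})>"


print(format_value(b"1.13", "text/plain"))      # "1.13"
print(format_value(b"GIF89a" * 3, "image/gif"))  # <Non-primitive value of type image/gif (size 18)>
```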

Setting Tag Values from Standard Input

I added the -f flag to Fish’s tag command the other day to allow the value of a tag to be set from a file. This left the slightly anomalous situation that when -f was used, any value specified would be taken as a filename, but any valueless tags specified were simply set as old-style valueless tags.

Since that useful change, I have been reflecting on two further anomalies. First, in Unix, it is usually possible to use standard input instead of a file, which is helpful both for pipelining data and for typing multi-line input; Fish provided no mechanism for this. Second, there was no easy way in Fish of directly specifying a multi-line string value.

I realised that all three of these anomalies can be pleasingly remedied by changing things so that when a valueless tag is given to Fish’s tag command, in combination with the -f flag, Fish will read stdin to get a value for the tag. (If more than one tag is specified this way, the same value from stdin is used for all of them.) By default, the MIME type will be set to text/plain, though the -M flag can be used to specify something different.

So that’s what I’ve done. Here is an example:

$ fish tag -f foo foo
This is line one.
This is line two.
...and that's your lot!
^D

$ fish show foo foo
Object with about="foo":
  /njr/foo = "This is line one.
This is line two.
...and that's your lot!
"

For those less familiar with the Unix command line, here I typed fish tag -f foo foo at the command prompt ($). Fish then read what I typed until I terminated the input with the end-of-file character (CTRL+D on Unix, including Mac OS X; CTRL+Z on Windows). The input was then used as the tag value.

Here is an example using a pipe:

$ cat > foo.txt
One
Two
Buckle my shoe
^D

$ cat foo.txt | fish tag -f foo foo

$ fish show foo foo
Object with about="foo":
  /njr/foo = "One
Two
Buckle my shoe
"
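A sketch of the rules just described for the tag command's arguments. With -f, a tag=value argument names a file to read, and a bare tag takes its value from stdin, which is read once and reused for every bare tag. Without -f, a bare tag stays an old-style valueless tag. collect_values is a hypothetical name. Values are kept as raw strings, and MIME types (text/plain by default, or the -M flag) are not modelled.

```python
import io
import sys
from typing import Dict, List, Optional, TextIO


def collect_values(specs: List[str], from_file: bool,
                   stdin: Optional[TextIO] = None) -> Dict[str, Optional[str]]:
    """Map each tag argument to its value under the -f rules described above."""
    stream = stdin if stdin is not None else sys.stdin
    stdin_text: Optional[str] = None
    values: Dict[str, Optional[str]] = {}
    for spec in specs:
        tag, sep, value = spec.partition("=")
        if sep and from_file:          # -f with tag=name: read the file `name`
            with open(value, encoding="utf-8") as f:
                values[tag] = f.read()
        elif sep:                      # plain tag=value (not parsed here)
            values[tag] = value
        elif from_file:                # -f with a bare tag: read stdin once
            if stdin_text is None:
                stdin_text = stream.read()
            values[tag] = stdin_text
        else:                          # no -f: old-style valueless tag
            values[tag] = None
    return values


print(collect_values(["foo"], True, io.StringIO("One\nTwo\nBuckle my shoe\n")))
# {'foo': 'One\nTwo\nBuckle my shoe\n'}
```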
