In the first of this series of articles we discussed the analogy between the Unix File System and Fluidinfo. The Fluidinfo Shell, Fish, is largely a working through of that analogy. A new insight from that article, upon which we will now expand, is that the objects in Fluidinfo can usefully be viewed as analogues of host computers in a computer network.
In the second article, we discussed how to augment Fish with Unix-like copy (cp) and move (mv) commands for Fluidinfo’s abstract tags and namespaces.
In this third, and—with luck—final part of the trilogy, we will tackle the more complex question of how to copy and move data within and among objects and concrete tags in Fluidinfo. Just as we leant on the behaviour of Unix’s cp and mv commands when considering similar commands for abstract tags in Fluidinfo, we will look to rcp, rsync and ftp as sources of wisdom, guidance and precedent as we try to design similar commands for concrete tags in Fluidinfo.
Recapitulation
In the previous article in this series, we concluded that we could sensibly model cp and mv commands for namespaces and abstract tags in Fluidinfo on their Unix counterparts provided we resolve three issues:
- How should we handle ambiguity, given that a tag and a namespace can share the same name (and path) in Fluidinfo?
- How should we handle permissions on copied and moved tags, given the rather different way permissions work in Fluidinfo and Unix?
- How destructive should overwriting behaviour be? If we copy a (whole, abstract) tag “on top” of another, should all old values of that tag be destroyed, or should the result be a union in which the new values take precedence, but the old values remain on objects that were not tagged with the “source” tag?
Our candidate solutions are (respectively):
- Where paths are ambiguous for either source or destination items, we will demand disambiguation by adding a trailing slash (/) to denote a namespace or a trailing dot (.) to denote a tag.
- We propose a reverse Facebook principle, whereby whenever the degree of access to be granted to data is unclear, we should grant the smallest set of rights that the user might reasonably expect, the logic being that while overly restrictive permissions can easily be relaxed, unintentional release of private information is potentially irreversible and more harmful.
- We didn’t really resolve overwrite behaviour beyond a mealy-mouthed suggestion that we might need to provide options. While we should do that, this doesn’t answer the question, because unless we force the user to make an explicit choice on a case-by-case basis, we will have to decide on the default behaviour, and that will be the main behaviour people will experience (unless we make such a poor choice that it is normally overridden). I am currently leaning toward the less destructive behaviour as the default.
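As a minimal sketch of the first and third of these candidate solutions, assuming nothing about how Fish is actually implemented: the function names `classify` and `overwrite` are hypothetical, and a tag's values are modelled as a plain dictionary from object identifier to value.

```python
def classify(path):
    """Return (kind, clean_path) for a possibly disambiguated path.

    A trailing '/' marks a namespace and a trailing '.' marks a tag;
    anything else is left for the caller to resolve.
    """
    if path.endswith("/"):
        return "namespace", path[:-1]
    if path.endswith("."):
        return "tag", path[:-1]
    return "unspecified", path


def overwrite(dest_values, src_values, destructive=False):
    """Copy one tag's values over another's.

    Both arguments map object identifier -> value. The default is the
    less destructive union, in which source values take precedence but
    destination values survive on objects the source tag is not on.
    """
    if destructive:
        return dict(src_values)
    merged = dict(dest_values)
    merged.update(src_values)
    return merged
```

With `old = {"paris": 3, "rome": 4}` and `new = {"paris": 5}`, the default call keeps the value on `rome`, whereas `destructive=True` discards it.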
The Jobs to be Done
Thus far, we have considered moving and copying whole tags (abstract tags together with all their concrete instances). But at least as important, and probably more common in practice, will be the need to move or copy just some concrete tags. Concentrating first on the movement and copying within Fluidinfo (as opposed to transfers between Fluidinfo and the local file system), some examples of things Alice might wish to do include:
Copy one or more tag(s) from one object to another. For example:
- Copy Alice’s description tag from paris to city:paris.
- Move Alice’s description tag from paris to city:paris.
- Move Alice’s rating tag from paris to city:paris while moving it into her private namespace (on the destination object).
Rename or duplicate tags on an object or a set of objects. For example:
- Rename Alice’s rating tag on city:paris as star-rating
- Copy Alice’s rating tag on city:paris to star-rating
- Move Alice’s rating tag on city:paris to private/rating
- Perform any of the above operations on all objects Alice has tagged with alice/personal.
Systematically move tags from objects using one about tag convention to those using another. For example:
- Move all Alice’s rating tags on objects she has tagged with alice/city to corresponding objects with the same about tag, but preceded by city: (e.g. from paris to city:paris).
Alice might also be interested in exchanging information between her local file system and Fluidinfo. (In this case, we would probably support only copying, not movement, in much the same way as there is no rmv command, and indeed even drag and drop usually copies rather than moves when source and destination are different volumes or hosts.) Examples of upload and download to and from Fluidinfo might include:
Copy a local set of files to an object.
- Copy the files and directories in Alice’s ~alice/blog directory as corresponding tags and namespaces on the Alice's Blog object in Fluidinfo.
- The same operation, but now taking the files from ~alice/blogs/drinkingblog/html and placing them in a new alice/blog namespace on the same nominated object.
Conversely, Alice might wish to download a blog, stored as a collection of tags and namespaces in Fluidinfo as a set of local files—the precise inverse of the cases above.
Download tags from a number of objects in Fluidinfo to different parts of the local file system.
- Alice might have a description tag on objects corresponding to each element of the periodic table and wish to download them either to different local directories or different files within a directory. For example, she might want the description from the element element:mercury to go to elements/mercury/description.txt or to elements/mercury-description on the local file system.
Again, conversely, Alice might wish to upload a directory structure, taking one or more of the upper levels of the directory structure either as object specifiers or tags in some systematic way.
It is unlikely that the first version of any rcp analogue in Fish will support all of these fully, but it is useful to think about what our aspirations might be.
Still another case might be to map between the contents of a local file (e.g. a CSV file) and a set of objects in Fluidinfo. However, I currently regard that as a separate kind of task, and not really suitable for analogues of cp and mv.
Lessons from rcp, rsync, scp etc.
I have argued that Fluidinfo objects (which represent things) can usefully be viewed as analogues of hosts (computers) with Unix File systems. That immediately suggests that we might usefully look to rcp and its variants (rsync, scp and possibly ftp) for inspiration.
The basic addressing scheme in rcp simply identifies a remote file as
host:path/to/file.ext
For example, to copy a file drink.me from a host wonderland, Alice can say:
rcp wonderland:drink.me .
In fact, in normal use, rcp can replace cp, so long as your file names don’t contain colons. In the case of Fluidinfo, we won’t introduce a different command at this stage but will simply extend cp (though later we will propose adding an rcp for different purposes).
Remembering that we are mapping hosts to Fluidinfo objects, which are usually identified by their about tags, this suggests we might naïvely consider a directly similar rcp-like cp command for Fluidinfo. The first use case we suggested for Alice was copying her description from the object for paris to the object for city:paris. So that might be:
cp paris:description city:paris:description
Unfortunately, in this particular case the “solution” doesn’t work well because the target about tag includes a colon. Obviously, in principle, we could escape the colon in some way. If we did the ‘obvious’ thing of escaping with a backslash, and were typing into an interactive Fish shell (rather than from the command line), that would mean we would need to type:
cp paris:description city\:paris:description
which is already tedious. However, if we were working from a Unix shell, which itself uses backslash for escaping, we would need to use either
cp paris:description city\\:paris:description
or perhaps
cp paris:description 'city\:paris':description
This is OK in principle, but would get very tiresome (and be rather error-prone) in practice, particularly since colons are widely used in about tags.
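To make the cost of this scheme concrete, here is a rough sketch of the parsing it would require. The function name `split_object_and_tag` is hypothetical, and the rule that a backslash escapes the next character is simply the ‘obvious’ convention discussed above, not anything Fish implements.

```python
def split_object_and_tag(spec):
    """Split an rcp-style 'about:tag' spec at the first unescaped colon.

    A backslash escapes the character that follows it, so an about tag
    such as city:paris has to be written with its colon escaped.
    """
    about = []
    i = 0
    while i < len(spec):
        ch = spec[i]
        if ch == "\\" and i + 1 < len(spec):
            # Escaped character: keep it literally and skip the backslash.
            about.append(spec[i + 1])
            i += 2
        elif ch == ":":
            # First unescaped colon: everything after it is the tag path.
            return "".join(about), spec[i + 1:]
        else:
            about.append(ch)
            i += 1
    raise ValueError("no unescaped colon in %r" % spec)
```

Here `paris:description` splits into `paris` and `description` directly, but `city:paris` on the object side only survives if the user remembers the backslash, which is exactly the error-prone part.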
Another option we might consider is to use Fluidinfo’s full path convention. Recall that the value for alice/description on paris is available at
http://fluiddb.fluidinfo.com/about/paris/alice/description
So it might seem promising to base things on that. Our cp command would then look something like:
cp /about/paris/alice/description /about/city:paris/alice/description
Unfortunately, that doesn’t look too promising either. First, Fish already uses a leading slash to introduce absolute paths to other users’ tags (allowing Alice to omit alice/ from her own tags, a significant convenience that it would be awkward to retain using this scheme).
Perhaps worse, however, is that one of the most common kinds of about tags is a URL. That would cause real pain getting an about tag through the shell with the slashes protected as part of a path such as above. Imagine having to say:
cp /about/http:\\/\\/example.com\\/foo\\/bar/description /about/http:\\/\\/example.com\\/foo\\/bas
This is not really a problem for the API, where the about tag typically gets passed into some function separately and escaped automatically by the machine, but it would be a significant inconvenience for a Fish user.
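For comparison, this is roughly what “escaped automatically by the machine” looks like in code, using Python’s standard `urllib.parse.quote`. The helper name `about_path` is hypothetical, and I am assuming ordinary percent-encoding of the about tag; the exact encoding the Fluidinfo API expects is not something this sketch establishes.

```python
from urllib.parse import quote


def about_path(about, tag):
    """Build an /about/... path with the about tag percent-encoded."""
    # safe="" makes quote() escape '/' as well, which it leaves alone by default.
    return "/about/%s/%s" % (quote(about, safe=""), tag)


print(about_path("http://example.com/foo/bar", "alice/description"))
# prints /about/http%3A%2F%2Fexample.com%2Ffoo%2Fbar/alice/description
```

The program never has to think about the slashes; a person typing at a shell prompt does.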
An alternative that might work better is to swap the host and the path. A colon still wouldn’t make a good separator since colon is a legal character in a tag name, and it would be confusing to reverse rcp’s convention anyway. We could, however, use @, which would perhaps seem more natural in that order. So Alice might say:
cp description@paris description@city:paris
I think I remember using that as an alternative form with some version of rcp, but I can’t see it in the documentation for any version I’m currently using, so I may be mistaken. Nevertheless, the form does feel reasonably natural from both @’s use in email addresses and its more general use to mean “at”. Although the @ is used in reasonably common about tags (particularly Twitter names), it doesn’t really cause a problem, since it is not a valid character in a tag name. It would be fine to say:
cp chess@@RedQueen chess@@WhiteQueen
since the second @, in each case, is clearly part of the object identifier. This feels like the first contender.
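Because @ cannot appear in a tag name, the split is unambiguous: everything before the first @ is the tag and everything after it is the about tag, colons, slashes and further @ signs included. A minimal sketch, with the hypothetical name `split_tag_and_object`:

```python
def split_tag_and_object(spec):
    """Split 'tag@about' at the first '@'.

    Returns (tag, about), or (tag, None) when no object is given.
    """
    tag, sep, about = spec.partition("@")
    if not sep:
        return spec, None
    return tag, about
```

So `description@city:paris` needs no escaping at all, and `chess@@RedQueen` yields the tag `chess` on the object `@RedQueen`.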
Another possibility we might mention is specifying objects using -a, -i or -q, as is possible with other Fish commands such as tag. The issue with that is that it only works well if all the tags are on the same object. We could have either repeated flags or differently named flags for source and destination, but if we want to allow Unix-like
mv src1 src2 ... srcN dest
that will be less good. But it might be useful for some cases, particularly when the source and destination objects are the same.
Let’s try some other examples from our shopping list, keeping in mind the possibility of using -a, -q etc. as well as the @ convention:
Move Alice’s description tag from paris to city:paris.
mv description@paris description@city:paris
or potentially:
mv description@paris @city:paris
Move Alice’s rating tag from paris to city:paris while moving it into her private namespace (on the destination object).
mv rating@paris private/rating@city:paris
Rename Alice’s rating tag on city:paris as star-rating
mv rating@city:paris star-rating@city:paris
mv -a city:paris rating star-rating
mv -q 'fluiddb/about matches "city:paris"' rating star-rating
We could also allow a trailing @ to mean ‘the same as last time’ on the second or subsequent use, so that
mv rating@city:paris star-rating@
might be acceptable, or even just
mv rating@city:paris star-rating
Copy Alice’s rating tag on city:paris to star-rating
cp rating@city:paris star-rating@city:paris
cp rating@city:paris star-rating@
cp rating@city:paris star-rating
Move Alice’s rating tag on city:paris to private/rating
mv rating@city:paris private/rating@city:paris
mv rating@city:paris private/rating@
mv rating@city:paris private/rating
mv -a 'city:paris' rating private/rating
Perform any of the above operations on all objects Alice has tagged with alice/personal. For example, let’s take the moving of rating to private/star-rating
mv -q 'has alice/personal' rating private/star-rating
We could also do the same thing with namespaces, though I didn’t list that as an example. But it seems fairly clear that
cp -q 'has alice/personal' private secret
or perhaps
cp -q 'has alice/personal' -r private secret
would mean copy all the tags under private to a new secret namespace (if it doesn’t exist) or to under secret if it does. [The -r would mean recurse, as with cp, but whether it should be required I’m not sure.]
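The shorthand forms used in several of the examples above (a trailing @, no @ at all on the destination, or a bare @object that keeps the tag name) could be resolved along the following lines. This is only a sketch of one possible rule: the name `resolve_objects` is hypothetical, and it deals only with the @ syntax, not the -a, -i and -q flags.

```python
def resolve_objects(src_spec, dest_spec):
    """Return ((src_tag, src_about), (dest_tag, dest_about)).

    A destination with no object ('star-rating' or 'star-rating@')
    inherits the source object; a destination with no tag
    ('@city:paris') inherits the source tag name.
    """
    src_tag, sep, src_about = src_spec.partition("@")
    if not sep:
        raise ValueError("source needs an object, e.g. rating@city:paris")
    dest_tag, _, dest_about = dest_spec.partition("@")
    if not dest_about:
        dest_about = src_about   # 'the same object as before'
    if not dest_tag:
        dest_tag = src_tag       # keep the tag name on the new object
    return (src_tag, src_about), (dest_tag, dest_about)
```

Under this rule `rating@city:paris star-rating@city:paris`, `rating@city:paris star-rating@` and `rating@city:paris star-rating` all resolve to the same pair.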
Move all Alice’s rating tags on objects she has tagged with city to corresponding objects with the same about tag, but preceded by city: (e.g. from paris to city:paris).
This one is completely different and is probably something we need to admit defeat on for now. Though it is not the same, it reminds me of a Unix command I’ve always thought was missing, which is a pattern-based bulk rename (and bulk copy). I actually have a command of my own that allows systematic renaming of files, used like this:
bmv foo@.txt bar@.txt
in which the @ is a wildcard that takes the same value on the right as on the left (like a tagged regular expression). So if a directory had food.txt and fool.txt, this would rename food.txt as bard.txt, and fool.txt as barl.txt. Systematic construction of about tags could follow a similar pattern but, useful though it would be, is probably out of scope here.
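The matching rule described here can be sketched in a few lines of Python. This is a reconstruction from the description only, not the code behind bmv: the function name `bulk_rename_map` is hypothetical, it supports a single @ per pattern, and it only computes the old-to-new mapping rather than touching any files.

```python
import re


def bulk_rename_map(src_pattern, dest_pattern, names):
    """Map each name matching src_pattern to its new name.

    '@' is a wildcard in src_pattern; whatever it matched is substituted
    for the '@' in dest_pattern.
    """
    head, _, tail = src_pattern.partition("@")
    regex = re.compile(re.escape(head) + "(.*)" + re.escape(tail))
    renames = {}
    for name in names:
        match = regex.fullmatch(name)
        if match:
            renames[name] = dest_pattern.replace("@", match.group(1), 1)
    return renames


print(bulk_rename_map("foo@.txt", "bar@.txt", ["food.txt", "fool.txt", "other.txt"]))
# prints {'food.txt': 'bard.txt', 'fool.txt': 'barl.txt'}
```

The same mapping applied to about tags, rather than file names, is what the paris to city:paris case would need.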
I can, however, imagine allowing a more manual version. For example, we might allow braces and a syntax like:
cp rating@{foo,bar,baz} star-rating@{Foo,Bar,Baz}
to copy and rename tags from things like rating on foo to star-rating on Foo etc.
Although perhaps not pretty, this @ syntax, combined with -a, -i and -q flags (for when the same object is to be used on both sides of the cp or mv) seems to get us a long way. Adding in paired lists in the {a,b,c} form would go a step further.
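The paired {a,b,c} lists could be expanded as follows. The names `expand` and `paired` are hypothetical, and the sketch assumes at most one brace group per argument and no commas or braces inside the about tags themselves, which is a real limitation given how free-form about tags are.

```python
import re


def expand(spec):
    """Expand a single {a,b,c} group into one spec per alternative."""
    match = re.search(r"\{([^{}]*)\}", spec)
    if not match:
        return [spec]
    before, after = spec[:match.start()], spec[match.end():]
    return [before + item + after for item in match.group(1).split(",")]


def paired(src_spec, dest_spec):
    """Pair the expansions of source and destination position by position."""
    sources, dests = expand(src_spec), expand(dest_spec)
    if len(sources) != len(dests):
        raise ValueError("source and destination lists differ in length")
    return list(zip(sources, dests))
```

Refusing lists of different lengths seems the safest default, since silently truncating would copy fewer tags than the user asked for.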
Upload and Download
Ironically, although we modelled our extended cp on rcp, we have concluded that we don’t really need the “r” but can just use cp for all copying and movement within Fluidinfo. Curiously, this leaves open the possibility of using rcp to perform upload and download. The idea would be that paths without an @ in the rcp command would be taken as paths on the local file system while paths that include an @ refer to tags on an object in Fluidinfo.
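The dispatch rule in that last sentence is simple enough to state as code. This is only an illustration, with the hypothetical name `classify_transfer`; note that a local file whose name contains an @ would be misread as a Fluidinfo path, so a real command would need some way of escaping it.

```python
def classify_transfer(src, dest):
    """Decide what an rcp-style command is being asked to do.

    Paths containing '@' are tags on a Fluidinfo object; paths without
    one are taken to be on the local file system.
    """
    src_remote = "@" in src
    dest_remote = "@" in dest
    if src_remote and dest_remote:
        return "within-fluidinfo"   # really a job for cp
    if dest_remote:
        return "upload"
    if src_remote:
        return "download"
    return "local"
```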
In case it’s not clear, the key point is that since Fluidinfo tags can have arbitrary MIME types, they can be used to store files: the analogy we have been pursuing is not merely structural. For example, the documentation for Fish is stored in Fluidinfo, as a collection of tags on the object with the about tag fish, with its index.html file at
To be completely clear, this is not a URL pointing to a static HTML file: this is the fish user’s tag index.html on the object with the about tag fish, being retrieved directly from the Fluidinfo database.
A partial visualization of the tags on that object is shown below.
The point of commands for uploading to Fluidinfo is to make it easy to publish a tree of files such as this.
Let’s see how the example use cases for upload and download might work in this case. I am going to assume that we have no difficult cases with ambiguous paths in Fluidinfo, and that if we did, we would either require disambiguation or the commands would fail. (This would obviously be an issue only on download.)
- Copy a local set of files to an object.
Copy the files and directories in Alice’s ~alice/blog directory as corresponding tags and namespaces on the Alice's Blog object in Fluidinfo.
rcp blog blog@"Alice's Blog"
The same operation, but now taking the files from ~alice/blogs/drinkingblog/html and placing them in a new alice/blog namespace on the nominated object
rcp ~alice/blogs/drinkingblog/html blog@"Alice's Blog"
Conversely, Alice might wish to download a blog, stored as a collection of tags and namespaces in Fluidinfo as a set of local files, the precise inverse of the cases above.
rcp blog@"Alice's Blog" blog
rcp blog@"Alice's Blog" ~alice/blogs/drinkingblog/html
Download tags from a number of objects in Fluidinfo to different parts of the local file system.
Alice might have a description tag on objects corresponding to each element of the periodic table and wish to download them either to different local directories or different files within a directory. For example, she might want the description from the element element:mercury to go to elements/mercury/description.txt or to elements/mercury-description on the local file system.
This is a more interesting and difficult case, and again is fundamentally about constructing paths semi-programmatically. Although it is not too hard to think of ways it might be done (perhaps with regular expressions), we might be moving beyond what it’s reasonable to expect Fish to do. That goes for the final example too.
These examples suggest that modelling a Fish rcp command on a modified Unix-like rcp could work well. It’s almost as if we are considering our host machine as part of Fluidinfo, with its file system representing a concrete tag hierarchy on an anonymous local object. There are certainly other issues to consider, including MIME types and files for potential omission (dot files? backup files ending in ~, symbolic links etc.), but we seem to have a promising way forward.
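To give a feel for the remaining issues, here is a sketch of how an upload might plan its work before talking to Fluidinfo at all. Everything in it is an assumption: the name `plan_upload`, the choice to skip dot files, backup files ending in ~ and symbolic links, and the use of Python’s `mimetypes` module to guess a MIME type from the file name.

```python
import mimetypes
import os


def plan_upload(root, namespace):
    """Yield (tag_path, local_path, mime_type) for each file to upload.

    Directories under root become namespaces under the given namespace,
    and files become tags.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories in place so os.walk does not descend into them.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        for name in sorted(filenames):
            local = os.path.join(dirpath, name)
            if name.startswith(".") or name.endswith("~") or os.path.islink(local):
                continue
            relative = os.path.relpath(local, root).replace(os.sep, "/")
            # Fall back to a generic binary type when the extension is unknown.
            mime = mimetypes.guess_type(name)[0] or "application/octet-stream"
            yield namespace + "/" + relative, local, mime
```

For a directory containing index.html and css/style.css, calling `plan_upload(root, "alice/blog")` would yield the tag paths alice/blog/index.html and alice/blog/css/style.css, each with a guessed MIME type, leaving the actual writing of tag values to whatever client library is in use.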
I was going to consider ftp as an alternative source of inspiration, but its model seems more cumbersome in general, and better suited to situations where you are considering only one host, whereas in Fluidinfo it is normal to consider many objects, so I think we can probably forget ftp.
Summary
Perhaps surprisingly, we have largely got there.
Over the last three articles, we have probed and extended the analogy between Fluidinfo and the Unix file system and come up with plausible syntaxes for building mv and cp commands that give us pretty powerful ways of moving and copying data within Fluidinfo, drawing on ideas from cp, rcp, mv and rsync as well as the current Fish.
Despite borrowing from rcp in designing Fish’s cp command, we have left rcp itself free and can now use it as the basis for moving data between a local file system and Fluidinfo, using almost identical conventions save for the interpretation of a path that does not contain an @ (“the same object as before” for cp, and “the local file system” for rcp).
There are definitely holes to fill, but it feels like there is a reasonable outline spec. If people see problems, or have better ideas or comments, do let me know.
Otherwise, all that remains is the trivial task of implementation. How hard could that be?