19 January 2012

Movement and Copying in Fluidinfo

In the previous article, we discussed the analogy between the Unix File System and Fluidinfo’s tag hierarchy. This analogy forms the basis and inspiration for the Fluidinfo Shell, Fish. But a file system without move and copy commands would be a sad and contemptible thing, and at the moment Fish, like Fluidinfo, is impoverished by the lack of such basic functionality as cp and mv. Here we will try to design such functionality, building on the analogy.

Copying and Moving

In Unix we can

  • copy files with the cp command
  • copy directories (and their contents) with cp -R
  • move files to a different location with mv
  • move directories (and their contents) to a different location with mv
  • rename files, also with the mv command
  • rename directories with mv
  • delete files with rm (“remove”)
  • delete empty directories with the rmdir command and delete directories together with their contents with the rm -r command.

In general, the functionality of mv is conceptually equivalent to copying and then removing an item.

We can also copy files and directories between different machines using the rcp and rsync commands, which are both similar to cp but understand a host: prefix. An alternative to these commands is the ftp command, which operates in a very different manner, and uses different mechanisms, to ultimately similar effect.

Fish, today, offers no commands for moving, renaming or copying tags or namespaces, but does provide an rm command that performs the combined functions of rm and rmdir (also requiring a -r and -f flags in some cases). It is worth noting, briefly, that in some sense Fish’s rm goes much further than rm on Unix, in that the command

fish rm rating

removes not only the abstract rating tag, but every occurrence of that tag in Fluidinfo, potentially on millions of objects. This is why Fish requires the -f flag to force the removal of a tag that is in use. In the prevous article, we argued that Fluidinfo’s objects play the natural analogues of computers in a network. From that perspective, if we think of rcp as a more powerful version remote version of cp, Fish’s rm command is more like a remote rm (presumably rrm) that allows you to remove all files with a given path on all hosts simultaneously. It’s as if you could say something like:

rrm -f *:~/.bashrc

to remove the .bashrc in your home directory on every machine on which you have an account. Indeed, if Linus Torvalds were not merely Linux’s creator but the superuser on all copies of the OS, with such a command he could remove everything on all Linux hosts with

rrm -rf *:/

Let’s think about the Unix commands cp and mv, and their possible generalizations to the realm of Fluidinfo. Recall that, when we want to be precise, we need to distinguish between two different senses of the word “tag”. Ordinarily when we attach a tag, possibly with a value, to an object, we create what we might variously call a “tag instance” or a “concrete tag”. Fluidinfo, however, maintains a user’s tag hierarchy independent of whether tags are actually in use. When discussing tags in this sense, independent of objects, I call them abstract or platonic tags. These are quite real and can persist even when the tag is not in use. The diagram below shows the abstract tag hierarchy for Alice, on the right, and her file system, on the left. Note carefully that in her private namespace Alice has both a tag and a namespace called moments.

http://fluiddb.fluidinfo.com/about/abouttag/njr/image/at102-tags.png

In what follows, assume Alice’s current working directory on Unix is her home directory /home/alice, and for concreteness, assume that her shell is the Bourne Again Shell, Bash.

Copying a file or a tag

On Unix, Alice can copy her has-drunk file into her private directory by saying any of the following:

cp has-drunk private/has-drunk
cp has-drunk private/
cp has-drunk private

Obvious though it is, let us spell out what this means. After running one of these commands, Alice will have two files, /home/alice/has-drunk and /home/alice/private/has-drunk, where previously she had only one, and each will contain a separate copy of the same data.

We could plausibly adopt any or all of these in Fish to copy Alice’s has-drunk tag to her private namespace. But what would that do? I think the most obvious action would be to create a new abstract tag called alice/private/has-drunk and then tag all the objects currently tagged with alice/private/has-drunk with the new alice/private/has-drunk tag, copying their values if any. We would need to consider quite carefully how to handle permissions when performing such copying. The case is rather different from Unix, because in Unix permissions are hierarchical in the sense that a file with public read permission in an fully private directory cannot be read. This is an important detail. On Unix, let’s assume that Alice’s private/things file has open read permissions (644):

alice$ ls -l private/things
-rw-r--r--  1 alice  staff   0 17 Jan 18:57 things

and that she then locks her private directory so that only even she can look at it.

alice$ chmod 700 private

If Bert now trys to look at alice/private/things he will find that he cannot:

bert$ cat ~alice/private/things
cat: /home/alice/private/things: Permission denied

In particular, on Unix this means that if Alice moves a non-private file to a private directory (by which I mean, one with neither read nor execute permission) it becomes unreadable.

In Fluidinfo, the permissions hierarchy is consulted only when new tags and namespaces are created. So if Alice creates a new tag in her private namespace, it will default to being private; if we copy the permissions of a tag when copying the tag, its permissions will be unaltered, and potentially different from if we created the tag afresh in the new location.

The correct behaviour is not clear, and either way there is potential for surprising the user in unpleasant ways, most obviously by making public data that the user intended to be private. We have seen above how by copying permissions with tags we could violate a (reasonable) assumption that moving a tag into a private namespace would make it private. If we fail, however, to copy permissions, copying a private tag to a non-private namespace would result in a non-private tag, which might also be a nasty surprise.

My first inclination here is to do a “reverse Facebook” by, when in doubt, setting the permissions on the destination to the more restrictive of the two possibilities, on the assumption that revealing data that Alice wanted to keep private is both worse and less correctable than making data more private than intended, given the inability to make people unsee (or even, uncopy) things. Needless to say, we could also have options to allow the user to choose what behaviour she wants.

Q1. How should permissions behave when tags and namespaces are copied or moved? Should we go for:

  1. The permission of the destination is copied from the source?
  2. We follow mv = rm + cp + rm and create the new tag or namespace in the new location according to default rules?
  3. Maximum privacy: apply the more restrictive of the permissions suggested by a. and b. (or, if necessary, their most restrictive intersection).

Moving or renaming a file or a tag

Going back to Unix, Alice can move her has-drunk file from her home directory to her private directory with any of these commands:

mv has-drunk private/has-drunk
mv has-drunk private/
mv has-drunk private

Again, we could plausibly adopt all of these forms in Fish to move Alice’s has-drunk name to her private namespace. In this case, there seems no real issue about what should happen. We can’t sensibly “move” the abstract tag but not the concrete ones. Using our rule of thumb that

move = copy to new location + remove the original

or

mv src dest = cp src dest; rm src

this reinforces the case for making cp copy all the concrete tags as well as the abstract tag.

Renaming really raises no extra issues: just as in Unix Alice can rename here has-drunk tag to drunk with a simple

mv has-drunk drunk

she should be able to rename her has-drunk abstract tag as drunk with the same command in Fish, and in the process rename all its concrete instances.

Copying and Moving and Renaming Directories

What about copying a directory in Unix? We use cp for that, but now we need to use -R to force the directory and all its contents to be copied recursively: without this -R, we can’t even copy and empty directory such as things:

$ cp things thangs
cp: things is a directory (not copied).

But with -R, we can copy a directory hierarchy as easily as a file. Let’s suppose Alice wants a duplicate of her private directory in her things directory. She can use any of

cp -R private things
cp -R private things/
cp -R private things/private
cp -R private things/private/

and the result will be a duplicate:

$ ls -RF things
private/

things/private:
moments/      things          thoughts/

things/private/moments:

things/private/thoughts:

Move works essentially the same way and needs no example. Again, so far there seems to be no reason why we shouldn’t build analogous functionality in Fish for copying and moving namespaces and their contents. We would certainly allow the -R flag, but might not require it, and would certainly allow -r to be used as a synonym. As with copying simple files, and following our mv = cp + rm dictum, concrete tags in the hierarchy would be copied, together with their values, on all objects to which they are attached.

Clobbering

Now consider the following commands on Unix, in the context of the same directory structure shown in the original figure:

cp has-drunk private/things
mv has-drunk private/things

The destination, private/things is a file that already exists: it will be clobbered (overwritten) by both cp and mv. The same would be true if Alice copied her private/moments/things file to her private directory with any of

cp private/moments/things private
cp private/moments/things private/
cp private/moments/things private/things

or their mv counterparts. So in Unix, the rule is

When the destination specified is a directory, move or copy the source into that directory. If there was already a file with that name in the directory, delete it first.

When the destination specified is a file, first remove that file if it exists, then copy or move the source to that destination.

Except that this isn’t quite true: you can’t clobber a file with a directory. So

$ cp -R things private/things
cp: private/things: Not a directory
cp: private/things: Not a directory
cp: private/things: Not a directory
cp: private/things: Not a directory

(the four failures being as each of the entities in private fails to be be copied), and

$ mv things private/things
mv: rename things to private/things: Not a directory

Why can’t a hulking great directory clobber a puny file? I don’t know. Unix has many wonderful attributes, but consistency is not foremost among them. To my surprise, even adding -f cannot persuade the system to do it. Whether Fish should copy this apparently anomalous behaviour is not completely clear to me: logic suggests not, but fidelity to Unix conventions suggests maybe so. The point may be moot anyway, as there’s a good chance I will require a -f to clobber even a tag, just as I do with rm, if it is in use. This is because whereas on Unix, clobbering a single file removes a single entity, however big. In Fluidinfo, a single abstract tag could have a million instances or more, and I feel requiring a -f flag to encourage the user to confirm her intent before engaging in such (potentially) wide-spread destruction is not unreasonable.

These minor exceptions notwithstanding, the way files get clobbered suggests that we might extend our recipe to include the rule:

If dest is a file:
   mv src dest = rm -f dest; cp -R src dest; rm -r src

Does that make sense for tags in Fish?

This, I think, is an interesting question. We could certainly make Fish remove all the abstract destination tag and all its concrete tags before moving or copying another tag. But it also seems reasonable to consider the possibility of replacing those concrete tags present in the source, but not those absent in the source.

To make this real: suppose Alice says (in Fish)

$ cp has-drunk moments

and at the time she does the state of her has-drunk and drunk tags is as follows:

$ fish show -q 'has alice/has-drunk' /about
2 objects matched
Object d440c5cf-9680-4748-b70e-56f07f35ca09:
  /fluiddb/about = "drink me (not poison)"
Object ec430756-e110-4bc4-b882-544afda1cce8:
  /fluiddb/about = "drink me"

$ fish show -q 'has alice/drunk' /about
2 objects matched
Object d440c5cf-9680-4748-b70e-56f07f35ca09:
  /fluiddb/about = "drink me (not poison)"
Object 49126b6d-18bd-457f-af55-a251cf400fc9:
  /fluiddb/about = "drink me not"

or, diagramatically:

http://fluiddb.fluidinfo.com/about/abouttag/njr/image/at102-has-drunk.png

It seems clear that the value of the has-drunk tag on "drink me (not poison)" (no value) should be replaced with the value of the drunk tag (true), and that a new has-drunk tag should be placed on "drink me not", also with the value true. It is less clear, however, that the has-drunk tag on "drink me" needs to be deleted. We will be moving on to discuss selective copying and moving later anyway, but we have certainly formed a question:

Q2. If a tag is clobbered by a mv or cp command, should all of its instances be clobbered, or only those necessary to make way for the tag values from the source?

Shared Paths

Where things get more interesting is when the source or destination is ambiguous, because it specifies both a tag and a namespace. This can’t occur in Unix, because each path resolves unambiguously to either a file or a namespace. Let’s think through the cases with the aid of our diagram of Alice’s tag structure above, which I’ll repeat for easier reference (sod the cost!)

http://fluiddb.fluidinfo.com/about/abouttag/njr/image/at102-alicetags.png

What would Alice reasonably expect to happen if she issued the following commands?

mv private/moments private/thoughts
mv moments private
mv moments private/moments
mv private/moments /alice

There is ambiguity everywhere. Both the source and the destination can be ambiguous, so we have to consider all of:

  • unambiguous source, unambiguous destination
  • unambiguous source, ambiguous destination
  • ambiguous source, unambiguous destination
  • ambiguous source, ambiguous destination

Only the first of these is straightforward.

In the case of the existing ls and rm commands in Fish, I have taken that view that an ambiguous path refers to both the possible targets, but while this seems unobjectionable in the case of ls, it clearly leads to the possibility of removing more than the user intended. I plan to reconsider that in the light of these ruminations on cp and mv.

It seems to me that probably a better way forward than taking an ambiguous specification as referring to both its targets is to demand disambiguation. The question is: how would that be achieved?

I have made a point, in the examples above, of listing subtly different alternative forms for some commands, e.g.

mv has-drunk private
mv has-drunk private/

Over recent years (particularly with the rise of tab completion in shells) it has become increasing common to allow directories to be specified including a trailing slash, and no harm derives from this practice. The question is we can we exploit this tend this as a way of disambiguating between tags and namespaces.

In Unix shells and commands, in most cases, the inclusion or omission of the slash makes no difference to behaviour, though I am aware of at least one case where this is not so—rsync.

To quote from the man page for rsync (on Mac OS 10.6.8):

A trailing slash on the source changes this behavior to avoid creating an additional directory level at the destination. You can think of a trailing / on a source as meaning “copy the contents of this directory” as opposed to “copy the directory by name”, but in both cases the attributes of the containing directory are transferred to the contain- ing directory on the destination. In other words, each of the follow- ing commands copies the files in the same way, including their setting of the attributes of /dest/foo:

I personally find this behaviour bizarre and it always make me slightly nervous, using rsync, which is otherwise superior in every way to rcp. However, the idea of using a trailing slash to specify the namespace (cf. directory) rather than the tag is different: it is not changing the behaviour of the command according to which of two unanambiguous specifications of a directory (namespace) is used, but rather using the slash to disambiguate; this seems less objectionable.

The question then would be, how would unambiguously specify the tag? It would seem very ill advised to require that namespaces should always be specified with a trailing slash, so we cannot sensibly say that a path not ending a slash will be taken to be a tag: that way madness lies.

My temptation is to use a trailing dot. I like this as a solution partly because I can recall no case, in nearly thirty years of Unix use, of ever meeting a file (other than the directories . and ..) whose name ended in a dot. I also feel that, while files do not always have extensions, and directory names may include them, by convention most filenames do contain a period and most directory names do not. Admittedly, this is not true of tags, but for me, at least, some association between dots-in-tag-names and “file-ness” survives through the analogy on which Fish is built. If we adopt this idea, our problems all but disappear. We can imagine:

$ mv private/moments private/thoughts
Error: private/moments is ambiguous; use private/moments. or private/moments/

$ mv private/moments. private/thoughts
# moves the tag private/moments to the tag private/thoughts/moments

$ mv private/moments/ private/thoughts
# moves the namespace private/moments to the namespace private/thoughts/moments

$ mv private/moments/ private/moments. private/thoughts
# moves private/moments. to private/thoughts/moments.
# and private/moments/ to private/thoughts/moments/

If this were adopted in Fish, I think there would be an overwhelming case for making rm work the same way; the current behaviour of ls might stay the same, as it is not descructive, or might change in the interest of slavish consistency.

That concludes our discussion of cp and mv for abstract tags in Fish. In the third and perhaps final part of this “trilogy”, we will discuss moving and copying tags and namespaces in the context of the object hierarchy, i.e., how we might copy or move tags from one object to another, or within or among objects.

No comments:

Post a Comment

Labels