In the previous article, we discussed the
analogy between the Unix File System and Fluidinfo’s tag hierarchy.
This analogy forms the basis and inspiration for the Fluidinfo Shell,
Fish.
But a file system without move and copy commands would be a sad
and contemptible thing, and at the moment Fish, like Fluidinfo,
is impoverished by the lack of such basic functionality as
cp and mv. Here we will try to design such functionality,
building on the analogy.
Copying and Moving
In Unix we can
- copy files with the cp command
- copy directories (and their contents) with cp -R
- move files to a different location with mv
- move directories (and their contents) to a different location with mv
- rename files, also with the mv command
- rename directories with mv
- delete files with rm (“remove”)
- delete empty directories with the rmdir command and
delete directories together with their contents
with the rm -r command.
In general, the functionality of mv is conceptually equivalent
to copying and then removing an item.
We can also copy files and directories between different machines
using the rcp and rsync commands, which are both similar to
cp but understand a host: prefix. An alternative to these
commands is the ftp command, which operates in a very different
manner, and uses different mechanisms, to ultimately similar effect.
Fish, today, offers no commands for moving, renaming or copying tags
or namespaces, but does provide an rm command that performs the
combined functions of rm and rmdir (also requiring the -r
and -f flags in some cases). It is worth noting, briefly,
that in some sense Fish’s rm goes much further than rm on
Unix, in that using it to remove a rating tag removes not only the
abstract rating tag, but every occurrence
of that tag in Fluidinfo, potentially on millions of objects.
This is why Fish requires the -f flag to force the removal
of a tag that is in use. In the previous article, we argued that
Fluidinfo’s objects are the natural analogues of computers in a network.
From that perspective, if we think of rcp as a more powerful,
remote version of cp, Fish’s rm command is more like
a remote rm (presumably rrm) that allows you to remove all files with
a given path on all hosts simultaneously. It’s as if a single command
could remove the .bashrc in your home directory on every machine on which
you have an account. Indeed, if Linus Torvalds were not merely Linux’s
creator but the superuser on all copies of the OS, with such a command he
could remove everything on all Linux hosts in one go.
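The scale of that difference is easy to see in a minimal sketch. It assumes a hypothetical in-memory model, not Fish’s actual code: each abstract tag path maps to the objects carrying it and their values, and the names `TagStore` and `fish_rm`, like the object ids, are invented for illustration. Removing the abstract tag takes every instance with it, so the sketch, like Fish, insists on a force flag when the tag is in use.

```python
from typing import Any, Dict, Optional

# Hypothetical model (not Fish's code): abstract tag path -> {object id: value}
TagStore = Dict[str, Dict[str, Optional[Any]]]


def fish_rm(tags: TagStore, path: str, force: bool = False) -> int:
    """Remove abstract tag `path` and every concrete instance of it.

    Returns the number of instances removed. Mirrors Fish's rule that
    -f is needed to remove a tag that is in use."""
    instances = tags[path]
    if instances and not force:
        raise PermissionError(
            f"{path} is on {len(instances)} object(s); use -f to remove it")
    del tags[path]
    return len(instances)


tags = {"alice/rating": {"obj-1": 5, "obj-2": 3}, "alice/unused": {}}
fish_rm(tags, "alice/unused")                        # not in use: no -f needed
removed = fish_rm(tags, "alice/rating", force=True)  # in use: needs -f
```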
Let’s think about the Unix commands cp and mv, and their
possible generalizations to the realm of Fluidinfo. Recall that, when
we want to be precise, we need to distinguish between two different
senses of the word “tag”. Ordinarily when we attach a tag, possibly
with a value, to an object, we create what we might variously call a
“tag instance” or a “concrete tag”. Fluidinfo, however, maintains a
user’s tag hierarchy independent of whether tags are actually in use.
When discussing tags in this sense, independent of objects, I call
them abstract or platonic tags. These are quite real and can
persist even when the tag is not in use. The diagram below shows the
abstract tag hierarchy for Alice, on the right, and her file system,
on the left. Note carefully that in her private namespace Alice
has both a tag and a namespace called moments.
In what follows, assume Alice’s current working directory on Unix is
her home directory /home/alice, and for concreteness,
assume that her shell is the Bourne Again Shell, Bash.
Copying a file or a tag
On Unix, Alice can copy her has-drunk file into her private directory
by saying any of the following:
cp has-drunk private/has-drunk
cp has-drunk private/
cp has-drunk private
Obvious though it is, let us spell out what this means. After running
one of these commands, Alice will have two files,
/home/alice/has-drunk and /home/alice/private/has-drunk, where
previously she had only one, and each will contain a separate copy of
the same data.
We could plausibly adopt any or all of these in Fish to copy Alice’s
has-drunk tag to her private namespace. But what would that do?
I think the most obvious action would be to create a new
abstract tag called alice/private/has-drunk and then tag all
the objects currently tagged with alice/has-drunk with the new
alice/private/has-drunk tag, copying their values if any.
We would need to consider quite carefully how to handle permissions
when performing such copying.
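Under the same kind of hypothetical model (tag path mapped to object ids and values; `cp_tag` is an illustrative name, not a Fish function), the copy just described amounts to creating the new abstract tag with an independent copy of every instance. This sketch simply refuses to overwrite an existing tag and ignores permissions entirely.

```python
from typing import Any, Dict, Optional

# Hypothetical model (not Fish's code): abstract tag path -> {object id: value}
TagStore = Dict[str, Dict[str, Optional[Any]]]


def cp_tag(tags: TagStore, src: str, dest: str) -> None:
    """Create abstract tag `dest` on every object tagged with `src`, copying values."""
    if src not in tags:
        raise LookupError(f"no such tag: {src}")
    if dest in tags:
        raise FileExistsError(f"tag already exists: {dest}")
    # dict() gives the new tag its own mapping of instances, just as cp
    # gives the new file its own copy of the data
    tags[dest] = dict(tags[src])


tags = {"alice/has-drunk": {"obj-1": None, "obj-2": True}}
cp_tag(tags, "alice/has-drunk", "alice/private/has-drunk")
```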
The case is rather different from Unix, because in Unix permissions
are hierarchical in the sense that a file with public read permission
in a fully private directory cannot be read. This is an important detail.
On Unix, let’s assume that Alice’s private/things file has open read
permissions (644):
alice$ ls -l private/things
-rw-r--r-- 1 alice staff 0 17 Jan 18:57 things
and that she then locks her private directory so that only she can
look at it.
If Bert now tries to look at ~alice/private/things he will find that he
cannot:
bert$ cat ~alice/private/things
cat: /home/alice/private/things: Permission denied
In particular, on Unix this means that if Alice moves a non-private file to
a private directory (by which I mean one that grants others neither read nor
execute permission) it becomes unreadable to others.
In Fluidinfo, the permissions hierarchy is consulted only when new
tags and namespaces are created. So if Alice creates a new tag in her
private namespace, it will default to being private; but if we copy the
permissions of a tag when copying the tag, its permissions will be
unaltered, and potentially different from those it would have received
had we created the tag afresh in the new location.
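A deliberately simplified sketch of the contrast follows, assuming permissions can be modelled as sets of users rather than Unix mode bits or Fluidinfo’s real permission records (the function names and data are hypothetical). On Unix, opening a file by path requires search (execute) permission on each enclosing directory; in Fluidinfo, on the account given above, only the tag’s own permission matters once it exists.

```python
from typing import Dict, Set


def unix_can_read(perms: Dict[str, Set[str]], path: str, user: str) -> bool:
    """perms maps each directory to the users who may search (execute) it
    and each file to the users who may read it. The root directory is
    skipped for brevity."""
    parts = path.strip("/").split("/")
    for i in range(1, len(parts)):
        ancestor = "/" + "/".join(parts[:i])
        if user not in perms[ancestor]:
            return False              # blocked by an enclosing directory
    return user in perms[path]


def fluidinfo_can_read(perms: Dict[str, Set[str]], tag: str, user: str) -> bool:
    # Enclosing namespaces are not consulted here; per the text, they only
    # matter when a tag is first created.
    return user in perms[tag]


everyone = {"alice", "bert"}
unix = {
    "/home": everyone,
    "/home/alice": everyone,
    "/home/alice/private": {"alice"},         # locked directory
    "/home/alice/private/things": everyone,   # 644: world-readable file
}
fluid = {
    "alice/private": {"alice"},               # private namespace (not consulted)
    "alice/private/things": everyone,         # tag readable by everyone
}
```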
The correct behaviour is not clear, and either way there is potential for
surprising the user in unpleasant ways, most obviously by making public
data that the user intended to be private. We have seen above how
by copying permissions with tags we could violate a (reasonable)
assumption that moving a tag into a private namespace would make it private.
If we fail, however, to copy permissions, copying a private tag to a
non-private namespace would result in a non-private tag, which might also
be a nasty surprise.
My first inclination here is to do a “reverse Facebook”: when in doubt,
set the permissions on the destination to the more restrictive of
the two possibilities, on the assumption that revealing data that
Alice wanted to keep private is both worse and less correctable than
making data more private than intended, given the inability to make
people unsee (or even uncopy) things. Needless to say, we could also
provide options allowing the user to choose the behaviour she wants.
Q1. How should permissions behave when tags and namespaces are copied
or moved? Should we go for:
a. The permissions of the destination are copied from the source?
b. We follow mv = rm + cp + rm and create the new tag
or namespace in the new location according to the default rules?
c. Maximum privacy: apply the more restrictive of the permissions
suggested by a. and b. (or, if necessary, their most restrictive
intersection).
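The three options can be written as a small function if we assume, purely for illustration, that a permission is just the set of users allowed to read a tag. Fluidinfo’s actual permissions are richer than this, and `dest_readers` is a hypothetical name. Under that assumption, “most restrictive” is simply set intersection.

```python
from typing import FrozenSet

Readers = FrozenSet[str]   # hypothetical: the users allowed to read a tag


def dest_readers(source: Readers, default: Readers, policy: str) -> Readers:
    """Readers for a copied or moved tag under the three options of Q1."""
    if policy == "a":       # copy the source tag's permissions
        return source
    if policy == "b":       # use the destination namespace's defaults
        return default
    if policy == "c":       # maximum privacy: only users allowed by both
        return source & default
    raise ValueError(f"unknown policy: {policy!r}")


public = frozenset({"alice", "bert", "carol"})
private = frozenset({"alice"})
```

With option a, a public tag moved into a private namespace stays public; with option b, a private tag copied into a public namespace becomes public; option c avoids both surprises at the cost of sometimes being more private than intended.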
Moving or renaming a file or a tag
Going back to Unix, Alice can move her has-drunk file from her
home directory to her private directory with any of these commands:
mv has-drunk private/has-drunk
mv has-drunk private/
mv has-drunk private
Again, we could plausibly adopt all of these forms in Fish to move
Alice’s has-drunk tag to her private namespace. In this case,
there seems no real issue about what should happen: we can’t sensibly
“move” the abstract tag but not the concrete ones. Our rule
of thumb that
move = copy to new location + remove the original
or
mv src dest = cp src dest; rm src
therefore reinforces the case for making cp copy all the concrete tags
as well as the abstract tag.
Renaming really raises no extra issues: just as in Unix Alice can rename
her has-drunk file to drunk with a simple
mv has-drunk drunk
she should be able to rename her has-drunk abstract tag as drunk
with the same command in Fish, and in the process rename all its concrete
instances.
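The rule of thumb translates directly into code under the same hypothetical in-memory model (`cp_tag`, `rm_tag` and `mv_tag` are illustrative names, not Fish’s API), and a rename is then just a move within one namespace.

```python
from typing import Any, Dict, Optional

# Hypothetical model (not Fish's code): abstract tag path -> {object id: value}
TagStore = Dict[str, Dict[str, Optional[Any]]]


def cp_tag(tags: TagStore, src: str, dest: str) -> None:
    if dest in tags:
        raise FileExistsError(f"tag already exists: {dest}")
    tags[dest] = dict(tags[src])      # abstract tag plus every instance


def rm_tag(tags: TagStore, path: str) -> None:
    del tags[path]                    # abstract tag plus every instance


def mv_tag(tags: TagStore, src: str, dest: str) -> None:
    # mv src dest = cp src dest; rm src
    cp_tag(tags, src, dest)
    rm_tag(tags, src)


tags = {"alice/has-drunk": {"obj-1": True, "obj-2": None}}
mv_tag(tags, "alice/has-drunk", "alice/drunk")   # a rename is just a move
```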
Copying and Moving and Renaming Directories
What about copying a directory in Unix? We use cp for that, but now
we need to use -R to force the directory and all its contents
to be copied recursively: without this -R, we can’t even copy
an empty directory such as things:
$ cp things thangs
cp: things is a directory (not copied).
But with -R, we can copy a directory hierarchy as easily as a file.
Let’s suppose Alice wants a duplicate of her private directory
in her things directory. She can use any of
cp -R private things
cp -R private things/
cp -R private things/private
cp -R private things/private/
and the result will be a duplicate:
$ ls -RF things
private/

things/private:
moments/ things thoughts/

things/private/moments:

things/private/thoughts:
Move works essentially the same way and needs no example. Again, so far
there seems to be no reason why we shouldn’t build analogous functionality
in Fish for copying and moving namespaces and their contents.
We would certainly allow the -R flag, but might not require it,
and would certainly allow -r to be used as a synonym.
As with copying simple files, and following our mv = cp + rm dictum,
concrete tags in the hierarchy would be copied, together with their values,
on all objects to which they are attached.
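Here is a sketch of recursive namespace copying in the hypothetical model, where a namespace is represented only implicitly by the tag paths below it. That is a simplification: Fluidinfo keeps namespaces, including empty ones, as entities in their own right, and this sketch (with the illustrative name `cp_namespace`) neither copies empty namespaces nor checks for clashes at the destination.

```python
from typing import Any, Dict, Optional

# Hypothetical model (not Fish's code): abstract tag path -> {object id: value}
TagStore = Dict[str, Dict[str, Optional[Any]]]


def cp_namespace(tags: TagStore, src_ns: str, dest_ns: str) -> None:
    """Copy every tag below src_ns to the same relative path below dest_ns,
    together with all its instances and values."""
    src_prefix = src_ns.rstrip("/") + "/"
    dest_prefix = dest_ns.rstrip("/") + "/"
    # Snapshot the matching paths first: we must not iterate over `tags`
    # while adding to it, and new copies should not be copied again.
    matches = [path for path in tags if path.startswith(src_prefix)]
    for path in matches:
        tags[dest_prefix + path[len(src_prefix):]] = dict(tags[path])


tags = {
    "alice/private/things": {"obj-1": 1},
    "alice/private/moments": {"obj-2": None},
    "alice/private/moments/first": {"obj-3": "x"},
}
cp_namespace(tags, "alice/private", "alice/things/private")
```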
Clobbering
Now consider the following commands on Unix, in the context of the same
directory structure shown in the original figure:
cp has-drunk private/things
mv has-drunk private/things
The destination, private/things, is a file that already exists: it
will be clobbered (overwritten) by both cp and mv. The same
would be true if Alice copied her private/moments/things file to
her private directory with any of
cp private/moments/things private
cp private/moments/things private/
cp private/moments/things private/things
or their mv counterparts. So in Unix, the rule is:
- When the destination specified is a directory, move or copy the source
into that directory. If there was already a file with that name in
the directory, delete it first.
- When the destination specified is a file, first remove that file
if it exists, then copy or move the source to that destination.
Except that this isn’t quite true: you can’t clobber a file with a
directory. So
$ cp -R things private/things
cp: private/things: Not a directory
cp: private/things: Not a directory
cp: private/things: Not a directory
cp: private/things: Not a directory
(the four failures arising as each of the entities in private fails to be
copied), and
$ mv things private/things
mv: rename things to private/things: Not a directory
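The Unix rule, together with its directory-over-file exception, can be captured in a few lines. This is a model, not a call into the real file system: `cp_target` is a hypothetical helper, and the sets `dirs` and `files` stand in for Alice’s directories and files as described above.

```python
import posixpath
from typing import Set


def cp_target(src: str, dest: str, dirs: Set[str], files: Set[str]) -> str:
    """Return the path that `cp -R src dest` (or `mv src dest`) writes to."""
    dest = dest.rstrip("/") or "/"
    if dest in dirs:
        # destination is a directory: copy *into* it, keeping the source's name
        target = posixpath.join(dest, posixpath.basename(src.rstrip("/")))
    else:
        target = dest
    if src.rstrip("/") in dirs and target in files:
        # the exception: a directory may not clobber an existing file
        raise NotADirectoryError(f"{target}: Not a directory")
    return target  # any existing file at target is clobbered


dirs = {"private", "private/moments", "private/thoughts", "things"}
files = {"has-drunk", "private/things", "private/moments/things"}
```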
Why can’t a hulking great directory clobber a puny file? I don’t know.
Unix has many wonderful attributes, but consistency is not foremost
among them. To my surprise, even adding -f cannot persuade the
system to do it. Whether Fish should copy this apparently anomalous
behaviour is not completely clear to me: logic suggests
not, but fidelity to Unix conventions suggests maybe so. The point
may be moot anyway, as there’s a good chance I will require a -f
to clobber even a tag if it is in use, just as I do with rm.
This is because, whereas on Unix clobbering a file removes a
single entity, however big, in Fluidinfo a single abstract tag could
have a million instances or more, and I feel that requiring a -f flag,
so that the user confirms her intent before engaging in such
(potentially) widespread destruction, is not unreasonable.
These minor exceptions notwithstanding, the way files get clobbered suggests
that we might extend our recipe to include the rule:
If dest is a file:
mv src dest = rm -f dest; cp -R src dest; rm -r src
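The extended recipe, with the -f safeguard discussed above, can be sketched for tags as follows, again using a hypothetical in-memory model (`mv_clobbering` is an illustrative name, not part of Fish). All checks happen before anything is removed, so a refused move never destroys the destination.

```python
from typing import Any, Dict, Optional

# Hypothetical model (not Fish's code): abstract tag path -> {object id: value}
TagStore = Dict[str, Dict[str, Optional[Any]]]


def mv_clobbering(tags: TagStore, src: str, dest: str, force: bool = False) -> None:
    """mv src dest = rm -f dest; cp -R src dest; rm -r src, for tags.

    Clobbering a destination tag that is in use requires force (-f)."""
    if src not in tags:
        raise LookupError(f"no such tag: {src}")
    if src == dest:
        raise ValueError("source and destination are the same tag")
    if tags.get(dest) and not force:
        raise PermissionError(f"{dest} is in use; use -f to clobber it")
    tags.pop(dest, None)            # rm -f dest: abstract tag and all instances
    tags[dest] = dict(tags[src])    # cp src dest: abstract tag and all instances
    del tags[src]                   # rm src


tags = {"alice/drunk": {"obj-1": True}, "alice/has-drunk": {"obj-2": None}}
mv_clobbering(tags, "alice/drunk", "alice/has-drunk", force=True)
```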
Does that make sense for tags in Fish?
This, I think, is an interesting question.
We could certainly make Fish remove the abstract destination
tag and all its concrete tags before moving or copying another tag.
But it also seems reasonable to consider replacing only those
concrete destination tags on objects that also carry the source tag,
leaving the destination tags on other objects alone.
To make this real: suppose Alice, in Fish, copies or moves her drunk
tag onto her has-drunk tag, and at the time she does so the state of
her has-drunk and drunk tags is as follows:
$ fish show -q 'has alice/has-drunk' /about
2 objects matched
Object d440c5cf-9680-4748-b70e-56f07f35ca09:
/fluiddb/about = "drink me (not poison)"
Object ec430756-e110-4bc4-b882-544afda1cce8:
/fluiddb/about = "drink me"
$ fish show -q 'has alice/drunk' /about
2 objects matched
Object d440c5cf-9680-4748-b70e-56f07f35ca09:
/fluiddb/about = "drink me (not poison)"
Object 49126b6d-18bd-457f-af55-a251cf400fc9:
/fluiddb/about = "drink me not"
or, diagrammatically:
It seems clear that the value of the has-drunk tag on
"drink me (not poison)" (no value) should be replaced with the
value of the drunk tag (true), and that a new has-drunk tag
should be placed on "drink me not", also with the value true.
It is less clear, however, that the has-drunk
tag on "drink me" needs to be deleted.
We will be moving on to discuss selective copying and moving
later anyway, but we have certainly formed a question:
Q2. If a tag is clobbered by a mv or cp command,
should all of its instances be clobbered, or only those necessary
to make way for the tag values from the source?
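The two answers to Q2 differ only in what happens to destination instances on objects that lack the source tag. The sketch below applies each to the example data, using hypothetical function names. The has-drunk value on “drink me” is not given above, so `None` there is an assumption, and objects are identified by their about tags rather than their ids for readability.

```python
from typing import Any, Dict, Optional

Instances = Dict[str, Optional[Any]]   # object -> value (None: no value)


def clobber_all(src: Instances, dest: Instances) -> Instances:
    """Option 1: discard every destination instance (dest is ignored), then copy the source."""
    return dict(src)


def clobber_needed(src: Instances, dest: Instances) -> Instances:
    """Option 2: overwrite only where the source has an instance; keep the rest."""
    merged = dict(dest)
    merged.update(src)
    return merged


drunk = {"drink me (not poison)": True, "drink me not": True}
# None on "drink me" is an assumption; the text gives no value there.
has_drunk = {"drink me (not poison)": None, "drink me": None}
```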
Shared Paths
Where things get more interesting is when the source or destination
is ambiguous, because it specifies both a tag and a namespace.
This can’t occur in Unix, because each path resolves unambiguously
to either a file or a directory. Let’s think through the cases
with the aid of our diagram of Alice’s tag structure above, which I’ll
repeat for easier reference (sod the cost!):
What would Alice reasonably expect to happen if she issued the following
commands?
mv private/moments private/thoughts
mv moments private
mv moments private/moments
mv private/moments /alice
There is ambiguity everywhere. Both the source and the destination
can be ambiguous, so we have to consider all of:
- unambiguous source, unambiguous destination
- unambiguous source, ambiguous destination
- ambiguous source, unambiguous destination
- ambiguous source, ambiguous destination
Only the first of these is straightforward.
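Deciding which of the four cases applies is mechanical once we know what each path names. Here is a minimal sketch, assuming hypothetical sets of tag and namespace paths relative to Alice’s home namespace (only the fact that private/moments is both is taken from the text; the rest is illustrative). A path naming nothing yet, such as a new destination, counts as unambiguous here.

```python
from typing import Set


def kinds(path: str, tags: Set[str], namespaces: Set[str]) -> Set[str]:
    """What a path could refer to: {'tag'}, {'namespace'}, both, or neither."""
    found = set()
    if path in tags:
        found.add("tag")
    if path in namespaces:
        found.add("namespace")
    return found


def which_case(src: str, dest: str, tags: Set[str], namespaces: Set[str]) -> str:
    def label(path: str) -> str:
        return "ambiguous" if len(kinds(path, tags, namespaces)) == 2 else "unambiguous"
    return f"{label(src)} source, {label(dest)} destination"


tags = {"has-drunk", "private/things", "private/moments"}
namespaces = {"private", "private/moments", "private/thoughts"}
```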
In the case of the existing ls and rm commands in Fish,
I have taken the view that an ambiguous path refers to both
the possible targets, but while this seems unobjectionable in
the case of ls, it clearly leads to the possibility of removing
more than the user intended. I plan to reconsider
that in the light of these ruminations on cp and mv.
It seems to me that probably a better way forward than taking an
ambiguous specification as referring to both its targets is to
demand disambiguation. The question is: how would that be achieved?
I have made a point, in the examples above, of listing subtly
different alternative forms for some commands, e.g.
mv has-drunk private
mv has-drunk private/
Over recent years (particularly with the rise of tab completion in shells)
it has become increasingly common to allow directories to be specified
with a trailing slash, and no harm derives from this practice.
The question is whether we can exploit this trend as a way of disambiguating
between tags and namespaces.
In Unix shells and commands, in most cases, the
inclusion or omission of the slash makes no difference to behaviour,
though I am aware of at least one case where this is not so—rsync.
To quote from the man page for rsync (on Mac OS 10.6.8):
A trailing slash on the source changes this behavior to avoid creating
an additional directory level at the destination. You can think of a
trailing / on a source as meaning “copy the contents of this directory”
as opposed to “copy the directory by name”, but in both cases the
attributes of the containing directory are transferred to the containing
directory on the destination. In other words, each of the following
commands copies the files in the same way, including their setting
of the attributes of /dest/foo:
rsync -av /src/foo /dest
rsync -av /src/foo/ /dest/foo
I personally find this behaviour bizarre and it always makes me slightly
nervous when using rsync, which is otherwise superior in every way
to rcp. However, the idea of using a trailing slash to specify
the namespace (cf. directory) rather than the tag is different: it does not
change the behaviour of the command according to which of two unambiguous
specifications of a directory (namespace) is used, but rather uses the
slash to disambiguate; this seems less objectionable.
The question then would be: how would we unambiguously specify the tag?
It would seem very ill advised to require that namespaces should
always be specified with a trailing slash, so we cannot sensibly
say that a path not ending in a slash will be taken to be a tag:
that way madness lies.
My temptation is to use a trailing dot. I like this as a solution
partly because I can recall no case, in nearly thirty years of Unix
use, of ever meeting a file (other than the directories . and
..) whose name ended in a dot. I also feel that, while files do
not always have extensions, and directory names may include them, by
convention most filenames do contain a period and most directory names
do not. Admittedly, this is not true of tags, but for me, at least,
some association between dots-in-tag-names and “file-ness” survives
through the analogy on which Fish is built. If we adopt this idea,
our problems all but disappear. We can imagine:
$ mv private/moments private/thoughts
Error: private/moments is ambiguous; use private/moments. or private/moments/
$ mv private/moments. private/thoughts
# moves the tag private/moments to the tag private/thoughts/moments
$ mv private/moments/ private/thoughts
# moves the namespace private/moments to the namespace private/thoughts/moments
$ mv private/moments/ private/moments. private/thoughts
# moves private/moments. to private/thoughts/moments.
# and private/moments/ to private/thoughts/moments/
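The proposed convention is simple enough to sketch as a resolver for existing source paths. This is hypothetical code, not Fish’s: `resolve` and the sample data are invented for illustration, and destinations that do not yet exist would need separate handling.

```python
from typing import Set, Tuple


def resolve(path: str, tags: Set[str], namespaces: Set[str]) -> Tuple[str, str]:
    """Trailing '/' means namespace, trailing '.' means tag, and a bare
    path is accepted only when it names exactly one of the two."""
    if path.endswith("/"):
        name = path.rstrip("/")
        if name not in namespaces:
            raise LookupError(f"{path}: no such namespace")
        return ("namespace", name)
    if path.endswith("."):
        name = path[:-1]
        if name not in tags:
            raise LookupError(f"{path}: no such tag")
        return ("tag", name)
    is_tag, is_namespace = path in tags, path in namespaces
    if is_tag and is_namespace:
        raise ValueError(f"{path} is ambiguous; use {path}. or {path}/")
    if is_tag:
        return ("tag", path)
    if is_namespace:
        return ("namespace", path)
    raise LookupError(f"{path}: no such tag or namespace")


tags = {"has-drunk", "private/things", "private/moments"}
namespaces = {"private", "private/moments", "private/thoughts"}
```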
If this were adopted in Fish, I think there would be an overwhelming
case for making rm work the same way; the current behaviour of
ls might stay the same, as it is not destructive, or might change
in the interest of slavish consistency.
That concludes our discussion of cp and mv for abstract tags
in Fish. In the third and perhaps final part of this “trilogy”, we will
discuss moving and copying tags and namespaces in the context of the
object hierarchy, i.e., how we might copy or move tags from one object
to another, or within or among objects.