29 April 2011

Wikifying Fluidinfo

One of the descriptions people have been known to use for Fluidinfo is

The database with the heart of a wiki.

I have hated that description from the first time I heard it. For me, the defining, central idea of a wiki is that it offers a single version of the truth that anyone can change. Fluidinfo isn’t like that: anyone can write to it, but each user writes in her own, private space; Fluidinfo offers as many versions of the truth as there are users. There are no edit wars in Fluidinfo.

Sometimes, this is brilliant. Rhiannon sticks her opinion in rhiannon/opinion and I stick mine in njr/opinion. For information that is personal—whether because it represents an opinion, or something to do with the user, or something of interest only to one person and a few friends—storing information in namespaces is perfect. On top of that, the permissions system adds a powerful layer of flexibility.
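
To make the namespace idea concrete, here is a minimal in-memory sketch. It is not Fluidinfo's API or storage model. An object is just a mapping from full tag paths such as rhiannon/opinion to values, and, by default, each user may write only under her own top-level namespace. The FluidObject class, its tag method and the opinion values are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class FluidObject:
    """Hypothetical in-memory stand-in for a Fluidinfo object."""
    about: str
    tags: Dict[str, Any] = field(default_factory=dict)

    def tag(self, user: str, path: str, value: Any) -> None:
        # By default a user may write only inside her own top-level
        # namespace, so njr can set njr/opinion but not rhiannon/opinion.
        namespace = path.split("/", 1)[0]
        if namespace != user:
            raise PermissionError(f"{user} may not write {path}")
        self.tags[path] = value


book = FluidObject(about="book:animal farm (george orwell)")
book.tag("rhiannon", "rhiannon/opinion", "recommended")
book.tag("njr", "njr/opinion", "not for me")
# Both opinions sit side by side on one object; neither overwrites the other.
print(book.tags)
```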

Where it feels unnatural is when we are recording facts. For example, I published the periodic table to Fluidinfo under the miro namespace (Miró being the data analysis software produced by my company, Stochastic Solutions; probably the only analytics software in the world with native integration to Fluidinfo at this time). So now, if you go to the object with about tag element:Hydrogen (id 270a8269-f02d-4925-b152-da3934edaa43) you will find lots of useful data about Hydrogen. (It was all culled from about seven different pages of not-very-structured data in Wikipedia, that would be a nightmare for a machine to use.)

It looks like this:

[Image: daisyElementHydrogen.png]

and like this:

fdb tags -F -a 'element:Hydrogen'
Object with about="element:Hydrogen":
/objects/270a8269-f02d-4925-b152-da3934edaa43
  fluiddb/about = "element:Hydrogen"
  njr/index/class = "element"
  miro/elements/Etymology = "Greek hydrogenes"
  miro/elements/Period = 1
  miro/elements/Group = 1
  miro/elements/AtomicWeight = 1.007947
  miro/elements/RelativeAtomicMass = 1.007947
  miro/class = "record"
  miro/elements/MeltingPointC = -258.975
  miro/elements/Description = "gas"
  miro/elements/Symbol = "H"
  miro/elements/BoilingPointF = -423.17
  miro/elements/BoilingPointC = -252.87
  miro/elements/db-record-number = 1
  miro/elements/ChemicalSeries = "Nonmetal"
  miro/elements/Name = "Hydrogen"
  miro/elements/Z = 1
  miro/elements/Colour = "colorless"
  miro/elements/db-next-record-about = "element:Helium"
  njr/rating = 10
  njr/index/about
  miro/elements/MeltingPointKelvin = 14.2
  miro/elements/Density = 8.988e-05

But that’s crazy. Hydrogen has atomic number 1; its symbol is H; its relative atomic mass really is somewhere near 1.007947. These simple facts have nothing to do with Miró, or njr. In fact, if they turn out to be wrong, matters are even worse, because no one except Miró can change them.

For all Fluidinfo’s elegance and power, it encourages information to be ghettoized and personalized even when people really just want to add uncontentious, factual data. In doing so, it makes it hard for others to correct, extend and improve it, and harder for it to be found and used. It’s natural that if you want to know how I rate something, you would look at a tag called njr/rating; but how could you know that you need to look under a user called miro to find information about Hydrogen? It’s a problem.
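
The discoverability problem can be made concrete with a small, hypothetical sketch. Take an object's tags as a plain dictionary of path to value: an abridged copy of the Hydrogen listing above, plus one made-up njr tag. The only way to find a symbol without knowing who published it is to scan every namespace and guess at the tag's spelling. The find_by_leaf helper is invented for illustration and is not part of Fluidinfo or fdb.

```python
from typing import Any, Dict, List, Tuple

# Abridged tags from the element:Hydrogen listing, plus one hypothetical
# njr tag showing the same fact stored again under another namespace.
hydrogen_tags: Dict[str, Any] = {
    "fluiddb/about": "element:Hydrogen",
    "miro/elements/Symbol": "H",
    "miro/elements/Z": 1,
    "njr/rating": 10,
    "njr/elements/symbol": "H",  # hypothetical duplicate, for illustration
}


def find_by_leaf(tags: Dict[str, Any], leaf: str) -> List[Tuple[str, Any]]:
    """Return every (path, value) whose last path component matches leaf,
    ignoring case, since nothing tells a reader which namespace to use."""
    return sorted(
        (path, value)
        for path, value in tags.items()
        if path.rsplit("/", 1)[-1].lower() == leaf.lower()
    )


print(find_by_leaf(hydrogen_tags, "symbol"))
# [('miro/elements/Symbol', 'H'), ('njr/elements/symbol', 'H')]
```

Even with the search, the reader is left to decide which of the two answers to trust.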

What if Fluidinfo Were Actually More Like a Wiki?

Fluidinfo comes with a fairly powerful permissions system that allows detailed control over who can read and write each tag and namespace. A tag can be set so that only its owner can write to it, or so that everyone can, or so that only a named set of users can write to it.
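
As a rough model of those three cases, the sketch below represents a permission as an open or closed policy plus a list of exceptions. Fluidinfo's permissions are often described this way. Treat the Permission class and its allows method as a hypothetical illustration, not Fluidinfo's actual API.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Permission:
    # "open": everyone except the listed users may act.
    # "closed": no one except the listed users may act.
    policy: str
    exceptions: FrozenSet[str] = frozenset()

    def allows(self, user: str) -> bool:
        if self.policy == "open":
            return user not in self.exceptions
        if self.policy == "closed":
            return user in self.exceptions
        raise ValueError(f"unknown policy: {self.policy!r}")


owner_only = Permission("closed", frozenset({"miro"}))  # only the owner writes
everyone = Permission("open")                            # anyone writes
named_set = Permission("closed", frozenset({"miro", "njr", "rhiannon"}))

print(owner_only.allows("njr"), everyone.allows("njr"), named_set.allows("njr"))
```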

This means that Fluidinfo already has a core piece of infrastructure for enabling some wiki-like functionality. At the simplest level, we could have a user whose top-level namespace gave write permission to everyone, and we could augment this with a policy that made that apply recursively to all sub-namespaces and tags under it. Perhaps we would call that user wiki. (I could actually create such a user; or you could; Terrell probably won’t.)
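
Here is one way such a recursive policy might be resolved. Assume, hypothetically, that permissions are stored only where they are explicitly set. A lookup then finds the nearest enclosing namespace that has a setting, so a single open policy on wiki covers everything beneath it while individual tags can still be locked down. The paths, the banned user and the deny-by-default fallback are all assumptions for illustration.

```python
from typing import Dict, FrozenSet, Tuple

# A permission is (policy, exceptions): "open" admits everyone except the
# exceptions; "closed" admits only the exceptions.
Perm = Tuple[str, FrozenSet[str]]

# Hypothetical explicit settings: wiki is open to all but one banned user,
# and one tag beneath it has been locked so only njr may change it.
explicit: Dict[str, Perm] = {
    "wiki": ("open", frozenset({"spammer"})),
    "wiki/elements/Symbol": ("closed", frozenset({"njr"})),
}


def effective_permission(path: str) -> Perm:
    parts = path.split("/")
    # Try the full path first, then each enclosing namespace in turn.
    for end in range(len(parts), 0, -1):
        prefix = "/".join(parts[:end])
        if prefix in explicit:
            return explicit[prefix]
    return ("closed", frozenset())  # assumption: deny when nothing is set


def can_write(user: str, path: str) -> bool:
    policy, exceptions = effective_permission(path)
    return user not in exceptions if policy == "open" else user in exceptions


print(can_write("rhiannon", "wiki/elements/Name"))    # inherited from wiki
print(can_write("rhiannon", "wiki/elements/Symbol"))  # locked tag
```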

The suggestion would then be that anyone who wanted to publish factual information to Fluidinfo consider defaulting to writing it under the wiki namespace. Instead of

Is that enough?

I confess that if anyone had described Wikipedia to me before it existed, I would have been a naysayer; I would simply never have believed that its anarchical processes could possibly have produced anything of value. Clearly I was wrong; in practice, there is a vast amount of useful information in there, and the level of accuracy of information in Wikipedia is remarkably high.

Wikipedia deals with vandalism largely through the work of humans undoing vandalising changes, often with breathtaking speed. Fluidinfo does not, today, have a mechanism for tracking and reverting changes, though it would clearly be possible to create such a mechanism. (I’m not even sure what transaction records Fluidinfo keeps today, but the great thing about software is that if it doesn’t keep enough today, it could be changed so that it did tomorrow.)
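
A change log with per-user reversion need not be elaborate. The sketch below is a hypothetical in-memory store, not anything Fluidinfo provides. It records who changed which tag and what the previous value was, and it can then undo every change made by one user. It is deliberately naive: it does not log the reversions themselves, and it will overwrite any later edit another user made to the same tag.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

ABSENT = object()  # marks "the tag did not exist before this change"


@dataclass
class Change:
    user: str
    path: str
    old: Any
    new: Any


@dataclass
class TrackedTags:
    """Hypothetical tag store that logs who changed what, so edits can be undone."""
    values: Dict[str, Any] = field(default_factory=dict)
    log: List[Change] = field(default_factory=list)

    def write(self, user: str, path: str, value: Any) -> None:
        self.log.append(Change(user, path, self.values.get(path, ABSENT), value))
        self.values[path] = value

    def revert_user(self, user: str) -> None:
        # Undo the user's changes newest first, restoring each prior value.
        for change in reversed(self.log):
            if change.user != user:
                continue
            if change.old is ABSENT:
                self.values.pop(change.path, None)
            else:
                self.values[change.path] = change.old


store = TrackedTags()
store.write("njr", "wiki/elements/Name", "Hydrogen")
store.write("joker", "wiki/elements/Name", "Uranium")
store.write("joker", "wiki/elements/Colour", "purple")
store.revert_user("joker")
print(store.values)  # {'wiki/elements/Name': 'Hydrogen'}
```

With a log like this, cleaning up after a vandal is a single call. That is roughly the capability that would make a more liberal write policy less risky.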

Even today, Fluidinfo’s permissions system does offer some powerful controls. We could allow everyone write access to the whole of the wiki namespace by default, but remove abusers. Or we could be more restrictive. We could have groups controlling different tags or namespaces under wiki. We could allow people in cautiously, perhaps requiring endorsement first. There are many possibilities. In the absence of an easy way to revert vandalising changes, we might need to lean towards being more restrictive early on; if reversion capabilities were added to Fluidinfo, together with detailed tracking of which user makes which changes, we might be able to become more liberal.
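
The cautious route of requiring endorsement can be sketched as a closed write list that grows as existing writers vouch for newcomers. The class name, the threshold of two endorsements and the removal method are assumptions made for illustration. Raising required later would be one way to tighten the rules once a domain is well populated.

```python
from collections import defaultdict
from typing import Dict, Set


class EndorsedWriters:
    """Hypothetical closed write list for a wiki namespace: newcomers join
    after a set number of distinct endorsements from existing writers."""

    def __init__(self, founders: Set[str], required: int = 2) -> None:
        self.writers: Set[str] = set(founders)
        self.required = required
        self.endorsements: Dict[str, Set[str]] = defaultdict(set)

    def endorse(self, endorser: str, candidate: str) -> None:
        if endorser not in self.writers:
            raise PermissionError(f"{endorser} is not a writer")
        # A set, so endorsing the same candidate twice counts only once.
        self.endorsements[candidate].add(endorser)
        if len(self.endorsements[candidate]) >= self.required:
            self.writers.add(candidate)

    def remove(self, user: str) -> None:
        # An abuser simply drops off the write list.
        self.writers.discard(user)
        self.endorsements.pop(user, None)


wiki = EndorsedWriters({"njr", "rhiannon"}, required=2)
wiki.endorse("njr", "michael")
wiki.endorse("rhiannon", "michael")
print(sorted(wiki.writers))  # ['michael', 'njr', 'rhiannon']
```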

We could also have loose restrictions when we are trying to build up information in some domain, and potentially tighten them later. (After all, the periodic table is fairly stable, as are the planets, even if Pluto’s not a planet any more.)

Is this a Good Idea?

I don’t know.

I think there is a clear need for Fluidinfo to have some mechanism for detaching non-personal information from a (personal) user; for making reference information available in a more predictable, uniform, cooperatively gathered way. This can happen to some extent through individual initiatives, but something like the idea of a wiki user seems like a possible improvement.

On the other hand, this may just be a recipe for edit wars.

Overall, it seems to me that Wikipedia, and other similar collaborative projects, stand as a kind of existence proof that wiki-like mechanisms can work. So on balance, at the moment, I think there might be some benefit in trying something like the idea of a wiki user.

4 comments:

  1. Anonymous 29/4/11 17:22

    But, but, *everything* is an opinion. It is the opinion of a ton of people that the symbol for Hydrogen is H, but... it was also the opinion of a ton of people that Pluto was a planet.

    A fact is merely where we have decided there is a lack of contention.

    In order to detect contention, we have to have separation of the different opinion holders - a wiki user conflates that possible separation - a wiki user would represent the aggregation of differences. Please let's keep the aggregation of collective meaning at a different layer than Fluid.

    Let's build an app (on top of Fluid) that keeps score of opinions - and let *that* represent some 'truth'. Of course, whose opinions do you include? Well, that's yet another opinion.

    Turtles... all the way down.

    -1 on a wiki user :)

    Terrell

  2. Terrell

    It's a reasonable position; but I don't really agree that everything's opinion. I guess you get into Platonist arguments, but I tend to believe things I can prove, at least. Sure, axioms might change, my proofs may be incorrect, but we seem to make useful progress by agreeing that 1 + 1 = 2. We managed to land people on the Moon by believing Newton's laws (even though they are only approximations; or are "wrong" if you prefer). But the point is, independent of whether Newton's second law is correct, we can agree what Newton's second law states, can't we?

    You contend that "The symbol for Hydrogen is H" is just an opinion. Do you also think it's just an opinion that the first letter of "Hydrogen" is H? It's a coherent position, but seems rather unworldly to me, and somewhat out of keeping with the Fluidinfo design goal of making computers easier for humans to use.

    Are we going to store the same information thousands of times because we all need our own copy (miro/elements/Symbol = "H"; njr/elements/symbols="H"; terrell/elements/symbol = "H")? If not, how do we even remember which one we use? Maybe you trust the miro user; maybe you agree with miro about the chemical symbol for hydrogen. Are you going to tag its miro/elements/symbol tag with a terrell/rating or a terrell/trusts or something? Maybe you'll trust njr too. But what if there's an njr/elements/symbol and a miro/elements/Symbol? How will you even remember?

    Of course, I agree there is conflict and a range of opinion even on "facts". Perhaps you go further and believe that the "neutral point of view" that Wikipedia advocates is an oxymoron.

    I don't feel the need to have my own capital of France, symbol for Hydrogen or sum of 1 and 1; and I mostly know the cases in which I dispute facts that most/many others accept and could probably live with storing my own heretical versions in those cases but embracing the common view elsewhere.

    Practically, it seems to me there is a problem at the moment if people want to share information in Fluidinfo without always having to create their own tags. I tend to think a wiki user could help with that.

    But thanks for disagreeing and elucidating how and why you disagree.

    Nick

  3. I guess I'm skeptical of the wiki user in the same way you describe your hypothetical naysaying of Wikipedia. Fluidinfo is still a young project and a lot of it is very... fluid. Something like the wiki user would depend a lot on how it's implemented and whether people work to keep the vandals under control.

    There would be nothing to stop someone from changing wiki/element/name to 氢 (Chinese for Hydrogen). It's a fact and for someone who reads Chinese, it's quite correct (at least, I hope so). However, it's not English, so an anglophone might edit it back to Hydrogen, but then maybe someone in Russia decides to change the name to Водород, or some joker decides to change it to Uranium or http://127.0.0.1. With a wiki user, edit wars would almost be inevitable.

    At the beginning, Wikipedia really was the encyclopedia that anyone could edit. It's true now, but there are exceptions. Articles can be locked to prevent vandalism, and only editors who have a certain number of edits can make changes to them. Since that doesn't necessarily discourage pranksters, Wikipedia has developed other tools to deal with habitual offenders, such as IP blocks. Today, the edit wars are barely noticeable.

    I think one of the keys to Wikipedia's popularity is its software. MediaWiki has tools for article locking, tracking edit histories, tracking recent changes, tracking individual articles, hierarchical membership, and much, much more. While the tools may be things that people take for granted, many of them only appeared after problems arose. I look forward to Fluidinfo releasing their code at some point, so I can tinker around with it and figure out what it can and can't do.

    Wikipedia also has a global community of editors (tens of thousands on the English Wikipedia alone) who work to create and improve articles and to keep the vandalism to a minimum. Along the way, they came up with redirect pages, disambiguation pages, stub templates, categories, and other things that aid in Wikipedia's organization. Fluidinfo's community is still very small in comparison.

  4. Michael

    I agree with almost all of this. I think some extra infrastructure would be needed (as a minimum, logging who did what and making reversion easy), and I think that using the permissions system would be necessary to block obvious spammers and vandals. (The permissions system seems like a more powerful mechanism than IP blocking, and if we had small hurdles people had to get over before writing, it might be possible to avoid the worst problems of vandalism and spam.)

    The question of language seems to me to be a broader one in Fluidinfo. There are two main possibilities, as far as I can see. One is that we aim to use language-specific objects, perhaps with the language explicitly in the about tag (e.g. "en:book:animal farm (george orwell)"). On balance, I think such separate objects for separate languages are probably desirable. (I should write a post on that.) The other main possibility is that we go in the opposite direction and explicitly aim to have common objects across languages, to try to collect together all the information about something in one place. In that case, we would clearly need separate wiki users for each language, maybe with systematically structured names (en-wiki, fr-wiki etc.) or, perhaps more naturally, just with an appropriate name in each language. (I'm not sure what the French for wiki is, but I'm sure the Académie française will be able to advise.)

    Of course, unarguably, Fluidinfo is small today and a wiki user per se wasn't really envisaged when Fluidinfo was written, but as I say, some mechanisms that might help are there.
