Being the sorry sort of person I am, I’ve spent a fair portion of my last day of freedom reading Patterns and Inconsistencies in Collaborative Tagging Systems : An Examination of Tagging Practices (Kipp, Margaret E. I. and Campbell, D. Grant, 2006).
Basically, they looked at the del.icio.us tags used to describe a number of URLs. In many respects, it’s an experiment that seeks to provide a commentary upon Clay Shirky’s seriously fab defence of tagging, Ontology is Overrated. Fair play to the academics here for engaging seriously with arguments presented on blogs. Shirky’s argument could be summarised by saying that the serious, top-down organisation systems we have are the consequence of the things we have to organise. When you need to sort out something physical and finite like a big pile of books, for example, you need to decide on some sort of over-riding system. That might be author A-Z, title A-Z, subject A-Z or a combination of those. The libraries, CD collections and corporate databases you encounter are probably organised along these lines in one way or another.
When it comes to the Internet, of course, the sheer volume of information overwhelms these ontological approaches. There isn’t a single, useful directory of the Internet. There’s still the Open Directory and the Yahoo Directory, of course, but both have stagnated badly over the last couple of years. When I interviewed Yahoo’s European manager and asked him about the feature that had effectively made its brand famous, he said that the company no longer sees it as a focus for development. I think they’re right: the model isn’t useful any more.
When you’re talking about the Internet and billions of documents, traditional ontologies break down. Even if you could impose them, exactly how much use is a directory containing a million results once you’d drilled down to Literature>Authors>Shakespeare>Works>Hamlet>Criticism?
Very little. And that’s why we use search engines and social bookmarking nowadays; not directories.
So people tag things using del.icio.us according to their own whim. There are enough of us that ‘your own whim’ coincides with a lot of other people’s whims. The more people who are tagging, the better the chance that someone will tag along the same lines as you. That’s in the spirit of Metcalfe’s Law, which says that the value of a network grows in proportion to the square of the number of users. Once there are lots of users, as I’ve suggested before, social search, or non-linear search, becomes possible.
Back to the study. The authors found that there’s actually a lot of user consensus about which tags to use for a lot of sites. Shirky suggested that there’s a power-law style falling off in the tags used to describe a site, like this:

The research found that this varied radically according to the URL chosen. Often there is a much gentler fall-off than the power-law description would suggest. These are some of the results for www.pocketmod.com, a clever site which allows users to create printed personal organisers on a single sheet of paper.
It looks a lot like we agree on a lot of things according to this analysis. Most users apply one to three tags, apparently, so the graph suggests we’re likely to use a tag that someone else has used too. Good news for anyone using del.icio.us for search.
There are two big issues, though, that arguably cripple the true power of the del.icio.us service to provide social search; one is technical and the other is about user behaviour.
The technical issue is that del.icio.us doesn’t allow multi-word tags. Nor does it understand that ‘This’ is the same as ‘this’. You have to improvise to create a tag when only a two-word description seems appropriate. You might tag the same thing, del.icio.us itself perhaps, as ‘socialbookmarking’, ‘social-bookmarking’, ‘social&bookmarking’ or ‘social_bookmarking’. God forbid capital letters get thrown into the mix.
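To make the problem concrete, here is a minimal sketch of how those improvised variants could be folded back into one tag. The function name `normalize_tag` and the list of separators are my own assumptions for illustration; this is not something del.icio.us does, and a real system would need to decide which characters count as separators.

```python
import re
from collections import Counter

def normalize_tag(tag: str) -> str:
    """Lowercase a tag and drop the separators people improvise with.

    Hypothetical helper: the separator set (whitespace, -, _, &, +)
    is an assumption, not a del.icio.us rule.
    """
    return re.sub(r"[\s\-_&+]+", "", tag.strip().lower())

# The variants mentioned above, plus one with capital letters
variants = [
    "socialbookmarking",
    "social-bookmarking",
    "social&bookmarking",
    "social_bookmarking",
    "Social_Bookmarking",
]

# After normalization all five spellings count as the same tag
counts = Counter(normalize_tag(t) for t in variants)
print(counts)
```

The obvious trade-off is that squashing separators can also merge tags that were meant to be different, so this is a starting point rather than a fix.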
The study conducted ‘co-word analysis’ on the sites it examined. Co-word analysis looks at how frequently two tags crop up together and helps analysts try to establish patterns of behaviour. Unfortunately, since ‘social bookmarking’ is a term rather than a word, it was difficult to establish whether it was a common meme or not.
The analysts examined www.bellybytes.com, looking for clusters of tags, but because ‘Nutrition’ isn’t the same as ‘nutrition’, the cluster you would expect isn’t there.
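For anyone curious what counting co-occurring tags might look like, here is a rough sketch. It is not the method used in the paper, just pair counting over bookmarks; the function `coword_counts`, its `fold_case` switch and the sample bookmarks are all made up for illustration.

```python
from collections import Counter
from itertools import combinations

def coword_counts(bookmarks, fold_case=False):
    """Count how often each unordered pair of tags appears on the same bookmark."""
    pairs = Counter()
    for tags in bookmarks:
        if fold_case:
            tags = [t.lower() for t in tags]
        # set() drops repeats within one bookmark; sorted() keeps pair order stable
        for a, b in combinations(sorted(set(tags)), 2):
            pairs[(a, b)] += 1
    return pairs

# Made-up bookmarks for illustration, not data from the study
bookmarks = [
    ["Nutrition", "health"],
    ["nutrition", "health"],
    ["nutrition", "Health", "diet"],
]

raw = coword_counts(bookmarks)                     # case-sensitive, as on del.icio.us
folded = coword_counts(bookmarks, fold_case=True)  # case-insensitive

print(raw.most_common(3))
print(folded.most_common(3))
```

With case left alone, no pair shows up more than once in this toy data. Fold the case and ‘health’ plus ‘nutrition’ appears on all three bookmarks, which is the sort of cluster you would hope to see.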

The second issue comes about because del.icio.us bookmarking is ostensibly just for you: the benefit for the community is a by-product in many respects. One of the most frequently used tags on del.icio.us is ‘to read’, followed by things like ‘GTD’ or ‘cool’. These tags are about the user, not the subject. They aren’t helpful to anyone else. They are necessarily temporal — you either get round to reading it or you never do; things stop being cool after a while; either you GTD’d it, or you didn’t. In any case it loses its usefulness.
Fortunately, of course, no-one searches for the tag ‘to read’. You’d end up with a load of over-long blog posts like this one. However, the authors hint at the possibilities of a two-dimensional del.icio.us, where time is the second axis. The tags ‘to read’, ‘cool’ and ‘gtd’ might potentially become a common, vital property that the system is able to rank according to its age. If you were able to search on the tags you wanted plus a current ‘cool’ list then that would be ermm.. cool.
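Here is one way the time axis could work, as a sketch only. I am assuming each use of a tag fades with age, halving in weight every 30 days; the function `recency_scores`, the half-life, the URLs and the dates are all invented for illustration and are not from the paper or from del.icio.us.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def recency_scores(events, tag, now, half_life_days=30.0):
    """Score each URL by its uses of `tag`, halving a use's weight every half_life_days."""
    scores = defaultdict(float)
    for url, used_tag, when in events:
        if used_tag != tag:
            continue
        age_days = (now - when).total_seconds() / 86400
        scores[url] += 0.5 ** (age_days / half_life_days)
    return dict(scores)

# Invented tagging events: (url, tag, time the tag was applied)
now = datetime(2007, 1, 1)
events = [
    ("http://example.com/old", "cool", now - timedelta(days=300)),
    ("http://example.com/old", "cool", now - timedelta(days=290)),
    ("http://example.com/new", "cool", now - timedelta(days=2)),
    ("http://example.com/new", "toread", now - timedelta(days=1)),
]

scores = recency_scores(events, "cool", now)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

In this toy data the page tagged ‘cool’ once, two days ago, outranks the page tagged ‘cool’ twice, most of a year ago. How quickly ‘cool’ should fade is exactly the sort of thing you would have to tune.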

Great post, Ian — and thanks for reading the tagging paper (don’t think I need to now ;-)).
I have tagged this post because it’s brilliant. I’m using Blue Dot instead of del.icio.us since I read a rave review of it yesterday. It really does seem to be quite good so far and you can import all your del.icio.us bookmarks.
BTW, the cool thing I have found about del.icio.us in my research is that it can be an instant snapshot of what that community feels about a site or a number of similar sites. The number of tags and the complexity of the tag cloud generally reflect how useful each site is relative to its peers, and why.
That’s a great point about using del.icio.us as a barometer for the community’s feelings. And not a use I had thought of before.
(Agree on bluedot. I use it too and it does a great job. Their plug-in for IE7 is really good too. I am ‘iand’ on there if you want to befriend me ;) )
Del.icio.us also offers up “popular tags” when you go to tag something — i.e. a selection of what others have used to tag the same item. I wonder how often users take these recommendations, and how that might’ve affected the study? I know I sometimes will add a tag or two from the “popular tags” list, just so others who use my del.icio.us tags will be able to find the item, not just me.
I do that, too, Robin. I kind of consider it ‘good citizenship’ on del.icio.us to make my bookmarks useful to other people. I wonder if most of us think that way, either consciously or unconsciously?
One point regarding the temporal aspect of Del.icio.us tags. Where does that fit in with Chris Anderson’s remarks on the longtail of journalism which suggested that the older the material the greater value it had?
I don’t have the answer!
I guess… del.icio.us bookmarking is metadata anyway, in some senses. Your tags aren’t registered by Google and don’t influence PageRank AFAIK. Same with digg and other social bookmarking services. However, it does empower social search, in a way. Adding ‘toread’ on the end of a del.icio.us search would help you narrow down to the articles people thought were important at the time they were published. Not sure whether that answers the question or not.
hello
can someone give me an idea of what are the common points of web 2.0 sites