Of Tags and Taxonomy

Being the sorry sort of person I am, I’ve spent a fair portion of my last day of freedom reading Patterns and Inconsistencies in Collaborative Tagging Systems: An Examination of Tagging Practices (Kipp, Margaret E. I. and Campbell, D. Grant, 2006).

Basically, they looked at the del.icio.us tags used to describe a number of URLs. In many respects, it’s an experiment that seeks to provide a commentary upon Clay Shirky’s seriously fab defence of tagging, Ontology is Overrated. Fair play to the academics here for engaging seriously with arguments presented on blogs. Shirky’s argument could be summarised by saying that the serious, top-down organisation systems we have are the consequence of the things we have to organise. When you need to sort out something physical and finite like a big pile of books, for example, you need to decide on some sort of over-riding system. That might be author A-Z, title A-Z, subject A-Z or a combination of those. The libraries, CD collections and corporate databases you encounter are probably organised along these lines in one way or another.

When it comes to the Internet, of course, the sheer volume of information overwhelms these ontological approaches. There isn’t a single, useful directory of the Internet. There’s still the Open Directory and the Yahoo directory, of course, but both have stagnated badly over the last couple of years. When I interviewed Yahoo’s European manager and asked him about the feature that had effectively made its brand famous, he said that the company now feels it’s not a focus for development. I think they’re right: the model isn’t useful any more.

When you’re talking about the Internet and billions of documents, traditional ontologies break down. Even if you could impose them, exactly how much use is a directory containing a million results once you’d drilled down to Literature>Authors>Shakespeare>Works>Hamlet>Criticism?

Very little. And that’s why we use search engines and social bookmarking nowadays, not directories.

So people tag things using del.icio.us according to their own whim. There are enough of us that ‘your own whim’ coincides with a lot of other people’s whims. The more people who tag, the greater the chance that some of them will tag along the same lines as you. That’s the gist of Metcalfe’s Law: the value of a network increases as the square of the number of users. Once there are lots of users, as I’ve suggested before, social search, or non-linear search, becomes possible.
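Metcalfe’s Law rests on simple arithmetic: n users can form n(n-1)/2 distinct pairs, which grows roughly as n². A short Python sketch of that count (the function name `possible_links` is just for illustration):

```python
def possible_links(n: int) -> int:
    """Number of distinct pairs that can be formed among n users: n(n-1)/2."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return n * (n - 1) // 2


if __name__ == "__main__":
    print(possible_links(10))    # 45
    print(possible_links(100))   # 4950
    print(possible_links(1000))  # 499500
```

Ten times as many users gives roughly a hundred times as many possible connections, which is the intuition behind the "square of the number of users".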

Back to the study. The authors found that there’s actually a lot of user consensus about which tags to use for a lot of sites. Shirky suggested that there’s a power-law-style fall-off in the tags used to describe a site, like this:

[Figure: power-law fall-off of tags]

The research found that this varied radically according to the URL chosen. Often there is a much gentler fall-off than the power-law description would suggest. These are some of the results for www.pocketmod.com, a clever site which allows users to create printed personal organisers on a single sheet of paper:

[Figure: tag frequencies for www.pocketmod.com]

It looks as though we agree on a lot of things, according to this analysis. Most users apply one to three tags, apparently, so the graph suggests that we are likely to use a tag that someone else has also used. Good news for anyone using del.icio.us for search.
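The difference between Shirky’s steep power-law curve and the gentler fall-off seen for pocketmod can be boiled down to one number. A minimal sketch, assuming we have raw per-tag counts for a single URL: rank the tags by frequency and fit a straight line to log(frequency) against log(rank). A slope around -1 or steeper looks power-law-like; a slope near 0 means the top tags are used almost equally often. This is a crude diagnostic, not the method Kipp and Campbell used, and the tag counts below are invented for illustration.

```python
import math
from typing import Dict


def loglog_slope(tag_counts: Dict[str, int]) -> float:
    """Least-squares slope of log(frequency) against log(rank).

    Tags are ranked by descending count (rank 1 = most used).
    """
    freqs = sorted((c for c in tag_counts.values() if c > 0), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two tags with non-zero counts")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)  # > 0 because ranks are distinct
    return num / den


if __name__ == "__main__":
    # Hypothetical tag counts, invented for illustration only.
    steep = {"web": 100, "tools": 50, "design": 33, "cool": 25, "paper": 20}
    gentle = {"organizer": 100, "gtd": 90, "productivity": 85, "paper": 80, "diy": 78}
    print(f"steep:  {loglog_slope(steep):.2f}")
    print(f"gentle: {loglog_slope(gentle):.2f}")
```

A fitted line on a log-log plot is a rough guide rather than proof of a power law, but it is enough to separate a curve like Shirky’s from a flatter one.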

There are two big issues, though, that arguably cripple the true power of the del.icio.us service to provide social search; one is technical and the other is about user behaviour.

The technical issue is that del.icio.us doesn’t allow multi-word tags. Nor does it understand that ‘This’ is the same as ‘this’. You have to improvise to create a tag when only a two-word description seems appropriate. You might tag the same thing, del.icio.us itself perhaps, as ‘socialbookmarking’, ‘social-bookmarking’, ‘social&bookmarking’ or ‘social_bookmarking’. God forbid capital letters get thrown into the mix.
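del.icio.us itself doesn’t reconcile these variants, but it isn’t hard to imagine a service that did. A minimal Python sketch, assuming a hypothetical `normalize_tag` helper that case-folds each tag and strips the separators people improvise with (whitespace, hyphens, underscores, ampersands and plus signs; the exact set is an assumption):

```python
import re
from collections import Counter
from typing import Iterable

# Separators people improvise with when a tag can't contain a space.
# This set is an assumption for illustration, not del.icio.us behaviour.
_SEPARATORS = re.compile(r"[\s_&+-]+")


def normalize_tag(tag: str) -> str:
    """Case-fold a tag and remove improvised word separators."""
    return _SEPARATORS.sub("", tag.casefold())


def merge_variants(tags: Iterable[str]) -> Counter:
    """Count tags after normalisation, so spelling variants pool their counts."""
    return Counter(normalize_tag(t) for t in tags)


if __name__ == "__main__":
    raw = ["socialbookmarking", "social-bookmarking", "social&bookmarking",
           "social_bookmarking", "Social_Bookmarking"]
    print(merge_variants(raw))  # Counter({'socialbookmarking': 5})
```

The trade-off is that aggressive folding can merge tags that were genuinely meant to be different, so a real service would need to be more careful than this.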

The study conducted ‘co-word analysis’ on the sites it examined. Co-word analysis looks at how frequently two tags crop up together and helps analysts try to establish patterns of behaviour. Unfortunately, since ‘social bookmarking’ is a term rather than a single word, it was difficult to establish whether it was a common meme or not.

The analysts examined the tags for www.bellybytes.com, looking for clusters, but because ‘Nutrition’ isn’t the same as ‘nutrition’, the cluster you would expect isn’t there.

[Figure: co-word analysis of tags for www.bellybytes.com]
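At its simplest, co-word analysis is a count of how often each pair of tags appears on the same bookmark. The sketch below uses a hypothetical `coword_counts` function and invented bookmarks (not the study’s data or code) to show how the ‘Nutrition’/‘nutrition’ split weakens an expected pairing until case is folded:

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List


def coword_counts(bookmarks: Iterable[List[str]], fold_case: bool = False) -> Counter:
    """Count how often each unordered pair of tags appears on the same bookmark."""
    pairs: Counter = Counter()
    for tags in bookmarks:
        cleaned = {t.casefold() if fold_case else t for t in tags}
        # Sort so ('diet', 'nutrition') and ('nutrition', 'diet') share one key.
        for pair in combinations(sorted(cleaned), 2):
            pairs[pair] += 1
    return pairs


if __name__ == "__main__":
    # Hypothetical bookmarks for a nutrition site; not data from the study.
    bookmarks = [
        ["Nutrition", "health"],
        ["nutrition", "health"],
        ["nutrition", "diet"],
    ]
    print(coword_counts(bookmarks)[("health", "nutrition")])                  # 1
    print(coword_counts(bookmarks, fold_case=True)[("health", "nutrition")])  # 2
```

Without case folding, the first bookmark is counted under the separate pair ('Nutrition', 'health'), so the cluster looks weaker than it really is.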

The second issue comes about because del.icio.us bookmarking is ostensibly just for you: the benefit for the community is in many respects a by-product. One of the most frequently used tags on del.icio.us is ‘to read’, followed by things like ‘GTD’ or ‘cool’. These tags are about the user, not the subject, and they aren’t helpful to anyone else. They are necessarily temporal: you either get round to reading it or you never do; things stop being cool after a while; either you GTD’d it, or you didn’t. Either way, the tag loses its usefulness.

Fortunately, of course, no-one searches for the tag ‘to read’. You’d end up with a load of over-long blog posts like this one. However, the authors hint at the possibilities of a two-dimensional del.icio.us, where time is the second axis. Tags like ‘to read’, ‘cool’ and ‘gtd’ could become a common, vital property that the system ranks by age. If you were able to search on the tags you wanted plus a current ‘cool’ list, then that would be ermm... cool.
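One way to picture that time axis: record when each tag was applied and let ‘cool’ votes fade with age. A minimal sketch, assuming a hypothetical (url, tag, timestamp) data model and an arbitrary 30-day half-life; neither comes from del.icio.us or the paper, and `fresh_cool_search` is an invented name:

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Set, Tuple

# One tagging event: (url, tag, when the tag was applied).
# Hypothetical data model, not the del.icio.us API.
Tagging = Tuple[str, str, datetime]


def fresh_cool_search(events: Iterable[Tagging], wanted: Set[str],
                      now: datetime, half_life_days: float = 30.0) -> List[str]:
    """URLs carrying every tag in `wanted`, ranked by time-decayed 'cool' votes.

    Each 'cool' tag contributes 0.5 ** (age / half_life_days), so a vote loses
    half its weight every half_life_days. URLs with no 'cool' votes score 0.
    """
    tags_by_url: Dict[str, Set[str]] = defaultdict(set)
    score: Dict[str, float] = defaultdict(float)
    for url, tag, when in events:
        tags_by_url[url].add(tag)
        if tag == "cool":
            # Clamp so a future timestamp can't outweigh a fresh vote.
            age_days = max(0.0, (now - when).total_seconds() / 86400)
            score[url] += 0.5 ** (age_days / half_life_days)
    matches = [url for url, tags in tags_by_url.items() if wanted <= tags]
    return sorted(matches, key=lambda url: score[url], reverse=True)


if __name__ == "__main__":
    now = datetime(2006, 10, 1)  # arbitrary reference date
    events = [
        ("a.example", "productivity", now - timedelta(days=400)),
        ("a.example", "cool", now - timedelta(days=400)),
        ("b.example", "productivity", now - timedelta(days=5)),
        ("b.example", "cool", now - timedelta(days=5)),
        ("c.example", "design", now - timedelta(days=1)),
    ]
    print(fresh_cool_search(events, {"productivity"}, now))  # ['b.example', 'a.example']
```

Both sites match the ‘productivity’ search, but the one that was called cool last week outranks the one that was cool over a year ago.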


8 comments to Of Tags and Taxonomy

  • Great post, Ian — and thanks for reading the tagging paper (don’t think I need to now ;-)).

    I have tagged this post because it’s brilliant. Since reading a rave review of it yesterday, I’m using Blue Dot instead of del.icio.us. It really does seem to be quite good so far, and you can import all your del.icio.us bookmarks.

    BTW, the cool thing I have found about del.icio.us in my research is that it can be an instant snapshot of what that community feels about a site or a number of similar sites: the number of tags and the complexity of the tag cloud generally reflect how useful each site is relative to its peers, and why.

  • That’s a great point about using del.icio.us as a barometer for the community’s feelings. And not a use I had thought of before.

    (Agree on bluedot. I use it too and it does a great job. Their plug-in for IE7 is really good too. I am ‘iand’ on there if you want to befriend me ;) )

  • Del.icio.us also offers up “popular tags” when you go to tag something, i.e. a selection of what others have used to tag the same item. I wonder how often users take these recommendations, and how that might’ve affected the study. I know I sometimes will add a tag or two from the “popular tags” list, just so others who use my del.icio.us tags will be able to find the item, not just me.

  • I do that, too, Robin. I kind of consider it ‘good citizenship’ on del.icio.us to make my bookmarks useful to other people. I wonder if most of us think that way, either consciously or unconsciously?

  • One point regarding the temporal aspect of del.icio.us tags. Where does that fit in with Chris Anderson’s remarks on the long tail of journalism, which suggested that the older the material, the greater its value?

    I don’t have the answer!

  • I guess… del.icio.us bookmarking is metadata anyway, in some senses. Your tags aren’t registered by Google and don’t influence PageRank AFAIK. The same goes for Digg and other social bookmarking services. However, it does empower social search, in a way. Adding ‘toread’ on the end of a del.icio.us search would help you narrow down to the articles people thought were important at the time they were published. Not sure whether that answers the question or not.

  • sarava

    Hello,
    can someone give me an idea of what web 2.0 sites have in common?

  • […] been a little while since I wrote on this topic, but social (and not-so-social) bookmarking is on the rise […]
