PCs Powered by the Wisdom of Crowds

Physorg.com - daily reading in the Delaney household - has posted an interesting piece on how researchers at the Technion-Israel Institute of Technology have fed their computers a diet of Wikipedia to make them more clever. While Web 2.0 sceptics might have preferred the Britannica, the point of the exercise is to make the machines capable of forming associations, the same way you and I do when we think.

It seems to me like one more step towards the Holy Grail: a next-gen, Google-killer search engine. Google can already do synonyms: when you search for cars, it also returns results that talk about automobiles. What it can’t do is make associations. If you search for ‘Iraq War’, it won’t return results on ‘axis of evil’, ‘Bush foreign policy’ or ‘mutually-assured destruction’, because it doesn’t know what topics are associated with your search term. A machine fed a diet of (mainly) intelligent discussion about as many topics as possible should be able to find pages that are relevant to your search term but aren’t keyword-heavy.
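To make the idea a bit more concrete, here is a toy sketch in Python of how association might be scored. It is not the researchers' method, which the article doesn't spell out. I'm assuming each piece of text can be described by how strongly it overlaps with a handful of encyclopedia-style articles, and that two texts are related when those descriptions point the same way. The three "articles" and the function names are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical stand-ins for encyclopedia articles; titles and text are invented.
ARTICLES = {
    "Iraq War": "iraq war invasion bush foreign policy troops baghdad",
    "Axis of evil": "axis of evil bush speech iraq iran north korea foreign policy",
    "Automobile": "car automobile engine wheels road driver",
}


def concept_vector(text):
    """Weight each article by how many words it shares with the text."""
    words = Counter(text.lower().split())
    vector = {}
    for title, body in ARTICLES.items():
        body_words = Counter(body.split())
        vector[title] = sum(n * body_words[w] for w, n in words.items())
    return vector


def cosine(a, b):
    """Cosine similarity between two vectors keyed by article title."""
    dot = sum(a[k] * b[k] for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def relatedness(x, y):
    """Score how related two texts are via the articles they touch."""
    return cosine(concept_vector(x), concept_vector(y))


if __name__ == "__main__":
    print(relatedness("iraq war", "axis of evil"))
    print(relatedness("iraq war", "car engine"))
```

Note that ‘iraq war’ and ‘axis of evil’ have no words in common, yet they score as related because both overlap with the same article, whereas ‘car engine’ does not. A real system would need far more articles and smarter weighting than raw word counts.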

The second example given is a spam filter. A simple filter might block anything containing the word ‘vitamin’ that comes from a stranger. A filter which has been taught a little more would know that ‘B12’ is a vitamin and be able to distinguish scientific discussion from a sales pitch.

The method could also apparently be used to improve automated translations. When a simple translator comes across the word ‘mouse’, it doesn’t know if it’s a rodent or a computer peripheral. If it knew enough about the context in which the word appeared, though, it would be able to disambiguate the passage it was working on.
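A rough sketch of that kind of disambiguation, again in Python and again only an illustration: I'm assuming each sense of a word comes with a set of cue words (the kind of thing a machine could in principle pick up from encyclopedia articles), and the sense whose cues overlap most with the surrounding sentence wins. The cue lists and the `disambiguate` function are made up for this example.

```python
# Hypothetical cue words for each sense of "mouse"; invented for illustration.
SENSES = {
    "rodent": {"cheese", "cat", "tail", "trap", "animal", "squeak"},
    "computer peripheral": {"click", "cursor", "keyboard", "screen", "usb", "button"},
}


def disambiguate(sentence, senses=SENSES):
    """Return the sense whose cue words overlap most with the sentence.

    Returns None when no cue word appears in the sentence at all.
    """
    cleaned = sentence.lower().replace(".", " ").replace(",", " ")
    context = set(cleaned.split())
    scores = {sense: len(context & cues) for sense, cues in senses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(disambiguate("The cat chased the mouse under the trap."))
    print(disambiguate("Move the mouse to position the cursor, then click the button."))
```

With no helpful context (‘I saw a mouse’), the function gives up and returns `None`, which is about as much as a human translator could do with that sentence too.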

The article doesn’t mention the expression, but this is very much along the lines on which the semantic web is supposed to work. Wikipedia provides an ontology for machines to have some understanding of human text. It isn’t quite artificial intelligence, but it’s quite close: the machines use our intelligence to simulate their own.

N.B. Interestingly, my friend Marc Fawzi described exactly this idea in a piece he posted on the subject last June.


4 Comments

That Wikipedia 3.0 idea really made an impression on many people. It may have influenced Google Co-Op, Wikia’s revived search engine project, this project you mention here and other projects to come.

I’ll have more stuff happening on Evolving Trends and a sister site that will be dedicated to disruptive software development (a la semantics, AI, and nifty little things).

:)

Marc

Ian, I would be surprised if someone ‘enterprising’ has not thought of using Wikipedia to generate dummy content for spam mails to beat the spam filters themselves.

Good point, Vern. I’m sure we both already get those ‘literary’ spam mails that attempt a similar tactic to try to cheat filters.

Or … you can use this technique, which will generate an infinite number of well-formed sentences, some of which make perfect sense.

http://evolvingtrends.wordpress.com/2007/01/13/self-aware-text/

Would be nice to program the SK spin-glass model and see what kind of sentences get generated courtesy of 1/f quantum noise.

I have a friend who might attempt translating the model for this application.

There has been at least one formally published scientific paper discussing the applicability of the spin glass model to the semantic web.

I’ve been into this since ’87, but on the neural networks side, not the potential application of the model to semantics.

Sorry to be so verbose about such a niche/obscure topic.

Marc

