The Semantic Lunch


Lunch today with John Davies, who’s in charge of next-web research for BT. It was quite a long, or rather intense, discussion, so I’ll only tackle the basics here. I’ve been trying to nail this semantic web issue for some time, but every time I start reading an academic paper, my attention seems to wander off. So this was a good opportunity for me. I wasn’t going to deviate. As soon as he sat down, I was in with my carefully prepared, top journalist’s question: “so what’s this semantic web thingy, then?”

It turns out that that is one of the more difficult questions. (Damn!) It depends on what you mean. You might mean making the billions of existing web sites semantic, or you might mean only possible future sites and services. The second of these is the more likely outcome at present. The semantic web is partly about annotating web pages to make them amenable to machines. John prefers the expression ‘semantic technologies’ to avoid this confusion.

At the moment, information on the web is pretty much designed for human consumption. You and I know when we go to a shopping site that the figure in bold is the price, that a certain number is the product code and that this piece of information is about the shipping information. To a machine, it may make no sense whatsoever. If machines are to be able to bring together all these different pages to make the web more useful, then they need to be able to read them.

We’ll see the first applications of semantic technologies in the enterprise space, where the need is more acute. Enterprises have lots of databases, all built by different people according to different rules. Integrating the information from those is already a very costly and time-consuming activity. One database may talk about CustomerName, another may refer to CustomerID, for example. Joining these things together, so that, say, a support department knows what equipment the logistics department has installed for a customer, improves business efficiency. Semantic technologies put what Davies called a “wrapper” around these different data sources to create overarching access, connecting them in a way that doesn’t require nearly so much human effort.
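To make the “wrapper” idea a bit more concrete, here is a minimal Python sketch of my own, not anything John showed me. It assumes two made-up departmental databases, and assumes (for simplicity) that CustomerName and CustomerID happen to hold the same identifying value. Everything except those two field names, including `FIELD_MAP`, `wrap` and `unified_view`, is invented for illustration.

```python
from collections import defaultdict

# Hypothetical records from two departmental databases. Only the field
# names CustomerName and CustomerID come from the example in the text;
# the other fields and all values are invented for this sketch.
support_db = [{"CustomerName": "Acme Ltd", "OpenTickets": 2}]
logistics_db = [{"CustomerID": "Acme Ltd", "InstalledEquipment": "Router X"}]

# The "wrapper": a per-source mapping from local field names to one
# shared vocabulary. Assumption: both customer fields carry the same key.
FIELD_MAP = {
    "support": {"CustomerName": "customer", "OpenTickets": "open_tickets"},
    "logistics": {"CustomerID": "customer", "InstalledEquipment": "equipment"},
}


def wrap(source, record):
    """Translate one source record into shared-vocabulary terms."""
    mapping = FIELD_MAP[source]
    return {mapping[field]: value for field, value in record.items()}


def unified_view(sources):
    """Merge the wrapped records from every source, keyed by customer."""
    view = defaultdict(dict)
    for source, records in sources.items():
        for record in records:
            wrapped = wrap(source, record)
            view[wrapped["customer"]].update(wrapped)
    return dict(view)


view = unified_view({"support": support_db, "logistics": logistics_db})
# The support desk can now see what logistics installed.
print(view["Acme Ltd"]["equipment"])
```

The point of the sketch is only that the mapping is declared once per source, rather than someone hand-writing a new integration for every pair of databases. Real semantic tools express that mapping in an ontology rather than a Python dictionary.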

People developing semantic technologies start by building an ontology that describes the sort of data the technology is looking at, and the technology can then do some reasoning based on it. An ontology is a form of classification system for whatever is being examined. For foods, that might include their ingredients, nutritional properties, suppliers and type. It’s not just a list, though: it also captures the relationships between different items. An ontology developed for food might include E101, additives and the CrispyPop bar. It will know that E101 falls under additives, which are part of the ingredients of the bar. If that description then gets combined with a wholesaler’s database of shops where you might send the bar, the semantic agent will work out that health food shops aren’t going to stock CrispyPop bars. It’s not intelligence in any way, but the application of rules that the creators have decided upon.
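The CrispyPop reasoning can be sketched in a few lines of Python. This is my own toy version, assuming a tiny hand-written hierarchy and a made-up rule that health food shops refuse anything containing an additive. The dictionaries, the extra ingredient and the function names are all hypothetical. A real system would use an ontology language and a reasoner rather than plain dictionaries.

```python
# Class hierarchy: each term points to the class it falls under.
SUBCLASS_OF = {"E101": "Additive"}

# What each product contains ("Oats" is invented for the sketch).
CONTAINS = {"CrispyPop bar": {"E101", "Oats"}}

# Rules decided by the ontology's creators (hypothetical): the classes
# of ingredient that each type of shop will not accept.
SHOP_EXCLUDES = {"health food shop": {"Additive"}, "newsagent": set()}


def classes_of(item):
    """Return the item plus every class above it in the hierarchy."""
    found = {item}
    while item in SUBCLASS_OF:
        item = SUBCLASS_OF[item]
        found.add(item)
    return found


def will_stock(shop_type, product):
    """Apply the exclusion rule to every ingredient of the product."""
    excluded = SHOP_EXCLUDES[shop_type]
    for ingredient in CONTAINS[product]:
        if classes_of(ingredient) & excluded:
            return False
    return True


print(will_stock("health food shop", "CrispyPop bar"))
print(will_stock("newsagent", "CrispyPop bar"))
```

Nothing here knows what a health food shop is. It just follows the chain from E101 up to Additive and applies the rule it was given.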

Because the semantic technologies are lightweight and open source, they are potentially available to any company. For this reason, enterprises that get some of their data from external sources will still be able to use semantic approaches to integrate and drill into the resulting combination. These are my words, not John’s, but one way to think of the technologies is as providing a toolkit for more easily creating web mashups. Companies already exist, such as Cerebra and Ontoprise, to sell ways to integrate enterprise information using OWL, the Web Ontology Language.

I kind of understood, so far, but I needed a good example. I suspect you may be the same.

BT works closely with the National Health Service in the UK. The Service has already gone a long way into digitising and collating its information through the Electronic Patient Record system and also information on medical knowledge through the SNOMED classification system. Unfortunately, though, the data can still be very dispersed. The X-Ray department might have a patient’s data on a different system to the Pharmacy, for example, and those might be completely unconnected to the systems used in a different hospital or by a clinic.

What semantic web technologies bring to this is unity and also what John called Description Logics. They can prise open the different databases and allow an overview, but also calculate with the data. Imagine a patient’s medical record says that they are allergic to almonds. Then a doctor misses this and somehow prescribes a nut-based food. When the nurse enters this into the patient’s record, the semantic application will use its ontology to work out that almonds are nuts and that therefore this is a very bad idea. Semantic technologies that can perform calculations like this, and potentially save lives, are already in use in the UK Health Service.
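Here is roughly how I picture the almond check, as a Python sketch of my own. It bears no relation to the actual NHS systems, which I haven’t seen. The hierarchy, the patient record and the function names are all assumptions. The one twist worth showing is that the allergy (almond) is more specific than the thing prescribed (nut-based), so the check has to look up the hierarchy in both directions.

```python
# Hypothetical fragment of a medical ontology: term -> broader class.
SUBCLASS_OF = {"almond": "nut", "walnut": "nut"}


def ancestors(term):
    """List every broader class above a term."""
    chain = []
    while term in SUBCLASS_OF:
        term = SUBCLASS_OF[term]
        chain.append(term)
    return chain


def related(a, b):
    """True if the terms are equal or one falls under the other."""
    return a == b or a in ancestors(b) or b in ancestors(a)


def check_entry(patient, item_name, item_class):
    """Return a warning for each allergy the new entry may conflict with."""
    return [
        f"Warning: {patient['name']} is allergic to {allergen}; "
        f"{item_name} may conflict."
        for allergen in patient["allergies"]
        if related(allergen, item_class)
    ]


# Invented patient record for the sketch.
patient = {"name": "Patient A", "allergies": ["almond"]}

# The nurse enters a nut-based food, classified under "nut".
warnings = check_entry(patient, "nut-based food", "nut")
for warning in warnings:
    print(warning)
```

A real Description Logic reasoner does far more than walk a dictionary, but the principle is the same: the link between almonds and nuts is stated once in the ontology, and every application that uses it gets the check for free.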

I’ll leave this there for today. My head hurts already. If there’s interest, I’ll be happy to do a follow-up on another day.

7 Comments

Ian, I share your pain, it’s the semantic migraine ;-)

But I really want to understand! So, with that in mind I went to a lecture on ‘Ontologies & The Semantic Web’ given by Professor Ian Horrocks at the Royal Society in December last year. It was a great talk and it actually helped me over the basic hurdles. However, some of the details were too much (read my report on the talk and you’ll see exactly where my brain melted).

At a recent Beers & Innovation event (which I organise) on Web Services & Mash Ups, Simon Willison who’s Technology Dev in Yahoo! said that the development of web data has come in three broad stages: unstructured, structured and standardised. We’re still struggling to get data structured at the minute.

Tom Loosemore, Project Director for BBC 2.0, said at the same event that standards cannot be fixed when the environment is not mature. Tim Berners-Lee would now find it hard to argue that the web has not grown as a mess, Tom said, and he stressed that standards emerge and that imposing them stifles innovation. Interesting point. And of course I don’t dream that he is saying they emerge unconsciously, but from collaboration and testing I suppose….

Anyways, that’s my less than 2 cents. Horrocks’s talk was good and I didn’t see any other write-ups of it, but then most of the other people there weren’t random non-sciencey types like me ;-)

Hey… how come you have no link to the set of articles (or at least the original article) I had published that seeded your interest in the semantic Web?

;-)

Marc

My guy was quite keen to separate the idea of annotating individual web pages through xml from the idea of semantic web services.
In fact, he was quite disparaging about Microformats because they are prescriptive of the content they are used to represent. Semantic web services are designed to be descriptive, and thus inherently more flexible. While the ontologies are formal frameworks, the documents they describe can be anything.
Hang on, my headache’s coming back.

Marc - I’ll find some pretext for writing about it soon. ;)

We’re bloggers, we need no stinking pretexts!

BTW, I used to think it’s 98 all over again with Web 2.0 and all the hype. Now I think it’s 95 all over again. Well, at least in the area I’m in, which is experiencing the same growth the Web was experiencing in 95.

Your blog’s template/html has been evolving faster than any blog out there and somehow it has gone from good to better through all those changes. Thumbs up for getting rid of the Google NonSense ads.

:)

Marc

Like you, I blog for lots of reasons: personal pleasure, because it helps me work things out, to have a bit of dialogue about those things, to perhaps increase the audience for the eventual book, to maybe get some other work, to attract sponsorship.

Google doesn’t really factor into *any* of that. Sod ‘em.

Google’s problem with the semantic web is interesting. A web site should have a machine-readable and machine-understandable version of its content, whose current form is a microformat. But given spammers’ abuse of existing web page descriptors, such as meta names, Google is not too impressed.

I think it is a minor issue and, as you say, annotating the current web should be separate from the overall semantic web project. Yet it sadly intrudes into the debate, as in Berners-Lee’s recent Question & Answer session with Google’s Norvig.

