Not the band or that famous book; you’ll need to go elsewhere for those. This is the backstory to Muddy, which we thought would be nice to share because, although Muddy is essentially ‘middleware’, we like to think of it as a ‘consumable’: an application with a life, one that real people use and engage with (albeit not that many!).
Way back in March 2007, Rob and I submitted an idea to the BBC Labs, then run by Matt Locke, to improve the (horizontal) navigation across the BBC by grounding news articles in ‘subjects’ people could peruse. It wasn’t an earth-shattering idea, but it came from our frustration with a BBC News experience that was still about ‘pages’ and very ‘flat’ (it’s improved since then). So, together with Paul Farnell (a designer friend and CEO of Litmus), we spent five days in North Yorkshire taking the idea to pieces and rebuilding it.
Two things came out of this process: a) a commission from BBC News, and b) a greater appreciation of Wikipedia as a way of joining up content by acting as a ‘controlled vocabulary’, a glossary for an ever-expanding range of concepts and things. The same goes for dbpedia, which extracts structured information from Wikipedia ‘infoboxes’ and creates usable subject-predicate-object relationships from that data.
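To make ‘subject-predicate-object’ a little more concrete, here is a minimal Python sketch. It is not Muddy’s code and it does not use dbpedia’s real vocabulary: the `Triple` type, the predicate names and the hand-written triples are all assumptions made up for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object statement (hypothetical shape)."""
    subject: str
    predicate: str
    obj: str


# Hand-written illustrative triples; the identifiers and predicate names
# are assumptions, not dbpedia's actual URIs or properties.
TRIPLES = [
    Triple("Britney_Spears", "type", "Person"),
    Triple("Britney_Spears", "occupation", "Singer"),
    Triple("North_Yorkshire", "type", "Place"),
]


def describe(subject, triples):
    """Collect every predicate/object pair recorded for one subject."""
    facts = {}
    for triple in triples:
        if triple.subject == subject:
            facts.setdefault(triple.predicate, []).append(triple.obj)
    return facts


print(describe("Britney_Spears", TRIPLES))
```

Because every statement hangs off a shared identifier, two documents that mention the same subject can be joined up without either knowing about the other, which is the ‘controlled vocabulary’ idea in miniature.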
So, we produced a prototype ‘application’ for BBC News called Muddy Boots (we called it Muddy Boots because we felt we were trampling across the rather pristine lawn that is the BBC). Muddy Boots took BBC News articles, identified ‘notable things’ (i.e. things in Wikipedia) in them and then, via an algorithm and a social bookmarking service, attempted to provide relevant links on the web for that news story. It kinda worked. Jonathan Austin did a write-up of it on the BBC News Journalism Labs blog, which gives a fair bit of detail. Whilst we were waiting for the testing phase run by BBC News to happen, we continued to develop Muddy Boots as we were interested in where it could go. As we developed it, we dropped the ‘Boots’ bit of the name.

By early 2008 we had a working web service with an API that essentially took documents and found ‘notable entities’ which were then grounded in dbpedia. This was interesting in that we could then start to map notable entities across ‘domains’. If that sounds like gobbledegook then perhaps this example will help:
We took BBC Music and looked at artists’ pages (which at that time didn’t link to other bits of the BBC, mainly because they couldn’t). We ran Muddy across those pages and found the artist, then did the same across BBC News, pulling back BBC News stories about that artist (again with a certain level of confidence, so it was likely the story was ‘about’ the artist rather than just including them). We then pushed those stories back to the BBC Music artist pages. What benefit does this have? Well, the premise is that people who find themselves at an artist page are interested in things about that artist, rather than just things related to music.
So, enabling this page on BBC Music about Britney Spears…

To reference news articles on Britney…
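For the curious, the cross-domain join described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Muddy is actually built: the `Mention` type, the 0 to 1 confidence score, the 0.8 threshold and the example.org URLs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    entity: str        # dbpedia-style identifier the mention was grounded in
    confidence: float  # hypothetical 0..1 score that the story is 'about' it


# Made-up extraction results, keyed by placeholder story URL.
EXTRACTED = {
    "https://example.org/news/1": [Mention("Britney_Spears", 0.93)],
    "https://example.org/news/2": [Mention("Britney_Spears", 0.35)],
    "https://example.org/news/3": [Mention("North_Yorkshire", 0.88)],
}


def stories_about(entity, extracted, threshold=0.8):
    """Return URLs of stories that mention `entity` at or above the threshold."""
    return [
        url
        for url, mentions in extracted.items()
        if any(m.entity == entity and m.confidence >= threshold for m in mentions)
    ]


# An artist page would look up its own identifier to find related stories.
print(stories_about("Britney_Spears", EXTRACTED))
```

The threshold is what separates a story ‘about’ the artist from one that merely mentions them in passing; where to set it is a judgment call that depends on the content.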

This got a few people excited and we developed Muddy further for the BBC. And that’s how it’s been for much of the last year: developing, iterating and building a service (as application and web service) that is both robust and scalable, built as it is in the ‘cloud’. Along the way we’ve become ‘expert’ in linked data and aspects of the semantic web, although Muddy isn’t a full-blown semantic web service. We’ve also seen what Muddy isn’t so good at, for example specific domains like healthcare, where NLP-based systems can be trained to be more effective in some use cases. Where Muddy does seem to excel is in extracting notable things from large corpuses of data and enabling you to say things ‘about’ those entities, drawn from the dbpedia references: whether they are people, places or events and, if it’s a person, for example, their webpage or place of birth or age or ‘role’. That can be pretty cool.
Of course as we were developing Muddy, other similar projects were also taking shape, like Open Calais and Zemanta. This was both ‘good’ and ‘bad’. ‘Good’ in that clearly there was a market for doing what we were doing, the needs above were ‘real’, and actually the other products were educating the market in the possibilities of data mining. However, it was also ‘bad’ as these competitors were also getting the first-to-market advantages. But we believe that Muddy offers benefits over these other services, benefits which you’ll be able to see for yourself when you explore.
Where now? We’re showcasing a couple of things we’ve done using Muddy and we’re listening to what other people might like to do (and will do) with Muddy. So please shout if you want an explanation or help in getting your head around Muddy or if you have an idea for what data would be interesting to play with and we’ll help.
[...] is more information over on the Muddy site and the Muddy blog, detailing the background to the service, how to start using it and other stuff. It’s not for everyone. There are no flashing lights [...]