<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Muddy blog</title>
	<atom:link href="http://blog.muddy.it/feed" rel="self" type="application/rss+xml" />
	<link>http://blog.muddy.it</link>
	<description></description>
	<lastBuildDate>Mon, 18 Jan 2010 12:52:18 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Using Muddy as a simple entity extractor</title>
		<link>http://blog.muddy.it/2010/01/using-muddy-as-a-simple-entity-extractor</link>
		<comments>http://blog.muddy.it/2010/01/using-muddy-as-a-simple-entity-extractor#comments</comments>
		<pubDate>Mon, 11 Jan 2010 14:34:27 +0000</pubDate>
		<dc:creator>robl</dc:creator>
				<category><![CDATA[technology]]></category>

		<guid isPermaLink="false">http://blog.muddy.it/?p=146</guid>
		<description><![CDATA[Muddy performs a few different tasks, and you may find you don&#8217;t need all of them initially.  Before building on top of Muddy, the most common task people want it to perform is to act as a simple term/concept/entity extraction API.  That is, given a piece of text, return the notable things that occur in [...]]]></description>
			<content:encoded><![CDATA[<p>Muddy performs a few different tasks, and you may find you don&#8217;t need all of them initially.  Before building on top of Muddy, the most common task people want it to perform is to act as a simple term/concept/entity extraction API.  That is, given a piece of text, return the notable things that occur in it.  To support this we&#8217;ve recently added a simpler API method (&#8216;extract&#8217;) that doesn&#8217;t require a collection and doesn&#8217;t store the entity extraction results.  The API can be used with or without a Muddy account, although you&#8217;ll be limited by IP address if you&#8217;re not authenticated.</p>
<p>A sample (unauthenticated) curl session is shown below :</p>
<pre class="brush: plain;">
echo '&lt;page&gt;
&lt;text&gt;Gordon Brown and Tony Blair went to town.&lt;/text&gt;
&lt;options&gt;
&lt;realtime&gt;true&lt;/realtime&gt;
&lt;/options&gt;
&lt;/page&gt;' | curl -X POST -H 'Content-type: text/xml' -H 'Accept: text/xml' -d @- http://muddy.it/extract

&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
&lt;response status=&quot;OK&quot;&gt;
  &lt;title&gt;&lt;/title&gt;
  &lt;entities&gt;
    &lt;entity&gt;
      &lt;term&gt;Tony Blair&lt;/term&gt;
      &lt;uri&gt;http://dbpedia.org/resource/Tony_Blair&lt;/uri&gt;
      &lt;confidence&gt;1.0&lt;/confidence&gt;
      &lt;classification&gt;http://muddy.it/ontology/Person&lt;/classification&gt;
      &lt;position&gt;17&lt;/position&gt;
    &lt;/entity&gt;
...
</pre>
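<p>If you would rather handle the XML yourself than use a client library, the response can be parsed with REXML, which ships with Ruby.  The following is a minimal sketch rather than an official client: the <code>extract_terms</code> helper is a hypothetical name, and the sample response has had its closing tags filled in by hand because the curl output above is truncated.</p>

```ruby
require 'rexml/document'

# Response modelled on the curl output above. The closing tags are an
# assumption: the original sample is cut off after the first entity.
SAMPLE_RESPONSE = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <response status="OK">
    <title></title>
    <entities>
      <entity>
        <term>Tony Blair</term>
        <uri>http://dbpedia.org/resource/Tony_Blair</uri>
        <confidence>1.0</confidence>
        <classification>http://muddy.it/ontology/Person</classification>
        <position>17</position>
      </entity>
    </entities>
  </response>
XML

# Hypothetical helper: turns an extract response into an array of hashes.
def extract_terms(xml)
  doc = REXML::Document.new(xml)
  status = doc.root.attributes['status']
  raise "Unexpected response status: #{status}" unless status == 'OK'

  REXML::XPath.match(doc, '//entity').map do |entity|
    {
      :term       => entity.elements['term'].text,
      :uri        => entity.elements['uri'].text,
      :confidence => entity.elements['confidence'].text.to_f,
      :position   => entity.elements['position'].text.to_i
    }
  end
end

# Print one line per entity found in the sample response.
extract_terms(SAMPLE_RESPONSE).each do |entity|
  puts "#{entity[:term]} (#{entity[:confidence]}) at #{entity[:position]}"
end
```

<p>Checking the <code>status</code> attribute before reading the entities keeps a failed request from being mistaken for a page with no entities.</p>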
<p>Some sample code to extract &#8216;terms&#8217; from a given piece of source text using the muddyit_fu gem and the new extract method is shown below :</p>
<pre class="brush: ruby;">
#!/usr/bin/ruby
require 'rubygems'
require 'muddyit_fu'
muddyit =  Muddyit.new('./config.yml')
page = muddyit.extract(ARGV[0], :disambiguate =&gt; false, :include_unclassified =&gt; true)
puts &quot;Contains:&quot;
page.entities.each do |entity|
  puts &quot;\t#{entity.term}&quot;
end
</pre>
<p>The script expects a text string as its first argument and prints the extracted terms to STDOUT :</p>
<pre class="brush: plain;">
ruby extract.rb &quot;Gordon Brown and Tony Blair went to town&quot;
Contains:
	Tony Blair
	Gordon Brown
</pre>
<p>As we want to retrieve as many terms as possible from the source text, we expand our list of available entities by including ones that have no classification and we disable disambiguation to improve response times (as we&#8217;re only interested in the text terms rather than a grounded entity).  If we wanted to retrieve disambiguated, grounded entities, rather than just text terms, then the &#8216;disambiguate&#8217; option can be enabled again to ensure any entities identified have been disambiguated where appropriate.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.muddy.it/2010/01/using-muddy-as-a-simple-entity-extractor/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Building with Muddy and OAuth</title>
		<link>http://blog.muddy.it/2010/01/building-with-muddy-and-oauth</link>
		<comments>http://blog.muddy.it/2010/01/building-with-muddy-and-oauth#comments</comments>
		<pubDate>Mon, 11 Jan 2010 13:58:48 +0000</pubDate>
		<dc:creator>robl</dc:creator>
				<category><![CDATA[thoughts]]></category>

		<guid isPermaLink="false">http://blog.muddy.it/?p=114</guid>
		<description><![CDATA[There are two authentication methods provided when building against the muddy system, OAuth and HTTP Basic Auth.  We strongly recommend using OAuth when allowing other systems access to your data in Muddy, as using HTTP Basic Auth can be a security risk.  However, HTTP Basic Auth is easier to use, often has better support in [...]]]></description>
			<content:encoded><![CDATA[<p>There are two authentication methods provided when building against the muddy system, <a href="http://en.wikipedia.org/wiki/OAuth">OAuth</a> and <a href="http://en.wikipedia.org/wiki/Basic_access_authentication">HTTP Basic Auth</a>.  We strongly recommend using OAuth when allowing other systems access to your data in Muddy, as using HTTP Basic Auth can be a security risk.  However, HTTP Basic Auth is easier to use, is well supported in many development languages and can be appropriate if you are aware of its risks and are happy to work with them.</p>
<p>In the introductory &#8216;<a href="http://blog.muddy.it/2009/11/getting-started-with-muddy">Getting Started with Muddy</a>&#8217; article, we used HTTP Basic Auth in the example given.  We&#8217;ll now re-work it using OAuth.  If you are unfamiliar with OAuth, then you might want to have a look at <a style="color: #227499; text-decoration: none; border-bottom-width: 1px; border-bottom-style: solid; border-bottom-color: #cccccc;" href="http://oauth.net/">oauth.net</a> for further information.</p>
<p><strong>Authenticating with Muddy</strong></p>
<p>In order to allow your programs to work with Muddy, you’ll need to register them as client applications with your Muddy account first.  To register an application, log in, visit the <a style="color: #227499; text-decoration: none; border-bottom-width: 1px; border-bottom-style: solid; border-bottom-color: #cccccc;" href="http://www.muddy.it/oauth_clients">oauth clients page</a> and click ‘Register your application’ :</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-30" style="border: 1px solid black;" title="Muddy Register Application" src="http://blog.muddy.it/wp-content/uploads/Muddy_register_app_1257866391247.png" alt="Muddy Register Application" width="600" height="312" /></p>
<p>Add a title, an application URL and any other relevant attributes, and then click ‘Register’ :</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-31" style="border: 1px solid black;" title="Muddy Registered Application" src="http://blog.muddy.it/wp-content/uploads/Muddy_registerd_app_1257867097110.png" alt="Muddy Registered Application" width="600" height="370" /></p>
<p>The ‘Consumer Key’ and ‘Consumer Secret’ are the attributes you’ll need to authorise your client application to access your Muddy data.</p>
<p><strong>A sample application : Newsminer</strong></p>
<p>In the previous article we created a small application called &#8216;Newsminer&#8217;.  We&#8217;ll rework this now, using OAuth instead of HTTP Basic Auth.   Again, we’ll use the <a style="color: #227499; text-decoration: none; border-bottom-width: 1px; border-bottom-style: solid; border-bottom-color: #cccccc;" href="http://github.com/rattle/muddyit_fu">muddyit_fu</a> Ruby client library.</p>
<pre class="brush: ruby;">
#!/usr/bin/ruby
require 'rubygems'
require 'muddyit_fu'
require 'rss'
require 'open-uri'
config = { :collection_token =&gt; 'mwkllxs7',
           :consumer_key =&gt; 'Ta0kS7jAkezMmJTQYMKStQ',
           :consumer_secret =&gt; 'sEXDiVSWHVc9kqjWQ2bRDU3I1gnplDTDwB5MEJWxnNE',
           :access_token =&gt; 'Har7Us3ZsOaN6TpqwW0AA',
           :access_token_secret =&gt; '96PJgoZIxAKXiJKwu323wyh6UlhezPoLdtQShsbL0'
}
# Connect to Muddy
muddyit =  Muddyit.new(:consumer_key =&gt; config[:consumer_key],
                       :consumer_secret =&gt; config[:consumer_secret],
                       :access_token =&gt; config[:access_token],
                       :access_token_secret =&gt; config[:access_token_secret])
collection = muddyit.collection.find(config[:collection_token])
# Parse RSS
rss_content = ''
open('http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/uk_politics/rss.xml') do |f|
  rss_content = f.read
end
rss = RSS::Parser.parse(rss_content, false)
# Loop through, analyse and display entities
rss.items.each do |item|
  page = collection.pages.create(item.guid.content, :realtime =&gt; true, :store =&gt; true)
  puts &quot;#{item.guid.content} contains:&quot;
  page.entities.each do |entity|
    puts &quot;\t#{entity.term}, #{entity.classification}&quot;
  end
end
</pre>
<p>In order for the script to work, you’ll need to log in and note down the token for the collection your content is stored in (the ‘collection_token’); you can find this by visiting ‘Dashboard’ → ‘View analysed Pages’ → ‘Settings’.  You’ll also need to authorise the script via OAuth.  To do this, register a client application as described previously, then use the <a style="color: #227499; text-decoration: none; border-bottom-width: 1px; border-bottom-style: solid; border-bottom-color: #cccccc;" href="http://github.com/rattle/muddyit_fu/blob/master/examples/oauth.rb">convenience script</a> provided with muddyit_fu to obtain the authentication details required by the newsminer script.  A sample session is shown below :</p>
<pre class="brush: plain;">
$ ruby ./examples/oauth.rb

&gt; enter consumer key
45048ANdEByjSuF2IogpQ
&gt; enter consumer secret
9uew3saTCM2RlEU0k122RgbkMUZdNKpTLJM1mJiX5jw
&gt; redirecting you to muddy to authorize
&gt; opening http://muddy.it/oauth/authorize?oauth_token=ZXdoJsaphYwdBpLpt9xSZw
&gt; authorize in the browser and then press enter

Access Details

Token : tuiBqD5ct6eZ1RlxNKdQ
Secret : EO9wJB2Xz7sEneoWqcOCnqslkSit4M9muJes4SF4
</pre>
<p>Add these details to the script and execute it.  You’ll see the BBC News pages being indexed along with the entities identified in them, and if you log in to Muddy you’ll see the indexed pages :</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-78" style="border: 1px solid black;" title="Muddy - BBC News Stories" src="http://blog.muddy.it/wp-content/uploads/Muddy-BBC-News_1257881855839.png" alt="Muddy - BBC News Stories" width="700" height="229" /></p>
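<p>Pasting the consumer and access secrets straight into the script is fine for a quick test, but you may prefer to keep them out of the source.  The following is one possible sketch, not something muddyit_fu requires: the <code>MUDDY_*</code> environment variable names and the <code>oauth_config</code> helper are made up for this example.  It builds a hash with the same four keys that are passed to <code>Muddyit.new</code> in the newsminer script.</p>

```ruby
# The four OAuth settings used by the newsminer script above.
REQUIRED_KEYS = [:consumer_key, :consumer_secret, :access_token, :access_token_secret]

# Hypothetical helper: reads each setting from an environment-style hash
# (ENV by default) using made-up MUDDY_* variable names, and fails loudly
# if any of them is missing.
def oauth_config(env = ENV)
  config = {}
  REQUIRED_KEYS.each do |key|
    name = "MUDDY_#{key.to_s.upcase}"
    value = env[name]
    if value.nil? || value.empty?
      raise ArgumentError, "Missing environment variable #{name}"
    end
    config[key] = value
  end
  config
end

# Placeholder values only; real ones come from the oauth.rb session above.
sample_env = {
  'MUDDY_CONSUMER_KEY'        => 'your-consumer-key',
  'MUDDY_CONSUMER_SECRET'     => 'your-consumer-secret',
  'MUDDY_ACCESS_TOKEN'        => 'your-access-token',
  'MUDDY_ACCESS_TOKEN_SECRET' => 'your-access-token-secret'
}

config = oauth_config(sample_env)
puts config.keys.inspect

# With the muddyit_fu gem loaded, the hash can then be passed on as before:
# muddyit = Muddyit.new(config)
```

<p>Raising on a missing value means a misconfigured environment is reported before any request is made to Muddy.</p>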
<p>That&#8217;s it.  As you can see, OAuth is a bit more complicated to use than HTTP Basic Auth, but it&#8217;s well worth using if you&#8217;re giving third parties access to your data.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.muddy.it/2010/01/building-with-muddy-and-oauth/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Getting Started with Muddy</title>
		<link>http://blog.muddy.it/2009/11/getting-started-with-muddy</link>
		<comments>http://blog.muddy.it/2009/11/getting-started-with-muddy#comments</comments>
		<pubDate>Tue, 10 Nov 2009 17:50:15 +0000</pubDate>
		<dc:creator>robl</dc:creator>
				<category><![CDATA[technology]]></category>

		<guid isPermaLink="false">http://blog.muddy.it/?p=6</guid>
		<description><![CDATA[For those of you that aren&#8217;t aware, Muddy is a webservice that allows you to mine your content and use it in ways you hadn&#8217;t previously been able to.  It combines elements of entity extraction, natural language processing and linked data to enable you to pick out the notable &#8216;things&#8217; in your content and provides [...]]]></description>
			<content:encoded><![CDATA[<p>For those of you who aren&#8217;t aware, Muddy is a webservice that allows you to mine your content and use it in ways you hadn&#8217;t previously been able to.  It combines elements of <a href="http://en.wikipedia.org/wiki/Named_entity_recognition">entity extraction</a>, <a href="http://en.wikipedia.org/wiki/Natural_language_processing">natural language processing</a> and <a href="http://en.wikipedia.org/wiki/Linked_data">linked data</a> to enable you to pick out the notable &#8216;things&#8217; in your content and provides <a href="http://www.bbc.co.uk/blogs/radiolabs/2008/06/the_simple_joys_of_webscale_id.shtml">web-scale identifiers</a> to describe them, allowing you to dig into your content and data and provide new views on existing and newly published content.  This post is going to be a quick introduction to Muddy and how to start using it with your own data.</p>
<p><strong>The basics</strong></p>
<p>Everything in Muddy belongs to a &#8216;collection&#8217;.  A collection is a container for analysed content; think of it like a folder for documents.  You only get one collection with a free Muddy account, so don&#8217;t worry too much about this for now: all your content will end up in that collection.  Within a collection there are multiple &#8216;pages&#8217;; a page is a piece of web content (or text) that has been analysed by Muddy.  Finally, in every page there are &#8216;entities&#8217;; an entity is a notable &#8216;thing&#8217; that has been identified as occurring in the content.  Every entity is &#8216;grounded&#8217; using a linked data identifier, by which we mean it&#8217;s unambiguous.  For example, when talking about &#8216;Apple&#8217; Computers the identifier <span style="font-family:courier">http://dbpedia.org/resource/Apple_Inc.</span> is used, whereas when talking about &#8216;Apple&#8217; Records the identifier <span style="font-family:courier">http://dbpedia.org/resource/Apple_Records</span> is used.  This allows Muddy to describe the ambiguous term &#8216;Apple&#8217; in different ways based on the context of the page it appears in.</p>
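<p>The relationship between collections, pages and entities can be pictured with a few plain Ruby structs.  This is only an illustration of the data model described above, not the classes used by Muddy or muddyit_fu; the struct names, the collection token and the example.com page addresses are invented for the sketch.</p>

```ruby
# Illustrative stand-ins for the three concepts described above.
Entity     = Struct.new(:term, :uri)
Page       = Struct.new(:identifier, :entities)
Collection = Struct.new(:token, :pages)

# The same surface term, 'Apple', grounded with two different identifiers
# depending on the page it was found in.
tech_page = Page.new(
  'http://example.com/new-laptop-review',
  [Entity.new('Apple', 'http://dbpedia.org/resource/Apple_Inc.')]
)
music_page = Page.new(
  'http://example.com/beatles-label-history',
  [Entity.new('Apple', 'http://dbpedia.org/resource/Apple_Records')]
)

collection = Collection.new('example-token', [tech_page, music_page])

# Walk the hierarchy: collection, then pages, then entities.
collection.pages.each do |page|
  page.entities.each do |entity|
    puts "#{page.identifier}: #{entity.term} -> #{entity.uri}"
  end
end
```

<p>Because the two entities carry different identifiers, code that groups or links content by <code>uri</code> rather than by <code>term</code> will keep the computer company and the record label apart.</p>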
<p><strong>Linked data</strong></p>
<p>Every entity identified in Muddy is notable in some way.  Muddy uses Wikipedia as its proxy for notability: if there&#8217;s a page on Wikipedia for it, then Muddy should know about it.  In many cases Muddy also knows what kind of thing it is, be it a Person, a Place or a Company (or many others).  Muddy uses a common identifier for each entity it identifies (as defined by the <a href="http://dbpedia.org">dbpedia</a> project), meaning you can relate your data to other web content that uses the same identifiers or possibly start marking up your content in new ways (have you seen <a href="http://commontag.org">commontag</a>?).</p>
<p><strong>How does it work ?</strong></p>
<p>Muddy uses the dbpedia project as its list of notable things, its &#8216;<a href="http://en.wikipedia.org/wiki/Controlled_vocabulary">controlled vocabulary</a>&#8217;.  Muddy analyses the submitted content, finds the notable things that are mentioned and determines whether they are the ones in the controlled vocabulary.  Many terms in the English language are ambiguous.  Fortunately, dbpedia &#8216;knows&#8217; if something is ambiguous, and Muddy picks the correct disambiguation based on the textual content of the page being analysed.  Muddy provides a confidence score based on a number of factors, including the ambiguity of the identified term and its contextual relevance to the content it appears in.  This confidence score can be used to filter the results returned by quality.  Muddy uses extraction algorithms to identify and analyse only the core text of a submitted webpage: it determines where the key content on a page is and analyses only that, meaning that irrelevant content such as sidebar and footer elements isn&#8217;t included.  Let&#8217;s see an example of Muddy in use.  Here we have the results page for a news story from the Guardian, showing the entities extracted from the article :</p>
<p style="text-align: center"><img class="size-full wp-image-8 aligncenter" style="border: 1px solid black" src="http://blog.muddy.it/wp-content/uploads/Muddy-Guardian-UK-2009-Ukip-not-BNP-set-to-benefit-from-anti-politics-…_1257851709057.png" alt="Muddy - Entities View" width="630" height="309" /></p>
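<p>Filtering on the confidence score is straightforward to do on the client side.  The sketch below assumes each entity exposes a numeric confidence between 0 and 1, as in Muddy&#8217;s XML responses; the <code>ScoredEntity</code> struct, the <code>confident_entities</code> helper, the 0.8 default threshold and the sample scores are all invented for illustration.</p>

```ruby
# Illustrative stand-in for an entity returned by Muddy.
ScoredEntity = Struct.new(:term, :confidence)

# Hypothetical helper: keeps only the entities at or above the threshold.
def confident_entities(entities, threshold = 0.8)
  entities.select { |entity| entity.confidence >= threshold }
end

# Made-up scores, purely to show the effect of the threshold.
entities = [
  ScoredEntity.new('Tony Blair', 1.0),
  ScoredEntity.new('Gordon Brown', 0.9),
  ScoredEntity.new('Town', 0.3)
]

confident_entities(entities).each do |entity|
  puts "#{entity.term} (#{entity.confidence})"
end
```

<p>A higher threshold trades recall for precision: fewer entities come back, but those that do are more likely to be correctly identified and relevant to the page.</p>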
<p>You can also start to &#8216;dig into the data&#8217;, for example, seeing which other articles feature a particular entity :</p>
<p style="text-align: center"><img class="size-full wp-image-15 aligncenter" style="border: 1px solid black" src="http://blog.muddy.it/wp-content/uploads/Muddy-Guardian-UK-2009-Entities-Harriet-Harman_1257853034384.png" alt="Muddy - Stories from entities" width="630" height="323" /></p>
<p>By finding content that shares similar entities, it&#8217;s possible to define new paths through the indexed content, whether that&#8217;s by aggregating pages around entities or finding related pages by looking at pages that share common entities.  Muddy makes this easier by providing views and APIs for both of these.</p>
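<p>Both ideas, aggregating pages around an entity and finding related pages through shared entities, come down to simple operations on the entity identifiers.  The following sketch shows one way to do this locally with plain hashes and arrays.  It says nothing about how Muddy implements its own views and APIs; the page names, their entity lists and the two helper methods are invented for the example.</p>

```ruby
# Page identifiers mapped to the entity URIs found on each page.
# The story names and entity lists are made up for illustration.
PAGES = {
  'story-1' => ['http://dbpedia.org/resource/Gordon_Brown',
                'http://dbpedia.org/resource/Tony_Blair'],
  'story-2' => ['http://dbpedia.org/resource/Tony_Blair',
                'http://dbpedia.org/resource/Harriet_Harman'],
  'story-3' => ['http://dbpedia.org/resource/Harriet_Harman']
}

# Aggregate pages around entities: entity URI => list of page identifiers.
def pages_by_entity(pages)
  index = Hash.new { |hash, uri| hash[uri] = [] }
  pages.each do |page_id, uris|
    uris.each { |uri| index[uri] << page_id }
  end
  index
end

# Find related pages: other pages sharing at least one entity with the
# given page, ordered by number of shared entities, then by identifier.
def related_pages(pages, page_id)
  own_uris = pages.fetch(page_id)
  scores = {}
  pages.each do |other_id, uris|
    next if other_id == page_id
    shared = (own_uris & uris).length
    scores[other_id] = shared if shared > 0
  end
  scores.sort_by { |other_id, shared| [-shared, other_id] }
        .map { |other_id, _shared| other_id }
end

puts pages_by_entity(PAGES)['http://dbpedia.org/resource/Tony_Blair'].inspect
puts related_pages(PAGES, 'story-2').inspect
```

<p>Matching on identifiers rather than on the text of each term is what makes this reliable: two pages are only related if they mention the same grounded entity, not merely the same word.</p>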
<p><strong>Building with Muddy</strong></p>
<p>Muddy is a webservice, designed to be built on.  What kinds of things could you build ?  How about defining new ways into content for the BBC :</p>
<p style="text-align: center"><a href="http://channelography.rattlecentral.com"><img class="size-full wp-image-19 aligncenter" style="border: 1px solid black" src="http://blog.muddy.it/wp-content/uploads/Channelography_1257855051840-630x363-custom.png" alt="Channelography" width="630" height="363" /></a></p>
<p>Or examining where the news happens in the UK ?</p>
<p style="text-align: center"><img class="size-full wp-image-24 aligncenter" style="border: 1px solid black" src="http://blog.muddy.it/wp-content/uploads/newsography-630x594-custom.png" alt="Newsography" width="630" height="594" /></p>
<p>Both of these applications were built using Muddy.  How do you go about building your own ?  Muddy exposes its functionality as RESTful APIs with multiple response formats.  You can see sample XML responses by adding .xml to the end of most URLs presented by Muddy.  For more in-depth details (including API options), please refer to the <a href="http://muddy.it/developers">Muddy developer guide</a>.</p>
<p style="text-align: left"><strong>A sample application : Newsminer</strong></p>
<p style="text-align: left">Now we&#8217;ve covered the basics of Muddy, let&#8217;s try to build a simple application.  In this case we&#8217;ll create an RSS indexer for indexing the latest news stories from the BBC.  To do this, we&#8217;ll use the <a href="http://github.com/rattle/muddyit_fu">muddyit_fu</a> Ruby client library.</p>
<pre class="brush: ruby;">
#!/usr/bin/ruby
require 'rubygems'
require 'muddyit_fu'
require 'rss'
require 'open-uri'

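# The token of the collection to index into (replace with your own)
config = { :collection_token =&gt; 'mwkllxs7' }
# Connect to Muddy using HTTP Basic Auth
muddyit =  Muddyit.new(:username =&gt; 'myusername', :password =&gt; 'mypassword')
collection = muddyit.collection.find(config[:collection_token])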
# Parse RSS
rss_content = ''
open('http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/uk_politics/rss.xml') do |f|
  rss_content = f.read
end
rss = RSS::Parser.parse(rss_content, false)
# Loop through, analyse and display entities
rss.items.each do |item|
  page = collection.pages.create(item.guid.content, :realtime =&gt; true, :store =&gt; true)
  puts &quot;#{item.guid.content} contains:&quot;
  page.entities.each do |entity|
    puts &quot;\t#{entity.term}, #{entity.classification}&quot;
  end
end
</pre>
<p style="text-align: left">Muddy provides two ways to authenticate your requests: <a href="http://en.wikipedia.org/wiki/OAuth">OAuth</a> and <a href="http://en.wikipedia.org/wiki/Basic_access_authentication">HTTP Basic Auth</a>.  We strongly recommend using OAuth as it represents less of a security risk.  For brevity, we&#8217;ve used HTTP Basic Auth in this example; however, you can find the same example with the OAuth setup details in the &#8216;<a href="http://blog.muddy.it/2010/01/building-with-muddy-and-oauth">Building with Muddy and OAuth</a>&#8217; article.</p>
<p style="text-align: left">Execute the script and you&#8217;ll see the BBC News pages being indexed along with the entities identified in them, and if you log in to Muddy you&#8217;ll see the indexed pages :</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-78" style="border: 1px solid black;" title="Muddy - BBC News Stories" src="http://blog.muddy.it/wp-content/uploads/Muddy-BBC-News_1257881855839.png" alt="Muddy - BBC News Stories" width="630" height="206" /></p>
<p style="text-align: left">Hopefully, this has given you a useful introduction to Muddy, how it works and how you could go about using it in your own applications.  For further details on the various elements of the API please see the <a href="http://muddy.it/developers">developer guide</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.muddy.it/2009/11/getting-started-with-muddy/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Genesis</title>
		<link>http://blog.muddy.it/2009/11/genesis</link>
		<comments>http://blog.muddy.it/2009/11/genesis#comments</comments>
		<pubDate>Tue, 10 Nov 2009 17:44:45 +0000</pubDate>
		<dc:creator>jamesb</dc:creator>
				<category><![CDATA[Introduction]]></category>

		<guid isPermaLink="false">http://blog.muddy.it/?p=29</guid>
		<description><![CDATA[Not the band or that famous book, you&#8217;ll need to go elsewhere for that.  This is about the backstory to Muddy, which we thought would be nice to share, because although Muddy is essentially &#8216;middleware&#8217;, we want to think of it as a &#8216;consumable&#8217;, as an application with a life, that real people use and [...]]]></description>
			<content:encoded><![CDATA[<p>Not the <a href="http://en.wikipedia.org/wiki/Genesis_%28band%29">band</a> or that <a href="http://en.wikipedia.org/wiki/Book_of_Genesis">famous book</a>, you&#8217;ll need to go elsewhere for that.  This is about the backstory to Muddy, which we thought would be nice to share, because although Muddy is essentially &#8216;middleware&#8217;, we want to think of it as a &#8216;consumable&#8217;, as an application with a life, that real people use and engage with (albeit not that many!).</p>
<p>Way back in March 2007, Rob and I submitted an idea to the <a href="http://open.bbc.co.uk/labs/">BBC Labs</a>, then run by <a href="http://www.test.org.uk">Matt Locke</a>, to improve the (horizontal) navigation across the BBC by grounding news articles in &#8216;subjects&#8217; people could peruse.  It wasn&#8217;t an earth-shattering idea, but it came from our frustration with a BBC News experience that was still about &#8216;pages&#8217; and very &#8216;flat&#8217; (it&#8217;s improved since then). So, together with Paul Farnell (a designer friend and CEO of <a href="http://www.litmusapp.com">Litmus</a>) we spent five days in North Yorkshire taking the idea to pieces and re-building it.</p>
<p>What came of this process was a) a commission from BBC News and b) a greater appreciation of <a href="http://en.wikipedia.org/">Wikipedia</a> (and <a href="http://dbpedia.org/About">dbpedia</a> which extracts structured information from Wikipedia &#8216;infoboxes&#8217; and creates usable subject-predicate-object relationships from that data) for joining up content by acting as a &#8216;controlled vocabulary&#8217;, a glossary for an ever expanding range of concepts and things.</p>
<p>So, we produced a prototype &#8216;application&#8217; for BBC News called Muddy Boots (we called it Muddy Boots because we felt we were trampling across the rather pristine lawn that is the BBC).  Muddy Boots took BBC News articles and identified &#8216;notable things&#8217; (i.e. things in Wikipedia) in the articles and then via an algorithm and a social bookmarking service we attempted to provide relevant links on the web for that news story.  It kinda worked.  Jonathan Austin <a href="http://www.bbc.co.uk/blogs/journalismlabs/2008/12/muddy_boots.html">did a write up of it on the BBC News Journalism Labs blog</a> which gives a fair bit of detail.  Whilst we were waiting for the testing phase run by BBC News to happen, we continued to develop Muddy Boots as we were interested in where it could go.  As we developed it we dropped the &#8216;Boots&#8217; bit of the name.</p>
<p><span id="more-29"></span></p>
<p><img class="aligncenter size-full wp-image-35" title="Muddy Boots" src="http://blog.muddy.it/wp-content/uploads/Muddy-Boots_1257866591676.png" alt="Muddy Boots" width="550" height="524" /></p>
<p>By early 2008 we had a working web service with an API that essentially took documents and found &#8216;notable entities&#8217; which were then grounded in <a href="http://dbpedia.org/About">dbpedia</a>. This was interesting in that we could then start to map notable entities across &#8216;domains&#8217;.  If that sounds like gobbledegook then perhaps this example will help:</p>
<p>We took <a href="http://www.bbc.co.uk/music/">BBC Music</a> and looked at <a href="http://www.bbc.co.uk/music/artists">artists&#8217; pages</a> (which at that time didn&#8217;t link to other bits of the BBC, mainly because they couldn&#8217;t) and we ran Muddy across those pages, found the artist and <em>then</em> did the same across BBC News, pulling back BBC news stories about that artist (again with a certain level of confidence so it was likely the story was &#8216;about&#8217; the artist rather than just including them). We then pushed those stories back to the BBC Music artist pages.  What benefit does this have?  Well, the premise is that people who find themselves at an artist page are interested in things about that artist, rather than just things related to music.</p>
<p>So, enabling this page on BBC Music about Britney Spears&#8230;</p>
<p><img class="aligncenter size-full wp-image-37" title="BBC - Music - Britney Spears" src="http://blog.muddy.it/wp-content/uploads/BBC-Music-Britney-Spears_1257866812958.png" alt="BBC - Music - Britney Spears" width="550" height="1000" /></p>
<p>To reference news articles on Britney&#8230;</p>
<p><img class="aligncenter size-full wp-image-38" title="Britney News" src="http://blog.muddy.it/wp-content/uploads/BBC-Music-Britney-Spears_1257867687938.png" alt="Britney News" width="550" height="308" /></p>
<p>This got a few people excited and we developed Muddy further for the BBC.  And that&#8217;s how it&#8217;s been for much of the last year, developing, iterating and building a service (as application and web service) that was both robust and scalable, built as it is in the &#8216;cloud&#8217;.   Along the way we&#8217;ve become &#8216;expert&#8217; in linked data and aspects of the semantic web, although Muddy isn&#8217;t a full blown semantic web service.  We&#8217;ve also seen what Muddy isn&#8217;t so good at, for example specific domains like healthcare, where <a href="http://en.wikipedia.org/wiki/Natural_language_processing">NLP</a> based systems can be trained to be more effective in some use cases.  Where Muddy does seem to excel is in extracting notable things from large corpuses of data and enabling you to say things about those entities, like whether they are people, places, events and say things &#8216;about&#8217; them, drawn from the dbpedia references.  For example, if it&#8217;s a person, their webpage or place of birth or age or &#8216;role&#8217;.  That can be pretty cool.</p>
<p>Of course, as we were developing Muddy, other similar projects were also taking shape, like <a href="http://www.opencalais.com/">Open Calais</a> and <a href="http://www.zemanta.com/">Zemanta</a>.  This was both &#8216;good&#8217; and &#8216;bad&#8217;.  &#8216;Good&#8217; in that clearly there was a market for doing what we were doing, the needs above were &#8216;real&#8217;, and actually the other products were educating the market in the possibilities of data mining.  However, it was also &#8216;bad&#8217; as these competitors were also getting the first-to-market advantages.  But we believe that Muddy offers benefits over these other services, benefits which you&#8217;ll be able to see for yourself when you explore.</p>
<p>Where now?  We&#8217;re showcasing a couple of things we&#8217;ve done using Muddy and we&#8217;re listening to what other people might like to do (and will do) with Muddy.  So please shout if you want an explanation or help in getting your head around Muddy or if you have an idea for what data would be interesting to play with and we&#8217;ll help.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.muddy.it/2009/11/genesis/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
