In order to explain FactMint’s reason for being (and, in fact, Linked Data in general) I often use the following anecdotal example:
What are the top 5 schools, in the UK, for providing British Prime Ministers?
- Do you know?
- How would you find out?
- How long would that take?
In this post I would like to quickly explore those questions and try to define the problem, as I see it today.
So, to the first point, no: I don’t know the answer. I’d guess, with a pretty high level of confidence, that Eton is number one – Cambo’, at least, counts for one there. After that, not a clue. The only other school I can think of off the top of my head is my own, but, much as I liked Wrenn in Wellingborough, I severely doubt it ever produced a Prime Minister.
Given my state of ignorance, then, how could I find the answer? Well, the obvious answer is research.
First I tried AQA (the text service, Any Question Answered). I sent them the question, exactly as typed above, then waited… 32 minutes later my phone beeped its alert for a reply. That was actually quite a tense half an hour – this was before FactMint was incorporated (or even named) and if AQA could answer the question for a quid it would have seriously hurt my business case! Fortunately for me, the text read as follows:
“Sorry, 63336 can’t find the top 5 schools. The top 2 schools are Eton, which has produced 19 Prime Ministers, and Harrow, which has produced 7.”
So where next? I could ask (directly or via a Web search) as many schools as I could find and compare their answers, but that would doubtless be unreliable and I’d need to get a comprehensive list of schools from somewhere. Not to mention that it would take weeks. A better approach would be to investigate each Prime Minister – they’re doubtless well documented on Wikipedia and across the Web in general. So I began. First stop, a list of Prime Ministers of Britain… good work, Wikipedia community. The page included some “Before Walpole” but I don’t think they count. The page was also split by monarch, so I had 9 charts to merge. Because of the layout of the HTML they didn’t paste into a spreadsheet properly, so 73 <ctrl-c> <ctrl-v>s later I had a list.
Next job, get their schools. 1 for Eton – Cambo’; 1 for Kirkcaldy High School – Gordon; Fettes College – Blair; Rutlish Grammar School – John Major. 5 minutes in and a little over 5% of the list done. Being the pragmatic / easily bored type, I decided not to complete this experiment. Roughly 1% a minute means that I can hazard a guess at the answer to the third question I posed… about 10 minutes for building the spreadsheet, 100 minutes getting the list of schools and a couple of minutes to total up and sort… just shy of 2 hours.
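For anyone who wants to check that back-of-envelope estimate, here is a minimal sketch of the same extrapolation in Python. The figures come from the paragraph above; the function name `estimate_minutes` is made up for illustration, and "a couple of minutes" is assumed to mean 2. Working from the exact rate (4 Prime Ministers in 5 minutes) rather than the rounded "1% a minute" gives a slightly lower total, but it lands in the same ballpark.

```python
# Figures from the post: 73 Prime Ministers, 4 looked up in the first 5 minutes.
TOTAL_PMS = 73
DONE_SO_FAR = 4
MINUTES_SPENT = 5


def estimate_minutes(total, done, minutes_spent, setup=10, wrap_up=2):
    """Extrapolate total research time from the pace so far.

    setup   -- minutes spent building the spreadsheet ("about 10" in the post)
    wrap_up -- minutes to total up and sort (assumed: 2)
    """
    minutes_per_pm = minutes_spent / done
    return setup + minutes_per_pm * total + wrap_up


print(round(estimate_minutes(TOTAL_PMS, DONE_SO_FAR, MINUTES_SPENT)))  # prints 103
```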
That – in a nutshell – is FactMint’s reason for being. The technologies which build up the Semantic Web can make that kind of query trivially easy. As we were using Wikipedia for the traditional research, let’s try the same thing with Freebase (roughly, an RDF database built on Wikipedia and loads of other stuff). Freebase isn’t easy to use, if you’re not a developer type, but it can get you the data quickly.
So, I get my phone timer ready, point one tab of my browser at the Freebase Query Editor and one at the Freebase page for David Cameron, and here I go…
6 and a half minutes later and I’m there. It still wasn’t the ideal process and would be completely inaccessible if you weren’t happy with JSON. Freebase also failed to give me the actual answer – it knew the schools of only 31 of the 73 British Prime Ministers. Still, a pretty good response (probably with the same coverage as Wikipedia would have given me) in just over a 20th of the research time.
The next step from Freebase is obvious. The calculations required to give me that result took a minuscule fraction of a second. The 6 and a half minutes were used up by my human mind trying to instruct a computer program on what, exactly, I wanted to know. And from that comes the mission statement of FactMint, “to create beautiful and intuitive ways for people to interact with the Semantic Web”. When these knowledge-bases become easy to query, the answers to the question I originally posed, and to many others like it, become commodities; more complex queries become ask-able; and less time is spent searching, copying, pasting and sorting in Excel – surely everyone wants that!
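To give a feel for how little work the machine actually has to do, here is a minimal sketch in Python of the "total up and sort" step. It is not the Freebase query I ran; it assumes the (Prime Minister, school) pairs have already been fetched, and it uses only the four pairs named earlier in this post. The function name `top_schools` is made up for illustration, and a school of `None` stands for a Prime Minister whose school the data source doesn’t know.

```python
from collections import Counter

# (Prime Minister, school) pairs mentioned earlier in the post.
# A real run would hold one record per Prime Minister, with None
# wherever the source has no school on file.
records = [
    ("David Cameron", "Eton"),
    ("Gordon Brown", "Kirkcaldy High School"),
    ("Tony Blair", "Fettes College"),
    ("John Major", "Rutlish Grammar School"),
]


def top_schools(records, n=5):
    """Return the n schools with the most Prime Ministers, as (school, count) pairs."""
    counts = Counter(school for _, school in records if school is not None)
    return counts.most_common(n)


for school, count in top_schools(records):
    print(f"{school}: {count}")
```

With only four records every school scores 1, of course; the point is that the counting and sorting is a few lines and runs instantly, while getting the data in and asking the question is the slow, human part.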
Oh, and in case you were interested: Eton is number one, then Harrow, then Westminster. Charterhouse, Chatham House, Fettes, Haileybury, Rugby, Rutlish and Winchester all come in a fair way behind.