In October GovHack was held in Canberra. I went along as a participant, but also to advise any teams on the use of the National Library of Australia’s API’s. One of the things I spent my time doing there was to make some YQL Open Data Tables for some of the Library’s services. Why is this interesting? Let’s go back a few steps.
YQL is a service from Yahoo that provides a SQL like environment for querying, filtering and joining web services. So instead of having to write a complex URL to access data from a website, we can use YQL to write a statement that is similar to an SQL query that we might use to obtain data from a MySQL database, except, instead of querying a database, we are querying a web service. As an example, you can enter the following into the YQL console to extract photos of the Sydney Harbour Bridge from Flickr:
SELECT * FROM flickr.photos.search WHERE text="sydney harbour bridge";
When YQL was launched it initially had options to query only Yahoo’s services. If you wanted to query a web service that was outside of Yahoo’s services you were out of luck. Since then Yahoo has allowed developers to build YQL Open Data Tables. An Open Data Table is an XML file that acts as a bridge between your API the YQL language and you describe how your API is structured in terms that YQL can understand.
If we wish to use an API to return data from one of the Library’s services, say Picture Australia, we can query it using the following URL:
http://librariesaustralia.nla.gov.au/apps/kss?action=OpenSearch&targetid=pictaust&searchTerms=Sydney+Harbour+Bridge&startPage=1
As you can see, it starts to become a fairly complex URL with a lot of querystring values to point towards where we need to extract the data from.
Now let’s create that same query using YQL. Firstly I created an Open Data Table for Picture Australia. This is the key component that ties Picture Australia and YQL together. If you now enter the following into the YQL console & you’ll get back an XML feed from Picture Australia for the pictures of the Sydney Harbour Bridge.
USE "http://www.paulhagon.com/yql/pictureaustralia.xml" AS pictureaustralia;
SELECT * FROM pictureaustralia WHERE searchTerms="sydney harbour bridge" AND startPage="1";
Alternatively you can query The National Library of Australia’s catalogue for pictures of the Sydney Harbour Bridge by using this Open Data Table and entering the following term into the YQL console:
USE "http://www.paulhagon.com/yql/nla.xml" AS nla;
SELECT * FROM nla WHERE lookfor="sydney harbour bridge {format:Online AND format:Picture}";
So how is this interesting? Can’t all of this information already be gathered from our standard API’s? There are a couple of advantages to using YQL. One advantage is being able to extract just portions of the data. Say you want to extract just the title, description and persistant URL of the records and you only want to return the first 3 items, you can just enter:
USE "http://www.paulhagon.com/yql/pictureaustralia.xml" AS pictureaustralia;
SELECT title,description,link FROM pictureaustralia WHERE searchTerms="sydney harbour bridge" AND startPage="1" LIMIT 3;
or you could just extract a link to where the most relevant original item is stored.
USE "http://www.paulhagon.com/yql/pictureaustralia.xml" AS pictureaustralia;
SELECT enclosure.url FROM pictureaustralia WHERE searchTerms="sydney harbour bridge" AND startPage="1" LIMIT 1;
This starts to give you a bit of flexibility in the fields and amount of data that is returned and limit the amount of parsing that you have to do. All the hard work is being done by the servers at Yahoo.
But the really fun stuff starts when you try to create a little mashup by combining data from different services. Let’s use YQL to find the current number 1 artist at Yahoo’s music service:
SELECT name FROM music.artist.popular LIMIT 1;
We can now easily combine this search with a search for the top 5 items from or about that artist in the National Library’s catalogue:
USE "http://www.paulhagon.com/yql/nla.xml" AS nla;
SELECT * FROM nla WHERE lookfor IN (SELECT name FROM music.artist.popular LIMIT 1) LIMIT 5;
Once we have constructed this query, we can access that using a JSON-P call and use a little bit of JavaScript to display the results within a web page (see example 1).
<div id="nla"></div>
<script type="text/javascript">
function nlabooks(o){
var f = document.getElementById('nla');
var out = '<ul>';
var books = o.query.results.item;
for(var i=0,j=books.length;i<j;i++){
var cur = books[i];
out += '<li><a href="' + cur.link + '">'+ cur.title +'</a></li>';
}
out += '</ul>';
f.innerHTML = out;
}
</script>
<script type="text/javascript" src="http://query.yahooapis.com/v1/public/yql?q=USE%20%22http%3A%2F%2Fwww.paulhagon.com%2Fyql%2Fnla.xml%22%20AS%20nla%3B%0ASELECT%20*%20FROM%20nla%20WHERE%20lookfor%20IN%20(SELECT%20name%20FROM%20music.artist.popular%20LIMIT%201)%20limit%205%3B&format=json&diagnostics=false&callback=nlabooks"></script>
We’ve now got a little widget that we can use inside any page to dynamically mashup 2 separate data sources.
If we were to do that in a traditional manner we would have to be writing two separate calls to the web services and possibly parsing the results in different ways. By using YQL, all that hard work can be carried out in a minimal amount of code.
Building these tables was as much a case of learning a bit more about YQL and the possibilities that it can offer. What I’ve shown here is a simple demonstration at the ease with which you can use services like YQL to expand your data to a wider audience.
Note: Please don’t build any mission critical applications using these data tables – they are only there for demonstration purposes. I’ll hopefully make them more permanent and hosted on the National Library’s servers.