2006-07-14
SIOC, SPARQL and TimeLine
Putting Blogs on TimeLine.
Following the release of the excellent SIMILE TimeLine visualisation tool, here is what can be done with some of the SIOC data that is already out there:
Fig 1. Blog posts on a TimeLine
Different icons show which blog each post belongs to (Christoph Görn, Harry Chen, John Breslin from DERI, ...).
If your posts are shown with a default marker instead of an icon, please let me know which icon to use.
You can also go back in time (see below). Notice the posts from Danny Ayers (watch out for cats!) that appear from the period when he was using WordPress with the WP SIOC plugin.
Fig 2. December 2005 on the same TimeLine.
Taking it to extremes: the SIOC TimeLine back in 1997, showing the origins of John Breslin's blog.
Technical Details:
- The master list for the crawler was taken from the list of SIOC-enabled sites and converted to RDF using Alex Passant's Wiki->RDF script.
- Gathered the RDF data using my crawler.py, and added Danny's SIOC data from the archive.
- Stored data in a Joseki 3.0 beta server (SPARQL endpoint is here).
- Performed a SPARQL query (script) and used XSLT (sparql-tline.xslt) to convert the SPARQL XML result set into TimeLine's XML format (converted XML file). Blog icons are also added in the XSLT step (adding statements to the RDF store would be another option).
- ... and here is the result - TimeLine doing its AJAX-y magic.
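The crawling step is only named above, not shown. Below is a minimal sketch of how a SIOC crawler could work, assuming (as SIOC exports typically do) that RDF/XML documents point to further documents via `rdfs:seeAlso`. It is not the author's crawler.py: the names `http_fetch`, `see_also_links` and `crawl`, the `fetch` callback and the `max_docs` limit are all hypothetical, and only the `rdf:resource` form of `rdfs:seeAlso` is handled.

```python
import xml.etree.ElementTree as ET
from collections import deque
from typing import Callable, Dict, List
from urllib.parse import urljoin
from urllib.request import urlopen

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
RDFS = "{http://www.w3.org/2000/01/rdf-schema#}"


def http_fetch(url: str) -> bytes:
    """Download one document (hypothetical helper: no retries, no politeness delay)."""
    with urlopen(url, timeout=30) as resp:
        return resp.read()


def see_also_links(rdf_xml: bytes) -> List[str]:
    """Targets of all <rdfs:seeAlso rdf:resource="..."/> elements, in document order."""
    root = ET.fromstring(rdf_xml)
    return [el.get(RDF + "resource") for el in root.iter(RDFS + "seeAlso")
            if el.get(RDF + "resource")]


def crawl(seeds: List[str], fetch: Callable[[str], bytes] = http_fetch,
          max_docs: int = 1000) -> Dict[str, bytes]:
    """Breadth-first crawl from the seed URLs, following rdfs:seeAlso links.

    Each URL is fetched at most once; documents that cannot be fetched or
    parsed are skipped. Returns a map of URL -> raw RDF/XML.
    """
    queue = deque(seeds)
    seen = set(seeds)
    docs: Dict[str, bytes] = {}
    while queue and len(docs) < max_docs:
        url = queue.popleft()
        try:
            body = fetch(url)
            links = see_also_links(body)
        except (OSError, ET.ParseError):
            continue  # unreachable site or malformed RDF: skip it
        docs[url] = body
        for link in links:
            link = urljoin(url, link)  # resolve relative references against the page URL
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return docs
```

Passing a stub instead of `http_fetch` makes the traversal easy to test offline; the collected documents could then be loaded into the RDF store in one batch.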
For more information, read How to Create Timelines on the SIMILE site.
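The actual conversion is done by sparql-tline.xslt, which is not reproduced here. As a rough Python analogue, the sketch below reads a result set in the W3C SPARQL Query Results XML Format and emits TimeLine `<event>` elements (with `start`, `title`, `link` and `icon` attributes and the description as element text) inside a `<data>` root. The variable names `post`, `title`, `date`, `site` and `content`, and the `ICONS` table, are assumptions for illustration. Dates are passed through unchanged, so which date formats TimeLine's parser accepts should be checked against How to Create Timelines.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

SR = "{http://www.w3.org/2005/sparql-results#}"

# Hypothetical icon table: blog site URL -> icon URL (the post adds icons in its XSLT step).
ICONS: Dict[str, str] = {
    "http://example.org/blog/": "http://example.org/icons/example.png",
}


def binding_value(result: ET.Element, name: str) -> Optional[str]:
    """Text of the <uri>/<literal>/<bnode> bound to `name`, or None if unbound."""
    for b in result.findall(SR + "binding"):
        if b.get("name") == name and len(b) > 0:
            return b[0].text
    return None


def sparql_to_timeline(sparql_xml: bytes, icons: Dict[str, str] = ICONS) -> str:
    """Convert a SPARQL XML result set into a TimeLine <data> document."""
    root = ET.fromstring(sparql_xml)
    data = ET.Element("data")
    for result in root.iter(SR + "result"):
        start = binding_value(result, "date")
        if start is None:
            continue  # an event cannot be placed on the timeline without a start date
        event = ET.SubElement(data, "event", {
            "start": start,
            "title": binding_value(result, "title") or "(untitled)",
        })
        link = binding_value(result, "post")
        if link:
            event.set("link", link)
        icon = icons.get(binding_value(result, "site") or "")
        if icon:
            event.set("icon", icon)  # per-blog icon, like the XSLT does
        event.text = binding_value(result, "content") or ""
    return ET.tostring(data, encoding="unicode")
```

Doing the icon lookup in the converter keeps the RDF store untouched; the alternative mentioned above (adding icon statements to the store) would move that table into the query instead.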
See Also:
- Danny Ayers: SPARQL Timeline ps.
- Alex Passant: SPARQL/JSON into Timeline
Notes:
There are some things that need to be improved.
- Visual bugs - icons are getting cropped, and text labels wrap to the next line and are cropped at the bottom. Making the icons smaller makes them unrecognisable, so that is not a solution (unless there's an icon graphics wizard who can teach me how to make good, small icons).
- Visual appearance - once the bugs are fixed, further improvements can be made, e.g. showing the posts (small vertical lines) in a single line on the monthly band to give a "feel" for the density of posts over time; ...
- Performance / data volume - the amount of data, including the full text of blog posts, can be quite large (it is amazing how much a small group of people can write in a couple of months). Loading data on demand could be a solution, both for the metadata of posts outside the visible area and for post contents.
- Reliability - Joseki 3.0 is still in beta and crashes from time to time. That is not a problem now, since the TimeLine data are queried only once, in "attended" mode, but it will become one if data are loaded on demand. If you notice that the store is down, please let me know (captsolo @ gmail.com).
- Dynamic updates - currently the data are crawled and stored in a single batch. How can new SIOC data be crawled incrementally?
- Use SIOC - the timeline in its current form uses few of the relations available in SIOC. There is more information in the data store: posts linking to their comments, sites pointing to the posts they contain, topics of posts, etc. It would be good to visualise this richer data, although I am not 100% sure a timeline is the best fit for that.
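To illustrate the on-demand loading idea from the performance note, here is a sketch that builds a SPARQL query for post metadata within a single date window (full post text would be fetched separately) and encodes it as an HTTP GET request using the SPARQL Protocol's `query` parameter. The endpoint URL is hypothetical, and using `dcterms:created` for post dates and the `sioc:has_container` / `sioc:has_host` path to the site are assumptions about the data, not a description of the store behind this post.

```python
from datetime import datetime
from urllib.parse import urlencode

# Hypothetical endpoint address, not the store used for this post.
ENDPOINT = "http://localhost:2020/sparql"

# Metadata only: post, title, date and hosting site. Doubled braces are literal
# braces for str.format; {start} and {end} are filled in per window.
QUERY_TEMPLATE = """PREFIX sioc: <http://rdfs.org/sioc/ns#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?post ?title ?date ?site WHERE {{
  ?post a sioc:Post ;
        dcterms:created ?date ;
        sioc:has_container ?forum .
  ?forum sioc:has_host ?site .
  OPTIONAL {{ ?post dc:title ?title }}
  FILTER (xsd:dateTime(?date) >= "{start}"^^xsd:dateTime &&
          xsd:dateTime(?date) <  "{end}"^^xsd:dateTime)
}}
ORDER BY ?date"""


def window_query(start: datetime, end: datetime) -> str:
    """SPARQL query for posts dated in [start, end); naive datetimes are treated as UTC."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return QUERY_TEMPLATE.format(start=start.strftime(fmt), end=end.strftime(fmt))


def window_url(start: datetime, end: datetime, endpoint: str = ENDPOINT) -> str:
    """GET URL carrying the query in the SPARQL Protocol's `query` parameter."""
    return endpoint + "?" + urlencode({"query": window_query(start, end)})
```

The page would request such a URL whenever the visible range moves outside the data already loaded, then convert the results to TimeLine events as in the XSLT step; that JavaScript side is not sketched here. Each request hits the store, so the reliability concern above applies directly.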