about semantic web, software architecture and life in general

This is an archived version of my blog.
See also my homepage.

Post details: NQuad parsing using Jython

2011-02-21

Permalink 01:31:47, Categories: Semantic Web, Software Development   English (EU)

NQuad parsing using Jython

When in need to parse NQuad RDF files (e.g., the Billion Triples Challenge data files) Java folks can use the NxParser by Aidan Hogan and Andreas Harth: NxParser - Parser for NTriples, NQuads, and more.

You can also use it from Python (provided you use the Jython implementation).

Code:

import sys
sys.path.append("./nxparser.jar")
 
from org.semanticweb.yars.nx.parser import *
from java.io import FileInputStream
from java.util.zip import GZIPInputStream
 
def all_triples(fname, use_gzip=False):
    in_file = FileInputStream(fname)
    if use_gzip:
        in_file = GZIPInputStream(in_file)
 
    nxp = NxParser(in_file, False)
 
    while nxp.hasNext():
        triple = nxp.next()
        n3 = ([i.toN3() for i in triple])
        yield n3

The code above defines a generator function which will yield a stream of NQuad records. We can now add some demo code in order to see it in action:

[More:]

Code:

def main():
    gzfname = "sioc-btc-2009.gz"
 
    for line in all_triples(gzfname, use_gzip=True):
        print line
 
if __name__ == "__main__":
    main()

Code:

[u'<http://2008.blogtalk.net/node/29>', u'<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>', u'<http://rdfs.org/sioc/ns#Post>', u'<http://2008.blogtalk.net/sioc/node/29>']
 
[u'<http://2008.blogtalk.net/node/65>', u'<http://rdfs.org/sioc/ns#content>', u'"We\'ve created a map showing the main places of interest (event locations, restaurants, pubs, shopping locations and tourist sights) during BlogTalk 2008.  The conference venue is shown on the left-hand side of the map.  We will also have a hardcopy for all attendees. View Larger Map"', u'<http://2008.blogtalk.net/sioc/node/65>']

Notes:

  • I was using this code to parse BTC data files.
  • NQuad parsing was recently added to Redland.

Comments, Pingbacks:

No Comments/Pingbacks for this post yet...

Page served in 0.319 seconds

Valid XHTML 1.0! Valid CSS! Valid RSS! Valid Atom!