Introduction
RdfToTriplesStylesheet is a self-contained XSLT 1.0 stylesheet designed to convert RDF (http://www.w3.org/RDF/) from its RDF/XML syntax (http://www.w3.org/TR/rdf-syntax-grammar/) to the N-Triples format (http://www.w3.org/TR/rdf-testcases/#ntriples).
Note: An RdfValidationStylesheet that performs validation on RDF has also been added.
Note: We have modified this stylesheet to fix an error in the way both language and datatype were being reported for some nodes (12 August 2004).
The motivation for writing the stylesheet was that, while there are a few language-specific RDF/XML processing engines (for languages such as Perl, Java, and Python), many more languages support XSLT processing. If the production of triples from XML occurred inside an XSLT stylesheet, many more languages could be used to build interesting RDF applications. In particular, at the London FOAF meetup in January, PHP users complained that there were no parsers for RDF in PHP (details of language-specific parsers, including PHP, are available at http://www.ilrt.bristol.ac.uk/discovery/rdf/resources/#sec-tools).
Dan Brickley has pointed out (http://lists.w3.org/Archives/Public/www-rdf-interest/2002Mar/0112.html) that there are 5 existing RDF XSLT stylesheets, in varying stages of completion. The most notable are:
- http://www.w3.org/2001/12/rubyrdf/xsltrdf/rdf2nt.xsl (Jason Diamond)
- http://www.w3.org/2001/12/rubyrdf/xsltrdf/README.html (Max Froumentin)
- http://www.w3.org/XML/2000/04rdf-parse/ (Dan Connolly)
- http://www.hpl.hp.co.uk/people/jjc/snail/ (Jeremy Carroll)
- http://www.w3.org/2002/03/11-RDF-XSL/ (Evan Lenz)
Stylesheet Component
The stylesheet aims to follow the RDF/XML syntax spec completely and to generate all the triples, and only the triples, of the test cases, including collection and reification triples.
More details of how the stylesheet works can be found at RdfToTriplesOperation.
Test Harness Component
The stylesheet has an associated test harness, so that changes to the stylesheet, or to the RDF/XML syntax itself, can be checked for compliance very easily.
The harness consists of a relational database schema for storing triples, and a C# scraper to load triples from a test-case repository like the one available from W3C. We use this to load the test-case triples provided, together with ones generated from the stylesheet, into the database. We also have a number of SQL queries for determining how many of the test cases and triples the stylesheet is failing on, so that we can check that as we modify the stylesheet to pass one test case it does not break another.
The triples scraper, database loader, and database schema with queries could also be used to test the conformance of other RDF parsers.
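The idea behind the harness can be illustrated with a minimal sketch. This is not the harness itself (which uses a C# scraper and its own schema and queries); it is a Python and SQLite approximation, and the table layout, the column names, and the function names `build_db`, `load_triples` and `failing_tests` are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical schema: one row per triple, tagged with the test case it
# belongs to and whether it came from the published .nt file ('expected')
# or from the stylesheet ('generated').
SCHEMA = """
CREATE TABLE triples (
    test_case TEXT NOT NULL,
    source    TEXT NOT NULL,
    subject   TEXT NOT NULL,
    predicate TEXT NOT NULL,
    object    TEXT NOT NULL
)
"""

# A test case fails if any triple is expected but not generated,
# or generated but not expected.
FAILING_TESTS_SQL = """
SELECT test_case FROM (
    SELECT test_case, subject, predicate, object FROM triples WHERE source = 'expected'
    EXCEPT
    SELECT test_case, subject, predicate, object FROM triples WHERE source = 'generated'
)
UNION
SELECT test_case FROM (
    SELECT test_case, subject, predicate, object FROM triples WHERE source = 'generated'
    EXCEPT
    SELECT test_case, subject, predicate, object FROM triples WHERE source = 'expected'
)
ORDER BY 1
"""


def build_db():
    """Create an in-memory database holding the triples table."""
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    return db


def load_triples(db, test_case, source, triples):
    """Insert (subject, predicate, object) tuples for one test case."""
    db.executemany(
        "INSERT INTO triples VALUES (?, ?, ?, ?, ?)",
        [(test_case, source, s, p, o) for (s, p, o) in triples],
    )


def failing_tests(db):
    """Return the names of test cases whose two triple sets differ."""
    return [row[0] for row in db.execute(FAILING_TESTS_SQL)]
```

Re-running a query like this after every change to the stylesheet shows at once whether a fix for one test case has broken another. Note that this sketch compares triples exactly, so blank node identifiers would need separate handling.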
Areas of Conformance
We believe that the stylesheet passes the following positive tests (i.e. tests of the form test001.rdf --> test001.nt):
- amp-in-url
- datatypes
- rdf-charmod-uris
- rdf-containers-syntax-vs-schema
- rdfms-difference-between-ID-and-about
- rdfms-duplicate-member-props
- rdfms-empty-property-elements
- rdfms-identity-anon-resources
- rdfms-nested-bagIDs
- rdfms-not-id-and-resource-attr
- rdfms-para196
- rdfms-rdf-names-use
- rdfms-reification-required
- rdfms-seq-representation
- rdfms-syntax-incomplete
- rdfms-uri-substructure
- rdfms-xmllang
- rdf-ns-prefix-confusion
- rdfs-domain-and-range
- rdfs-no-cycles-in-subClassOf
- rdfs-no-cycles-in-subPropertyOf
- unrecognised-xml-attributes
- xmlbase
Note that the identifiers for anonymous nodes (those beginning with _:) are arbitrary, so the ones created by the stylesheet differ from those in the test cases. To determine whether a test was completed successfully, the test harness queries allow anonymous nodes to match even if their identifiers are different (i.e. _:5 will match _:abc).
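A minimal sketch of this kind of lenient matching, written in Python rather than as one of the harness's SQL queries: every anonymous node identifier is replaced by the same placeholder before the two sets of triples are compared. The function names `mask_blank` and `loosely_equal` are hypothetical. This is a deliberately loose check, not a full graph-isomorphism test, so it can accept two graphs whose anonymous nodes are connected differently.

```python
from collections import Counter


def mask_blank(term):
    """Replace any anonymous node identifier (_:xyz) with one placeholder."""
    return "_:*" if term.startswith("_:") else term


def loosely_equal(expected, generated):
    """Compare two lists of (subject, predicate, object) triples,
    treating every anonymous node identifier as equal to every other."""
    def masked(triples):
        # Counter keeps duplicate counts, so a missing repeat is detected.
        return Counter(tuple(mask_blank(t) for t in triple) for triple in triples)
    return masked(expected) == masked(generated)
```

With this check, a triple whose subject is _:5 matches one whose subject is _:abc, provided the predicate and object agree.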
Areas of Non-Conformance
The stylesheet fails the following positive tests:
- rdf-charmod-literals
- rdfms-xml-literal-namespaces
There are two issues with the current implementation that may be impossible to resolve with XSLT alone:
1) The requirement to output triples in US-ASCII, with Unicode characters escaped as, e.g., \uHHHH. We cannot see a way of getting the XSLT processor to produce this output without using a translation table, which is clearly impractical with about 1.1 million Unicode code points.
2) The requirement to produce triples for literal objects that contain escaped carriage return characters. Because an XML parser ignores whitespace between elements if that whitespace is the only character data between those elements, the XSLT processor will never see, and so cannot escape, those whitespace characters.
These two areas of non-conformance should not be seen as an insurmountable obstacle if the aim is to load RDF into a Unicode-compliant triple store and to store XML as XML rather than as a series of characters.
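Where strict US-ASCII output is needed, one option is to escape the stylesheet's output in the calling language after the transformation. The following Python sketch assumes such a post-processing step, which is not part of the stylesheet, and the function name `escape_non_ascii` is hypothetical. It follows the N-Triples convention of \uHHHH for characters up to U+FFFF and \UHHHHHHHH for characters above that, and it leaves all ASCII characters untouched on the assumption that the stylesheet has already handled them.

```python
def escape_non_ascii(line):
    """Escape every non-ASCII character in a line of N-Triples output."""
    out = []
    for ch in line:
        code_point = ord(ch)
        if code_point < 0x80:
            # ASCII passes through unchanged.
            out.append(ch)
        elif code_point <= 0xFFFF:
            # Basic Multilingual Plane: four uppercase hex digits.
            out.append("\\u%04X" % code_point)
        else:
            # Supplementary planes: eight uppercase hex digits.
            out.append("\\U%08X" % code_point)
    return "".join(out)
```

For example, a literal containing "café" would be written as "caf\u00E9".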
We have not run the stylesheet against the error test cases, and the stylesheet does not produce error messages when invalid RDF is presented. This is something that we could easily add to the stylesheet at a later stage. Note: We have since added a separate RdfValidationStylesheet (12 August 2004).
Download
The stylesheet is available here.
