Launch of sw-announce Mailing List

sw-announce is a moderated announcements-only mailing list. The list is intended to efficiently communicate news relevant to the Semantic Web and related technologies. This includes, but is certainly not limited to, metadata, ontologies, RDF, knowledge management, AI, electronic agents, and semantic web services.

Category RSS Feeds and RDF Versions

Each category now has its own RSS 1.0 feed linked at the top of the category archive (e.g. the RDF Templates Design Notes category). Additionally, most pages now have RDF Versions with full machine-readable metadata.

Resource Condition Extension

Currently, resource conditions in a nodepath restrict you to a single arc and node pattern, e.g. resource()[arc()/literal() = literal('hello')]. I've relaxed this constraint so that the selection part of the condition can be any arc-matching nodepath. This means you can now write conditions that test the arcs on a node. For example, resource()[arc() = resource('http://xmlns.com/foaf/0.1/weblog')] matches any resource with a foaf:weblog property, whatever the value of that property, while resource()[arc('http://xmlns.com/foaf/0.1/knows')/resource()/arc() = resource('http://xmlns.com/foaf/0.1/weblog')] matches any resource that knows a resource with a foaf:weblog.

The new BNF for the spec is:

ResourceCondition            ::= ArcMatchingNodePath [ " = " NodeSpecifier ] 
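Here's a minimal Python sketch of how the two example conditions above could be evaluated. It rests on a few assumptions of mine. The graph is a set of (subject, predicate, object) string triples. Each selected arc is represented by the triple it belongs to. `=` succeeds when any selected arc is labelled with the given URI, which is an XPath-style existential comparison. The helper names `follow`, `targets` and `predicates` and the example.com URIs are hypothetical, and this is not how phpRDFT does it.

```python
FOAF = "http://xmlns.com/foaf/0.1/"


def follow(graph, nodes, arc_uri=None):
    """arc() / arc(uri): the arcs leaving any of `nodes`, kept as whole triples."""
    return {t for t in graph if t[0] in nodes and (arc_uri is None or t[1] == arc_uri)}


def targets(arcs):
    """resource() after an arc: the node at the far end of each arc."""
    return {o for (_, _, o) in arcs}


def predicates(arcs):
    """The URI labelling each arc, which is what `= resource('...')` compares."""
    return {p for (_, p, _) in arcs}


def has_weblog(graph, node):
    # resource()[arc() = resource('http://xmlns.com/foaf/0.1/weblog')]
    # Assumption: '=' is true if ANY selected arc has the given URI.
    return FOAF + "weblog" in predicates(follow(graph, {node}))


def knows_blogger(graph, node):
    # resource()[arc('...foaf/0.1/knows')/resource()/arc() = resource('...foaf/0.1/weblog')]
    friends = targets(follow(graph, {node}, FOAF + "knows"))
    return FOAF + "weblog" in predicates(follow(graph, friends))


graph = {
    ("http://a.example.com/", FOAF + "weblog", "http://a.example.com/blog"),
    ("http://b.example.com/", FOAF + "knows", "http://a.example.com/"),
    ("http://c.example.com/", FOAF + "knows", "http://b.example.com/"),
}
print(has_weblog(graph, "http://a.example.com/"), knows_blogger(graph, "http://b.example.com/"))
# prints: True True
```

Because c only knows b, and b has no weblog, knows_blogger is false for c. The condition looks one hop along foaf:knows, not arbitrarily far.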

Arc Selection Syntax Reprise

OK, I implemented the proposed ArcPattern syntax change and decided that I didn't like it! However, I came up with an alternative that is even more expressive: introduce a specifier called arc() that acts in all ways like resource() but is used as the specifier in ArcPatterns. Here's how it would look:

<rt:root-template>
  <rt:for-each rt:select="~subject()">
    <rt:for-each rt:select="resource()/arc()">
      <rt:for-each rt:select="arc()/resource()">
        <rt:value-of rt:select="label()"/>
      </rt:for-each>
    </rt:for-each>
  </rt:for-each>
</rt:root-template>

For comparison, here's the original (i.e. current spec version):

<rt:root-template>
  <rt:for-each rt:select="~subject()">
    <rt:for-each rt:select="resource()/resource()">
      <rt:for-each rt:select="resource()/resource()">
        <rt:value-of rt:select="label()"/>
      </rt:for-each>
    </rt:for-each>
  </rt:for-each>
</rt:root-template>

The new way is clearer, I'm sure.

Revised Spec Including Arc Selection

I've just published the latest RDF Templates Specification which includes the new arc selecting NodePaths. I've reworked large parts of the text too in an effort to make it more logical. There's still a long way to go before I'm happy with the spec but it's coming along quite nicely all the same. Of course, there's a new version of phpRDFT to go along with the spec too.

NodePath Terminology

The following is from the up-coming specification revision:

A NodePath is evaluated in a particular context which is determined by the structure of the stylesheet and may be either node-context or arc-context.

A node-matching NodePath has a NodePattern as its first pattern and is evaluated in a node-context. It is an error for this type of NodePath to be evaluated when the context is an arc.

An arc-matching NodePath has an ArcPattern as its first pattern and is evaluated in an arc-context. It is an error for this type of NodePath to be evaluated when the context is a node.

NodePaths can change the context in which they are evaluated by the use of scope specifiers. A scope specifier overrides the current context by selecting a list of nodes or arcs from a graph. The NodePath is evaluated against each node or arc in the list as though it were the context node or arc.

A global scope specifier selects all nodes from the input graph. This scope specifier can only be used with a node-matching NodePath.

A source scope specifier selects all nodes from the source graph. This scope specifier can only be used with a node-matching NodePath.

NodePaths can select either nodes or arcs.

A node-selecting NodePath has a NodePattern as a terminating pattern and selects a list of nodes from the graph.

An arc-selecting NodePath has an ArcPattern as a terminating pattern and selects arcs from the graph.
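This terminology can be checked mechanically from the text of a NodePath. The Python sketch below classifies a NodePath as node- or arc-matching from its first pattern, and as node- or arc-selecting from its last. It raises an error for the misuses described above. It assumes the following:

- ArcPatterns are written with the arc() specifier, as in the Arc Selection Syntax Reprise post.
- The '/' and '~' prefixes mark global and source scope.
- Conditions in [...] don't affect the classification.

`split_steps`, `classify`, `check_context` and `ContextError` are hypothetical names, and nothing here evaluates the path against a graph.

```python
class ContextError(Exception):
    """Raised when a NodePath is used in the wrong kind of context."""


def split_steps(path):
    """Split a NodePath on '/' separators that are outside (), [] and quotes."""
    steps, current, depth, quote = [], [], 0, None
    for ch in path:
        if quote:
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch in "([":
            depth += 1
        elif ch in ")]":
            depth -= 1
        elif ch == "/" and depth == 0:
            steps.append("".join(current))
            current = []
            continue
        current.append(ch)
    steps.append("".join(current))
    return steps


def classify(path):
    """Return (scope, matching, selecting) for a NodePath string."""
    scope = {"/": "global", "~": "source"}.get(path[:1], "local")
    if scope != "local":
        path = path[1:]
    kinds = ["arc" if step.startswith("arc(") else "node" for step in split_steps(path)]
    matching, selecting = kinds[0], kinds[-1]
    # Global and source scopes select nodes, so they need a node-matching path.
    if scope != "local" and matching != "node":
        raise ContextError(scope + " scope needs a node-matching NodePath")
    return scope, matching, selecting


def check_context(path, context_kind):
    """Reject a locally scoped NodePath whose first pattern doesn't fit the context."""
    scope, matching, _ = classify(path)
    if scope == "local" and matching != context_kind:
        raise ContextError(f"{matching}-matching NodePath evaluated in a {context_kind} context")
    return scope


print(classify("resource()/arc()"))  # ('local', 'node', 'arc')
print(classify("~subject()"))        # ('source', 'node', 'node')
```

A scoped NodePath skips the context check because its scope specifier replaces the context with the nodes it selects.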

Arc Selection Syntax Proposal

Adding arc selection to NodePaths has created some usability problems. It's not obvious from looking at a NodePath in isolation whether it expects nodes or arcs as its context. It's easy enough for the stylesheet author to work out, but it might be a stumbling block for newcomers to RDF Templates. It's similar to the criticism RDF/XML faces over its striping syntax: there's no easy way to tell whether you're looking at a node or an arc. RDF/XML has a convention of using an upper-case first letter for nodes and a lower-case first letter for arcs, but many schemas don't follow it.

Here's an example of the kind of problem that I'm talking about:

<rt:root-template>
  <rt:for-each rt:select="~subject()">
    <rt:for-each rt:select="resource()/resource()">
      <rt:for-each rt:select="resource()/resource()">
        <rt:value-of rt:select="label()"/>
      </rt:for-each>
    </rt:for-each>
  </rt:for-each>
</rt:root-template>

Does the rt:value-of instruction emit the labels of nodes or arcs? It's not immediately obvious, but tracing through the NodePaths reveals the answer to be nodes. The first for-each selects nodes that are subjects of a triple, the second selects the arcs of those nodes, and the third selects the node values of those arcs.
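Tracing the three nested for-each elements is easier with a concrete graph. The rough Python sketch below walks a few of the Dublin Core triples from the Triples post. At each level it records whether the selection yields a node or an arc. It assumes the graph is a plain list of string triples with no RDFS closure, so ~subject() is just every distinct subject. Literals and resources are both plain strings here for brevity, and `trace` is a hypothetical helper.

```python
DC = "http://purl.org/dc/elements/1.1/"

graph = [
    ("http://1.example.com/", DC + "creator", "Fred Flintstone"),
    ("http://2.example.com/", DC + "publisher", "mailto:publisher@example.com"),
    ("http://2.example.com/", DC + "creator", "mailto:webmaster@2.example.com"),
]


def trace(graph):
    """Return (level, kind, label) for every item the nested for-each elements visit."""
    visited = []
    for subject in dict.fromkeys(s for s, _, _ in graph):  # level 1: ~subject() selects nodes
        visited.append((1, "node", subject))
        for s, p, o in graph:
            if s == subject:
                visited.append((2, "arc", p))   # level 2: the arcs leaving that node
                visited.append((3, "node", o))  # level 3: the node each arc points to
    return visited


for level, kind, label in trace(graph):
    print("  " * (level - 1) + kind + ": " + label)
```

Every level-3 entry is a node, and those are the labels the innermost rt:value-of emits.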

To clarify this situation I'm proposing a NodePath syntax change. My proposal is to prefix each ArcPattern with an @ symbol. The example above would become:

<rt:root-template>
  <rt:for-each rt:select="~subject()">
    <rt:for-each rt:select="resource()/@resource()">
      <rt:for-each rt:select="@resource()/resource()">
        <rt:value-of rt:select="label()"/>
      </rt:for-each>
    </rt:for-each>
  </rt:for-each>
</rt:root-template>

This would make it easier to match up arc-matching NodePaths with arc-selecting NodePaths. The compact syntax would also be clearer: */@*/*.

Is this a good idea or is it an unnecessary complication of the syntax?

Triples

  • Node: http://1.example.com/
    • Arc: http://purl.org/dc/elements/1.1/creator
      • Value: "Fred Flintstone"
  • Node: http://2.example.com/
    • Arc: http://purl.org/dc/elements/1.1/publisher
      • Value: mailto:publisher@example.com
    • Arc: http://purl.org/dc/elements/1.1/creator
      • Value: mailto:webmaster@2.example.com
  • Node: http://3.example.com/
    • Arc: http://purl.org/dc/elements/1.1/publisher
      • Value: mailto:publisher@example.com
    • Arc: http://purl.org/dc/elements/1.1/creator
      • Value: mailto:webmaster@3.example.com
  • Node: mailto:webmaster@3.example.com
    • Arc: http://www.w3.org/1999/02/22-rdf-syntax-ns#value
      • Value: "webmaster"

First Arc Selecting Stylesheet

I've just shocked myself by running my first arc selecting stylesheet. I'm shocked because all I've changed is the parsing mechanism for nodepaths so that arc selecting nodepaths are allowed. I thought I'd have to change a lot more than just that…! Anyway, here's the first stylesheet:

<rt:stylesheet xmlns:rt="http://purl.org/vocab/2003/rdft/">
  <rt:root-template>
    <html>
      <body>
        <rt:apply-templates rt:select="~node()" />
      </body>
    </html>
  </rt:root-template>

  <rt:template rt:pattern="node()">
    <p><rt:value-of rt:select="label()" /> has arcs:</p>
    <ul>
      <rt:for-each rt:select="node()/resource()">
        <li><rt:value-of rt:select="label()" /></li>
      </rt:for-each>
    </ul>
  </rt:template>
</rt:stylesheet>

NodePaths for Arcs

I'm working on implementing nodepaths that select arcs. This is essentially just allowing node()/resource() as well as node()/resource()/node(). I've completed that change and now face the bigger challenge of implementing processing support.

The current spec describes a processing model whereby a context node is selected from the graph and all operations act on that context node. However, I want to be able to act on both nodes and arcs, so that definition has to change. One small dilemma is what to call it: 'context node or arc' is a bit of a mouthful, and 'context resource', while technically correct, implies that literals cannot be the context, which of course they can. I'm probably just going to use the term 'context' and leave it at that.

Updated RDFT Specification

I've just uploaded the latest RDFT specification which incorporates the alterations I've been discussing over the past few days, including:

  • Source Scope — a new scope that makes the NodePath apply only to the original source graph without the RDFS closure rules.
  • Label Function — a new function that returns the current node's label.
  • Data Model — some clarification of the data model.
  • Subject Specifier — a node specifier that matches only resources that are the subject of a triple.

Also uploaded is the latest version of the RDFT reference implementation, phpRDFT, which implements all the features described in the latest specification.

Schema Stylesheet and Nodepath Examination

Here's a little stylesheet that demonstrates the use of source scope nodepaths. It produces an HTML representation of an RDF schema. The source scope nodepaths ensure that only the classes and properties described in the schema document are selected for output. However, the subsequent matches and selects are scoped against the RDFS closure of the graph, so it will recognise subclasses of rdfs:Class and rdf:Property if those are imported. For example, you could import the OWL schema and it would recognise owl:TransitiveProperty as though it were an rdf:Property.


Source Scoped Nodepaths

I ended up choosing '~' as the prefix for the source scoped nodepaths. A source scoped nodepath uses all the nodes in the graph before the RDFS closure rules are applied. For comparison here is the same nodepath with three different scopes:

resource()[resource('http://ex.example.com/age')/literal() = literal('23')]
A locally scoped NodePath that selects the context node only if it has a 'http://ex.example.com/age' property with a literal value of '23'.
/resource()[resource('http://ex.example.com/age')/literal() = literal('23')]
A globally scoped NodePath that selects all nodes from the RDFS closed graph that have a 'http://ex.example.com/age' property with a literal value of '23'.
~resource()[resource('http://ex.example.com/age')/literal() = literal('23')]
A source scoped NodePath that selects all nodes from the original source graph that have a 'http://ex.example.com/age' property with a literal value of '23'.
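To compare the three scopes side by side, here's a small Python sketch of the age condition evaluated locally, globally and against the source graph. Several things in it are my own assumptions for illustration:

- The ex.example.com/years property is a hypothetical sub-property of the age property.
- The "closed" graph is written out by hand and contains only the one sub-property inference, not the full RDFS closure.
- For the source scope, the whole NodePath, condition included, is evaluated against the source graph. Whether the condition should also see inferred triples is a design choice the post doesn't settle.

```python
from typing import NamedTuple

AGE = "http://ex.example.com/age"
YEARS = "http://ex.example.com/years"  # hypothetical sub-property of AGE
SUB_PROPERTY = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"


class Literal(NamedTuple):
    value: str


def nodes_of(graph):
    """Every node (subject or object) in a set of triples."""
    return {s for s, _, _ in graph} | {o for _, _, o in graph}


def age_is_23(graph, node):
    # [resource('http://ex.example.com/age')/literal() = literal('23')]
    return (node, AGE, Literal("23")) in graph


def select(scope, context, source, closed):
    """Apply the condition under a local, global ('/') or source ('~') scope."""
    if scope == "local":
        candidates, graph = {context}, closed
    elif scope == "global":
        candidates, graph = nodes_of(closed), closed
    else:  # source: assumption, the whole NodePath sees only the source graph
        candidates, graph = nodes_of(source), source
    return {n for n in candidates if age_is_23(graph, n)}


source = {
    ("http://ex.example.com/a", AGE, Literal("23")),
    ("http://ex.example.com/b", YEARS, Literal("23")),
    (YEARS, SUB_PROPERTY, AGE),
}
# Hand-written stand-in for the closure: only the sub-property inference is added.
closed = source | {("http://ex.example.com/b", AGE, Literal("23"))}

print(select("local", "http://ex.example.com/b", source, closed))
print(sorted(select("global", None, source, closed)))
print(sorted(select("source", None, source, closed)))
```

Node b only has an age of '23' by inference. It is therefore found by the local and global forms but not by the source-scoped one.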

RDFS Closures

I really have to get the RDFS closures issue sorted out. Currently the spec mandates that the input graph be closed using the RDFS closure rules from the RDF model theory spec. This allows cool things such as automatic subProperty and subClass behaviour so you can write templates for foaf:knows and have it work even if the input graph is using Eric's relationship schema.

The problem comes when you're dealing directly with rdfs:Class or rdf:Property such as when you're writing a stylesheet to produce a nice HTML representation of an RDF schema document. When you try to select all the classes in the schema you also get the inferred ones from the closure rules which rather messes things up.

There are a number of ways round this:

  1. no closure rules — this would work at the expense of losing all the convenience of subProperty and subClass inferences. It would also make the processor faster.
  2. closure rules optional — this is one step up from removing the rules entirely and would allow stylesheets that need to deal with low level RDF/RDFS constructs to access only the source graph.
  3. nodepaths operate optionally on source only — rather than a wholesale on/off setting for the closure rules, it might be possible to apply a nodepath only to the original graph. There is already a nodepath scope operator in the syntax which allows a 'global' select; it would be feasible to add a 'source' select as well.
  4. node patterns operate on source only — this would allow parts of a nodepath to operate only on the source graph. Not sure what overall benefit would be gained for the additional implementation complexity this would entail.

Now I've written the various options down, I can feel myself gravitating to option 3. I think implementing it naively would be fairly simple: just keep a copy of the original graph before the closure rules are applied. Syntactically, the spec currently uses '/' to denote a global scope nodepath. What symbol would be best for denoting the source graph? Any of these? '#', ':', '^', '!', '@', '~', '?', '%', '&', '$'
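Option 3 only needs a copy of the graph taken before closure. The Python sketch below illustrates that, and the schema problem that motivates it, using a deliberately small, naive subset of the RDFS entailment rules. The subset covers sub-property and sub-class propagation, transitivity, and the typing of classes. It is not the full rule set from the RDF model theory. The ex.example.com schema and the `rdfs_closure` and `classes` helpers are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
SUB_CLASS, SUB_PROP = RDFS + "subClassOf", RDFS + "subPropertyOf"
CLASS, RESOURCE = RDFS + "Class", RDFS + "Resource"


def rdfs_closure(triples):
    """Naively apply a few RDFS entailment rules until nothing new is inferred."""
    graph = set(triples)
    while True:
        inferred = set()
        for s, p, o in graph:
            if p == SUB_PROP:
                # statements using a sub-property also hold for its super-property
                inferred |= {(x, o, y) for x, q, y in graph if q == s}
                # subPropertyOf is transitive
                inferred |= {(s, SUB_PROP, z) for y, q, z in graph if q == SUB_PROP and y == o}
            elif p == SUB_CLASS:
                # subClassOf is transitive
                inferred |= {(s, SUB_CLASS, z) for y, q, z in graph if q == SUB_CLASS and y == o}
                # instances of a subclass are instances of the superclass
                inferred |= {(x, RDF_TYPE, o) for x, q, y in graph if q == RDF_TYPE and y == s}
                # both ends of subClassOf are classes (its domain and range)
                inferred |= {(s, RDF_TYPE, CLASS), (o, RDF_TYPE, CLASS)}
            elif p == RDF_TYPE and o == CLASS:
                # every class is a subclass of rdfs:Resource
                inferred.add((s, SUB_CLASS, RESOURCE))
        if inferred <= graph:
            return graph
        graph |= inferred


def classes(graph):
    """Roughly what selecting every node with rdf:type rdfs:Class returns."""
    return {s for s, p, o in graph if p == RDF_TYPE and o == CLASS}


EX = "http://ex.example.com/"  # hypothetical schema namespace
schema = {
    (EX + "Dog", RDF_TYPE, CLASS),
    (EX + "Animal", RDF_TYPE, CLASS),
    (EX + "Dog", SUB_CLASS, EX + "Animal"),
    (EX + "friendOf", SUB_PROP, "http://xmlns.com/foaf/0.1/knows"),
    (EX + "fred", EX + "friendOf", EX + "barney"),
}
source = frozenset(schema)      # option 3: keep the graph as it was before closure
closed = rdfs_closure(source)

print(sorted(classes(source)))  # only the classes the schema itself declares
print(sorted(classes(closed)))  # also includes the inferred rdfs:Resource
```

The closed graph gives the convenient inference: fred foaf:knows barney. It also produces the unwanted extra class. Keeping `source` around lets a nodepath choose which of the two graphs it sees.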

Outputting Subjects and Objects

Here's an RDF Template that lists all the subject nodes of the triples in a graph followed by a list of the objects of those triples. The stylesheet distinguishes between resource and literal objects.

<rt:stylesheet xmlns:rt="http://purl.org/vocab/2003/rdft/">
  <rt:root-template>
    <html>
      <head>
        <title>Subjects and Objects</title>
      </head>
      <body>
      <ul>
        <rt:for-each rt:select="/subject()">
        <li>
          <rt:value-of rt:select="label()" />
          <ul>
            <rt:for-each rt:select="*/resource()/resource()">
              <li>
                <rt:value-of rt:select="label()" />
              </li>
            </rt:for-each>
            <rt:for-each rt:select="*/resource()/literal()">
              <li>
                "<rt:value-of rt:select="label()" />"
              </li>
            </rt:for-each>
          </ul>
        </li>
        </rt:for-each>
        </ul>
      </body>
    </html>
  </rt:root-template>
</rt:stylesheet>

Selecting Only Subjects

For the RDF/XML round-tripping to work effectively, I need to be able to select only nodes that are the subject of a triple. Taking a leaf out of Sean B. Palmer's excellent RDFPath proposal, I'm introducing a subject() node specifier:

subject()
A node specifier that matches only resources that are the subject of a triple.

Here are some examples of how it would be used in nodepaths:

/subject()
A globally scoped NodePath that selects all nodes in the context graph that are the subject of any triple in that graph.
subject('http://example.com/')
A locally scoped NodePath that selects the context node only if it has the given uriref and is the subject of any triple in the graph.
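As a rough illustration, here's a Python sketch of subject() in both its global and local uses. It assumes a graph held as a set of (subject, predicate, object) string triples. The optional `uri` argument stands for a hypothetical subject(uri) form that also restricts the uriref. `subjects` and `matches_subject` are my own names.

```python
def subjects(graph):
    """/subject(): every node that is the subject of some triple in the graph."""
    return {s for s, _, _ in graph}


def matches_subject(graph, context, uri=None):
    """subject() or subject(uri) used locally: test the context node."""
    return context in subjects(graph) and (uri is None or context == uri)


graph = {
    ("http://example.com/", "http://purl.org/dc/elements/1.1/creator", "mailto:webmaster@example.com"),
}
print(sorted(subjects(graph)))                                  # ['http://example.com/']
print(matches_subject(graph, "mailto:webmaster@example.com"))   # False: only ever an object
```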

Outputting All Node Labels

Here's an RDF Template that outputs the labels for all the nodes in the graph:

<rt:stylesheet xmlns:rt="http://purl.org/vocab/2003/rdft/">
  <rt:root-template>
    <html>
      <head>
        <title>Output All Node Labels</title>
      </head>
      <body>
        <rt:for-each rt:select="/node()">
          <pre>
            <rt:value-of rt:select="label()" />
          </pre>
        </rt:for-each>
      </body>
    </html>
  </rt:root-template>
</rt:stylesheet>

This works well in the bleeding edge version of phpRDFT (not released yet, will be very soon) but demonstrates a potential drawback in the processing model: all the nodes inferred by the RDFS closure rules are also displayed. This wasn't unexpected but might cause confusion for someone getting to grips with RDFT for the first time.


Arcs Not Forgotten

Arcs aren't forgotten in the RDF Templates design, and some way of manipulating them needs to be thought out if the RDF/XML round-tripping is to work. There are a couple of approaches, from allowing nodepaths to select and match arcs to providing functions that access specific arcs of a given node. The former would allow arc-specific templates that provide common behaviour whatever the type of resource, e.g. outputting the dc:title of a resource in a consistent format. It would mean a change to the processing model, though, which may be complicated…

RDF Templates Label Function

Now the data model is a little clearer, I can introduce a new function: label:

label()
This acts on the context node. The function returns the string representation of the node's label. For a resource this is its uriref; for an untyped literal it is its value. For a blank node the label function returns an empty string.

Updated 14:14, 12 September 2003
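To pin the definition down, here's a small Python sketch of label() over a hypothetical node model. The `Resource`, `Literal` and `BlankNode` classes are mine for illustration, not phpRDFT's. The sketch assumes a literal's language tag is not part of its label.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Resource:
    uri: str


@dataclass(frozen=True)
class Literal:
    value: str
    lang: str = ""


@dataclass(frozen=True)
class BlankNode:
    node_id: str  # internal identifier only; never exposed as a label


Node = Union[Resource, Literal, BlankNode]


def label(node: Node) -> str:
    """label(): the string form of the context node's label."""
    if isinstance(node, Resource):
        return node.uri
    if isinstance(node, Literal):
        return node.value  # assumption: the language tag is not part of the label
    return ""  # blank nodes have no label


print(label(Resource("http://1.example.com/")))  # http://1.example.com/
print(label(Literal("Fred Flintstone")))         # Fred Flintstone
print(repr(label(BlankNode("b0"))))              # ''
```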

RDF Templates Data Model

To support access to the labels of nodes in an RDF graph, I need to clarify the RDF Templates data model which is a little fuzzy at the moment. I think the following data types are required for now (others will be required later):

  • node — represents a node in the graph.
  • node list — represents a list of nodes which currently is unordered but should be able to be ordered in the future.
  • string — an ordered sequence of characters.

At the moment only the node and node list types are explicit and the string type is inferred. My thinking is that the rt:value-of element should return a string, so it needs to be able to convert nodes and node lists to strings. The current spec says: "This element is replaced by the literal value of the concatenated values of the nodes selected by the NodePath specified in the select attribute. Each literal node specified is replaced by its label. Each resource node is replaced by its rdf:value property if present or an empty string if not." I can replace this wording with something like: "This element evaluates the expression in the select attribute and returns the result converted to a string." I then need to define the rules for converting nodes and node lists to strings:

  • node list to string — concatenate the string representations of each node in the list. The order in which they are concatenated is determined by the ordering of the node list which may be undefined.
  • node to string — if the node is a literal node then the string representation is equal to the node's label. If the node is a resource then the string representation is the value of the node's rdf:value property. If no rdf:value property is present then the string representation is the empty string.
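The two conversion rules translate fairly directly into code. In this Python sketch the graph is a list of (subject, predicate, object) triples over hypothetical `Resource` and `Literal` classes. The example triple is the rdf:value statement for mailto:webmaster@3.example.com from the Triples post. Two details are my assumptions rather than anything the rules say. If a resource has several rdf:value properties, the first one found is used. A non-literal rdf:value yields the empty string.

```python
from dataclasses import dataclass
from typing import List, Union

RDF_VALUE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#value"


@dataclass(frozen=True)
class Resource:
    uri: str


@dataclass(frozen=True)
class Literal:
    value: str


Node = Union[Resource, Literal]


def node_to_string(graph, node: Node) -> str:
    """A literal gives its label; a resource gives its rdf:value, else ''."""
    if isinstance(node, Literal):
        return node.value
    for subject, predicate, obj in graph:
        if subject == node and predicate == RDF_VALUE:
            # Assumptions: first rdf:value wins, and only a literal value yields text.
            return obj.value if isinstance(obj, Literal) else ""
    return ""


def node_list_to_string(graph, nodes: List[Node]) -> str:
    """Concatenate in list order (for an unordered list that order is arbitrary)."""
    return "".join(node_to_string(graph, n) for n in nodes)


webmaster = Resource("mailto:webmaster@3.example.com")
graph = [(webmaster, RDF_VALUE, Literal("webmaster"))]

print(node_to_string(graph, webmaster))                                     # webmaster
print(repr(node_to_string(graph, Resource("http://3.example.com/"))))       # ''
print(node_list_to_string(graph, [Literal("Fred Flintstone"), webmaster]))  # Fred Flintstonewebmaster
```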

These rules will need to be extended in the future when I start tackling literal datatypes, and it might be prudent to allow other string representations of resources, such as the rdfs:label property, but the above rules are intended to be clarifications, not extensions, of the specification.

Why RDF Templates?

A couple of people have asked me why the world needs RDF Templates. My current answer is that if we're asking people to use the RDF data model, then we'd better have good, portable ways of specifying how to make use of the data in that model. RDF Templates will hopefully provide a vendor- and language-independent means of specifying data extraction from a triple store.

The more immediate and personal driver is that several of my current projects are running up against the same problem: serialisation of RDF data in a variety of formats. I've already mentioned placetime.com and Semantic Planet, but I'm also facing it at myRSS. myRSS essentially has a large triple store backend which is fed by the spider. The feed outputs are produced by querying this store and writing out RSS 1.0, RSS 0.91, JavaScript, HTML and several other versions of the feed. This whole process could be simplified if I could apply an RDF Template to a specific node for each feed and let the RDFT processor do all the hard work of querying and traversing the graph.

RDF Templates Plans

I think the next step for RDF Templates is to enable round-tripping of RDF/XML. Is there a practical reason for doing this, you might ask? Well, the reason I came up with RDF Templates in the first place was to solve a problem I had with placetime.com and Semantic Planet's source directory. Both of these services output HTML and RDF representations of the same things, and both tie themselves in knots getting the right information out. With placetime I want to automatically incorporate RDF from other entities, e.g. the start and end instants of an interval. I could do this very easily if I just built an in-memory RDF model, added the triples I need from a query and used RDF Templates to generate the HTML and RDF/XML versions.

RDF Templates has a way to go before it's expressive enough to be able to serialise a graph as RDF/XML. Here's my starting list of what needs to be done:

  • node labels — there has to be some way of getting a string representation of the node label.
  • literal languages — likewise, there has to be a way of accessing the language of a literal.
  • namespaces — it has to be possible to output elements and attributes with namespaces.
  • xml encoding — need to be able to specify the output encoding in a similar way to XSLT.
  • variables — I suspect, but don't know for sure, that the stylesheet will need to have some context passed around. This means variable or param elements perhaps.

I need to mull this over a bit and see if there's anything I've missed. The best way to see is probably to try writing the stylesheet and see what walls I hit.

New Version of phpRDFT

Along with the revised spec. is the revised version of phpRDFT which implements the spec changes plus one other big change - it now supports RAP 0.6 (the latest version).

RDF Templates Update

I've just uploaded a new version of the RDF Templates 1.0 spec, with a couple of minor changes suggested by Daniel Biddle. The first was to make the absence of a literal value in a specifier more obvious by using an asterisk instead of an empty parameter. So, whereas before you would use this to match a literal with any value but a language of 'fr':

literal(, 'fr')

you should now use this:

literal(*, 'fr')

I agreed with Daniel that this made the intention clearer (i.e. any value for the literal).

Daniel's second proposal was to allow any language from RFC3066 as part of the literal specifier. Instead of taking this direct route, I specified that the language code could be any language code as specified in Section 6.5 of the RDF Concepts document, which ties the RDFT spec closer to the RDF ones.
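Here's a small Python sketch of how the wildcard might be matched. It assumes a literal node carries a value and an optional language tag. It also assumes an omitted argument behaves like '*', so literal('hello') ignores the language, and that language tags compare case-insensitively. `literal_matches` and `ANY` are hypothetical names.

```python
from typing import NamedTuple

ANY = "*"


class Literal(NamedTuple):
    value: str
    lang: str = ""  # "" means no language tag


def literal_matches(node, value=ANY, lang=ANY):
    """Test a literal node against literal(value, lang); '*' matches anything."""
    if value != ANY and node.value != value:
        return False
    # Assumption: language tags are compared case-insensitively.
    return lang == ANY or node.lang.lower() == lang.lower()


print(literal_matches(Literal("bonjour", "fr"), ANY, "fr"))  # literal(*, 'fr') -> True
print(literal_matches(Literal("hello", "en"), ANY, "fr"))    # False
print(literal_matches(Literal("hello", "en"), "hello"))      # literal('hello') -> True
```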

RDF Templates

I posted this announcement to the www-rdf-interest mailing list just now:

RDF Templates (RDFT) are an XML format for creating representations of RDF graphs. In a similar way to XSLT, RDF Templates define template rules with patterns which are matched against nodes. Template rules specify output actions and further node selections which trigger further template operation. However, instead of acting on an XML tree, RDFT acts upon an RDF graph. Nodes are specified using a 'nodepath' syntax which defines conditional node/arc/node graph traversals. A macro definition facility is provided to reduce long nodepaths to easier to read strings.

RDFT has been designed to parallel XSLT where sensible and anyone familiar with that language and with the principles of the RDF model should find it very easy to learn. RDFT solves a key problem of processing RDF with XSLT since it acts on the underlying graph and therefore has no dependencies on the RDF serialisation syntax. It also specifies RDF Schema awareness so sub-classes and sub-properties are handled as expected.

The updated RDFT specification, previously hosted on my personal site, has found a new home at Semantic Planet:

http://www.semanticplanet.com/2003/08/rdft/spec

This version of the spec is accompanied by the first full implementation, written in PHP and using the RAP RDF parser. Development of this implementation has helped clarify large sections of the original RDFT specification especially around the processing model. The implementation is released under the GNU GPL. For more information about this implementation or to download it see:

http://www.semanticplanet.com/2003/08/rdft/impl/php/dist

For an online demo of RDFT, please see:

http://www.semanticplanet.com/2003/08/rdft/impl/php/demo

Stay tuned for more RDF Templates stuff over the coming weeks…

RDF queries using a relational triple store

While I was travelling in the States, I had the idea of writing a SQL stored procedure that would query an RDF triple store and return matching triple sets. The query would itself be specified as a triple set. For example, a simple query ("find the nodes with a name property that has the literal value of 'ian'") might look like this:

['?b', 'name', 'ian']

A more complex Friend of a Friend query ("find names of people that know a person that knows a person called 'ian'") might look like this triple set:

['?a', 'knows', '?b'], ['?b', 'knows', '?c'], ['?c', 'name', 'ian'], ['?a', 'name', '?e']

In describing RDF queries as triple sets, I was inspired by Libby Miller's work on SquishQL, which treats RDF query definitions as sub-graphs. The stored procedure I have built returns sets of triples that match all of the constraints declared in the query. A single matching set of triples might be:

['_:123', 'knows', '_:456'], ['_:456', 'knows', '_:789'], ['_:789', 'name', 'ian'], ['_:123', 'name', 'james']

I need to undertake more extensive testing to look at the unusual cases. The current logic seems to work correctly, but I need more confidence. The following are examples of queries that have been somewhat harder to deal with:

  1. ['?a', 'knows', '?b'], ['?b', 'knows', '?c'], ['?b', 'knows', '?d'] translates as "find people who know a person who knows two people". The problem is that the result sets need to repeat the values for ?a and ?b for each pair of people known.
  2. ['?a', 'knows', '?b'], ['?c', 'knows', '?b'], ['?b', 'knows', '?d'] translates as "find two people who know the same person who knows a person".
  3. ['?a', 'knows', '?b'], ['?b', 'knows', '?c'], ['?c', 'knows', '?a'] translates as "find people who know a person who knows a person who knows them", i.e. a circular chain of relationships.

The next step for me is to build this on top of a large triple store to look at scalability issues. Following this, I will implement a simple web query interface and get some feedback on the query syntax etc. As you can see, at this stage I have completely glossed over issues like datatypes, the difference between resources and literals, and other very important areas of RDF. The goal is to build a high-performance system that will scale to millions of triples, and to do it using plain ANSI-92 SQL. I think this would be of real value to the FOAF scutter builders, allowing a logical progression from spidering RDF statements to being able to query them.
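The stored procedure itself isn't shown here. The Python sketch below is only an in-memory illustration of the same matching semantics:

- Each query triple is matched in turn.
- Variables (terms starting with '?') are bound on first use and must match thereafter.
- Every complete binding is substituted back into the query to give a matching triple set.

In a relational store, each extra pattern would roughly correspond to one more self-join of the triples table. `match`, `query` and `matching_triples` are hypothetical names.

```python
def is_variable(term):
    return term.startswith("?")


def match(pattern, triple, bindings):
    """Extend `bindings` so that `pattern` equals `triple`, or return None."""
    extended = dict(bindings)
    for p, t in zip(pattern, triple):
        if is_variable(p):
            # bind on first use; afterwards the variable must keep the same value
            if extended.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return extended


def query(store, patterns, bindings=None):
    """Yield every binding that satisfies all patterns (a nested-loop join)."""
    bindings = bindings or {}
    if not patterns:
        yield bindings
        return
    for triple in store:
        extended = match(patterns[0], triple, bindings)
        if extended is not None:
            yield from query(store, patterns[1:], extended)


def matching_triples(patterns, bindings):
    """Substitute a binding back into the query to get the matching triple set."""
    return [tuple(bindings.get(term, term) for term in pattern) for pattern in patterns]


store = [
    ("_:123", "knows", "_:456"),
    ("_:456", "knows", "_:789"),
    ("_:789", "name", "ian"),
    ("_:123", "name", "james"),
]
foaf_query = [("?a", "knows", "?b"), ("?b", "knows", "?c"),
              ("?c", "name", "ian"), ("?a", "name", "?e")]

for b in query(store, foaf_query):
    print(matching_triples(foaf_query, b))
```

Run against the four triples above, this yields exactly the one matching set shown earlier. The sketch does not force ?c and ?d in the first of the harder queries to be different people. A node that knows only one person therefore still matches, with ?c = ?d. An explicit inequality constraint would be needed if the query really means "two different people".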