RDF queries using a relational triple store
While I was travelling in the States, I had the idea of writing a SQL stored procedure that would query an RDF triple store and return matching triple sets. The query would itself be specified as a triple set. For example, a simple query ("find the nodes with a name property that has the literal value of 'ian'") might look like this:
['?b', 'name', 'ian']
A more complex Friend of a Friend query ("find names of people that know a person that knows a person called 'ian'") might look like the triple set
['?a', 'knows', '?b'], ['?b', 'knows', '?c'], ['?c', 'name', 'ian'], ['?a', 'name', '?e']
My use of triple sets to describe RDF queries was inspired by the work of Libby Miller on SquishQL, which thinks of RDF query definitions as sub-graphs.
The stored procedure I have built returns sets of triples that match all of the constraints declared in the query. A single matching set of triples might be:
['_:123', 'knows', '_:456'], ['_:456', 'knows', '_:789'], ['_:789', 'name', 'ian'], ['_:123', 'name', 'james']
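To make the matching behaviour concrete, here is a minimal in-memory sketch in Python. It is not the stored procedure itself, just an illustration of the semantics described above: every pattern in the query must be satisfied by one triple, and a variable must bind to the same value everywhere it appears. The function names (`is_var`, `unify`, `match`) and the extra 'bob' triple are assumptions made up for the example.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]


def is_var(term: str) -> bool:
    """Query terms starting with '?' are variables; everything else is a constant."""
    return term.startswith("?")


def unify(pattern: Triple, triple: Triple,
          bindings: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Return bindings extended so that pattern matches triple, or None on conflict."""
    new = dict(bindings)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # A variable keeps its first binding; a different value is a conflict.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def match(query: List[Triple], store: List[Triple],
          bindings: Optional[Dict[str, str]] = None) -> List[List[Triple]]:
    """Return every list of store triples that satisfies all patterns in query."""
    bindings = {} if bindings is None else bindings
    if not query:
        return [[]]
    first, rest = query[0], query[1:]
    results = []
    for triple in store:
        extended = unify(first, triple, bindings)
        if extended is not None:
            for tail in match(rest, store, extended):
                results.append([triple] + tail)
    return results


# Hypothetical store: the four triples from the example result plus one distractor.
store = [
    ("_:123", "knows", "_:456"),
    ("_:456", "knows", "_:789"),
    ("_:789", "name", "ian"),
    ("_:123", "name", "james"),
    ("_:456", "name", "bob"),
]
query = [
    ("?a", "knows", "?b"),
    ("?b", "knows", "?c"),
    ("?c", "name", "ian"),
    ("?a", "name", "?e"),
]
results = match(query, store)
for triple_set in results:
    print(triple_set)
```

Each result is a list with one store triple per query pattern, in query order, which mirrors the shape of the matching triple set shown above. This naive version scans the whole store for every pattern, so it says nothing about performance.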
I need to test the unusual cases more extensively. The current logic seems to work correctly, but I need more confidence in it. The following are examples of queries that have been somewhat harder to deal with:
1. ['?a', 'knows', '?b'], ['?b', 'knows', '?c'], ['?b', 'knows', '?d']
This translates as "find people who know a person who knows two people". The problem is that the result sets need to repeat the values for ?a and ?b for each pair of known people.
2. ['?a', 'knows', '?b'], ['?c', 'knows', '?b'], ['?b', 'knows', '?d']
This translates as "find two people who know the same person who knows a person".
3. ['?a', 'knows', '?b'], ['?b', 'knows', '?c'], ['?c', 'knows', '?a']
This translates as "find people who know a person who knows a person who knows them" i.e. a circular chain relationship.
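The awkward parts of queries 1 and 3 are easier to see on a tiny data set. The sketch below uses Python's built-in sqlite3 module with hand-written self-joins; the table name `triples`, its column names and the four sample triples are all assumptions made up for this illustration, and the SQL is not the logic of my stored procedure.

```python
import sqlite3

# Hypothetical schema and data, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("_:1", "knows", "_:2"),
        ("_:2", "knows", "_:3"),
        ("_:2", "knows", "_:4"),
        ("_:3", "knows", "_:1"),
    ],
)

# Query 1: ['?a', 'knows', '?b'], ['?b', 'knows', '?c'], ['?b', 'knows', '?d']
fan_out = conn.execute(
    """
    SELECT t1.subject AS a, t1.object AS b, t2.object AS c, t3.object AS d
    FROM triples t1
    JOIN triples t2 ON t2.subject = t1.object
    JOIN triples t3 ON t3.subject = t1.object
    WHERE t1.predicate = 'knows'
      AND t2.predicate = 'knows'
      AND t3.predicate = 'knows'
    ORDER BY a, b, c, d
    """
).fetchall()

# Query 3: ['?a', 'knows', '?b'], ['?b', 'knows', '?c'], ['?c', 'knows', '?a']
cycles = conn.execute(
    """
    SELECT t1.subject AS a, t2.subject AS b, t3.subject AS c
    FROM triples t1
    JOIN triples t2 ON t2.subject = t1.object
    JOIN triples t3 ON t3.subject = t2.object AND t3.object = t1.subject
    WHERE t1.predicate = 'knows'
      AND t2.predicate = 'knows'
      AND t3.predicate = 'knows'
    ORDER BY a, b, c
    """
).fetchall()

for row in fan_out:
    print("query 1:", row)
for row in cycles:
    print("query 3:", row)
```

With this data, query 1 returns the pair ?a = _:1, ?b = _:2 four times, once for every combination of ?c and ?d, including the combinations where ?c and ?d are the same node. Query 3 reports the single cycle three times, once per starting point. Whether those rows should be collapsed, or ?c and ?d forced to differ, is a design decision rather than something the join decides for you.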
The next step for me is to build this on top of a large triple store, to look at scalability issues. Following this, I will implement a simple web query interface and get some feedback on the query syntax etc. As you can see, at this stage I have completely glossed over issues like datatypes, the difference between resources and literals, and other very important areas in RDF.
The goal is to build a high-performance system that will scale to millions of triples, and to do it using plain ANSI-92 SQL. I think this would be of real value to the FOAF scutter builders, allowing a logical progression from spidering RDF statements to being able to query them.
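One way to picture the plain-SQL approach is to translate a triple-set query mechanically into a self-join: one alias of the triples table per pattern, an equality test for each constant, and an equality between columns wherever a variable repeats. The Python sketch below does that against SQLite. The `build_sql` helper, the `triples` table and its column names are assumptions for illustration only, and it ignores datatypes and the resource/literal distinction just as the rest of this post does.

```python
import sqlite3
from typing import Dict, List, Tuple

Pattern = Tuple[str, str, str]
COLUMNS = ("subject", "predicate", "object")  # assumed column names


def build_sql(query: List[Pattern]) -> Tuple[str, List[str]]:
    """Translate a triple-set query into a self-join plus its parameter list."""
    first_seen: Dict[str, str] = {}  # variable -> first "alias.column" that binds it
    conditions: List[str] = []
    params: List[str] = []
    for i, pattern in enumerate(query):
        for column, term in zip(COLUMNS, pattern):
            ref = f"t{i}.{column}"
            if term.startswith("?"):
                if term in first_seen:
                    # A repeated variable becomes a join condition.
                    conditions.append(f"{ref} = {first_seen[term]}")
                else:
                    first_seen[term] = ref
            else:
                # A constant becomes a parameterised equality test.
                conditions.append(f"{ref} = ?")
                params.append(term)
    select = ", ".join(f"t{i}.{c}" for i in range(len(query)) for c in COLUMNS)
    tables = ", ".join(f"triples t{i}" for i in range(len(query)))
    sql = f"SELECT {select} FROM {tables}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


# Hypothetical store matching the FOAF example earlier in the post.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("_:123", "knows", "_:456"),
        ("_:456", "knows", "_:789"),
        ("_:789", "name", "ian"),
        ("_:123", "name", "james"),
    ],
)

query = [
    ("?a", "knows", "?b"),
    ("?b", "knows", "?c"),
    ("?c", "name", "ian"),
    ("?a", "name", "?e"),
]
sql, params = build_sql(query)
rows = conn.execute(sql, params).fetchall()

# Regroup each flat row into one triple per query pattern.
matches = [[tuple(row[i:i + 3]) for i in range(0, len(row), 3)] for row in rows]
print(sql)
print(matches)
```

The generated statement uses only a comma-separated FROM list and a WHERE clause, so it should port to most relational databases. How well it performs on millions of triples will depend heavily on indexing and on the join order the database chooses, which this sketch does not address.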
