Tutorial Series | Nepomuk |
Previous | Introduction to RDF and Ontologies, Nepomuk Server |
Further Reading | SPARQL Quick Reference, SPARQL W3C Definition |
Advanced SPARQL Queries in Nepomuk
We will now take a look at how to perform queries against the Nepomuk data repository.
The Main Model
Nepomuk uses one main Soprano model which is accessed through the ResourceManager:
Soprano::Model* model = Nepomuk::ResourceManager::instance()->mainModel();
Query Basics
Performing a query with Nepomuk/Soprano always follows the same pattern (more details on using the iterator can be found in the Soprano API documentation):
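QString query = getFancyQueryString(); // placeholder for any SPARQL query string
Soprano::QueryResultIterator it
= model->executeQuery( query,
Soprano::Query::QueryLanguageSparql );
while( it.next() ) {
// the value bound to one variable in the current result row
Soprano::Node value = it.binding( "someVariableName" );
// all variable bindings of the current result row
Soprano::BindingSet allBindings = *it;
}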
Simple Queries
Let us have a look at how a query can be constructed. As an example we will query for all resources that are tagged with a certain tag. Let's imagine that we have a reference to this tag stored in myTag. (Please ignore the fact that Nepomuk::Tag::tagOf essentially returns the same information. After all, we are here to learn how it works.)
#include <Soprano/Model>
#include <Soprano/Node>
#include <Soprano/QueryResultIterator>
#include <Soprano/Vocabulary/NAO>
#include <Nepomuk/Resource>
#include <Nepomuk/ResourceManager>
#include <Nepomuk/Tag>
[...]
Nepomuk::Tag myTag = getOurFancyTag();
QList<Nepomuk::Resource> myResourceList;
QString query
= QString("select distinct ?r where { ?r %1 %2 . }")
.arg( Soprano::Node::resourceToN3(Soprano::Vocabulary::NAO::hasTag()) )
.arg( Soprano::Node::resourceToN3(myTag.resourceUri()) );
Soprano::QueryResultIterator it
= model->executeQuery( query,
Soprano::Query::QueryLanguageSparql );
while( it.next() ) {
// ?r is bound to the URI of a tagged resource
myResourceList << Nepomuk::Resource( it.binding( "r" ).uri() );
}
We begin by constructing the SPARQL query string. It is a simple query and, if you know SQL, it should be easy to understand: we select the resources that match the patterns in the where clause. In this case a resource needs to have the nao:hasTag property with myTag as its object. Soprano already provides a set of standard URIs as static instances in the Soprano::Vocabulary namespace, and since we have the Nepomuk resource object for the tag, we can simply use its unique URI to access the tagged resources directly.
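To make the semantics of the where clause concrete, here is a minimal sketch in plain C++ (no Soprano involved) of what a query engine conceptually does for a single triple pattern. The Triple struct, the matchSubjects() function and short identifiers such as "nao:hasTag" are hypothetical stand-ins for illustration only; a real store uses indexes rather than a linear scan.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical toy triple: subject, predicate, object as plain strings.
struct Triple {
    std::string subject;
    std::string predicate;
    std::string object;
};

// Evaluates the single pattern "?r <predicate> <object>" against an
// in-memory list of triples, mimicking
//   select distinct ?r where { ?r <predicate> <object> . }
std::set<std::string> matchSubjects(const std::vector<Triple>& store,
                                    const std::string& predicate,
                                    const std::string& object) {
    std::set<std::string> result;  // a set removes duplicates, like DISTINCT
    for (const Triple& t : store) {
        if (t.predicate == predicate && t.object == object) {
            result.insert(t.subject);  // bind ?r to the matching subject
        }
    }
    return result;
}
```

Using a std::set plays the role of distinct. Note that it also sorts the results, whereas SPARQL guarantees no particular order unless the query uses ORDER BY.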
But what if we do not have the tag URI, only its label, i.e. the name given by the user?
That is no problem for SPARQL either:
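// requires #include <Soprano/Vocabulary/RDFS> and #include <Soprano/LiteralValue>
QString myTagLabel = getFancyTagLabel();
QString query
= QString("select distinct ?r where { "
"?r %1 ?tag . "
"?tag %2 %3 . }")
.arg( Soprano::Node::resourceToN3(Soprano::Vocabulary::NAO::hasTag()) )
.arg( Soprano::Node::resourceToN3(Soprano::Vocabulary::RDFS::label()) )
// wrap the label in a LiteralValue so that it is serialized as a string literal
.arg( Soprano::Node::literalToN3(Soprano::LiteralValue(myTagLabel)) );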
This already looks a lot more confusing than the previous example, but that is mainly due to the QString argument parameters. Let's clean it up a bit by using SPARQL prefix declarations:
QString query
= QString("PREFIX nao: %1 "
"PREFIX rdfs: %2 "
"PREFIX xsd: %3 "
"select distinct ?r where { "
"?r nao:hasTag ?tag . "
"?tag rdfs:label \"%4\"^^xsd:string . }")
.arg( Soprano::Node::resourceToN3(Soprano::Vocabulary::NAO::naoNamespace()) )
.arg( Soprano::Node::resourceToN3(Soprano::Vocabulary::RDFS::rdfsNamespace()) )
.arg( Soprano::Node::resourceToN3(Soprano::Vocabulary::XMLSchema::xsdNamespace()) )
.arg( myTagLabel ); // inserted verbatim: a label containing a double quote would break the query
Both queries are equivalent, and it is up to the query writer to decide which version he or she prefers. We present both versions here only for demonstration purposes.
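The two versions are equivalent because a PREFIX declaration is only shorthand: the engine expands every prefixed name into the full IRI before matching. The following sketch shows that expansion; expandQName() and the prefix table are hypothetical illustrations, and the sketch ignores the finer lexical rules SPARQL places on prefixed names.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Expands a prefixed name such as "nao:hasTag" into a full IRI in angle
// brackets, using a table like the one declared by the PREFIX lines.
std::string expandQName(const std::map<std::string, std::string>& prefixes,
                        const std::string& qname) {
    const std::string::size_type colon = qname.find(':');
    if (colon == std::string::npos) {
        throw std::invalid_argument("not a prefixed name: " + qname);
    }
    const auto it = prefixes.find(qname.substr(0, colon));
    if (it == prefixes.end()) {
        // Using a prefix that was never declared is an error in SPARQL too.
        throw std::invalid_argument("undeclared prefix in: " + qname);
    }
    // Namespace IRI followed by the local name, e.g. ...nao# + hasTag
    return "<" + it->second + qname.substr(colon + 1) + ">";
}
```

With nao bound to http://www.semanticdesktop.org/ontologies/2007/08/15/nao#, expandQName(prefixes, "nao:hasTag") yields exactly the N3 form that Soprano::Node::resourceToN3() produces for NAO::hasTag() in the first version.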
Now let us analyse what the query does. Instead of matching a single triple pattern, we match two, where the first one introduces a new variable, ?tag, which is then reused in the second one. rdfs:label has a string literal range, meaning that each object related to a resource via the rdfs:label property is a string literal. In this case we want to select the tag that has myTagLabel as its label.
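Reusing ?tag in both patterns is effectively a join. A minimal sketch of that join over a hypothetical in-memory triple list (the Triple struct and resourcesTaggedWithLabel() are illustrative names, not Soprano API) could look like this:

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical toy triple: subject, predicate, object as plain strings.
struct Triple {
    std::string subject;
    std::string predicate;
    std::string object;
};

// Mimics
//   select distinct ?r where { ?r <hasTag> ?tag . ?tag <label> "wantedLabel" . }
// The patterns are conjunctive, so the order of evaluation does not change
// the result; here the label pattern is evaluated first.
std::set<std::string> resourcesTaggedWithLabel(const std::vector<Triple>& store,
                                               const std::string& hasTag,
                                               const std::string& label,
                                               const std::string& wantedLabel) {
    // Pattern "?tag <label> wantedLabel": collect all candidate ?tag bindings.
    std::set<std::string> tags;
    for (const Triple& t : store) {
        if (t.predicate == label && t.object == wantedLabel) {
            tags.insert(t.subject);
        }
    }
    // Pattern "?r <hasTag> ?tag": keep ?r only if ?tag is one of the above.
    std::set<std::string> result;
    for (const Triple& t : store) {
        if (t.predicate == hasTag && tags.count(t.object) > 0) {
            result.insert(t.subject);
        }
    }
    return result;
}
```

A real engine chooses the evaluation order itself based on its indexes; the sketch only illustrates that a shared variable must take the same value in both patterns.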
Bringing more context into the mix
In Introduction to RDF and Ontologies we briefly learned about named graphs, or contexts, which make up the fourth part of each statement in Nepomuk. We can now use this information to filter our results by creation date. Imagine, for example, that we want to retrieve all resources tagged before January 1st, 2008. We do this by introducing some more complex SPARQL syntax. For simplicity we go back to our first example, which matches the tag URI directly, to keep the query from becoming unreadable, but of course both approaches can be combined. (Keep in mind that we only use the prefix syntax here for readability. In actual code it may be better to insert the URIs from Soprano::Vocabulary directly to prevent typos in property and class names.)
QDateTime firstOfJanuary = getFirstOfJanuary();
QString query
= QString("PREFIX nao: <%1> "
"PREFIX rdfs: <%2> "
"select distinct ?r where { "
"graph ?g { ?r nao:hasTag <%3> . } "
"?g nao:created ?time . "
"FILTER(?time < %4) . }")
.arg( Soprano::Vocabulary::NAO::naoNamespace().toString() )
.arg( Soprano::Vocabulary::RDFS::rdfsNamespace().toString() )
.arg( myTag.resourceUri().toString() )
.arg( Soprano::Node::literalToN3( firstOfJanuary ) );
This query contains three new concepts:
- As we can see, SPARQL does not simply add the context as a fourth element of the pattern. Instead, we have to surround the triples that should be matched in a certain context with the graph keyword.
- We use the SPARQL FILTER keyword to keep only those graphs/contexts whose nao:created value is earlier than January 1st.
- We convert the date with Soprano instead of inserting the QDateTime directly: Soprano::Node::literalToN3 takes a Soprano::LiteralValue, to which the QDateTime is implicitly converted. This is important since QDateTime does not support the RDF way of formatting a dateTime literal, so we rely on Soprano's internal dateTime conversion through LiteralValue.
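The graph-plus-filter query can be pictured as follows. This is a minimal sketch over a hypothetical in-memory quad list; Quad, taggedBefore() and xsdDateTimeLiteral() are illustrative names, not Soprano API. It assumes all creation times are UTC strings of the fixed form YYYY-MM-DDThh:mm:ssZ, for which plain string comparison happens to match chronological order; a real SPARQL engine compares typed xsd:dateTime values instead. The literal formatter only shows the general shape of a typed dateTime literal; the exact string Soprano emits may differ (for example in fractional seconds).

```cpp
#include <cstdio>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical quad: a triple plus the named graph (context) it lives in.
struct Quad {
    std::string subject, predicate, object, graph;
};

// Formats a UTC timestamp as a typed xsd:dateTime literal in N3/SPARQL syntax.
std::string xsdDateTimeLiteral(int year, int month, int day,
                               int hour, int minute, int second) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%04d-%02d-%02dT%02d:%02d:%02dZ",
                  year, month, day, hour, minute, second);
    return std::string("\"") + buf +
           "\"^^<http://www.w3.org/2001/XMLSchema#dateTime>";
}

// Mimics
//   select distinct ?r where { graph ?g { ?r <hasTag> <tag> . }
//                              ?g <created> ?time . FILTER(?time < cutoff) . }
// graphCreated maps each graph to its creation time.
std::set<std::string> taggedBefore(const std::vector<Quad>& store,
                                   const std::map<std::string, std::string>& graphCreated,
                                   const std::string& hasTag,
                                   const std::string& tag,
                                   const std::string& cutoff) {
    std::set<std::string> result;
    for (const Quad& q : store) {
        if (q.predicate != hasTag || q.object != tag) continue;
        const auto created = graphCreated.find(q.graph);
        // A graph without a creation time does not match "?g created ?time",
        // so its triples are skipped; otherwise apply the FILTER.
        if (created != graphCreated.end() && created->second < cutoff) {
            result.insert(q.subject);
        }
    }
    return result;
}
```

Keeping the tagging statement and the graph metadata apart mirrors the query: the graph keyword binds ?g, and the following pattern and FILTER talk about ?g itself rather than about the tagged resource.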
Full text queries
Starting with KDE 4.4, Nepomuk depends on Virtuoso for data storage. Virtuoso brings a number of useful extensions to SPARQL, most importantly full-text search, which is accessed through the artificial bif:contains property.
This makes it easy to combine graph queries with full-text queries:
PREFIX nao: <http://www.semanticdesktop.org/ontologies/2007/08/15/nao#>
select ?r where { ?r nao:prefLabel ?label .
?label bif:contains 'nepomuk' . }
The query above finds all resources whose label contains the word nepomuk.
Wildcards are supported, too. Be aware, however, that when using wildcards the search expression itself has to be enclosed in single quotes inside the double-quoted SPARQL string, as follows:
PREFIX nao: <http://www.semanticdesktop.org/ontologies/2007/08/15/nao#>
select ?r where { ?r nao:prefLabel ?label .
?label bif:contains "'nepomuk*'" . }
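Since the quoting differs between the plain and the wildcard form, it can help to build the query in one place. The following is a minimal sketch; buildLabelSearchQuery() is a hypothetical helper, and the assumption that the search term is restricted to ASCII letters and digits (everything else, including non-ASCII characters, is simply dropped so it cannot break out of the quotes) is a simplification. Real code would need proper escaping instead.

```cpp
#include <cctype>
#include <string>

// Builds the nao:prefLabel full-text query for a single search word.
// Assumption: only ASCII letters and digits are kept from the input.
std::string buildLabelSearchQuery(const std::string& word, bool prefixWildcard) {
    std::string clean;
    for (unsigned char c : word) {
        if (std::isalnum(c)) clean += static_cast<char>(c);
    }
    // Without a wildcard a single-quoted word is enough; with a trailing '*'
    // the single-quoted expression must itself sit inside a double-quoted string.
    const std::string expression = prefixWildcard
        ? "\"'" + clean + "*'\""
        : "'" + clean + "'";
    // bif: is built into Virtuoso, so only nao: needs a declaration.
    return "PREFIX nao: <http://www.semanticdesktop.org/ontologies/2007/08/15/nao#> "
           "select ?r where { ?r nao:prefLabel ?label . "
           "?label bif:contains " + expression + " . }";
}
```

For example, buildLabelSearchQuery("nepomuk", true) produces the wildcard query shown above, written on a single line.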
For most simple queries (simple queries, for example, do not use back-references between patterns) the Nepomuk desktop query API should be sufficient.