Development/Tutorials/Metadata/Nepomuk/AdvancedQueries: Difference between revisions
Note: The queries presented here are pretty low-level. Only use this approach if the Nepomuk Query API does not fulfill your needs.
Latest revision as of 08:16, 24 August 2012
Tutorial Series | Nepomuk |
Previous | Introduction to RDF and Ontologies, Nepomuk Server |
What's Next | |
Further Reading | SPARQL Quick Reference, SPARQL W3C Definition |
Advanced Sparql Queries in Nepomuk
We will now take a look at how to perform queries against the Nepomuk data repository.
The Main Model
Nepomuk uses one main Soprano model which is accessed through the ResourceManager:
Soprano::Model* model = Nepomuk::ResourceManager::instance()->mainModel();
Query Basics
Performing a query with Nepomuk/Soprano always follows the same basic pattern (the Soprano API documentation has more details on using the iterator):
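QString query = getFancyQueryString();
// Run the SPARQL query against the main model
Soprano::QueryResultIterator it
    = model->executeQuery( query,
                           Soprano::Query::QueryLanguageSparql );
while( it.next() ) {
    // Access a single binding by its variable name ...
    Soprano::Node value = it.binding( "someVariableName" );
    // ... or all bindings of the current result at once
    Soprano::BindingSet allBindings = *it;
}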
Simple Queries
Let us have a look at how a query can be constructed. As an example we will query for all resources that are tagged with a certain tag. Let's imagine that we have a reference to this tag stored in myTag. (Please ignore the fact that Nepomuk::Tag::tagOf essentially returns the same information. After all, we are here to learn how it works.)
#include <Nepomuk/Resource>
#include <Nepomuk/Tag>
#include <Soprano/Model>
#include <Soprano/Node>
#include <Soprano/QueryResultIterator>
#include <Soprano/Vocabulary/NAO>
[...]
Nepomuk::Tag myTag = getOurFancyTag();
// %1 is the nao:hasTag property, %2 the URI of our tag, both in N3 form
QString query
    = QString("select distinct ?r where { ?r %1 %2 . }")
      .arg( Soprano::Node::resourceToN3(Soprano::Vocabulary::NAO::hasTag()) )
      .arg( Soprano::Node::resourceToN3(myTag.resourceUri()) );
Soprano::QueryResultIterator it
    = model->executeQuery( query,
                           Soprano::Query::QueryLanguageSparql );
QList<Nepomuk::Resource> myResourceList;
while( it.next() ) {
    // Wrap each result URI in a Nepomuk::Resource
    myResourceList << Nepomuk::Resource( it.binding( "r" ).uri() );
}
We begin by constructing the SPARQL query string. It is a simple query, and if you know SQL it should be easy to understand: we select the resources that match the patterns in the where clause. In this case a resource needs to have the hasTag property with myTag as its object. As we can see, Soprano already provides a set of standard URIs as static instances in the Soprano::Vocabulary namespace. And since we have the Nepomuk resource object for the tag, we can simply use its unique URI to directly access the tagged resources.
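To make the result of the string substitution concrete, here is a minimal standalone sketch that uses only the C++ standard library. The helper `resourceToN3` is a stand-in written for this illustration; it assumes that the N3 form of a plain URI is simply the URI wrapped in angle brackets. `buildTagQuery` and the tag URI in the usage note are hypothetical as well.

```cpp
#include <string>

// Stand-in for Soprano::Node::resourceToN3 (assumption: for a plain URI
// the N3 form is the URI wrapped in angle brackets).
std::string resourceToN3(const std::string& uri)
{
    return "<" + uri + ">";
}

// Builds the "all resources with this tag" query as plain text.
std::string buildTagQuery(const std::string& tagUri)
{
    // URI of the nao:hasTag property in the Nepomuk Annotation Ontology.
    const std::string hasTag =
        "http://www.semanticdesktop.org/ontologies/2007/08/15/nao#hasTag";
    return "select distinct ?r where { ?r " + resourceToN3(hasTag) + " "
         + resourceToN3(tagUri) + " . }";
}
```

For a made-up tag URI such as nepomuk:/res/example-tag, the resulting query text has both the property and the tag as full URIs in angle brackets, which is the form the SPARQL engine receives.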
But what if we do not have the tag URI but only its label, i.e. the name given by the user?
SPARQL handles this case as well:
QString myTagLabel = getFancyTagLabel();
// %1 = nao:hasTag, %2 = rdfs:label, %3 = the label as an N3 literal
QString query
    = QString("select distinct ?r where { "
              "?r %1 ?tag . "
              "?tag %2 %3 . }")
      .arg( Soprano::Node::resourceToN3(Soprano::Vocabulary::NAO::hasTag()) )
      .arg( Soprano::Node::resourceToN3(Soprano::Vocabulary::RDFS::label()) )
      .arg( Soprano::Node( Soprano::LiteralValue(myTagLabel) ).toN3() );
This already looks a lot more confusing than the previous example, but that is mainly due to the QString argument parameters. Let's clean it up a bit by using SPARQL prefix declarations:
// Note: myTagLabel is inserted as-is here and is not escaped
QString query
    = QString("PREFIX nao: %1 "
              "PREFIX rdfs: %2 "
              "PREFIX xsd: %3 "
              "select distinct ?r where { "
              "?r nao:hasTag ?tag . "
              "?tag rdfs:label \"%4\"^^xsd:string . }")
      .arg( Soprano::Node::resourceToN3(Soprano::Vocabulary::NAO::naoNamespace()) )
      .arg( Soprano::Node::resourceToN3(Soprano::Vocabulary::RDFS::rdfsNamespace()) )
      .arg( Soprano::Node::resourceToN3(Soprano::Vocabulary::XMLSchema::xsdNamespace()) )
      .arg( myTagLabel );
Both queries select the same resources, and it is up to the query writer to decide which version to use. We present both here for demonstration purposes. One difference to keep in mind: the prefix version inserts myTagLabel into the query string as-is, so a label containing a double quote or a backslash would produce an invalid query.
Now let us analyse what is happening here. Instead of matching a single graph pattern, we match two: the first one introduces another variable, ?tag, which is then reused in the second one. rdfs:label has a string literal range, meaning that every object related to a resource via the rdfs:label property is a string literal. In this case we select the tag that has myTagLabel as its label.
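When a label is placed between double quotes by hand, as in the prefix version, characters such as quotes and backslashes have to be escaped first. The following standalone sketch shows one way to do that with the C++ standard library only. Both helper names are made up for this illustration and are not part of Soprano; the output uses the same typed xsd:string form as the prefix example, with the datatype written as a full URI.

```cpp
#include <string>

// Escapes the characters that would otherwise break out of a
// double-quoted SPARQL string literal.
std::string escapeLiteral(const std::string& text)
{
    std::string out;
    for (char c : text) {
        switch (c) {
        case '\\': out += "\\\\"; break;
        case '"':  out += "\\\""; break;
        case '\n': out += "\\n";  break;
        case '\r': out += "\\r";  break;
        case '\t': out += "\\t";  break;
        default:   out += c;
        }
    }
    return out;
}

// Formats a plain string as a typed xsd:string literal.
std::string stringLiteralToN3(const std::string& text)
{
    return "\"" + escapeLiteral(text)
         + "\"^^<http://www.w3.org/2001/XMLSchema#string>";
}
```

In real Nepomuk code it is usually simpler to let Soprano format the literal, as the first version of the label query does.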
Bringing more context into the mix
In Introduction to RDF and Ontologies we briefly learned about named graphs, or contexts, which make up the fourth part of each statement in Nepomuk. We can now use this information to filter our results based on creation dates. Imagine, for example, that we want to retrieve all resources tagged before the first of January 2008. We do this by introducing some more complex SPARQL syntax. To keep the query readable we go back to our first example and match the tag URI directly, but of course both approaches can be combined. (Keep in mind that we only use the prefix syntax here for readability. In actual code it may be better to directly add the URIs from Soprano::Vocabulary to prevent typing errors in property and class names.)
QDateTime firstOfJanuary = getFirstOfJanuary();
QString query
    = QString("PREFIX nao: <%1> "
              "PREFIX rdfs: <%2> "
              "select distinct ?r where { "
              "graph ?g { ?r nao:hasTag <%3> . } "
              "?g nao:created ?time . "
              "FILTER(?time < %4) . }")
      .arg( Soprano::Vocabulary::NAO::naoNamespace().toString() )
      .arg( Soprano::Vocabulary::RDFS::rdfsNamespace().toString() )
      .arg( myTag.resourceUri().toString() )
      // Let Soprano format the date as an RDF dateTime literal
      .arg( Soprano::Node::literalToN3( Soprano::LiteralValue( firstOfJanuary ) ) );
This query contains three new concepts:
- SPARQL does not simply add the context as a fourth parameter. Instead, the triples we want to match within a certain context have to be surrounded with the graph keyword.
- We use the SPARQL FILTER keyword to keep only those graphs/contexts that have a nao:created value earlier than the first of January.
- The date is turned into a string by Soprano, through Soprano::LiteralValue, rather than by QDateTime directly. This is important since QDateTime does not support the RDF way of formatting a dateTime string. Thus, we need to use Soprano's internal dateTime string conversion by going through LiteralValue.
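To illustrate what the fourth placeholder in the query above roughly expands to, here is a standalone sketch that formats a UTC date and time as a typed xsd:dateTime literal. The function `dateTimeLiteralToN3` is hypothetical and uses only the C++ standard library. The exact string that Soprano produces may differ in details such as fractional seconds, so treat this as an approximation of the shape, not of Soprano's output.

```cpp
#include <cstdio>
#include <string>

// Formats a UTC date and time as a typed xsd:dateTime literal.
// No range checking is done; callers are expected to pass valid values.
std::string dateTimeLiteralToN3(int year, int month, int day,
                                int hour, int minute, int second)
{
    char buffer[80];
    std::snprintf(buffer, sizeof(buffer), "%04d-%02d-%02dT%02d:%02d:%02dZ",
                  year, month, day, hour, minute, second);
    return "\"" + std::string(buffer)
         + "\"^^<http://www.w3.org/2001/XMLSchema#dateTime>";
}
```

Because the literal carries the xsd:dateTime datatype, the FILTER comparison is evaluated as a date comparison and not as a string comparison.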
Full text queries
As of KDE 4.4, Nepomuk depends on Virtuoso for data storage. Virtuoso brings a lot of nice extensions to SPARQL, most importantly full text search, which is used through the artificial bif:contains property.
This makes it possible to combine graph queries with full text queries in a nice way:
select ?r where { ?r nao:prefLabel ?label .
?label bif:contains 'nepomuk' . }
The query above will find all resources that contain nepomuk in their label.
Wildcards are supported, too. However, be aware that when using wildcards the expression itself needs to be enclosed in an additional pair of quotes, as follows:
select ?r where { ?r nao:prefLabel ?label .
?label bif:contains "'nepomuk*'" . }
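The quoting rule from the two examples above can be captured in a small helper. This is a minimal sketch using only the C++ standard library; `containsExpression` and `buildLabelSearchQuery` are hypothetical names, and the code assumes that the search term itself contains no quote characters.

```cpp
#include <string>

// Builds the object of a bif:contains pattern for a single search term.
// A term with a wildcard gets single quotes inside a double-quoted
// string, as in the second example above.
std::string containsExpression(const std::string& term)
{
    if (term.find('*') != std::string::npos) {
        return "\"'" + term + "'\"";
    }
    return "'" + term + "'";
}

// Builds the full text label query from the tutorial for a given term.
std::string buildLabelSearchQuery(const std::string& term)
{
    return "select ?r where { ?r nao:prefLabel ?label . "
           "?label bif:contains " + containsExpression(term) + " . }";
}
```

When such a query string is embedded in C++ source directly, the inner double quotes have to be escaped as well, which is easy to get wrong by hand.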
For most simple queries (for example, queries that do not use any back-referencing), the Nepomuk desktop query API should be sufficient.