Development/Tutorials/Metadata/Nepomuk/TipsAndTricks: Difference between revisions

From KDE TechBase
mNo edit summary
(Marked this version for translation)
 
(33 intermediate revisions by 11 users not shown)
Line 1: Line 1:
{{Template:I18n/Language Navigation Bar|Development/Tutorials/Metadata/Nepomuk/TipsAndTricks}}
<languages />
<translate>


{{TutorialBrowser|
== Using ontology URIs in your code  == <!--T:1-->
series=[[../|Nepomuk]]|
name=Nepomuk Tips and Tricks|
reading=[[../Resources|Resource Handling with Nepomuk]], 
[[../AdvancedQueries|Advanced Queries with SPARQL]],
[[../RDFIntroduction|RDF and Ontologies in Nepomuk]]
}}


== Always initialize Nepomuk ==
<!--T:2-->
Make sure that somewhere in the initialization code of your application or library Nepomuk is initialized via:
One often needs the URI of a specific class or property in one's code, but not all ontologies are provided by the very convenient [http://soprano.sourceforge.net/apidox/stable/namespaceSoprano_1_1Vocabulary.html Soprano::Vocabulary] namespace.


<code cppqt>
<!--T:3-->
Nepomuk::ResourceManager::instance()->init();
The solution is rather simple: create your own vocabulary namespaces by using Soprano's own onto2vocabularyclass command line tool. It can generate convenient vocabulary namespaces for you. The Soprano documentation shows how to [http://soprano.sourceforge.net/apidox/trunk/soprano_devel_tools.html use it manually] or even simpler with a [http://soprano.sourceforge.net/apidox/trunk/soprano_howto.html#cmake_magic simple CMake macro].
</code>


<!--T:4-->
<br />


== Using ontology URIs in your code ==
== Mind the Difference between QString and QUrl  == <!--T:5-->
One often needs the URI of a specific class or property in one's code, but not all ontologies are provided by the very convenient [http://soprano.sourceforge.net/apidox/stable/namespaceSoprano_1_1Vocabulary.html Soprano::Vocabulary] namespace.


The solution is rather simple: create your own vocabulary namespaces by using Soprano's own onto2vocabularyclass command line tool. It can generate convenient vocabulary namespaces for you. The Soprano documentation shows how to [http://soprano.sourceforge.net/apidox/trunk/soprano_devel_tools.html use it manually] or even simpler with a [http://soprano.sourceforge.net/apidox/trunk/soprano_howto.html#cmake_magic simple CMake macro].
<!--T:6-->
[http://api.kde.org/4.x-api/kdelibs-apidocs/nepomuk/html/classNepomuk_1_1Resource.html Nepomuk::Resource] provides two constructors: one taking a {{qt|QString}} as identifier or URI and one taking a {{qt|QUrl}}.  


<!--T:7-->
The latter one is really simple: the given URI is used as the resource URI. If the resource exists, its data is used, otherwise it will be created with exactly that URI.


== Debugging the created data ==
<!--T:8-->
=== Using sopranocmd ===
The {{qt|QString}} one is a bit trickier. It will try to be clever about the parameter and see if it is a URI. If no resource with that URI (if it is a URI) exists, it is interpreted as an identifier ([http://www.semanticdesktop.org/ontologies/nao/#mozTocId802441 nao:identifier]). Resource checks if a resource with that identifier exists. If so, its data is loaded, if not, a new resource with a random URI and that string as identifier is created.  
When using Nepomuk one creates a lot of RDF statements in the Nepomuk RDF storage. It is often of interest to check which data has been created, if statements have been correctly created or simply look at existing data.


Soprano provides a nice command line client to do all this called ''sopranocmd''. It provides all the features one needs to debug data: it can add and remove statements, list and query them, import and export whole RDF files, and even monitor for ''[http://soprano.sourceforge.net/apidox/trunk/classSoprano_1_1Model.html#3e2595166caac3621fd4268e46049adf statementAdded]'' and ''[http://soprano.sourceforge.net/apidox/trunk/classSoprano_1_1Model.html#8fa85bfce2f83e89f83ef602cd818991 statementRemoved]'' events.
<!--T:9-->
However, '''be aware that nothing is written to '''Nepomuk''' until the first writing call to Resource such as setProperty or addType.'''


To access the Nepomuk storage one would typically use the D-Bus interface:
<!--T:10-->
<br />


<code>
== Debugging the created data  == <!--T:11-->
# sopranocmd --dbus org.kde.NepomukStorage --model main <command> \
    <parameters>
</code>


If one wanted to list all the resources that have been tagged with the tag whose resource URI is nepomuk:/foobar one would use the following command:
<!--T:12-->
Soprano provides a command line client to connect to the storage service. It's called <tt>sopranocmd</tt>. It provides all the features one needs to debug data. It is recommended that you only use sopranocmd for running queries.


<code>
<!--T:13-->
# sopranocmd --dbus org.kde.NepomukStorage --model main list \
Running sopranocmd is cumbersome because of the large number of arguments it requires. This can be made simpler by adding the following alias -
    "" "" "<nepomuk:/foobar>"
<syntaxhighlight lang="bash">alias nepomukcmd="sopranocmd --socket `kde4-config --path socket`nepomuk-socket --model main --nrl"</syntaxhighlight>  
</code>


or one would use a SPARQL query ('''sopranocmd supports the standard URI prefixes out of the box'''):


<code>
<!--T:14-->
# sopranocmd --dbus org.kde.NepomukStorage --model main query \
For example -
<syntaxhighlight lang="text">
# nepomukcmd query \
     "select ?r where { ?r nao:hasTag ?tag . \
     "select ?r where { ?r nao:hasTag ?tag . \
                       ?tag nao:prefLabel 'foobar'^^xsd:string . }"
                       ?tag nao:prefLabel 'foobar'^^xsd:string . }"
</code>
</syntaxhighlight>  


To monitor all statements that are added and removed from the Nepomuk storage one would simply use the following command (as with ''list'' one can specify a filter to only list the added and removed statements one is interested in):
=== Using Konqueror  === <!--T:15-->
<code>
# sopranocmd --dbus org.kde.NepomukStorage --model main monitor
</code>


<code># sopranocmd --help</code> is your friend for all details.
<!--T:16-->
In the [http://websvn.kde.org:80/trunk/playground/base/nepomuk-kde/kioslaves/nepomuk/ Nepomuk playground] repository lives a KIO slave which can handle the ''nepomuk:/'' protocol. It will display all properties of a Nepomuk resource including its links to other resources and the backlinks. This is a convenient way of looking at the Nepomuk data. The KIO slave even supports removal of resources.  


<!--T:17-->
[[Image:Nepomuk kio slave.png|560px]]


=== Using Konqueror ===
<!--T:18-->
In the [http://websvn.kde.org:80/trunk/playground/base/nepomuk-kde/kioslaves/nepomuk/ Nepomuk playground] repository lives a KIO slave which can handle the ''nepomuk:/'' protocol. It will display all properties of a Nepomuk resource including its links to other resources and the backlinks. This is a convenient way of looking at the Nepomuk data. The KIO slave even supports removal of resources.
<br />


[[File:Nepomuk_kio_slave.png|560px]]
=== Using NepomukShell  === <!--T:19-->


<!--T:20-->
NepomukShell is a maintenance and debugging tool, which lives in its own git repository at  [https://projects.kde.org/projects/extragear/utils/nepomukshell nepomukshell]. It is a simple tool that lets one browse all resources in '''Nepomuk'''. Additionally, it allows one to create subclasses and properties and to remove resources.
{{Remember|2='''Caution'''|1=Only create subclasses and properties from PIMO classes and properties!}}


=== Using PIMOShell ===
<!--T:21-->
The PIMOShell is another tool that lives in the [http://websvn.kde.org:80/trunk/playground/base/nepomuk-kde/pimoshell playground]. It is a simple tool that lets one browse all resources in Nepomuk. Additionally, it allows one to create subclasses and properties ('''Caution: only create subclasses and properties from PIMO classes and properties!''') and to remove resources.
[[Image:Pimoshell.png|560px]]


[[File:Pimoshell.png|560px]]
== Constructing SPARQL queries  == <!--T:22-->


<!--T:23-->
{{Tip|1= In most cases the [http://api.kde.org/4.x-api/kdelibs-apidocs/nepomuk/html/namespaceNepomuk_1_1Query.html Nepomuk Query API] should be enough and prevent you from writing your own SPARQL which is hard to debug.}}


== Constructing SPARQL queries ==
<!--T:24-->
Whenever doing something a bit fancier with Nepomuk one has to use SPARQL queries via  
Whenever doing something a bit fancier with '''Nepomuk''' one has to use SPARQL queries via <syntaxhighlight lang="cpp-qt">
<code cppqt>
Nepomuk::ResourceManager::instance()->mainModel()
Nepomuk::ResourceManager::instance()->mainModel()
     ->executeQuery( myQueryString,  
     ->executeQuery( myQueryString,  
                     Soprano::Query::QueryLanguageSparql );
                     Soprano::Query::QueryLanguageSparql );
</code>
</syntaxhighlight>  
Constructing these queries can be a bit cumbersome since one has to use a lot of class and property URIs from different ontologies. Also literals have to be formatted according to the N3 syntax used in SPARQL. Luckily Soprano provides the necessary tools to do exactly that: [http://soprano.sourceforge.net/apidox/trunk/classSoprano_1_1Node.html#ad4c8ab988ae7d9fd587027087b593e4 Soprano::Node::toN3], [http://soprano.sourceforge.net/apidox/trunk/classSoprano_1_1Node.html#d1c2618a28a13c6eac042ddccbf78e6a Soprano::Node::resourceToN3], and [http://soprano.sourceforge.net/apidox/trunk/classSoprano_1_1Node.html#a66acf156e82b866114d90cd0c9ce13c Soprano::Node::literalToN3] take care of all formatting and percent-encoding you need. Using those methods the code to create queries might look ugly but the resulting queries are more likely to be correctly encoded and introduce less code duplication.


Typically one would use QString::arg like so (be aware that the standard prefixes are NOT supported out-of-the-box as with sopranocmd):
<!--T:25-->
Constructing these queries can be a bit cumbersome since one has to use a lot of class and property URIs from different ontologies. Also literals have to be formatted according to the N3 syntax used in SPARQL. Luckily Soprano provides the necessary tools to do exactly that: <br />[http://soprano.sourceforge.net/apidox/trunk/classSoprano_1_1Node.html#ad4c8ab988ae7d9fd587027087b593e4 Soprano::Node::toN3], <br />[http://soprano.sourceforge.net/apidox/trunk/classSoprano_1_1Node.html#d1c2618a28a13c6eac042ddccbf78e6a Soprano::Node::resourceToN3], and <br />[http://soprano.sourceforge.net/apidox/trunk/classSoprano_1_1Node.html#a66acf156e82b866114d90cd0c9ce13c Soprano::Node::literalToN3] take care of all formatting and percent-encoding you need. Using those methods the code to create queries might look ugly but the resulting queries are more likely to be correctly encoded and introduce less code duplication.


<code cppqt>
<!--T:26-->
Typically one would use QString::arg like so (be aware that the standard prefixes are NOT supported out-of-the-box as with sopranocmd):
 
<!--T:27-->
<syntaxhighlight lang="cpp-qt">
using namespace Soprano;
using namespace Soprano;


<!--T:28-->
QString myQuery
QString myQuery
     = QString("select ?r where { "
     = QString("select ?r where { "
Line 93: Line 99:
       .arg(Node::literalToN3("foobar"));
       .arg(Node::literalToN3("foobar"));


</code>
<!--T:29-->
</syntaxhighlight>
 
<!--T:30-->
This will create the same query we used above only using no hard-coded components whatsoever.
 
== Debugging virtuoso-t  == <!--T:31-->
 
<!--T:32-->
If virtuoso-t consumes a lot of CPU resources even though there are no active queries, the analysis has to go a bit deeper. Soprano starts Virtuoso with parameters that are written to a temporary ini-file (/tmp/virtuoso_XXXX.ini). To start Virtuoso with different parameters, Soprano itself has to be modified manually. For example, virtuoso-t's behaviour can be improved by editing backends/virtuoso/virtuosocontroller.cpp (Soprano) and setting NumberOfBuffers to 40000 (line 344) and SchedulerInterval to 0 (line 350).
 
<!--T:33-->
After re-compiling soprano one has to attach gdb to virtuoso-t as soon as it starts consuming CPU and create a full threaded backtrace:
 
<!--T:34-->
<syntaxhighlight lang="cpp-qt">
set logging file /tmp/virtuoso-t.out
set logging on
thread apply all bt full
</syntaxhighlight>
 
<!--T:35-->
Note: The above settings should only be used for debugging!


This will create the same query we used above only using no hard-coded components whatsoever.
<!--T:36-->
[[Category:Documentation]]
[[Category:Development]]
[[Category:Tutorials]]
</translate>

Latest revision as of 10:30, 14 December 2012


Using ontology URIs in your code

One often needs the URI of a specific class or property in one's code, but not all ontologies are provided by the very convenient Soprano::Vocabulary namespace.

The solution is rather simple: create your own vocabulary namespaces with Soprano's onto2vocabularyclass command line tool, which generates convenient vocabulary namespaces for you. The Soprano documentation shows how to use it manually or, even simpler, through a CMake macro.
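
To give a rough idea of what such a vocabulary namespace offers, here is a hand-written stand-in in plain C++. It is only a sketch: the ontology URI, the namespace name MyOnto and its members are hypothetical, and the code actually generated by onto2vocabularyclass works with Qt URL types rather than std::string.

```cpp
#include <string>

// Hand-written stand-in for a generated vocabulary namespace.
// "MyOnto" and its URI are hypothetical and only serve as illustration.
namespace MyOnto {
    // Base URI that every class and property of the ontology shares.
    inline std::string myontoNamespace()
    {
        return "http://example.org/ontologies/myonto#";
    }

    // One accessor per class or property, so that no URI has to be
    // spelled out by hand at the call site.
    inline std::string Project()
    {
        return myontoNamespace() + "Project";
    }

    inline std::string hasMember()
    {
        return myontoNamespace() + "hasMember";
    }
}
```

The benefit is that a misspelled class or property name becomes a compile error instead of a query that silently matches nothing.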


Mind the Difference between QString and QUrl

Nepomuk::Resource provides two constructors: one taking a QString as identifier or URI and one taking a QUrl.

The latter one is really simple: the given URI is used as the resource URI. If the resource exists, its data is used, otherwise it will be created with exactly that URI.

The QString one is a bit trickier. It tries to be clever about the parameter and first checks whether it is the URI of an existing resource. If no resource with that URI exists (or the string is not a URI at all), the string is interpreted as an identifier (nao:identifier), and Resource checks whether a resource with that identifier exists. If so, its data is loaded; if not, a new resource with a random URI and that string as identifier is created.

However, be aware that nothing is written to Nepomuk until the first writing call to Resource such as setProperty or addType.
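
The lookup order described above for the QString constructor can be modelled with a small toy store in plain C++. This is a sketch of the described behaviour only, not the Nepomuk implementation: ToyStore, resolve and the "nepomuk:/toy-N" URIs are hypothetical names, a counter stands in for the random URI, and the toy records new resources immediately, whereas the real class defers writing until the first writing call.

```cpp
#include <map>
#include <set>
#include <string>

// Toy in-memory model of the lookup order; NOT the Nepomuk implementation.
struct ToyStore {
    std::set<std::string> uris;                          // known resource URIs
    std::map<std::string, std::string> uriByIdentifier;  // identifier -> resource URI
    int nextId = 0;                                      // stands in for the random URI

    // Returns the URI of the resource that the given string refers to.
    std::string resolve(const std::string& uriOrIdentifier)
    {
        // 1. A resource with exactly that URI exists: use it.
        if (uris.count(uriOrIdentifier))
            return uriOrIdentifier;

        // 2. Otherwise treat the string as an identifier and look it up.
        auto it = uriByIdentifier.find(uriOrIdentifier);
        if (it != uriByIdentifier.end())
            return it->second;

        // 3. Nothing found: create a new resource with a fresh URI and
        //    remember the string as its identifier.
        std::string uri = "nepomuk:/toy-" + std::to_string(nextId++);
        uris.insert(uri);
        uriByIdentifier[uriOrIdentifier] = uri;
        return uri;
    }
};
```

Resolving the same identifier twice yields the same resource, which is why the QString constructor is convenient for human-readable names, while the QUrl constructor is the safer choice when the exact URI matters.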


Debugging the created data

Soprano provides a command line client called sopranocmd to connect to the storage service. It provides all the features one needs to debug data, but it is recommended that you only use it for running queries.

Running sopranocmd is cumbersome because of the large number of arguments it requires. This can be simplified by adding the following alias:

alias nepomukcmd="sopranocmd --socket `kde4-config --path socket`nepomuk-socket --model main --nrl"


For example, with that alias in place:

# nepomukcmd query \
    "select ?r where { ?r nao:hasTag ?tag . \
                       ?tag nao:prefLabel 'foobar'^^xsd:string . }"

Using Konqueror

In the Nepomuk playground repository lives a KIO slave which can handle the nepomuk:/ protocol. It will display all properties of a Nepomuk resource including its links to other resources and the backlinks. This is a convenient way of looking at the Nepomuk data. The KIO slave even supports removal of resources.


Using NepomukShell

NepomukShell is a maintenance and debugging tool, which lives in its own git repository at nepomukshell. It is a simple tool that lets one browse all resources in Nepomuk. Additionally, it allows one to create subclasses and properties and to remove resources.

 
Caution
Only create subclasses and properties from PIMO classes and properties!


Constructing SPARQL queries

Tip
In most cases the Nepomuk Query API should be enough and saves you from writing your own SPARQL, which is hard to debug.


Whenever doing something a bit fancier with Nepomuk one has to use SPARQL queries via

Nepomuk::ResourceManager::instance()->mainModel()
    ->executeQuery( myQueryString, 
                    Soprano::Query::QueryLanguageSparql );

Constructing these queries can be a bit cumbersome since one has to use a lot of class and property URIs from different ontologies. Also literals have to be formatted according to the N3 syntax used in SPARQL. Luckily Soprano provides the necessary tools to do exactly that:
Soprano::Node::toN3,
Soprano::Node::resourceToN3, and
Soprano::Node::literalToN3 take care of all formatting and percent-encoding you need. With those methods the code that creates queries might look ugly, but the resulting queries are more likely to be correctly encoded and the code contains less duplication.

Typically one would use QString::arg like so (be aware that, unlike with sopranocmd, the standard prefixes are NOT supported out of the box):

using namespace Soprano;

QString myQuery
     = QString("select ?r where { "
               "?r %1 ?v . "
               "?v %2 %3 . }")
       .arg(Node::resourceToN3(Vocabulary::NAO::hasTag()))
       .arg(Node::resourceToN3(Vocabulary::NAO::prefLabel()))
       .arg(Node::literalToN3("foobar"));

This creates the same query as the sopranocmd example above, but without any hard-coded components.
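
Because the standard prefixes are not available here, every class and property ends up in the query as a full URI in angle brackets, and the literal carries an explicit datatype. The following plain C++ sketch shows the shape of the resulting string. The helper functions are hypothetical stand-ins that only mimic what resourceToN3 and literalToN3 are described to do for this simple case; they are not Soprano's implementation and cover plain string literals only.

```cpp
#include <string>

// Hypothetical stand-in: wrap a URI in angle brackets, as N3 requires.
std::string resourceToN3Sketch(const std::string& uri)
{
    return "<" + uri + ">";
}

// Hypothetical stand-in: quote a plain string, escape the characters that
// would break the quoting, and append the xsd:string datatype.
std::string stringLiteralToN3Sketch(const std::string& value)
{
    std::string escaped;
    for (char c : value) {
        switch (c) {
        case '\\': escaped += "\\\\"; break;
        case '"':  escaped += "\\\""; break;
        case '\n': escaped += "\\n";  break;
        case '\r': escaped += "\\r";  break;
        case '\t': escaped += "\\t";  break;
        default:   escaped += c;      break;
        }
    }
    return "\"" + escaped + "\"^^<http://www.w3.org/2001/XMLSchema#string>";
}

// Builds the tag query from the text with full URIs instead of prefixes.
std::string buildTagQuery(const std::string& label)
{
    const std::string nao =
        "http://www.semanticdesktop.org/ontologies/2007/08/15/nao#";
    return "select ?r where { ?r " + resourceToN3Sketch(nao + "hasTag")
         + " ?v . ?v " + resourceToN3Sketch(nao + "prefLabel") + " "
         + stringLiteralToN3Sketch(label) + " . }";
}
```

Escaping the literal in one place is the main point: a label that contains a quote character would otherwise produce a malformed query.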

Debugging virtuoso-t

If virtuoso-t consumes a lot of CPU resources even though there are no active queries, the analysis has to go a bit deeper. Soprano starts Virtuoso with parameters that are written to a temporary ini-file (/tmp/virtuoso_XXXX.ini). To start Virtuoso with different parameters, Soprano itself has to be modified manually. For example, virtuoso-t's behaviour can be improved by editing backends/virtuoso/virtuosocontroller.cpp (Soprano) and setting NumberOfBuffers to 40000 (line 344) and SchedulerInterval to 0 (line 350).

After recompiling Soprano, attach gdb to virtuoso-t as soon as it starts consuming CPU and create a full backtrace of all threads:

set logging file /tmp/virtuoso-t.out
set logging on
thread apply all bt full

Note: The above settings should only be used for debugging!