
Latest revision as of 10:47, 15 July 2012

Resource Metadata Handling with Nepomuk

Tutorial Series: Nepomuk
Previous: Introduction to RDF and Ontologies, Data Layout in Nepomuk
What's Next: Using the Nepomuk Resource Generator
Further Reading: Nepomuk Quickstart for the impatient, The Nepomuk Server, Nepomuk API Documentation

Introduction

Three types of metadata can be distinguished:

1. Metadata that is stored with the data itself and is always available. This includes ID3 tags, the number of pages in a PDF document, or even the size of a file or the subject of an email.
2. Metadata that is created manually by the user, such as annotations or tags assigned to files, emails, or any other resources.
3. Metadata that applications can gather automatically, such as the source of a downloaded file, the email an attachment was saved from, or the original file when a file is copied locally.

Type 1 is already handled by many implementations. KDE itself uses Strigi, which extracts this kind of metadata from files.

Nepomuk is intended for metadata of types 2 and 3. It provides an easy way to create and read metadata for arbitrary resources (for example files or emails, but also contacts or even a paragraph in a PDF file).

The simplest type of metadata that Nepomuk can handle is a comment: a plain string associated with a resource (a file, for example), created by the user with an application built on Nepomuk. Nepomuk's core is designed to allow arbitrary types of metadata, i.e. any resource can be related to any other resource or value simply by naming the relation and providing the value. From a developer's point of view, however, the power of Nepomuk lies in the fact that it can provide a C++ class for each type of resource, with convenience methods that make handling the metadata simple.

Nepomuk is resource-based: working with Nepomuk always means working with instances that represent a certain resource. Each resource has a list of properties. Properties are named and have a certain type. The type is either another resource (for example, the email that a file was saved from as an attachment) or a literal such as a string or an integer (the comment mentioned earlier is a string literal). Each property has a cardinality of either 1 (a file can only have been saved from one email) or greater than 1, i.e. unbounded (a file can have many comments).
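The following C++17 sketch models these ideas in plain standard C++ so they can be tried without a KDE installation. It is not the Nepomuk API: the names ToyResource, ResourceRef, setSingleProperty and addProperty are hypothetical, property keys are short strings rather than ontology URIs, and cardinality is modelled only by whether a new value replaces the stored ones or is appended to them.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical toy model of the ideas above; this is NOT the Nepomuk API.
// A property value is either a literal (string or integer) or a reference
// to another resource, identified here by its URI.
struct ResourceRef { std::string uri; };
using Value = std::variant<std::string, long, ResourceRef>;

class ToyResource {
public:
    explicit ToyResource(std::string uri) : uri_(std::move(uri)) {}
    const std::string& uri() const { return uri_; }

    // Cardinality 1: a new value replaces whatever was stored before.
    void setSingleProperty(const std::string& key, Value value) {
        auto& values = props_[key];
        values.clear();
        values.push_back(std::move(value));
    }

    // Cardinality > 1: each call adds another value.
    void addProperty(const std::string& key, Value value) {
        props_[key].push_back(std::move(value));
    }

    // All values of a property (empty if the property is not set).
    std::vector<Value> property(const std::string& key) const {
        auto it = props_.find(key);
        return it == props_.end() ? std::vector<Value>{} : it->second;
    }

private:
    std::string uri_;
    std::map<std::string, std::vector<Value>> props_;
};
```

Setting a hypothetical savedFrom property twice leaves a single resource reference, while adding two comments keeps both string literals.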

Getting started

Nepomuk is already part of kdelibs. To use it, make sure you have installed Soprano from the kdesupport module.

To use Nepomuk in your application, let CMake find it using the Nepomuk CMake module that comes with kdelibs:

find_package(Nepomuk REQUIRED)
include_directories(${NEPOMUK_INCLUDES})
target_link_libraries( [...] ${NEPOMUK_LIBRARIES})

Developing with Nepomuk

Now let's dive into the good stuff and do some coding. There are basically two ways to use Nepomuk: 1. setting properties manually, and 2. generating convenience classes that provide a dedicated method for each property.

The manual way of doing things

In Nepomuk, metadata is represented as key/value pairs belonging to a certain resource, which is represented by the Nepomuk::Resource class.

Setting and getting metadata is done via two methods:

void Nepomuk::Resource::setProperty( const QUrl& key, const Nepomuk::Variant& value );
Nepomuk::Variant Nepomuk::Resource::property( const QUrl& key ) const;

Note

Why don't we use QVariant instead of Nepomuk::Variant? Nepomuk::Variant is based on QVariant and extends it with some nice list-handling features that will come in handy later on.

Since all metadata in Nepomuk is stored as RDF, all keys are actually URIs, which are defined in specific ontologies (see Introduction to RDF and Ontologies for details) and look something like this: http://www.semanticdesktop.org/ontologies/2007/08/15/nao#Tag

The Soprano::Vocabulary namespaces provide static functions returning the most frequently used URIs, as we see below (Soprano also comes with a code generation tool that can create this kind of namespace from ontology files).

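#include <Nepomuk/Resource>
#include <Soprano/Vocabulary/NAO>
#include <Soprano/Vocabulary/RDFS>
#include <Soprano/Vocabulary/Xesam>

Nepomuk::Resource tag( "TestTag", Soprano::Vocabulary::NAO::Tag() ); // resource of type nao:Tag
tag.setProperty( Soprano::Vocabulary::RDFS::label(), QString("Important stuff") );

Nepomuk::Resource aFile( "/tmp/testfile.txt", Soprano::Vocabulary::Xesam::File() );
aFile.setProperty( Soprano::Vocabulary::NAO::hasTag(), tag ); // link the file to the tag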

So what does this code do?

We first create a resource with the identifier "TestTag" and the type http://www.semanticdesktop.org/ontologies/2007/08/15/nao#Tag, and then set its rdfs:label property to the string value "Important stuff". Then we create a second resource of type xesam:File for /tmp/testfile.txt and attach the newly created tag to it through the nao:hasTag property.
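To make the result visible, this C++ sketch lists the RDF statements that the example conceptually produces, as (subject, predicate, object) triples with prefixed names instead of full URIs. The resource names res:tag1 and res:file1 are hypothetical stand-ins for the random URIs Nepomuk generates, the helper objectsOf is hypothetical too, and the sketch deliberately leaves out how the identifiers "TestTag" and the file path are stored.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical illustration: the statements behind the manual example,
// written with prefixed names instead of full URIs.
struct Triple {
    std::string subject;
    std::string predicate;
    std::string object;  // a resource name or a quoted literal
};

// "res:tag1" and "res:file1" stand in for generated resource URIs.
std::vector<Triple> manualExampleStatements() {
    return {
        {"res:tag1",  "rdf:type",   "nao:Tag"},                // type given to the constructor
        {"res:tag1",  "rdfs:label", "\"Important stuff\""},    // string literal
        {"res:file1", "rdf:type",   "xesam:File"},
        {"res:file1", "nao:hasTag", "res:tag1"},               // resource-valued property
    };
}

// Returns the objects of all statements with the given subject and predicate.
std::vector<std::string> objectsOf(const std::vector<Triple>& store,
                                   const std::string& subject,
                                   const std::string& predicate) {
    std::vector<std::string> result;
    for (const auto& t : store)
        if (t.subject == subject && t.predicate == predicate)
            result.push_back(t.object);
    return result;
}
```

Note that the tag relation is just another statement whose object happens to be a resource rather than a literal.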

Do it the smooth way

The second way of using Nepomuk is to rely on its class generator, which, as mentioned in the introduction, generates C++ classes from ontologies. Suddenly the above code looks much simpler:

Nepomuk::Tag tag( "TestTag" );
tag.setLabel( "Important stuff" );

Nepomuk::File aFile( "/tmp/testfile.txt" );
aFile.addTag( tag );

Isn't this much more readable and much easier to remember and debug? The nice thing is that both Tag and File are subclasses of Nepomuk::Resource, and internally methods like setLabel are based only on setProperty and some clever use of Nepomuk::Variant.
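The following C++ sketch illustrates the idea of generated convenience classes as thin wrappers around generic properties. GenericResource, ToyTag and ToyFile are hypothetical stand-ins, not the real Nepomuk classes or generator output, and values are simplified to lists of strings; only the two ontology URIs (rdfs:label and nao:hasTag) are real.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Real ontology URIs for rdfs:label and nao:hasTag.
const std::string kRdfsLabel = "http://www.w3.org/2000/01/rdf-schema#label";
const std::string kNaoHasTag = "http://www.semanticdesktop.org/ontologies/2007/08/15/nao#hasTag";

// Hypothetical stand-in for Nepomuk::Resource: generic, URI-keyed properties.
// Values are kept as lists of strings; resources are referenced by identifier.
class GenericResource {
public:
    explicit GenericResource(std::string identifier) : id_(std::move(identifier)) {}
    const std::string& identifier() const { return id_; }

    void setProperty(const std::string& key, const std::vector<std::string>& values) {
        props_[key] = values;
    }
    std::vector<std::string> property(const std::string& key) const {
        auto it = props_.find(key);
        return it == props_.end() ? std::vector<std::string>{} : it->second;
    }

private:
    std::string id_;
    std::map<std::string, std::vector<std::string>> props_;
};

// What a generated class might look like: convenience methods that only
// translate into setProperty()/property() calls with the right URI.
class ToyTag : public GenericResource {
public:
    using GenericResource::GenericResource;
    void setLabel(const std::string& label) { setProperty(kRdfsLabel, {label}); }
    std::string label() const {
        auto values = property(kRdfsLabel);
        return values.empty() ? std::string() : values.front();
    }
};

class ToyFile : public GenericResource {
public:
    using GenericResource::GenericResource;
    // Read the current list of tags, append one, write the list back.
    void addTag(const ToyTag& tag) {
        auto tags = property(kNaoHasTag);
        tags.push_back(tag.identifier());
        setProperty(kNaoHasTag, tags);
    }
};
```

The design point is that the wrapper adds no storage of its own: everything still ends up as a URI-keyed property on the underlying resource.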

Let us continue with the tagging example. The Tag class provides some more useful methods:

QList<Resource> Tag::tagOf() provides a list of all resources that have been tagged with this tag.
static QList<Tag> Tag::allTags() provides all existing tags.

Using these methods it is already possible to provide basic tagging in your application.
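Conceptually, both methods are lookups over the stored statements. The C++ sketch below answers them by scanning a hypothetical in-memory list of triples: tagOf() collects every subject with a nao:hasTag statement pointing at the tag, and allTags() collects every resource of type nao:Tag. It only illustrates what the methods return; it does not reflect how Nepomuk actually queries its storage.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory statement: subject, predicate, object.
struct Triple { std::string s, p, o; };

// Hypothetical equivalent of Tag::tagOf(): every subject tagged with `tag`.
std::vector<std::string> tagOf(const std::vector<Triple>& store, const std::string& tag) {
    std::vector<std::string> out;
    for (const auto& t : store)
        if (t.p == "nao:hasTag" && t.o == tag)
            out.push_back(t.s);
    return out;
}

// Hypothetical equivalent of Tag::allTags(): every resource typed nao:Tag.
std::set<std::string> allTags(const std::vector<Triple>& store) {
    std::set<std::string> out;
    for (const auto& t : store)
        if (t.p == "rdf:type" && t.o == "nao:Tag")
            out.insert(t.s);
    return out;
}
```

A tag that exists but has never been assigned shows up in allTags() while its tagOf() list stays empty.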

Tips

Only one instance of each resource

In Nepomuk, each resource has only one instance at all times. Thus, the following code actually results in only one new tag with two labels, since tagB is simply a local copy referring to the same resource as tagA:

Nepomuk::Tag tagA( "test" );
Nepomuk::Tag tagB( "test" );

tagA.setLabel( "testtagA" );
tagB.setLabel( "testtagB" );

The actual data is cached and synced back regularly by the framework.
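One way to picture this behaviour is a cache that hands out a single shared data record per identifier, with every handle pointing at that record. The C++ sketch below does this with std::shared_ptr. ToyResourceManager, ToyTagHandle and addLabel are hypothetical names (the example above uses setLabel), and the sketch omits the syncing back to storage that the framework performs.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical sketch of "one instance per resource": handles are cheap
// objects that share a single data record per identifier.
struct ToyResourceData {
    std::vector<std::string> labels;
};

class ToyResourceManager {
public:
    // Returns the existing record for `id`, creating it on first use.
    std::shared_ptr<ToyResourceData> dataFor(const std::string& id) {
        auto& slot = cache_[id];
        if (!slot)
            slot = std::make_shared<ToyResourceData>();
        return slot;
    }

private:
    std::map<std::string, std::shared_ptr<ToyResourceData>> cache_;
};

class ToyTagHandle {
public:
    ToyTagHandle(ToyResourceManager& manager, const std::string& id)
        : d_(manager.dataFor(id)) {}
    // Writes go to the shared record, so every handle sees them.
    void addLabel(const std::string& label) { d_->labels.push_back(label); }
    const std::vector<std::string>& labels() const { return d_->labels; }
    bool sharesDataWith(const ToyTagHandle& other) const { return d_ == other.d_; }

private:
    std::shared_ptr<ToyResourceData> d_;
};
```

Two handles created with the same identifier end up pointing at one record, so a label added through either handle is visible through both.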

Two ways of accessing existing data

The constructor of Nepomuk::Resource (and with it the constructors of all subclasses) accepts either an identifier or an actual resource URI. The framework then checks whether the passed string already exists as a URI or as an identifier and loads the data accordingly. If it exists as neither, the string is used as a new identifier and a new random resource URI is generated. To understand what this means, it is important to know that each resource in Nepomuk is uniquely identified by its URI (which is also used for storage in RDF), but can also have an arbitrary number of alternative identifiers that make it more convenient for certain applications to find the resource. Typical examples are the path of a file or the name of a tag.
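The lookup order described above can be sketched as follows. ToyResolver is a hypothetical class, the nepomuk:/res/ prefix and the hexadecimal random part are illustrative assumptions, and the real framework consults its storage rather than in-memory containers.

```cpp
#include <cassert>
#include <map>
#include <random>
#include <set>
#include <sstream>
#include <string>

// Hypothetical sketch of how a constructor argument could be resolved.
// The "nepomuk:/res/" prefix and the hex format are illustrative only.
class ToyResolver {
public:
    // Returns the resource URI for a string that is either a known URI,
    // a known identifier, or something new.
    std::string resolve(const std::string& uriOrIdentifier) {
        if (uris_.count(uriOrIdentifier))
            return uriOrIdentifier;                  // already a resource URI
        auto it = identifiers_.find(uriOrIdentifier);
        if (it != identifiers_.end())
            return it->second;                       // known identifier
        std::string uri = newRandomUri();            // unknown: new resource
        uris_.insert(uri);
        identifiers_[uriOrIdentifier] = uri;         // the string becomes its identifier
        return uri;
    }

    // A resource can have any number of alternative identifiers.
    void addIdentifier(const std::string& uri, const std::string& identifier) {
        identifiers_[identifier] = uri;
    }

private:
    std::string newRandomUri() {
        std::uniform_int_distribution<unsigned long long> dist;
        std::string uri;
        do {
            std::ostringstream os;
            os << "nepomuk:/res/" << std::hex << dist(rng_);
            uri = os.str();
        } while (uris_.count(uri));                  // avoid (unlikely) clashes
        return uri;
    }

    std::mt19937_64 rng_{std::random_device{}()};
    std::set<std::string> uris_;
    std::map<std::string, std::string> identifiers_;
};
```

Passing "TestTag" twice yields the same URI, passing that URI back yields it unchanged, and an alternative identifier registered later resolves to the same resource.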