Revision as of 19:11, 3 April 2008 by Trueg (Talk | contribs) (The manual way of doing things)

Resource Metadata Handling with Nepomuk


Three types of meta data can be identified:

  1. Meta data that is stored with the data itself and is available at all times. This includes ID3 tags, the number of pages in a PDF document, the size of a file, or the subject of an email.
  2. Meta data that is created manually by the user, such as annotations or tags assigned to files, emails, or any other resources.
  3. Meta data that can be gathered automatically by applications, such as the source of a downloaded file, the email an attachment was saved from, or the original of a file that was copied locally.

Type 1 is already handled in many implementations. KDE itself uses Strigi which allows the extraction of this kind of meta information from files.

Nepomuk is intended for meta data of types 2 and 3. It provides an easy way to create and read meta data for arbitrary resources (for example files or emails, but also contacts or even a paragraph in a PDF file).

The simplest type of meta data that can be handled with Nepomuk is a comment. It is a simple string associated with a resource (a file, for example). This comment is created by the user using an application that is based on Nepomuk. Nepomuk's core is designed to allow arbitrary types of meta data, i.e. any resource can be related to any other resource or value by simply naming the relation and providing the value. The power of Nepomuk from a developer's point of view, however, lies in its ability to provide a C++ class for each type of resource. Such a C++ class then provides convenience methods that make handling the metadata simple.

Nepomuk is resource based. Thus, working with Nepomuk is always done with instances representing a certain resource. Each resource has a list of properties. Properties are named and have a certain type. The type can either be another resource (compare a file that was an attachment of an email) or a literal (for example a string or an integer; the comment mentioned earlier would be a string literal). Each property has a cardinality of either 1 (again, a file can only be saved from one email) or greater than 1 (that is, unbounded: one file can have many associated comments).
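The resource model described above can be sketched in standalone C++. This is only an illustrative model and not the Nepomuk API: the namespace property_sketch, the Resource class, and the setSingle/addValue methods are hypothetical names invented for this sketch, and std::variant assumes a C++17 compiler.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Standalone model, NOT the Nepomuk API: every name below is hypothetical.
namespace property_sketch {

// A reference to another resource, identified by its URI.
struct ResourceRef {
    std::string uri;
};

// A property value is either a literal (string, integer) or another resource.
using Value = std::variant<std::string, long, ResourceRef>;

class Resource {
public:
    explicit Resource(std::string uri) : m_uri(std::move(uri)) {}

    // Cardinality 1: setting the property replaces any previous value.
    void setSingle(const std::string& property, Value value) {
        m_properties[property] = std::vector<Value>{std::move(value)};
    }

    // Cardinality greater than 1: every call appends another value.
    void addValue(const std::string& property, Value value) {
        m_properties[property].push_back(std::move(value));
    }

    // All values stored for a property (empty if it was never set).
    std::vector<Value> values(const std::string& property) const {
        auto it = m_properties.find(property);
        return it == m_properties.end() ? std::vector<Value>{} : it->second;
    }

    const std::string& uri() const { return m_uri; }

private:
    std::string m_uri;
    std::map<std::string, std::vector<Value>> m_properties;
};

} // namespace property_sketch
```

In this model, calling setSingle twice for a "saved from" property keeps only the last email reference, while calling addValue twice for a "comment" property keeps both strings.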

Getting started

Nepomuk is already part of kdelibs. To use it make sure you have installed Soprano from the kdesupport module.

To use Nepomuk in your application, simply make CMake search for it using the Nepomuk macro that comes with kdelibs:

find_package(Nepomuk REQUIRED)
include_directories(${NEPOMUK_INCLUDES})
target_link_libraries( [...] ${NEPOMUK_LIBRARIES})

Developing with Nepomuk

Now let's dive into the good stuff and do some coding. There are basically two ways to use Nepomuk: 1. by setting properties manually and 2. by generating convenience classes that provide proper method calls for each property.

The manual way of doing things

In Nepomuk, metadata is represented as key/value pairs belonging to a certain resource, which is represented by the Nepomuk::Resource class.

Setting and getting metadata is done via two methods:

void Nepomuk::Resource::setProperty( const QUrl& key, const Nepomuk::Variant& value );
Variant Nepomuk::Resource::getProperty( const QUrl& key );

Why don't we use QVariant instead of Nepomuk::Variant? Actually Nepomuk::Variant has been derived from QVariant and extended with some nice list-handling features that will come in handy later on.

Since all metadata in Nepomuk is stored as RDF, all keys are actually URIs which are defined in specific ontologies (see Introduction to RDF and Ontologies for details) and look something like this:

Soprano::Vocabulary provides static instances of the most frequently used URIs, as we see below (Soprano also comes with a code generation tool which can create this kind of namespace from ontology files).

Nepomuk::Resource tag( "TestTag", Soprano::Vocabulary::NAO::Tag() );
tag.setProperty( Soprano::Vocabulary::RDFS::label(), QString("Important stuff") );

Nepomuk::Resource aFile( "/tmp/testfile.txt", Soprano::Vocabulary::Xesam::File() );
aFile.setProperty( Soprano::Vocabulary::NAO::hasTag(), tag );

Now what does this do then?

We first create a resource with the identifier "TestTag" and the type nao:Tag and then set a property (rdfs:label) to the string value "Important stuff". Then we create a second resource, representing a file, and assign it the newly created tag.

Do it the smooth way

The second way of using Nepomuk is to rely on its class generator which, as mentioned in the introduction, generates C++ classes from ontologies. Suddenly the above code looks much simpler:

Nepomuk::Tag tag( "TestTag" );
tag.addLabel( "Important Stuff" );

Nepomuk::File aFile( "/tmp/testfile.txt" );
aFile.addTag( tag );

Isn't this much more readable and much easier to remember and debug? The nice thing is that both Tag and File are subclasses of Nepomuk::Resource and internally methods like addLabel are only based on setProperty and some magic use of Nepomuk::Variant.
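How a convenience method can be layered on top of a generic property setter is sketched below. This is a standalone model and not the code emitted by the Nepomuk class generator: the namespace convenience_sketch and both classes are hypothetical, and values are simplified to lists of strings instead of Nepomuk::Variant. Only the rdfs:label URI is the real one from the RDF Schema vocabulary.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Standalone model, NOT the Nepomuk API: every class below is hypothetical.
namespace convenience_sketch {

// Generic layer: properties are keyed by URI strings, values are string lists.
class Resource {
public:
    explicit Resource(std::string identifier) : m_identifier(std::move(identifier)) {}

    void setProperty(const std::string& key, std::vector<std::string> values) {
        m_properties[key] = std::move(values);
    }

    // Returns the stored values, or an empty list if the key was never set.
    std::vector<std::string> property(const std::string& key) const {
        auto it = m_properties.find(key);
        return it == m_properties.end() ? std::vector<std::string>{} : it->second;
    }

    const std::string& identifier() const { return m_identifier; }

private:
    std::string m_identifier;
    std::map<std::string, std::vector<std::string>> m_properties;
};

// Convenience layer: the kind of class a generator could emit for a tag type.
class Tag : public Resource {
public:
    using Resource::Resource;

    // Read the current list, append to it, and write it back via setProperty().
    void addLabel(const std::string& label) {
        std::vector<std::string> labels = property(labelKey());
        labels.push_back(label);
        setProperty(labelKey(), std::move(labels));
    }

    std::vector<std::string> labels() const { return property(labelKey()); }

private:
    // The rdfs:label property URI.
    static std::string labelKey() {
        return "http://www.w3.org/2000/01/rdf-schema#label";
    }
};

} // namespace convenience_sketch
```

The caller never sees the property URI: addLabel hides both the key and the list handling, which is the readability gain the generated classes are meant to provide.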

Let us continue with the tagging example. The Tag class provides some more nice methods:

  • QList<Resource> Tag::tagOf() provides a list of all resources that have been tagged with this tag.
  • static QList<Tag> Tag::allTags() provides a list of all existing tags.

Using these methods it is already possible to provide basic tagging in your application.


Only one instance of each resource

In Nepomuk each resource has only one instance at all times. Thus, the following code will actually result in only one new tag with two labels since the second one is simply a local copy:

Nepomuk::Tag tagA( "test" );
Nepomuk::Tag tagB( "test" );

tagA.addLabel( "testtagA" );
tagB.addLabel( "testtagB" );

The actual data is cached and synced back regularly by the framework.
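One common way to get this "one instance per resource" behaviour is to let lightweight handle objects share their data through a cache keyed by identifier. The following standalone sketch illustrates that idea only; it makes no claim about how Nepomuk implements it, and the namespace shared_sketch, TagData, and the lookup helper are hypothetical. Syncing back to storage is left out.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Standalone model, NOT the Nepomuk implementation: all names are hypothetical.
namespace shared_sketch {

// The actual data of a tag, shared by all handles with the same identifier.
struct TagData {
    std::vector<std::string> labels;
};

// A cheap handle: copying or re-creating it never duplicates the data.
class Tag {
public:
    explicit Tag(const std::string& identifier) : m_data(lookup(identifier)) {}

    void addLabel(const std::string& label) { m_data->labels.push_back(label); }

    const std::vector<std::string>& labels() const { return m_data->labels; }

private:
    // Returns the cached data for the identifier, creating it on first use.
    static std::shared_ptr<TagData> lookup(const std::string& identifier) {
        static std::map<std::string, std::shared_ptr<TagData>> cache;
        std::shared_ptr<TagData>& slot = cache[identifier];
        if (!slot) {
            slot = std::make_shared<TagData>();
        }
        return slot;
    }

    std::shared_ptr<TagData> m_data;
};

} // namespace shared_sketch
```

With this model, two handles constructed from the identifier "test" see each other's labels, mirroring the tagA/tagB example above.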

Two ways of accessing existing data

The constructor of Nepomuk::Resource (and with it the constructors of all subclasses) accepts either an identifier or an actual resource URI. The framework then checks whether the passed string already exists as a URI or an identifier and loads the data accordingly. If neither an identifier nor a URI exists, the passed string is used as a new identifier and a new random resource URI is generated. To understand what this means, it is important to know that each resource in Nepomuk is uniquely identified by its URI (which is also used for storage in RDF) but can also have an arbitrary number of alternative identifiers, which make it more convenient for certain applications to find a resource. A typical example is the path of a file or the name of a tag.
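The lookup order described above (known URI first, then known identifier, otherwise a new resource) can be sketched as a small standalone registry. This is an assumption-laden model, not Nepomuk code: the namespace lookup_sketch, the Registry class, its methods, and the "urn:example:" URI prefix are all hypothetical.

```cpp
#include <cassert>
#include <map>
#include <random>
#include <set>
#include <sstream>
#include <string>

// Standalone model, NOT the Nepomuk implementation: all names are hypothetical.
namespace lookup_sketch {

class Registry {
public:
    // Returns the resource URI for the given string, creating one if needed.
    std::string resolve(const std::string& uriOrIdentifier) {
        // 1. The string is already a known resource URI.
        if (m_uris.count(uriOrIdentifier) != 0) {
            return uriOrIdentifier;
        }
        // 2. The string is a known alternative identifier.
        auto it = m_identifiers.find(uriOrIdentifier);
        if (it != m_identifiers.end()) {
            return it->second;
        }
        // 3. Neither: use the string as a new identifier for a fresh random URI.
        const std::string uri = newUri();
        m_uris.insert(uri);
        m_identifiers[uriOrIdentifier] = uri;
        return uri;
    }

    // Registers one more alternative identifier for an existing URI.
    void addIdentifier(const std::string& uri, const std::string& identifier) {
        m_identifiers[identifier] = uri;
    }

private:
    // Generates random URIs until one is found that is not in use yet.
    std::string newUri() {
        std::string uri;
        do {
            std::ostringstream out;
            out << "urn:example:" << std::hex << m_rng();
            uri = out.str();
        } while (m_uris.count(uri) != 0);
        return uri;
    }

    std::mt19937_64 m_rng{std::random_device{}()};
    std::set<std::string> m_uris;
    std::map<std::string, std::string> m_identifiers;
};

} // namespace lookup_sketch
```

Resolving "/tmp/testfile.txt" twice yields the same URI, and resolving that URI itself returns it unchanged, so a file path and a resource URI lead to the same resource.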

Ontologies in Nepomuk

Nepomuk is a semantic desktop project which is influenced by semantic web technologies such as RDF.

A direct result of this is that in Nepomuk all metadata is stored as RDF triples. Normally you as an application developer do not need to care about that too much, unless you want to create your own resource classes, i.e. types and properties or fields.

Each resource class and property is defined in an ontology. Let us look at an example we know from before: tagging. The following code is an excerpt from the Nepomuk Annotation Ontology which defines basic annotation properties such as tagging:

<rdf:Description rdf:about="&nao;Tag">
       <rdf:type rdf:resource="&rdfs;Class"/>
</rdf:Description>
<rdf:Property rdf:about="&nao;annotation">
       <rdfs:domain rdf:resource="&rdfs;Resource"/>
       <rdfs:range rdf:resource="&rdfs;Resource"/>
</rdf:Property>
<rdf:Property rdf:about="&nao;hasTag" rdfs:label="hasTag">
       <rdfs:domain rdf:resource="&rdfs;Resource"/>
       <rdfs:range rdf:resource="&nao;Tag"/>
       <rdfs:subPropertyOf rdf:resource="&nao;annotation"/>
</rdf:Property>

This defines the Tag class and the hasTag property, which is a sub-property of annotation. The property has a domain and a range, which define the classes that can be used as subject and value of this property.
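To make the roles of domain, range, and sub-property more concrete, the excerpt can be mirrored in a small standalone C++ model. This is a simplified sketch rather than an RDFS reasoner or anything Nepomuk ships: the namespace ontology_sketch and all functions are hypothetical, class names are plain prefixed strings, rdfs:Resource is treated as matching every class, other types must match exactly (no class hierarchy), and sub-property links are assumed to be free of cycles.

```cpp
#include <cassert>
#include <map>
#include <string>

// Standalone model, NOT part of Nepomuk or Soprano: all names are hypothetical.
namespace ontology_sketch {

struct Property {
    std::string domain;         // class allowed as subject
    std::string range;          // class allowed as value
    std::string subPropertyOf;  // parent property, empty if there is none
};

using Ontology = std::map<std::string, Property>;

// The two properties from the ontology excerpt.
Ontology annotationExcerpt() {
    return {
        {"nao:annotation", {"rdfs:Resource", "rdfs:Resource", ""}},
        {"nao:hasTag", {"rdfs:Resource", "nao:Tag", "nao:annotation"}},
    };
}

// True if property equals ancestor or reaches it via subPropertyOf links.
// Assumes the links contain no cycles.
bool isSubPropertyOf(const Ontology& ontology, std::string property,
                     const std::string& ancestor) {
    while (!property.empty()) {
        if (property == ancestor) {
            return true;
        }
        auto it = ontology.find(property);
        if (it == ontology.end()) {
            return false;
        }
        property = it->second.subPropertyOf;
    }
    return false;
}

// Simplification: rdfs:Resource accepts any class, otherwise exact match.
bool accepts(const std::string& expected, const std::string& actual) {
    return expected == "rdfs:Resource" || expected == actual;
}

// Checks a statement's subject and value types against domain and range.
bool isValidStatement(const Ontology& ontology, const std::string& subjectType,
                      const std::string& property, const std::string& valueType) {
    auto it = ontology.find(property);
    if (it == ontology.end()) {
        return false;
    }
    return accepts(it->second.domain, subjectType) &&
           accepts(it->second.range, valueType);
}

} // namespace ontology_sketch
```

In this model a file may carry nao:hasTag with a nao:Tag value, while using another file as the value is rejected because the range requires a tag.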

This is the information used by the Nepomuk class generator to create the convenience C++ classes we saw before.

At the moment, the basic metadata ontologies are being created in cooperation with the Strigi developers and some other open-source projects. These ontologies will include things like ID3 tags, Exif data, and much more.