Development/Tutorials/Metadata/Nepomuk/ChatLogger
Latest revision as of 16:44, 19 July 2012
Tutorial Series: Nepomuk
Rough Basics
In Nepomuk, ontologies are central. As a rough analogy, an ontology is like a class specification: it defines exactly which properties, sub-classes and super-classes a class has. In Nepomuk terms, every instance of such a class is called a Resource.
In the linked data world, properties are first-class entities too: each property is itself a resource, an instance of the class rdf:Property. In conventional programming languages, a property belongs to a class and its descendants. In linked data, a property is declared on its own, with a domain and a range that state which classes its subjects and its values may belong to. Properties can also be specialized into sub-properties (rdfs:subPropertyOf).
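To make this concrete, here is a minimal C++ sketch that declares properties as standalone records carrying their own domain, range and super-property, and applies the RDFS rule that a statement using a sub-property also holds for its super-property. It is an illustration only, not Nepomuk code; the property names ex:sender and ex:participant are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// A property is declared on its own, not inside a class.
struct PropertyDef {
    std::string domain;        // class of the subject (rdfs:domain)
    std::string range;         // class of the value (rdfs:range)
    std::string superProperty; // rdfs:subPropertyOf, empty if none
};

struct Statement {
    std::string subject, predicate, object;
};

// Return the statement plus every statement implied by walking up the
// rdfs:subPropertyOf chain of its predicate.
std::vector<Statement> withSuperProperties(
        const Statement& st,
        const std::map<std::string, PropertyDef>& properties) {
    std::vector<Statement> result{st};
    auto it = properties.find(st.predicate);
    // The size check stops the walk if the hierarchy contains a cycle.
    while (it != properties.end() && !it->second.superProperty.empty() &&
           result.size() <= properties.size()) {
        const std::string parent = it->second.superProperty;
        result.push_back({st.subject, parent, st.object});
        it = properties.find(parent);
    }
    return result;
}
```

Here `ex:sender` is declared as a sub-property of `ex:participant`, so one asserted statement yields a second, implied one.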
Use Case
The first step when storing information in a database is to decide what data should be stored. For a chat logger, we need:
- From
- To
- Time stamp
- Message Content
- Status of the message
Additionally, the store must be persistent and queryable. Nepomuk provides both.
Finding the correct Ontology
When working on any Nepomuk-related project, the first thing to settle is how the data will be stored; more specifically, which ontology is required.
Ontologies already exist for common use cases such as email, messaging and notes. For a chat logging system, the one to look at is NMO, the Nepomuk Messaging Ontology.
The easiest way to learn the exact contents of an ontology is to read its TriG file. On inspection, we find several classes:
        nmo:Message
         /       \
        /         \
  nmo:Email   nmo:IMMessage
Here nmo:IMMessage is a subclass of nmo:Message. Looking further at the properties of nmo:Message, we find nmo:messageTo, nmo:messageFrom, nmo:isRead, nmo:receivedDate and nmo:plainTextMessageContent.
These properties cover exactly what the chat logger needs: the recipient, the sender, the read status, the timestamp and the message content. So we now know how the data will be stored.
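As a hedged sketch of that mapping, the following C++ models one log entry as a plain struct and flattens it into subject-predicate-object statements using the NMO property names above. The struct, the function name and the use of plain strings for every value are illustrative assumptions; in Nepomuk the date and the read flag would be typed literals, and the sender and recipient would be resources.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One row of the chat log, mirroring the use-case list (hypothetical type).
struct ChatLogEntry {
    std::string from;      // sender's IM account (resource URI)
    std::string to;        // recipient's IM account (resource URI)
    std::string timestamp; // e.g. an ISO 8601 date-time string
    std::string content;   // plain text of the message
    bool isRead;           // status of the message
};

struct Triple {
    std::string subject, predicate, object;
};

// Map each field onto the NMO property that stores it.
std::vector<Triple> toTriples(const std::string& messageUri,
                              const ChatLogEntry& e) {
    return {
        {messageUri, "rdf:type", "nmo:IMMessage"},
        {messageUri, "nmo:messageFrom", e.from},
        {messageUri, "nmo:messageTo", e.to},
        {messageUri, "nmo:receivedDate", e.timestamp},
        {messageUri, "nmo:plainTextMessageContent", e.content},
        {messageUri, "nmo:isRead", e.isRead ? "true" : "false"},
    };
}
```

Each use-case field ends up as exactly one statement about the message resource, plus the rdf:type statement.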
Implementation
Getting Started
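The simplest way to get started is to use an existing Nepomuk template. A test template is available in the kdeexamples repository: https://projects.kde.org/projects/kde/kdeexamples/repository/revisions/master/show/nepomuk/test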
That template includes more headers than are strictly required, so you may want to remove the unused ones before using it in a real project.
Creating a Resource
In Nepomuk, a Resource is one of the fundamental building blocks. A resource is identified by a unique URI (Uniform Resource Identifier) and can carry any number of properties.
All of this is stored in the Nepomuk repository as statements of the form:
Subject Predicate Object
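The statement model can be illustrated with a tiny in-memory store. This is a hypothetical sketch, not how the Nepomuk repository (a real RDF store) is implemented: every fact, including a resource's type, is one (subject, predicate, object) triple, and lookups are pattern matches in which an empty string acts as a wildcard. The URIs such as ex:mes1 are placeholders.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Triple {
    std::string subject, predicate, object;
};

class TripleStore {
public:
    void add(const std::string& s, const std::string& p, const std::string& o) {
        triples_.push_back({s, p, o});
    }

    // Return all triples matching the pattern; an empty string in any
    // position matches anything.
    std::vector<Triple> match(const std::string& s,
                              const std::string& p,
                              const std::string& o) const {
        std::vector<Triple> out;
        for (const Triple& t : triples_) {
            if ((s.empty() || t.subject == s) &&
                (p.empty() || t.predicate == p) &&
                (o.empty() || t.object == o)) {
                out.push_back(t);
            }
        }
        return out;
    }

private:
    std::vector<Triple> triples_;
};
```

The type of a resource is stored the same way as any other property, so asking "which resources are IM messages?" is just a match on the rdf:type predicate.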
Nepomuk::Resource
The most common way to create or manipulate a resource is the Nepomuk::Resource class:
#include <Nepomuk/Resource>
#include <Nepomuk/Vocabulary/NMO>

Nepomuk::Resource res;
res.addType( Nepomuk::Vocabulary::NMO::IMMessage() );
This first creates an empty resource and then adds the type nmo:IMMessage to it. In most programming languages an object's type is fixed when the object is created; in Nepomuk, a resource's type is just another property, set after the resource has been created.
Internally, the type is stored with the rdf:type property. One could also add the type nmo:Message, but since nmo:IMMessage is a subclass of nmo:Message, that type is already implied.
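The implied type comes from RDFS subclass reasoning: if a resource has rdf:type nmo:IMMessage and nmo:IMMessage is an rdfs:subClassOf nmo:Message, the resource is also an nmo:Message. A minimal C++ sketch of that inference, assuming single inheritance for brevity (RDFS itself allows several superclasses) and including only the two NMO classes from the diagram above:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Maps each class to its direct superclass (rdfs:subClassOf).
using SubClassMap = std::map<std::string, std::string>;

// All types a resource has once subclass reasoning is applied to its
// explicitly asserted rdf:type.
std::set<std::string> inferredTypes(const std::string& assertedType,
                                    const SubClassMap& superOf) {
    std::set<std::string> types{assertedType};
    auto it = superOf.find(assertedType);
    // Stop at a root class, or if a class is seen twice (guards against cycles).
    while (it != superOf.end() && types.insert(it->second).second) {
        it = superOf.find(it->second);
    }
    return types;
}
```

Asserting only nmo:IMMessage is therefore enough; nmo:Message follows, while a sibling class such as nmo:Email does not.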
Nepomuk automatically assigns a unique URI to each resource, of the form nepomuk:/res/<uuid>, where <uuid> is a UUID (a 128-bit identifier written as 32 hexadecimal digits).
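For illustration, the following C++ builds a URI of that shape from a random version 4 UUID using only the standard library. This is an assumption-laden sketch, not Nepomuk's own generator: the function names are hypothetical, the caller supplies the random engine, and std::mt19937_64 is not a cryptographically strong source.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <random>
#include <string>

// Format 128 random bits as a version 4 UUID (8-4-4-4-12 hex digits).
std::string randomUuidV4(std::mt19937_64& rng) {
    std::uint64_t hi = rng(), lo = rng();
    hi = (hi & 0xFFFFFFFFFFFF0FFFULL) | 0x0000000000004000ULL; // version 4
    lo = (lo & 0x3FFFFFFFFFFFFFFFULL) | 0x8000000000000000ULL; // RFC 4122 variant
    char buf[37]; // 36 characters plus the terminating NUL
    std::snprintf(buf, sizeof buf, "%08x-%04x-%04x-%04x-%012llx",
                  static_cast<unsigned>(hi >> 32),
                  static_cast<unsigned>((hi >> 16) & 0xFFFF),
                  static_cast<unsigned>(hi & 0xFFFF),
                  static_cast<unsigned>(lo >> 48),
                  static_cast<unsigned long long>(lo & 0xFFFFFFFFFFFFULL));
    return buf;
}

// Hypothetical helper: a resource URI in the nepomuk:/res/<uuid> form.
std::string newResourceUri(std::mt19937_64& rng) {
    return "nepomuk:/res/" + randomUuidV4(rng);
}
```

In real use the engine would be seeded from std::random_device; a fixed seed is only useful for reproducible tests.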
Consider this example conversation:
date1 Bob: Hi!
date2 Mary: Good Evening Bob.
Here we have two messages. To store them in Nepomuk, we need two distinct resources, one for each message:
Nepomuk::Resource mes1;
mes1.addType( Nepomuk::Vocabulary::NMO::IMMessage() );
mes1.setProperty( Nepomuk::Vocabulary::NMO::plainTextMessageContent(), "Hi!" );
mes1.setProperty( Nepomuk::Vocabulary::NMO::isRead(), false );
mes1.setProperty( Nepomuk::Vocabulary::NMO::receivedDate(), QDateTime( date1 ) );
Nepomuk::Resource mes2;
mes2.addType( Nepomuk::Vocabulary::NMO::IMMessage() );
mes2.setProperty( Nepomuk::Vocabulary::NMO::plainTextMessageContent(), "Good Evening Bob." );
mes2.setProperty( Nepomuk::Vocabulary::NMO::isRead(), false );
mes2.setProperty( Nepomuk::Vocabulary::NMO::receivedDate(), QDateTime( date2 ) );
In the ontology specification, the nmo:messageTo and nmo:messageFrom properties have nco:ContactMedium as their range:
nmo:messageFrom
a rdf:Property ;
rdfs:comment "The sender of the message" ;
rdfs:domain nmo:Message ;
rdfs:label "from" ;
rdfs:range nco:ContactMedium .
So we need some kind of nco:ContactMedium for both Bob and Mary. At this point, it is worth reading through the Nepomuk Contact Ontology (NCO).
In short, nco:ContactMedium is a generic base class for the different ways of reaching a contact:
                 nco:ContactMedium
          ______/        |        \__________
         /               |                   \
  nco:IMAccount   nco:PhoneNumber   nco:EmailAddress
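The range declaration is what makes an nco:IMAccount an acceptable value for nmo:messageFrom: the value's type must be the range class or one of its subclasses. Here is a hypothetical C++ sketch of that check using only the hierarchy above and the one range declaration quoted earlier; it is not a description of how Nepomuk itself validates data.

```cpp
#include <cassert>
#include <map>
#include <string>

// Direct superclass of each class (single inheritance for brevity).
const std::map<std::string, std::string> kSuperClass{
    {"nco:IMAccount", "nco:ContactMedium"},
    {"nco:PhoneNumber", "nco:ContactMedium"},
    {"nco:EmailAddress", "nco:ContactMedium"},
};

// Declared rdfs:range of each property.
const std::map<std::string, std::string> kRange{
    {"nmo:messageFrom", "nco:ContactMedium"},
};

// True if cls equals ancestor or inherits from it.
bool isSubClassOf(std::string cls, const std::string& ancestor) {
    for (int depth = 0; depth < 32; ++depth) { // depth cap guards against cycles
        if (cls == ancestor) return true;
        auto it = kSuperClass.find(cls);
        if (it == kSuperClass.end()) return false;
        cls = it->second;
    }
    return false;
}

// A value type fits a property if it satisfies the declared range;
// properties without a declared range accept anything.
bool validObjectType(const std::string& property, const std::string& valueType) {
    auto it = kRange.find(property);
    return it == kRange.end() || isSubClassOf(valueType, it->second);
}
```

An nco:IMAccount passes the check for nmo:messageFrom, while an nco:PersonContact does not, which is why the messages must point at the accounts rather than at the people.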
Since we are logging IM chats, we need an nco:IMAccount for both Bob and Mary. The code to create them is straightforward:
Nepomuk::Resource bobAccount;
bobAccount.addType( Nepomuk::Vocabulary::NCO::IMAccount() );
// The IM identifiers below are made-up example values
bobAccount.setProperty( Nepomuk::Vocabulary::NCO::imID(), "bob@example.org" );
Nepomuk::Resource maryAccount;
maryAccount.addType( Nepomuk::Vocabulary::NCO::IMAccount() );
maryAccount.setProperty( Nepomuk::Vocabulary::NCO::imID(), "mary@example.org" );
Now for the confusing part. The Nepomuk Contact Ontology specifies a class called nco:Role. The documentation says "A role played by a contact. Contacts that denote people, can have many roles (e.g. see the hasAffiliation property and Affiliation class). Contacts that denote Organizations or other Agents usually have one role. Each role can introduce additional contact media."
In other words, one person can have multiple roles, each with its own contact information, which mirrors how things work in the real world.
            nco:Role
           /        \
    nco:Contact   nco:Affiliation
Each nco:Role can have an IMAccount associated with it, along with a phone number and many other kinds of contact information.
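One way to picture this structure is a person who holds their own contact media and, through affiliations, further roles with their own media. The sketch below is a simplified C++ model under that assumption; the type and member names are hypothetical and are not NCO or Nepomuk API. It collects every medium through which a person can be reached.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A role groups the contact media that apply in one context.
struct Role {
    std::string label;                     // e.g. "personal", "work"
    std::vector<std::string> contactMedia; // IM accounts, phone numbers, ...
};

// A person is itself a role (their personal details) and may also have
// affiliations, each of which is a further role.
struct Person {
    std::string fullname;
    Role personal;
    std::vector<Role> affiliations;
};

// Every medium through which the person can be reached, across all roles.
std::vector<std::string> allContactMedia(const Person& p) {
    std::vector<std::string> media = p.personal.contactMedia;
    for (const Role& r : p.affiliations) {
        media.insert(media.end(), r.contactMedia.begin(), r.contactMedia.end());
    }
    return media;
}
```

A chat logger only needs the IM account of the role actually used in the conversation, but a contact viewer would typically gather media from all roles as above.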
Another confusing aspect of the Nepomuk Contact Ontology is the property nco:role, which is different from the class nco:Role.
To add to the confusion, NCO has two kinds of contacts: plain nco:Contact and nco:PersonContact. The latter represents real people, the kind that exist in your address book. The former is used for machine-generated information about a person. For example, when Nepomuk indexes a music file, it creates an nco:Contact for the artist and attaches it to the file resource using nmm:performer. That way we can easily search for all files whose nmm:performer is that contact.
The code to do all of this is actually much simpler:
Nepomuk::Resource bob;
bob.addType( Nepomuk::Vocabulary::NCO::PersonContact() );
bob.setProperty( Nepomuk::Vocabulary::NCO::fullname(), "Bob" );
Nepomuk::Resource mary;
mary.addType( Nepomuk::Vocabulary::NCO::PersonContact() );
mary.setProperty( Nepomuk::Vocabulary::NCO::fullname(), "Mary" );
Now that we have created the contacts, we add the respective IMAccounts to them:
bob.addProperty( Nepomuk::Vocabulary::NCO::hasContactMedium(), bobAccount );
mary.addProperty( Nepomuk::Vocabulary::NCO::hasContactMedium(), maryAccount );
The last thing we need to do is associate the messages with their respective contact media:
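mes1.setProperty( Nepomuk::Vocabulary::NMO::messageFrom(), bobAccount );
mes1.setProperty( Nepomuk::Vocabulary::NMO::messageTo(), maryAccount );
mes2.setProperty( Nepomuk::Vocabulary::NMO::messageFrom(), maryAccount );
mes2.setProperty( Nepomuk::Vocabulary::NMO::messageTo(), bobAccount );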
And that's it. We are done.
But the code is messy! It is littered with long namespaces (Nepomuk::Vocabulary), and all we do is set the type and add properties.
The Resource Generator
The code above is verbose, and it is not really the C++ way of doing things. It would be much easier to generate C++ classes for the ontologies we use, so that we don't have to call the cumbersome setProperty( uri, value ) and addProperty( uri, value ) methods.
This can be done by adding the following to CMakeLists.txt:
nepomuk_add_ontology_classes(
SRCS
ONTOLOGIES
${SHAREDDESKTOPONTOLOGIES_ROOT_DIR}/nie/nmo.trig
)
nepomuk_add_ontology_classes(
SRCS
ONTOLOGIES
${SHAREDDESKTOPONTOLOGIES_ROOT_DIR}/nie/nco.trig
)
Both NMO and NCO are part of the NIE ontologies. They are installed by the shared-desktop-ontologies project, typically in the /usr/share/ontology/ folder.
This calls the CMake macro nepomuk_add_ontology_classes, which generates a C++ class for each class in the specified ontology.
Once the build has run the macro, there should be a header and a source file for each class in the ontology. Each generated class is a subclass of Nepomuk::Resource.
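To illustrate why generated classes are nicer to use, here is a hand-written C++ sketch of the idea: a typed wrapper whose setter and getter hide the property URI behind ordinary methods. It is not the code nepomuk_add_ontology_classes actually emits (the real classes derive from Nepomuk::Resource and store data in the repository); a std::map stands in for the repository, and the class names are assumptions.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for a generic resource: property values keyed by property URI.
class GenericResource {
public:
    void setProperty(const std::string& uri, const std::string& value) {
        properties_[uri] = value;
    }
    std::string property(const std::string& uri) const {
        auto it = properties_.find(uri);
        return it == properties_.end() ? std::string() : it->second;
    }
private:
    std::map<std::string, std::string> properties_;
};

const char* const kFullnameUri = "nco:fullname";

// What a generated class buys you: one named method per property, so
// callers never spell out the vocabulary URI themselves.
class PersonContactSketch : public GenericResource {
public:
    void setFullname(const std::string& name) { setProperty(kFullnameUri, name); }
    std::string fullname() const { return property(kFullnameUri); }
};
```

The typed method and the generic call end up writing the same property; the wrapper just moves the URI into one place.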
The same code, written with the generated classes:
Nepomuk::IMAccount bobAccount;
bobAccount.set
Nepomuk::IMAccount maryAccount;
maryAccount.set
Nepomuk::PersonContact bob;
bob.setFullname( "Bob" );
Nepomuk::PersonContact mary;
mary.setFullname( "Mary" );
Checking if the data exists
After compiling and running our short program, we want to check that the data actually exists in the Nepomuk repository. The simplest way to do that is with a SPARQL query:
select ?r ?p ?o where { ?r a nmo:IMMessage . ?r ?p ?o. }
The query can be executed either with nepomukcmd or in nepomukshell.
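What the query does can be sketched in C++ against a plain list of triples: first collect every ?r bound by the pattern "?r a nmo:IMMessage" ("a" is SPARQL shorthand for rdf:type), then return every (?r ?p ?o) statement about those resources. This is a hypothetical illustration of basic graph pattern matching, not how Nepomuk's SPARQL engine works; the ex: URIs are placeholders.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

struct Triple {
    std::string subject, predicate, object;
};

// Evaluate: select ?r ?p ?o where { ?r a <type> . ?r ?p ?o . }
std::vector<Triple> allStatementsOfType(const std::vector<Triple>& data,
                                        const std::string& type) {
    // Step 1: bindings for ?r from the first triple pattern.
    std::set<std::string> resources;
    for (const Triple& t : data) {
        if (t.predicate == "rdf:type" && t.object == type) {
            resources.insert(t.subject);
        }
    }
    // Step 2: join with ?r ?p ?o, keeping triples whose subject is bound.
    std::vector<Triple> rows;
    for (const Triple& t : data) {
        if (resources.count(t.subject)) {
            rows.push_back(t);
        }
    }
    return rows;
}
```

Statements about the contacts are filtered out because their subjects never match the first pattern.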