Projects/Nepomuk/ComponentOverview

From KDE TechBase

Technical Terms

The Nepomuk world uses some jargon that may be slightly intimidating to newcomers. This page lists the commonly used Nepomuk development terms and what they mean.

Nepomuk

Nepomuk is actually an abbreviation for a very long and obscure name. That, however, does not really matter.

Nepomuk is the underlying semantic technology used by KDE. It provides an API for software developers, along with the glue needed to index file metadata. When people talk about using "semantic" technologies in KDE, they generally mean Nepomuk.

It is the topmost component of the entire Semantic Stack.

Soprano

Soprano is a Qt abstraction over RDF databases. It provides a friendly Qt-based API for accessing different RDF stores. It currently supports three database backends: Sesame, Redland and Virtuoso. The KDE Semantic Stack only works with Virtuoso. Soprano also provides additional features such as serializing and parsing RDF data, and a client-server architecture that is heavily used in Nepomuk.
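To make the idea of "one API over several RDF stores" concrete, here is a minimal sketch in Python. It is not Soprano's API: the `Backend` interface, the `InMemoryBackend` class and the example resources are all hypothetical, and a real backend would talk to Virtuoso, Redland or Sesame instead of a Python set.

```python
from abc import ABC, abstractmethod
from typing import Iterator, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class Backend(ABC):
    """Hypothetical storage interface that every backend would implement."""

    @abstractmethod
    def add(self, triple: Triple) -> None: ...

    @abstractmethod
    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]: ...


class InMemoryBackend(Backend):
    """Toy backend that keeps statements in a set."""

    def __init__(self) -> None:
        self._triples = set()

    def add(self, triple: Triple) -> None:
        self._triples.add(triple)

    def match(self, s=None, p=None, o=None):
        # None acts as a wildcard for that position.
        for triple in sorted(self._triples):
            if all(want is None or want == got
                   for want, got in zip((s, p, o), triple)):
                yield triple


# Callers only see the Backend interface, not the concrete store.
store: Backend = InMemoryBackend()
store.add(("file:///home/user/notes.txt", "rdf:type", "nfo:TextDocument"))
store.add(("file:///home/user/notes.txt", "nie:title", "Notes"))
store.add(("file:///home/user/photo.jpg", "rdf:type", "nfo:Image"))

text_documents = [s for s, _, _ in store.match(p="rdf:type", o="nfo:TextDocument")]
```

Because the calling code depends only on the interface, swapping the backend would not change how statements are added or matched.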

Virtuoso

Virtuoso is the only supported RDF database in KDE. It is a very powerful database that powers massive projects such as DBpedia. It is currently controlled by OpenLink, and is available under both a commercial and an open source license.

Internally, Virtuoso can be viewed as a relational database with some added RDF features.
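The "relational database with RDF features" view can be illustrated with a small sketch that uses SQLite as a stand-in. The `quads` table, its column names and the sample data are assumptions made for this example, not Virtuoso's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per statement: graph, subject, predicate, object.
conn.execute("CREATE TABLE quads (g TEXT, s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO quads VALUES (?, ?, ?, ?)",
    [
        ("graph:1", "file:///a.txt", "rdf:type", "nfo:TextDocument"),
        ("graph:1", "file:///a.txt", "nie:title", "A"),
        ("graph:2", "file:///b.txt", "rdf:type", "nfo:TextDocument"),
        ("graph:2", "file:///b.txt", "nie:title", "B"),
    ],
)

# "Titles of all text documents" becomes a self-join on the subject column;
# an RDF query language would express this as two patterns sharing a variable.
rows = conn.execute(
    """
    SELECT t.s, title.o
    FROM quads AS t
    JOIN quads AS title ON title.s = t.s
    WHERE t.p = 'rdf:type' AND t.o = 'nfo:TextDocument'
      AND title.p = 'nie:title'
    ORDER BY t.s
    """
).fetchall()
```

Each additional property in a query adds another join over the same table, which is why a database tuned for this access pattern matters.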

Strigi

The Strigi project is divided into 5 sub-projects, and can be used as a full file indexing framework. However, KDE only uses some parts of it. Nepomuk uses libstreams and libstreamanalyzer, passing them the file contents. In return, they give back metadata about the file, which Nepomuk reads and then pushes into Virtuoso.
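The "contents in, metadata out" flow can be sketched as follows. This is not Strigi's API: the `analyze` function and the metadata keys are hypothetical, and the type detection is deliberately crude.

```python
import io
from typing import BinaryIO, Dict


def analyze(stream: BinaryIO) -> Dict[str, object]:
    """Read a stream chunk by chunk and return simple metadata about it."""
    size = 0
    lines = 0
    first_bytes = b""
    while True:
        chunk = stream.read(4096)
        if not chunk:
            break
        if not first_bytes:
            first_bytes = chunk[:8]
        size += len(chunk)
        lines += chunk.count(b"\n")
    # Very rough type detection from the leading "magic" bytes.
    if first_bytes.startswith(b"\x89PNG"):
        kind = "image/png"
    elif first_bytes.startswith(b"%PDF"):
        kind = "application/pdf"
    else:
        kind = "text/plain"
    return {"size": size, "lines": lines, "type": kind}


metadata = analyze(io.BytesIO(b"first line\nsecond line\n"))
```

The caller never needs to know where the bytes come from, which is the point of analyzing a stream rather than a file path.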

Nepomuk Components

Nepomuk has been split into a number of components for stability reasons. Many of these components communicate with each other using a combination of D-Bus and local sockets.

Nepomuk Server

The Nepomuk server is the central process that is responsible for spawning and controlling all other Nepomuk processes. In reality it is not a server, since none of the Nepomuk components actually connect to it or try to communicate with it.

On startup, it checks whether Nepomuk is enabled and, accordingly, either exits or starts spawning the other Nepomuk processes. All other Nepomuk processes go by the name of 'nepomukservicestub'.
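The startup decision can be sketched as a small function. This is an illustration only: `startup_commands` is a hypothetical name, the service names are taken from the examples on this page, and how the real server reads its configuration is not covered here.

```python
from typing import List


def startup_commands(enabled: bool, services: List[str]) -> List[List[str]]:
    """Return the argument lists to spawn, or nothing if Nepomuk is disabled."""
    if not enabled:
        return []  # the server would simply exit at this point
    # Every service runs inside the same generic stub executable.
    return [["nepomukservicestub", name] for name in services]


commands = startup_commands(True, ["nepomukfileindexer", "nepomukwatch"])
```

Each argument list could then be handed to something like `subprocess.Popen` to start the process.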

Nepomuk Service Stub

The nepomukservicestub is a generic process that is used to run any of the Nepomuk services. It takes the Nepomuk service name as an argument and loads the plugin for that service.

$ nepomukservicestub "nepomukfileindexer"
$ nepomukservicestub "nepomukwatch"
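The "one generic executable, many plugins" pattern shown by the commands above can be sketched like this. The `PLUGINS` registry and the two `run_*` functions are hypothetical stand-ins for real plugin discovery and loading.

```python
from typing import Callable, Dict, List


def run_file_indexer() -> str:
    return "file indexer running"


def run_file_watch() -> str:
    return "file watch running"


# Hypothetical registry standing in for plugin discovery.
PLUGINS: Dict[str, Callable[[], str]] = {
    "nepomukfileindexer": run_file_indexer,
    "nepomukwatch": run_file_watch,
}


def service_stub(argv: List[str]) -> str:
    """Pick the plugin named by the first argument and run it."""
    if len(argv) != 2:
        raise SystemExit("usage: servicestub <service-name>")
    name = argv[1]
    try:
        plugin = PLUGINS[name]
    except KeyError:
        raise SystemExit(f"unknown service: {name}")
    return plugin()


result = service_stub(["servicestub", "nepomukfileindexer"])
```

Running every service in its own stub process means that a crash in one plugin does not take the other services down with it.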

Nepomuk Storage Service

The Storage Service is the central Nepomuk service, on which all other services depend. This service is responsible for launching Virtuoso and monitoring the ontologies. Recently the Nepomuk Query Service was merged into the storage service for performance reasons, so the storage service is now also responsible for running queries.

The Storage service is a hard dependency for all other services. These services communicate with the storage service either via D-Bus or via a local socket.
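A hard dependency implies a start order: storage first, everything else afterwards. A minimal sketch using Python's standard `graphlib` module (Python 3.9 or later) shows the idea. The short service names are illustrative, and this is not how the Nepomuk server itself resolves dependencies.

```python
from graphlib import TopologicalSorter

# Maps each service to the services it needs first (names are illustrative).
dependencies = {
    "storage": set(),
    "filewatch": {"storage"},
    "fileindexer": {"storage"},
}

# static_order() yields every service after all of its dependencies.
start_order = list(TopologicalSorter(dependencies).static_order())
```

The relative order of the two dependent services is not significant; only the position of the storage service is.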

Relevant Code: nepomuk-core/services/storage

Nepomuk File Watch Service

The File Watch service is responsible for hooking into the kernel and listening for file move, deletion, and creation events. On receiving any of these events, it updates the metadata stored in Virtuoso. It is also responsible for calling the file indexer service to index new files.
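How the three event types translate into metadata updates can be sketched with an in-memory dictionary in place of the database. The `on_event` function, the event names and the `index_queue` list are assumptions made for this example.

```python
from typing import Dict, List, Optional

# Stand-in for the metadata stored in the database, keyed by file path.
metadata: Dict[str, Dict[str, str]] = {
    "/home/user/a.txt": {"title": "A"},
    "/home/user/c.txt": {"title": "C"},
}
# Stand-in for requests sent to the file indexer.
index_queue: List[str] = []


def on_event(kind: str, path: str, new_path: Optional[str] = None) -> None:
    if kind == "created":
        metadata.setdefault(path, {})
        index_queue.append(path)  # new files still need to be indexed
    elif kind == "moved":
        if new_path is None:
            raise ValueError("a move event needs a destination path")
        if path in metadata:
            # Keep the existing metadata, only the location changes.
            metadata[new_path] = metadata.pop(path)
    elif kind == "deleted":
        metadata.pop(path, None)
    else:
        raise ValueError(f"unknown event kind: {kind}")


on_event("moved", "/home/user/a.txt", "/home/user/docs/a.txt")
on_event("created", "/home/user/b.txt")
on_event("deleted", "/home/user/c.txt")
```

Handling a move as a rename of the existing entry avoids re-indexing a file whose contents have not changed.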

Nepomuk File Indexing Service

The File Indexing service is responsible for indexing all the files. It relies on a helper process called 'nepomukindexer', which actually performs the indexing and pushes the results to Nepomuk. The service itself merely schedules the indexing and decides which files should be indexed.
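The "decide, then schedule" split can be sketched as a filter followed by a queue. The include and exclude folders, the rule about hidden files, and the function names are all assumptions made for this example, not the service's actual configuration.

```python
from collections import deque
from pathlib import PurePosixPath
from typing import Deque, Iterable

# Hypothetical configuration: folders to index and folders to skip.
INCLUDE = [PurePosixPath("/home/user")]
EXCLUDE = [PurePosixPath("/home/user/.cache")]


def _is_under(path: PurePosixPath, folder: PurePosixPath) -> bool:
    return folder == path or folder in path.parents


def should_index(path: str) -> bool:
    p = PurePosixPath(path)
    if p.name.startswith("."):
        return False  # skip hidden files in this sketch
    if any(_is_under(p, folder) for folder in EXCLUDE):
        return False
    return any(_is_under(p, folder) for folder in INCLUDE)


def schedule(paths: Iterable[str]) -> Deque[str]:
    """Queue the files that pass the filter, in the order they arrived."""
    return deque(p for p in paths if should_index(p))


queue = schedule([
    "/home/user/notes.txt",
    "/home/user/.cache/thumb.png",
    "/etc/passwd",
    "/home/user/.bashrc",
])
```

Each queued path would then be passed to the helper process one at a time, so that a crash while indexing one file does not affect the scheduling service.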

Ontologies

Relational databases have fixed database schemas. RDF databases instead use ontologies, which are less static. Ontologies define how the data should be stored in the database.
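As a rough illustration of what "defining how the data should be stored" means, an ontology can be thought of as a set of rules about which properties apply to which kinds of resources and what values they take. The sketch below is heavily simplified: the `ONTOLOGY` dictionary and `validate` function are hypothetical, the two properties are used as examples only, and subclass relationships are ignored.

```python
# Hypothetical mini-ontology: each property lists the class it applies to
# (domain) and the kind of value it takes (range).
ONTOLOGY = {
    "nie:title": {"domain": "nie:InformationElement", "range": str},
    "nfo:fileSize": {"domain": "nfo:FileDataObject", "range": int},
}


def validate(subject_type: str, prop: str, value: object) -> bool:
    """Check a statement against the ontology before storing it."""
    rule = ONTOLOGY.get(prop)
    if rule is None:
        return False  # unknown property
    return subject_type == rule["domain"] and isinstance(value, rule["range"])


ok = validate("nie:InformationElement", "nie:title", "Notes")
bad = validate("nfo:FileDataObject", "nfo:fileSize", "big")
```

Unlike a fixed schema, new properties can be added to the ontology without restructuring the data that is already stored.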