Projects/Nepomuk/IndexingPlugin

From KDE TechBase

Revision as of 16:44, 5 November 2012

File indexing has gone through a major overhaul in 4.10: we no longer rely on Strigi, which means we need to write our own file indexers from scratch. Fortunately, writing a file indexer is very simple.

Currently, there is no public interface for the indexing plugins. There might be one for 4.10, but we aren't sure right now.

Extractor Plugin

To write a file indexer, we write a plugin derived from Nepomuk2::ExtractorPlugin. We are required to implement two simple functions:

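    namespace Nepomuk2 {

    class NEPOMUK_EXPORT ExtractorPlugin : public QObject
    {
        Q_OBJECT
    public:
        ExtractorPlugin(QObject* parent);
        virtual ~ExtractorPlugin();

        // Returns all the mimetypes this plugin can handle
        virtual QStringList mimetypes() = 0;

        // Reads the file at fileUrl and returns the extracted data,
        // attached to the file's resource uri resUri
        virtual SimpleResourceGraph extract(const QUrl& resUri, const QUrl& fileUrl, const QString& mimeType) = 0;
    };

    }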

The first function, mimetypes, declares which files the plugin can handle. Each plugin acts on a certain set of mimetypes and simply needs to return a list of all the mimetypes it supports.
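How the indexer consumes these lists is not described on this page. As a hedged illustration, one plausible scheme is to pick the first plugin whose mimetypes() list contains the file's mimetype. The sketch below uses plain standard C++; MockExtractor, TextExtractor, MusicExtractor and findExtractor are hypothetical stand-ins, not the Nepomuk2 API, and the mimetype strings are only examples.

```cpp
#include <algorithm>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for Nepomuk2::ExtractorPlugin: only the
// mimetype part of the interface is modelled here.
class MockExtractor {
public:
    virtual ~MockExtractor() = default;
    // Lists every mimetype this plugin supports.
    virtual std::vector<std::string> mimetypes() const = 0;
};

class TextExtractor : public MockExtractor {
public:
    std::vector<std::string> mimetypes() const override {
        return {"text/plain"};
    }
};

class MusicExtractor : public MockExtractor {
public:
    std::vector<std::string> mimetypes() const override {
        return {"audio/mpeg", "audio/ogg"};
    }
};

// Assumed dispatch rule (not taken from Nepomuk): return the first plugin
// that lists the given mimetype, or nullptr if no plugin supports it.
const MockExtractor* findExtractor(
    const std::vector<std::unique_ptr<MockExtractor>>& plugins,
    const std::string& mimeType) {
    for (const auto& plugin : plugins) {
        const std::vector<std::string> types = plugin->mimetypes();
        if (std::find(types.begin(), types.end(), mimeType) != types.end()) {
            return plugin.get();
        }
    }
    return nullptr;
}
```

Because each plugin only declares what it supports, adding a new file format means adding a plugin with a new list rather than changing the dispatch code.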

The second function, extract, is the heart of the extractor. It is given the resource uri of the file, the url of the file and its mimetype. The plugin reads the file and extracts information from it.

Saving the Extracted Data

The Nepomuk extractors are built around two simple classes, SimpleResource and SimpleResourceGraph. A SimpleResourceGraph is just a collection of SimpleResources, and a SimpleResource is just a collection of (key, value) pairs holding the properties of that particular resource.
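To make this data model concrete, here is a plain standard C++ sketch of the idea: a resource identified by a uri that holds (key, value) property pairs, and a graph that holds resources. MockResource, MockGraph and the string keys are hypothetical stand-ins, not the real Nepomuk2 classes (which work with QUrl keys and Qt value types). A multimap is used so that one key can carry several values.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for SimpleResource: a uri plus a collection of
// (key, value) property pairs. A multimap lets one key hold several values.
class MockResource {
public:
    explicit MockResource(std::string uri) : uri_(std::move(uri)) {}

    void addProperty(const std::string& key, const std::string& value) {
        properties_.emplace(key, value);
    }

    // In this sketch a type is simply stored as an rdf:type property.
    void addType(const std::string& type) { addProperty("rdf:type", type); }

    // Returns all values stored under key, in insertion order.
    std::vector<std::string> values(const std::string& key) const {
        std::vector<std::string> result;
        auto range = properties_.equal_range(key);
        for (auto it = range.first; it != range.second; ++it) {
            result.push_back(it->second);
        }
        return result;
    }

    const std::string& uri() const { return uri_; }

private:
    std::string uri_;
    std::multimap<std::string, std::string> properties_;
};

// Hypothetical stand-in for SimpleResourceGraph: just a collection of resources.
class MockGraph {
public:
    void insert(const MockResource& res) { resources_.push_back(res); }
    std::size_t size() const { return resources_.size(); }
    const MockResource& at(std::size_t i) const { return resources_.at(i); }

private:
    std::vector<MockResource> resources_;
};
```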

The main file resource is identified by the resource uri passed to extract as resUri. It can be used as follows:

    SimpleResource fileRes( resUri );
    fileRes.addType( NFO::PlainTextDocument() );
    fileRes.addProperty( NIE::plainTextContent(), contents );
    fileRes.addProperty( NFO::wordCount(), words );
    fileRes.addProperty( NFO::lineCount(), lines );
    fileRes.addProperty( NFO::characterCount(), characters );
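The example above assumes that contents, words, lines and characters have already been computed from the file. A minimal sketch of that counting step in standard C++ follows. TextStats and computeTextStats are hypothetical helpers, and the counting rules are assumptions rather than Nepomuk's definitions: words are runs of non-whitespace, a final line without a trailing newline still counts, and characters are counted as bytes, so multi-byte UTF-8 text would need a proper decoder.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result type holding the values used in the example above.
struct TextStats {
    std::size_t words = 0;
    std::size_t lines = 0;
    std::size_t characters = 0;
};

// Counts words (runs of non-whitespace), lines and characters in contents.
// Characters are counted as bytes, so this is only exact for ASCII text.
TextStats computeTextStats(const std::string& contents) {
    TextStats stats;
    stats.characters = contents.size();

    bool inWord = false;
    for (char c : contents) {
        if (c == '\n') {
            ++stats.lines;
        }
        if (std::isspace(static_cast<unsigned char>(c))) {
            inWord = false;
        } else if (!inWord) {
            // First non-whitespace character after whitespace starts a word.
            inWord = true;
            ++stats.words;
        }
    }
    // A non-empty final line without a trailing newline still counts.
    if (!contents.empty() && contents.back() != '\n') {
        ++stats.lines;
    }
    return stats;
}
```

In a real plugin, contents would come from reading the file at fileUrl; the results then map onto NFO::wordCount(), NFO::lineCount() and NFO::characterCount() as in the example.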


This fileRes can then be added to a SimpleResourceGraph and returned from extract, after which it is saved in Nepomuk.

Another simple example, for a music file:

....

Required Files

Since the plugin interface still isn't public, it is best to contribute directly to nepomuk-core. The relevant code can be found in nepomuk-core/services/fileindexer/indexer/.

Testing the Indexer