User:V for vandal/Tutorials/WebExtractorDataPP

Introduction

For this tutorial you need the development script newplugin.py and its data files. If you use a repository clone (the preferred way), go to the dev-scripts/newplugin subfolder.

Creating a template

Create a template for the plugin. If you have installed and correctly set up git, run the following in the dev-scripts/newplugin folder:

    ./newplugin.py -a -n NAME -v VERSION -s

NAME is the name of the plugin and of the DataPP; it is used in class names. VERSION is the version of the plugin and must be a floating-point value. If it is not given, 0.1 is used.

If you have not installed and set up git, or you want to use a different <author,mail> combination, run:

    ./newplugin.py -a -n NAME -v VERSION -s -m EMAIL -t AUTHOR

Don't omit the '-s' flag. It will make your life much easier.

After the script has finished, go to the __output folder (it is created automatically). The files there are the template of your plugin. Copy the entire folder anywhere you like and change into the copy. If libwebextractor is properly installed, running cmakekde in that folder should not produce any errors.

Changing the template

There are 3 main classes: Plugin, DataPP and DataPPReply.

Plugin: The purpose of this class is to generate a DataPP. The only important method is getDataPP(KSharedConfigPtr configFile). It receives the config file and must return a DataPP instance, or 0 if an error occurs (for example, if the config file is incorrect).
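The "return an instance or 0" contract of getDataPP() can be illustrated without libwebextractor. The sketch below is only a model: MockConfig, MockDataPP and the "server" key are hypothetical stand-ins invented for this example, while the real method takes a KSharedConfigPtr and returns a real DataPP.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins: the real method takes a KSharedConfigPtr and
// returns a libwebextractor DataPP. Neither type is used here.
using MockConfig = std::map<std::string, std::string>;

class MockDataPP {
public:
    explicit MockDataPP(const std::string& server) : m_server(server) {}
    const std::string& server() const { return m_server; }
private:
    std::string m_server;
};

// Returns a new MockDataPP, or a null pointer when the config is unusable.
// The "server" key is an assumption made for this example only.
// The caller takes ownership of the returned object.
MockDataPP* getDataPP(const MockConfig& config)
{
    const auto it = config.find("server");
    if (it == config.end() || it->second.empty()) {
        return nullptr; // incorrect config file: report failure
    }
    return new MockDataPP(it->second);
}
```

The point of the sketch is that validation happens before the object is constructed, so a caller only needs to check the returned pointer.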

DataPP: Usually this is the main worker, but not in our case. The most important method is requestDecisions(...). This method must return a DataPPReply instance. When that DataPPReply instance emits the finished() signal, all work has been done. More about this later. The default implementation of requestDecisions() is pretty good.

DataPPReply: This is the main worker in our case. If you did not forget the '-s' flag, your NAMEDataPPReply inherits from SimpleDataPPReply instead of DataPPReply.

Working with subclass of SimpleDataPPReply

All information below is valid only for subclasses of SimpleDataPPReply.

The main work cycle is:

generate and add Decisions
if an error has occurred, call setError() with the appropriate error
call finish()

The process of generating Decisions is easy. Example:

    void NAMEDataPPReply::someFunction()
    {
        Nepomuk::WebExtractor::Decision d = newDecision();
        // Make some changes to d.
        // This is described in the next subsection(s).
        // .....
        // Add the Decision. Without this call
        // the Decision will be silently ignored.
        addDecision(d);
    }

You can generate as many Decisions as you need. See the next sections for more information about Decisions.

After you have finished, you must call the finish() method. Do not emit the finished signal directly. If you want to report an error, first call setError() and then call finish().

You may call finish() even from the constructor of NAMEDataPPReply, if you have done all your work there.

Obviously, you should not change anything after finish() has been called. Most operations on Decisions are silently ignored after the finish() call, but in case there is some undetected bug in the system, please do not try.

Errors. Errors are serious things. If you finish with an error, all your Decisions will be ignored and destroyed. You should report errors such as: the server is not available, the resource does not match the requirements, and so on. If your connection to the server was closed unexpectedly but you have already generated some Decisions, it is better to call finish() without an error.
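The work cycle and the error rule described above can be modeled in a few lines. The sketch below does not use libwebextractor: MockReply, MockDecision and ErrorCode are hypothetical stand-ins written only for illustration, and the real SimpleDataPPReply may differ in detail. It shows that Decisions survive a finish() without an error, are dropped by a finish() with an error, and that calls made after finish() are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; none of these types come from libwebextractor.
struct MockDecision {
    std::string description;
};

enum class ErrorCode { NoError, ServerUnavailable };

class MockReply {
public:
    // Models addDecision(): ignored once finish() has been called.
    void addDecision(MockDecision d) {
        if (m_finished) return;
        m_decisions.push_back(std::move(d));
    }
    // Models setError(): has to happen before finish().
    void setError(ErrorCode e) {
        if (m_finished) return;
        m_error = e;
    }
    // Models finish(): finishing with an error drops all Decisions.
    void finish() {
        if (m_finished) return;
        if (m_error != ErrorCode::NoError) m_decisions.clear();
        m_finished = true;
    }
    bool isFinished() const { return m_finished; }
    ErrorCode error() const { return m_error; }
    std::size_t decisionCount() const { return m_decisions.size(); }
private:
    std::vector<MockDecision> m_decisions;
    ErrorCode m_error = ErrorCode::NoError;
    bool m_finished = false;
};

// The connection dropped after two results: finish without an error
// so that the Decisions generated so far are kept.
MockReply partialSuccess() {
    MockReply reply;
    reply.addDecision(MockDecision{"first result"});
    reply.addDecision(MockDecision{"second result"});
    reply.finish();
    return reply;
}

// The server was never reachable: report the error, then finish.
MockReply hardFailure() {
    MockReply reply;
    reply.setError(ErrorCode::ServerUnavailable);
    reply.finish();
    return reply;
}
```

In this model partialSuccess() ends with two Decisions and no error, while hardFailure() ends with the error set and no Decisions.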

Making the Decisions

Decisions represent the changes you want to introduce. Before going into further details, please remember: Decisions are not thread-safe at all. Even using different Decisions from different threads can be dangerous if they were generated in the same DataPPReply. Please use only one thread. Or, if you really need more than one thread, make sure that only one of your threads works with Decisions.
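One way to follow the single-thread rule above is to let worker threads produce plain data only and to build Decisions afterwards from a single thread. The sketch below uses only the C++ standard library; DraftDecision, fetchInParallel() and buildDecisions() are hypothetical names for this illustration and are not part of libwebextractor.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a Decision; not a libwebextractor type.
struct DraftDecision {
    std::string title;
};

// Workers produce plain strings only. They never touch a Decision.
std::vector<std::string> fetchInParallel(const std::vector<std::string>& sources)
{
    std::vector<std::string> results;
    std::mutex guard;
    std::vector<std::thread> workers;
    for (const std::string& src : sources) {
        workers.emplace_back([&results, &guard, src] {
            std::string raw = "data from " + src; // placeholder for real work
            std::lock_guard<std::mutex> lock(guard);
            results.push_back(raw);
        });
    }
    for (std::thread& t : workers) {
        t.join(); // all workers are done before any Decision is built
    }
    return results;
}

// Called from one thread only: the only place where Decisions are created.
std::vector<DraftDecision> buildDecisions(const std::vector<std::string>& raw)
{
    std::vector<DraftDecision> decisions;
    for (const std::string& item : raw) {
        decisions.push_back(DraftDecision{item});
    }
    return decisions;
}
```

The order of the collected results depends on thread scheduling, so code built on this pattern should not rely on it.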

What is a Decision? A Decision is a set of changes to be made to the target resource, plus some auxiliary information. The interpretation of a Decision is the following: a Decision represents the new state of the object and must satisfy the following requirement:

The new state of the resource must be a valid real-world object. You must not assign the album title "Kill 'Em All" and the author Johann Sebastian Bach to the same music file.

First, you need to create a Decision object. If you ignored my insistence on passing the '-s' flag, look at the DecisionFactory documentation. If you followed my advice, SimpleDataPPReply (the base class of your NAMEDataPPReply) provides the newDecision() method. From now on, I will assume that you have inherited from SimpleDataPPReply. Let NW be an abbreviation for Nepomuk::WebExtractor. So, your first step is:

    NW::Decision dec = newDecision();

Now, some more information about Decisions. Decisions own a special storage model. What this model is and where it is stored is not your business. The Decision will give you a ResourceManager object with the appropriate model. <note>Please do not try to obtain the Decision's storage model directly, because there is a special FilterModel on top of it.</note> Some information about the Decision's storage model:

Usually, it will be a Redland model, so some features are not supported.
It can be the main Nepomuk storage model. Be aware of this in your queries.
Many Decisions share the same model. Be aware of this in your queries too.
Currently, ontologies are not copied to this model, so a query like "select all triples where property is subPropertyOf XXX" will not work. This will be fixed in future releases.
You should not use any information from other Decisions in this model. Do not try to detect whether the same Decision was created by another plugin; this case is properly handled by the system.

The next step is to select the target resources for the Decision. Target resources are the resources that will be synced back.

Only changes to target resources are logged. (This actually means that adding or removing statements where at least the object or the subject is a target resource will be logged.)
Changes to non-target resources (resources that you have created locally, or resources that were created during the deep copy procedure, described later) are not logged. But this does not mean that these changes are unnecessary!
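The logging rule above reduces to a simple predicate: a statement change is logged when its subject or its object is a target resource. The sketch below only models that rule with the standard library; MockStatement, isLogged() and the example URIs are hypothetical and do not correspond to the library's real types.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for an RDF statement; not a libwebextractor type.
struct MockStatement {
    std::string subject;
    std::string predicate;
    std::string object;
};

// A change is logged when the subject or the object is a target resource.
bool isLogged(const MockStatement& st, const std::set<std::string>& targets)
{
    return targets.count(st.subject) > 0 || targets.count(st.object) > 0;
}
```

With "file:R1" as the only target, a statement with "file:R1" as subject or as object would be logged, while a statement between two locally created resources would not.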

Copy objects from the main Nepomuk storage to your Decision. When you copy an object to your Decision, a deep copy is performed. All changes you want to make to the original resources should be made to the copies of these resources instead. Then, if the Decision is accepted, they will be synced back. The word "sync" is important! The changes are not simply applied to the original resources but are added intelligently. There are still some limitations:

Remember, it is a deep copy. If you copy resource R1 and R1 has tag T1, then T1 is copied too. When the Decision is applied, all changes you have made to T1 are synced back as well. This can lead to errors. If you have changed the label of T1, then all resources in the main Nepomuk storage that have T1 as a tag will now have a tag with a different label! Instead, please just remove the T1 copy from the R1 copy, create a new tag in the Decision's model and assign it to the R1 copy.
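The tag pitfall above is an ordinary shared-object problem, which the following sketch reproduces with plain C++ types. Tag, Resource, relabelShared() and replaceTag() are hypothetical names for this illustration; the real code would work on Nepomuk resources obtained through the Decision's ResourceManager.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for a tag and a tagged resource.
struct Tag {
    std::string label;
};

struct Resource {
    std::string name;
    std::vector<std::shared_ptr<Tag>> tags;
};

// Problematic: edits the shared tag, so every resource that carries
// this tag sees the new label.
void relabelShared(Resource& r, const std::string& newLabel)
{
    if (!r.tags.empty()) {
        r.tags.front()->label = newLabel;
    }
}

// Recommended: detach the old tag from this resource only and attach
// a freshly created tag with the new label.
void replaceTag(Resource& r, const std::shared_ptr<Tag>& oldTag,
                const std::string& newLabel)
{
    r.tags.erase(std::remove(r.tags.begin(), r.tags.end(), oldTag),
                 r.tags.end());
    r.tags.push_back(std::make_shared<Tag>(Tag{newLabel}));
}
```

If R1 and R2 both carry T1, replaceTag() on R1 leaves the tag of R2 untouched, whereas relabelShared() changes the label seen by both.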

Each Decision consists of one or more PropertiesGroups. A PropertiesGroup represents an indivisible bunch of changes. It is up to you how to split a Decision into PropertiesGroups. There are 2 ways to work with a PropertiesGroup. The simplest one is:

    Decision d1;
    PropertiesGroup grp1 = d1.newGroup();