User:V for vandal/Tutorials/WebExtractorDataPP

Introduction

For this tutorial you need the development script newplugin.py and its data files. If you work from a clone of the repository (the preferred way), go to the dev-scripts/newplugin subfolder.

Creating a template

Create a template for the plugin. If you have installed and correctly set up git, execute the following in the dev-scripts/newplugin folder:

    ./newplugin.py -a -n NAME -v VERSION -s

NAME is the name of the plugin and of the DataPP. It is used in class names. VERSION is the version of the plugin and must be a float value. If it is not given, 0.1 is used.

If you have not installed and set up git, or you want to use a different <author,mail> combination, execute:

    ./newplugin.py -a -n NAME -v VERSION -s -m EMAIL -t AUTHOR

Do not omit the '-s' flag. It will make your life much easier.

After the script has finished, go to the __output folder (it is created automatically). The files there are the template of your plugin. Copy the entire folder anywhere and change into the copy. If libwebextractor is properly installed, running cmakekde in that folder should not produce any errors.

Changing the template

There are 3 main classes: Plugin, DataPP and DataPPReply.

Plugin: The purpose of this class is to generate a DataPP. The only important method is getDataPP(KSharedConfigPtr configFile). It receives the config file and must return a DataPP instance, or 0 if an error occurs (for example, if the config file is incorrect).

DataPP: Usually this is the main worker, but not in our case. The most important method is requestDecisions(...). This method must return a DataPPReply instance. When the DataPP instance emits the finished() signal, all work has been done. More about this later. The default implementation of requestDecisions is good enough for most plugins.

DataPPReply: This is the main worker in our case. If you did not forget the '-s' flag, your NAMEDataPPReply inherits from SimpleDataPPReply instead of DataPPReply.

Working with subclass of SimpleDataPPReply

All information below is valid only for subclasses of SimpleDataPPReply.

The main work cycle is:

generate and add Decisions
if an error has occurred, call setError() with the appropriate error
call finish()

The process of generating Decisions is easy. Example:

    void NAMEDataPPReply::someFunction()
    {
        Nepomuk::WebExtractor::Decision d = newDecision();
        // Make some changes to d.
        // This is described in the following sections.
        // ...
        // Add the Decision. Without this call
        // the Decision will be silently ignored.
        addDecision(d);
    }

You can generate as many Decisions as you need. See the next sections for more information about Decisions.

After you have finished, you must call the finish() method. Do not emit the finished() signal directly. To report an error, first call setError() and then call finish().

You may call finish() even from the constructor of NAMEDataPPReply, if all your work is done there.

Obviously, you should not change anything after finish() has been called. Most operations on Decisions are silently ignored after the finish() call, but since there may be an undetected bug in the system, please do not try.

Errors. Errors are serious things. If you finish with an error, all your Decisions will be ignored and destroyed. Report errors such as: the server is not available, the resource does not match the requirements, and so on. If your connection to the server was closed unexpectedly but you have already generated some Decisions, it is better to call finish() without an error.
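The work cycle and the error rule above can be illustrated with a small self-contained sketch. ToyReply and ToyDecision are hypothetical stand-ins written for this example, not the real SimpleDataPPReply and Decision classes. The sketch only models the behaviour described in the text: operations are ignored after finish(), and finishing with an error discards all Decisions.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stand-in types for illustration only; NOT the real library classes.
struct ToyDecision {
    std::string description;
};

class ToyReply {
public:
    // Mirrors addDecision(): ignored once the reply has finished.
    void addDecision(const ToyDecision &d) {
        if (!m_finished) {
            m_decisions.push_back(d);
        }
    }

    // Mirrors setError(): records an error to be reported by finish().
    void setError(const std::string &message) {
        if (!m_finished) {
            m_error = message;
        }
    }

    // Mirrors finish(): on error, every collected Decision is discarded.
    void finish() {
        if (m_finished) {
            return;
        }
        if (!m_error.empty()) {
            m_decisions.clear();
        }
        m_finished = true;
    }

    bool isFinished() const { return m_finished; }
    bool hasError() const { return !m_error.empty(); }
    std::size_t decisionCount() const { return m_decisions.size(); }

private:
    std::vector<ToyDecision> m_decisions;
    std::string m_error;
    bool m_finished = false;
};

// The connection dropped after one Decision was produced: finish without
// an error so that the Decision is kept.
inline ToyReply partialResult() {
    ToyReply reply;
    reply.addDecision(ToyDecision{"title found"});
    reply.finish();
    return reply;
}

// The server was never reachable: report the error, then finish.
inline ToyReply serverUnavailable() {
    ToyReply reply;
    reply.addDecision(ToyDecision{"stale guess"});
    reply.setError("server is not available");
    reply.finish();
    return reply;
}
```

In this toy model, partialResult() keeps its single Decision, while serverUnavailable() ends with none, which is why a partial result is better reported without an error.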

Making the Decisions

Decisions represent the changes you want to introduce. Before going into further details, please remember: Decisions are not thread-safe at all. Even using different Decisions from different threads can be dangerous if they were generated in one DataPPReply. Please use only one thread. Or, if more than one thread is necessary, make sure that only one of your threads works with Decisions.

What is a Decision? A Decision is a set of changes to be made to the target resource, plus some auxiliary information. The interpretation of a Decision is the following: a Decision represents the new state of the object and must satisfy the following requirement:

The new state of the resource must be a valid real-world object. For example, you must not assign a music file the album title "Kill 'Em All" and the author Johann Sebastian Bach.

First, you need to create a Decision object. If you ignored my insistence on passing the '-s' flag, look at the DecisionFactory documentation. If you followed my advice, SimpleDataPPReply (the base class of your NAMEDataPPReply) provides the newDecision() method. From here on, I assume that you have inherited from SimpleDataPPReply. NW is used below as an abbreviation for Nepomuk::WebExtractor. So, your first step is:

    NW::Decision dec = newDecision();

Now, some more information about Decisions. Decisions own a special storage model. What this model is and where it is stored is not your business. The Decision will give you a ResourceManager object with the appropriate model. <note>Please do not try to obtain a Decision's storage model directly, because there is a special FilterModel on top of it.</note> Some information about a Decision's storage model:

Usually it is a Redland model, so some features are not supported.
It can be the main Nepomuk storage model. Be aware of this in your queries.
Many Decisions share the same model. Be aware of this in your queries too.
All ontologies available in the system are loaded there. Currently it is not possible to load an ontology from the internet into the model.
You should not use any information from other Decisions in this model. Do not try to detect whether a similar Decision was created by another plugin; this case is handled properly by the system.

The next step is to select the target (main) resources for the Decision. Target resources are the resources that will be synced back.

Only changes to target resources are logged. (This actually means that adding or removing statements where at least the object or the subject is a target resource will be logged.)
Changes to non-target resources (resources that you have created locally, or resources that were created during the deep copy procedure, described later) are not logged. But this does not mean that these changes are unnecessary!
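The logging rule above can be sketched in a few lines. ToyChangeLog and ToyStatement are hypothetical stand-ins invented for this example and are not part of the library. The sketch assumes, as the text states, that a statement is logged when its subject or its object is a target resource, while every statement is still stored in the model.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Stand-in for an RDF statement; NOT a Soprano or Nepomuk type.
struct ToyStatement {
    std::string subject;
    std::string predicate;
    std::string object;
};

class ToyChangeLog {
public:
    // Register the proxy URL of a target resource.
    void addTarget(const std::string &proxyUrl) { m_targets.insert(proxyUrl); }

    // Every statement is stored, but only statements that touch a target
    // resource (as subject or as object) are logged.
    void addStatement(const ToyStatement &st) {
        m_model.push_back(st);
        if (m_targets.count(st.subject) > 0 || m_targets.count(st.object) > 0) {
            m_log.push_back(st);
        }
    }

    std::size_t modelSize() const { return m_model.size(); }
    std::size_t logSize() const { return m_log.size(); }

private:
    std::set<std::string> m_targets;
    std::vector<ToyStatement> m_model;
    std::vector<ToyStatement> m_log;
};
```

A statement between two locally created resources increases modelSize() but not logSize(). It may still matter, because a logged statement can refer to one of those local resources.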

After you have determined the target resources for your Decision, call proxyUrl(Nepomuk::Resource) or proxyResource(Nepomuk::Resource) with the selected resource. (If there is more than one selected resource, several calls are necessary.)

The behaviour of these functions is the same; the only difference is the return value. proxyUrl returns a QUrl, and proxyResource returns a Nepomuk::Resource.

These functions copy resources from the main Nepomuk storage into your Decision. The copy is a deep copy: all subresources of the resource are copied too. But be aware that every resource is copied only once. Consider the following situation: R1 and R2 are target resources, and both have R3 as a subresource. If you call proxyUrl(R1) and then proxyUrl(R2), there will be only one copy of R3 as a result. If you call proxyUrl(R1), then edit the copy of R3, and then call proxyUrl(R2), the copy of R3 will not be restored to its original state, nor will a new copy of R3 be created.

So, the preferred way is to call proxyUrl() as early as possible and before making any changes.
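The copy-once behaviour in the R1, R2, R3 situation can be reproduced with a toy model. ToyDecisionModel, ToyResource and ToyStore are hypothetical names made up for this sketch. The real proxyUrl() works on Nepomuk resources, and its internals are not described here, so this only mirrors the observable rule: a resource that was already copied is neither restored nor copied again.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy resource: a title plus the URLs of its subresources.
struct ToyResource {
    std::string title;
    std::vector<std::string> subresources;
};

using ToyStore = std::map<std::string, ToyResource>;

class ToyDecisionModel {
public:
    // The main store must outlive this object; it is only read from.
    explicit ToyDecisionModel(const ToyStore &mainStore) : m_main(mainStore) {}

    // Deep-copy `url` and all of its subresources into the Decision's model.
    // A resource that has already been copied is left untouched.
    void proxy(const std::string &url) {
        if (m_copies.count(url) > 0) {
            return; // copied earlier: neither restored nor copied again
        }
        auto it = m_main.find(url);
        if (it == m_main.end()) {
            return; // unknown resource: nothing to copy
        }
        m_copies[url] = it->second;
        for (const std::string &sub : it->second.subresources) {
            proxy(sub);
        }
    }

    // Access the Decision-local copy (throws if `url` was never copied).
    ToyResource &copy(const std::string &url) { return m_copies.at(url); }
    std::size_t copyCount() const { return m_copies.size(); }

private:
    const ToyStore &m_main;
    std::map<std::string, ToyResource> m_copies;
};
```

If proxy("R1") is called, the copy of R3 is edited, and proxy("R2") is called afterwards, the edit to R3 survives and only three copies exist in total. Calling proxy() for both targets before editing avoids surprises.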

Some guidelines summarizing the above:

* Call proxyUrl() as early as possible.
* If you want to edit some subresource of the target resource, you should create a new resource and assign it in place of the old one. This is not very important if you have only one target resource, but it is really important if you have 2 or more.

Finally, if the Decision is accepted, the changes you have made will be synced back. The word "sync" is important: the changes are not simply applied to the original resources, but are merged in sensibly.

How to make changes in Decision

Each Decision consists of one or more PropertiesGroups. A PropertiesGroup represents an indivisible bunch of changes. It is up to you how to split a Decision into PropertiesGroups. Each Decision has a model and a ResourceManager. Each PropertiesGroup has its own model and ResourceManager.

Changes made through the model (addStatement, removeStatement) and through Nepomuk::Resource objects created with these ResourceManagers are checked and logged if necessary. So, to make a Decision you must change the data with the help of these models and/or ResourceManagers. You can freely mix usage of the model and the manager.

There are 2 ways to work with a PropertiesGroup. In both, you first need to create the PropertiesGroup:

    Decision d1 = newDecision();
    PropertiesGroup grp1 = d1.newGroup();

As mentioned above, there are 2 ways to work with a PropertiesGroup. The first way is like working with files: you open the PropertiesGroup, make the necessary changes and close the PropertiesGroup.

    // 'Open' the group.
    grp1.makeCurrent();
    // After this call, all changes made through this PropertiesGroup's
    // model and manager AND through the Decision's model and manager are
    // added to grp1's list of changes.
    grp1.model().addStatement(...);
    Nepomuk::Resource R1(targetResourceProxyUrl, QUrl(), grp1.manager());
    Nepomuk::Resource R2(targetResourceProxyUrl, QUrl(), d1.manager());
    // All changes made through either R1 or R2 are added to the
    // changelog of grp1.
    // targetResourceProxyUrl is the proxy URL of some target resource,
    // as returned by a proxyUrl() call.

    // Stop.
    d1.resetCurrentGroup();
    // This call stops logging the changes made via the
    // Decision's model and manager.
    // Changes made via the PropertiesGroup's model and manager
    // are still added to the group's changelog.
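The routing of changes described above can be modelled in a few lines. This ToyDecision is a hypothetical stand-in invented for illustration. It identifies groups by index instead of returning PropertiesGroup objects, and it only captures the rule from the text: changes made via the Decision go to the current group (and are not logged when no group is current), while changes made via a group always go to that group.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-ins; NOT the real Decision and PropertiesGroup classes.
struct ToyGroup {
    std::vector<std::string> changes;
};

class ToyDecision {
public:
    // Loosely mirrors newGroup(): returns the index of a new group.
    std::size_t newGroup() {
        m_groups.emplace_back();
        return m_groups.size() - 1;
    }

    void makeCurrent(std::size_t group) { m_current = static_cast<long>(group); }
    void resetCurrentGroup() { m_current = -1; }

    // A change made through the group's own model or manager always
    // lands in that group's changelog.
    void changeViaGroup(std::size_t group, const std::string &change) {
        m_groups.at(group).changes.push_back(change);
    }

    // A change made through the Decision's model or manager lands in the
    // current group. With no current group it is not logged; the return
    // value reports whether the change was logged.
    bool changeViaDecision(const std::string &change) {
        if (m_current < 0) {
            return false;
        }
        m_groups.at(static_cast<std::size_t>(m_current)).changes.push_back(change);
        return true;
    }

    std::size_t changeCount(std::size_t group) const {
        return m_groups.at(group).changes.size();
    }

private:
    std::vector<ToyGroup> m_groups;
    long m_current = -1; // -1 means "no current group"
};
```

In this model, a change made via the Decision after resetCurrentGroup() is dropped from every changelog, while a change made via the group is still recorded.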

Remarks:

* As already mentioned above, only changes that touch the proxies of target resources are logged. So even if you create a new Resource with the Decision's manager while a current group is set, or with a PropertiesGroup's manager, this does not mean that the new resource will definitely be created in the main Nepomuk storage after the Decision is applied. It will be created only if necessary.
* The resources returned by proxyResource() are created with the Decision's manager. So you can avoid creating new Resources with the PropertiesGroup's manager.

Second way. This way is more useful for network-aware DataPPs. It was partially described in the previous subsection: just do not set a current group (to avoid errors) and use the PropertiesGroup's manager and model to add changes to this PropertiesGroup. PropertiesGroup is a shared object: changes made to a copy of a PropertiesGroup are made to an object shared between all copies. So you can freely pass a PropertiesGroup as a parameter to a function, and so on.
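The sharing behaviour can be illustrated with a small handle class. This ToyGroup is a hypothetical stand-in, and std::shared_ptr is used here only as a substitute for whatever sharing mechanism the real PropertiesGroup uses internally. The function onReplyReceived is likewise an invented example of a callback that receives the group by value.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for a shared handle; NOT the real PropertiesGroup.
class ToyGroup {
public:
    ToyGroup() : m_data(std::make_shared<std::vector<std::string>>()) {}

    void addChange(const std::string &change) { m_data->push_back(change); }
    std::size_t changeCount() const { return m_data->size(); }

private:
    // Copying a ToyGroup copies this pointer, not the vector behind it,
    // so all copies see the same list of changes.
    std::shared_ptr<std::vector<std::string>> m_data;
};

// Takes the group BY VALUE, as a network callback might.
inline void onReplyReceived(ToyGroup group, const std::string &title) {
    group.addChange("title = " + title);
}
```

Although onReplyReceived() works on a copy, the change it adds is visible through the caller's ToyGroup as well, which is the property that makes passing groups around safe in this model.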