User:V for vandal/Tutorials/WebExtractorDataPP

From KDE TechBase

Introduction

For this tutorial you need the development script newplugin.py and its data files. If you use a clone of the repository (the preferred way), go to the dev-scripts/newplugin subfolder.

Creating a template

Create a template for the plugin. If you have installed and correctly set up git, then execute (in the dev-scripts/newplugin folder):

    ./newplugin.py -a -n NAME -v VERSION -s

NAME is the name of the plugin and the DataPP; it will be used in class names. VERSION is the version of the plugin and must be a float value. If it is not given, 0.1 is used.

If you haven't installed and set up git, or you want to use a different author/e-mail combination, then execute:

    ./newplugin.py -a -n NAME -v VERSION -s -m EMAIL -t AUTHOR

Don't forget the '-s' flag. It will make your life much easier.

After the script has finished, go to the __output folder (it is created automatically). All files there make up the template of your plugin. Copy the entire folder anywhere you like and go to that folder. If libwebextractor is properly installed, running cmakekde in the folder should not produce any errors.

Changing the template

There are 3 main classes: Plugin, DataPP and DataPPReply.

Plugin: The purpose of this class is to create DataPP instances. The only important method is getDataPP(KSharedConfigPtr configFile). It receives the config file and must return a DataPP instance, or 0 if an error occurs (for example, if the config file is incorrect).
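The contract of getDataPP() can be sketched in plain, self-contained C++. This is not the libwebextractor API: Config, SketchDataPP, SketchPlugin and the "endpoint" key are hypothetical stand-ins (the real method receives a KSharedConfigPtr and returns a DataPP pointer or 0). The sketch only shows the rule: build a worker from a usable config, and return a null result otherwise.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for KSharedConfigPtr: a flat key/value map.
using Config = std::map<std::string, std::string>;

// Hypothetical stand-in for your NAMEDataPP class.
class SketchDataPP {
public:
    explicit SketchDataPP(std::string endpoint) : endpoint_(std::move(endpoint)) {}
    const std::string& endpoint() const { return endpoint_; }

private:
    std::string endpoint_;
};

// Mirrors the contract of Plugin::getDataPP(): create a DataPP from the
// config, or return null when the config is unusable.
class SketchPlugin {
public:
    std::unique_ptr<SketchDataPP> getDataPP(const Config& config) const {
        auto it = config.find("endpoint"); // hypothetical config key
        if (it == config.end() || it->second.empty()) {
            return nullptr; // incorrect config: signal failure with a null result
        }
        return std::make_unique<SketchDataPP>(it->second);
    }
};
```

A caller checks the result before using it: an empty config or an empty "endpoint" value yields nullptr, while a config with a non-empty value yields a ready SketchDataPP.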

DataPP: Usually this is the main worker, but not in our case. The most important method is requestDecisions(...), which must return a DataPPReply instance. When that DataPPReply instance emits the finished() signal, all work has been done. More about this later. The default implementation of requestDecisions is usually sufficient.

DataPPReply: This is the main worker in our case. If you used the '-s' flag, your NAMEDataPPReply inherits SimpleDataPPReply instead of DataPPReply. In this class you must generate Decisions. When you have finished, call the finish() method; do not emit the finished() signal directly. If you want to report an error, first call setError() and then call finish(). Example:

    void NAMEDataPPReply::someFunction()
    {
        Nepomuk::WebExtractor::Decision d = newDecision();
        // Make some changes to d.
        mainDecisionList.addDecision(d);
    }

Obviously, you must not change anything after finish() has been called.

IMPORTANT: DataPPReply has an asynchronous API. Because of this, if you create all Decisions directly in the constructor of NAMEDataPPReply, do not call finish() there directly. Use QTimer::singleShot(0, this, SLOT(finish())); instead.
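One way to see why the deferred call matters is a small model of an event loop. Everything below is hypothetical: EventLoop, SketchReply and onFinished() stand in for Qt's event loop, your NAMEDataPPReply and a signal/slot connection. A reply that calls finish() inside its constructor notifies listeners before anyone has had a chance to connect, so the notification is lost. A reply that posts finish() to the loop, as QTimer::singleShot(0, ...) does, notifies listeners that connected right after construction.

```cpp
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Minimal stand-in for Qt's event loop: posted calls run later, in order.
class EventLoop {
public:
    void post(std::function<void()> fn) { queue_.push_back(std::move(fn)); }
    void run() {
        while (!queue_.empty()) {
            auto fn = std::move(queue_.front());
            queue_.pop_front();
            fn();
        }
    }

private:
    std::deque<std::function<void()>> queue_;
};

// Hypothetical reply that builds all of its decisions in the constructor.
class SketchReply {
public:
    SketchReply(EventLoop& loop, bool deferFinish) {
        decisions_.push_back("decision-1"); // all work is done synchronously here
        if (deferFinish) {
            // Like QTimer::singleShot(0, this, SLOT(finish())): run after control returns.
            loop.post([this] { finish(); });
        } else {
            finish(); // nobody can be connected yet: the notification is lost
        }
    }

    // Stands in for connecting a slot to the finished() signal.
    void onFinished(std::function<void()> slot) { listeners_.push_back(std::move(slot)); }

private:
    // Stands in for emitting finished().
    void finish() {
        for (auto& listener : listeners_) listener();
    }

    std::vector<std::string> decisions_;
    std::vector<std::function<void()>> listeners_;
};
```

With one eager and one deferred reply, each connected right after construction and followed by loop.run(), only the deferred reply's listener is called.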

Making the Decisions

Decisions represent the changes you want to introduce. Before going into further detail, please remember: Decisions are not thread-safe at all. Even working with different Decisions from different threads can be dangerous if they were generated in the same DataPPReply. Please use only one thread. Or, if you really need more than one thread, make sure that only one of your threads works with Decisions.
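A sketch of the single-thread rule using only the standard library. SketchDecision and lookUpTitle() are hypothetical placeholders, not NW::Decision or any libwebextractor call. Worker threads only compute plain values; every Decision is created and filled on the calling thread, which is the only thread that touches Decisions.

```cpp
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for NW::Decision: not thread-safe, so only one thread may touch it.
struct SketchDecision {
    std::vector<std::string> changes;
};

// Hypothetical expensive, thread-safe work that never touches a Decision.
std::string lookUpTitle(const std::string& fileName) {
    return "Title of " + fileName;
}

// Fan out the plain computations, then build every Decision on the calling thread.
std::vector<SketchDecision> buildDecisions(const std::vector<std::string>& files) {
    std::vector<std::future<std::string>> pending;
    for (const auto& file : files) {
        pending.push_back(std::async(std::launch::async, lookUpTitle, file));
    }

    std::vector<SketchDecision> decisions;
    for (auto& result : pending) {
        SketchDecision d;                   // created and filled only on this thread
        d.changes.push_back(result.get()); // waits for the worker's plain result
        decisions.push_back(std::move(d));
    }
    return decisions;
}
```

The Decisions come out in the same order as the input files, because the futures are collected in the order they were started.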

What is a Decision? A Decision is a set of changes to be made to the target resource, plus some auxiliary information. A Decision is interpreted as follows: it represents the new state of the object and must satisfy the following requirement:

  • The new state of the resource must be a valid real-world object. For example, you must not assign to a music file the album title "Kill 'Em All" and the author Johann Sebastian Bach.


First, you need to create a Decision object. If you ignored my insistence on passing the '-s' flag, look at the DecisionFactory documentation. If you followed my advice, SimpleDataPPReply (the base class of your NAMEDataPPReply) provides the newDecision() method. From now on, I assume that you have inherited from SimpleDataPPReply, and NW is used as an abbreviation for Nepomuk::WebExtractor. So your first step is:

    namespace NW = Nepomuk::WebExtractor; // the abbreviation used below
    NW::Decision dec = newDecision();

Now, some more information about Decisions. Each Decision owns a special storage model. What this model is and where it is stored is not your business. The Decision gives you a ResourceManager object bound to the appropriate model. Note: please do not try to obtain a Decision's storage model directly, because there are special FilterModels on top of it. Some information about a Decision's storage model:

  • Usually it will be a Redland model, so some features are not supported.
  • It can be the main Nepomuk storage model. Keep this in mind when writing queries.
  • Many Decisions share the same model. Keep this in mind in your queries too.
  • Currently, ontologies are not copied to this model, so a query like "select all triples where the property is a subPropertyOf XXX" will not work. This will be fixed in future releases.
  • You should not use any information from other Decisions in this model. Do not try to detect whether such a Decision was already created by another plugin; the system handles this case properly.

The next step is to copy objects from the main Nepomuk storage into your Decision. When you copy an object into your Decision, a deep copy is performed. All changes you want to make to the original resources should be made to their copies instead. Then, if the Decision is accepted, the changes are synced back. The word "sync" is important: the changes are not simply applied to the original resources, but are carefully merged into them. There are still some limitations, however:

  • Remember, it is a deep copy. If you copy resource R1 and R1 has tag T1, then T1 is copied too. When the Decision is applied, all changes you have made to the copy of T1 are synced back as well, which can lead to errors. If you change the label of T1, then every resource in the main Nepomuk storage that has T1 as a tag will now have a tag with the new label! Instead, remove the T1 copy from the R1 copy, create a new tag in the Decision's model and assign it to the R1 copy.
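The tag pitfall can be modelled with a toy store. This is a deliberate simplification, not how Nepomuk stores or syncs data: Store, deepCopy() and syncBack() are hypothetical, and syncBack() simply overwrites the originals instead of merging. It still shows why relabelling the copied T1 leaks into every resource tagged with T1, while detaching T1 from the R1 copy and attaching a new tag does not.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical, heavily simplified model: resources refer to tags by id,
// and each tag has a label. Nothing here is the real Nepomuk API.
struct Store {
    std::map<std::string, std::string> tagLabels;              // tag id -> label
    std::map<std::string, std::set<std::string>> resourceTags; // resource id -> tag ids
};

// Deep copy of one resource: the resource itself and every tag it references.
Store deepCopy(const Store& primary, const std::string& resource) {
    Store copy;
    const auto& tags = primary.resourceTags.at(resource);
    copy.resourceTags[resource] = tags;
    for (const auto& tag : tags) {
        copy.tagLabels[tag] = primary.tagLabels.at(tag);
    }
    return copy;
}

// Simplified sync: every copied object overwrites its original by id.
// (The real system merges changes more carefully.)
void syncBack(Store& primary, const Store& copy) {
    for (const auto& [id, label] : copy.tagLabels) primary.tagLabels[id] = label;
    for (const auto& [id, tags] : copy.resourceTags) primary.resourceTags[id] = tags;
}
```

For example, with R1 and R2 both tagged T1 ("rock"), setting the copied T1 label to "metal" and syncing back changes T1 for R2 as well. Removing T1 from the R1 copy and adding a new tag T2 with the label "metal" leaves T1 and R2 untouched.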


Each Decision consists of one or more PropertiesGroups. A PropertiesGroup represents an indivisible bunch of changes. It is up to you how to split a Decision into PropertiesGroups. There are 2 ways to work with PropertiesGroup. The simplest one is:

    Decision d1;
    PropertiesGroup grp1 = d1.newGroup();