Projects/Nepomuk: Difference between revisions
Revision as of 19:24, 18 May 2009
About Nepomuk
This page collects Nepomuk development ideas, progress reports, and experiments, and serves as a general starting point for new developers.
For general information about the Nepomuk project see the dedicated Nepomuk homepage.
Developer Coordination
The Nepomuk project is maintained by Sebastian Trueg of Mandriva.
Documentation
The following links provide good reads for getting used to the Nepomuk system and its APIs.
ToDo
Nepomuk is a rather young project with a notorious shortage of developers. There are many tasks and subprojects to get one's hands dirty on. Unlike other projects such as Plasma, however, developing for Nepomuk is not easy. One has to read up on a lot of things and fight some day-to-day annoyances. But helping with the development will improve the situation in any case.
If you are interested in working on a task in this list, please contact Sebastian Trueg.
Low level Nepomuk Development Tasks
The low-level development tasks are those that are not directly reflected in the GUI or even in the API used by most developers. However, they are important in terms of performance, scalability, and compatibility.
Soprano Transaction Support
Soprano is the RDF database framework used in Nepomuk. Currently Soprano does not support transactions, i.e. sets of commands that can be rolled back as a unit. An experimental development branch exists which already contains new API for transaction support (while keeping binary compatibility).
What is still missing is an implementation of the transaction support in the Soprano backends (Sesame2 and Virtuoso) and in the client/server architecture.
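To make "sets of commands that can be rolled back" concrete, here is a minimal sketch of the idea. It does not use Soprano: `Triple` and `ToyStore` are hypothetical names for an in-memory stand-in, and the snapshot approach is only an illustration, not how the experimental branch or any backend implements transactions.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <tuple>

// Hypothetical stand-in for an RDF statement; not a Soprano type.
struct Triple {
    std::string subject, predicate, object;
    bool operator<(const Triple& other) const {
        return std::tie(subject, predicate, object) <
               std::tie(other.subject, other.predicate, other.object);
    }
};

// Hypothetical toy store with snapshot-based transactions.
class ToyStore {
public:
    void add(const Triple& t) { m_triples.insert(t); }
    void remove(const Triple& t) { m_triples.erase(t); }
    bool contains(const Triple& t) const { return m_triples.count(t) > 0; }
    std::size_t size() const { return m_triples.size(); }

    // Remember the current state; nested transactions are not supported.
    void begin() {
        assert(!m_inTransaction);
        m_snapshot = m_triples;
        m_inTransaction = true;
    }

    // Keep all changes made since begin().
    void commit() {
        m_snapshot.clear();
        m_inTransaction = false;
    }

    // Undo all changes made since begin().
    void rollback() {
        if (!m_inTransaction)
            return;
        m_triples.swap(m_snapshot);
        m_snapshot.clear();
        m_inTransaction = false;
    }

private:
    std::set<Triple> m_triples;
    std::set<Triple> m_snapshot;
    bool m_inTransaction = false;
};
```

Copying the whole store on `begin()` is only acceptable for a toy; a real backend would more likely log changes or rely on the transaction support of the underlying database.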
General Nepomuk
Catching all file moves
Nepomuk uses an RDF database for all data. This includes file metadata. Files are referenced by URL. The problem with this is that when a file is moved or renamed we have to realize this and update the metadata accordingly (update the URL in the database).
For KIO this is fairly simple since we have KDirNotify. The Nepomuk filewatch service takes care of this and updates the metadata whenever KIO moves or deletes a file.
However, if a file is moved by a non-KDE application (typical example: the shell via the mv command), the filewatch service does not notice it and the link between the file and its metadata is lost. This is a bad situation which sadly cannot be solved easily. Systems like inotify are too restricted.
Thus, while having a more powerful replacement for inotify would be great, in the meantime we should work with what we have.
The idea is to create a Nepomuk service that tries very hard to find file moves. It would regularly check the database for dangling metadata and then try to find the file using all kinds of evidence:
- file name matching
- xattrs if available (this would mean that Nepomuk::Resource also needs to set the xattrs at some point)
- checksums, maybe the checksum of the first N bytes or something like that to speed the process up
- compare metadata extracted by strigi
- etc.
All this information should be used to generate a score which indicates the certainty of the file matching. Then the final decision would have to be made by the user.
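As a rough illustration of how such a score could be computed, the sketch below combines the kinds of evidence listed above into a value between 0 and 1. Everything in it is an assumption made for the example: the names `MatchEvidence`, `matchScore` and `partialChecksum`, the weights, and the choice of FNV-1a as the checksum over the first N bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// FNV-1a hash over at most the first maxBytes bytes of a stream.
// Reading only a prefix keeps the check cheap for large files.
std::uint64_t partialChecksum(std::istream& in, std::size_t maxBytes) {
    std::uint64_t hash = 14695981039346656037ULL; // FNV-1a 64-bit offset basis
    char c;
    for (std::size_t i = 0; i < maxBytes && in.get(c); ++i) {
        hash ^= static_cast<unsigned char>(c);
        hash *= 1099511628211ULL; // FNV-1a 64-bit prime
    }
    return hash;
}

// Convenience overload for in-memory data.
std::uint64_t partialChecksum(const std::string& data, std::size_t maxBytes) {
    std::istringstream in(data);
    return partialChecksum(in, maxBytes);
}

// Hypothetical record of what was found when comparing dangling
// metadata with one candidate file on disk.
struct MatchEvidence {
    bool sameFileName;          // file name matches the stored one
    bool sameXattrId;           // an identifier stored in xattrs matches
    bool samePartialChecksum;   // checksum of the first N bytes matches
    double metadataSimilarity;  // 0..1, e.g. from comparing extracted metadata
};

// Weighted sum of the evidence; the weights are illustrative guesses.
double matchScore(const MatchEvidence& e) {
    assert(e.metadataSimilarity >= 0.0 && e.metadataSimilarity <= 1.0);
    const double wName = 0.2, wXattr = 0.4, wChecksum = 0.3, wMeta = 0.1;
    return wName * (e.sameFileName ? 1.0 : 0.0)
         + wXattr * (e.sameXattrId ? 1.0 : 0.0)
         + wChecksum * (e.samePartialChecksum ? 1.0 : 0.0)
         + wMeta * e.metadataSimilarity;
}
```

A service could sort the candidates by `matchScore` and present only the best few to the user, who still makes the final decision.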
Hints:
- Try to detect if a complete folder has been moved (or deleted) and do not ask the user for every single file.
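One possible way to follow this hint is to group the detected file moves by their old and new parent directory and report a single folder move whenever several files moved between the same pair of directories without changing their names. The sketch below assumes the individual moves have already been matched; `FileMove`, `MoveSummary`, `summarizeMoves` and the `minFiles` threshold are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One detected move of a single file (hypothetical type).
struct FileMove {
    std::string oldPath;
    std::string newPath;
};

// Result: folder moves the user is asked about once, plus leftovers.
struct MoveSummary {
    std::vector<std::pair<std::string, std::string> > folderMoves; // old dir -> new dir
    std::vector<FileMove> singleMoves;
};

// Split "/a/b/c.txt" into ("/a/b", "c.txt").
static std::pair<std::string, std::string> splitPath(const std::string& path) {
    const std::string::size_type pos = path.rfind('/');
    if (pos == std::string::npos)
        return std::make_pair(std::string(), path);
    return std::make_pair(path.substr(0, pos), path.substr(pos + 1));
}

// Collapse moves that share the same (old dir, new dir) pair into one
// folder move once at least minFiles files agree.
MoveSummary summarizeMoves(const std::vector<FileMove>& moves, std::size_t minFiles = 2) {
    assert(minFiles >= 1);
    typedef std::pair<std::string, std::string> DirPair;
    std::map<DirPair, std::vector<FileMove> > groups;
    MoveSummary summary;

    for (std::size_t i = 0; i < moves.size(); ++i) {
        const std::pair<std::string, std::string> oldParts = splitPath(moves[i].oldPath);
        const std::pair<std::string, std::string> newParts = splitPath(moves[i].newPath);
        // Only unchanged file names in a different directory count as
        // evidence for a folder move; renames are handled one by one.
        if (oldParts.second == newParts.second && oldParts.first != newParts.first)
            groups[std::make_pair(oldParts.first, newParts.first)].push_back(moves[i]);
        else
            summary.singleMoves.push_back(moves[i]);
    }

    for (std::map<DirPair, std::vector<FileMove> >::const_iterator it = groups.begin();
         it != groups.end(); ++it) {
        if (it->second.size() >= minFiles)
            summary.folderMoves.push_back(it->first);
        else
            summary.singleMoves.insert(summary.singleMoves.end(),
                                       it->second.begin(), it->second.end());
    }
    return summary;
}
```

This is only a heuristic: it does not verify that every file of the old folder was found in the new one, and it treats each directory level separately rather than detecting the move of a whole subtree.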
Handling of external storage
A typical problem with the way Nepomuk handles files and file metadata is removable storage devices. They can be mounted at different paths on different systems, but one still wants to keep the metadata stored in Nepomuk. If possible, one would even want to be able to search for files saved on a USB stick when it is not plugged in.
The blog entry about removable storage in Nepomuk already discusses this problem and shows some existing code in KDE's playground which tries to tackle this problem.
However, one actually needs more. The system would have to be embedded into KIO to make sure the metadata cache on the removable storage device is always up-to-date. Also it is directly related to the problem of relative vs. absolute file URLs.
Relative vs. Absolute File URLs
Currently Nepomuk uses the absolute file URLs as URI identifiers for the resources representing the files in the Nepomuk RDF store. The file ~/test.png, for example, has the resource URI file:///home/<username>/test.png. This is nice in many situations since one can simply use the file URL to query file metadata. On the other hand, we need to change a lot of triples whenever the file is moved (not to mention the removable storage problem above).
Thus, the idea is to use random URI identifiers for new file resources and store the file path relative to the mount point. This would solve the above problem with removable devices and make updates after file moves simpler (only update the path).
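The path handling behind this idea can be sketched with plain string operations. The function names `toRelativePath` and `toAbsolutePath` are hypothetical and not part of any Nepomuk or KDE API; the sketch also assumes normalized paths without "." or ".." components.

```cpp
#include <cassert>
#include <string>

// Remove trailing slashes, but keep a lone "/" intact.
static std::string stripTrailingSlashes(std::string path) {
    while (path.size() > 1 && path[path.size() - 1] == '/')
        path.erase(path.size() - 1);
    return path;
}

// Compute the path of a file relative to the mount point of its device.
// Returns false if absolutePath is not located below mountPoint.
bool toRelativePath(const std::string& mountPoint,
                    const std::string& absolutePath,
                    std::string& relativePath) {
    const std::string mp = stripTrailingSlashes(mountPoint);
    if (mp == "/") {
        if (absolutePath.empty() || absolutePath[0] != '/')
            return false;
        relativePath = absolutePath.substr(1);
        return true;
    }
    if (absolutePath.compare(0, mp.size(), mp) != 0)
        return false;
    if (absolutePath.size() == mp.size()) {
        relativePath.clear(); // the mount point itself
        return true;
    }
    // Reject siblings such as "/media/usb2" for mount point "/media/usb".
    if (absolutePath[mp.size()] != '/')
        return false;
    relativePath = absolutePath.substr(mp.size() + 1);
    return true;
}

// Rebuild the absolute path once the device is mounted somewhere,
// possibly at a different mount point than before.
std::string toAbsolutePath(const std::string& mountPoint,
                           const std::string& relativePath) {
    assert(relativePath.empty() || relativePath[0] != '/');
    const std::string mp = stripTrailingSlashes(mountPoint);
    if (relativePath.empty())
        return mp;
    return mp == "/" ? mp + relativePath : mp + "/" + relativePath;
}
```

For example, a file stored as photos/test.png relative to a stick mounted at /media/usb resolves to /mnt/stick/photos/test.png when the same stick is later mounted at /mnt/stick, without any change to the stored relative path.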
This problem should probably be tackled by introducing a class Nepomuk::File as a subclass of Nepomuk::Resource which handles all the file-specific details, such as making sure we have a correct nao:filePath property (currently all of that is done with an if clause in Nepomuk::Resource).
Ideas
There are many ideas on how to improve the Nepomuk system or on how to use it. This is the place to list them all.
Feel free to add your own ideas. Please leave your name in case someone wants to contact you for details or a discussion of the idea.