Projects/Nepomuk: Difference between revisions

    From KDE TechBase
    (add warning that Nepomuk is obsolete, text copied from other wikis)
     
    (93 intermediate revisions by 13 users not shown)
    [[Image:Nepomuk_logo_big.png|center|300px]]
    ''{{Warning|The Nepomuk project no longer exists in modern day KDE software. From KDE Applications 4.13 onwards, the '[[community:Baloo | Baloo]]' file indexing and file search framework replaces Nepomuk. Read [http://dot.kde.org/2014/02/24/kdes-next-generation-semantic-search details on the changes for Applications 4.13 here].}}


    == About Nepomuk ==


    This page is dedicated to Nepomuk development ideas, progress, experiments, and is a general starting point for new developers.
    '''Nepomuk''' serves as a cross-application semantic storage backend. It aims to collect data from various sources (file indexing, the web, applications, etc.) and link it all together to form a cohesive map of data.
     
    For general information about the Nepomuk project see the [http://nepomuk.kde.org/ dedicated Nepomuk homepage].
     
     
    == Developer Coordination ==
     
    The Nepomuk project is maintained by [mailto:[email protected] Sebastian Trueg] of Mandriva.


    This page is dedicated to third-party documentation for '''Nepomuk'''. To learn more about '''Nepomuk''' from a user's point of view, head over to the [http://userbase.kde.org/Special:myLanguage/Nepomuk Nepomuk page on UserBase]. To learn more about the Nepomuk community and how to get involved in '''Nepomuk''', head over to the [http://community.kde.org/Projects/Nepomuk Nepomuk Community Page].


    == Documentation ==


    The following links provide good reads for getting used to the Nepomuk system and its APIs.
    Any new project can be intimidating, and jumping right into the [http://api.kde.org/4.x-api/kdelibs-apidocs/nepomuk-core/html/index.html API Documentation] can be scary. So we have prepared some articles which explain the different aspects of '''Nepomuk''' and even touch on some advanced features.
    * [[Development/Tutorials/Metadata/Nepomuk|Development Tutorials]]
    * [http://api.kde.org/4.x-api/kdelibs-apidocs/nepomuk/html/index.html Nepomuk API Documentation]
    * [http://soprano.sourceforge.net/apidox/trunk/index.html Soprano (RDF storage) API]
    * [http://trueg.wordpress.com/2009/06/02/nepomuk-and-some-cmake-magic/ Using the Nepomuk Resource Code generator and the Soprano Ontology class generator in cmake]
     
     
    As Nepomuk depends heavily on the data in its RDF store and on the ontologies it uses, it is worth reading up on RDF and the Nepomuk ontologies:
    * [http://www.w3.org/TR/REC-rdf-syntax/ RDF Primer]
    * [http://www.semanticdesktop.org/ontologies Nepomuk Ontologies]
    * [http://dev.nepomuk.semanticdesktop.org/wiki/OntologyMaintenance Experimental Nepomuk Ontologies and Ideas for new ones]
     
    == ToDo  ==
     
    Nepomuk is a rather young project with a notorious shortage of developers. There are many tasks and subprojects to get one's hands dirty on. Unlike projects such as Plasma, however, developing for Nepomuk is not easy: one has to read up on a lot of things and fight some day-to-day annoyances. But helping with the development will improve the situation either way.
     
    If you are interested in working on a task in this list, please contact [mailto:[email protected] Sebastian Trueg].
     
     
    === Low level Nepomuk Development Tasks  ===
     
    The low-level development tasks are those that are not directly reflected in the GUI or even in the API used by most developers. However, they are important in terms of performance, scalability, and compatibility.
     
    ==== Soprano Transaction Support  ====
     
    [http://soprano.sf.net/ Soprano] is the RDF database framework used in Nepomuk. Currently Soprano does not support transactions, i.e. sets of commands that can be rolled back as a unit. An [http://websvn.kde.org/branches/soprano/experimental experimental development] branch exists which already contains a new API for transaction support (while keeping binary compatibility).
     
    It still lacks an implementation of transaction support in the Soprano backends (Sesame2 and Virtuoso) and in the client/server architecture.
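To make the term concrete, the following Python sketch models a transaction as a queue of add/remove commands that is either applied in full on commit or discarded on rollback. `ToyStore` and its methods are hypothetical names chosen for illustration; they are not the API of the experimental Soprano branch, which is C++.

```python
class ToyStore:
    """Hypothetical in-memory quad store with all-or-nothing transactions."""

    def __init__(self):
        self.statements = set()  # committed (s, p, o, g) quads
        self._pending = None     # queued ("add" | "remove", quad) while a transaction is open

    def begin(self):
        if self._pending is not None:
            raise RuntimeError("a transaction is already open")
        self._pending = []

    def _apply(self, op, quad):
        if op == "add":
            self.statements.add(quad)
        else:
            self.statements.discard(quad)

    def _do(self, op, quad):
        # Outside a transaction, commands take effect immediately.
        if self._pending is None:
            self._apply(op, quad)
        else:
            self._pending.append((op, quad))

    def add(self, quad):
        self._do("add", quad)

    def remove(self, quad):
        self._do("remove", quad)

    def commit(self):
        if self._pending is None:
            raise RuntimeError("no open transaction")
        # Apply every queued command in order; none of them was visible before.
        for op, quad in self._pending:
            self._apply(op, quad)
        self._pending = None

    def rollback(self):
        if self._pending is None:
            raise RuntimeError("no open transaction")
        # Drop the queued commands; the committed state was never touched.
        self._pending = None
```

A real backend would additionally have to let an open transaction see its own uncommitted changes and isolate it from other clients; this sketch ignores both.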
     
    ==== Multi-threaded Storage service  ====
     
    At the moment the [[Development/Tutorials/Metadata/Nepomuk/StorageService|Nepomuk storage]] service is single-threaded. This slows down the system when more than one application tries to access data. Making the [http://soprano.sourceforge.net/apidox/trunk/classSoprano_1_1Server_1_1ServerCore.html Soprano server] implementation (which the Nepomuk service is based on) multi-threaded should not be too hard, given that the storage backend (Sesame2) is already thread-safe.
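The idea of a multi-threaded server core can be sketched in Python: instead of processing client requests one after another, each request is handed to a pool of worker threads. `ToyServer` and `ThreadSafeBackend` are hypothetical stand-ins (the lock-guarded dictionary plays the role of a thread-safe backend such as Sesame2); nothing here mirrors the actual `Soprano::Server::ServerCore` code.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ThreadSafeBackend:
    """Stand-in for a thread-safe storage backend."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def set(self, key, value):
        with self._lock:
            self._data[key] = value

    def get(self, key):
        with self._lock:
            return self._data.get(key)


class ToyServer:
    """Hypothetical server core that handles each request on a worker thread."""

    def __init__(self, backend, workers=4):
        self.backend = backend
        self.pool = ThreadPoolExecutor(max_workers=workers)

    def handle(self, request):
        # A request is either ("set", key, value) or ("get", key).
        if request[0] == "set":
            self.backend.set(request[1], request[2])
            return None
        return self.backend.get(request[1])

    def submit(self, request):
        # Return a Future right away instead of blocking the other clients.
        return self.pool.submit(self.handle, request)

    def shutdown(self):
        self.pool.shutdown(wait=True)
```

Because of Python's global interpreter lock, this shows the structure (one slow request no longer blocks the queue) rather than a CPU speed-up.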
     
     
    === General Nepomuk  ===
     
    ==== Catching all file moves  ====
     
    '''Work already begun by Sebastian Trueg - help always welcome'''
     
    Nepomuk uses an RDF database for all data, including file metadata. Files are referenced by URL. The problem with this is that when a file is moved or renamed, we have to detect this and update the metadata accordingly (update the URL in the database).
     
    For KIO this is fairly simple since we have {{class|KDirNotify}}. The [http://techbase.kde.org/Development/Tutorials/Metadata/Nepomuk/FileWatchService Nepomuk filewatch service] takes care of this and updates the metadata whenever KIO moves or deletes a file.
     
    However, if a file is moved by a non-KDE application (typical example: the shell via the mv command) the filewatch service does not notice it and the file -> metadata link is gone. This is a bad situation which sadly cannot be solved easily. Systems like inotify are too restricted.
     
    Thus, while a more powerful replacement for inotify would be great, in the meantime we should work with what we have.
     
    The idea is to create a [http://api.kde.org/4.x-api/kdelibs-apidocs/nepomuk/html/classNepomuk_1_1Service.html Nepomuk service] that tries very hard to find file moves. It would regularly check the database for dangling metadata and then try to find the file using all kinds of evidence:
     
    *file name matching
    *xattrs if available (this would mean that Nepomuk::Resource also needs to set the xattrs at some point)
    *checksums, maybe the checksum of the first N bytes or something like that to speed the process up
    *compare metadata extracted by Strigi
    *etc.
     
    All this information should be used to generate a score which indicates the certainty of the file matching. Then the final decision would have to be made by the user.
     
    '''''Hints:'''''
     
    *Try to detect if a complete folder has been moved (or deleted) and do not ask the user for every single file.
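A rough Python sketch of such a scoring step follows, assuming the database kept a small record per file (name, size, and a SHA-1 of the first N bytes). The record layout, the weights and the 64 KiB default for N are all hypothetical placeholders. `folder_moves` illustrates the hint above by collapsing per-file matches into a single folder move.

```python
import hashlib
import os
from collections import defaultdict


def dangling(paths):
    """Paths the database still references but which no longer exist on disk."""
    return [p for p in paths if not os.path.exists(p)]


def head_checksum(path, n=64 * 1024):
    """SHA-1 of the first n bytes only: cheap to compute even for large files."""
    with open(path, "rb") as f:
        return hashlib.sha1(f.read(n)).hexdigest()


def match_score(old, candidate):
    """Certainty in [0, 1] that `candidate` is the file `old` describes.

    `old` is a hypothetical record of what the database remembered about the
    dangling file: {"name": ..., "size": ..., "head_sha1": ...}.
    The weights are arbitrary placeholders, not tuned values.
    """
    score = 0.0
    if os.path.basename(candidate) == old["name"]:
        score += 0.3
    if os.path.getsize(candidate) == old["size"]:
        score += 0.2
    if head_checksum(candidate) == old["head_sha1"]:
        score += 0.5
    return score


def rank_candidates(old, candidates):
    """Best guesses first; the final decision is still left to the user."""
    return sorted(((match_score(old, c), c) for c in candidates), reverse=True)


def folder_moves(matches):
    """Collapse per-file matches {old_path: new_path} into folder moves.

    A folder whose matched files all landed in one new folder is reported once,
    so the user is asked about the folder instead of about every file.
    """
    targets = defaultdict(set)
    for old, new in matches.items():
        targets[os.path.dirname(old)].add(os.path.dirname(new))
    return {d: next(iter(t)) for d, t in targets.items() if len(t) == 1}
```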
     
    ==== Handling of external storage  ====
     
    Removable storage devices are a typical problem for the way Nepomuk handles files and file metadata. They can be mounted at different paths on different systems, but one still wants to keep the metadata stored in Nepomuk. If possible, one would even want to be able to search for files saved on a USB stick when it is not plugged in.
     
    The [http://trueg.wordpress.com/2009/04/15/portable-meta-information-yet-again-only-this-time-there-is-code/ blog entry about removable storage in Nepomuk] already discusses this problem and shows some existing code in KDE's [http://websvn.kde.org/trunk/playground/base/removablestorageservice/ playground] which tries to tackle this problem.
     
    However, one actually needs more. The system would have to be embedded into KIO to make sure the metadata cache on the removable storage device is always up-to-date. Also it is directly related to the problem of relative vs. absolute file URLs.
     
    ==== Relative vs. Absolute File URLs  ====
     
    Currently Nepomuk uses the absolute file URLs as URI identifiers for the resources representing the files in the Nepomuk RDF store. The file ''~/test.png'' for example has the resource URI ''file:///home/<username>/test.png''. This is nice in many situations since one can simply use the file URL to query file metadata but on the other hand we need to change a lot of triples whenever the file is moved (not to mention the removable storage problem above).
     
    Thus, the idea is to use random URI identifiers for new file resources and store the file path relative to the mount point. This would solve the above problem with removable devices and make updates after file moves simpler (only update the path).
     
    This problem should probably be tackled by introducing a class Nepomuk::File as a subclass of ''[http://api.kde.org/4.x-api/kdelibs-apidocs/nepomuk/html/classNepomuk_1_1Resource.html Nepomuk::Resource]'' which handles all the file-specific details, such as making sure we have a correct nao:filePath property (currently all of that is done with an ''if'' clause in ''Nepomuk::Resource'').
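A minimal Python sketch of this scheme, assuming each file resource stores a device identifier (for example a filesystem UUID) plus a path relative to that device's mount point. `FileResource` and the `nepomuk:/res/` URI prefix are illustrative assumptions, not the `Nepomuk::File` class proposed above. It also shows how the removable-storage problem eases: when the device is not mounted, the resource and its metadata remain and only the file URL cannot be resolved.

```python
import posixpath
import uuid
from dataclasses import dataclass, field


@dataclass
class FileResource:
    """Hypothetical file resource: random URI + device id + mount-relative path."""
    device_id: str  # assumed stable across mount points, e.g. a filesystem UUID
    rel_path: str   # path relative to the device's mount point
    uri: str = field(default_factory=lambda: "nepomuk:/res/" + str(uuid.uuid4()))

    def url(self, mounts):
        """Resolve to an absolute file URL, or None if the device is not mounted.

        `mounts` maps device_id -> current mount point.
        """
        mount = mounts.get(self.device_id)
        if mount is None:
            return None  # metadata stays searchable; the file is just unreachable
        return "file://" + posixpath.join(mount, self.rel_path)

    def move(self, new_rel_path):
        # A move rewrites one path; the URI, and every triple using it, is untouched.
        self.rel_path = new_rel_path
```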
     
    ==== Nepomuk Backup Service  ====
     
    We need a backup solution. The idea is the typical one: a Nepomuk service that allows specifying update intervals and triggering manual updates.
     
    The service should ignore all data extracted by Strigi, i.e. data that can be recreated deterministically. This can easily be determined by checking the context/named graph the data statements are stored in. Strigi stores all extracted data in one context which is marked as the ''http://www.strigi.org/fields#indexGraphFor'' for the file in question. Thus, a query along the lines of the following would work:
    <pre>PREFIX strigi: <http://www.strigi.org/fields#>
    select ?s ?p ?o ?g where {
        graph ?g { ?s ?p ?o . } .
        OPTIONAL { ?g strigi:indexGraphFor ?x . } .
        FILTER(!BOUND(?x)) .
    }</pre>
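The same filter can be illustrated in Python over an in-memory list of (subject, predicate, object, graph) quads. This is only a sketch of the query's semantics under that assumed data layout, not code from any Nepomuk service.

```python
INDEX_GRAPH_FOR = "http://www.strigi.org/fields#indexGraphFor"


def backup_statements(quads):
    """Keep the (s, p, o, g) quads whose graph g is not a Strigi index graph.

    Mirrors the OPTIONAL / !BOUND pattern of the query: a graph counts as an
    index graph if some statement says `g strigi:indexGraphFor ?x`.
    """
    index_graphs = {s for (s, p, o, g) in quads if p == INDEX_GRAPH_FOR}
    return [q for q in quads if q[3] not in index_graphs]
```

As in the query, the `indexGraphFor` statements themselves are kept unless their own graph is an index graph.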
     
    Other features could include replacing the home directory with a placeholder, as KConfig does for path entries. This way the data could be re-imported into another user account.
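A sketch of that substitution in Python, assuming paths are stored as absolute POSIX paths and using `$HOME` as the placeholder (the string KConfig writes for path entries under the home directory). `export_path` and `import_path` are hypothetical helper names.

```python
HOME_PLACEHOLDER = "$HOME"


def export_path(path, home):
    """Replace the home-directory prefix with the placeholder before backup."""
    home = home.rstrip("/")
    # Match whole path components only, so /home/alicia is not treated as /home/alice.
    if path == home or path.startswith(home + "/"):
        return HOME_PLACEHOLDER + path[len(home):]
    return path


def import_path(path, home):
    """Expand the placeholder to the (possibly different) importing user's home."""
    if path == HOME_PLACEHOLDER or path.startswith(HOME_PLACEHOLDER + "/"):
        return home.rstrip("/") + path[len(HOME_PLACEHOLDER):]
    return path
```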
     
     
    === GUI  ===
     
    ==== Better tagging widget  ====
     
    Currently there is a [http://api.kde.org/4.x-api/kdelibs-apidocs/nepomuk/html/classNepomuk_1_1TagWidget.html tagging widget in kdelibs] which is pretty ugly and not even used. Then there is the tag cloud used in Dolphin. The latter was already criticized for not being appropriate in that situation.
     
    Thus, it would be great to make [http://api.kde.org/4.x-api/kdelibs-apidocs/nepomuk/html/classNepomuk_1_1TagWidget.html Nepomuk::TagWidget] a nice and usable (maybe talk to the usability people) widget that can then be used in Dolphin, Gwenview, and pretty much any application that wants to tag resources.
     
    == Ideas ==


    There are many ideas on how to improve the Nepomuk system or on how to use it. This is the place to list them all.
    The documentation of any project is always in progress as the code base is always evolving. If you feel that the documentation is lacking in some regard, please come talk to us. We'd love to hear your feedback, and the documentation might just get improved in the process.


    Feel free to add your own ideas. Please leave your name in case someone wants to contact you for details or a discussion of the idea.
    '''Nepomuk Mailing List: ''' nepomuk@kde.org <br/>
    '''IRC Channel:''' #nepomuk-kde on freenode


    === Remember download locations ===
    === Introductory Material ===


    As [http://www.kdedevelopers.org/node/3843 blogged before] remembering the download location and the referrer web page is a pretty good idea. The most pressing problem at the moment is finishing [http://dev.nepomuk.semanticdesktop.org/wiki/NdoOntology the download ontology].
    If you're just getting started with '''Nepomuk''' and want a quick way to fetch some data, start here.


    Giving the user the option to tag either in the download dialog and/or in the kuiserver download notification would make it easier to tag instantly, rather than waiting for the download to finish and then remembering to come back and tag the file. When bookmarking in Firefox, FF suggests some tags (fairly accurately, too!) which the user can delete or add to. I suggest the same for downloaded files. This way, the web page's meta tags and title can be used for suggestions if the user doesn't feel like tagging, doesn't know about it, or just to speed the process up.
    * [[Special:myLanguage/Projects/Nepomuk/QuickStart| Quick Start]]
    * [[Special:myLanguage/Projects/Nepomuk/OntologyBasics| Basic Ontology concepts]]
    * [[Special:myLanguage/Projects/Nepomuk/Uris| Questions about URIs]]


    === Use Nepomuk in the KDE Menu ===
    === Managing Data ===
    One could think of using Nepomuk search in the KMenu to look for applications or even files or persons.


    === Remember Usage of movie/sound files ===
    This section includes more in-depth articles on how to manage the data in '''Nepomuk'''. As a starting point you should probably open up the [http://api.kde.org/4.x-api/kdelibs-apidocs/nepomuk-core/html/index.html Nepomuk API Documentation]. It is generally more up to date than the articles mentioned below.
    Media players such as Dragonplayer or Amarok could remember when movie/sound files have been watched/listened to. The last time is interesting, but a full history might be as well.


    In any case, it would allow quick access to unwatched episodes.
    * [[Special:myLanguage/Projects/Nepomuk/Resources| Using Resources]]
    * [[Special:myLanguage/Projects/Nepomuk/ResourceWatcher| Monitoring Changes]]
    * [[Special:myLanguage/Projects/Nepomuk/BulkChanges| Bulk Changes]]
    * [[Special:myLanguage/Projects/Nepomuk/DataFeeders| Data Feeders]]


    === Tool to gather annotation statistics about selected files ===
    === File Indexing ===
    Quoting [http://trueg.wordpress.com/2009/05/21/your-our-nepomuk-ideas/#comment-303 blog comment] as an example: ''"I am using Nepomuk to tag/rate schoolwork from my students. For every paper/file I tag it with seen/unseen and rate it with the actual grade I want to give (0-5). When I have seen them all, I collect the results into a spreadsheet. It would make my life (even) easier if, by selecting a bunch of file I could have a summary (one I could save in some text form) of all ratings/tags for each file in the selection."''


    One could think of an action in Dolphin (for a first prototype this is always a good idea) which triggers a collection of all metadata that is then laid out according to the user's wishes: HTML, plain text, ODT, whatever.
    With 4.10, the file indexing architecture has changed substantially. We no longer rely on Strigi, and have our own plugin-based interface.


    === Add support for qualified links/relations ===
    * [[Special:myLanguage/Projects/Nepomuk/IndexingPlugin| Writing an Indexing Plugin]]


    This is a somewhat low-level idea with no visible results as long as applications don't use it, but I think that having it implemented would allow for some nice possibilities.
    === Querying ===


    Basically, the goal is to have some generic way of attaching a "quality" to any “thing -- property” assignment, in order to cope with the varying credibility/certainty of different meta-data collection methods such as user input, heuristic algorithms, circumstantial guesses, etc. in a transparent and unified way.
    As you advance into '''Nepomuk''', you'll want to move beyond just fetching and pushing data and will want to query '''Nepomuk''' for specialized data. One can query '''Nepomuk''' in many different ways; the important part is to optimize your queries and make sure they run well on production systems, where the databases may be very large.


    This would among other things allow implementing many automatic-data-collection ideas like NLP-support in a more user-friendly (that is: non-intrusive) fashion.
    * [[Special:myLanguage/Projects/Nepomuk/QueryingMethods| Different ways to Query Nepomuk]]
    * [[Special:myLanguage/Projects/Nepomuk/QueryLibrary| Nepomuk Query Library]]
    * [[Special:myLanguage/Projects/Nepomuk/SparqlQueries| Sparql Queries]]


    For more details & discussion see [[Projects/Nepomuk/Qualified Relations Idea]].
    === Architectural Overview ===


    === Folder Cloud in Dolphin/KDirOperator ===
    If you're looking to get more involved in the '''Nepomuk''' development process, you will probably need to figure out our basic architecture and where you can find all the relevant code.
    Using information from the Nepomuk DB about usage frequency of folders (or the files within) it would be nice to have the folders be presented in a cloud. More often used or more important folders would appear bigger.


    This is a nice idea originally posted on [http://www.kde-look.org/content/show.php/Folder+Cloud?content=101521 kde-look.org].
    * [[Special:myLanguage/Projects/Nepomuk/Repositories| Nepomuk Repositories]]
    * [[Special:myLanguage/Projects/Nepomuk/ComponentOverview| Nepomuk Architectural Overview]]
    * [[Special:myLanguage/Projects/Nepomuk/kioslaves| Nepomuk KIO Slaves]]


    === Standalone Search Application ===
    === Nepomuk Internals ===
    Create a standalone search application using Nepomuk. Currently, the KDE desktop does not have a dedicated application that will search the user's entire hard drive for a file. One application that implements this well is Beagle in GNOME.


    This standalone application would provide the full search.  This application could also be the place for implementing the tagging idea above.
    For when you decide to dig even deeper.


    In [http://websvn.kde.org/trunk/playground/base/nepomuk-kde/knepsearchclient/ playground] we already have a simple search client. It could be the basis for this tool. The existing client already shows the results in categories, which could be improved further. It also uses a first attempt at creating a generic resource presentation framework, i.e. a way to handle drawing of arbitrary resource types through plugins and even a rule system.
    * [[Special:myLanguage/Projects/Nepomuk/GraphConcepts| Graph handling]]
    * [[Special:myLanguage/Projects/Nepomuk/VirtuosoInternal| Virtuoso Internals]]
    * [[Special:myLanguage/Projects/Nepomuk/OntologyExtention| Extending the Ontologies]]


    === Nepomuk based backup system ===
    === Miscellaneous ===
    Nepomuk has a huge potential for an intelligent backup system. The point here is that Nepomuk could "know" that a certain file on a certain device is the backup of a local file. Then, when the device is available it would trigger an automatic update of the backup.


    The user could, for example, just tag a folder with "Backup" (better use a dedicated ontology) and the system would ask where to back it up and perform all the necessary tasks. Backup history and recovery could then be done inside the Nepomuk resource. The key point here is really the fact that the system would "know" what a backup is, recognize one when it sees it and know what to do with it.
    * [[Special:myLanguage/Projects/Nepomuk/Nepomuk2Port| Porting to Nepomuk2]]
    * [[Special:myLanguage/Projects/Nepomuk/ManagingNepomukProcesses| Managing Nepomuk Processes]]
    * [[Special:myLanguage/Projects/Nepomuk/TestEnvironment| Nepomuk Test Environment]]
    * [[Special:myLanguage/Development/Tutorials/Metadata/Nepomuk/TipsAndTricks| Nepomuk Tips and Tricks]]
    * [[Special:myLanguage/Projects/Nepomuk/NepomukShow| Debugging Nepomuk Data]]


    === Nepomuk based versioning system ===
    ==== Outdated links ====  
    This is partly related to the backup system. The user should be able to tag a version of a file, a bit like in svn. When opening a file, there could be a list of saved versions, in case the user wants to revert changes or something like that.


    This could even be combined with automatic detection. For example in email: somebody sends you the second version of a paper which is then saved. It would be great to remember that the new file is the second version of the first one. Then the system could warn if one opens the older version.
    The following links provide good reads for getting used to the '''Nepomuk''' system and its APIs. <br/>
    They are slightly outdated, but still have some useful material.
    * [[Special:myLanguage/Development/Tutorials/Metadata/Nepomuk|Development Tutorials]]
    * [[Special:myLanguage/Projects/Nepomuk/Ideas|Random Ideas]]
    * [[Special:myLanguage/Projects/Nepomuk/Qualified_Relations_Idea| Qualified Relations Idea]]
    * [[Special:myLanguage/Projects/Nepomuk/ScenarioExamples| Scenario Examples]]


    === FIXME: add your own ideas ===
    [[Category:Documentation]]

    Latest revision as of 06:44, 16 November 2023

    Warning
    The Nepomuk project no longer exists in modern day KDE software. From KDE Applications 4.13 onwards, the 'Baloo' file indexing and file search framework replaces Nepomuk. Read details on the changes for Applications 4.13 here.


    About Nepomuk

    Nepomuk serves as a cross-application semantic storage backend. It aims to collect data from various sources (file indexing, the web, applications, etc.) and link it all together to form a cohesive map of data.

    This page is dedicated to third-party documentation for Nepomuk. To learn more about Nepomuk from a user's point of view, head over to the Nepomuk page on UserBase. To learn more about the Nepomuk community and how to get involved in Nepomuk, head over to the Nepomuk Community Page.

    Documentation

    Any new project can be intimidating, and jumping right into the API Documentation can be scary. So we have prepared some articles which explain the different aspects of Nepomuk and even touch on some advanced features.

    The documentation of any project is always in progress as the code base is always evolving. If you feel that the documentation is lacking in some regard, please come talk to us. We'd love to hear your feedback, and the documentation might just get improved in the process.

    Nepomuk Mailing List: nepomuk@kde.org
    IRC Channel: #nepomuk-kde on freenode

    Introductory Material

    If you're just getting started with Nepomuk and want a quick way to fetch some data, start here.

    Managing Data

    This section includes more in-depth articles on how to manage the data in Nepomuk. As a starting point you should probably open up the Nepomuk API Documentation. It is generally more up to date than the articles mentioned below.

    File Indexing

    With 4.10, the file indexing architecture has changed substantially. We no longer rely on Strigi, and have our own plugin-based interface.

    Querying

    As you advance into Nepomuk, you'll want to move beyond just fetching and pushing data and will want to query Nepomuk for specialized data. One can query Nepomuk in many different ways; the important part is to optimize your queries and make sure they run well on production systems, where the databases may be very large.

    Architectural Overview

    If you're looking to get more involved in the Nepomuk development process, you will probably need to figure out our basic architecture and where you can find all the relevant code.

    Nepomuk Internals

    For when you decide to dig even deeper.

    Miscellaneous

    Outdated links

    The following links provide good reads for getting used to the Nepomuk system and its APIs. They are slightly outdated, but still have some useful material.