Projects/Nepomuk: Difference between revisions

    From KDE TechBase
    {{Template:I18n/Language Navigation Bar|Projects/Nepomuk}}
    <languages />
    <translate>


    <!--T:1-->
    [[Image:Nepomuk_logo_big.png|center|300px]]


    == About Nepomuk == <!--T:2-->


    This page is dedicated to Nepomuk development ideas, progress, experiments, and is a general starting point for new developers.
    <!--T:3-->
    '''Nepomuk''' serves as a cross application semantic storage backend. It aims at collecting data from various sources - file indexing, the web, applications, etc, and linking them all together to form a cohesive map of data.


    For general information about the Nepomuk project see the [http://nepomuk.kde.org/ dedicated Nepomuk homepage].
    <!--T:4-->
    This page is dedicated to third party documentation for '''Nepomuk'''. To know more about '''Nepomuk''' from a user's point of view, head over to the [http://userbase.kde.org/Special:myLanguage/Nepomuk Nepomuk page on UserBase]. Or to know more about the Nepomuk community and getting involved in '''Nepomuk''', head over to the [http://community.kde.org/Projects/Nepomuk Nepomuk Community Page].


    == Documentation == <!--T:5-->


    == Contact ==
    <!--T:6-->
    Any new project is intimidating and jumping right into the [http://api.kde.org/4.x-api/kdelibs-apidocs/nepomuk-core/html/index.html API Documentation] can be scary. So, we have prepared some articles which explain the different aspects of '''Nepomuk''' and even touch on some advanced features.


    The Nepomuk project is maintained by [mailto:trueg@kde.org Sebastian Trueg] of Mandriva.
    <!--T:7-->
    The documentation of any project is always in progress as the code base is always evolving. If you feel that the documentation is lacking in some regard, please come talk to us. We'd love to hear your feedback, and the documentation might just get improved in the process.


    The "official" IRC channel is '''#nepomuk-kde''' on freenode.
    <!--T:8-->
    '''Nepomuk Mailing List: ''' [email protected] <br/>
    '''IRC Channel:''' #nepomuk-kde on freenode


    All development questions should be discussed on the [https://mail.kde.org/mailman/listinfo/nepomuk Nepomuk mailing list].
    === Introductory Material === <!--T:9-->


    == Documentation ==
    <!--T:10-->
    If you're just getting started with '''Nepomuk''' and want a quick way to fetch some data, start here.


    The following links provide good reads for getting used to the Nepomuk system and its APIs.
    <!--T:11-->
    * [[Development/Tutorials/Metadata/Nepomuk|Development Tutorials]]
    * [[Special:myLanguage/Projects/Nepomuk/QuickStart| Quick Start]]
    *'''[[Development/Tutorials/Metadata/Nepomuk/TipsAndTricks|Nepomuk Tips and Tricks]]'''
    * [[Special:myLanguage/Projects/Nepomuk/OntologyBasics| Basic Ontology concepts]]
    * [http://api.kde.org/4.x-api/kdelibs-apidocs/nepomuk/html/index.html Nepomuk API Documentation]
    * [[Special:myLanguage/Projects/Nepomuk/Uris| Questions about URIs]]
    * [http://soprano.sourceforge.net/apidox/trunk/index.html Soprano (RDF storage) API]
    * [http://trueg.wordpress.com/2009/06/02/nepomuk-and-some-cmake-magic/ Using the Nepomuk Resource Code generator and the Soprano Ontology class generator in cmake]


    === Managing Data === <!--T:12-->


    As Nepomuk is highly dependent on the data in its RDF store and on the ontologies it uses, consider reading up on RDF and the Nepomuk ontologies:
    <!--T:13-->
    * [http://www.w3.org/TR/REC-rdf-syntax/ RDF Primer]
    This section includes more in-depth articles on how to manage the data in '''Nepomuk'''. As a starting point you should probably open up the [http://api.kde.org/4.x-api/kdelibs-apidocs/nepomuk-core/html/index.html Nepomuk API Documentation]. It is generally more up to date than the articles mentioned below.
    * [http://www.semanticdesktop.org/ontologies Nepomuk Ontologies]
    * [http://dev.nepomuk.semanticdesktop.org/wiki/OntologyMaintenance Experimental Nepomuk Ontologies and Ideas for new ones]


    == Events ==
    <!--T:14-->
    * [[Special:myLanguage/Projects/Nepomuk/Resources| Using Resources]]
    * [[Special:myLanguage/Projects/Nepomuk/ResourceWatcher| Monitoring Changes]]
    * [[Special:myLanguage/Projects/Nepomuk/BulkChanges| Bulk Changes]]
    * [[Special:myLanguage/Projects/Nepomuk/DataFeeders| Data Feeders]]


    [[Projects/Nepomuk/CodingSprint2009|June 19-21, 2009 - Coding Sprint 2009 Freiburg, Germany]]
    === File Indexing === <!--T:15-->


    [[Projects/Nepomuk/OpenSocialSemanticDesktopWorkshop2009|Open Social Semantic Desktop Workshop 2009 Freiburg, Germany]]
    <!--T:16-->
    With 4.10, the file indexing architecture has substantially changed. We no longer rely on strigi, and have our own plugin based interface.


    == ToDo  ==
    <!--T:17-->
    * [[Special:myLanguage/Projects/Nepomuk/IndexingPlugin| Writing an Indexing Plugin]]


    Nepomuk is a rather young project with a notorious shortage of developers. There are many tasks and subprojects to get one's hands dirty on. Unlike other projects such as Plasma, however, developing for Nepomuk is not easy: one has to read up on a lot of things and fight some day-to-day annoyances. Still, helping with the development will improve the situation in any case.
    === Querying === <!--T:18-->


    If you are interested in working on a task in this list, please contact [mailto:[email protected] Sebastian Trueg].  
    <!--T:19-->
    As you advance into '''Nepomuk''', you'll want to move beyond just fetching and pushing data and query '''Nepomuk''' for specialized data. One can query '''Nepomuk''' in many different ways; the important part is to optimize your queries and make sure they run well on production systems, where the database may be very large.


    === Junior Jobs ===
    <!--T:20-->
    If you want to get into Nepomuk development quickly by taking over a small task have a look at our [[Projects/Nepomuk/JuniorJobs|Junior Job page]].
    * [[Special:myLanguage/Projects/Nepomuk/QueryingMethods| Different ways to Query Nepomuk]]
    * [[Special:myLanguage/Projects/Nepomuk/QueryLibrary| Nepomuk Query Library]]
    * [[Special:myLanguage/Projects/Nepomuk/SparqlQueries| Sparql Queries]]


    === Low level Nepomuk Development Tasks  ===
    === Architectural Overview === <!--T:21-->


    The low-level development tasks are those that are not directly reflected in the GUI or even in the API used by most developers. However, they are important in terms of performance, scalability, and compatibility.  
    <!--T:22-->
    If you're looking to get more involved with the '''Nepomuk''' development process, you will probably need to understand our basic architecture and know where to find all the relevant code.


    <!--T:23-->
    * [[Special:myLanguage/Projects/Nepomuk/Repositories| Nepomuk Repositories]]
    * [[Special:myLanguage/Projects/Nepomuk/ComponentOverview| Nepomuk Architectural Overview]]
    * [[Special:myLanguage/Projects/Nepomuk/kioslaves| Nepomuk KIO Slaves]]


    ==== Add Inference Configuration to the Virtuoso Soprano Backend ====
    === Nepomuk Internals === <!--T:24-->


    Virtuoso 5 provides inference on rdfs:subClassOf and rdfs:subPropertyOf. These are the most important ones and for now all we need in Nepomuk.
    <!--T:25-->
    For when you decide to dig even deeper:


    The current implementation of the Virtuoso Soprano backend does not enable inference. We need a configuration option to do exactly that. It could happen along the lines of the [http://soprano.sourceforge.net/apidox/trunk/soprano_backend_virtuoso.html existing config options] or with the introduction of dedicated inference configuration options on the Soprano::Backend level.
    <!--T:26-->
    * [[Special:myLanguage/Projects/Nepomuk/GraphConcepts| Graph handling]]
    * [[Special:myLanguage/Projects/Nepomuk/VirtuosoInternal| Virtuoso Internals]]
    * [[Special:myLanguage/Projects/Nepomuk/OntologyExtention| Extending the Ontologies]]


    === Miscellaneous === <!--T:27-->


    <!--T:28-->
    * [[Special:myLanguage/Projects/Nepomuk/Nepomuk2Port| Porting to Nepomuk2]]
    * [[Special:myLanguage/Projects/Nepomuk/ManagingNepomukProcesses| Managing Nepomuk Processes]]
    * [[Special:myLanguage/Projects/Nepomuk/TestEnvironment| Nepomuk Test Environment]]
    * [[Special:myLanguage/Development/Tutorials/Metadata/Nepomuk/TipsAndTricks| Nepomuk Tips and Tricks]]
    * [[Special:myLanguage/Projects/Nepomuk/NepomukShow| Debugging Nepomuk Data]]


    ==== Outdated links ==== <!--T:29-->


    ==== Soprano Transaction Support  ====
    <!--T:30-->
    The following links provide good reads for getting used to the '''Nepomuk''' system and its APIs. <br/>
    They are slightly outdated, but still have some useful material.
    * [[Special:myLanguage/Development/Tutorials/Metadata/Nepomuk|Development Tutorials]]
    * [[Special:myLanguage/Projects/Nepomuk/Ideas|Random Ideas]]
    * [[Special:myLanguage/Projects/Nepomuk/Qualified_Relations_Idea| Qualified Relations Idea]]
    * [[Special:myLanguage/Projects/Nepomuk/ScenarioExamples| Scenario Examples]]


    [http://soprano.sf.net/ Soprano] is the RDF database framework used in Nepomuk. Currently Soprano does not support transactions, i.e. sets of commands that can be rolled back. An [http://websvn.kde.org/branches/soprano/experimental experimental development] branch exists which already contains a new API for transaction support (while keeping binary compatibility).
    <!--T:31-->
     
    [[Category:Documentation]]
    An implementation of the transaction support in the Soprano backends (Sesame2 and Virtuoso) and in the client/server architecture is still missing.
    </translate>
     
    Another idea is to create a new API based on the design that Sesame2 follows: Repository and RepositoryConnection classes. The former creates instances of the latter which then has all the actual data handling methods and acts as one transaction object.
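The Repository/RepositoryConnection design described above could look roughly like the following Python sketch. This is not Soprano or Sesame2 API: all class and method names (<code>Repository</code>, <code>RepositoryConnection</code>, <code>add_statement</code>, <code>commit</code>, <code>rollback</code>) are hypothetical, and statements are plain tuples kept in memory.

```python
from typing import Set, Tuple

Statement = Tuple[str, str, str]  # (subject, predicate, object)


class Repository:
    """Owns the committed statements and hands out connections."""

    def __init__(self) -> None:
        self._statements: Set[Statement] = set()

    def connection(self) -> "RepositoryConnection":
        return RepositoryConnection(self)

    def statements(self) -> Set[Statement]:
        # Return a copy so callers cannot bypass the transaction object.
        return set(self._statements)


class RepositoryConnection:
    """Acts as one transaction: changes are buffered until commit()."""

    def __init__(self, repository: Repository) -> None:
        self._repository = repository
        self._added: Set[Statement] = set()
        self._removed: Set[Statement] = set()

    def add_statement(self, statement: Statement) -> None:
        self._removed.discard(statement)
        self._added.add(statement)

    def remove_statement(self, statement: Statement) -> None:
        self._added.discard(statement)
        self._removed.add(statement)

    def commit(self) -> None:
        # Apply the buffered changes in place, then clear the buffers.
        store = self._repository._statements
        store -= self._removed
        store |= self._added
        self.rollback()

    def rollback(self) -> None:
        # Discard everything that has not been committed yet.
        self._added.clear()
        self._removed.clear()
```

Because every change goes through the connection object, rolling back is just a matter of dropping its buffers; a real backend would have to map this onto whatever transaction mechanism its storage provides.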
     
     
    === General Nepomuk  ===
     
    ==== Handling of external storage  ====
     
    '''We already have the removablestorage service in kdebase which handles USB keys and such to a degree.'''
     
    Removable storage devices are a typical problem for the way Nepomuk handles files and file metadata. They can be mounted at different paths on different systems, yet one still wants to keep the metadata stored in Nepomuk. If possible, one would even want to be able to search for files saved on a USB stick when it is not plugged in.
     
    The [http://trueg.wordpress.com/2009/04/15/portable-meta-information-yet-again-only-this-time-there-is-code/ blog entry about removable storage in Nepomuk] already discusses this problem and shows some existing code in KDE's [http://websvn.kde.org/trunk/playground/base/removablestorageservice/ playground] which tries to tackle this problem.
     
    However, one actually needs more. The system would have to be embedded into KIO to make sure the metadata cache on the removable storage device is always up-to-date. Also it is directly related to the problem of relative vs. absolute file URLs.
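The relative vs. absolute URL problem can be illustrated with a small Python sketch. It assumes that a device can be identified by some stable ID such as a filesystem UUID, and it uses a made-up <code>removable://</code> URL scheme; neither the scheme nor the function names come from the removablestorage service.

```python
import posixpath
from typing import Dict, Optional

PREFIX = "removable://"  # made-up scheme, used only for this illustration


def to_portable_url(path: str, mount_point: str, device_uuid: str) -> str:
    """Turn an absolute path on a mounted device into a mount-independent URL."""
    relative = posixpath.relpath(path, mount_point)
    if relative == ".." or relative.startswith("../"):
        raise ValueError(f"{path} is not below {mount_point}")
    return f"{PREFIX}{device_uuid}/{relative}"


def to_local_path(url: str, mounts: Dict[str, str]) -> Optional[str]:
    """Resolve a portable URL; returns None while the device is unplugged."""
    if not url.startswith(PREFIX):
        raise ValueError(f"not a portable URL: {url}")
    device_uuid, _, relative = url[len(PREFIX):].partition("/")
    mount_point = mounts.get(device_uuid)
    if mount_point is None:
        return None
    return posixpath.join(mount_point, relative)
```

Metadata stored against the portable URL stays valid wherever the device is mounted, and a search can still return the resource when <code>to_local_path</code> yields <code>None</code>.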
     
     
    ==== Nepomuk Backup Service  ====
     
    Implementation details are discussed in [[Projects/Nepomuk/MetadataSharing]]
     
    We need a backup solution. The idea is the typical one: have a Nepomuk service that lets the user specify update intervals and trigger manual updates.
     
    The service should ignore all data extracted by Strigi, i.e. data that can be recreated deterministically. This can easily be determined by checking the context/named graph the data statements are stored in. Strigi stores all extracted data in one context which is marked as the ''http://www.strigi.org/fields#indexGraphFor'' for the file in question. Thus, a query along the lines of the following would work:
    <pre>PREFIX strigi: <http://www.strigi.org/fields#>

    # Select every statement whose graph is not a Strigi index graph
    select ?s ?p ?o ?g where {
        graph ?g { ?s ?p ?o . } .
        OPTIONAL { ?g strigi:indexGraphFor ?x . } .
        FILTER(!BOUND(?x)) .
    }</pre>
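The effect of the query above can be illustrated without a SPARQL endpoint. The following Python sketch applies the same filter to in-memory quads; the function name and the tuple layout are invented for this example.

```python
from typing import Iterable, List, Tuple

Quad = Tuple[str, str, str, str]  # (subject, predicate, object, graph)

INDEX_GRAPH_FOR = "http://www.strigi.org/fields#indexGraphFor"


def statements_to_back_up(quads: Iterable[Quad]) -> List[Quad]:
    """Keep only statements stored outside Strigi index graphs."""
    quads = list(quads)
    # A graph is an index graph if it is the subject of an indexGraphFor statement.
    index_graphs = {s for (s, p, _o, _g) in quads if p == INDEX_GRAPH_FOR}
    return [quad for quad in quads if quad[3] not in index_graphs]
```

Note that the statement marking a graph as an index graph is itself kept, since it lives in a different graph; a backup service might want to drop those as well.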
     
    Other features could include replacement of the home directory like it is done in KConfig. This way the data could be re-imported in another user account.
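A minimal Python sketch of such a home directory replacement follows. The <code>$HOME</code> placeholder and both function names are assumptions made for this example, not an existing Nepomuk or KConfig interface.

```python
PLACEHOLDER = "file://$HOME"  # assumed placeholder, chosen for illustration


def abstract_home(url: str, home: str) -> str:
    """Replace the home directory prefix of a file URL with a placeholder."""
    prefix = "file://" + home.rstrip("/")
    if url == prefix or url.startswith(prefix + "/"):
        return PLACEHOLDER + url[len(prefix):]
    return url


def restore_home(url: str, home: str) -> str:
    """Inverse of abstract_home(), using the importing account's home."""
    if url == PLACEHOLDER or url.startswith(PLACEHOLDER + "/"):
        return "file://" + home.rstrip("/") + url[len(PLACEHOLDER):]
    return url
```

The backup would store the abstracted URLs, and the import step would call <code>restore_home</code> with the home directory of the target account.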
     
     
    ==== Nepomuk Toolbox ====
    Provide a GUI that allows calling methods such as ''optimize'' and ''rebuildIndex'' on the storage service. The latter method is not committed yet due to the KDE 4.3 feature freeze but will be afterwards.
     
    It would also be useful to have Nepomuk register such operations (including the data conversion when changing backends) via the notification system.
     
    == Ideas ==
     
    There are many ideas on how to improve the Nepomuk system or on how to use it. This is the place to list them all.
     
    Feel free to add your own ideas. Please leave your name in case someone wants to contact you for details or a discussion of the idea.
     
     
     
    === Use Nepomuk in the KDE Menu ===
    One could think of using nepomuk search in the KMenu to look for applications or even files or persons.
     
    === Remember Usage of movie/sound files ===
    Media players such as Dragonplayer or Amarok could remember when movie/sound files have been watched/listened to. The last time is interesting but maybe also a history.
     
    In any case, this gives quick access to unwatched episodes.
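What a player would need to record is small. The following Python sketch keeps a play history per file in memory; the class and its methods are hypothetical, and a real implementation would store the timestamps in Nepomuk instead.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Optional


class PlaybackHistory:
    """Remembers when each media file was played."""

    def __init__(self) -> None:
        self._plays: Dict[str, List[datetime]] = {}

    def record(self, url: str, when: datetime) -> None:
        self._plays.setdefault(url, []).append(when)

    def last_played(self, url: str) -> Optional[datetime]:
        plays = self._plays.get(url)
        return max(plays) if plays else None

    def unwatched(self, urls: Iterable[str]) -> List[str]:
        # Preserve the caller's order, e.g. episode order.
        return [url for url in urls if url not in self._plays]
```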
     
    === Tool to gather annotation statistics about selected files ===
    Quoting [http://trueg.wordpress.com/2009/05/21/your-our-nepomuk-ideas/#comment-303 blog comment] as an example: ''"I am using Nepomuk to tag/rate schoolwork from my students. For every paper/file I tag it with seen/unseen and rate it with the actual grade I want to give (0-5). When I have seen them all, I collect the results into a spreadsheet. It would make my life (even) easier if, by selecting a bunch of file I could have a summary (one I could save in some text form) of all ratings/tags for each file in the selection."''
     
    One could think of an action in Dolphin (for a first prototype this is always a good idea) which triggers a collection of all metadata which is then laid out according to the user's wishes: html, plain text, odt, whatever.
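As an illustration of the plain text case, the following Python sketch renders ratings and tags for a selection of files as CSV, which a spreadsheet can import. The input structure and the function name are invented; fetching the actual ratings and tags from Nepomuk is left out.

```python
import csv
import io
from typing import Dict, List, Optional, Tuple

# file name -> (rating or None, list of tags); invented structure
Annotations = Dict[str, Tuple[Optional[int], List[str]]]


def summarize(annotations: Annotations) -> str:
    """Render one CSV row per file: name, rating, semicolon-joined tags."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["file", "rating", "tags"])
    for name in sorted(annotations):
        rating, tags = annotations[name]
        writer.writerow([name, "" if rating is None else rating, "; ".join(sorted(tags))])
    return buffer.getvalue()
```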
     
    === Add support for qualified links/relations ===
     
    This is a somewhat low-level idea with no visible results as long as applications don't use it, but I think that having it implemented would allow for some nice possibilities.
     
    Basically, the goal is to have some generic way of attaching a "quality" to any “thing -- property” assignment, in order to cope with the varying credibility/certainty of different meta-data collection methods such as user input, heuristic algorithms, circumstantial guesses, etc. in a transparent and unified way.
     
    This would among other things allow implementing many automatic-data-collection ideas like NLP-support in a more user-friendly (that is: non-intrusive) fashion.
     
    For more details & discussion see [[Projects/Nepomuk/Qualified Relations Idea]].
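To make the idea more concrete, here is a minimal Python sketch of a statement that carries a quality. The field names, the 0.0 to 1.0 confidence scale and the helper functions are assumptions made for this example, not part of any Nepomuk ontology.

```python
from typing import List, NamedTuple


class QualifiedStatement(NamedTuple):
    subject: str
    predicate: str
    obj: str
    confidence: float  # assumed scale: 0.0 (pure guess) to 1.0 (confirmed)
    source: str        # e.g. "user", "heuristic", "context"


def confirm(statement: QualifiedStatement) -> QualifiedStatement:
    """Turn a weak relation into a strong one after user confirmation."""
    return statement._replace(confidence=1.0, source="user")


def at_least(statements: List[QualifiedStatement], threshold: float) -> List[QualifiedStatement]:
    """Let an application decide how much uncertainty it tolerates."""
    return [s for s in statements if s.confidence >= threshold]
```

An application that only wants reliable data would call <code>at_least(statements, 1.0)</code>, while a suggestion GUI could show everything above a lower threshold and offer <code>confirm</code> as an action.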
     
    === Folder Cloud in Dolphin/KDirOperator ===
    Using information from the Nepomuk DB about usage frequency of folders (or the files within) it would be nice to have the folders be presented in a cloud. More often used or more important folders would appear bigger.
     
    This is a nice idea originally posted on [http://www.kde-look.org/content/show.php/Folder+Cloud?content=101521 kde-look.org].
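The sizing itself is simple. The following Python sketch maps usage counts to font sizes; the logarithmic scale and the point sizes are choices made for this example so that one heavily used folder does not dwarf all others.

```python
import math
from typing import Dict


def cloud_sizes(usage: Dict[str, int], min_pt: float = 8.0, max_pt: float = 24.0) -> Dict[str, float]:
    """Map usage counts to font sizes, using a log scale to damp outliers."""
    if not usage:
        return {}
    weights = {folder: math.log1p(max(count, 0)) for folder, count in usage.items()}
    low, high = min(weights.values()), max(weights.values())
    if high == low:
        # All folders equally used: give them the middle size.
        return {folder: (min_pt + max_pt) / 2 for folder in usage}
    scale = (max_pt - min_pt) / (high - low)
    return {folder: min_pt + (weight - low) * scale for folder, weight in weights.items()}
```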
     
     
     
    === Nepomuk based backup system ===
    Nepomuk has a huge potential for an intelligent backup system. The point here is that Nepomuk could "know" that a certain file on a certain device is the backup of a local file. Then, when the device is available it would trigger an automatic update of the backup.
     
    The user could, for example, just tag a folder with "Backup" (better use a dedicated ontology) and the system would ask where to back it up and perform all the necessary tasks. Backup history and recovery could then be done inside the Nepomuk resource. The key point here is really the fact that the system would "know" what a backup is, recognize one when it sees it and know what to do with it.
     
    === Nepomuk based versioning system ===
    This is partly related to the backup system. Users should be able to tag a version of a file, a bit like in svn. When opening a file, there could be a list of saved versions, in case the user wants to revert changes or something like that.
     
    This could even be combined with automatic detection. For example in email: somebody sends you the second version of a paper which is then saved. It would be great to remember that the new file is the second version of the first one. Then the system could warn if one opens the older version.
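The warning for older versions could work along these lines. The Python sketch below assumes that each file has at most one direct successor; the class and method names are hypothetical.

```python
from typing import Dict, Optional


class VersionIndex:
    """Tracks which file supersedes which (one successor per file)."""

    def __init__(self) -> None:
        self._newer: Dict[str, str] = {}

    def add_version(self, old: str, new: str) -> None:
        self._newer[old] = new

    def latest(self, path: str) -> str:
        seen = {path}
        while path in self._newer:
            path = self._newer[path]
            if path in seen:  # guard against accidental cycles
                break
            seen.add(path)
        return path

    def warning_for(self, path: str) -> Optional[str]:
        """Return a warning text if a newer version of the file is known."""
        newest = self.latest(path)
        if newest != path:
            return f"{path} is outdated; newest version is {newest}"
        return None
```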
     
    === File Boxes ===
    There is a nice idea about file boxes, which allow files to be grouped temporarily in order to perform some actions on them: http://bugs.kde.org/show_bug.cgi?id=200461
     
    This could be done using Nepomuk. I am not sure, however, if Nepomuk is really the correct choice here. Maybe a simple kded service and a KIO slave nicely integrated into Dolphin and the file dialog would be sufficient.
     
    === Categorize new files ===
    '''We already have the fileannotation service in [http://websvn.kde.org/trunk/playground/base/nepomuk-kde/fileannotationservice/ playground]. This is the basis for the idea below as it already implements parts of it.'''
     
    Let Strigi emit a D-Bus signal on new files (only after the initial indexing so we do not get signals for all files) that appear in typical document folders (so we do not get signals on temp and log files and the like).
     
    When a new file appears propose to relate it to the current Nepomuk context (the context service is in playground: a very simple one only maintaining one URI which is the current context and can be any resource. Typically, however, it would be a project or task or event).
     
    Also use the information extracted by Strigi (mostly nie:plainTextContent and nie:description and nie:name) to generate annotation suggestions through Scribo and propose them to the user, too. This could be done via notifications (in a first version) and later in a nicer plasma GUI.
     
    Here rating of the suggestions is important. Nepomuk::Annotation already provides a relevance() method but most plugins' relevance generation code is rather simple and could use improvement.
     
    '''''Hints:'''''
    * If the above "''Add support for qualified links/relations''" idea were implemented, the user would not have to be bothered to confirm the relation of the new file to the current context. You could just add it as a weak relation ('found by circumstantial evidence'). The user could always confirm it at a later point (e.g. in Dolphin) to turn it into a strong relation.
     
    === FIXME: add your own ideas ===
     
    ==== Linking between documents ====
    I usually deal with lots of scientific papers in the form of searchable pdf's.
    My idea is twofold:
    1) Have a 'Google scholar' type search that allows me to see the relations between papers and retrace an idea to its original author.
    2) Each paper refers to a few others that I might already have on disk. My idea is to have Okular integration such that when I click on a reference it opens the respective file based on author, date, journal, etc.
     
    ==== Push tag clouds on the web ====
     
    I'd like to save my tag cloud on the web so that when I change computers, I still have my tags. There should also be several projects which wikify tag cloud creation and which would serve updates via some kind of RSS feeds. Think of it as some sort of Digg for both online and offline desktop items.
     
    === Augment menu toolbars with semantic search ===
    Ideally, eliminate the use of menu toolbars, rather have a powerful semantic search to query for a given functionality/action etc.
     
    === Pure semantic desktop environment ===
    A minimalistic desktop environment based solely on semantic search, ideal for small screens (e.g. netbooks/smartbooks). All functionality is accessed via semantic search rather than the usual assortment of menus or application icon panels, which makes for a rather intuitive UI: the user 'talks' to the computer. In essence, this takes the beagle/krunner/gnome-go kind of idea to the next level. Combined with the replacement of toolbar menus, it makes efficient use of screen space with an uncluttered UI and shifts input more towards the keyboard, which can be beneficial on netbooks/laptops on the move (when you don't want to rely on a mouse).
     
    === Link Pictures to Persons ===
     
    Apple already did it the really fancy way: face recognition + linking of faces to people.
     
    We do not have face recognition yet but we can link pictures to persons. Akonadi pushes the contacts in the address book to Nepomuk as nco:PersonContact instances. We can simply use those to allow linking to images.
     
    In playground we already have peopletag which allows to define a region. It would make more sense to integrate it with Gwenview.
     
    The ontology to use is still an issue. After all we want to easily handle
     
    * The direct link between the image and the person
    * The region in the image
     
    Dolphin should be able to display the information, too, i.e. the name of the person, maybe even with a link to the address book.
     
     
    === Automatically annotate system files ===
     
    The example is wallpapers. Installed wallpapers could automatically be marked as being of type "Wallpaper". This would also require an ontology which includes the term Wallpaper based on PIMO.
     
    == Development status ==
     
    See [[Projects/Nepomuk/DevelopmentStatus]].

    Revision as of 17:07, 10 December 2012


    About Nepomuk

    Nepomuk serves as a cross application semantic storage backend. It aims at collecting data from various sources - file indexing, the web, applications, etc, and linking them all together to form a cohesive map of data.

    This page is dedicated to third party documentation for Nepomuk. To know more about Nepomuk from a user's point of view, head over to the Nepomuk page on UserBase. Or to know more about the Nepomuk community and getting involved in Nepomuk, head over to the Nepomuk Community Page.

    Documentation

    Any new project is intimidating and jumping right into the API Documentation can be scary. So, we have prepared some articles which explain the different aspects of Nepomuk and even touch on some advanced features.

    The documentation of any project is always in progress as the code base is always evolving. If you feel that the documentation is lacking in some regard, please come talk to us. We'd love to hear your feedback, and the documentation might just get improved in the process.

    Nepomuk Mailing List: [email protected]
    IRC Channel: #nepomuk-kde on freenode

    Introductory Material

    If you're just getting started with Nepomuk and want a quick way to fetch some data, start here.

    Managing Data

    This section includes more in-depth articles on how to manage the data in Nepomuk. As a starting point you should probably open up the Nepomuk API Documentation. It is generally more up to date than the articles mentioned below.

    File Indexing

    With 4.10, the file indexing architecture has substantially changed. We no longer rely on strigi, and have our own plugin based interface.

    Querying

    As you advance into Nepomuk, you'll want to move beyond just fetching and pushing data and query Nepomuk for specialized data. One can query Nepomuk in many different ways; the important part is to optimize your queries and make sure they run well on production systems, where the database may be very large.

    Architectural Overview

    If you're looking to get more involved with the Nepomuk development process, you will probably need to understand our basic architecture and know where to find all the relevant code.

    Nepomuk Internals

    For when you decide to dig even deeper:

    Miscellaneous

    Outdated links

    The following links provide good reads for getting used to the Nepomuk system and its APIs. They are slightly outdated, but still have some useful material.