| + | |||
| + | == Various Links == | ||
| + | |||
| + | * [http://www.scribo.ws Scribo project]: the Scribo project aims at implementing at least partly the features described by Case C in KDE, i.e. providing NLP tools for extracting metadata from text documents. The tools are meant to be configurable for analysing specific types of documents (such as letters, technical documents related to KDE). They rely on various analysis engines: [http://www.proxem.com Antelope by Proxem], CEA NLP tools, INRIA tools, OpenCalais, GATE engine etc. The specificity of the approach is that the user will be able to give feedback to the analysis engines for them to improve their heuristics. Scribo partners will take part in [http://2009.rmll.info/?lang=en RMLL 2009]. The roadmap of Scribo implementation in KDE is available on the [http://wiki.mandriva.com/en/Scribo_KDE_roadmap Mandriva Scribo page]. | ||
Seamlessly collecting all kinds of different meta-data about your files (and other things) in one central place makes Nepomuk a really powerful technology for organizing and accessing your documents. Especially the prospect of being able to rely on your computer to automatically remember any helpful meta-info (like download locations) sounds exciting.
When these possibilities are more fully exploited in the future, extremely complex information systems might arise – which is not a problem, as long as it all serves to allow applications to assist the user in managing his documents and other data more easily and efficiently. At some point, however, having tons of interesting but undifferentiated information about any specific item (e.g. a document) just “pile up” over time in “one big pot” might make it increasingly difficult for an application (say, desktop search) to really make full use of it.
Take, for a hypothetical example, a rich text document created by the user using his favorite Nepomuk-aware text processor.
So I propose that, in addition to having Nepomuk collect all kinds of meaningful properties about its items from different sources in accordance with its ontologies (like the “is a letter” property for a document), there should also be some unified way of “qualifying” these property-to-thing connections regarding their reliability, source, scope of validity and so on. So instead of merely saving unqualified connections like “x is y”, you might save things like:
and so on, using some kind of “qualifier”-ontology.
Of course, all this information could instead be stored using separate tags (or whatever you call them) for each case. In the above example you could have an “is letter” property for user annotation, a separate “is document-created-from-letter-template” property, and yet another “is document-deemed-a-letter-by-plugin_xyz” property. However, the information would then not be nearly as useful from the point of view of a client application, as it would have to know about the tag set by the word processor, and also about the existence of the post-processing plugin, in order to make use of it. If, on the other hand, all the Nepomuk entries logically describing or hinting at the being-a-letter status of the document were actually specified in terms of a single “is letter” property, but using generic qualifiers (see above), it would be easy for client applications to transparently make full use of this info in a meaningful way. Imagine, for example, a semantic desktop search application that, when asked for all letters written in the last 2 months, shows user-confirmed letters at the top, while showing documents with a low-credibility “is letter” property only when the user clicks a “show further possible results” link. It would show the same behaviour for all other qualifiable tags/properties, even if it really knows nothing about their meaning.
The more I think about what kind of complex semantic desktop features might be possible in the future, the more I feel this would really open up a lot of possibilities for bringing advanced semantic capabilities to the desktop that would be as helpful for the user as possible, while at the same time getting in the way of his/her actual work as little as possible.
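To make the search example a bit more concrete, here is a minimal Python sketch of how a client might rank results by qualifier alone. Everything in it is an assumption made for illustration: the `Credibility` levels, the `QualifiedStatement` fields and the `search` function are hypothetical and are not part of Nepomuk or of any existing qualifier ontology.

```python
from dataclasses import dataclass
from enum import IntEnum


class Credibility(IntEnum):
    """Hypothetical credibility levels; a real qualifier ontology may look different."""
    GUESSED = 1         # e.g. deemed a letter by a post-processing plugin
    DERIVED = 2         # e.g. created from a letter template
    USER_CONFIRMED = 3  # e.g. annotated by the user


@dataclass(frozen=True)
class QualifiedStatement:
    subject: str              # the item the property is attached to
    prop: str                 # the single shared property, e.g. "is letter"
    source: str               # who or what asserted it
    credibility: Credibility  # generic qualifier, independent of the property


def search(statements, prop, include_uncertain=False):
    """Return subjects carrying `prop`, most credible first.

    The function never inspects what `prop` means; it only looks at the
    generic qualifier, so it behaves the same for any qualifiable property.
    """
    best = {}
    for st in statements:
        if st.prop != prop:
            continue
        # Keep the most credible assertion per subject.
        if st.credibility > best.get(st.subject, 0):
            best[st.subject] = st.credibility
    threshold = Credibility.GUESSED if include_uncertain else Credibility.DERIVED
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return [subject for subject, cred in ranked if cred >= threshold]


statements = [
    QualifiedStatement("a.odt", "is letter", "user", Credibility.USER_CONFIRMED),
    QualifiedStatement("b.odt", "is letter", "letter template", Credibility.DERIVED),
    QualifiedStatement("c.odt", "is letter", "plugin_xyz", Credibility.GUESSED),
    QualifiedStatement("d.odt", "is invoice", "user", Credibility.USER_CONFIRMED),
]

default_results = search(statements, "is letter")
further_results = search(statements, "is letter", include_uncertain=True)
```

Here `include_uncertain=True` plays the role of the “show further possible results” link: the low-credibility document only appears once the user asks for it.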
--Sam 10:24, 25 May 2009 (UTC)
The following sources of meta-data collection would most likely benefit from this approach:
Client applications might then, without having to know about the meanings of the qualified properties:
(If you can think of further use cases, please add them to the list so that a developer thinking about designing/implementing this will consider all aspects and implications involved.)
Sebastian Trüg mentioned that the "named graphs" feature of Nepomuk, currently used for remembering when a user tagged something, might allow for this kind of optional "quality" to be stored with a relation.
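As a rough illustration of how named graphs could carry such a "quality", the following Python sketch models statements as plain quads (subject, predicate, object, graph) and attaches the qualifiers to the graph rather than to the statement itself. It does not use Nepomuk's real API; all identifiers (`ex:assertedBy`, `ex:credibility`, `graph:g1` and so on) and the timestamp are made up for the example.

```python
# Each statement is a quad: (subject, predicate, object, graph).
# All names and values below are hypothetical.
quads = [
    # The same "is a letter" statement, asserted in two different named graphs.
    ("file:letter.odt", "rdf:type", "ex:Letter", "graph:g1"),
    ("file:letter.odt", "rdf:type", "ex:Letter", "graph:g2"),
    # Metadata about the graphs themselves, kept in a separate graph.
    ("graph:g1", "ex:assertedBy", "ex:user", "graph:meta"),
    ("graph:g1", "ex:created", "2009-01-01T00:00:00Z", "graph:meta"),
    ("graph:g2", "ex:assertedBy", "ex:plugin_xyz", "graph:meta"),
    ("graph:g2", "ex:credibility", "low", "graph:meta"),
]


def qualifiers_for(quads, subject, predicate, obj):
    """For every graph containing the statement, collect that graph's metadata."""
    result = {}
    for s, p, o, g in quads:
        if (s, p, o) == (subject, predicate, obj):
            # Statements whose subject is the graph describe the graph itself.
            result[g] = {mp: mo for ms, mp, mo, _ in quads if ms == g}
    return result


found = qualifiers_for(quads, "file:letter.odt", "rdf:type", "ex:Letter")
```

The point of this layout is that the statement itself stays a single unqualified “is letter” relation, while the optional qualifiers live on the graph that contains it, so applications that ignore them keep working unchanged.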