Archive:Development/Tutorials/Metadata/Nepomuk/RDFIntroduction (zh CN): Difference between revisions

Revision as of 06:01, 8 March 2010

RDF and Ontologies in Nepomuk

Tutorial Series	Nepomuk
Previous
What's Next	Using the Nepomuk Resource Class Generator, Nepomuk Data Layout
Further Reading	Resource Handling with Nepomuk, Advanced Queries with SPARQL, Sebastian Trueg's Nepomuk blog

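The Resource Description Framework (RDF) and Ontologies in Nepomuk

This tutorial is based on Sebastian Trueg's blog post Nepomuk Appendix A - RDF for Dummies in a Nutshell (http://www.kdedevelopers.org/node/3276).

All ontologies discussed here are installed together with kdebase-runtime, so they are always present in the Nepomuk repository. In addition, their resource URIs are available through the Soprano::Vocabulary namespace (except for NIE, for which a vocabulary class can easily be generated with Soprano's onto2vocabularyclass tool).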

RDF - Resource Description Framework

RDF describes a way of storing data. While "classical" databases are based on tables, RDF data consists of triples and only triples. Each triple, called a statement, consists of

subject - predicate - object

The subject is a resource, the predicate is a relation, and the object is either another resource or a literal value. A literal can be a string, integer, double, or any other type defined by XML Schema, and it is even possible to define custom literal types. Thus RDF can represent statements such as "Mary - is mother of - Carl" or "Mary - was born on - 1970-02-23". These are statements about things, which makes RDF a good technology for metadata.
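As a rough illustration of the triple model, the following Python sketch stores the two example statements as plain tuples. The representation (strings for resources, native Python values for literals) and the helper name objects_of are assumptions made for this sketch only; they are not how Nepomuk or Soprano store data.

```python
from datetime import date

# Each statement is a (subject, predicate, object) tuple.
# Resources are plain strings here; literals are native Python values.
statements = [
    ("Mary", "is mother of", "Carl"),            # object is another resource
    ("Mary", "was born on", date(1970, 2, 23)),  # object is a literal value
]

def objects_of(subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in statements if s == subject and p == predicate]

print(objects_of("Mary", "is mother of"))
print(objects_of("Mary", "was born on"))
```

A real RDF store would distinguish resources from string literals explicitly; the sketch glosses over that to keep the triple shape visible.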

To reduce ambiguity, resources and relations need to be uniquely identified. In the statement above, for example, we need to identify one particular "Mary" and to distinguish the maternal relationship from the figurative one in "Baghdad - is mother of - all battles". Since RDF was born as a web technology, all resources and relations are identified by a URI (Uniform Resource Identifier). A URI typically consists of a namespace, often ending in #, followed by a name, and is usually abbreviated in the form foo:bar. A dataset in RDF is thus basically a graph in which resources are the nodes, predicates are the links, and literals act as leaves.
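The relationship between an abbreviation such as foo:bar and the full URI can be sketched in a few lines of Python. The prefix table and the function name expand are hypothetical; the foo: namespace is the made-up one used in this tutorial's examples, while the rdf: namespace is the standard one.

```python
# Prefix table: abbreviation -> namespace URI.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foo": "http://foo.bar/types#",  # example namespace, not a real vocabulary
}

def expand(qname):
    """Turn an abbreviated name such as 'foo:Mary' into its full URI."""
    prefix, _, local = qname.partition(":")
    return PREFIXES[prefix] + local

print(expand("foo:Mary"))
print(expand("rdf:type"))
```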

RDF defines one important default property: rdf:type, which assigns a type to a resource.

RDFS - RDF Schema

RDFS defines a set of resources and properties that extend RDF. This extension basically makes it possible to define ontologies. RDFS defines the two important classes rdfs:Resource and rdfs:Class, which introduce the distinction between instances and types. It also defines properties for building type hierarchies (rdfs:subClassOf and rdfs:subPropertyOf) and for specifying the details of a property (rdfs:domain and rdfs:range).

This makes it possible to create new classes and properties much like in object-oriented programming. For example:

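@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foo: <http://foo.bar/types#> .

# Classes: every woman is also a human
foo:Human rdf:type rdfs:Class .
foo:Woman rdf:type rdfs:Class .
foo:Woman rdfs:subClassOf foo:Human .

# Property: links a woman (domain) to a human (range)
foo:isMotherOf rdf:type rdf:Property .
foo:isMotherOf rdfs:domain foo:Woman .
foo:isMotherOf rdfs:range foo:Human .

# Instance data
foo:Mary rdf:type foo:Woman .
foo:Mary foo:isMotherOf foo:Carl .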

The example above is a simple illustration of how to define an ontology in RDFS (using the Turtle language). The last two important predicates in RDFS are rdfs:label and rdfs:comment, which define human-readable names and comments for any resource.
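To show what a type hierarchy buys us, here is a minimal Python sketch that follows rdfs:subClassOf links to work out every type of a resource. It is only an illustration: the set of tuples and the function names superclasses and types_of are assumptions of this sketch, and a real RDF store may or may not perform this inference for you.

```python
# A few statements from the RDFS example, written as (s, p, o) tuples.
triples = {
    ("foo:Woman", "rdfs:subClassOf", "foo:Human"),
    ("foo:Mary", "rdf:type", "foo:Woman"),
}

def superclasses(cls):
    """Collect cls and all classes reachable via rdfs:subClassOf."""
    found, todo = set(), [cls]
    while todo:
        current = todo.pop()
        if current in found:
            continue  # already visited; also guards against cycles
        found.add(current)
        todo.extend(o for s, p, o in triples
                    if s == current and p == "rdfs:subClassOf")
    return found

def types_of(resource):
    """Return the stated types of resource plus every inherited superclass."""
    result = set()
    for s, p, o in triples:
        if s == resource and p == "rdf:type":
            result |= superclasses(o)
    return result

print(sorted(types_of("foo:Mary")))
```

Because foo:Woman is a subclass of foo:Human, foo:Mary ends up with both types even though only one was stated.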

NRL: Nepomuk Representation Language

NRL was developed in Nepomuk to extend RDFS further. I will not explain everything about NRL in detail but will keep to what is currently important with respect to KDE.

Most importantly, NRL changes triples into quadruples, where the fourth "parameter" is another resource that identifies the graph in which the statement is stored. It may be empty, in which case the statement is stored in the "default graph". This graph (or context, as it is called in Soprano) is just another resource that groups a set of statements and makes it possible to "attach" information to that set. NRL defines a set of graph types, two of which are important here: nrl:InstanceBase and nrl:Ontology. The first defines graphs that contain instances, and the second, you guessed it, defines graphs that contain types and predicates.

To make this clearer, let's extend our example with NRL:

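@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix nrl: <http://semanticdesktop.org/ontologies/2007/08/15/nrl#> .
@prefix foo: <http://foo.bar/types#> .

# Turtle-like notation: a fourth element, where present, names the graph
foo:graph1 rdf:type nrl:Ontology .
foo:graph2 rdf:type nrl:InstanceBase .

foo:Human rdf:type rdfs:Class foo:graph1 .
foo:Woman rdf:type rdfs:Class foo:graph1 .
foo:Woman rdfs:subClassOf foo:Human foo:graph1 .

foo:isMotherOf rdf:type rdf:Property foo:graph1 .
foo:isMotherOf rdfs:domain foo:Woman foo:graph1 .
foo:isMotherOf rdfs:range foo:Human foo:graph1 .

foo:Mary rdf:type foo:Woman foo:graph2 .
foo:Mary foo:isMotherOf foo:Carl foo:graph2 .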
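The effect of the fourth element can be sketched in Python by storing 4-tuples and selecting statements by the type of their graph. This is a simplified model: using None for the default graph and the helper names graphs_of_type and statements_in are assumptions of this sketch, not part of NRL or Soprano.

```python
# Quadruples: (subject, predicate, object, graph); None means the default graph.
quads = [
    ("foo:graph1", "rdf:type", "nrl:Ontology", None),
    ("foo:graph2", "rdf:type", "nrl:InstanceBase", None),
    ("foo:Woman", "rdfs:subClassOf", "foo:Human", "foo:graph1"),
    ("foo:Mary", "rdf:type", "foo:Woman", "foo:graph2"),
    ("foo:Mary", "foo:isMotherOf", "foo:Carl", "foo:graph2"),
]

def graphs_of_type(graph_type):
    """Return the graphs declared to be of the given NRL graph type."""
    return {s for s, p, o, g in quads if p == "rdf:type" and o == graph_type}

def statements_in(graph_type):
    """Return the triples stored in any graph of the given type."""
    graphs = graphs_of_type(graph_type)
    return [(s, p, o) for s, p, o, g in quads if g in graphs]

print(statements_in("nrl:InstanceBase"))
print(statements_in("nrl:Ontology"))
```

Selecting by graph type separates instance data from ontology definitions without inspecting the statements themselves.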

But making a distinction between ontology and instance resources is not all we gain from contexts.

NAO: Nepomuk Annotation Ontology

NAO defines resource types and properties that you have already encountered in KDE, such as nao:Tag and nao:rating. It also defines nao:created, a property that assigns an xsd:dateTime literal to a resource, in our case a graph. This way we store information about when a piece of information was inserted into the Nepomuk repository.

foo:graph1 nao:created "2008-02-12T14:43.022Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> .

NIE: Nepomuk Information Element Ontology

The NIE ontologies describe desktop resources such as files, folders, emails, contacts, IM messages, and so on. They are used by file indexing systems like Strigi or Tracker to describe the extracted metadata.

NFO - The Nepomuk File Ontology describes file metadata
NCO - The Nepomuk Contact Ontology describes address book entries
NMO - The Nepomuk Message Ontology describes messages of all kinds, including emails and IM
NCAL - The Nepomuk Calendar Ontology describes calendar entries
NEXIF - The Nepomuk Exif Ontology describes image metadata
NID3 - The Nepomuk ID3 Ontology describes audio metadata

Xesam - metadata ontology for desktop files, deprecated in favor of NIE

Xesam is an ontology developed for desktop file indexing tools such as Strigi. It tries to define classes/types and properties for most of the metadata that occurs in files on the desktop. Simple examples include ID3 tags, image size, and even email data such as sender or recipient. File metadata indexed by Strigi on the KDE desktop is stored in the Nepomuk repository using Xesam classes and properties.

SPARQL - the RDF query language

SPARQL is what we use to query the RDF repository. Its syntax was designed to resemble SQL, but since it is still quite young it is not yet nearly as powerful.

Here is what a simple query that retrieves the mother of Carl looks like:

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix foo: <http://foo.bar/types#>

select ?r where { ?r foo:isMotherOf foo:Carl . }
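To give a feel for what the where clause does, the following Python sketch matches a single triple pattern against a small in-memory list, binding terms that start with ? as variables. It is a toy model of pattern matching, not a SPARQL engine; the data and the function name match are assumptions of this sketch.

```python
triples = [
    ("foo:Mary", "rdf:type", "foo:Woman"),
    ("foo:Mary", "foo:isMotherOf", "foo:Carl"),
]

def match(pattern):
    """Yield one {variable: value} binding per triple matching the pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value both times.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding

# Rough equivalent of: select ?r where { ?r foo:isMotherOf foo:Carl . }
print([b["?r"] for b in match(("?r", "foo:isMotherOf", "foo:Carl"))])
```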

Or, if we take NRL into account:

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix foo: <http://foo.bar/types#>
prefix nrl: <http://semanticdesktop.org/ontologies/2007/08/15/nrl#>

select ?r where { graph ?g { ?r foo:isMotherOf foo:Carl . } . ?g rdf:type nrl:InstanceBase . }
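The graph-aware query combines two conditions: the statement must be found in some named graph ?g, and that graph must be typed as nrl:InstanceBase. A hand-written Python equivalent over 4-tuples might look like the sketch below. The sample data, the use of None for the default graph, and the function name mothers_of are all assumptions made for illustration.

```python
# Quadruples: (subject, predicate, object, graph); None means the default graph.
quads = [
    ("foo:graph1", "rdf:type", "nrl:Ontology", None),
    ("foo:graph2", "rdf:type", "nrl:InstanceBase", None),
    ("foo:isMotherOf", "rdfs:range", "foo:Human", "foo:graph1"),
    ("foo:Mary", "foo:isMotherOf", "foo:Carl", "foo:graph2"),
]

def mothers_of(child):
    """Emulate: graph ?g { ?r foo:isMotherOf <child> } . ?g rdf:type nrl:InstanceBase"""
    results = []
    for r, p, o, g in quads:
        if p == "foo:isMotherOf" and o == child:
            # Join step: the graph ?g must itself be typed as an instance base.
            if any(s == g and p2 == "rdf:type" and o2 == "nrl:InstanceBase"
                   for s, p2, o2, _ in quads):
                results.append(r)
    return results

print(mothers_of("foo:Carl"))
```

Statements in the default graph never satisfy the join here, which mirrors the fact that a SPARQL graph pattern only ranges over named graphs.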

A very valuable piece of documentation is the SPARQL quick reference.

Other/custom ontologies

The ontologies mentioned here form the basis of the data in Nepomuk, but they cannot describe every aspect that may be needed. If you want to store your own data in Nepomuk and link it with other information, the following process is recommended:

Check whether existing standard ontologies provide the classes and properties you need (or some of them). Many, including NRL and NAO, reside at http://www.semanticdesktop.org/ontologies/.
If not, contact the Oscaf project with a description of what you need, to get help with the discussion and development.
If that does not help either, start your own ontology and, if possible, propose it as a standard with Oscaf.
