Development/Tutorials/QtDOM Tutorial

From KDE TechBase

Revision as of 21:36, 17 January 2007

Short introduction to XML

XML is a general structured format to store and exchange hierarchical data.

If you know HTML, you'll find XML quite similar (in fact, after some small modifications, an HTML file becomes a well-formed XML file): XML uses nested tags of the form <tagname>...</tagname> for tags with contents and <tagname/> for tags without content. Each tag can contain other tags, and the tag itself can have attributes of the form <tagname attribute="value">...</tagname>. Unlike in HTML, attribute values must always be quoted.

The names of the tags are not restricted to a predefined set (unlike HTML, which only defines a given set of proper HTML tags), so you can choose whatever names fit your needs.

As an example, let us assume that you want to store holiday information in a file and use Qt to load or modify it. To get a feeling for what XML looks like, here is one possible format for such a holiday file:

<?xml version='1.0' encoding='UTF-8'?>
<holidayset country="at">
 <name>Holidays for Austria</name>
 <holiday>
   <name>New Year's Day</name>
   <date>2007-01-01</date>
 </holiday>
 <holiday>
   <name>Christmas</name>
   <date>2007-12-24</date>
 </holiday>
</holidayset>

This file defines a holiday set for Austria (notice the country="at" attribute of the holidayset tag). The holiday set, enclosed in <holidayset>...</holidayset>, contains two holidays, each enclosed in <holiday>...</holiday>. Each of these holiday elements contains the settings for that holiday, enclosed in appropriately named tags.

Such an XML file can be represented as a tree structure, with the XML document being the root of the tree and each subelement, attribute, or text value being a child of its enclosing XML element. The tree structure corresponding to the holiday file above looks like the following:

[[Image:QtDOM_TreeStructure.png]]

As you can see, there are different types of nodes:

  • elements: they are of the form <tagname>...</tagname>
  • attributes: name/value pairs inside the opening tag of an element: <tagname attribute="value">
  • text nodes: the text content of an element, i.e. the text between <tagname> and </tagname>
  • processing instructions: they tell the XML parser / transformer / viewer how to interpret something; their form is <?target instruction?>
  • comments: <!-- comment -->
  • document type: specifies the type of the document (e.g. HTML 4 Transitional); its form is <!DOCTYPE name>


Notice that we used the same tag <name> inside both the <holidayset> and the <holiday> tags. We used quite generic names for the tags, which might become a problem with more complex structures, when we want to use the same name for different purposes. For this reason, XML also defines namespaces, which allow the same name (but from a different namespace, so they are actually different names) to be used in different contexts. In a later section we will look at these namespaces.

Also note that we implicitly used a specific (ISO 8601) format for the contents of the date tags, without specifying it anywhere. Of course we could give any other value, say <date>I hope never</date>, and it would still be a well-formed XML file, but the application would not be able to interpret the value as a date. Adding such constraints and specific formats / values / value ranges for elements is possible using either a [[wikipedia>DTD]] or an XML Schema. If you have such a definition, a validating parser can check whether a given XML file really adheres to the document structure defined in that schema. Unfortunately, the Qt XML/DOM classes are not validating parsers, so you cannot validate XML documents against a given schema with Qt.


We will use the example from above throughout this tutorial. In our application, we want to store the holiday set in the following class:

class Holiday
{
public:
  Holiday() {}
  ~Holiday() {}
  QDate mDate;
  QString mName;
};

class HolidaySet
{
public:
  HolidaySet( const QString &c ) : mCountry( c ) {}
  ~HolidaySet() {}
  QString mCountry, mName;
  QList<Holiday> mHolidays;
};

In production code, you would not make the member variables public and access them directly, but rather add accessor and setter functions:

QDate date() const { return mDate; }
void setDate( const QDate &date ) { mDate = date; }

To save space, I decided to neglect that rule of thumb in this example. As this is a tutorial for XML and Qt DOM, I want to concentrate on the basics of Qt DOM and not on good general programming style.

As there are only so many sensible names, sooner or later you will find out that you will use the same tagname or attribute name for different cases with different meanings. That is the point where namespaces come in.

Creating a simple XML file with Qt DOM

Let us first look at how to use the Qt classes to generate the XML for the holiday file from the HolidaySet class that you have in memory. For this purpose, Qt offers the class QDomDocument to represent the whole document, QDomNode as the common base class of all nodes, and QDomElement to represent each individual tag together with its attributes.

To understand the code below, one has to be aware that DOM is actually a well-defined, language-independent API to work with and modify XML documents. That is also the reason why the code below, in particular the addElement helper, is not as elegant as Qt code usually is. Instead, the code will look more or less identical in whatever programming language you use.

The XML document is described by an object of the class QDomDocument, which provides methods to create new elements. The general flow of building up a DOM tree is as follows:

  1. Create the DOM document
  2. For each element of the dom tree:
    1. Create the element using the methods from QDomDocument. The element does not yet have any position within the DOM tree.
    2. Insert the element into its parent node.
    3. If the element should have contents, set the contents, set the attributes, etc.


As a line of code says more than a thousand words, let us look at some sample code to generate the DOM tree from the HolidaySet class:

/* Helper function to generate a DOM Element for the given DOM document
   and append it to the children of the given node. */
QDomElement addElement( QDomDocument &doc, QDomNode &node,
                        const QString &tag,
                        const QString &value = QString() )
{
  QDomElement el = doc.createElement( tag );
  node.appendChild( el );
  if ( !value.isNull() ) {
    QDomText txt = doc.createTextNode( value );
    el.appendChild( txt );
  }
  return el;
}


QString holidaySetToXML( const HolidaySet &hs )
{
  QDomDocument doc;

  // generate the <holidayset> tag as the root tag, add the country
  // attribute if needed
  QDomElement holidaySetElement = addElement( doc, doc, "holidayset" );
  if ( !hs.mCountry.isEmpty() )
    holidaySetElement.setAttribute( "country", hs.mCountry );

  // Add the <name> element to the holidayset
  if ( !hs.mName.isEmpty() )
    addElement( doc, holidaySetElement, "name", hs.mName );

  // Add each holiday as a <holiday>..</holiday> element
  QList<Holiday>::const_iterator i;
  for ( i = hs.mHolidays.begin(); i != hs.mHolidays.end(); ++i ) {
    QDomElement h = addElement( doc, holidaySetElement, "holiday" );
    addElement( doc, h, "name", (*i).mName );
    addElement( doc, h, "date", (*i).mDate.toString( Qt::ISODate ) );
  }

  return doc.toString();
}

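One thing to notice is that DOM nodes are handled as values rather than through pointers (since some programming languages do not define pointers, the DOM API cannot use any pointer-based functionality!). In Qt, the node classes are explicitly shared handles: copying a QDomElement is cheap, and all copies refer to the same node in the tree. That is why addElement can return the element by value while the caller can still modify the node that was inserted.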

Let us now slowly step through the code:

  • As you can see, the whole of step 2 is done by the addElement helper function (lines 3-14) using the standard DOM methods. The addElement function needs the DOM document to create the new element, the parent node into which the new element is inserted, and the tag name and optional value for the new element. Let's assume the function was called as addElement( doc, node, "tag", "contents" ).
    • Line 7: To create the new element, we call the QDomDocument::createElement method. This creates the <tag> tag, without any contents or attributes. The new tag is also not yet positioned anywhere in the DOM tree.
    • Line 8: To insert the newly created tag as the child of an already existing node, simply call QDomNode::appendChild with the new tag as argument.
    • Lines 9-12: In the DOM representation, the contents of a tag (i.e. the text between <tag> and </tag> in the XML) are represented as a DOM object of type text, which is a child of the enclosing tag. For this reason, setting the contents of a tag means creating a text node (like in line 7, except that we create a text node instead of an element node) and inserting it into the element that we created above. Now we have an element <tag>contents</tag> in XML.
    • Line 13: As we will need the new node to set attributes or insert children, we return it.
  • The holidaySetToXML function (lines 17-40) does the actual conversion of our holiday set to a DOM tree.
    • First it creates an empty DOM document (line 19).
    • The root element (<holidayset>...</holidayset>) is then created using our addElement helper function (line 23). Here the DOM document and the parent are the same (doc). The holidaySetElement is still empty, i.e. we now have the XML representation
      <holidayset/>
      which is the same as <holidayset></holidayset>.
    • To set an attribute (country="at") for the DOM node, we call QDomElement::setAttribute( key, value ) (line 25). Our document now has the XML representation
      <holidayset country="at"/>
    • To populate the holidayset with the <holiday>...</holiday> entries, we create a new element (line 34) for each of the holidays in a loop. The parent of all these elements is the holiday set element. After line 34 we have the XML
      <holidayset country="at"> <holiday/> </holidayset>
    • As you can imagine, to set the <name> and <date> for each holiday, all we have to do is call addElement with the <holiday> tag as parent (lines 35/36).


You can now create the XML file contents simply via

// Create the data structure
HolidaySet hs( "at" );
hs.mName = "Holidays for Austria";
Holiday h;

h.mDate = QDate( 2007, 1, 1 );
h.mName = QString( "New Year's Day" );
hs.mHolidays.append( h );

h.mDate = QDate( 2007, 12, 24 );
h.mName = QString( "Christmas" );
hs.mHolidays.append( h );

// convert to the XML string
QString output = holidaySetToXML( hs );
// output that XML string
qDebug() << output;

Loading a simple XML file using Qt DOM

Let us now look at loading an XML file into memory and parsing it into our HolidaySet memory structure. There are two different strategies for loading XML documents:

  • A SAX parser (Simple API for XML) walks through the XML file sequentially, calling methods like startTag and endTag whenever an opening or closing tag is encountered. There is no hierarchy involved yet (which you can still introduce when building your memory structures in the startTag/endTag methods), but the advantage is that there is no need to keep the whole XML document in memory.
  • DOM (Document Object Model) on the other hand, loads the whole document into memory, splitting it into different nodes and building a hierarchical tree. The advantage is that you do not need to build the hierarchy yourself, while on the other hand the whole document needs to be in memory. For huge documents this can be a real problem, but for our rather small holiday files, we will use DOM.

From the description above it is clear that SAX can only be used to load an XML file, while DOM can also be used to build up or modify existing XML files. In fact, we already did exactly that in the previous chapter where we created the holiday file.


The following function converts an already loaded DOM tree into our HolidaySet structure. It shows three different ways to locate child elements:

HolidaySet parseXMLwithDOM( const QDomDocument &domTree )
{
  HolidaySet hs = HolidaySet( QString() );

  QDomElement set = domTree.namedItem( "holidayset" ).toElement();
  if ( set.isNull() ) {
    qWarning() << "No <holidayset> element found at the top-level "
               << "of the XML file!";
    return hs; // no holiday set found
  }

  if ( set.hasAttribute( "country" ) ) {
    hs.mCountry = set.attribute( "country" );
  }

  // Way 1: Explicitly search for a given element:
  QDomElement name = set.namedItem( "name" ).toElement();
  if ( !name.isNull() ) { // We have a <name>..</name> element in the set
    hs.mName = name.text();
  }

  // Way 2: Loop through all child elements with a given tag name.
  for ( QDomElement n = set.firstChildElement( "holiday" ); !n.isNull();
        n = n.nextSiblingElement( "holiday" ) ) {
    Holiday h;
    QDomElement v = n.namedItem( "name" ).toElement();
    if ( !v.isNull() ) h.mName = v.text();
    v = n.namedItem( "date" ).toElement();
    if ( !v.isNull() ) {
      h.mDate = QDate::fromString( v.text(), Qt::ISODate );
    }
    hs.mHolidays.append( h );
  }

  // Way 3: Loop through all child nodes and check if it is an element
  //        with one of the wanted tagnames. This is an alternative to
  //        way 2; running both would add every holiday twice.
  QDomNode nd = set.firstChild();
  for ( ; !nd.isNull(); nd = nd.nextSibling() ) {
    if ( nd.isElement() && nd.toElement().tagName() == "holiday" ) {
      QDomElement n = nd.toElement();
      // Same code as above...
    }
  }
  return hs;
}

Introduction to XML Namespaces

<?xml version='1.0' encoding='UTF-8'?>
<h:holidays xmlns:h="urn:kde:developer:tutorials:QtDom:holidays" h:country="at">
 <h:holiday>
   <h:name>New Year's Day</h:name>
   <h:date>2007-01-01</h:date>
 </h:holiday>
 <h:holiday>
   <h:name>Christmas</h:name>
   <h:date>2007-12-24</h:date>
 </h:holiday>
</h:holidays>

Generating XML documents with namespaces using Qt

Loading XML documents with namespaces using Qt

(Copyright David Faure, put under the GPL)

KoXmlElement KoDom::namedItemNS( const KoXmlNode& node, const char* nsURI, const char* localName )
{
    KoXmlNode n = node.firstChild();
    for ( ; !n.isNull(); n = n.nextSibling() ) {
        if ( n.isElement() && n.localName() == localName && n.namespaceURI() == nsURI )
            return n.toElement();
    }
    return KoXmlElement();
}

From KoDom.h (by dfaure):

/**
* This namespace contains a few convenience functions to simplify code using QDom
* (when loading OASIS documents, in particular).
*
* To find the child element with a given name, use KoDom::namedItemNS.
*
* To find all child elements with a given name, use
* QDomElement e;
* forEachElement( e, parent )
* {
*     if ( e.localName() == "..." && e.namespaceURI() == KoXmlNS::... )
*     {
*         ...
*     }
* }
* Note that this means you don't ever need to use QDomNode nor toElement anymore!
* Also note that localName is the part without the prefix, this is the whole point
* of namespace-aware methods.
*
* To find the attribute with a given name, use QDomElement::attributeNS.
*
* Do not use getElementsByTagNameNS, it's recursive (which is never needed in KOffice).
* Do not use tagName() or nodeName() or prefix(), since the prefix isn't fixed.
*
* @author David Faure <[email protected]>
*/

Loading XML documents using Qt and the SAX parser

Initial Author: Reinhold Kainhofer