Development/Tutorials/QtDOM Tutorial

From KDE TechBase


As an example, let us assume that you want to store holiday information in a file and use Qt to load or modify it. To get a feeling for what XML looks like, here is one possible format for such a holiday file:
<syntaxhighlight lang="xml">
<?xml version='1.0' encoding='UTF-8'?>
<holidayset country="at">
  ...
   </holiday>
</holidayset>
</syntaxhighlight>
This file defines a holiday set for Austria (notice the country="at" attribute of the holidayset tag). The holiday set, enclosed in <holidayset>...</holidayset>, contains two holidays, each enclosed in <holiday>...</holiday>. Each of these holiday elements contains the settings for that holiday, enclosed in appropriately named tags.


We will use the example from above throughout this tutorial. In our application, we want to store the holiday set in the following class:

<syntaxhighlight lang="cpp-qt">
class Holiday
{
  // ...
   QList<Holiday> mHolidays;
};
</syntaxhighlight>


In production code, you would not make the member variables public and access them directly, but rather add getter and setter functions:
<syntaxhighlight lang="cpp-qt">
QDate date() const { return mDate; }
void setDate( const QDate &date ) { mDate = date; }
</syntaxhighlight>
To save space, I decided to neglect that rule of thumb in this example. As this is a tutorial on XML and Qt DOM, I want to concentrate on the basics of Qt DOM rather than on good general programming style.


As a line of code says more than a thousand words, let us look at some sample code to generate the DOM tree from the HolidaySet class:

<syntaxhighlight lang="cpp-qt" line>

/* Helper function to generate a DOM Element for the given DOM document  
...
   return doc.toString();
}
</syntaxhighlight>


One thing to notice is that all DOM nodes are passed by value (since some programming languages do not have pointers, the DOM API cannot rely on any pointer-based functionality!). The Qt implementation of DOM uses explicit sharing (as explained in the API documentation of {{qt|QDomNode}}), so whenever you change a QDomNode (or, of course, an object of any derived class), all copies change as well, since they point to the same data in memory. To create an independent copy of a given QDomNode, use the method {{qt|QDomNode::cloneNode}}(bool deep). Its bool argument indicates whether the children of the node are copied as well: if it is true, the whole subtree is copied recursively; if it is false, only the node itself is copied and the clone has no child nodes.
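The following sketch demonstrates both behaviours. It assumes Qt 5 or later, and the function names are made up for this illustration. A plain copy of a QDomElement is just another handle to the same node. cloneNode() produces an independent one.

<syntaxhighlight lang="cpp-qt">
#include <QtXml/QDomDocument>
#include <QtCore/QString>

// Copies of a DOM node share the same underlying data (explicit sharing):
// renaming the node through one copy is visible through every other copy.
QString tagNameAfterRenamingCopy()
{
    QDomDocument doc;
    QDomElement original = doc.createElement("holiday");
    QDomElement copy = original;      // no new node, just another handle
    copy.setTagName("festival");      // changes the shared node
    return original.tagName();        // therefore reports the new name
}

// cloneNode() creates an independent node. With deep == true all children
// are copied too; with deep == false the clone has no child nodes at all.
int childCountOfClone(bool deep)
{
    QDomDocument doc;
    QDomElement holiday = doc.createElement("holiday");
    holiday.appendChild(doc.createElement("name"));
    holiday.appendChild(doc.createElement("date"));

    QDomElement clone = holiday.cloneNode(deep).toElement();
    return clone.childNodes().count();
}

// Renaming a clone leaves the original untouched.
QString originalNameAfterRenamingClone()
{
    QDomDocument doc;
    QDomElement holiday = doc.createElement("holiday");
    QDomElement clone = holiday.cloneNode().toElement();  // deep by default
    clone.setTagName("copy");
    return holiday.tagName();
}
</syntaxhighlight>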


Let us now slowly step through the code:

You can now create the XML file contents simply via

<syntaxhighlight lang="cpp-qt">
// Create the data structure
HolidaySet hs("at");
// ...
// output that XML string
qDebug() << output;
</syntaxhighlight>


== Loading a simple XML file using Qt DOM ==
The usual way to parse an XML document into a DOM tree is to use the method QDomDocument::setContent. If this method succeeds, the QDomDocument object contains the document in the usual DOM tree structure. If an error occurs, the error message and the exact position of the error are stored in the (optional) pointer arguments of the call:

<syntaxhighlight lang="cpp-qt" line>
QFile f( argv[1] );
QDomDocument doc;
// ...
   result = parseXMLwithDOM( doc );
}
</syntaxhighlight>
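Here is a minimal sketch of collecting this error information. It assumes Qt 5 or later. ParseStatus and parseHolidayXml are hypothetical names and are not part of the tutorial's code. All three pointer arguments are passed explicitly.

<syntaxhighlight lang="cpp-qt">
#include <QtXml/QDomDocument>
#include <QtCore/QString>

// Outcome of a parse attempt (hypothetical helper type for this sketch).
struct ParseStatus {
    bool ok = false;
    QString message;   // parser error message, empty on success
    int line = 0;      // position where the parser gave up
    int column = 0;
};

// Parses 'xml' into 'doc'. On failure, setContent() fills in the error
// message as well as the line and column of the error.
ParseStatus parseHolidayXml(const QString &xml, QDomDocument &doc)
{
    ParseStatus status;
    status.ok = doc.setContent(xml, &status.message, &status.line, &status.column);
    return status;
}
</syntaxhighlight>

A caller would typically print message, line and column when ok is false, instead of continuing with an empty document.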


The resulting QDomDocument now contains the whole DOM tree, similar to the one that we created in the previous section. The DOM tree represents the hierarchical structure as shown in the image above. To obtain an element with a given tag name, one can use the QDomNode::namedItem( "tagname" ) method of the parent node object, which returns a QDomNode (this class is the base type to describe any of the DOM nodes). As we are interested in the element, we have to convert it to a DOM element by the toElement() method. You might find this to be quite awkward to use, in particular since C++ usually offers better ways to work with derived classes. However, the DOM API was designed to work with various different languages, so typical C++-isms cannot be part of the DOM API.


To extract the holiday data from the DOM tree, we write the function parseXMLwithDOM:
<syntaxhighlight lang="cpp-qt">
HolidaySet parseXMLwithDOM( QDomDocument &domTree )
{
   HolidaySet hs( QString::null );
</syntaxhighlight>


We first obtain the DOM element that represents the <holidayset>...</holidayset> element using the namedItem and toElement methods. If no child node named "holidayset" exists, or if it is not an element (e.g. because it is a processing instruction), a null element is returned:
<syntaxhighlight lang="cpp-qt">
   QDomElement set = domTree.namedItem("holidayset").toElement();
   if ( set.isNull() ) {
     // ...
     return hs; // no holiday set found
   }
</syntaxhighlight>


This element may have an attribute named "country". Its presence can be checked using the QDomElement::hasAttribute method, and its value can be obtained using the QDomElement::attribute method:

<syntaxhighlight lang="cpp-qt">
if ( set.hasAttribute("country") ) {
   hs.mCountry = set.attribute("country");
}
</syntaxhighlight>


We can also have a <name>...</name> child element, which can be obtained similarly to the holidayset element. We retrieve the text between the element's enclosing tags by a call to QDomElement::text():
<syntaxhighlight lang="cpp-qt">
// Search for a given element (only the first match is returned):
QDomElement name = set.namedItem("name").toElement();
// ...
   hs.mName = name.text();
}
</syntaxhighlight>
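Putting the last three snippets together, here is a self-contained sketch that reads the country attribute and the <name> child of a parsed document. It assumes Qt 5 or later. SetHeader and readHeader are hypothetical stand-ins for the tutorial's HolidaySet code.

<syntaxhighlight lang="cpp-qt">
#include <QtXml/QDomDocument>
#include <QtCore/QString>

// Hypothetical stand-in for the header data of the tutorial's HolidaySet.
struct SetHeader {
    QString country;
    QString name;
};

// Reads the optional country attribute and the optional <name> child
// of the <holidayset> element; missing values stay empty.
SetHeader readHeader(const QDomDocument &domTree)
{
    SetHeader header;
    // namedItem() yields a null node if no child has that name, and
    // toElement() turns a null or non-element node into a null element.
    QDomElement set = domTree.namedItem("holidayset").toElement();
    if (set.isNull())
        return header;                 // no holiday set found

    if (set.hasAttribute("country"))
        header.country = set.attribute("country");

    QDomElement name = set.namedItem("name").toElement();
    if (!name.isNull())
        header.name = name.text();     // text between <name> and </name>
    return header;
}
</syntaxhighlight>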


The namedItem method always returns the first child that matches the given name. While this might be fine for the <name> child element, we cannot use it for the <holiday>...</holiday> elements, which will appear more than once. For this reason, there are the iterator-like calls firstChild() and nextSibling() to walk through all child nodes, and firstChildElement("tagname") and nextSiblingElement("tagname") to walk only through the child elements with the given tag name. If no further child can be found, a null node is returned:

<syntaxhighlight lang="cpp-qt">
// Way 1: Loop through all child elements with a given tag name.
QDomElement e = set.firstChildElement( "holiday" );
// ...
   }
}
</syntaxhighlight>


The first method is of course the better and simpler choice in our example. However, if the <holidayset> element can have many different kinds of child elements, it's often easier and faster to loop through all children only once and condition the code on the name of the tag, which is what the second example does.
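Here is a self-contained sketch of both loops. It assumes Qt 5 or later, and the function names are hypothetical. The first loop walks only the <holiday> elements. The second visits every child node once, skips non-element nodes such as comments, and dispatches on the tag name.

<syntaxhighlight lang="cpp-qt">
#include <QtXml/QDomDocument>
#include <QtCore/QString>
#include <QtCore/QStringList>

// Way 1: visit only the <holiday> children of the holiday set.
QStringList holidayNamesByTag(const QDomElement &set)
{
    QStringList names;
    for (QDomElement e = set.firstChildElement("holiday"); !e.isNull();
         e = e.nextSiblingElement("holiday"))
        names << e.firstChildElement("name").text();
    return names;
}

// Way 2: walk all child nodes once and dispatch on the tag name.
// The holiday set's own <name> is stored in *setName if setName is non-null.
QStringList holidayNamesByWalk(const QDomElement &set, QString *setName)
{
    QStringList names;
    for (QDomNode n = set.firstChild(); !n.isNull(); n = n.nextSibling()) {
        QDomElement e = n.toElement();
        if (e.isNull())
            continue;                  // text, comment, processing instruction
        if (e.tagName() == "holiday")
            names << e.firstChildElement("name").text();
        else if (e.tagName() == "name" && setName)
            *setName = e.text();
    }
    return names;
}
</syntaxhighlight>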
Now that we have the QDomElement of the <holiday>...</holiday> element, we can easily load its contents into the Holiday structure, using only methods that we have already seen:

<syntaxhighlight lang="cpp-qt">
Holiday h;
QDomElement v = e.namedItem("name").toElement();
// ...
}
hs.mHolidays.append( h );
</syntaxhighlight>
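The following is an illustrative, self-contained sketch of how such a <holiday> element could be read into a structure. It assumes Qt 5 or later and dates stored in ISO 8601 format, as in the sample file. The dates are converted with QDate::fromString. HolidayData and readHoliday are hypothetical and only mirror the tutorial's Holiday class.

<syntaxhighlight lang="cpp-qt">
#include <QtXml/QDomDocument>
#include <QtCore/QDate>
#include <QtCore/QString>

// Hypothetical stand-in mirroring the tutorial's Holiday class.
struct HolidayData {
    QString name;
    QDate date;   // stays invalid if there is no (parsable) <date> child
};

// Reads the <name> and <date> children of a <holiday> element.
HolidayData readHoliday(const QDomElement &e)
{
    HolidayData h;
    QDomElement v = e.namedItem("name").toElement();
    if (!v.isNull())
        h.name = v.text();
    v = e.namedItem("date").toElement();
    if (!v.isNull())
        h.date = QDate::fromString(v.text(), Qt::ISODate);  // e.g. "2007-01-01"
    return h;
}
</syntaxhighlight>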


This concludes our method for loading the holiday set data from the DOM tree representation into the HolidaySet data structure. The whole parseXMLwithDOM function thus reads:


<syntaxhighlight lang="cpp-qt" line>
HolidaySet parseXMLwithDOM( QDomDocument &domTree )
{
// ...
   return hs;
}
</syntaxhighlight>


== Introduction to XML Namespaces ==


If you are working with complex XML files, sooner or later you will realize that you are using tags and attributes that can be categorized according to their meaning. Even worse, you might encounter tags with the same name used for completely different things. For example, if you are managing a large document describing books and their authors, you might use a <title> element to denote the title of the book as well as the job title of the author.


So, clearly, there should be a way to distinguish these two elements that have the same name but different uses. This can be done via XML namespaces, by adding prefixes to the element names. A namespace is a unique URI (not necessarily a URL!), e.g. "http://reinhold.kainhofer.com/ns/qtdom-examples" or "urn:kde:developer:tutorials:QtDom:holidays". Rather than writing these long and awkward URIs on each element, you define a prefix for each of the namespaces. This is done via an
<syntaxhighlight lang="xml">
  xmlns:yourprefix="your:namespace:uri"
</syntaxhighlight>
attribute on the element or any of its ancestors. This defines the "yourprefix" prefix as a shortcut for the full namespace URI "your:namespace:uri". Usually, the namespace declaration is added to the root element of the XML file. To use an element from this namespace, you simply prepend the prefix to the tag name: <yourprefix:tag>. This tag name together with the prefix is called the "qualified name" (or qName for short) of the element. The various parts of the element name (which you will also find in the Qt DOM API) are:


{{tip|See also the Qt documentation on namespaces (available at the {{qt|QtXML}} page) for a nice example of an xml file where namespaces are useful.}}
{|
|-
| yourprefix:tag    || qualified name (qName)
|-
| yourprefix        || prefix               
|-
| tag                || local name
|-
| your:namespace:uri || namespace URI
|}


(As a side note: If an element does not use any namespaces and thus also no prefix, e.g. <tag>, all these parts of the qName are empty and only the "tag name" is set to "tag".)
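The sketch below parses a namespaced element and reads these parts back with QDomNode::prefix(), QDomNode::localName() and QDomNode::namespaceURI(). It assumes Qt 5 or later. NameParts and rootNameParts are hypothetical names. QDomDocument::setContent only fills in this information if namespace processing is enabled.

<syntaxhighlight lang="cpp-qt">
#include <QtXml/QDomDocument>
#include <QtCore/QString>

// The parts of an element name as reported by Qt DOM (cf. the table above).
struct NameParts {
    QString prefix;        // e.g. "h"
    QString localName;     // e.g. "holidayset"
    QString namespaceURI;  // e.g. "urn:kde:developer:tutorials:QtDom:holidays"
};

// Parses 'xml' with namespace processing enabled (second argument 'true');
// without it, prefix(), localName() and namespaceURI() return empty strings.
NameParts rootNameParts(const QString &xml)
{
    QDomDocument doc;
    NameParts parts;
    if (!doc.setContent(xml, true))
        return parts;                  // parse error: all parts stay empty
    QDomElement root = doc.documentElement();
    parts.prefix = root.prefix();
    parts.localName = root.localName();
    parts.namespaceURI = root.namespaceURI();
    return parts;
}
</syntaxhighlight>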


Using this new knowledge about namespaces, we of course want to generate and process holiday files with proper namespaces. For example, we could put all the tags in the namespace "urn:kde:developer:tutorials:QtDom:holidays", indicated by the prefix "h". A typical XML file would then look like this:

<syntaxhighlight lang="xml">
<?xml version='1.0' encoding='UTF-8'?>
<h:holidays xmlns:h="urn:kde:developer:tutorials:QtDom:holidays"
            h:country="at">
   <h:holiday>
     <h:name>New Year's Day</h:name>
  ...
   </h:holiday>
</h:holidays>
</syntaxhighlight>


== Generating XML documents with namespaces using Qt ==
So far, we have created elements (without namespaces) using the QDomDocument::createElement(...) method. Qt, or rather DOM, offers a simple way to generate elements with a namespace attached:
<syntaxhighlight lang="text">
  QDomDocument::createElementNS( const QString &nsURI, const QString &qName )
</syntaxhighlight>
To create an element with a namespace attached, the only difference from the example in the previous sections is that you need to use this method instead of QDomDocument::createElement( const QString &tagName ). The first parameter is the full URI of the namespace (e.g. "urn:kde:developer:tutorials:QtDom:holidays" in our example XML code), while the second argument is no longer the tag name alone (like "holiday"), but rather the full qName, including the namespace prefix. So, to create the holiday element in the namespace with prefix "h", simply replace the call to
<syntaxhighlight lang="cpp-qt">
  doc.createElement( "holiday" );
</syntaxhighlight>
with
<syntaxhighlight lang="cpp-qt">
  doc.createElementNS( "urn:kde:developer:tutorials:QtDom:holidays", "h:holiday" );
</syntaxhighlight>
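The following sketch creates namespaced elements with QDomDocument::createElementNS and a namespaced attribute with QDomElement::setAttributeNS. It never adds an xmlns:h attribute by hand. It assumes Qt 5 or later, and buildNamespacedSet is a hypothetical name.

<syntaxhighlight lang="cpp-qt">
#include <QtXml/QDomDocument>
#include <QtCore/QString>

// Builds <h:holidayset h:country="at"><h:holiday/></h:holidayset> in the
// namespace 'ns' using the namespace-aware DOM calls. No xmlns:h attribute
// is set by hand; Qt writes the namespace declaration when serializing.
QString buildNamespacedSet(const QString &ns)
{
    QDomDocument doc;
    QDomElement set = doc.createElementNS(ns, "h:holidayset");
    set.setAttributeNS(ns, "h:country", "at");
    doc.appendChild(set);

    QDomElement holiday = doc.createElementNS(ns, "h:holiday");
    set.appendChild(holiday);
    return doc.toString();
}
</syntaxhighlight>

To check that the namespace information survives, you can parse the returned string again with namespace processing enabled and compare namespaceURI() and localName() of the elements.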
Again, we can write our own helper method "addElementNS":
<syntaxhighlight lang="cpp-qt">
/* Helper function to generate a DOM Element for the given DOM document
  and append it to the children of the given node. */
QDomElement addElementNS( QDomDocument &doc, QDomNode &node,
                          const QString &nsURI, const QString &qName,
                          const QString &value = QString::null )
{
  QDomElement el = doc.createElementNS( nsURI, qName );
  node.appendChild( el );
  if ( !value.isNull() ) {
    QDomText txt = doc.createTextNode( value );
    el.appendChild( txt );
  }
  return el;
}
</syntaxhighlight>
Qt automatically inserts the xmlns:h declarations needed for the namespaces you use, so you don't have to add this attribute manually!
Similar to the createElementNS method of QDomDocument, QDomElement provides the method {{qt|QDomElement::setAttributeNS}} to add a namespaced attribute to a given DOM element.
The whole "holidaySetToXML" method thus becomes:
<syntaxhighlight lang="cpp-qt">
QString holidaySetToXML( const HolidaySet &hs )
{
  QDomDocument doc;
  QDomProcessingInstruction instr = doc.createProcessingInstruction(
                    "xml", "version='1.0' encoding='UTF-8'");
  doc.appendChild(instr);
 
  const QString ns("urn:kde:developer:tutorials:QtDom:holidays");
  // generate the <h:holidayset> tag as the root tag, add the country
  // attribute if needed
  QDomElement holidaySetElement = addElementNS( doc, doc, ns, "h:holidayset" );
  if ( !hs.mCountry.isEmpty() )
    holidaySetElement.setAttributeNS( ns, "h:country", hs.mCountry );
 
  // Add the <h:name> element to the holidayset
  if ( !hs.mName.isEmpty() )
    addElementNS( doc, holidaySetElement, ns, "h:name", hs.mName );
  // Add each holiday as an <h:holiday>...</h:holiday> element
  QList<Holiday>::ConstIterator i;
  for ( i = hs.mHolidays.begin(); i != hs.mHolidays.end(); ++i) {
    QDomElement h = addElementNS( doc, holidaySetElement, ns, "h:holiday" );
    addElementNS( doc, h, ns, "h:name", (*i).mName );
    addElementNS( doc, h, ns, "h:date", (*i).mDate.toString( Qt::ISODate ) );
  }
  return doc.toString();
}
</syntaxhighlight>
=== Workaround for broken namespace handling in Qt 4.2.0 and before ===
Unfortunately, the correct way laid out above does not produce valid XML with Qt versions at least up to 4.2.0:
<syntaxhighlight lang="xml">
<?xml version='1.0' encoding='UTF-8'?>
<h:holidayset xmlns:h="urn:kde:developer:tutorials:QtDom:holidays" h:country="at" xmlns:h="urn:kde:developer:tutorials:QtDom:holidays" >
<h:name xmlns:h="urn:kde:developer:tutorials:QtDom:holidays">Holidays for Austria</h:name>
<h:holiday xmlns:h="urn:kde:developer:tutorials:QtDom:holidays">
  <h:name xmlns:h="urn:kde:developer:tutorials:QtDom:holidays">New Year</h:name>
  <h:date xmlns:h="urn:kde:developer:tutorials:QtDom:holidays">2007-01-01</h:date>
</h:holiday>
<h:holiday xmlns:h="urn:kde:developer:tutorials:QtDom:holidays">
  <h:name xmlns:h="urn:kde:developer:tutorials:QtDom:holidays">Christmas</h:name>
  <h:date xmlns:h="urn:kde:developer:tutorials:QtDom:holidays">2006-12-24</h:date>
</h:holiday>
</h:holidayset>
</syntaxhighlight>
As you can see, this XML is more verbose than we actually want: Qt inserts the xmlns:h attribute into every element that uses the "h" namespace! Even worse, the <h:holidayset> element has this attribute twice, which makes the document malformed. This is a bug in Qt that has been fixed in later versions, but for older versions you'll have to resort to a rather ugly workaround: simply don't create the elements with createElementNS, but rather with createElement (while still giving the whole qName, including the "h:" prefix!). Consequently, you now also have to create the xmlns:h attribute yourself.
<syntaxhighlight lang="cpp-qt">
QString holidaySetToXML( const HolidaySet &hs )
{
  QDomDocument doc;
  QDomProcessingInstruction instr = doc.createProcessingInstruction(
                    "xml", "version='1.0' encoding='UTF-8'");
  doc.appendChild(instr);
 
  const QString ns("urn:kde:developer:tutorials:QtDom:holidays");
  // generate the <h:holidayset> tag as the root tag, add the country
  // attribute if needed
  QDomElement holidaySetElement = addElement( doc, doc, "h:holidayset" );
  holidaySetElement.setAttribute( "xmlns:h", ns );
  if ( !hs.mCountry.isEmpty() )
    holidaySetElement.setAttribute( "h:country", hs.mCountry );
 
  // Add the <h:name> element to the holidayset
  if ( !hs.mName.isEmpty() )
    addElement( doc, holidaySetElement, "h:name", hs.mName );
  // Add each holiday as a <holiday>..</holiday> element
  QList<Holiday>::ConstIterator i;
  for ( i = hs.mHolidays.begin(); i != hs.mHolidays.end(); ++i) {
    QDomElement h = addElement( doc, holidaySetElement, "h:holiday" );
    addElement( doc, h, "h:name", (*i).mName );
    addElement( doc, h, "h:date", (*i).mDate.toString( Qt::ISODate ) );
  }
  return doc.toString();
}
</syntaxhighlight>
The output now becomes the correct XML code:
<syntaxhighlight lang="xml">
<?xml version='1.0' encoding='UTF-8'?>
<h:holidayset xmlns:h="urn:kde:developer:tutorials:QtDom:holidays" h:country="at" >
<h:name>Holidays for Austria</h:name>
<h:holiday>
  <h:name>New Year</h:name>
  <h:date>2007-01-01</h:date>
</h:holiday>
<h:holiday>
  <h:name>Christmas</h:name>
  <h:date>2006-12-24</h:date>
</h:holiday>
</h:holidayset>
</syntaxhighlight>
== Generating XML documents with namespaces using brute-force ==
Of course, no one forces you to use Qt's DOM classes to generate the XML code. After all, the resulting XML is just plain text! The most straightforward approach would thus be to generate the text containing the XML directly.
<syntaxhighlight lang="cpp-qt">
QString holidaySetToXML( const HolidaySet &hs )
{
  const QString ns("urn:kde:developer:tutorials:QtDom:holidays");
  QString result("<?xml version='1.0' encoding='UTF-8'?>\n");
 
  // generate the <h:holidayset> tag as the root tag, add the country
  // attribute if needed
  result += "<h:holidayset xmlns:h=\"" + ns + "\" ";
  if ( !hs.mCountry.isEmpty() )
    result += "h:country=\"" + hs.mCountry + "\" ";
  result += ">\n";
 
  // Add the <h:name> element to the holidayset
  if ( !hs.mName.isEmpty() )
    result += " <h:name>" + hs.mName + "</h:name>\n";
  // Add each holiday as an <h:holiday>...</h:holiday> element
  QList<Holiday>::ConstIterator i;
  for ( i = hs.mHolidays.begin(); i != hs.mHolidays.end(); ++i) {
    result += " <h:holiday>\n";
    result += "  <h:name>" + (*i).mName + "</h:name>\n";
    result += "  <h:date>" + (*i).mDate.toString( Qt::ISODate ) + "</h:date>\n";
    result += " </h:holiday>\n";
  }
 
  // Finally, close the enclosing <h:holidayset> element
  result += "</h:holidayset>\n";
  return result;
}
</syntaxhighlight>
Of course, this approach works fine if your only goal is to create the XML for outputting it to a file or piping it into another application, library or web service.
==== All is not so well, though! ====
However, as you create the text directly, there is no easy way to modify the XML except for parsing it again into a DOM tree or a similar tree representation.
Another drawback is that there are no checks whether the generated XML is well-formed at all. You can imagine how easy it is to forget to escape a "<" in the holiday name (which the DOM classes will automatically do for you), to misspell a tag, or to miss a closing tag altogether. The DOM classes give you all these advantages for practically no price at all.
Imagine, for example, that one holiday had the name
<syntaxhighlight lang="cpp-qt">
  h.mName = QString( "New Year </h:holiday>" );
</syntaxhighlight>
If you directly create the text with the method above, the XML will have a line
<syntaxhighlight lang="xml">
  <h:name>New Year </h:holiday></h:name>
</syntaxhighlight>
which is simply malformed XML!
By contrast, if you use Qt's DOM classes (or any other DOM implementation, for that matter), the resulting XML will be
<syntaxhighlight lang="xml">
  <h:name>New Year &lt;/h:holiday></h:name>
</syntaxhighlight>
which is correct XML, as the &lt; is escaped using the &amp;lt; entity. Of course, the same effect can also be achieved by manually calling an escape function (such as Qt::escape in Qt 4 or QString::toHtmlEscaped in Qt 5) on each and every string that is appended to the text output. Using Qt's XML classes takes away the need to think about such issues, and they always produce correct XML.
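Here is a small sketch comparing both routes. It assumes Qt 5 or later, because QString::toHtmlEscaped does not exist in Qt 4. The function names are hypothetical. The DOM version escapes the text node on output. The hand-written version is only safe because every inserted string is escaped explicitly.

<syntaxhighlight lang="cpp-qt">
#include <QtXml/QDomDocument>
#include <QtCore/QString>

// DOM route: the text node's content is escaped when the tree is serialized.
QString nameElementViaDom(const QString &name)
{
    QDomDocument doc;
    QDomElement el = doc.createElement("name");
    el.appendChild(doc.createTextNode(name));
    doc.appendChild(el);
    return doc.toString();
}

// Brute-force route: every string must be escaped by hand before it is
// concatenated into the output.
QString nameElementByHand(const QString &name)
{
    return "<name>" + name.toHtmlEscaped() + "</name>";
}
</syntaxhighlight>

Forgetting the toHtmlEscaped() call in the second function reproduces exactly the malformed output described above. The DOM version has no such step to forget.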


== Loading XML documents with namespaces using Qt ==


Now that we have created XML documents with namespaced tags, we of course also want to be able to load them using Qt's native classes. Parsing the XML into a DOM tree works with the same method, QDomDocument::setContent, as above; if namespace processing is enabled (by passing true as its namespaceProcessing argument), this method also loads all the namespace and prefix information into the QDomNode children and deeper descendants. When we didn't use namespaces, we could use a call like QDomNode::namedItem( "tagname" ) to obtain a child element with the given tag name. Since the tag name is now also connected to a namespace, we also have to make sure that the tag name (obtained via QDomNode::localName()) is in that namespace, i.e. that QDomNode::namespaceURI() matches the required namespace URI.
Unfortunately, QDomNode, or rather DOM, does not offer a method namedItemNS analogous to QDomNode::namedItem that returns a child element with a given tag name from a given namespace. There is a method
<syntaxhighlight lang="cpp-qt">
QDomNodeList elementsByTagNameNS ( const QString & nsURI, const QString & localName ) const
</syntaxhighlight>
but that method is not what we actually want, because it is recursive: it returns a list of all descendant elements with the given local name and namespace, found at any depth. Thus it will also return grandchildren, great-grandchildren, etc. of the DOM node. Clearly, we only want immediate children, so this method cannot be used.

However, we can of course write such a method ourselves (thanks go to David Faure for the hint!):

<syntaxhighlight lang="cpp-qt" line>
QDomElement namedItemNS( const QDomNode& node, const char* nsURI, const char* localName )
{
  QDomElement n = node.firstChildElement();
  for ( ; !n.isNull(); n = n.nextSiblingElement() ) {
    if ( n.localName() == localName && n.namespaceURI() == nsURI ) {
      return n;
    }
  }
  return QDomElement();
}
</syntaxhighlight>
 
Notice that we checked the local name and the namespace URI to find the appropriate element. The namespace prefix cannot be used, as it is not fixed: you can use whatever prefix you want for a given namespace URI, although for many commonly used namespaces there are recommended prefixes. Similarly, you should not check the tag name, as it also contains the prefix (or, if a default namespace is used, it carries no namespace information at all!).
 
Now that we have a method to get the child element with a given namespace URI and local name, we might also want to loop over all children with a given tag name from a certain namespace. This can easily be done with the same approach as above: we simply loop through all child elements with the wanted tag name, but we also need to make sure the namespace matches:
 
<syntaxhighlight lang="cpp-qt">
QDomElement e = parent.firstChildElement( "holiday" );
while ( !e.isNull() ) {
  if ( e.namespaceURI() == nsURI ) {
    // Do whatever we need to do with the holiday
  }
  e = e.nextSiblingElement( "holiday" );
}
</syntaxhighlight>
 
Since we now care about namespaces, we need to explicitly check whether the namespace URI matches the required namespace. The firstChildElement and nextSiblingElement methods don't care about namespaces at all and return every element with the given tag name from any namespace.
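Combining the two ideas above, the following sketch collects the text of all direct child elements with a given namespace URI and local name. It assumes Qt 5 or later, and childTextsNS is a hypothetical helper, not a Qt function. It compares only localName() and namespaceURI(). Elements that use a different prefix for the same namespace URI therefore still match. Elements from another namespace do not, and neither do deeper descendants.

<syntaxhighlight lang="cpp-qt">
#include <QtXml/QDomDocument>
#include <QtCore/QString>
#include <QtCore/QStringList>

// Returns the texts of all *direct* child elements of 'parent' whose local
// name and namespace URI match. The prefix is deliberately ignored, because
// any prefix may be bound to a given namespace URI.
QStringList childTextsNS(const QDomElement &parent, const QString &nsURI,
                         const QString &localName)
{
    QStringList texts;
    for (QDomElement e = parent.firstChildElement(); !e.isNull();
         e = e.nextSiblingElement()) {
        if (e.localName() == localName && e.namespaceURI() == nsURI)
            texts << e.text();
    }
    return texts;
}
</syntaxhighlight>

The document must be parsed with namespace processing enabled; otherwise localName() and namespaceURI() are empty and nothing matches. The namespace "urn:example:other" in the check below is made up for illustration.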




== Loading XML documents using Qt and the SAX parser ==

While the DOM classes we have used so far are a very convenient way to work with XML documents, they also have their drawbacks. In particular, when we simply want to load an XML document into our own data structure, going the DOM way will build up and keep the whole tree structure of the data in memory! And that tree will use even more resources than the plain-text representation of the XML, which can be crucial for large XML documents.
After reading the XML into the DOM tree, all we do is copy the corresponding values from the tree into our own data structure and discard the whole DOM tree again. Surely that's quite a waste of resources, and there has to be a more efficient way to simply load the XML. One possible solution comes in the form of SAX (Simple API for XML), which reads through the XML sequentially and calls hook methods of a QXmlDefaultHandler-derived class. For example, when an opening tag is encountered, the method QXmlDefaultHandler::startElement( nsURI, localName, qName, attributes ) is called. Similarly, QXmlDefaultHandler::endElement( nsURI, localName, qName ) is called for a closing tag, and QXmlDefaultHandler::characters( string ) is called for the text contents of an element.
Note that you have to keep track of all context information (e.g. enclosing tags/elements etc.) yourself. SAX simply lets you know that it encountered a particular element, not where exactly it happened or what comes before or afterwards in the XML.
 
Using this SAX classes, we have to implement our own QXmlDefaultHandler-derived class, named HolidayHandler in our case, and reimplement its methods where needed. In the startElement method we have to initialize the corresponding data structure for the opened element, in characters we have to store a copy of the text, and in endElement we can finally assign the values of the holiday name or date or insert the closed holiday into our holiday list.
 
The SAX parsing happens using the QXmlSimpleReader class, where we first have to register our QXmlDefaultHandler-derived class as the SAX handler. The QXmlSimleReader::parse method finally triggers the XML parsing.
 
<syntaxhighlight lang="cpp-qt">
class HolidayHandler : public QXmlDefaultHandler
class HolidayHandler : public QXmlDefaultHandler
{
{
Line 495: Line 716:
     return EXIT_FAILURE;
     return EXIT_FAILURE;
   }
   }
</code>
</syntaxhighlight>
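As a self-contained counterpart to the HolidayHandler class, here is the same callback pattern in plain standard C++, so that it runs without Qt. HolidayRecord and HolidayCollector are hypothetical names; the three methods only mirror the roles of QXmlDefaultHandler's startElement, characters and endElement (namespace URIs and attributes are left out), and in this sketch the events are fed in by hand where QXmlSimpleReader would normally call them. The path_ stack is the context tracking that SAX leaves to you:

```cpp
#include <string>
#include <vector>

// Hypothetical plain-C++ stand-in for the tutorial's Holiday class.
struct HolidayRecord {
    std::string name;
    std::string date;   // kept as ISO text, e.g. "2007-01-01"
};

// Mimics the SAX callback pattern: a parser calls these methods in
// document order, and the handler keeps its own context.
class HolidayCollector {
public:
    void startElement(const std::string &tag) {
        path_.push_back(tag);          // remember where we are
        text_.clear();
        if (tag == "holiday")
            current_ = HolidayRecord{};
    }

    void characters(const std::string &chunk) {
        text_ += chunk;                // text may arrive in several chunks
    }

    void endElement(const std::string &tag) {
        // The parent element decides what a <name> belongs to.
        const std::string parent =
            path_.size() >= 2 ? path_[path_.size() - 2] : std::string();
        if (tag == "name" && parent == "holiday")
            current_.name = text_;
        else if (tag == "name" && parent == "holidayset")
            holidaySetName = text_;
        else if (tag == "date" && parent == "holiday")
            current_.date = text_;
        else if (tag == "holiday")
            holidays.push_back(current_);
        path_.pop_back();
        text_.clear();
    }

    std::string holidaySetName;
    std::vector<HolidayRecord> holidays;

private:
    std::vector<std::string> path_;    // currently open elements
    std::string text_;                 // text of the innermost element
    HolidayRecord current_;            // holiday being assembled
};
```

Text is accumulated in characters because a SAX reader may report the character data of one element in more than one call.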
 
 
== Transforming XML documents into other formats ==
 
Imagine now that we want to create a nice-looking HTML file from our holiday XML code. Of course, we can implement the HTML creation in C++ (maybe even using the DOM classes described above, as HTML is closely related to XML). However, hardcoding something in C++ is always very inflexible, and any slight change requires a recompilation. Furthermore, the users cannot change the procedure at all!
 
However, there is a great W3C-specified way to transform XML files into practically every other format: XSLT.
 
XSLT is a pattern-based approach to formatting XML. In an XSL style sheet you define templates, which are applied to all XML elements or nodes that match a given pattern. Inside each template, you simply generate the desired output for that node. As a consequence, XSLT processing is path-independent or stateless: when you process a given tag with a template, you do not know where in the XML tree that element appears, and you should not need to know for correct XSLT.
 
As an example, let's say we want to create a nice HTML-formatted list of all holidays in the holiday file. One possible XSLT style sheet for the holiday file would be:
 
<syntaxhighlight lang="xml">
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:h="urn:kde:developer:tutorials:QtDom:holidays"
                exclude-result-prefixes='xsl h'
>
<xsl:output method="html"/>
 
<xsl:template match="*"><xsl:apply-templates/></xsl:template>
 
<xsl:template match="h:holidayset">
  <html><body>
    <xsl:if test="h:name"><h1><xsl:value-of select="h:name"/></h1></xsl:if>
    <xsl:if test="@h:country">
      <h2>Country: <xsl:value-of select="@h:country"/></h2>
    </xsl:if>
    <table border="1">
      <tr>
        <th>Date</th><th>Holiday name</th>
      </tr>
      <xsl:apply-templates select="h:holiday">
        <xsl:sort select="h:date"/>
      </xsl:apply-templates>
    </table>
  </body></html>
</xsl:template>
 
<xsl:template match="h:holiday">
  <tr>
    <td><xsl:value-of select="h:date"/></td>
    <td><xsl:value-of select="h:name"/></td>
  </tr>
</xsl:template>


</xsl:stylesheet>
</syntaxhighlight>


''Initial Author:'' [mailto:[email protected] Reinhold Kainhofer]

Latest revision as of 20:56, 29 June 2011

Short introduction to XML

XML is a general structured format to store and exchange hierarchical data.

If you know HTML, you'll find XML quite similar (in fact, after some small modifications, an HTML file is a valid XML file): XML uses nested tags of the form <tagname>...</tagname> for tags with contents and <tagname/> for tags without content. Each tag can contain other tags, and the tag itself can have attributes of the form <tagname attribute="value">...</tagname>. Unlike in HTML, attribute values in XML must always be quoted.

The tag names are not limited to a predefined set (unlike HTML, which only defines a given set of proper HTML tags), so you can choose whatever names fit your needs.

As an example, let us assume that you want to store holiday information in a file and use Qt to load or modify it. To get a feeling for what XML looks like, here is one possible format for such a holiday file:

<?xml version='1.0' encoding='UTF-8'?>
<holidayset country="at">
  <name>Holidays for Austria</name>
  <holiday>
    <name>New Year's Day</name>
    <date>2007-01-01</date>
  </holiday>
  <holiday>
    <name>Christmas</name>
    <date>2007-12-24</date>
  </holiday>
</holidayset>

This file defines a holiday set for Austria (notice the country="at" attribute of the holidayset tag). The holiday set, enclosed in <holidayset>...</holidayset>, contains a name and two holidays, each enclosed in <holiday>...</holiday>. Each of these holiday elements contains the settings for that holiday, enclosed in appropriately named tags.

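Such an XML file can be represented as a tree structure, with the XML document as the root of the tree and each subelement, attribute and text value as a child of its enclosing XML element (strictly speaking, DOM attaches attributes to their element rather than listing them among its child nodes). The tree structure corresponding to the holiday file above looks like the following:

  • Document
    • Processing instruction: xml version='1.0' encoding='UTF-8'
    • Element: holidayset
      • Attribute: country="at"
      • Element: name
        • Text: Holidays for Austria
      • Element: holiday
        • Element: name
          • Text: New Year's Day
        • Element: date
          • Text: 2007-01-01
      • Element: holiday
        • Element: name
          • Text: Christmas
        • Element: date
          • Text: 2007-12-24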

As you can see, there are different types of nodes:

  • elements: they are of the form <tagname>...</tagname>
  • attributes: they are attributes inside element tags: <tagname attribute="value">
  • text nodes: the text content of element tags, i.e. the text between <tagname> and </tagname>
  • processing instructions: they tell the XML parser / transformer / viewer how to interpret something; their form is <?target data?>
  • comments: <!-- comment -->
  • document type: specifies the type of the document (e.g. HTML 4.01 Transitional); its form is <!DOCTYPE name ...>


Notice that we used the same tag <name> inside both the <holidayset> and the <holiday> tags. We used quite generic names for the tags, which might become a problem with more complex structures, when we want to use the same name for different purposes. For this reason, XML also defines namespaces, which allow the same name (but from a different namespace, so they are actually different names) to be used in different contexts. We will look at these namespaces in a later section.

Also note that we implicitly used specially formatted (ISO 8601) contents for the date tags, without specifying this anywhere. Of course we could give any other value, say <date>I hope never</date>, and it would still be a well-formed XML file, but the application would not be able to interpret the value as a date. Adding such constraints and specific formats / values / value ranges for elements is possible using either a DTD or an XML Schema. If you have such a definition, a validating parser can check whether a given XML file really adheres to the document structure defined in that schema. Unfortunately, the Qt XML/DOM classes are not validating parsers, so you cannot validate XML documents against a given schema with Qt.
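To make the difference between a well-formed document and meaningful content concrete, here is a small standard C++ sketch of the kind of check an application has to apply to the date tags itself. parseIsoDate and SimpleDate are hypothetical names, not Qt API; in Qt you would typically call QDate::fromString( text, Qt::ISODate ) and test the result with QDate::isValid(). The XML parser accepts <date>I hope never</date>, but this check rejects its content:

```cpp
#include <cstddef>
#include <initializer_list>
#include <optional>
#include <string>

struct SimpleDate {
    int year, month, day;
};

// Hypothetical helper: strictly parse "YYYY-MM-DD"; returns std::nullopt
// for anything else (wrong shape, non-digits, impossible month or day).
std::optional<SimpleDate> parseIsoDate(const std::string &s) {
    if (s.size() != 10 || s[4] != '-' || s[7] != '-')
        return std::nullopt;
    for (std::size_t i : {0, 1, 2, 3, 5, 6, 8, 9})
        if (s[i] < '0' || s[i] > '9')
            return std::nullopt;

    const int y = std::stoi(s.substr(0, 4));
    const int m = std::stoi(s.substr(5, 2));
    const int d = std::stoi(s.substr(8, 2));
    if (m < 1 || m > 12 || d < 1)
        return std::nullopt;

    // Days per month, with the Gregorian leap-year rule for February.
    static const int days[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    const bool leap = (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
    const int maxDay = (m == 2 && leap) ? 29 : days[m - 1];
    if (d > maxDay)
        return std::nullopt;
    return SimpleDate{y, m, d};
}
```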


We will use the example from above throughout this tutorial. In our application, we want to store the holiday set in the following class:

class Holiday
{
public:
  Holiday() {}
  ~Holiday() {}

  QDate mDate;
  QString mName;
};

class HolidaySet
{
public:
  HolidaySet( const QString &c ) : mCountry( c ) {}
  ~HolidaySet() {}

  QString mCountry, mName;
  QList<Holiday> mHolidays;
};

In production code, you would not make the member variables public and directly access them, but rather add accessors and setter functions:

QDate date() const { return mDate; }
void setDate( const QDate &date ) { mDate = date; }

To save space, I decided to neglect that rule of thumb here in this example. As this is a tutorial for XML and Qt DOM, I want to concentrate on the basics of Qt DOM and not on a good general programming style.

As there are only so many sensible names, sooner or later you will find yourself using the same tag name or attribute name for different cases with different meanings. That is the point where namespaces come in.

Creating a simple XML file with Qt DOM

Let us first look at how to use the Qt classes to generate the XML for the holiday file from the HolidaySet class that you have in memory. For this purpose, Qt offers the classes QDomDocument to represent the whole document and QDomNode and QDomElement to represent each individual tag and attribute.

To understand the code below, one has to be aware that DOM is actually a well-defined API to work with and modify XML documents. That is also the reason why the code below, in particular the addElement function, is not as elegant as typical Qt code. Instead, the code will be more or less identical in whatever programming language you use.

The XML document is described by an object of the class QDomDocument, which provides methods to create new elements. The general flow of building up a DOM tree is as follows:

  1. Create the DOM document
  2. For each element of the DOM tree:
    1. Create the element using the methods from QDomDocument. The element does not yet have any position within the DOM tree.
    2. Insert the element into its parent node.
    3. If the element should have contents, set the contents, set the attributes, etc.


As a line of code says more than a thousand words, let us look at some sample code to generate the DOM tree from the HolidaySet class:

/* Helper function to generate a DOM Element for the given DOM document 
   and append it to the children of the given node. */
QDomElement addElement( QDomDocument &doc, QDomNode &node, 
                        const QString &tag, 
                        const QString &value = QString::null )
{
  QDomElement el = doc.createElement( tag );
  node.appendChild( el );
  if ( !value.isNull() ) {
    QDomText txt = doc.createTextNode( value );
    el.appendChild( txt );
  }
  return el;
}


QString holidaySetToXML( const HolidaySet &hs )
{
  QDomDocument doc;
  QDomProcessingInstruction instr = doc.createProcessingInstruction( 
                    "xml", "version='1.0' encoding='UTF-8'");
  doc.appendChild(instr);

  // generate holidayset tag as root, add country attribute if needed
  QDomElement holidaySetElement = addElement( doc, doc, "holidayset" );
  if ( !hs.mCountry.isEmpty() ) 
    holidaySetElement.setAttribute( "country", hs.mCountry );
  
  // Add the <name> element to the holidayset
  if ( !hs.mName.isEmpty() ) 
    addElement( doc, holidaySetElement, "name", hs.mName );

  // Add each holiday as a <holiday>..</holiday> element
  // (hs is a const reference, so we need a const_iterator)
  QList<Holiday>::const_iterator i;
  for ( i = hs.mHolidays.begin(); i != hs.mHolidays.end(); ++i) {
     QDomElement h = addElement( doc, holidaySetElement, "holiday" );
     addElement( doc, h, "name", (*i).mName );
     addElement( doc, h, "date", (*i).mDate.toString( Qt::ISODate ) );
  }

  return doc.toString();
}

One thing to notice is that all DOM nodes are passed by value (since some programming languages do not have pointers, the DOM API cannot rely on any pointer-based functionality). The Qt implementation of DOM uses explicit sharing (as explained in the API documentation of QDomNode), so whenever you change a QDomNode (or of course an object of any derived class), all copies are changed, since they point to the same data in memory. To create an independent copy of a given QDomNode, use the method QDomNode::cloneNode( bool deep ). If deep is true (the default), all children of the node are cloned recursively as well; if it is false, only the node itself is copied and the copy has no child nodes.
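The explicit-sharing behaviour can be imitated in standard C++ with a handle class that wraps a std::shared_ptr. The following is only an illustration under that assumption (Node and NodeData are hypothetical, and Qt's real implementation differs); it shows that copies of a handle see each other's changes, while cloneNode( true ) and cloneNode( false ) produce independent copies with and without children:

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Shared payload of a node: its name and its children.
struct NodeData {
    std::string name;
    std::vector<std::shared_ptr<NodeData>> children;
};

// Hypothetical explicitly shared handle mimicking QDomNode's semantics:
// copying a Node copies only the pointer, so every copy sees every change.
class Node {
public:
    explicit Node(const std::string &name)
        : d(std::make_shared<NodeData>(NodeData{name, {}})) {}

    void setName(const std::string &name) { d->name = name; }
    std::string name() const { return d->name; }
    std::size_t childCount() const { return d->children.size(); }
    void appendChild(const Node &child) { d->children.push_back(child.d); }

    // Independent copy: deep == true clones the children recursively,
    // deep == false copies only this node, so the copy has no children.
    Node cloneNode(bool deep) const {
        Node copy(d->name);
        if (deep)
            for (const auto &child : d->children)
                copy.appendChild(Node(child).cloneNode(true));
        return copy;
    }

private:
    explicit Node(std::shared_ptr<NodeData> data) : d(std::move(data)) {}
    std::shared_ptr<NodeData> d;
};
```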

Let us now slowly step through the addElement and holidaySetToXML code:

  • As you can see, the whole of step 2 is done by the addElement helper function, using only the standard DOM methods. The addElement function needs the DOM document to create the new element, the parent node to insert the new element into, and the tag name and optional contents of the new element. Let's assume the function was called as
    addElement( doc, node, "tag", "contents" )
    • doc.createElement( tag ): To create the new element, we call the QDomDocument::createElement method. This creates the <tag> element, without any contents or attributes. The new element is not yet positioned anywhere in the DOM tree.
    • node.appendChild( el ): To insert the newly created element as a child of an already existing node, simply call QDomNode::appendChild with the new element as argument.
    • The if ( !value.isNull() ) block: In the DOM representation, the contents of an element (i.e. the text between <tag> and </tag> in the XML) are represented as a DOM node of type text, which is a child of the enclosing element. For this reason, setting the contents of an element means creating a text node with QDomDocument::createTextNode (just like createElement, only that it creates a text node instead of an element node) and inserting it into the element created above. Now we have an element <tag>contents</tag> in XML.
    • return el: As we will need the new element to set attributes or insert children, we return it.
  • The holidaySetToXML function does the actual conversion of our holiday set to a DOM tree.
    • First it creates an empty DOM document.
    • The processing instruction <?xml ...?> is created with QDomDocument::createProcessingInstruction and inserted as the first child of the document.
    • The root element (<holidayset>...</holidayset>) is then created using our addElement helper function. Here the DOM document and the parent are the same (doc). The holidaySetElement is still empty, i.e. we now have the XML representation <holidayset/>, which is the same as <holidayset></holidayset>.
    • To set an attribute (country="at") on the DOM element, we call QDomElement::setAttribute( name, value ). Our document now has the XML representation
      <holidayset country="at"/>
    • To populate the holiday set with the <holiday>...</holiday> entries, we create a new element for each of the holidays in the loop. The parent of all these elements is the holiday set element. After the first <holiday> element has been added (leaving aside the optional <name> element), we have the XML
      <holidayset country="at"> <holiday/> </holidayset>
    • As you can imagine, to set the <name> and <date> of each holiday, all we have to do is call addElement with the <holiday> element as parent.


You can now create the XML file contents simply via

// Create the data structure
HolidaySet hs("at");
hs.mName="Holidays for Austria";
Holiday h;

h.mDate = QDate( 2007, 01, 01 );
h.mName = QString( "New Year" );
hs.mHolidays.append( h );

h.mDate = QDate( 2006, 12, 24 );
h.mName = QString( "Christmas" );
hs.mHolidays.append( h );

// convert to the XML string
QString output = holidaySetToXML( hs );
// output that XML string
qDebug()<<output;

Loading a simple XML file using Qt DOM

Let us now look at loading an XML file into memory and parsing it into our HolidaySet memory structure. There are two different strategies for loading XML documents:

  • A SAX parser (Simple API for XML) walks through the XML file sequentially, calling methods like startTag and endTag whenever an opening or closing tag is encountered. There is no hierarchy involved yet (which you can still introduce when building your memory structures in the startTag/endTag methods), but the advantage is that there is no need to keep the whole XML document in memory.
  • DOM (Document Object Model) on the other hand, loads the whole document into memory, splitting it into different nodes and building a hierarchical tree. The advantage is that you do not need to build the hierarchy yourself, while on the other hand the whole document needs to be in memory. For huge documents this can be a real problem, but for our rather small holiday files, we will use DOM.

From the description above it is clear that SAX can only be used to load an XML file, while DOM can also be used to build up or modify existing XML files. In fact, we already did exactly that in the previous chapter where we created the holiday file.

In this chapter we will now look at how we can parse the XML holiday set into the HolidaySet structure using the DOM method. The parsing using a SAX parser will be treated in a later chapter.

The usual way to parse an XML document into a DOM tree is to use the method QDomDocument::setContent. If this method is successful, the QDomDocument object contains the DOM tree in the usual DOM structure. If an error occurs, the error message and the exact position of the error are stored in the parameters of the call:

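QFile f( argv[1] );
QDomDocument doc;
QString errorMsg;
int errorLine, errorColumn;
if ( doc.setContent( &f, &errorMsg, &errorLine, &errorColumn ) ) {
  result = parseXMLwithDOM( doc );
} else {
  // Report where and why parsing failed
  qWarning() << "Parse error at line" << errorLine
             << ", column" << errorColumn << ":" << errorMsg;
}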

The resulting QDomDocument now contains the whole DOM tree, similar to the one that we created in the previous section. The DOM tree represents the hierarchical structure described in the introduction. To obtain an element with a given tag name, one can use the QDomNode::namedItem( "tagname" ) method of the parent node object, which returns a QDomNode (this class is the base type for all DOM nodes). As we are interested in the element, we have to convert it to a DOM element with the toElement() method. You might find this quite awkward to use, in particular since C++ usually offers better ways to work with derived classes. However, the DOM API was designed to work with many different languages, so typical C++-isms cannot be part of the DOM API.

To extract the holiday data from the DOM tree, we write the function parseXMLwithDOM:

HolidaySet parseXMLwithDOM( QDomDocument &domTree ) 
{
  HolidaySet hs( QString::null );

We first obtain the DOM element that represents the <holidayset>...</holidayset> element using the namedItem and toElement methods. If no child node named "holidayset" exists or it is not an element (e.g. because it is a processing instruction or an attribute), a null element is returned:

  QDomElement set = domTree.namedItem("holidayset").toElement();
  if ( set.isNull() ) {
    qWarning() << "No <holidayset> element found at the top-level "
               << "of the XML file!";
    return hs; // no holiday set found
  }

This element possibly has an attribute named "country". This can be checked using the QDomElement::hasAttribute method, and the attribute can be obtained using the QDomElement::attribute method:

if ( set.hasAttribute("country") ) {
  hs.mCountry = set.attribute("country");
}

We can also have a <name>...</name> child element, which can be obtained in the same way as the holidayset element. We retrieve the text of the element between the enclosing tags by a call to QDomElement::text():

// Search for a given element (only the first matching is returned):
QDomElement name = set.namedItem("name").toElement();
if ( !name.isNull() ) { // We have a <name>..</name> element in the set
  hs.mName = name.text();
}

The namedItem method always returns the first child that matches the given name. While this might be fine for the <name> child element, we cannot use it for the <holiday>...</holiday> elements, which appear more than once. For this reason, there are iterator-like calls firstChild() and nextSibling() to walk through all child nodes, and firstChildElement("tagname") and nextSiblingElement("tagname") to walk only through the child elements with the given tag name. If no further child can be found, a null node is returned:

// Way 1: Loop through all child nodes with a given tag name.
QDomElement e = set.firstChildElement( "holiday" );
for ( ; !e.isNull(); e = e.nextSiblingElement( "holiday" ) ) {
  Holiday h;
  // e is the <holiday>...</holiday> element....
  // Load the contents of e into h
  hs.mHolidays.append( h );
}

// Way 2: Loop through all child nodes and check if it is an element 
//        with one of the wanted tagnames
QDomNode nd = set.firstChild();
for ( ; !nd.isNull(); nd = nd.nextSibling() ) {
  if ( nd.isElement() && nd.toElement().tagName() == "holiday" ) {
    QDomElement e = nd.toElement();
    Holiday h;
    // Same code as above...
    // e is the <holiday>...</holiday> element....
    // Load the contents of e into h
    hs.mHolidays.append( h );
  }
}

The first method is of course the better and simpler choice in our example. However, if the <holidayset> element can have various different child elements, it's often easier and faster to loop through all children only once and condition the code on the name of the tag, which is done in the second example.

If one looks at the API documentation of the QDomElement class, one will also find a method elementsByTagName, which returns a QDomNodeList that can be traversed by index (using QDomNodeList::item()). The problem with this method is that it is recursive, i.e. it returns all descendant elements with the given name at any level (in a well-defined order, namely a pre-order traversal of the subtree). We, however, only want the immediate children of the <holidayset> element, not their children or grandchildren.
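The difference between the recursive search and a loop over the direct children is easy to see on the holiday tree, where <name> occurs both below <holidayset> and below every <holiday>. The sketch below uses a hypothetical Elem struct instead of QDomElement; collectRecursive imitates the pre-order behaviour of elementsByTagName, and directChildren imitates the firstChildElement/nextSiblingElement loop:

```cpp
#include <string>
#include <vector>

// Hypothetical minimal element tree standing in for QDomElement.
struct Elem {
    std::string tag;
    std::vector<Elem> children;
};

// Like elementsByTagName: collects matches at every depth, in pre-order.
void collectRecursive(const Elem &e, const std::string &tag,
                      std::vector<const Elem *> &out) {
    for (const Elem &c : e.children) {
        if (c.tag == tag)
            out.push_back(&c);
        collectRecursive(c, tag, out);   // descend into grandchildren, too
    }
}

// Like the firstChildElement/nextSiblingElement loop: direct children only.
std::vector<const Elem *> directChildren(const Elem &e, const std::string &tag) {
    std::vector<const Elem *> out;
    for (const Elem &c : e.children)
        if (c.tag == tag)
            out.push_back(&c);
    return out;
}
```

For a <holidayset> with one <name> and two <holiday> children, collectRecursive finds three <name> elements, while directChildren finds only the one that names the set.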

Now that we have the QDomElement of the <holiday>...</holiday> element, we can easily load its contents into the Holiday structure, using only methods that we have already seen:

Holiday h;
QDomElement v = e.namedItem("name").toElement();
if ( !v.isNull() ) h.mName = v.text();
v = e.namedItem("date").toElement();
if ( !v.isNull() ) {
  h.mDate = QDate::fromString( v.text(), Qt::ISODate );
}
hs.mHolidays.append( h );

This concludes our method for loading the holiday set data from the DOM tree representation into the HolidaySet data structure. The whole parseXMLwithDOM function thus reads:

HolidaySet parseXMLwithDOM( QDomDocument &domTree ) 
{
  HolidaySet hs( QString::null );
  
  QDomElement set = domTree.namedItem("holidayset").toElement();
  if ( set.isNull() ) {
    qWarning() << "No <holidayset> element found at the top-level "
               << "of the XML file!";
    return hs; // no holiday set found
  }
  
  if ( set.hasAttribute("country") ) {
    hs.mCountry = set.attribute("country");
  }
  
  // Way 1: Explicitly search for a given element:
  QDomElement name = set.namedItem("name").toElement();
  if ( !name.isNull() ) { // We have a <name>..</name> element in the set
    hs.mName = name.text();
  }

  // Way 2: Loop through all child nodes with a given tag name.
  QDomElement n = set.firstChildElement( "holiday" );
  for ( ; !n.isNull(); n = n.nextSiblingElement( "holiday" ) ) {
    Holiday h;
    QDomElement e = n.toElement();
    QDomElement v = e.namedItem("name").toElement();
    if ( !v.isNull() ) h.mName = v.text();
    v = e.namedItem("date").toElement();
    if ( !v.isNull() ) {
      h.mDate = QDate::fromString( v.text(), Qt::ISODate );
    }
    hs.mHolidays.append( h );
  }

  // Way 3: Loop through all child nodes and check if it is an element 
  //        with one of the wanted tagnames
  QDomNode nd = set.firstChild();
  for ( ; !nd.isNull(); nd = nd.nextSibling() ) {
    if ( nd.isElement() && nd.toElement().tagName() == "holiday" ) {
      QDomElement n = nd.toElement();
      // Same code as above...
    }
  }

  return hs;
}

Introduction to XML Namespaces

If you are working with complex XML files, sooner or later you will realize that you are using tags and attributes that can be categorized according to their meaning. Even worse, you might encounter tags with the same name used for completely different things. For example, if you are managing a large document describing books and their authors, you might use a <title> element to denote the title of the book, as well as the job title of the author.

So, clearly there should be a way to distinguish these two elements with the same name but different usages. This can be done via XML namespaces, using prefixes on the elements. A namespace is a unique URI (not necessarily a URL!), e.g. "http://reinhold.kainhofer.com/ns/qtdom-examples" or "urn:kde:developer:tutorials:QtDom:holidays". Rather than using these long and awkward URIs with each element, you define a prefix for each of the namespaces. This is done via an

  xmlns:yourprefix="your:namespace:uri"

attribute on the element or any of its ancestors. This defines the "yourprefix" prefix as a shortcut for the full namespace URI "your:namespace:uri". Usually, the namespace declaration is added to the root element of the XML file. To use an element from this namespace, you simply prepend the prefix to the tag name: <yourprefix:tag>. This tag name together with the prefix is called the "qualified name" (or qName for short) of the element. The various parts of the element (which you will find in the Qt DOM API, too) are:

  qualified name (qName):   yourprefix:tag
  prefix:                   yourprefix
  local name:               tag
  namespace URI:            your:namespace:uri

(As a side note: If an element does not use any namespaces and thus also no prefix, e.g. <tag>, all these parts of the qName are empty and only the "tag name" is set to "tag".)
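Splitting a qualified name into its prefix and local name is purely a string operation; a minimal standard C++ sketch follows (splitQName is a hypothetical helper, not part of Qt). It uses the Namespaces in XML convention that an unprefixed name is its own local part; as noted above, Qt's non-namespace nodes instead report an empty local name. Looking up the namespace URI for the prefix additionally requires the xmlns declarations in scope, which this sketch does not attempt:

```cpp
#include <string>
#include <utility>

// Hypothetical helper: split a qName such as "h:holiday" into
// (prefix, localName). Without a colon, the prefix is empty.
std::pair<std::string, std::string> splitQName(const std::string &qName) {
    const std::string::size_type colon = qName.find(':');
    if (colon == std::string::npos)
        return {std::string(), qName};
    return {qName.substr(0, colon), qName.substr(colon + 1)};
}
```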

Using this new knowledge about namespaces, of course, we want to generate and process holiday files with proper namespaces. For example, we could define all the tags to be in the namespace "urn:kde:developer:tutorials:QtDom:holidays", indicated by the prefix "h". A typical XML file would then look like:

<?xml version='1.0' encoding='UTF-8'?>
<h:holidayset xmlns:h="urn:kde:developer:tutorials:QtDom:holidays"
              h:country="at">
  <h:name>Holidays for Austria</h:name>
  <h:holiday>
    <h:name>New Year's Day</h:name>
    <h:date>2007-01-01</h:date>
  </h:holiday>
  <h:holiday>
    <h:name>Christmas</h:name>
    <h:date>2007-12-24</h:date>
  </h:holiday>
</h:holidayset>

Generating XML documents with namespaces using Qt

So far, we created elements (without namespaces) using the QDomDocument::createElement(...) method. Qt, or rather DOM, offers a simple way to generate elements with a namespace attached:

  QDomDocument::createElementNS( const QString &nsURI, const QString &qName )

To create an element with a namespace attached, the only difference to the example from the previous sections is that you need to use this method instead of QDomDocument::createElement( const QString &tag ). The first parameter is the full URI of the namespace (e.g. "urn:kde:developer:tutorials:QtDom:holidays" in our example XML code), while the second argument is no longer the tag name alone (like "holiday"), but rather the full qName, including the namespace prefix. So, to create the holiday element in the namespace with prefix "h", simply replace the call to

  doc.createElement( "holiday" );

with

  doc.createElementNS( "urn:kde:developer:tutorials:QtDom:holidays", "h:holiday" );

Again, we can write our own helper method "addElementNS":

/* Helper function to generate a DOM Element for the given DOM document 
   and append it to the children of the given node. */
QDomElement addElementNS( QDomDocument &doc, QDomNode &node, 
                          const QString &nsURI, const QString &qName, 
                          const QString &value = QString::null )
{
  QDomElement el = doc.createElementNS( nsURI, qName );
  node.appendChild( el );
  if ( !value.isNull() ) {
    QDomText txt = doc.createTextNode( value );
    el.appendChild( txt );
  }
  return el;
}

Qt will automatically keep track of the namespaces and their prefixes and insert the appropriate xmlns:h attribute into the right element. You don't have to add this attribute manually!

Similar to the createElementNS method, QDomElement provides the method QDomElement::setAttributeNS to add a namespaced attribute to a given DOM element.

The whole "holidaySetToXML" method thus becomes:

QString holidaySetToXML( const HolidaySet &hs )
{
  QDomDocument doc;
  QDomProcessingInstruction instr = doc.createProcessingInstruction( 
                    "xml", "version='1.0' encoding='UTF-8'");
  doc.appendChild(instr);
  
  const QString ns("urn:kde:developer:tutorials:QtDom:holidays");

  // generate the <h:holidayset> tag as the root tag, add the country
  // attribute if needed
  QDomElement holidaySetElement = addElementNS( doc, doc, ns, "h:holidayset" );
  if ( !hs.mCountry.isEmpty() ) 
    holidaySetElement.setAttributeNS( ns, "h:country", hs.mCountry );
  
  // Add the <h:name> element to the holidayset
  if ( !hs.mName.isEmpty() ) 
    addElementNS( doc, holidaySetElement, ns, "h:name", hs.mName );

  // Add each holiday as an <h:holiday>..</h:holiday> element
  QList<Holiday>::ConstIterator i;
  for ( i = hs.mHolidays.begin(); i != hs.mHolidays.end(); ++i) {
     QDomElement h = addElementNS( doc, holidaySetElement, ns, "h:holiday" );
     addElementNS( doc, h, ns, "h:name", (*i).mName );
     addElementNS( doc, h, ns, "h:date", (*i).mDate.toString( Qt::ISODate ) );
  }

  return doc.toString();
}

Workaround for broken namespace handling in Qt 4.2.0 and before

Unfortunately, with Qt versions up to at least 4.2.0, the correct way laid out above does not produce well-formed XML:

<?xml version='1.0' encoding='UTF-8'?>
<h:holidayset xmlns:h="urn:kde:developer:tutorials:QtDom:holidays" h:country="at" xmlns:h="urn:kde:developer:tutorials:QtDom:holidays" >
 <h:name xmlns:h="urn:kde:developer:tutorials:QtDom:holidays">Holidays for Austria</h:name>
 <h:holiday xmlns:h="urn:kde:developer:tutorials:QtDom:holidays">
  <h:name xmlns:h="urn:kde:developer:tutorials:QtDom:holidays">New Year</h:name>
  <h:date xmlns:h="urn:kde:developer:tutorials:QtDom:holidays">2007-01-01</h:date>
 </h:holiday>
 <h:holiday xmlns:h="urn:kde:developer:tutorials:QtDom:holidays">
  <h:name xmlns:h="urn:kde:developer:tutorials:QtDom:holidays">Christmas</h:name>
  <h:date xmlns:h="urn:kde:developer:tutorials:QtDom:holidays">2006-12-24</h:date>
 </h:holiday>
</h:holidayset>

This XML is much more verbose than we want: Qt inserts the xmlns:h attribute into every element that uses the "h" namespace! Even worse, the <h:holidayset> element has this attribute twice, and a duplicated attribute makes the document malformed. This is a bug in Qt that has been fixed in later versions, but for older versions you'll have to resort to a rather ugly workaround: simply don't create the elements with createElementNS, but with createElement (while still giving the whole qName, including the "h:" prefix!). Consequently, you now also have to create the xmlns:h attribute yourself.

QString holidaySetToXML( const HolidaySet &hs )
{
  QDomDocument doc;
  QDomProcessingInstruction instr = doc.createProcessingInstruction( 
                    "xml", "version='1.0' encoding='UTF-8'");
  doc.appendChild(instr);
  
  const QString ns("urn:kde:developer:tutorials:QtDom:holidays");

  // generate the <h:holidayset> tag as the root tag, add the country
  // attribute if needed
  QDomElement holidaySetElement = addElement( doc, doc, "h:holidayset" );
  holidaySetElement.setAttribute( "xmlns:h", ns );
  if ( !hs.mCountry.isEmpty() ) 
    holidaySetElement.setAttribute( "h:country", hs.mCountry );
  
  // Add the <h:name> element to the holidayset
  if ( !hs.mName.isEmpty() ) 
    addElement( doc, holidaySetElement, "h:name", hs.mName );

  // Add each holiday as an <h:holiday>..</h:holiday> element
  QList<Holiday>::ConstIterator i;
  for ( i = hs.mHolidays.begin(); i != hs.mHolidays.end(); ++i) {
     QDomElement h = addElement( doc, holidaySetElement, "h:holiday" );
     addElement( doc, h, "h:name", (*i).mName );
     addElement( doc, h, "h:date", (*i).mDate.toString( Qt::ISODate ) );
  }

  return doc.toString();
}

The output now becomes the correct XML code:

<?xml version='1.0' encoding='UTF-8'?>
<h:holidayset xmlns:h="urn:kde:developer:tutorials:QtDom:holidays" h:country="at" >
 <h:name>Holidays for Austria</h:name>
 <h:holiday>
  <h:name>New Year</h:name>
  <h:date>2007-01-01</h:date>
 </h:holiday>
 <h:holiday>
  <h:name>Christmas</h:name>
  <h:date>2006-12-24</h:date>
 </h:holiday>
</h:holidayset>
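The rule behind this corrected output is that a prefix is declared once, on the outermost element that uses it, and all descendants inherit that binding. The bookkeeping can be sketched in standard C++ as follows. NsElem and serialize are hypothetical names and this is not Qt's serializer; the sketch writes neither indentation nor ordinary attributes, and it does not escape text content:

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical namespaced element: prefix, namespace URI, local name,
// text content and child elements.
struct NsElem {
    std::string prefix, uri, local, text;
    std::vector<NsElem> children;
};

// Serializes e. inScope maps each prefix to the URI bound by the ancestors;
// it is passed by value, so a declaration made here is only visible to
// this element's descendants.
std::string serialize(const NsElem &e,
                      std::map<std::string, std::string> inScope = {}) {
    const std::string qName =
        e.prefix.empty() ? e.local : e.prefix + ":" + e.local;
    std::string out = "<" + qName;

    const auto it = inScope.find(e.prefix);
    const bool alreadyBound =
        (it != inScope.end()) ? it->second == e.uri : e.uri.empty();
    if (!alreadyBound) {
        // Declare the binding once, for this element and its subtree.
        out += e.prefix.empty() ? std::string(" xmlns=\"")
                                : " xmlns:" + e.prefix + "=\"";
        out += e.uri + "\"";
        inScope[e.prefix] = e.uri;
    }

    out += ">" + e.text;   // text is not escaped in this sketch
    for (const NsElem &child : e.children)
        out += serialize(child, inScope);
    return out + "</" + qName + ">";
}
```

Serializing an <h:holidayset> that contains an <h:holiday> with an <h:name> yields a single xmlns:h declaration on the root element, just like the workaround output above.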

Generating XML documents with namespaces using brute-force

Of course, no one forces you to use Qt's DOM classes to generate the XML code. After all, the resulting XML is only plain text! The most straightforward approach would thus be to directly generate the text that contains the XML.

QString holidaySetToXML( const HolidaySet &hs )
{
  const QString ns("urn:kde:developer:tutorials:QtDom:holidays");
  QString result("<?xml version='1.0' encoding='UTF-8'?>\n");
  
  // generate the <h:holidayset> tag as the root tag, add the country
  // attribute if needed
  result += "<h:holidayset xmlns:h=\"" + ns + "\" ";
  if ( !hs.mCountry.isEmpty() ) 
    result += "h:country=\"" + hs.mCountry + "\" ";
  result += ">\n";
  
  // Add the <h:name> element to the holidayset
  if ( !hs.mName.isEmpty() ) 
    result += " <h:name>" + hs.mName + "</h:name>\n";

  // Add each holiday as an <h:holiday>...</h:holiday> element
  QList<Holiday>::ConstIterator i;
  for ( i = hs.mHolidays.begin(); i != hs.mHolidays.end(); ++i) {
    result += " <h:holiday>\n";
    result += "  <h:name>" + (*i).mName + "</h:name>\n";
    result += "  <h:date>" + (*i).mDate.toString( Qt::ISODate ) + "</h:date>\n";
    result += " </h:holiday>\n";
  }
  
  // Finally, close the enclosing <h:holidayset> element
  result += "</h:holidayset>\n";

  return result;
}

This approach works fine if your only goal is to create the XML in order to write it to a file or pipe it into another application, library or web service.

All is not so well, though!

Because you create the text directly, there is no easy way to modify the XML afterwards, short of parsing it again into a DOM tree or a similar tree representation.

Another drawback is that nothing checks whether the generated XML is even well-formed. It is easy to forget to escape a "<" in the holiday name (which the DOM classes automatically do for you), or to misspell a closing tag or leave one out altogether (the DOM classes always write matching opening and closing tags). The DOM classes give you all these advantages for practically no cost.

Imagine, for example, that a holiday has the name

  h.mName = QString( "New Year </h:holiday>" );

If you directly create the text with the method above, the XML will have a line

  <h:name>New Year </h:holiday></h:name>

which is simply malformed XML! In contrast, if you use Qt's DOM classes (or any other DOM implementation, for that matter), the resulting XML will be

  <h:name>New Year &lt;/h:holiday></h:name>

which is correct XML, as the < is escaped using the &lt; entity. The same result can of course be obtained by manually calling an escape method (like the one provided by Qt) on each and every string that is appended to the text output. Using Qt's XML classes takes away the need to think about such issues, and the XML is always produced correctly.
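To illustrate what such a manual escape step involves, here is a minimal sketch in plain standard C++ (the function name xmlEscape is made up for this example; it is not part of Qt). It replaces the five characters that XML treats specially with their predefined entities. It also escapes ">" and quotes, which is not required in element content but is always allowed, so its output differs slightly from the &lt;/h:holiday> shown above:

```cpp
#include <string>

// Hypothetical helper: escapes the five characters that are special in XML
// character data and attribute values. Processing one character at a time
// means that an '&' produced by an earlier replacement is never escaped twice.
std::string xmlEscape(const std::string &text)
{
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        switch (c) {
        case '&':  out += "&amp;";  break;
        case '<':  out += "&lt;";   break;
        case '>':  out += "&gt;";   break;
        case '"':  out += "&quot;"; break;
        case '\'': out += "&apos;"; break;
        default:   out += c;        break;
        }
    }
    return out;
}
```

In the brute-force generator, every holiday name, country and date would have to pass through such a function before being appended; the DOM classes do this for every text node and attribute value automatically.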



Loading XML documents with namespaces using Qt

Now that we have created XML documents with namespaced tags, we also want to be able to load them using Qt's native classes. Parsing the XML into a DOM tree works with the same method QDomDocument::setContent as above, but we must use an overload that takes a namespaceProcessing argument and pass true. Only then does the parser store the prefix, local name and namespace URI in the QDomNode children and deeper descendants; without namespace processing, QDomNode::localName() and QDomNode::namespaceURI() stay empty. When we didn't use namespaces, we could use a call like QDomNode::namedItem( "tagname" ) to obtain a child element with the given tag name. Since the tag name is now connected to a namespace, we have to make sure both that the local name (obtained via QDomNode::localName()) matches and that QDomNode::namespaceURI() matches the required namespace URI.

Unfortunately, Qt (or rather the DOM standard) does not offer a method namedItemNS, analogous to QDomNode::namedItem, that returns the child element with a given tag name from a given namespace. There is a method

QDomNodeList elementsByTagNameNS ( const QString & nsURI, const QString & localName ) const

but that method is not what we actually want, because it is recursive: it returns a list of all descendant elements with the given local name and namespace, found at any depth. Thus it will also return grandchildren, great-grandchildren etc. of the DOM node. We only want immediate children, so this method cannot be used.
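To make the difference concrete, the following sketch uses a tiny, hypothetical Node structure in plain C++ (not a Qt type) that stores only a namespace URI, a local name and the child elements. childNS() inspects immediate children only, like the namedItemNS() helper developed next, while descendantsNS() walks the whole subtree, like elementsByTagNameNS. Both function names are made up for this illustration:

```cpp
#include <string>
#include <vector>

// Hypothetical, minimal stand-in for a namespace-aware DOM element (not a Qt type).
struct Node {
    std::string nsURI;
    std::string localName;
    std::vector<Node> children;
};

// Like the namedItemNS() helper: inspects the immediate children only and
// returns the first match, or nullptr if there is none.
const Node *childNS(const Node &parent, const std::string &nsURI,
                    const std::string &localName)
{
    for (const Node &child : parent.children)
        if (child.nsURI == nsURI && child.localName == localName)
            return &child;
    return nullptr;
}

// Like elementsByTagNameNS(): collects matching descendants at any depth,
// in document (pre-)order.
void descendantsNS(const Node &parent, const std::string &nsURI,
                   const std::string &localName, std::vector<const Node *> &out)
{
    for (const Node &child : parent.children) {
        if (child.nsURI == nsURI && child.localName == localName)
            out.push_back(&child);
        descendantsNS(child, nsURI, localName, out);
    }
}
```

For an <h:holidayset> containing an <h:name> and one <h:holiday> with its own <h:name>, childNS() finds only the holiday set's name, whereas descendantsNS() returns both name elements.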

We can, of course, write such a method ourselves (thanks go to David Faure for the hint!):

QDomElement namedItemNS( const QDomNode& node, const char* nsURI, const char* localName )
{
  QDomElement n = node.firstChildElement();
  for ( ; !n.isNull(); n = n.nextSiblingElement() ) {
    if ( n.localName() == localName && n.namespaceURI() == nsURI ) {
      return n;
    }
  }
  return QDomElement();
}

Notice that we check the local name and the namespace URI to find the appropriate element. The namespace prefix cannot be used, as it is not fixed: you can use whatever prefix you want for a given namespace URI, although many commonly used namespaces have conventional prefixes. Similarly, you should not check the tag name, as it may also contain the prefix (or, if the document uses a default namespace, it carries no namespace information at all!).
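The following plain C++ sketch illustrates why the pair (namespace URI, local name) is the robust thing to compare. The resolve() function and the prefix map are assumptions made for this illustration, not Qt API: resolve() turns a qualified name into that pair using the prefix declarations in scope, where the key "" stands for a default namespace declared with xmlns="...". Qualified names with different prefixes bound to the same URI resolve to identical pairs:

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// (namespace URI, local name): what namespace-aware code should compare.
using ExpandedName = std::pair<std::string, std::string>;

// Hypothetical helper: resolves an element name such as "h:name" using the
// prefix -> URI declarations in scope.
ExpandedName resolve(const std::string &qName,
                     const std::map<std::string, std::string> &inScope)
{
    const std::string::size_type colon = qName.find(':');
    if (colon == std::string::npos) {
        // Unprefixed element name: it belongs to the default namespace if one
        // is declared (key ""), otherwise to no namespace at all.
        const auto it = inScope.find("");
        return {it == inScope.end() ? std::string() : it->second, qName};
    }
    // Prefixed name: the prefix must be declared, otherwise at() throws
    // std::out_of_range.
    return {inScope.at(qName.substr(0, colon)), qName.substr(colon + 1)};
}
```

With h and hol both bound to urn:kde:developer:tutorials:QtDom:holidays, "h:name", "hol:name", and an unprefixed "name" under that default namespace all resolve to the same pair, even though their tag names differ.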

Now that we have a method to get the child element with a given namespace URI and local name, we might also want to loop over all children with a given tag name from a certain namespace. This is easily done with the same approach: we loop through all child elements with the wanted tag name, but also make sure that the namespace matches:

QDomElement e = parent.firstChildElement( "holiday" );
while ( !e.isNull() ) {
  if ( e.namespaceURI() == nsURI ) {
    // Do whatever we need to do with the holiday
  }
  e = e.nextSiblingElement( "holiday" );
}

Since we now care about namespaces, we need to explicitly check whether the namespace URI matches the required namespace. The firstChildElement and nextSiblingElement methods ignore namespaces entirely and return every element with the given tag name, from any namespace.


Loading XML documents using Qt and the SAX parser

While the DOM classes we have used so far are a very convenient way to work with XML documents, they also have their drawbacks. In particular, when we simply want to load an XML document into our own data structure, the DOM approach builds up and keeps the whole tree structure of the data in memory, and that tree will use even more memory than the plain-text XML itself, which can be crucial for large XML documents. After reading the XML into the DOM tree, all we do is copy the corresponding values from the tree into our own data structure and discard the whole DOM tree again. That is quite a waste of resources, and there has to be a more efficient way to simply load the XML.

One possible solution is SAX (Simple API for XML), which reads through the XML sequentially and calls hook methods of a QXmlDefaultHandler-derived class. For example, when an opening tag is encountered, the method QXmlDefaultHandler::startElement( nsURI, localName, qName, attributes ) is called. Similarly, QXmlDefaultHandler::endElement is called for a closing tag, and QXmlDefaultHandler::characters( string ) is called for the text contents of an element. Note that you have to keep track of all context information (e.g. enclosing tags/elements) yourself: SAX simply tells you that it encountered a particular element, not where exactly it appears or what comes before or after it in the XML.

To use the SAX classes, we implement our own QXmlDefaultHandler-derived class, named HolidayHandler in our case, and reimplement its methods where needed. In startElement we initialize the corresponding data structure for the opened element, in characters we store a copy of the text, and in endElement we finally assign the holiday name or date, or insert the completed holiday into our holiday list.

The parsing itself is done by the QXmlSimpleReader class, on which we first register our QXmlDefaultHandler-derived class as the content handler (and as the error handler, to receive error reports). Calling QXmlSimpleReader::parse finally triggers the XML parsing.

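#include <QtCore>
#include <QtXml>
#include <cstdlib>

// Holiday and HolidaySet are the classes defined at the beginning of this tutorial
class HolidayHandler : public QXmlDefaultHandler
{
public:
  HolidayHandler() : QXmlDefaultHandler(), holiday(0), holidayset(QString::null) {}
  ~HolidayHandler() { delete holiday; }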
  bool startElement( const QString &/*namespaceURI*/, const QString &localName, const QString &/*qName*/, const QXmlAttributes & atts )
  {
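    if ( localName == "holidayset" ) {
      // Look up the country attribute by its local name, so that both
      // country="at" and h:country="at" are found
      for ( int i = 0; i < atts.count(); ++i ) {
        if ( atts.localName( i ) == "country" && !atts.value( i ).isEmpty() )
          holidayset.mCountry = atts.value( i );
      }
    } else if ( localName == "holiday" ) {
      if ( !holiday ) holiday = new Holiday;
    }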
    content.clear();
    return true;
  }
  
  virtual bool endElement ( const QString &/*namespaceURI*/, const QString & localName, const QString &/*qName*/ )
  {
    if ( localName == "name" ) {
      if ( holiday ) holiday->mName = content;
      else holidayset.mName = content;
    } else if ( localName == "date" ) {
      QDate d = QDate::fromString( content, Qt::ISODate );
      if ( d.isValid() && holiday ) holiday->mDate = d;
    } else if ( localName == "holiday" && holiday ) {
      holidayset.mHolidays.append( *holiday );
      delete holiday;   // the list stores a copy, so free our temporary object
      holiday = 0;
    }
    content.clear();
    return true;
  }

  bool characters(const QString &str)
  {
    content += str;
    return true;
  }

  bool fatalError(const QXmlParseException &exception)
  {
    qDebug() << "Parse error at line " << exception.lineNumber() 
             << ", column " << exception.columnNumber() << ":\n" 
             << exception.message();
    return false;
  }
  
  HolidaySet holidaySet() { return holidayset; }
  
protected:
  Holiday *holiday;
  HolidaySet holidayset;
  QString content;
};


int main(int argc, char *argv[])
{
  if ( argc < 2 ) {
    qWarning() << "Please give the XML file as argument!";
    qWarning() << "Usage:" << argv[0] << "filename.xml";
    return EXIT_FAILURE;
  }
  HolidaySet result( QString::null );
  
  HolidayHandler handler;
  QXmlSimpleReader reader;
  reader.setContentHandler(&handler);
  reader.setErrorHandler(&handler);

  QFile file(argv[1]);
  if (!file.open(QFile::ReadOnly | QFile::Text)) {
      // A console program has no QApplication, so report errors with qWarning
      // instead of a QMessageBox
      qWarning() << "Cannot read file" << argv[1] << ":" << file.errorString();
      return EXIT_FAILURE;
  }

  QXmlInputSource xmlInputSource(&file);
  if ( reader.parse(xmlInputSource) ) {
    result = handler.holidaySet();
  } else {
    return EXIT_FAILURE;
  }

  qDebug() << "Loaded" << result.mHolidays.count() << "holidays";
  return EXIT_SUCCESS;
}
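The Qt handler above mixes the SAX mechanics with the holiday-specific logic. To show just the event-driven pattern, here is the same handler logic as a plain C++ sketch without Qt. The simplified Holiday/HolidaySet structs (dates kept as ISO strings) and the reduced hook signatures are assumptions made for this illustration. In the real program QXmlSimpleReader calls the hooks while it reads the file; in a test you can call them by hand in the order a parser would report them:

```cpp
#include <string>
#include <vector>

// Simplified, hypothetical stand-ins for the tutorial's Holiday and HolidaySet
// classes; the date is kept as an ISO 8601 string instead of a QDate.
struct Holiday {
    std::string name;
    std::string date;
};

struct HolidaySet {
    std::string country;
    std::string name;
    std::vector<Holiday> holidays;
};

// Same logic as the Qt HolidayHandler, with reduced hook signatures: a
// SAX-style parser calls these methods in document order, and the handler
// itself must remember the context (are we currently inside a <holiday>?).
class HolidayHandler {
public:
    // 'country' stands in for the attribute list of the opening tag.
    void startElement(const std::string &localName, const std::string &country = "")
    {
        if (localName == "holidayset") {
            if (!country.empty())
                set_.country = country;
        } else if (localName == "holiday") {
            inHoliday_ = true;
            current_ = Holiday{};
        }
        content_.clear();
    }

    // Text may arrive in several chunks, so it is accumulated.
    void characters(const std::string &text) { content_ += text; }

    void endElement(const std::string &localName)
    {
        if (localName == "name") {
            if (inHoliday_)
                current_.name = content_;
            else
                set_.name = content_;
        } else if (localName == "date" && inHoliday_) {
            current_.date = content_;
        } else if (localName == "holiday" && inHoliday_) {
            set_.holidays.push_back(current_);
            inHoliday_ = false;
        }
        content_.clear();
    }

    const HolidaySet &result() const { return set_; }

private:
    HolidaySet set_;
    Holiday current_;
    bool inHoliday_ = false;
    std::string content_;
};
```

The inHoliday_ flag is what lets the handler decide whether a closing </h:name> belongs to the holiday set or to a holiday: SAX reports the element, not its position in the tree.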


Transforming XML documents into other formats

Now imagine that we want to create a nice-looking HTML file from our holiday XML. Of course, we could implement the HTML creation in C++ (maybe even using the DOM classes as described above, as HTML can be written as XML in the form of XHTML). However, hardcoding something like this in C++ is very inflexible, and any slight change requires recompilation. Furthermore, users cannot change the procedure at all!

Fortunately, there is a W3C-specified way to transform XML files into practically any other format: XSLT.

XSLT is a pattern-based approach to formatting XML. In an XSL style sheet you define templates, which are applied to all XML elements or nodes that match a given pattern. Inside each template, you simply generate the desired output for that node. As a consequence, XSLT processing is path-independent and state-less: when you process a given element with a template, you carry no state about where in the XML tree that element appears, and correct XSLT should not need to know.

As an example, let's say we want to create a nice HTML-formatted list of all holidays in the holiday file. One possible XSLT style sheet for the holiday file would be:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" 
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:h="urn:kde:developer:tutorials:QtDom:holidays"
                exclude-result-prefixes='xsl h'
>
<xsl:output method="html"/>

<xsl:template match="*"><xsl:apply-templates/></xsl:template>

<xsl:template match="h:holidayset">
  <html><body>
    <xsl:if test="h:name"><h1><xsl:value-of select="h:name"/></h1></xsl:if>
    <xsl:if test="@h:country">
      <h2>Country: <xsl:value-of select="@h:country"/></h2>
    </xsl:if>
    <table border="1">
      <tr>
        <th>Date</th><th>Holiday name</th>
      </tr>
      <xsl:apply-templates select="h:holiday">
        <xsl:sort select="h:date"/>
      </xsl:apply-templates>
    </table>
  </body></html>
</xsl:template>

<xsl:template match="h:holiday">
  <tr>
    <td><xsl:value-of select="h:date"/></td>
    <td><xsl:value-of select="h:name"/></td>
  </tr>
</xsl:template>

</xsl:stylesheet>
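The <xsl:sort select="h:date"/> instruction in this style sheet compares the dates as text, since text is the default data type of xsl:sort in XSLT 1.0. The table still comes out in chronological order because the dates are written in ISO 8601 format (as produced by QDate::toString( Qt::ISODate )): the zero-padded year, month and day appear from most to least significant, so text order equals date order. The following plain C++ sketch (sortAsText is a made-up helper) demonstrates this with std::sort, which likewise orders std::string values lexicographically:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: sorts values purely as text, the way a text-based
// <xsl:sort> compares them. For zero-padded ISO 8601 dates (YYYY-MM-DD) this
// order is also the chronological order; for day-first formats it is not.
std::vector<std::string> sortAsText(std::vector<std::string> values)
{
    std::sort(values.begin(), values.end());
    return values;
}
```

With ISO dates, Christmas 2006 ("2006-12-24") correctly sorts before New Year 2007 ("2007-01-01"); written day-first as "24.12.2006" and "01.01.2007", the same text sort would put New Year first.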

Initial Author: Reinhold Kainhofer