KDE TechBase

Development/Tutorials/Programming Tutorial KDE 4/How to write an XML parser

A parser analyzes input according to a given grammar and separates the formal language (the markup) from the bulk data it carries. See http://en.wikipedia.org/wiki/Parser for more information. There are two ways to write a parser: load the content of a file into a tree of objects, as known from object-oriented programming (the DOM approach), or call a function every time a reader encounters a given syntax element such as an opening tag (the QXML approach).
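The callback idea behind the second approach can be illustrated without Qt. The following is a minimal sketch in plain C++11, not how QXml is implemented: the names scanStartTags and collectStartTags are hypothetical, and the scanner is deliberately naive (it does not handle comments containing '>', CDATA sections or '>' inside attribute values).

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Toy event-driven scanner (hypothetical helper, not part of Qt):
// calls onStartTag with the tag name for every opening tag it sees.
void scanStartTags(const std::string& text,
                   const std::function<void(const std::string&)>& onStartTag)
{
    std::string::size_type pos = 0;
    while ((pos = text.find('<', pos)) != std::string::npos) {
        const std::string::size_type end = text.find('>', pos);
        if (end == std::string::npos)
            break; // unterminated tag: stop scanning

        // Skip closing tags (</b>), declarations and comments (<!...>),
        // processing instructions (<?...?>) and the empty tag "<>".
        const char first = (pos + 1 < end) ? text[pos + 1] : '\0';
        if (first != '/' && first != '!' && first != '?' && first != '\0') {
            // The tag name runs up to the first whitespace, '/' or '>'.
            const std::string::size_type nameEnd =
                text.find_first_of(" \t\r\n/>", pos + 1);
            assert(nameEnd != std::string::npos); // the '>' at 'end' guarantees a match
            onStartTag(text.substr(pos + 1, nameEnd - (pos + 1)));
        }
        pos = end + 1;
    }
}

// Example "handler": collects the names of all opening tags in order.
std::vector<std::string> collectStartTags(const std::string& text)
{
    std::vector<std::string> names;
    scanStartTags(text, [&names](const std::string& name) { names.push_back(name); });
    return names;
}
```

Calling collectStartTags("<html><body><b>Hi</b><br/></body></html>") yields the names html, body, b and br. The scanner never builds a tree; it only reports events, which is the essential difference from the DOM approach.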

The QXML approach

parser.h:

    /*
    parser.h - demonstration of a parser in C++
    */

    #ifndef PARSER_H
    #define PARSER_H

    #include <QString>
    #include <QtXml/QXmlDefaultHandler>
    #include <QtXml/QXmlAttributes>

    class Parser : public QXmlDefaultHandler
    {
    public:
        Parser();

        /** Inherited from the QXml framework. Called when parsing of the XML document starts. */
        bool startDocument();

        /** Inherited from the QXml framework. Called when the reader encounters an opening tag (e.g. <b>). */
        bool startElement( const QString& namespaceURI, const QString& localName, const QString& qName, const QXmlAttributes& att );
    };

    #endif

parser.cpp:

    /*
    parser.cpp - demonstration of a parser in C++
    */

    #include "parser.h"
    #include <kdebug.h>

    Parser::Parser()
    {
    }

    bool Parser::startDocument()
    {
        kDebug() << "Searching document for tags";
        return true; // returning false would abort parsing
    }

    // Only the qualified tag name is used here; the other parameters are left unnamed.
    bool Parser::startElement( const QString&, const QString&, const QString& qName, const QXmlAttributes& )
    {
        kDebug() << "Found Element" << qName;
        return true;
    }

hello.cpp:

    /*
    hello.cpp
    compile it with
    g++ -I. -I/home/kde-devel/kde/include -I/home/kde-devel/qt-unstable/include/Qt -I/home/kde-devel/qt-unstable/include -I/home/kde-devel/qt-unstable/include/QtXml parser.cpp hello.cpp -L/home/kde-devel/kde/lib -L/home/kde-devel/qt-unstable/lib -lQtCore_debug -lQtXml_debug -lkdecore
    */

    #include <QFile>
    #include <QXmlInputSource>
    #include <QXmlSimpleReader>
    #include "parser.h"

    int main()
    {
        QFile file( "hello.htm" );
        if ( !file.open( QIODevice::ReadOnly ) )
            return 1; // the file could not be opened

        // Stack objects avoid the leaks of the new-without-delete version.
        Parser handler;
        QXmlInputSource source( &file );
        QXmlSimpleReader reader;
        reader.setContentHandler( &handler );

        // parse() returns false if the document is not well-formed.
        return reader.parse( &source ) ? 0 : 1;
    }

The DOM approach

    /*
    dom.cpp
    A demonstration of how to use the DOM parsing framework.
    Prints the name of the first child node of an HTML file's root element,
    i.e. typically "head" or "body".
    Compile it like this (same KDE 4 setup as for hello.cpp):
    g++ -I. -I/home/kde-devel/kde/include -I/home/kde-devel/qt-unstable/include/Qt -I/home/kde-devel/qt-unstable/include dom.cpp \
    -L/home/kde-devel/kde/lib -L/home/kde-devel/qt-unstable/lib -lQtCore_debug -lQtXml_debug -lkdecore
    */
    #include <QDomDocument>
    #include <QFile>
    #include <kdebug.h>

    int main()
    {
        QDomDocument doc( "myDocument" );
        QFile file( "hello.htm" );
        if ( !file.open( QIODevice::ReadOnly ) )
            return 1; // the file could not be opened

        // setContent() returns false if the file is not well-formed XML.
        if ( !doc.setContent( &file ) )
            return 1;

        QDomElement docElement = doc.documentElement();
        QDomNode node = docElement.firstChild();
        kDebug() << node.nodeName();
        return 0;
    }

Drawbacks

Parsing HTML with these XML parsers only works for "legal" documents, i.e. HTML that is also well-formed XML. For example, look at this code:

<html>
  <body>
      <a href="http://www.kde.org/"></a>
      <a href="/index.php?title=Special:User&returnto=Main_Page">Log in</a>
      <a href="http://www.gmx.de"></a>
  </body>
</html>

This code contains a bare & in an attribute value, which makes the parser stop with an error. In well-formed XML the character has to be written as &amp;.
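One possible workaround is to escape such ampersands before handing the text to the parser. The following is a heuristic sketch in plain C++, not something Qt or KDE provides: the function names are hypothetical, and anything that merely looks like a reference (an & followed by letters or digits and a ;) is left untouched, even if it is not a valid entity.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: returns true if the '&' at text[pos] starts something
// shaped like a reference, e.g. &amp; or &#38; (heuristic, no entity table).
bool looksLikeReference(const std::string& text, std::string::size_type pos)
{
    assert(pos < text.size() && text[pos] == '&');
    std::string::size_type i = pos + 1;
    if (i < text.size() && text[i] == '#')
        ++i; // numeric character reference
    const std::string::size_type start = i;
    while (i < text.size() && std::isalnum(static_cast<unsigned char>(text[i])))
        ++i;
    // Needs at least one name character and a terminating ';'.
    return i > start && i < text.size() && text[i] == ';';
}

// Hypothetical helper: replaces every '&' that does not start a reference
// with "&amp;" and copies everything else unchanged.
std::string escapeBareAmpersands(const std::string& text)
{
    std::string result;
    result.reserve(text.size());
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        if (text[i] == '&' && !looksLikeReference(text, i))
            result += "&amp;";
        else
            result += text[i];
    }
    return result;
}
```

Applied to the link above, escapeBareAmpersands turns title=Special:User&returnto=Main_Page into title=Special:User&amp;returnto=Main_Page, while text that already contains &amp; is returned unchanged.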

Here is a second example:

<html>
  <body>
      <a href="http://www.kde.org/"></a>
      <a href="/index.php" nowrap>Log in</a>
      <a href="http://www.gmx.de"></a>
  </body>
</html>

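This code also causes a parse error: nowrap is an attribute without a value, which XML does not allow. To be well-formed it would have to be written with a value, e.g. nowrap="nowrap".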
