Development/Tutorials/Programming Tutorial KDE 3/KHTML

From KDE TechBase
Revision as of 20:32, 29 June 2011 by Neverendingo (talk | contribs) (Text replace - "<code cppqt n>" to "<syntaxhighlight lang="cpp-qt" line>")

For HTML parsing, you have the following possibilities:

  • QXML
  • QDOM
  • Perl
  • XHTML

QXML and QDOM require XML-compliant input, and very few HTML pages are XML-compliant. Perl is outside the scope of this site. This tutorial takes the XHTML approach, using the DOM classes that KHTML provides.
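
To make "XML-compliant" more concrete: one common way for HTML to violate XML well-formedness is an attribute without a value, such as the nowrap attribute in the markup used later in this tutorial. The following is a minimal sketch in plain C++, not tied to Qt or KDE. The helper name hasValuelessAttribute is hypothetical, and it checks only this one rule on a single start tag; it is not a validator.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of Qt or KDE.
static bool isSpace(char c)
{
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}

// Returns true if the start tag contains an attribute without "=value",
// e.g. <a href="/index.php" nowrap>. Such a tag is accepted in HTML but
// is not well-formed XML. Only this single rule is checked.
bool hasValuelessAttribute(const std::string &tag)
{
    std::size_t i = 0;
    const std::size_t n = tag.size();
    if (i < n && tag[i] == '<') ++i;
    // Skip the element name.
    while (i < n && !isSpace(tag[i]) && tag[i] != '>' && tag[i] != '/') ++i;
    while (i < n) {
        while (i < n && isSpace(tag[i])) ++i;
        if (i >= n || tag[i] == '>' || tag[i] == '/') break;
        // Read the attribute name.
        while (i < n && !isSpace(tag[i]) && tag[i] != '=' &&
               tag[i] != '>' && tag[i] != '/') ++i;
        while (i < n && isSpace(tag[i])) ++i;
        if (i >= n || tag[i] != '=') return true; // name without a value
        ++i; // skip '='
        while (i < n && isSpace(tag[i])) ++i;
        if (i < n && (tag[i] == '"' || tag[i] == '\'')) {
            const char quote = tag[i++];
            while (i < n && tag[i] != quote) ++i; // skip the quoted value
            if (i < n) ++i;                       // skip the closing quote
        } else {
            while (i < n && !isSpace(tag[i]) && tag[i] != '>') ++i;
        }
    }
    return false;
}
```

Called on the second link of the tutorial markup, which carries nowrap, the function reports a violation; called on the first link, which has only a quoted href, it does not.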

First step

As we remember from http://developernew.kde.org/Development/Tutorials/Programming_Tutorial_KDE_4/How_to_write_an_HTML_parser, the most important requirement is to be able to parse syntax that does not conform to XML. The following program does this.

tags.cpp <syntaxhighlight lang="cpp-qt" line>

#include <kapplication.h>
#include <kaboutdata.h>
#include <kcmdlineargs.h>
#include <kdebug.h>
#include <dom/html_document.h>

int main (int argc, char *argv[]) {

       // KApplication needs KAboutData and KCmdLineArgs to be set up first.
       KAboutData aboutData( "test", "test",
       "1.0", "test", KAboutData::License_GPL,
       "(c) 2006" );
       KCmdLineArgs::init( argc, argv, &aboutData );
       KApplication khello;
       DOM::HTMLDocument doc;
       DOM::DOMString tag("*");
       // The markup is deliberately not XML-conformant: "nowrap" has no value.
       DOM::DOMString html("<html><body><a href=\"http://www.kde.org/\"></a><a href=\"/index.php\" nowrap>Log in</a><a href=\"http://www.gmx.de\"></a></body></html>");
       doc.loadXML(html);
       kdDebug() << "Does this doc have child elements ? " << doc.hasChildNodes() << endl;
       // "*" matches every element; fetch the list once instead of on every iteration.
       DOM::NodeList elements = doc.getElementsByTagName(tag);
       for (unsigned long i=0; i<elements.length(); i++)
              kdDebug() << elements.item(i).nodeName().string() << endl;
       // sizeof gives the size of the DOM::Node handle, not of the document.
       kdDebug() << "Size of a DOM::Node handle " << sizeof(doc.firstChild()) << endl;
       kdDebug() << doc.isHTMLDocument() << endl;
       kdDebug() << doc.toString().string() << endl;
       return 0;

}
</syntaxhighlight>


Compile it like this (adjust the include and library paths to your installation):

g++ -I/usr/lib/qt3/include -I/opt/kde3/include \
-L/opt/kde3/lib -o tags tags.cpp -lkdeui -lkhtml
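
The loop in tags.cpp passes "*" to getElementsByTagName. In the DOM, this wildcard matches every element, and the resulting list is in document order (a preorder walk of the tree). The sketch below illustrates that traversal in plain C++ on a toy tree. The Element struct and the collectByTagName function are hypothetical stand-ins for illustration, not KHTML classes.

```cpp
#include <string>
#include <vector>

// Toy element type, hypothetical; stands in for a DOM element.
struct Element {
    std::string tagName;
    std::vector<Element> children;
};

// Appends the tag names of all descendants of root that match tag, in
// preorder (document order). The special name "*" matches every element.
void collectByTagName(const Element &root, const std::string &tag,
                      std::vector<std::string> &out)
{
    for (const Element &child : root.children) {
        if (tag == "*" || child.tagName == tag)
            out.push_back(child.tagName);
        collectByTagName(child, tag, out); // descend before moving to the next sibling
    }
}
```

For a toy tree shaped like the tutorial markup (html, then body, then three a elements), the wildcard yields html, body, a, a, a in that order, while the name "a" yields only the three links.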

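Second step

The second program lists the elements of a document and then moves a node to a different place in the DOM tree.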

<syntaxhighlight lang="cpp-qt">

#include <kapplication.h>
#include <kaboutdata.h>
#include <kcmdlineargs.h>
#include <kdebug.h>
#include <dom/html_document.h>
#include <dom/html_element.h>
#include <dom/dom_node.h>

int main (int argc, char *argv[]) {

       KAboutData aboutData( "test", "test",
       "1.0", "test", KAboutData::License_GPL,
       "(c) 2006" );
       KCmdLineArgs::init( argc, argv, &aboutData );
       KApplication khello;
       DOM::HTMLDocument doc;
       DOM::DOMString tag("*");
       DOM::DOMString html("<html><body><a href=\"http://www.kde.org/\">fat</a><a href=\"/index.php\" nowrap>Log in</a><a href=\"http://www.gmx.de\"></a></body></html>");
       doc.loadXML(html);
       kdDebug() << "Here's a list of the document elements" << endl;
       // Fetch the element list once instead of on every iteration.
       DOM::NodeList elements = doc.getElementsByTagName(tag);
       for (unsigned long i=0; i<elements.length(); i++)
              kdDebug() << elements.item(i).nodeName().string() << endl;

       // doc2 is parsed here but not used in the rest of the program.
       DOM::HTMLDocument doc2;
       DOM::DOMString html2("<html><body>this is htmlfat</body></html>");
       doc2.loadXML(html2);
       kdDebug() << "This is the in-memory html:" << endl;
       kdDebug() << doc.toString().string() << endl;
       // Take the text node "fat" out of the first <a> and put it in front of that <a>.
       // insertBefore() moves a node that is already in the tree; it does not copy it.
       doc.body().insertBefore(doc.body().firstChild().firstChild(),doc.body().firstChild());
       kdDebug() << "Moving around nodes" << endl;
       kdDebug() << doc.toString().string() << endl;
       return 0;

}
</syntaxhighlight>
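
The insertBefore call in the program above relies on a DOM rule: a node that is already part of the tree is first removed from its old parent and then inserted at the new position, so it is moved rather than duplicated. The following sketch reproduces that rule in plain C++ on a toy tree. The Node struct and both functions are hypothetical stand-ins, not the KHTML DOM::Node API, and error handling (the DOM raises an exception when the reference child is not found) is left out.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Toy node type, hypothetical; stands in for DOM::Node for illustration only.
struct Node {
    std::string name;
    Node *parent;
    std::vector<Node *> children;
    explicit Node(std::string n) : name(std::move(n)), parent(nullptr) {}
};

// Detaches child from its current parent, if it has one.
static void detach(Node &child)
{
    if (!child.parent) return;
    std::vector<Node *> &old = child.parent->children;
    old.erase(std::remove(old.begin(), old.end(), &child), old.end());
    child.parent = nullptr;
}

// Appends child as the last child of parent, moving it if necessary.
void appendChild(Node &parent, Node &child)
{
    detach(child);
    parent.children.push_back(&child);
    child.parent = &parent;
}

// Inserts newChild directly before refChild under parent. A node that
// already sits in the tree is detached first, so it is moved, not copied.
// Simplification: if refChild is not a child of parent, newChild is appended.
void insertBefore(Node &parent, Node &newChild, Node &refChild)
{
    detach(newChild);
    std::vector<Node *> &kids = parent.children;
    std::vector<Node *>::iterator pos =
        std::find(kids.begin(), kids.end(), &refChild);
    kids.insert(pos, &newChild);
    newChild.parent = &parent;
}
```

With a body that contains an a element holding a text node, calling insertBefore(body, text, a) leaves body with two children, the text node followed by the now empty a element, which mirrors what the tutorial program does with the text "fat".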