Development/Tutorials/Programming Tutorial KDE 3/KHTML
For HTML parsing, you have the following possibilities:
- QXML
- QDOM
- perl
- khtml
Obviously, QXML and QDOM need XML-compliant HTML pages, and very few HTML pages are XML-compliant. Perl is outside the scope of this site, so this tutorial chooses the khtml approach.
First step
Our first khtml program does nothing at all: <highlightSyntax language="cpp">
#include <qstring.h>
#include <kapplication.h>
#include <kaboutdata.h>
#include <kmessagebox.h>
#include <kcmdlineargs.h>
#include <dom/html_document.h>

int main (int argc, char *argv[])
{
    KAboutData aboutData( "test", "test", "1.0", "test",
                          KAboutData::License_GPL, "(c) 2006" );
    KCmdLineArgs::init( argc, argv, &aboutData );
    KApplication khello;

    // Construct an empty HTML document; it is discarded immediately.
    DOM::HTMLDocument();
    return 0;
}
</highlightSyntax>
It can be compiled like this:
gcc -I/usr/lib/qt3/include -I/opt/kde3/include \
    -L/opt/kde3/lib -lkdeui -lkhtml -o khtml khtml.cpp
Showing tags
The next program is more advanced. It shows the first tags of an HTML file: <highlightSyntax language="cpp">
#include <kapplication.h>
#include <kaboutdata.h>
#include <kcmdlineargs.h>
#include <kdebug.h>
#include <dom/html_document.h>

int main (int argc, char *argv[])
{
    KAboutData aboutData( "test", "test", "1.0", "test",
                          KAboutData::License_GPL, "(c) 2006" );
    KCmdLineArgs::init( argc, argv, &aboutData );
    KApplication khello;

    DOM::Document doc = DOM::Document();
    DOM::HTMLDocument htmldoc = DOM::HTMLDocument();
    DOM::DOMString tag("*");   // "*" matches every element
    doc.loadXML("hello.htm");

    kdDebug() << "Does this doc have child elements ? " << doc.hasChildNodes() << endl;
    kdDebug() << "First child node name: " << doc.firstChild().nodeName().string() << endl;
    kdDebug() << "First grandchild node name: " << doc.firstChild().firstChild().nodeName().string() << endl;
    kdDebug() << "Count of elements in your doc " << doc.getElementsByTagName(tag).length() << endl;
    // sizeof reports the size of the doc object itself, not of the loaded content.
    kdDebug() << "Size of your doc " << sizeof(doc) << endl;
    // doc is an object, not a pointer, so use '.' rather than '->'.
    kdDebug() << doc.toString().string() << endl;
    return 0;
}
</highlightSyntax>
You can use it with, for example, the following hello.htm:
<html>
  <head>
    <title>blah</title>
  </head>
  <body>
    <b>fat</b>
    <a href="http://www.de">denic</a>
  </body>
</html>