Development/Tutorials/Programming Tutorial KDE 3/KHTML

    From KDE TechBase

    Revision as of 15:52, 29 October 2006

    For HTML parsing, you have the following possibilities:

    • QXML
    • QDOM
    • perl
    • khtml

    QXML and QDOM require XML-compliant (well-formed) HTML pages, and most HTML pages are not XML-compliant. Perl is outside the scope of this site. This tutorial therefore uses the khtml approach.

    First step

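    Our first khtml program does nothing visible yet: it only sets up a KDE application and creates an empty HTML document. <highlightSyntax language="cpp">

    #include <qstring.h>
    #include <kapplication.h>
    #include <kaboutdata.h>
    #include <kmessagebox.h>
    #include <kcmdlineargs.h>
    #include <dom/html_document.h>

    int main (int argc, char *argv[]) {
           // Describe the application; KApplication reads the command line
           // through KCmdLineArgs, so init() has to be called first.
           KAboutData aboutData( "test", "test",
           "1.0", "test", KAboutData::License_GPL,
           "(c) 2006" );
           KCmdLineArgs::init( argc, argv, &aboutData );
           KApplication khello;
           // Create an empty HTML document (a temporary, discarded at once).
           DOM::HTMLDocument();
           return 0;
    } </highlightSyntax> It can be compiled like this: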

    g++ -I/usr/lib/qt3/include -I/opt/kde3/include khtml.cpp \
    -L/opt/kde3/lib -lkdecore -lkdeui -lkhtml -o khtml
    

    Showing tags

    The next program is more advanced: it loads hello.htm and prints the names of its first nodes, the number of elements, and the document serialized back to text. <highlightSyntax language="cpp">

    #include <kapplication.h>
    #include <kaboutdata.h>
    #include <kcmdlineargs.h>
    #include <kdebug.h>
    #include <dom/html_document.h>

    int main (int argc, char *argv[]) {
     KAboutData aboutData( "test", "test",
     "1.0", "test", KAboutData::License_GPL,
     "(c) 2006" );
     KCmdLineArgs::init( argc, argv, &aboutData );
     KApplication khello;

     DOM::Document doc=DOM::Document();
     DOM::HTMLDocument htmldoc=DOM::HTMLDocument();   // not used below
     DOM::DOMString tag("*");                          // "*" matches every element name
     doc.loadXML("hello.htm");
     kdDebug() << "Does this doc have child nodes? " << doc.hasChildNodes() << endl;
     kdDebug() << "First child node name: " << doc.firstChild().nodeName().string() << endl;
     kdDebug() << "First grandchild node name: " << doc.firstChild().firstChild().nodeName().string() << endl;
     kdDebug() << "Count of elements in your doc " << doc.getElementsByTagName(tag).length()<< endl;
     // sizeof() is evaluated at compile time: it gives the size of the
     // DOM::Document handle object, not the size of the loaded document.
     kdDebug() << "Size of the DOM::Document object " << sizeof(doc) << endl;
     // Print the whole tree serialized back to markup.
     kdDebug() << doc.toString().string() << endl;
     return 0;
    } </highlightSyntax> You can try it with the following hello.htm, for example:

    <html>
    <head>
    <title>blah</title>
    </head>
    <body>
    <b>fat</b>
    <a href="http://www.de">denic</a>
    </body>
    </html>
    

    Running the program on this file gives an error, because the file is not UTF-16 encoded. Here is how I proceed:

    scorpio:~/html # hexdump hello.htm
    0000000 3c00 000a
    0000003