Development/Tutorials/Programming Tutorial KDE 3/KHTML

Revision as of 20:32, 29 June 2011 by Neverendingo (Talk | contribs) (Text replace - "<code cppqt n>" to "<syntaxhighlight lang="cpp-qt" line>")

For HTML parsing, you have the following possibilities:

  • QXML
  • QDOM
  • Perl
  • KHTML

QXML and QDOM only work on XML-compliant HTML pages, and very few HTML pages are XML-compliant. Perl is outside the scope of this site. This tutorial therefore uses the KHTML approach.
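To make the point about XML compliance concrete, here is a minimal sketch in plain C++ with no Qt or KDE dependencies. The helper `isWellFormed` is hypothetical and is not part of QXML or QDOM. It only checks that tags are closed in the right order, and it ignores comments, doctypes, CDATA sections and attribute syntax. Even this weak check rejects markup that browsers accept without complaint.

```cpp
#include <string>
#include <vector>

// Hypothetical helper for illustration only, not a real XML parser:
// returns true if every opening tag in `markup` is closed in the right order.
// Comments, doctypes, CDATA sections and attribute syntax are not handled.
bool isWellFormed(const std::string &markup)
{
    std::vector<std::string> open;   // stack of currently open element names
    std::string::size_type pos = 0;

    while ((pos = markup.find('<', pos)) != std::string::npos) {
        std::string::size_type end = markup.find('>', pos);
        if (end == std::string::npos)
            return false;            // tag is never terminated

        std::string tag = markup.substr(pos + 1, end - pos - 1);
        pos = end + 1;
        if (tag.empty())
            return false;            // "<>" is not a tag

        if (tag[0] == '/') {
            // A closing tag must match the most recently opened element.
            if (open.empty() || open.back() != tag.substr(1))
                return false;
            open.pop_back();
        } else if (tag[tag.size() - 1] == '/') {
            // Self-closing tag such as <br/>: nothing to track.
        } else {
            // Opening tag: the element name ends at the first whitespace.
            std::string::size_type space = tag.find_first_of(" \t\r\n");
            open.push_back(tag.substr(0, space));
        }
    }
    return open.empty();             // everything opened must be closed
}
```

For example, `isWellFormed("<p>line<br/>break</p>")` returns true, while the very common `isWellFormed("<p>line<br>break</p>")` returns false because `<br>` is never closed.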

First step

As we remember from http://developernew.kde.org/Development/Tutorials/Programming_Tutorial_KDE_4/How_to_write_an_HTML_parser, the most important requirement is being able to parse HTML that is not well-formed XML. The following program does this.

tags.cpp

#include <kapplication.h>
#include <kaboutdata.h>
#include <kcmdlineargs.h>
#include <kdebug.h>
#include <dom/html_document.h>

int main (int argc, char *argv[])
{
    KAboutData aboutData( "test", "test",
        "1.0", "test", KAboutData::License_GPL,
        "(c) 2006" );
    KCmdLineArgs::init( argc, argv, &aboutData );
    KApplication khello;

    DOM::HTMLDocument doc;
    DOM::DOMString tag("*");   // "*" matches every element
    // The valueless "nowrap" attribute makes this markup invalid as XML.
    DOM::DOMString uri("<html><body><a href=\"http://www.kde.org/\"></a><a href=\"/index.php\" nowrap>Log in</a><a href=\"http://www.gmx.de\"></a></body></html>");

    doc.loadXML(uri);
    kdDebug() << "Does this doc have child elements ? " << doc.hasChildNodes() << endl;

    // Query the element list once, then print the name of each element.
    DOM::NodeList nodes = doc.getElementsByTagName(tag);
    for (unsigned long i = 0; i < nodes.length(); i++)
        kdDebug() << nodes.item(i).nodeName().string() << endl;

    // sizeof yields the size of the DOM::Node handle object, not of the document.
    kdDebug() << "Size of the DOM::Node handle " << sizeof(doc.firstChild()) << endl;
    kdDebug() << doc.isHTMLDocument() << endl;
    kdDebug() << doc.toString().string() << endl;
    return 0;
}

Compile it like this:
 g++ -I/usr/lib/qt3/include -I/opt/kde3/include \
 -L/opt/kde3/lib -lkdeui -lkhtml -o tags tags.cpp

The include and library paths depend on your distribution; adjust them as needed.

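The element loop in tags.cpp relies on the DOM rule that `getElementsByTagName("*")` returns every element in document order. As a rough illustration of that order, the following sketch lists element names straight from a markup string using only the C++ standard library. The helper `listElementNames` is hypothetical, it is not how KHTML parses, and it makes no claim about the exact spelling or case KHTML prints for node names.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper for illustration only: lists element names in the
// order in which their opening tags appear in `markup`. It tolerates
// unclosed tags and valueless attributes because it never checks nesting.
std::vector<std::string> listElementNames(const std::string &markup)
{
    std::vector<std::string> names;
    std::string::size_type pos = 0;

    while ((pos = markup.find('<', pos)) != std::string::npos) {
        ++pos;   // step past '<'

        // Skip closing tags, comments and doctypes: a name starts with a letter.
        if (pos >= markup.size() ||
            !std::isalpha(static_cast<unsigned char>(markup[pos])))
            continue;

        // The element name runs until the first non-alphanumeric character.
        std::string::size_type end = pos;
        while (end < markup.size() &&
               std::isalnum(static_cast<unsigned char>(markup[end])))
            ++end;

        names.push_back(markup.substr(pos, end - pos));
        pos = end;
    }
    return names;
}
```

Called on the markup string from tags.cpp, this returns `html`, `body`, `a`, `a`, `a`, which is the order in which the elements open in the source text.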
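Second step

The second program rearranges nodes in the parsed document: insertBefore takes the <b> element out of the first link and places it in front of that link.
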
<syntaxhighlight lang="cpp-qt">
#include <kapplication.h>
#include <kaboutdata.h>
#include <kcmdlineargs.h>
#include <kdebug.h>
#include <dom/html_document.h>
#include <dom/html_element.h>
#include <dom/dom_node.h>

int main (int argc, char *argv[])
{
    KAboutData aboutData( "test", "test",
        "1.0", "test", KAboutData::License_GPL,
        "(c) 2006" );
    KCmdLineArgs::init( argc, argv, &aboutData );
    KApplication khello;

    DOM::HTMLDocument doc;
    DOM::DOMString tag("*");   // "*" matches every element
    DOM::DOMString uri("<html><body><a href=\"http://www.kde.org/\"><b>fat</b></a><a href=\"/index.php\" nowrap>Log in</a><a href=\"http://www.gmx.de\"></a></body></html>");

    doc.loadXML(uri);
    kdDebug() << "Here's a list of the document elements" << endl;
    DOM::NodeList nodes = doc.getElementsByTagName(tag);
    for (unsigned long i = 0; i < nodes.length(); i++)
        kdDebug() << nodes.item(i).nodeName().string() << endl;

    // A second document is parsed here, but it is not used further below.
    DOM::HTMLDocument doc2;
    DOM::DOMString uri2("<html><body>this is html<b>fat</b></body></html>");
    doc2.loadXML(uri2);

    kdDebug() << "This is the in-memory html:" << endl;
    kdDebug() << doc.toString().string() << endl;

    // Move the first child of the first link (<b>) in front of that link.
    doc.body().insertBefore(doc.body().firstChild().firstChild(), doc.body().firstChild());
    kdDebug() << "Moving around nodes" << endl;
    kdDebug() << doc.toString().string() << endl;
    return 0;
}
</syntaxhighlight>
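
The insertBefore call above moves a node rather than copying it: in the DOM, a node that is already in the tree is removed from its old position before being inserted at the new one. The following sketch reproduces that behaviour on a toy tree in plain C++. `ToyNode`, `detach`, `insertBefore` and `toMarkup` are hypothetical names for this illustration and are not the KHTML classes; unlike a real DOM implementation, the sketch appends the node when the reference child is not found instead of raising an error.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Toy stand-in for a DOM node; this is not the KHTML DOM::Node class.
struct ToyNode {
    std::string name;
    ToyNode *parent;
    std::vector<ToyNode *> children;

    explicit ToyNode(const std::string &n) : name(n), parent(0) {}
};

// Remove `child` from its current parent, if it has one.
void detach(ToyNode *child)
{
    if (!child->parent)
        return;
    std::vector<ToyNode *> &siblings = child->parent->children;
    siblings.erase(std::remove(siblings.begin(), siblings.end(), child),
                   siblings.end());
    child->parent = 0;
}

// Insert `newChild` into `parent` directly before `refChild`.
// A null or unknown `refChild` appends at the end (a simplification).
// A node that is already in the tree is moved, not copied.
void insertBefore(ToyNode *parent, ToyNode *newChild, ToyNode *refChild)
{
    detach(newChild);
    std::vector<ToyNode *>::iterator at =
        refChild ? std::find(parent->children.begin(), parent->children.end(), refChild)
                 : parent->children.end();
    parent->children.insert(at, newChild);
    newChild->parent = parent;
}

// Serialize a subtree as nested tags, for inspection.
std::string toMarkup(const ToyNode *node)
{
    std::string out = "<" + node->name + ">";
    for (std::vector<ToyNode *>::size_type i = 0; i < node->children.size(); ++i)
        out += toMarkup(node->children[i]);
    return out + "</" + node->name + ">";
}
```

Starting from a tree that serializes as `<body><a><b></b></a></body>`, calling `insertBefore(&body, a.children.front(), body.children.front())` yields `<body><b></b><a></a></body>`: the `b` node has left the link and now sits in front of it.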