Development/KDevelop-PG-Qt Introduction
== Introduction ==
'''KDevelop-PG-Qt''' is a parser generator for ''KDevplatform''. Some of the language support plugins in ''KDevelop'' use it (e.g. Ruby, PHP, Java...).
'''KDevelop-PG-Qt''' is the Qt-based version of the original '''KDevelop-PG''' parser generator, which uses STL data types. Most features are similar, though the ...-Qt parser generator is probably more modern and feature-rich than the plain STL-style generator. Use the ...-Qt version to build parsers for KDevelop language plugins.
== In-Depth Information  ==
This document is not meant to be a complete, in-depth description of all parts of '''KDevelop-PG'''. Instead, it is intended as a short introduction and, more importantly, as a reference for developers.
For more detailed information, read the excellent bachelor thesis by Jakob Petsovits. You will find it via the link in the Web links section at the bottom of this page (''translator's note: unfortunately, this document has been removed and the link is dead. A search for it is under way'').
== Application  ==
=== Usage ===
'''KDevelop-PG-Qt''' can be found in the [https://projects.kde.org/projects/extragear/kdevelop/utilities/kdevelop-pg-qt git] repository. The source code contains 4 example packages.<br />
To download it, try:
{{Input|1=git clone git://anongit.kde.org/kdevelop-pg-qt.git}}
or:
{{Input|1=git clone kde:kdevelop-pg-qt}} (if you have '''git''' installed with the "kde" URL prefix configured)
When run, the program requires a .g file, the so-called grammar file:
{{Input|1=./kdev-pg-qt --output=''prefix'' syntax.g}}
The value of the ''--output'' parameter determines the prefix of the output files and also the namespace of the generated code.
'''Kate''' provides simple syntax highlighting for '''KDevelop-PG-Qt''' grammar files.
=== Output Format ===
While evaluating the grammar and generating its parser files, the application outputs information about so-called ''conflicts'' to STDOUT. As mentioned above, the names of the following files will actually carry the prefix given with ''--output''.
==== ast.h  ====
AST stands for [http://en.wikipedia.org/wiki/Abstract_syntax_tree Abstract Syntax Tree]. It defines the data structure in which the parse tree is saved. Each node is a struct with the suffix ''Ast''. The struct contains members that point to any possible sub-elements.
==== parser.h and parser.cpp  ====
One important part of ''parser.h'' is the definition of the parser tokens, the ''TokenType'' enum. The TokenStream of your lexer should use this enum. You have to write your own lexer or let '''Flex''' generate one. See also the section about Tokenizers/Lexers below.
Once the token stream is available, create your root item and call the parser's parse method for the top-level AST item. For example, for a root of type DocumentAst*, call parseDocument(&root). On success, root will contain the AST.<br />
The parser has one parse method for each possible node of the AST. This is useful, e.g., for an expression parser or for parsers that should only parse a sub-element of a full document.
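To make this calling convention concrete, here is a hand-written stand-in for a generated parser. It is a sketch under assumptions: the token names, `DocumentAst`, `DeclarationAst` and the `Parser` constructor are hypothetical. Real generated code also deals with memory pools, token streams and error recovery. What the sketch mirrors from the text is:

* one `parseXxx` method per AST node,
* each method filling an out-parameter and returning whether it succeeded,
* sub-rules that can be called on their own.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token types; a generated parser.h declares a TokenType enum like this.
enum TokenType { Token_EOF, Token_IDENTIFIER, Token_SEMICOLON };

struct Token {
    TokenType kind;
    std::string text;
};

// One AST struct per grammar symbol, named with the "Ast" suffix.
struct DeclarationAst {
    std::string name;
};

struct DocumentAst {
    std::vector<std::unique_ptr<DeclarationAst>> declarations;
};

class Parser {
public:
    explicit Parser(std::vector<Token> tokens) : tokens_(std::move(tokens)) {}

    // Top-level rule: document -> declaration* EOF
    bool parseDocument(DocumentAst** yynode) {
        auto node = std::make_unique<DocumentAst>();
        while (peek() == Token_IDENTIFIER) {
            DeclarationAst* decl = nullptr;
            if (!parseDeclaration(&decl)) return false;
            node->declarations.emplace_back(decl);
        }
        if (peek() != Token_EOF) return false;
        *yynode = node.release();  // caller owns the tree on success
        return true;
    }

    // Sub-rule that can also be used on its own: declaration -> IDENTIFIER SEMICOLON
    bool parseDeclaration(DeclarationAst** yynode) {
        if (peek() != Token_IDENTIFIER) return false;
        std::string name = tokens_[pos_++].text;
        if (peek() != Token_SEMICOLON) return false;
        ++pos_;
        *yynode = new DeclarationAst{name};
        return true;
    }

private:
    TokenType peek() const {
        return pos_ < tokens_.size() ? tokens_[pos_].kind : Token_EOF;
    }

    std::vector<Token> tokens_;
    std::size_t pos_ = 0;
};
```

On failure the out-parameter is left untouched, so a caller can start with `DocumentAst* root = nullptr;` and check the return value before using `root`.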
==== visitor.h and visitor.cpp  ====
The Visitor class provides an abstract interface to walk the AST. Most of the time you don't need to use it directly; the DefaultVisitor takes some work off your shoulders.
==== defaultvisitor.h and defaultvisitor.cpp  ====
The DefaultVisitor is an implementation of the abstract Visitor interface and automatically visits each node in the AST. Hence, it is probably the best candidate for a base class for your own visitors. Most language plugins use it in their Builder classes to create the DUChain.<br />
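The sketch below strips the relationship between the abstract Visitor, the DefaultVisitor and a visitor of your own down to two node types. The node types and method names are hypothetical stand-ins. The generated visitor.h and defaultvisitor.h are larger and may dispatch differently, but the division of labour is the same:

* the DefaultVisitor handles traversal;
* your subclass overrides only the nodes it cares about.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical AST: a document holds declarations; each declaration has a name.
struct DeclarationAst {
    std::string name;
};

struct DocumentAst {
    std::vector<std::unique_ptr<DeclarationAst>> declarations;
};

// Abstract interface: one visit method per node type.
class Visitor {
public:
    virtual ~Visitor() = default;
    virtual void visitDocument(DocumentAst* node) = 0;
    virtual void visitDeclaration(DeclarationAst* node) = 0;
};

// Walks every node; subclasses override only what they need.
class DefaultVisitor : public Visitor {
public:
    void visitDocument(DocumentAst* node) override {
        for (auto& decl : node->declarations) visitDeclaration(decl.get());
    }
    void visitDeclaration(DeclarationAst*) override {}
};

// A visitor of your own that collects declaration names, as a builder might.
class NameCollector : public DefaultVisitor {
public:
    void visitDeclaration(DeclarationAst* node) override {
        names.push_back(node->name);
        DefaultVisitor::visitDeclaration(node);  // keep the default traversal
    }
    std::vector<std::string> names;
};
```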
=== Command-Line Options  ===
* --namespace=''namespace'' - sets the C++ namespace for the generated sources independently of the file prefix. When this option is set, you can also use / in the --output option
* --no-ast - don't create the ast.h file, more on that below
* --debug-visitor - generates a debug visitor that prints the AST
* --serialize-visitor - generates code for serialization via a QIODevice
* --terminals - all tokens will be written into the file ''kdev-pg-terminals''
* --symbols - all possible nodes of the AST (not the leaves) will be written into the file ''kdev-pg-symbols''
* --rules - all grammar rules, with information about their syntactic correlations, will be written into a file called ''kdev-pg-rules''; useful for debugging and solving conflicts
* --token-text - generates a function to map token numbers onto token names
* --help - print usage information
== Tokenizers/Lexers ==
As mentioned, '''KDevelop-PG-Qt''' requires a tokenizer. You have three options:
* let '''KDevelop-PG-Qt''' generate one for you,
* write one by hand, as has been done for C++ and PHP,
* use external tools like '''Flex'''.
The tokenizer's job, in principle, boils down to:
* converting keywords and characters with special meanings to tokens
* converting literals and identifiers to tokens
* cleaning out anything that doesn't change the semantics, e.g. comments or whitespace (the latter, of course, not in Python)
* while doing the above, handling character encoding (we recommend using UTF-8 as much as possible)
The rest, e.g. actually building the tree and evaluating the semantics, is part of the parser and the AST visitors.<br>
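Here is a minimal hand-written tokenizer that does the jobs listed above for a toy language. It recognizes one keyword (`if`), identifiers, integer literals and `;`, and it drops whitespace and `//` comments. The token names and the `Token` layout are assumptions for illustration; a real lexer would emit the values of the generated ''TokenType'' enum. This sketch also handles ASCII only and skips the encoding step.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical token set; in practice these values come from TokenType in parser.h.
enum TokenType { Token_EOF, Token_IF, Token_IDENTIFIER, Token_NUMBER, Token_SEMICOLON };

struct Token {
    TokenType kind;
    std::size_t begin;   // offset of the lexeme in the input
    std::size_t length;  // length of the lexeme
};

std::vector<Token> tokenize(const std::string& input) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < input.size()) {
        unsigned char c = input[i];
        if (std::isspace(c)) { ++i; continue; }          // whitespace: dropped
        if (input.compare(i, 2, "//") == 0) {            // comment: dropped up to end of line
            while (i < input.size() && input[i] != '\n') ++i;
            continue;
        }
        std::size_t start = i;
        if (std::isalpha(c) || c == '_') {               // keyword or identifier
            while (i < input.size() &&
                   (std::isalnum(static_cast<unsigned char>(input[i])) || input[i] == '_'))
                ++i;
            TokenType kind = input.compare(start, i - start, "if") == 0 ? Token_IF : Token_IDENTIFIER;
            tokens.push_back({kind, start, i - start});
        } else if (std::isdigit(c)) {                    // integer literal
            while (i < input.size() && std::isdigit(static_cast<unsigned char>(input[i]))) ++i;
            tokens.push_back({Token_NUMBER, start, i - start});
        } else if (c == ';') {                           // character with a special meaning
            ++i;
            tokens.push_back({Token_SEMICOLON, start, 1});
        } else {
            throw std::runtime_error("unexpected character at offset " + std::to_string(i));
        }
    }
    tokens.push_back({Token_EOF, input.size(), 0});
    return tokens;
}
```

Note that `iffy` becomes an identifier, not the keyword `if` followed by `fy`. The whole word is scanned first and only then compared against the keyword.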
=== Using KDevelop-PG-Qt ===
'''KDevelop-PG-Qt''' can generate lexers that are well integrated into its architecture (you do not have to create a token-stream class that invokes lex or something similar). See examples/foolisp in the source code for a simplistic example. There is also an incomplete PHP lexer for demonstration purposes.
==== Regular Expressions ====
Lexer rules in KDevelop-PG-Qt are written with regular expressions, using the following syntax (α and β are arbitrary regular expressions, a and b are characters):
*α|β accepts any word accepted by α or accepted by β
*α&β accepts any word accepted by both α and β
*α^β accepts any word accepted by α but not by β
*~α accepts any word not accepted by α
*?α like α, but also accepts the empty word
*α* accepts any (maybe empty) sequence of words accepted by α
*α+ accepts any nonempty sequence of words accepted by α (equivalent to αα*)
*α@β accepts any nonempty sequence of words accepted by α separated by words accepted by β (equivalent to α(βα)*)
*αβ accepts words consisting of a word accepted by α followed by a word accepted by β
*[…] switches to “disjunctive” environment, αβ will get interpreted as α|β, you can use (…) inside the brackets to go back to normal mode
*. accepts any single character
*a-b accepts a single character between a and b (including a and b) by Unicode code point (of course, only characters that can be represented in the encoding used)
*"…" will accept the word enclosed by the quotation marks, escape sequences will still get interpreted
*a accepts the word consisting of the single character a
*Any escape sequence (see below), accepts the word consisting of the character represented by the escape sequence
*{⟨name⟩} accepts any word accepted by the regex named ⟨name⟩
All regular expressions are case-sensitive. Unfortunately, there is currently no way to make them case-insensitive.
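std::regex has no intersection, difference or complement operators. The following sketch therefore evaluates them on top of two ordinary regexes, one word at a time. It only illustrates the meaning of |, &, ^, ~ and @ from the list above. The patterns use std::regex (ECMAScript) syntax, not KDevelop-PG-Qt syntax. Like KDevelop-PG-Qt's, they are case-sensitive. The helper names are hypothetical.

```cpp
#include <regex>
#include <string>

// Membership tests for a whole word w; a and b play the roles of α and β.
bool acceptsOr(const std::regex& a, const std::regex& b, const std::string& w) {     // α|β
    return std::regex_match(w, a) || std::regex_match(w, b);
}
bool acceptsAnd(const std::regex& a, const std::regex& b, const std::string& w) {    // α&β
    return std::regex_match(w, a) && std::regex_match(w, b);
}
bool acceptsMinus(const std::regex& a, const std::regex& b, const std::string& w) {  // α^β
    return std::regex_match(w, a) && !std::regex_match(w, b);
}
bool acceptsNot(const std::regex& a, const std::string& w) {                         // ~α
    return !std::regex_match(w, a);
}

// α@β is equivalent to α(βα)*; build that pattern from two ECMAScript sub-patterns.
std::regex separatedBy(const std::string& item, const std::string& sep) {
    return std::regex("(?:" + item + ")(?:(?:" + sep + ")(?:" + item + "))*");
}
```

A typical use of ^ in a lexer is "identifiers that are not keywords". With `ident = [a-z][a-z0-9]*` and `keyword = if|while`, `acceptsMinus(ident, keyword, w)` holds for `x1` but not for `while`.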
==== Known Escape Sequences ====
There are several escape sequences which can be used to encode special characters:
*\n, \t, \f, \v, \r, \0, \b, \a like in C
*\x, \X, \u or \U followed by hex digits: character represented by this Unicode value (in hex)
*\d, \D followed by decimal digits, same, but in decimal representation
*\o, \O followed by octal digits, same, but in octal representation
*\y, \Y followed by binary digits, same, but in binary representation
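All four numeric notations can name the same character. For example, \x41, \d65, \o101 and \y1000001 all denote "A" (U+0041). The hypothetical helper below decodes such sequences as the list describes, for illustration only. It is not part of KDevelop-PG-Qt, and it covers just a few of the C-style escapes.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Decodes one escape sequence (including the leading backslash) into a code point.
// decodeEscape is a hypothetical helper that only illustrates the notation.
std::uint32_t decodeEscape(const std::string& esc) {
    if (esc.size() < 2 || esc[0] != '\\')
        throw std::invalid_argument("not an escape sequence");
    const std::string digits = esc.substr(2);
    int base = 0;
    switch (esc[1]) {
        case 'n': return '\n';   // a few of the C-style escapes
        case 't': return '\t';
        case 'r': return '\r';
        case '0': return 0;
        case 'x': case 'X': case 'u': case 'U': base = 16; break;
        case 'd': case 'D': base = 10; break;
        case 'o': case 'O': base = 8;  break;
        case 'y': case 'Y': base = 2;  break;
        default: throw std::invalid_argument("unknown escape sequence");
    }
    // Reject empty digit strings and leading signs/whitespace, which stoul would accept.
    if (digits.empty() || !std::isxdigit(static_cast<unsigned char>(digits[0])))
        throw std::invalid_argument("missing digits");
    std::size_t used = 0;
    unsigned long value = std::stoul(digits, &used, base);  // throws if no digit is valid
    if (used != digits.size())
        throw std::invalid_argument("invalid digit for this base");
    return static_cast<std::uint32_t>(value);
}
```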
==== Predefined Named Regexes ====
Some regexes are predefined and can be used with braces: {⟨name⟩}. They are imported from the official Unicode data. Some important ones:
*{alphabetic} any alphabetic character
*{num} any numeric character
*{ascii-range} any character representable in ASCII
*{latin1-range} any character representable in Latin 1 (8 Bit)
*{uppercase}
*{lowercase}
*{math}
==== Rules ====
Rules can be written as:
{{Input|1=
⟨regular expression⟩ TOKEN;
}}
The lexer will then generate the token TOKEN for lexemes matching the given regular expression. Which token is chosen if there are multiple options? We use the ''first longest match'' rule. The lexer takes the longest possible match, consuming as many characters as possible. If there are several such matches, it takes the first one, i.e. the one from the rule defined first.
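The rule can be simulated with std::regex to see how ties and longer matches are resolved. This is only a sketch. The rule table, the ECMAScript patterns and the brute-force prefix search are assumptions for illustration; a generated lexer works on a compiled automaton instead.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// A lexer rule: a regular expression (ECMAScript syntax here) and the token it produces.
struct Rule {
    std::regex pattern;
    std::string token;
};

struct Match {
    std::string token;   // empty if no rule matched
    std::size_t length;  // number of characters consumed
};

// "First longest match": take the longest prefix of the remaining input that
// some rule matches completely; on a tie, the rule listed first wins.
Match firstLongestMatch(const std::vector<Rule>& rules, const std::string& input, std::size_t pos) {
    Match best{"", 0};
    for (const Rule& rule : rules) {
        // Try prefixes from longest to shortest. Only a strictly longer match
        // replaces the current best, so earlier rules keep ties.
        for (std::size_t len = input.size() - pos; len > best.length; --len) {
            if (std::regex_match(input.substr(pos, len), rule.pattern)) {
                best = {rule.token, len};
                break;
            }
        }
    }
    return best;
}
```

With the rules `if` → IF, `[a-z]+` → IDENTIFIER, `[0-9]+` → NUMBER:

* the input `if x` starts with IF (both rules match two characters, and IF is listed first);
* the input `iffy` is a single IDENTIFIER, because that match is longer.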
Rules can perform code actions, and you can also omit the token (then no token will be generated):
{{Input|1=
⟨regular expression⟩ [: ⟨code⟩ :] TOKEN;
⟨regular expression⟩ [: ⟨code⟩ :];
⟨regular expression⟩ ;
}}
There is rudimentary support for ''lookahead'' and for so-called ''barriers'' (our invention):
{{Input|1=
⟨regular expression⟩ %la(⟨regular expression⟩);
⟨regular expression⟩ %ba(⟨regular expression⟩);
}}
The first rule only accepts words that match the first regular expression and are followed by something matching the expression given with %la. The second rule accepts words matched by the first regular expression, but the match never runs into a character sequence matching the regex given with %ba. Currently, however, only fixed-length expressions are allowed in %la and %ba (for example foo|bar, but not qux|garply). When the “first longest match” rule is applied, the %la/%ba expressions count, too.
You can create your own named regexes using an arrow:
{{Input|1=
⟨regular expression⟩ -> ⟨identifier⟩;
}}
The first character of the identifier should not be upper case.
Additionally, there are two special actions:
{{Input|1=
⟨regular expression⟩ %fail;
⟨regular expression⟩ %continue;
}}
%fail stops tokenization. %continue makes the matched characters part of the next token.
==== Rulesets ====
A grammar file can contain multiple ''rulesets''. A ruleset is a set of rules, as described in the previous section. It is declared using:
{{Input|1=
%lexer "name" ->
  ⟨rules⟩
  ;
}}
For your main ruleset, omit the name (its name will be “start”).
Usually the start ruleset is used, but you can change the ruleset in code actions with the macro lxSET_RULE_SET(⟨name⟩). To run code when entering or leaving a ruleset, put %enter [: ⟨code⟩ :]; or %leave [: ⟨code⟩ :]; respectively inside the definition of the ruleset.
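The effect of switching rulesets can be pictured with a small hand-written lexer. It has a start ruleset and a second ruleset for string literals. Everything here is a stand-in: the class, the token spellings and the counters are hypothetical. A generated lexer would call lxSET_RULE_SET from a code action and run the %enter/%leave code itself.

```cpp
#include <string>
#include <vector>

// Hand-written illustration of two rulesets. The active ruleset is a member
// variable; setRuleSet plays the role of lxSET_RULE_SET plus the enter/leave hooks.
class TwoModeLexer {
public:
    enum class RuleSet { Start, String };

    std::vector<std::string> run(const std::string& input) {
        std::vector<std::string> tokens;
        std::string literal;
        for (char c : input) {
            if (ruleSet_ == RuleSet::Start) {
                if (c == '"') {
                    setRuleSet(RuleSet::String);           // switch to the "string" ruleset
                } else if (c != ' ') {
                    tokens.push_back(std::string("CHAR:") + c);
                }
            } else {                                       // RuleSet::String
                if (c == '"') {
                    tokens.push_back("STRING:" + literal);
                    literal.clear();
                    setRuleSet(RuleSet::Start);            // back to the start ruleset
                } else {
                    literal += c;                          // spaces are kept inside strings
                }
            }
        }
        return tokens;
    }

    int enteredStringRuleSet = 0;  // how often the "%enter" code of "string" ran
    int leftStringRuleSet = 0;     // how often the "%leave" code of "string" ran

private:
    void setRuleSet(RuleSet next) {
        if (ruleSet_ == RuleSet::String) ++leftStringRuleSet;  // %leave code
        if (next == RuleSet::String) ++enteredStringRuleSet;   // %enter code
        ruleSet_ = next;
    }

    RuleSet ruleSet_ = RuleSet::Start;
};
```

The two rulesets treat a space differently: the start ruleset skips it, while the string ruleset keeps it as part of the literal.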
==== Further Configuration and Output ====
The standard statements %lexer_declaration_header and %lexer_bits_header are available to include files in the generated lexer.h/lexer.cpp.
With %lexer_base you can specify the base class of the lexer class. By default it is the TokenStream class defined by KDevelop-PG-Qt.
After %lexerclass(bits) you can specify code to be inserted in lexer.cpp.
You have to specify the encoding the lexer should work with internally, using %input_encoding "⟨encoding⟩". Possible values:
*ASCII (7-Bit)
*Latin 1 (8-Bit)
*UTF-8 (8-Bit, full Unicode)
*UCS-2 (16-Bit, UCS-2 part of Unicode)
*UTF-16 (16-Bit, full Unicode)
*UTF-32 (32-Bit, full Unicode)
With %input_stream you can specify which class the lexer should use to get the characters to process. There are some predefined classes:
*QStringIterator, reads from QString, required (internal) encoding: UTF-16 or UCS-2
*QByteArrayIterator, reads from QByteArray, required encoding: ASCII, Latin-1 or UTF-8
*QUtf16ToUcs4Iterator, reads from UTF-16 QString, required encoding: UTF-32 (UCS-4)
*QUtf8ToUcs4Iterator, reads from UTF-8 QByteArray, required encoding: UTF-32 (UCS-4)
*QUtf8ToUcs2Iterator, reads from UTF-8 QByteArray, required encoding: UCS-2
*QUtf8ToUtf16Iterator, reads from UTF-8 QByteArray, required encoding: UTF-16
*QUtf8ToAsciiIterator, reads from UTF-8 QByteArray, will ignore all non-ASCII characters, required encoding: ASCII
Whether you choose UTF-8, UTF-16 or UTF-32 is irrelevant for functionality, but it may significantly affect compile-time and run-time performance. If compilation takes too long, you may want to test your lexer with ASCII. For example, suppose you want to work with a QByteArray containing UTF-8 data and need full Unicode support. You could use any of these combinations:
* QByteArrayIterator with UTF-8 as the internal encoding,
* QUtf8ToUtf16Iterator with UTF-16,
* QUtf8ToUcs4Iterator with UTF-32.
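The trade-off behind these choices appears as soon as the input contains non-ASCII text: the same string has one length in bytes and another in code points. The sketch below decodes UTF-8 into UTF-32 code points and records the byte index at which each one starts. The same distinction between input index and code point applies to the lxCURR_IDX macro. The decoder is a simplified illustration of what a UTF-8-to-UCS-4 iterator does conceptually. It has no error handling and is not the Qt or KDevelop-PG-Qt implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// A decoded character plus the byte index where it starts in the UTF-8 input.
struct CodePoint {
    std::uint32_t value;
    std::size_t byteIndex;
};

// Minimal UTF-8 decoder; assumes well-formed input (no validation).
std::vector<CodePoint> decodeUtf8(const std::string& bytes) {
    std::vector<CodePoint> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;       // number of continuation bytes
        std::uint32_t value = 0;
        if (lead < 0x80)                { value = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { value = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { value = lead & 0x0F; extra = 2; }
        else                            { value = lead & 0x07; extra = 3; }
        const std::size_t start = i++;
        for (std::size_t k = 0; k < extra && i < bytes.size(); ++k, ++i)
            value = (value << 6) | (static_cast<unsigned char>(bytes[i]) & 0x3F);
        out.push_back({value, start});
    }
    return out;
}
```

For the text "héllo €", this yields 7 code points from 10 bytes. The "l" after "é" starts at byte 3, although it is the third character.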
You can also choose between %table_lexer; and %sequence_lexer;.
* With %table_lexer, transitions between the states of the lexer are represented by big tables while generating the lexer (and as cases in the generated code).
* With %sequence_lexer, sequences are stored in a compressed data structure and transitions are represented by nested if-statements.
For UTF-32, %table_lexer is infeasible, so %sequence_lexer is the only option there.
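The difference between the two modes can be pictured with two hand-written encodings of the same tiny automaton for [a-z]+. This is a conceptual sketch, not generated code. In KDevelop-PG-Qt the tables exist while generating and end up as case labels, but the cost grows with the size of the alphabet in the same way. That is why a per-character table does not work for UTF-32.

```cpp
#include <array>
#include <string>

// Both versions recognize [a-z]+ with two states:
// 0 = start, 1 = inside identifier, -1 = reject.

// Table style: one entry per (state, character). With an 8-bit alphabet that is
// 256 entries per state; for full Unicode it would be over a million per state.
struct TableDfa {
    std::array<std::array<int, 256>, 2> next{};
    TableDfa() {
        for (auto& row : next) row.fill(-1);
        for (int c = 'a'; c <= 'z'; ++c) { next[0][c] = 1; next[1][c] = 1; }
    }
    bool accepts(const std::string& word) const {
        int state = 0;
        for (unsigned char c : word) {
            state = next[state][c];
            if (state < 0) return false;
        }
        return state == 1;
    }
};

// Sequence style: a transition is a few nested comparisons on character ranges,
// so its size does not depend on the alphabet (here: full UTF-32).
int sequenceNext(int state, char32_t c) {
    if (state == 0 || state == 1) {
        if (c >= U'a') {
            if (c <= U'z') return 1;
        }
    }
    return -1;
}

bool sequenceAccepts(const std::u32string& word) {
    int state = 0;
    for (char32_t c : word) {
        state = sequenceNext(state, c);
        if (state < 0) return false;
    }
    return state == 1;
}
```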
Inside the actions of your lexer you can use some predefined macros:
{{Input|1=
lxCURR_POS  // position in the input (some kind of iterator or pointer)
lxCURR_IDX  // index of the position in the input
           // (the index as presented in the input: e.g. for a QByteArray input it increases per byte, even if the lexer operates on 32-bit code points)
lxCONTINUE  // like %continue, add the current lexeme to the next token
lxLENGTH    // length of the current lexeme (as presented in the input)
lxBEGIN_POS // position of the first character of the current lexeme
lxBEGIN_IDX  // corresponding index
lxNAMED_TOKEN(⟨name⟩, ⟨type⟩) // create a variable named ⟨name⟩ representing a token of type ⟨type⟩
lxTOKEN(⟨type⟩)              // create such a variable named “token”
lxDONE      // return the token generated before
lxRETURN(X) // create a token of type X and return it
lxEOF      // create the EOF-token
lxFINISH    // create the EOF-token and return it  (will stop tokenization)
lxFAIL      // raise the tokenization error
lxSKIP      // continue with the next lexeme (do not return a token, you should not have created one before)
lxNEXT_CHR(⟨chr⟩)            // set the variable ⟨chr⟩ to the next char in the input
yytoken    // current token
}}
=== Using Flex ===
With the existing examples, it shouldn't be too hard to write such a lexer. Most languages, especially those ''"inheriting"'' from C, share many syntactic elements. Comments and literals in particular can be handled the same way over and over again. Adding a simple token is trivial:
{{Input|1="special-command"    return Parser::Token_SPECIAL_COMMAND; }}
That's pretty much it; take a look at e.g. ''java.ll'' for an excellent example. However, handling UTF-8 with Flex is quite tricky and ugly.
== How to Write Grammar Files  ==
=== Context-Free Grammars  ===
'''KDevelop-PG-Qt''' uses so-called [http://en.wikipedia.org/wiki/Context-free_grammars context-free grammars], built from non-terminals (nodes) and terminals (tokens). While writing the grammar for the basic structure of your language, you should try to mimic the semantics of the language. Let's take a look at an example:
A C++ document consists of lots of declarations and definitions. A class definition could, for example, be handled in the following way:
#the ''CLASS-token''
#an ''identifier''
#the ''{-token''
#a ''member-declarations-list''
#the ''}-token''
#and finally the '';-token''
The ''member-declarations-list'' is of course not part of any C++ specification. It is just a ''helper'' that explains the structure of a given semantic part of your language. The grammar can then define exactly what such a helper looks like.
=== Basic Syntax  ===
Now let us look at a basic example, a declaration in C++, described in grammar syntax:
{{Input|1=
   class_declaration
 | struct_declaration
 | function_declaration
 | union_declaration
 | namespace_declaration
 | typedef_declaration
 | extern_declaration
-> declaration ;;
}}
This is called a ''rule'' definition. Every lower-case string in the grammar file references such a rule. The example above defines what a ''declaration'' looks like. The ''|'' character stands for a logical ''or''. All rules have to end with two semicolons.
The example references other rules, which also have to be defined. Here is, for example, the ''class_declaration''; note the tokens in all upper case:
{{Input|1=
   CLASS IDENTIFIER SEMICOLON
 | CLASS IDENTIFIER LBRACE class_declaration* RBRACE SEMICOLON
-> class_declaration ;;
}}
There is a new character in there. The asterisk has the same meaning as in regular expressions: the preceding item can occur arbitrarily often or not at all.
In a grammar, <code> 0 </code> stands for an empty token. Combined with parentheses and the logical ''or'' from above, it lets you express optional elements:
{{Input|1=
  some_required_rule SOME_TOKEN
   <nowiki>( some_optional_stuff | some_other_stuff | 0 )</nowiki>
-> my_rule ;;
}}
All symbols that never occur on the left side of a rule are start symbols. You can use any of them to start parsing.
=== Making matched rules available to Visitors  ===
The simple rule above could be used to parse the token stream, yet no elements would be saved in the parse tree. This is easy to change, though:
{{Input|1=
   class_declaration=class_declaration
 | struct_declaration=struct_declaration
 | function_declaration=function_declaration
 | union_declaration=union_declaration
 | namespace_declaration=namespace_declaration
 | typedef_declaration=typedef_declaration
 | extern_declaration=extern_declaration
-> declaration ;;
}}
The DeclarationAst struct now contains pointers to each of these elements. During parsing, the pointer for each found element is set; all others become NULL. To store lists of elements, prepend the identifier with a hash <keycap> # </keycap>:
{{Input|1=
   CLASS IDENTIFIER SEMICOLON
 | CLASS IDENTIFIER LBRACE (#class_declaration=class_declaration)* RBRACE SEMICOLON
-> class_declaration ;;
}}
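The following is a simplified, hand-written picture of what the named items change in the AST structs for the declaration and class_declaration rules above. The member names (classDeclaration, classDeclarationSequence) and the use of std::vector for the list are assumptions made for illustration. The internal list structure of the real generated code is not described on this page.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified picture of what named items produce in ast.h.
struct StructDeclarationAst { std::string name; };

struct ClassDeclarationAst {
    std::string name;
    // "(#class_declaration=class_declaration)*" becomes a sequence of child nodes
    // (std::vector is an assumption; the generated code uses its own list type).
    std::vector<ClassDeclarationAst*> classDeclarationSequence;
};

// "class_declaration=class_declaration | struct_declaration=struct_declaration | ..."
// gives one pointer member per named alternative; after a successful parse,
// exactly one of them is set and the others stay null.
struct DeclarationAst {
    ClassDeclarationAst* classDeclaration = nullptr;
    StructDeclarationAst* structDeclaration = nullptr;
};

// Typical consumer code: check which alternative matched, then walk the list.
std::size_t countClasses(const ClassDeclarationAst* node) {
    std::size_t count = 1;
    for (const ClassDeclarationAst* child : node->classDeclarationSequence)
        count += countClasses(child);
    return count;
}

std::size_t countClassesIn(const DeclarationAst& decl) {
    return decl.classDeclaration ? countClasses(decl.classDeclaration) : 0;
}
```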
'''TODO: internal structure of the list, important for Visitors'''
Identifiers and targets can be used in more than one place:
{{Input|1=
   #one=one (#one=one)*
-> one_or_more ;;
}}
In the example above, all matches of the rule ''one'' will be stored in one and the same list ''one''.
=== Defining available Tokens  ===
Somewhere in the grammar (preferably near the top) you have to define a list of available tokens. From this list, the ''TokenType'' enum in ''parser.h'' will be created. In addition to the enum value names, you should define a descriptive name for each token, which is used e.g. in error messages. Note that the grammar/parser does not need to know how a token is written in the source code, because it operates on a TokenStream; see the Tokenizers/Lexers section above.
{{Input|1=
%token T1 ("T1-Name"), T2 ("T2-Name"), COMMA (","), SEMICOLON (";") ;;
}}
You can use ''%token'' multiple times to group tokens in the grammar; all tokens will still be put into the same ''TokenType'' enum.
'''TODO: explain process of using the parser Tokens'''
=== Special Syntax...  ===
==== ...to use inside Rules  ====
===== List of one or more elements =====
As an alternative to the asterisk (''*''), you can use a plus sign (''+'') to mark lists of one or more elements:
{{Input|1=
   (#one=one)+
-> one_or_more ;;
}}
===== Separated lists =====
Using the ''#rule @ TOKEN'' syntax you can mark a list of ''rule'', separated by ''TOKEN'':
{{Input|1=
   #item=item @ COMMA
-> comma_separated_list ;;
}}
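In parsing terms, item @ COMMA means item (COMMA item)*, the same equivalence the regular-expression section gives for α@β. A hand-written sketch of the loop a parser needs for it follows. The token representation (plain strings) and the AST member name are hypothetical simplifications.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hand-written equivalent of "#item=item @ COMMA", i.e. item (COMMA item)*.
struct CommaSeparatedListAst {
    std::vector<std::string> itemSequence;  // one entry per "#item" match
};

bool parseCommaSeparatedList(const std::vector<std::string>& tokens, CommaSeparatedListAst* node) {
    std::size_t pos = 0;
    auto isItem = [&](std::size_t p) { return p < tokens.size() && tokens[p] != ","; };
    if (!isItem(pos)) return false;                 // at least one item is required
    node->itemSequence.push_back(tokens[pos++]);
    while (pos < tokens.size() && tokens[pos] == ",") {
        ++pos;                                      // consume the separator...
        if (!isItem(pos)) return false;             // ...which must be followed by an item
        node->itemSequence.push_back(tokens[pos++]);
    }
    return pos == tokens.size();                    // nothing may follow the list
}
```

A trailing separator (`x ,`) and an empty input are both rejected, just as α@β never accepts the empty word.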
===== Optional items =====
As an alternative to the above-mentioned ''(item=item | 0)'' syntax, you can use the following to mark optional items: