Projects/Summer of Code/2007/Projects/KAider: Difference between revisions

    From KDE TechBase
     

    Latest revision as of 23:02, 19 March 2011

    WARNING: KAider was renamed to Lokalize and will be included in the kdesdk package for KDE 4.1.

    Lokalize is a computer-aided translation system that focuses on productivity and performance. The translator does only the creative work: delivering the message in his/her mother tongue in a concise, easy-to-understand form. Lokalize follows a paragraph-by-paragraph translation approach when translating documentation and a message-by-message approach when translating the GUI. See KAider/Introduction.

    Current state

    Already has:

    • syntax highlighting
    • spellcheck (Sonnet needs improvement)
    • search-and-replace that ignores accelerator marks
    • .po output formatting that makes scripty generate smaller diffs
    • small features like quick tag insertion, placing the text cursor right after the opening tag (e.g. '<qt>|foobar</qt>', where "|" is the cursor), and entry bookmarks
    • a viewer of the difference between the current msgid and the previous one (i.e. the msgid of which the current msgstr is actually a translation), for fuzzy entries generated with the --previous option of gettext's msgmerge
    • merge mode for editors (QA) or for several translators working on the same file (screenshot: http://kv-53.narod.ru/kaider2.png)
    • basic project manager functionality (screenshot: http://kv-53.narod.ru/kaider1.png)
    • a threaded Translation Memory with shortcuts for inserting suggestions into the current msgstr; scores are computed from the common/total length ratio, the length of removed and added text, and the number of removed and added parts (screenshot: http://kv-53.narod.ru/kaider_tm.png)
    • word-by-word difference representation everywhere, based on the O(n²) Longest Common Subsequence algorithm and my own experience
    • a glossary with basic TBX (http://www.lisa.org/standards/tbx/) format support: Lokalize displays relevant entries on the fly and provides shortcuts to insert them, and you can add new glossary terms via the glossary's context menu (screenshots: http://kv-53.narod.ru/kaider3.png, http://kv-53.narod.ru/kaider4.png)
    • search/replace in multiple files at once
    • spellchecking of multiple files at once
    • beginnings of XLIFF support
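    The word-by-word difference display and the TM scoring described above can be sketched in Python, one of the languages Kross can run. This is a minimal illustration under stated assumptions, not Lokalize's code: it assumes plain whitespace tokenization, and the names `word_diff` and `tm_score`, as well as the 0.05 per-part penalty, are invented for the example. `word_diff` fills the classic LCS table over words and walks it to produce an edit script; `tm_score` combines the three factors listed above (common/total length ratio, removed+added length, number of removed/added parts) in one hypothetical way.

```python
from typing import List, Tuple

Op = Tuple[str, str]  # ('=' kept | '-' removed | '+' added, word)


def word_diff(old: str, new: str) -> List[Op]:
    """Word-level diff of two strings via Longest Common Subsequence."""
    a, b = old.split(), new.split()  # assumption: whitespace tokenization
    n, m = len(a), len(b)
    # lcs[i][j] = LCS length of the suffixes a[i:] and b[j:]; O(n*m) table
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])
    # Walk the table from the start to emit kept, removed and added words
    ops: List[Op] = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            ops.append(('=', a[i]))
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            ops.append(('-', a[i]))
            i += 1
        else:
            ops.append(('+', b[j]))
            j += 1
    ops.extend(('-', w) for w in a[i:])
    ops.extend(('+', w) for w in b[j:])
    return ops


def tm_score(query: str, candidate: str, part_penalty: float = 0.05) -> float:
    """Hypothetical TM score in [0, 1]; the weighting is illustrative only."""
    ops = word_diff(candidate, query)
    common = sum(len(w) for op, w in ops if op == '=')
    changed = sum(len(w) for op, w in ops if op != '=')  # removed + added length
    total = common + changed
    if total == 0:
        return 1.0
    # Count contiguous runs of removed or added words ("parts")
    parts = sum(1 for k, (op, _) in enumerate(ops)
                if op != '=' and (k == 0 or ops[k - 1][0] != op))
    return max(0.0, common / total - part_penalty * parts)
```

    For example, `word_diff("open the file", "open a file")` returns `[('=', 'open'), ('-', 'the'), ('+', 'a'), ('=', 'file')]`, and `tm_score("open a file", "open the file")` is about 0.57 (8 of 12 characters in common, minus two changed parts). Measuring lengths in characters rather than words keeps a one-letter change from costing as much as a long word.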

    Compiling

    After you have set up the KDE build environment (compiling kdelibs and kdebase is enough):

    svn checkout -N svn://anonsvn.kde.org/home/kde/trunk/KDE/kdesdk/
    # fetch the cmake, doc and lokalize subdirectories, then go back up
    cd kdesdk && svn up cmake doc lokalize && cd ..
    mkdir kdesdk/build && chmod a+w kdesdk/build
    su kde-devel
    cd kdesdk/build
    cmakekde ..
    

    As root, start sshd, and then run from your usual shell:

    ssh -XC kde-devel@localhost
    lokalize 
    

    You can get the catalog manager by specifying the --project option:

    lokalize --project /path/to/index.ktp
    

    For project setup, see Projects/Summer_of_Code/2007/Projects/KAider#Setup.

    Debian users

    You can install the latest version of Lokalize from the experimental repository: http://packages.debian.org/experimental/lokalize

    Setup

    • Create a project, saving the *.ktp file to the l10n-kde4/<LangCode>/ directory.
    • Populate the glossary via the GlossaryView context menu (the .tbx file is created automatically when you add the first entry).
    • Populate the Translation Memory by dropping .po files onto the TM view.

    See http://websvn.kde.org/trunk/l10n-kde4/ru/ for an example project structure.
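    Lokalize's on-the-fly glossary lookup can be approximated with Python's standard ElementTree module. The sketch below assumes a glossary in the TBX core structure (`martif` > `text` > `body` > `termEntry` > `langSet` > `ntig` > `termGrp` > `term`, with the language in `xml:lang`) and uses a simple case-insensitive whole-word match. `load_glossary`, `relevant_entries` and the sample entries are hypothetical; Lokalize's real matching rules are not described on this page.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, List

# ElementTree expands the predefined xml: prefix to this namespace URI
XML_LANG = '{http://www.w3.org/XML/1998/namespace}lang'

# Hypothetical sample glossary (English -> Russian)
SAMPLE_TBX = """<martif type="TBX" xml:lang="en"><text><body>
  <termEntry id="1">
    <langSet xml:lang="en"><ntig><termGrp><term>authentication</term></termGrp></ntig></langSet>
    <langSet xml:lang="ru"><ntig><termGrp><term>идентификация</term></termGrp></ntig></langSet>
  </termEntry>
  <termEntry id="2">
    <langSet xml:lang="en"><ntig><termGrp><term>authorization</term></termGrp></ntig></langSet>
    <langSet xml:lang="ru"><ntig><termGrp><term>определение прав</term></termGrp></ntig></langSet>
  </termEntry>
</body></text></martif>"""


def load_glossary(tbx: str, src_lang: str, tgt_lang: str) -> Dict[str, List[str]]:
    """Map each lowercased source term to the target terms of the same termEntry."""
    glossary: Dict[str, List[str]] = {}
    for entry in ET.fromstring(tbx).iter('termEntry'):
        src: List[str] = []
        tgt: List[str] = []
        for lang_set in entry.iter('langSet'):
            terms = [t.text for t in lang_set.iter('term') if t.text]
            if lang_set.get(XML_LANG) == src_lang:
                src.extend(terms)
            elif lang_set.get(XML_LANG) == tgt_lang:
                tgt.extend(terms)
        for term in src:
            glossary.setdefault(term.lower(), []).extend(tgt)
    return glossary


def relevant_entries(msgid: str, glossary: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return the entries whose source term occurs as a whole word in msgid."""
    text = msgid.lower()
    return {term: tgt for term, tgt in glossary.items()
            if re.search(r'\b' + re.escape(term) + r'\b', text)}
```

    With this sample, `relevant_entries('Authentication failed', load_glossary(SAMPLE_TBX, 'en', 'ru'))` yields `{'authentication': ['идентификация']}`, which is the kind of list the glossary view would show next to the current message.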

    Maxims

    • Most actions must be accessible via the keyboard (because pressing a shortcut is faster than being frustrated with a mouse)
    • Automate _everywhere_ possible
    • Focus on translation quality. This is open source, so the source code is available for you to change

    Further work

    • WebQuery for twin languages (like Ukrainian and Russian)
    • XLIFF and Qt Linguist support (see KBabel features to be implemented)
    • glossary checklists: check new translations for forbidden terms
    • project-wide and program-wide WebQuery scripts, glossaries and TMs
    • check for different translations of the same msgid (use Strigi?)
    • back-checking: see whether a term/string in the target language has been used to translate different things, not just whether one source string has the same translation everywhere
    • tighter SVN support: an svn diff-like feature
    • automatic glossary building
    • research on sentence segmentation rules (e.g. SRX)
    • automate submitting translation suggestions to translate.google.com [Kross action]
    • fill the TM with the contents of /usr/share/locale/<lang>/LC_MESSAGES/*.mo [Kross action]
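    The two consistency checks in the list above (different translations of the same msgid, and back-checking) both reduce to grouping (msgid, msgstr) pairs, once by source and once by target. A minimal sketch, assuming the pairs have already been extracted from the project's PO files or TM into a list; the function names and sample strings are hypothetical, and no Strigi-based indexing is involved.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (msgid, msgstr)


def _group(pairs: List[Pair], key_index: int) -> Dict[str, Set[str]]:
    """Group translated pairs by msgid (key_index=0) or by msgstr (key_index=1),
    keeping only keys that map to more than one distinct counterpart."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for pair in pairs:
        if pair[1]:  # skip untranslated entries
            groups[pair[key_index]].add(pair[1 - key_index])
    return {k: v for k, v in groups.items() if len(v) > 1}


def inconsistent_translations(pairs: List[Pair]) -> Dict[str, Set[str]]:
    """Forward check: msgids that received more than one distinct msgstr."""
    return _group(pairs, 0)


def back_check(pairs: List[Pair]) -> Dict[str, Set[str]]:
    """Back-check: msgstrs used to translate more than one distinct msgid."""
    return _group(pairs, 1)


# Hypothetical sample data
SAMPLE = [
    ("File", "Файл"),
    ("File", "Документ"),
    ("Document", "Документ"),
    ("Open", "Открыть"),
]
```

    Here `inconsistent_translations(SAMPLE)` reports "File" with two translations, and `back_check(SAMPLE)` reports that "Документ" translates both "File" and "Document". Back-check hits are not necessarily errors (two source terms may legitimately share a translation), so the result is better shown for review than fixed automatically.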

    Not for KDE:

    • Be a complete computer-aided translation system by providing, e.g., actions to import and export OpenOffice, plain-text and other document formats by calling the appropriate scripts/commands. Define a general Kross actions interface for that.
    • Make a nice Windows package for Windows users

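    Competitors (ideas):

    • locfactoryeditor (http://www.triplespin.com/en/products/locfactoryeditor.html#whatsnew): Mac only
    • Heartsome (http://www.heartsome.net/EN/downloads.html): multiplatform, costs money
    • Go over the OmegaT wishlist (http://sourceforge.net/tracker/?atid=520350&group_id=68187&func=browse) and ensure every sane wish is implemented

    Converters (to use, with Lokalize acting as a front-end):

    • file2xliff4j (http://file2xliff4j.sourceforge.net/javadoc/file2xliff4j/package-summary.html): Java-based
    • https://open-language-tools.dev.java.net: Java-based
    • Translate Toolkit (http://translate.sourceforge.net/wiki/toolkit/index): Python-based, checks and other goodness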

    KBabel features to be implemented

    ...in a smarter way :)

    • persistent bookmarks for messages in a file, saved in the project
    • extended marking of .po and .pot files (e.g. which translator is currently working on the file, and since when), saved in the project
    • a plugin framework for validation tools for consistency checks [Kross action triggered on saving]
    • sending the file by email [Kross (project) action]
    • automatic syntax check with msgfmt when saving and, if an error occurred, easy navigation to the messages that contain errors; this is equivalent to a syntax check (msgfmt --statistics) of existing files, verifying that the translated files will compile and therefore work when distributed [Kross (project) action]
    • PO file header change [Kross action (+triggered on saving)]
    • printing of selected messages (e.g. fuzzy ones) [Kross action]

    Also:

    • msgid-diff-msgstr from msgtools (http://lichota.net/~krzysiek/projects/msgtools/); the features of all other msgtools commands are already implemented, in case you haven't noticed
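    For the msgfmt-based syntax check listed above, an action has to run msgfmt and turn its output into something navigable. The plain-Python sketch below assumes GNU msgfmt, which reports problems on stderr as `FILE:LINE: message` and, with `--statistics`, prints a summary such as `12 translated messages, 3 fuzzy translations, 1 untranslated message.`. The helper names are hypothetical, and wiring this up as a Kross action is not shown.

```python
import re
import subprocess
from typing import Dict, List, Tuple

# Assumption: GNU msgfmt reports problems on stderr as "FILE:LINE: message"
ERROR_RE = re.compile(r'^(?P<file>[^:\n]+):(?P<line>\d+): (?P<msg>.*)$', re.MULTILINE)
# --statistics summary, e.g. "12 translated messages, 3 fuzzy translations, ..."
STAT_RE = re.compile(r'(\d+) (translated|fuzzy|untranslated)\b')


def parse_errors(stderr: str) -> List[Tuple[str, int, str]]:
    """Extract (file, line, message) triples for navigating to broken entries."""
    return [(m['file'], int(m['line']), m['msg']) for m in ERROR_RE.finditer(stderr)]


def parse_statistics(stderr: str) -> Dict[str, int]:
    """Extract translated/fuzzy/untranslated counts from the --statistics line."""
    stats = {'translated': 0, 'fuzzy': 0, 'untranslated': 0}
    for count, kind in STAT_RE.findall(stderr):
        stats[kind] = int(count)
    return stats


def check_po(path: str):
    """Run msgfmt with --check and --statistics, discarding the compiled output."""
    result = subprocess.run(
        ['msgfmt', '--check', '--statistics', '-o', '/dev/null', path],
        capture_output=True, text=True)
    errors = parse_errors(result.stderr)
    return result.returncode == 0, errors, parse_statistics(result.stderr)
```

    `check_po('ru.po')` returns a success flag, the list of `(file, line, message)` triples that an editor could jump to, and the message counts. Parsing stderr keeps the sketch independent of any gettext library bindings, at the cost of depending on msgfmt's output wording.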

    KBabel features NOT to be implemented

    • character selection tool integration, sorted by frequency

    Why? Better to improve the system-wide charselect tool, OR... modify your Xorg keyboard layout!