Projects/Summer of Code/2007/Projects/KAider: Difference between revisions

    From KDE TechBase
     
    (30 intermediate revisions by 5 users not shown)
    Line 1: Line 1:
    KAider is a computer-aided translation system that focuses on productivity and performance. Translator does only creative work (of delivering message in his/her mother language in laconic and easy to understand form). KAider implies parapgraph-by-paragrah translation approach (when translating documentation) and message-by-message approach (when translating GUI).
    ''' WARNING ''' KAider was renamed to '''[http://userbase.kde.org/Lokalize Lokalize]''' and will be included in kdesdk package for KDE 4.1
     
    Lokalize is a computer-aided translation system that focuses on productivity and performance. Translator does only creative work (of delivering message in his/her mother language in laconic and easy to understand form). Lokalize implies paragraph-by-paragraph translation approach (when translating documentation) and message-by-message approach (when translating GUI).
    See [[Projects/Summer_of_Code/2007/Projects/KAider/Introduction|KAider/Introduction]]
    See [[Projects/Summer_of_Code/2007/Projects/KAider/Introduction|KAider/Introduction]]


    Line 14: Line 16:
    * Translation Memory (threaded) with shortcuts for inserting suggestions into current 'msgstr', scores are computed based on common/total length ratio, removed+added length, and count of removed+added parts [http://kv-53.narod.ru/kaider_tm.png screenshot]
    * Translation Memory (threaded) with shortcuts for inserting suggestions into current 'msgstr', scores are computed based on common/total length ratio, removed+added length, and count of removed+added parts [http://kv-53.narod.ru/kaider_tm.png screenshot]
    * for difference representation in all places word-by-word algorithm is used (based on the Longest Common Subsequence O(n*n) algorithm and my own experience)
    * for difference representation in all places word-by-word algorithm is used (based on the Longest Common Subsequence O(n*n) algorithm and my own experience)
    * glossary with basic [http://www.lisa.org/standards/tbx/ tbx] format support. KAider displays relevant entries on-the-fly and provides shortcuts to insert them. also, you can add new glossary terms via context menu of the glossary [http://kv-53.narod.ru/kaider3.png screenshot]
    * glossary with basic [http://www.lisa.org/standards/tbx/ tbx] format support. Lokalize displays relevant entries on-the-fly and provides shortcuts to insert them. also, you can add new glossary terms via context menu of the glossary. [http://kv-53.narod.ru/kaider3.png screenshot 1] [http://kv-53.narod.ru/kaider4.png screenshot 2]
    * webquery view, flexible thanks to kross
    * Search/Replace functions in multiple files at once.
    * Search/Replace functions in multiple files at once.
    * Spellchecking of multiple files at once.
    * Spellchecking of multiple files at once.
    * beginnings of XLIFF support


    ==Compiling==
    ==Compiling==
    After you [[Getting_Started/Build/KDE4|set kde env up]] (compiling kdelibs is enough):
    After you [[Getting_Started/Build|set kde env up]] (compiling kdelibs+kdebase is enough):
      cd trunk/extragear
      svn checkout -N svn://anonsvn.kde.org/home/kde/trunk/KDE/kdesdk/
      svn up sdk
      cd kdesdk && svn up cmake doc lokalize
    mkdir kdesdk/build && chmod a+w kdesdk/build
      su kde-devel
      su kde-devel
    mkdir sdk/kaider/build
      cd kdesdk/build
      cd sdk/kaider/build
      cmakekde ..
      cmakekde ..


    as a root, run sshd and then from the usual shell:
    as a root, run sshd and then from the usual shell:
      ssh -XC kde-devel@localhost
      ssh -XC kde-devel@localhost
      kaider
      lokalize


    you can get catalogmanager by specifying --project option
    you can get catalogmanager by specifying --project option
      kaider --project /path/to/index.ktp
      lokalize --project /path/to/index.ktp
     
    See [[Projects/Summer_of_Code/2007/Projects/KAider#Setup]].


    See http://websvn.kde.org/trunk/l10n-kde4/ru/
    ===Debian users===
    You can install the latest version of lokalize from experimental repository: [http://packages.debian.org/experimental/lokalize]
     
    ==Setup==
    * Create project, saving *.ktp file to l10n-kde4/<LangCode>/ dir
    * Populate Glossary via GlossaryView context menu (.tbx file will be created automatically for you on the first entry addition).
    * Populate Translation Memory by dropping .po files onto TM View
     
    See [http://websvn.kde.org/trunk/l10n-kde4/ru/] for an example project structure


    ==Maxims==
    ==Maxims==
    * Majority of actions must be accessible via keyboard (because it is faster to press a shortcut than to be frustrated with a mouse)
    * Majority of actions must be accessible via keyboard (because it is faster to press a shortcut than to be frustrated with a mouse)
    * Do automatization _everywhere_ it possible
    * Do automatization _everywhere_ possible
    * Focus on quality. This is open source -- so source code is available (for change)
    * Focus on translation quality. This is open source -- so source code is available (for change)


    ==Further work==
    ==Further work==
    *dbus
    *WebQuery for twin languages (like Ukrainian and Russian)
    *WebQuery for twin languages (like ukrainian and russian)
    *xliff+qt-linguist support (see [[#KBabel features to be implemented|KBabel features to be implemented]])
    *xliff+qt-linguist support (see [[#KBabel features to be implemented|KBabel features to be implemented]])
    * Glossary checklists: check for forbidden terms in new translation
    * project-wise and program-wise: webquery scripts, glossaries, TMs
    * check for different translations of the same msgid (use strigi?)
    * back-checking, to see whether a term/string in the target language has been used as translation for different things, not just that one source has the same translation everywhere.
    * Tighten SVN support: svn diff-like feature
    * Automatic Glossary building
    * Automatic Glossary building
    * Research on dividing into sentences rules (e.g. srx)
    * Research on dividing into sentences rules (e.g. srx)
    * project-wise and program-wise: webquery scripts, glossaries, TMs
    * Glossary checklists: check for forbidden terms in new translation
    * Tighten SVN support: svn diff-like feature
    * Automate submitting translation suggestions to translate.google.com [Kross action]
    * Automate submitting translation suggestions to translate.google.com [Kross action]
    * back-checking, to see whether a term/string in the target language has been used as translation for different things, not just that one source has the same translation everywhere.
    * [Kross (project) action] that merges translations with updated template
    * fill TM with content of /usr/share/locale/<lang>/LC_MESSAGES/*.mo [Kross action]
    * fill TM with content of /usr/share/locale/<lang>/LC_MESSAGES/*.mo [Kross action]


    Line 60: Line 71:
    * Make nice windoze package for the windowzerz
    * Make nice windoze package for the windowzerz


    Competitors:
    Competitors (ideas):
    * [http://www.triplespin.com/en/products/locfactoryeditor.html#whatsnew locfactoryeditor] --Mac only
    * [http://www.triplespin.com/en/products/locfactoryeditor.html#whatsnew locfactoryeditor] --Mac only
    * [http://www.heartsome.net/EN/downloads.html Heartsome] - multiplatform, costs money
    * [http://www.heartsome.net/EN/downloads.html Heartsome] --multiplatform, costs money
    * Go over [http://sourceforge.net/tracker/?atid=520350&group_id=68187&func=browse OmegaT wishlist] and ensure every sane wish is implemented
     
    Converters (use, acting as a front-end):
    * [http://file2xliff4j.sourceforge.net/javadoc/file2xliff4j/package-summary.html file2xliff4j] --java-based
    * [https://open-language-tools.dev.java.net] --java-based
    * [http://translate.sourceforge.net/wiki/toolkit/index Translate Toolkit] --python-based, checks, other goodness


    ==KBabel features to be implemented==
    ==KBabel features to be implemented==
    ...in a smarter way :). After or during the summer.
    ...in a smarter way :)
    * persistent bookmarks for messages in a file saved in the project
    * persistent bookmarks for messages in a file saved in the project
    * extended marking of .po and .pot files (e.g. translator that currently works on the file and since when) saved in the project
    * extended marking of .po and .pot files (e.g. translator that currently works on the file and since when) saved in the project
    * Opening source code by references in message comments [Kross action]
    * A plugin framework for validation tools for consistency checks [Kross action triggered on saving]
    * A plugin framework for validation tools for consistency checks [Kross action triggered on saving]
    * Sending the file using email [Kross (project) action]
    * Sending the file using email [Kross (project) action]
    * Automatic syntax check with msgfmt when saving and, if an error occurred, easy navigation to messages, which contain errors. == Syntax check (msgfmt --statistics) for existing files to control if the translated files will compile and, accordingly, work when distributed [Kross (project) action]
    * Automatic syntax check with msgfmt when saving and, if an error occurred, easy navigation to messages, which contain errors. == Syntax check (msgfmt --statistics) for existing files to control if the translated files will compile and, accordingly, work when distributed [Kross (project) action]
    * CVS and SVN support [Kross project action] (is 'svn ci' so hard?)
    * statistics
    * PO File Header change [Kross action (+triggered on saving)]
    * PO File Header change [Kross action (+triggered on saving)]
    * Printing of selected messages (eg fuzzy ) [Kross action]
    * Printing of selected messages (eg fuzzy ) [Kross action]
    Line 83: Line 97:
    Why? Better improve system-wide charselect tool, OR...
    Why? Better improve system-wide charselect tool, OR...
    modify your xorg keyboard layout!
    modify your xorg keyboard layout!
    * Automatic ("rough" in kbabel terms) translation.
    Why? because sometimes one English string may have two or more different translations depending on the context.
    What I'm going to do is implement _interactive_ (or message-by-message) rough translation. If the message is already translated somewhere else, it suggests the translations (several, not one!) and displays them in the helper window. Translator may then choose one of the translation suggestions by pressing ctrl+1, ctrl+2, .. or ctrl+9, which will immediately insert it into msgstr  (replacing the old translation if it exists).
    What old rough translation didn't provide is the ability to choose.
    UPDATE: ok, i'll probably allow translator to autopopulate catalog with:
    *one-choice suggestions (e.g. if there are only one 100% suggestion), optionally mark fuzzy
    *the most recent suggestions for the cases of several 100% items, mark fuzzy
    ==Setup==
    * Create project, saving *.ktp file to l10n-kde4/<LangCode>
    * Populate Glossary via GlossaryView context menu (.tbx file will be created automatically for you on the first entry addition).
    * Populate Translation Memory by dropping .po files onto TM View

    Latest revision as of 23:02, 19 March 2011

    WARNING: KAider was renamed to Lokalize and will be included in the kdesdk package for KDE 4.1.

    Lokalize is a computer-aided translation system that focuses on productivity and performance. The translator does only the creative work (conveying the message in his/her mother language in a laconic, easy-to-understand form). Lokalize uses a paragraph-by-paragraph approach when translating documentation and a message-by-message approach when translating a GUI. See KAider/Introduction.

    Current state

    Already has:

    • syntax highlighting
    • spellcheck (sonnet needs improvement)
    • search-n-replace, ignoring accel marks
    • formats .po file output better so less diff is generated by scripty
    • small features like quick tag insert, placing text cursor right after the tag in the beginning (e.g. '<qt>|foobar</qt>' where "|" is a cursor), entry bookmarks
    • viewer of the difference between the current msgid and the previous one (i.e. the msgid that the current msgstr actually translates; for fuzzies generated with the --previous gettext option)
    • merge mode for editors (QA) or when several translators work on the same file screenshot
    • basic projectmanager functionality screenshot
    • Translation Memory (threaded) with shortcuts for inserting suggestions into the current 'msgstr'; scores are computed from the common/total length ratio, the removed+added length, and the count of removed+added parts (screenshot: http://kv-53.narod.ru/kaider_tm.png)
    • differences are displayed everywhere using a word-by-word algorithm (based on the Longest Common Subsequence O(n*n) algorithm and my own experience)
    • glossary with basic TBX format support. Lokalize displays relevant entries on the fly and provides shortcuts to insert them. You can also add new glossary terms via the context menu of the glossary. (screenshot 1: http://kv-53.narod.ru/kaider3.png, screenshot 2: http://kv-53.narod.ru/kaider4.png)
    • Search/Replace functions in multiple files at once.
    • Spellchecking of multiple files at once.
    • beginnings of XLIFF support
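
The word-by-word difference display and the common/total length ratio mentioned in the list above can be illustrated with a small Python sketch. This is not Lokalize's actual code: the function names (lcs_table, word_diff, common_ratio) are hypothetical, and the ratio shown is only one plausible reading of "common/total length ratio"; the real scoring also takes the removed+added length and part count into account.

```python
from typing import List, Tuple


def lcs_table(a: List[str], b: List[str]) -> List[List[int]]:
    """Classic O(len(a) * len(b)) dynamic-programming table.

    table[i][j] holds the LCS length of a[i:] and b[j:].
    """
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def word_diff(old: str, new: str) -> List[Tuple[str, str]]:
    """Return (op, word) pairs: '=' common, '-' removed, '+' added."""
    a, b = old.split(), new.split()
    table = lcs_table(a, b)
    ops: List[Tuple[str, str]] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            ops.append(("=", a[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            ops.append(("-", a[i]))
            i += 1
        else:
            ops.append(("+", b[j]))
            j += 1
    # Whatever is left on either side is pure removal or addition.
    ops.extend(("-", word) for word in a[i:])
    ops.extend(("+", word) for word in b[j:])
    return ops


def common_ratio(old: str, new: str) -> float:
    """Length of common words divided by length of all words in the diff."""
    ops = word_diff(old, new)
    common = sum(len(word) for op, word in ops if op == "=")
    total = sum(len(word) for _, word in ops)
    return common / total if total else 1.0


if __name__ == "__main__":
    for op, word in word_diff("Open the file", "Open a new file"):
        print(op, word)
    print(round(common_ratio("Open the file", "Open a new file"), 3))
```

Working on whole words rather than characters keeps the table small for typical messages and produces differences that are easier to read in a translation context.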

    Compiling

    After you set up the KDE build environment (compiling kdelibs+kdebase is enough):

    svn checkout -N svn://anonsvn.kde.org/home/kde/trunk/KDE/kdesdk/
    (cd kdesdk && svn up cmake doc lokalize)
    mkdir kdesdk/build && chmod a+w kdesdk/build
    su kde-devel
    cd kdesdk/build
    cmakekde ..
    

    As root, run sshd, and then from your usual shell:

    ssh -XC kde-devel@localhost
    lokalize 
    

    You can get the catalog manager by specifying the --project option:

    lokalize --project /path/to/index.ktp
    

    See Projects/Summer_of_Code/2007/Projects/KAider#Setup.

    Debian users

    You can install the latest version of Lokalize from the experimental repository: http://packages.debian.org/experimental/lokalize

    Setup

    • Create a project, saving the *.ktp file to the l10n-kde4/<LangCode>/ directory
    • Populate the Glossary via the GlossaryView context menu (a .tbx file will be created automatically on the first entry addition).
    • Populate the Translation Memory by dropping .po files onto the TM View

    See http://websvn.kde.org/trunk/l10n-kde4/ru/ for an example project structure

    Maxims

    • The majority of actions must be accessible via keyboard (because it is faster to press a shortcut than to be frustrated with a mouse)
    • Automate _everywhere_ possible
    • Focus on translation quality. This is open source -- so the source code is available (to change)

    Further work

    • WebQuery for twin languages (like Ukrainian and Russian)
    • xliff+qt-linguist support (see KBabel features to be implemented)
    • Glossary checklists: check for forbidden terms in new translation
    • project-wise and program-wise: webquery scripts, glossaries, TMs
    • check for different translations of the same msgid (use strigi?)
    • back-checking, to see whether a term/string in the target language has been used as translation for different things, not just that one source has the same translation everywhere.
    • Tighten SVN support: svn diff-like feature
    • Automatic Glossary building
    • Research sentence segmentation rules (e.g. SRX)
    • Automate submitting translation suggestions to translate.google.com [Kross action]
    • fill TM with content of /usr/share/locale/<lang>/LC_MESSAGES/*.mo [Kross action]
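
The two consistency checks in the list above (different translations of the same msgid, and back-checking one target string used for several sources) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: find_conflicts is a hypothetical name, the input is assumed to be already-extracted (msgid, msgstr) pairs, and no particular indexing backend such as strigi is involved.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def find_conflicts(
    pairs: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Return (forward, backward) conflict maps.

    forward:  msgid  -> set of distinct msgstr values (more than one)
    backward: msgstr -> set of distinct msgid values (more than one)
    """
    by_source: Dict[str, Set[str]] = defaultdict(set)
    by_target: Dict[str, Set[str]] = defaultdict(set)
    for msgid, msgstr in pairs:
        if not msgstr:
            continue  # untranslated entries cannot conflict
        by_source[msgid].add(msgstr)
        by_target[msgstr].add(msgid)
    forward = {k: v for k, v in by_source.items() if len(v) > 1}
    backward = {k: v for k, v in by_target.items() if len(v) > 1}
    return forward, backward


if __name__ == "__main__":
    # Illustrative sample data, not taken from a real catalog.
    sample = [
        ("File", "Файл"),
        ("File", "Документ"),
        ("Document", "Документ"),
        ("Open", "Открыть"),
        ("Close", ""),
    ]
    forward, backward = find_conflicts(sample)
    print(forward)
    print(backward)
```

A hit in either map is not necessarily an error, since one English string may legitimately need different translations depending on context; the maps are meant as a review list for the translator.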

    Not for KDE:

    • Be a complete computer-aided translation system by providing e.g. actions to import and export OpenOffice, txt and other document formats by calling appropriate scripts/commands. Define a general Kross actions interface for that.
    • Make nice windoze package for the windowzerz

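    Competitors (ideas):

    • locfactoryeditor (http://www.triplespin.com/en/products/locfactoryeditor.html#whatsnew) -- Mac only
    • Heartsome (http://www.heartsome.net/EN/downloads.html) -- multiplatform, costs money
    • Go over the OmegaT wishlist (http://sourceforge.net/tracker/?atid=520350&group_id=68187&func=browse) and ensure every sane wish is implemented

    Converters (use, acting as a front-end):

    • file2xliff4j (http://file2xliff4j.sourceforge.net/javadoc/file2xliff4j/package-summary.html) -- Java-based
    • https://open-language-tools.dev.java.net -- Java-based
    • Translate Toolkit (http://translate.sourceforge.net/wiki/toolkit/index) -- Python-based, checks, other goodness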

    KBabel features to be implemented

    ...in a smarter way :)

    • persistent bookmarks for messages in a file saved in the project
    • extended marking of .po and .pot files (e.g. the translator currently working on the file and since when) saved in the project
    • A plugin framework for validation tools for consistency checks [Kross action triggered on saving]
    • Sending the file using email [Kross (project) action]
    • Automatic syntax check with msgfmt when saving and, if an error occurred, easy navigation to the messages that contain errors. Likewise, a syntax check (msgfmt --statistics) for existing files to verify that the translated files will compile and, accordingly, work when distributed [Kross (project) action]
    • PO File Header change [Kross action (+triggered on saving)]
    • Printing of selected messages (e.g. fuzzy) [Kross action]
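
The msgfmt check from the list above could look roughly like the following Python sketch. The helper names (msgfmt_command, parse_errors, check_po) are hypothetical and this is not an existing Kross action; it assumes GNU msgfmt is on the PATH, that /dev/null is available as a throwaway output file, and that diagnostics arrive on stderr in the usual "file:line: message" form.

```python
import re
import subprocess
from typing import List, Tuple


def msgfmt_command(po_path: str) -> List[str]:
    """Build the msgfmt call: check the file, print statistics, discard output."""
    return ["msgfmt", "--check", "--statistics", "-o", "/dev/null", po_path]


# Matches diagnostics such as "ru.po:12: some message".
# If a column number follows the line number, it stays in the message part.
_ERROR_LINE = re.compile(r"^(?P<file>.+?):(?P<line>\d+):\s*(?P<message>.*)$")


def parse_errors(stderr_text: str) -> List[Tuple[str, int, str]]:
    """Extract (file, line, message) triples so an editor can jump to them."""
    errors = []
    for line in stderr_text.splitlines():
        match = _ERROR_LINE.match(line)
        if match:
            errors.append(
                (match.group("file"), int(match.group("line")), match.group("message"))
            )
    return errors


def check_po(po_path: str) -> Tuple[bool, List[Tuple[str, int, str]]]:
    """Run msgfmt and return (ok, errors). Requires msgfmt to be installed."""
    result = subprocess.run(
        msgfmt_command(po_path), capture_output=True, text=True
    )
    return result.returncode == 0, parse_errors(result.stderr)


if __name__ == "__main__":
    # Illustrative stderr text, not captured from a real run.
    sample = "ru.po:12: end-of-line within string\nmsgfmt: found 1 fatal error\n"
    print(parse_errors(sample))
```

The line numbers returned by parse_errors refer to lines in the .po file, so mapping them back to catalog entries is left to the caller.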

    Also:

    • msgid-diff-msgstr from [4] (features for all other commands are already implemented, if you haven't noticed this)

    KBabel features NOT to be implemented

    • Character selection tool integration, sort by the frequency

    Why? Better improve system-wide charselect tool, OR... modify your xorg keyboard layout!