Projects/Summer of Code/2007/Projects/KAider: Difference between revisions

From KDE TechBase
 
(71 intermediate revisions by 7 users not shown)
KAider is a computer-aided translation system that focuses on productivity and performance (no Java!). It implies a paragraph-by-paragraph translation approach (when translating documentation) and a message-by-message approach (when translating GUI).
''' WARNING ''' KAider was renamed to '''[http://userbase.kde.org/Lokalize Lokalize]''' and will be included in kdesdk package for KDE 4.1
 
Lokalize is a computer-aided translation system that focuses on productivity and performance. The translator does only the creative work (delivering the message in his/her mother language in a laconic and easy-to-understand form). Lokalize implies a paragraph-by-paragraph translation approach (when translating documentation) and a message-by-message approach (when translating GUI).
See [[Projects/Summer_of_Code/2007/Projects/KAider/Introduction|KAider/Introduction]]


Already has:
* syntax highlighting
* spellcheck (problems with the dividing filter: it doesn't check the last word)
* spellcheck (sonnet needs improvement)
* search-n-replace, ignoring accel marks
* small features like quick tag insert, placing text cursor right after the tag in the beginning (e.g. '<qt>|foobar</qt>' where "|" is a cursor)
* formats .po file output better so less diff is generated by scripty
* entry bookmarks
* small features like quick tag insert, placing text cursor right after the tag in the beginning (e.g. '<qt>|foobar</qt>' where "|" is a cursor), entry bookmarks
* viewer of the difference between current msgid and previous one (i.e. msgid translation of which current msgstr really is -- for fuzzies generated with --previous gettext option)
* merge mode for editors (QA) or when several translators work on the same file [http://kv-53.narod.ru/kaider2.png screenshot]
* basic projectmanager functionality [http://kv-53.narod.ru/kaider1.png screenshot]
* glossary with basic [http://www.lisa.org/standards/tbx/ tbx] format support. KAider displays relevant entries on-the-fly and provides shortcuts to insert them. also, you can add new glossary terms via context menu of the glossary [http://kv-53.narod.ru/kaider3.png screenshot]
* Translation Memory (threaded) with shortcuts for inserting suggestions into current 'msgstr', scores are computed based on common/total length ratio, removed+added length, and count of removed+added parts [http://kv-53.narod.ru/kaider_tm.png screenshot]
* a word-by-word algorithm is used everywhere differences are shown (based on the O(n*n) Longest Common Subsequence algorithm and my own experience)
* glossary with basic [http://www.lisa.org/standards/tbx/ tbx] format support. Lokalize displays relevant entries on-the-fly and provides shortcuts to insert them. also, you can add new glossary terms via context menu of the glossary. [http://kv-53.narod.ru/kaider3.png screenshot 1] [http://kv-53.narod.ru/kaider4.png screenshot 2]
* Search/Replace functions in multiple files at once.
* Spellchecking of multiple files at once.
* beginnings of XLIFF support


==Compiling==
After you [[Getting_Started/Build/KDE4|set kde env up]] (compiling kdelibs is enough):
After you [[Getting_Started/Build|set kde env up]] (compiling kdelibs+kdebase is enough):
  cd trunk
  svn checkout -N svn://anonsvn.kde.org/home/kde/trunk/KDE/kdesdk/
  svn up playground/devtools/kaider
  cd kdesdk && svn up cmake doc lokalize
mkdir kdesdk/build && chmod a+w kdesdk/build
  su kde-devel
mkdir playground/devtools/kaider/build
  cd kdesdk/build
  cd playground/devtools/kaider/build
  cmakekde ..


as a root, run sshd and then from the usual shell:
  ssh -XC kde-devel@localhost
  kaider
  lokalize


==Roadmap==
you can get catalogmanager by specifying --project option
*[basic framework DONE] project management -- 1-2 weeks
lokalize --project /path/to/index.ktp
*[DONE] context glossary -- 0.5-1 week
*translation DB (QtSql) -- 2 weeks
*[DONE] mode for merging translations for editors (QA) -- 1 week
*scripting API + sipping on google translate for live glossary (kross) - 2 weeks
*the remaining time is for perfection/polishing/small improvements and xliff+qt-linguist support


==What i'm doing these days==
See [[Projects/Summer_of_Code/2007/Projects/KAider#Setup]].
*kross WebQuery framework - testing, final refinements
*improvements on ProjectView (dbus, etc), Glossary


==Ideas==
===Debian users===
Current:
You can install the latest version of lokalize from experimental repository: [http://packages.debian.org/experimental/lokalize]
* Webquery scripts for close languages;
 
==Setup==
* Create project, saving *.ktp file to l10n-kde4/<LangCode>/ dir
* Populate Glossary via GlossaryView context menu (.tbx file will be created automatically for you on the first entry addition).
* Populate Translation Memory by dropping .po files onto TM View
 
See [http://websvn.kde.org/trunk/l10n-kde4/ru/] for an example project structure
 
==Maxims==
* The majority of actions must be accessible via keyboard (because it is faster to press a shortcut than to be frustrated with a mouse)
* Automate _everywhere_ possible
* Focus on translation quality. This is open source -- so source code is available (for change)
 
==Further work==
*WebQuery for twin languages (like Ukrainian and Russian)
*xliff+qt-linguist support (see [[#KBabel features to be implemented|KBabel features to be implemented]])
* Glossary checklists: check for forbidden terms in new translation
* project-wise and program-wise: webquery scripts, glossaries, TMs
 
* check for different translations of the same msgid (use strigi?)
Further work:
* back-checking, to see whether a term/string in the target language has been used as translation for different things, not just that one source has the same translation everywhere.
* Tighten SVN support: svn diff-like feature
* Automatic Glossary building
* Research on dividing into sentences rules (e.g. srx)
* Automate submitting translation suggestions to translate.google.com [Kross action]
* fill TM with content of /usr/share/locale/<lang>/LC_MESSAGES/*.mo [Kross action]


Not for KDE:
* Be complete computer-aided translation system by providing e.g. actions to import+export openoffice, txt and documents of other formats by calling appropriate scripts/commands. Define for that general kross actions interface.
* Make nice windoze package for the windowzerz
Competitors (ideas):
* [http://www.triplespin.com/en/products/locfactoryeditor.html#whatsnew locfactoryeditor] --Mac only
* [http://www.heartsome.net/EN/downloads.html Heartsome] --multiplatform, costs money
* Go over [http://sourceforge.net/tracker/?atid=520350&group_id=68187&func=browse OmegaT wishlist] and ensure every sane wish is implemented
Converters (use, acting as a front-end):
* [http://file2xliff4j.sourceforge.net/javadoc/file2xliff4j/package-summary.html file2xliff4j] --java-based
* [https://open-language-tools.dev.java.net] --java-based
* [http://translate.sourceforge.net/wiki/toolkit/index Translate Toolkit] --python-based, checks, other goodness


==KBabel features to be implemented==
...in the smarter way :). After or during the summer.
...in a smarter way :)
* Character selection tool integration (kdelibs rule); sort by the frequency
* persistent bookmarks for messages in a file saved in the project
* extended marking of .po and .pot files (e.g. the translator who currently works on the file and since when) saved in the project
* Search/Replace functions in multiple files at once.
* Spellchecking of multiple files at once.
* Opening source code by references in message comments [Kross action]
* A plugin framework for validation tools for consistency checks [Kross action triggered on saving]
* Sending the file using email [Kross (project) action]
* Automatic syntax check with msgfmt when saving and, if an error occurred, easy navigation to the messages that contain errors. == Syntax check (msgfmt --statistics) for existing files to verify that the translated files will compile and, accordingly, work when distributed [Kross (project) action]
* CVS and SVN support [Kross project action] (is 'svn ci' so hard?)
* Automatic comparisons and statistics of POT and PO files for a quick overview which and how many files are translated (or not) and which files may be obsolete + [Kross (project) action] that merges translations with updated template
* PO File Header change [Kross action (+triggered on saving)]
* Printing of selected messages (eg fuzzy ) [Kross action]
Also:
* msgid-diff-msgstr from [http://lichota.net/~krzysiek/projects/msgtools/] (features for all other commands are already implemented, if you haven't noticed this)


==KBabel features NOT to be implemented==
* Automatic ("rough" in kbabel terms) translation. Pure machine translation is a joke. All machine-made translations must be verified by a human.
* Character selection tool integration, sort by the frequency
 
Why? Better improve system-wide charselect tool, OR...
==Setup==
modify your xorg keyboard layout!
* Create project, saving *.ktp file to l10n-kde4/<LangCode>
* Create in the same dir file called terms.tbx, and fill it with initial contents:
 
===Glossary===
This is what a TBX glossary looks like:
 
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE martif PUBLIC "ISO 12200:1999A//DTD MARTIF core (DXFcdV04)//EN" "TBXcdv04.dtd">
   
    <martif type="TBX" xml:lang="en">
        <martifHeader>
            <fileDesc>
                <titleStmt>
                    <title>KDE Russian Team Glossary</title>
                </titleStmt>
            </fileDesc>
        </martifHeader>
        <text>
            <body>
                <termEntry id="1">
                    <descrip type="subjectField">Security</descrip>
                    <descrip type="relatedConcept">authorization</descrip>
                    <langSet xml:lang="en">
                        <ntig>
                            <termGrp>
                                <term>authentication</term>
                                <termNote type="partOfSpeech">noun</termNote>
                            </termGrp>
                            <descrip type="context">.</descrip>
                        </ntig>
                    </langSet>
                    <langSet xml:lang="ru">
                        <ntig>
                            <termGrp>
                                <term>идентификация</term>
                                <termNote type="partOfSpeech">сущ.</termNote>
                            </termGrp>
                        </ntig>
                    </langSet>
                    <langSet xml:lang="ru">
                        <ntig>
                            <termGrp>
                                <term>подтверждение подлинности</term>
                                <termNote type="partOfSpeech">сущ.</termNote>
                            </termGrp>
                        </ntig>
                    </langSet>
                </termEntry>
                <termEntry id="2">
                    <descrip type="subjectField">Security</descrip>
                    <descrip type="relatedConcept">authentication</descrip>
                    <langSet xml:lang="en">
                        <ntig>
                            <termGrp>
                                <term>authorization</term>
                                <termNote type="partOfSpeech">noun</termNote>
                            </termGrp>
                            <descrip type="context"></descrip>
                        </ntig>
                    </langSet>
                    <langSet xml:lang="ru">
                        <ntig>
                            <termGrp>
                                <term>определение прав</term>
                                <termNote type="partOfSpeech">сущ.</termNote>
                            </termGrp>
                        </ntig>
                    </langSet>
                </termEntry>
            </body>
        </text>
    </martif>
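
To illustrate how the structure above maps onto term pairs, here is a minimal Python sketch that reads such a file with the standard library. It is not how Lokalize itself loads glossaries; the function name `read_tbx_terms` and the returned dictionary layout are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined xml: prefix under this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tbx_terms(xml_data):
    """Map each termEntry id to {language: [terms]} (hypothetical helper)."""
    root = ET.fromstring(xml_data)
    glossary = {}
    for entry in root.iter("termEntry"):
        by_lang = {}
        for lang_set in entry.iter("langSet"):
            lang = lang_set.get(XML_LANG)
            terms = [t.text for t in lang_set.iter("term") if t.text]
            # Several langSet elements may share one language (synonyms).
            by_lang.setdefault(lang, []).extend(terms)
        glossary[entry.get("id")] = by_lang
    return glossary
```

Passing the raw bytes of a file, e.g. `read_tbx_terms(open("terms.tbx", "rb").read())`, lets the parser honour the encoding declared in the XML header. For the sample above, entry "1" would carry one English term and two Russian synonyms.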

Latest revision as of 23:02, 19 March 2011

WARNING: KAider was renamed to Lokalize and will be included in the kdesdk package for KDE 4.1.

Lokalize is a computer-aided translation system that focuses on productivity and performance. The translator does only the creative work (delivering the message in his/her mother language in a laconic and easy-to-understand form). Lokalize implies a paragraph-by-paragraph translation approach (when translating documentation) and a message-by-message approach (when translating GUI). See KAider/Introduction

Current state

Already has:

  • syntax highlighting
  • spellcheck (sonnet needs improvement)
  • search-n-replace, ignoring accel marks
  • formats .po file output better so less diff is generated by scripty
  • small features like quick tag insert, placing text cursor right after the tag in the beginning (e.g. '<qt>|foobar</qt>' where "|" is a cursor), entry bookmarks
  • viewer of the difference between the current msgid and the previous one (i.e. the msgid that the current msgstr actually translates; for fuzzies generated with the --previous gettext option)
  • merge mode for editors (QA) or when several translators work on the same file screenshot
  • basic projectmanager functionality screenshot
  • Translation Memory (threaded) with shortcuts for inserting suggestions into current 'msgstr', scores are computed based on common/total length ratio, removed+added length, and count of removed+added parts screenshot
  • a word-by-word algorithm is used everywhere differences are shown (based on the O(n*n) Longest Common Subsequence algorithm and my own experience)
  • glossary with basic tbx format support. Lokalize displays relevant entries on-the-fly and provides shortcuts to insert them. Also, you can add new glossary terms via the context menu of the glossary. screenshot 1 screenshot 2
  • Search/Replace functions in multiple files at once.
  • Spellchecking of multiple files at once.
  • beginnings of XLIFF support
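
The word-by-word difference and the Translation Memory scoring mentioned in the list above can be sketched roughly as follows. This is only an illustration in Python, not the Lokalize implementation: the function names, the whitespace-based word splitting, and the penalty weight `0.05` are all assumptions chosen for the example.

```python
def lcs_table(a, b):
    """Classic O(len(a) * len(b)) table of longest-common-subsequence lengths."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def word_diff(old, new):
    """Return (op, word) pairs; op is '=' (kept), '-' (removed) or '+' (added)."""
    a, b = old.split(), new.split()
    table = lcs_table(a, b)
    ops = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            ops.append(("=", a[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            ops.append(("-", a[i]))
            i += 1
        else:
            ops.append(("+", b[j]))
            j += 1
    ops += [("-", w) for w in a[i:]]
    ops += [("+", w) for w in b[j:]]
    return ops


def tm_score(old, new, part_penalty=0.05):
    """Toy similarity in [0, 1]: common/total length ratio minus a
    penalty per removed or added run (the weight is hypothetical)."""
    ops = word_diff(old, new)
    common = sum(len(w) for op, w in ops if op == "=")
    changed = sum(len(w) for op, w in ops if op != "=")
    if common + changed == 0:
        return 1.0
    parts = 0
    previous = "="
    for op, _ in ops:
        if op != "=" and op != previous:
            parts += 1  # a new run of removed or added words starts here
        previous = op
    return max(0.0, common / (common + changed) - part_penalty * parts)
```

For example, `word_diff("Open the file", "Open a file")` marks "the" as removed and "a" as added while keeping the other two words, and `tm_score` of two identical strings is 1.0.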

Compiling

After you set kde env up (compiling kdelibs+kdebase is enough):

svn checkout -N svn://anonsvn.kde.org/home/kde/trunk/KDE/kdesdk/
cd kdesdk && svn up cmake doc lokalize
mkdir build && chmod a+w build
su kde-devel
cd build
cmakekde ..

As root, run sshd, and then from the usual shell:

ssh -XC kde-devel@localhost
lokalize 

You can get the catalog manager by specifying the --project option:

lokalize --project /path/to/index.ktp

See Projects/Summer_of_Code/2007/Projects/KAider#Setup.

Debian users

You can install the latest version of lokalize from the experimental repository: http://packages.debian.org/experimental/lokalize

Setup

  • Create project, saving *.ktp file to l10n-kde4/<LangCode>/ dir
  • Populate Glossary via GlossaryView context menu (.tbx file will be created automatically for you on the first entry addition).
  • Populate Translation Memory by dropping .po files onto TM View

See http://websvn.kde.org/trunk/l10n-kde4/ru/ for an example project structure

Maxims

  • The majority of actions must be accessible via keyboard (because it is faster to press a shortcut than to be frustrated with a mouse)
  • Automate _everywhere_ possible
  • Focus on translation quality. This is open source -- so source code is available (for change)

Further work

  • WebQuery for twin languages (like Ukrainian and Russian)
  • xliff+qt-linguist support (see KBabel features to be implemented)
  • Glossary checklists: check for forbidden terms in new translation
  • project-wise and program-wise: webquery scripts, glossaries, TMs
  • check for different translations of the same msgid (use strigi?)
  • back-checking, to see whether a term/string in the target language has been used as translation for different things, not just that one source has the same translation everywhere.
  • Tighten SVN support: svn diff-like feature
  • Automatic Glossary building
  • Research on dividing into sentences rules (e.g. srx)
  • Automate submitting translation suggestions to translate.google.com [Kross action]
  • fill TM with content of /usr/share/locale/<lang>/LC_MESSAGES/*.mo [Kross action]
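
As a rough idea of what the last item (filling the TM from installed .mo files) involves, here is a small Python sketch that extracts msgid/msgstr pairs from the bytes of a GNU .mo file. It is not a Kross action and not Lokalize code; `read_mo` is a hypothetical helper, and it assumes the catalog is encoded in UTF-8 (a real tool would read the charset from the header entry).

```python
import struct

LE_MAGIC = 0x950412DE  # magic number as read from a little-endian file
BE_MAGIC = 0xDE120495  # the same magic as seen when the file is big-endian


def read_mo(data):
    """Return {msgid: msgstr} from the raw bytes of a GNU .mo file."""
    magic = struct.unpack("<I", data[:4])[0]
    if magic == LE_MAGIC:
        order = "<"
    elif magic == BE_MAGIC:
        order = ">"
    else:
        raise ValueError("not a GNU .mo file")
    # Header fields after the magic: revision, number of strings,
    # offset of the original-strings table, offset of the translations table.
    _revision, count, orig_off, trans_off = struct.unpack(order + "4I", data[4:20])
    pairs = {}
    for i in range(count):
        o_len, o_pos = struct.unpack_from(order + "2I", data, orig_off + 8 * i)
        t_len, t_pos = struct.unpack_from(order + "2I", data, trans_off + 8 * i)
        msgid = data[o_pos:o_pos + o_len].decode("utf-8")    # assumption: UTF-8
        msgstr = data[t_pos:t_pos + t_len].decode("utf-8")
        if msgid:  # the empty msgid holds the catalog header, not a translation
            pairs[msgid] = msgstr
    return pairs
```

Plural forms (separated by NUL bytes) and message contexts (separated by the EOT byte) are returned unsplit here; a TM importer would need to handle them before storing the pairs.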

Not for KDE:

  • Be a complete computer-aided translation system by providing e.g. actions to import and export openoffice, txt and documents of other formats by calling appropriate scripts/commands. Define a general kross actions interface for that.
  • Make nice windoze package for the windowzerz

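Competitors (ideas):

  • locfactoryeditor (http://www.triplespin.com/en/products/locfactoryeditor.html#whatsnew) -- Mac only
  • Heartsome (http://www.heartsome.net/EN/downloads.html) -- multiplatform, costs money
  • Go over the OmegaT wishlist (http://sourceforge.net/tracker/?atid=520350&group_id=68187&func=browse) and ensure every sane wish is implemented

Converters (use, acting as a front-end):

  • file2xliff4j (http://file2xliff4j.sourceforge.net/javadoc/file2xliff4j/package-summary.html) -- java-based
  • https://open-language-tools.dev.java.net -- java-based
  • Translate Toolkit (http://translate.sourceforge.net/wiki/toolkit/index) -- python-based, checks, other goodness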

KBabel features to be implemented

...in a smarter way :)

  • persistent bookmarks for messages in a file saved in the project
  • extended marking of .po and .pot files (e.g. the translator who currently works on the file and since when) saved in the project
  • A plugin framework for validation tools for consistency checks [Kross action triggered on saving]
  • Sending the file using email [Kross (project) action]
  • Automatic syntax check with msgfmt when saving and, if an error occurred, easy navigation to the messages that contain errors. == Syntax check (msgfmt --statistics) for existing files to verify that the translated files will compile and, accordingly, work when distributed [Kross (project) action]
  • PO File Header change [Kross action (+triggered on saving)]
  • Printing of selected messages (eg fuzzy ) [Kross action]
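
The msgfmt-based syntax check from the list above could look roughly like the following Python sketch. It is an illustration only, not an existing Kross action: the helper names `parse_msgfmt_stats` and `check_po` are made up here, and the parser assumes the English statistics line that msgfmt prints to stderr (hence `LC_ALL=C`).

```python
import os
import re
import subprocess


def parse_msgfmt_stats(text):
    """Extract the counts from a line such as
    '5 translated messages, 2 fuzzy translations, 1 untranslated message.'"""
    patterns = (
        ("translated", r"(\d+) translated message"),
        ("fuzzy", r"(\d+) fuzzy translation"),
        ("untranslated", r"(\d+) untranslated message"),
    )
    counts = {}
    for key, pattern in patterns:
        match = re.search(pattern, text)
        counts[key] = int(match.group(1)) if match else 0
    return counts


def check_po(path):
    """Run msgfmt on one .po file; return (ok, counts, raw_output)."""
    env = dict(os.environ, LC_ALL="C")  # force untranslated msgfmt output
    proc = subprocess.run(
        ["msgfmt", "--check", "--statistics", "-o", "/dev/null", path],
        capture_output=True, text=True, env=env,
    )
    # msgfmt writes both error messages and the statistics line to stderr.
    return proc.returncode == 0, parse_msgfmt_stats(proc.stderr), proc.stderr
```

When the check fails, the raw output contains `file:line:` prefixed messages, which is what an editor would need in order to jump to the offending entries.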

Also:

  • msgid-diff-msgstr from http://lichota.net/~krzysiek/projects/msgtools/ (features for all other commands are already implemented, if you haven't noticed this)

KBabel features NOT to be implemented

  • Character selection tool integration, sort by the frequency

Why? It is better to improve the system-wide charselect tool, OR... modify your xorg keyboard layout!