Localization/Workflows/PO Ascription: Difference between revisions

From KDE TechBase
Revision as of 20:42, 29 June 2011




Reviewing by Ascriptions
On Localization   Workflows
Prerequisites   Translating in Summit
Related Articles   Pology, Language Coordinator
External Reading   n/a

Why Review Translations?

Especially to new translators, it may not be obvious to what extent a translation needs to be reviewed. If the translator has exercised due diligence, how "wrong" can the translation be? Even if the translator has a good command of the source language (English in the context of this article), the answer is "very wrong", when all aspects are considered. Here are some of them.

With the comparatively simple grammar of English, the meaning of a short English sentence -- as typically encountered in application user interfaces -- depends heavily on the surrounding context. This context may not be obvious when the translator is going through isolated messages in the translation file, so he may commit the worst of errors from the user's viewpoint: the senseless translation. An experienced reviewer will have developed a sense for troublesome contexts, and will have several means to decisively determine the context (including, for example, running the development version of the application).

Even if the context is correctly established, the translator may use "wrong" terminology, which is the next worse thing for the user. A term used in translation does not need to be wrong by itself, in fact it may be exactly the correct term -- in another translation project. The reviewer will have more experience with terminology of the present project, and be able to bring the translation in line with it.

Style in the technical sense is a consistent choice between several perfectly valid constructs in target language when applied to text in the given technical context. For example, how to translate menu titles and items, button labels, or tooltips. The choices may include noun or verb forms, particular grammar categories, tone of address, and so on. There may be a style guide to the project which details such choices, and the reviewer will know it well.

Style in the linguistic sense is especially applicable to longer texts, such as tooltips in user interfaces, and passages in documentation. A typical error of a new translator is to adhere closely to English style and grammar. This may produce a translation which is semantically and grammatically valid in the target language, but very out of style -- the "translationese". The reviewer is there to naturalize such passages.

Finally, the reviewer may be an experienced translator, but that does not mean that his own translations need no review. Immersion in the source language, distraction, and fatigue will lead the reviewer into any of the above errors in translation, only less frequently. So reviewers should also review one another's translations.

Classical Reviewing by Stages

The classical review workflow by stages seems simple enough. A translator translates a PO file (or updates an existing translation), and declares it ready for review. A reviewer reviews it, and declares it ready to "commit". Committing here should be understood generally, as inclusion into the pool from which translations are periodically shipped to end users. A committer finally commits the file. The process is iterative: the reviewer may return the file to the translator, and the translator may later declare it ready for review again. There may be several stages of review (such as proof-reading, approving), each of which may return the translation to a previous stage, or forward it to some special stage. The process can also be finer grained, where each message in the file goes through the stages separately.

Regardless of the particularities, workflows of this kind all have the following in common. Members of the translation team are assigned roles -- such as translator, reviewer, approver, committer -- by which they enter into the workflow (a single person can have several roles). The later review stages must wait for the earlier stages to complete, and the translation cannot be updated again before the current version clears the pipeline (or the pipeline is aborted). Most importantly, once the translation is committed, it simply becomes one of the "admitted" translations, with no further qualifiers.

The system of prescribed roles requires that team members assign them between themselves, stick to them, and shuffle them along the way. The prescribed review pipeline requires a tool to enforce and keep track of the stages in which translations are. This makes the review workflow complex and rigid, most probably with choke points that hurt efficiency. The distribution of roles may become unbalanced as people come and go, or the workflow tool may be prohibitive in some scenarios (e.g. a single translator making small adjustments in dozens of files across the project, but having to upload each manually through a web interface).

Of course, "rigid", "complex", "inefficient", are comparative qualifications, so what is it that the classical review by stages can be compared to in this way?

Reviewing by Ascriptions

Reviewing by ascriptions is even simpler conceptually, and yet less rigid, less complex, and much more efficient than review by stages. It works on the message level, rather than the file level. Anyone can simply translate some messages and directly commit the modified files, without any review, but ascribing the modifications to their own name. Anyone can review any committed messages at any moment, commit the modifications made on review, and ascribe the reviews to their own name and (possibly) to a certain class -- full review, review of context, of terminology, of style, etc. Only when the translation is to be shipped to end users are the insufficiently reviewed messages automatically omitted from the package, by evaluating the ascription history of each message.

Most importantly, based on the ascription history, the reviewer can select only some particular messages, and review only the difference between their historical and current versions. For example, Alice can choose to review only messages modified since she or Bob last reviewed them for style; she would see the difference from that last review to the current version, e.g. if in a whole paragraph only a single word was changed by Charlie when he reviewed the terminology. In terms of the PO workflow, the ascription history propagates through merges, so the reviewer can compare the change in the original and the change in the translation since the last review, to judge whether one fits the other.
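The selection just described can be illustrated with a small sketch. It is not how poascribe is implemented: the record layout (Ascription) and the function diff_since_review are hypothetical names, and each history is assumed to be a chronological list of modification and review records for a single message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ascription:
    """One hypothetical ascription record for a single message."""
    kind: str                  # "modified" or "reviewed"
    user: str                  # ascription user name
    text: str                  # translation text when the record was made
    tag: Optional[str] = None  # review class, e.g. "style"; None if plain


def diff_since_review(history, reviewers=None, tag=None):
    """Return (text_at_last_review, current_text) if the message was
    modified after the last matching review, otherwise None.

    A review matches if it carries the given tag and, when reviewers
    is given, was made by one of those users.
    """
    last_review = None
    for i, rec in enumerate(history):
        if rec.kind != "reviewed" or rec.tag != tag:
            continue
        if reviewers is not None and rec.user not in reviewers:
            continue
        last_review = i

    # Records to inspect: everything if never reviewed, else what came after.
    later = history if last_review is None else history[last_review + 1:]
    if not any(rec.kind == "modified" for rec in later):
        return None

    old_text = "" if last_review is None else history[last_review].text
    return old_text, history[-1].text
```

With a history in which Bob translated, Alice reviewed for style, and Charlie later changed one word, calling diff_since_review(history, {"alice", "bob"}, "style") would return the text as Alice last saw it together with the current text, which is the pair a reviewer needs to see only what changed.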

Since everyone just commits, translations can be efficiently kept in a version control repository, with the ascription system added on top. After having done some translating, the team member simply substitutes the commit command of the version control system (VCS) with the ascribe-modifications command of the ascription system (AS, which calls the underlying VCS internally). After reviewing, the team member uses the ascribe-reviews command of the AS to commit reviews to the ascription history (as well as modifications made during the review). To select messages for review, the team member issues the diff-for-review command of the AS (with suitable parameters to narrow the set); the selected messages are marked in place in the PO files with embedded differences, and possibly opened in a PO editor.

When the translations are to be released, the team coordinator issues filter-for-release command of the AS, which takes the working PO files and creates final PO files with insufficiently reviewed messages removed. "Release time" is used here only figuratively: this should be a fully automatic process, so it can be performed at any interval of convenience.

What constitutes "sufficient review" can be defined in fine detail. It could be specified that messages modified by Alice need only a review for terminology, but not necessarily for style; Charlie may belong to the group which needs to be reviewed on style, but not necessarily on context; Bob's reviews for style may be nice to have, but never blocking if missing. These decisions do not preclude released messages from being reviewed later on the missing points, after higher priority reviews have been completed. The definition of sufficiency may be changed at any point, e.g. as team members get more experienced and require less review, without interfering with direct translation and review work.
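As an illustration of such a sufficiency definition, here is a minimal sketch of a release filter. Everything in it is an assumption made for the example: the history is a chronological list of (kind, user, tag) tuples, the policy is a plain mapping from the last modifier's user name to the set of review tags required, and the function names are hypothetical. The actual filtering in Pology is configured differently.

```python
def is_sufficiently_reviewed(history, required_by_user, default_required):
    """Decide whether one message may be released.

    history: chronological list of (kind, user, tag) tuples, where kind
        is "modified" or "reviewed".
    required_by_user: maps a modifier's user name to the set of review
        tags that must follow that user's modifications.
    default_required: set of tags required for users not in the mapping.
    """
    last_mod = None
    for i, (kind, _user, _tag) in enumerate(history):
        if kind == "modified":
            last_mod = i
    if last_mod is None:
        return False  # never ascribed as modified: nothing to release

    modifier = history[last_mod][1]
    required = required_by_user.get(modifier, default_required)

    # Only reviews made after the last modification count.
    received = {tag for kind, _user, tag in history[last_mod + 1:]
                if kind == "reviewed"}
    return required <= received


def filter_for_release(messages, required_by_user, default_required):
    """Return the ids of messages that pass, keeping the input order."""
    return [msgid for msgid, history in messages.items()
            if is_sufficiently_reviewed(history, required_by_user,
                                        default_required)]
```

Because the policy is evaluated only at filtering time, changing it (for example, dropping the style requirement for a translator who has gained experience) needs no change to the recorded history.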


In summary, with reviewing by ascriptions the lean efficiency of raw VCS operation is preserved, while providing great flexibility of review. All team members can be given commit access; no web or email detours are needed. There are no prescribed roles, but an equivalent of role assignment happens automatically at the last possible moment, and can take into account both translators' and reviewers' abilities. There is no staging between completing and committing the translation, which enables the translator to keep on polishing the translation undisturbed until the reviewer comes around. There is no inefficiency in handling small changes throughout many files, since a single AS command commits all changes just as a single VCS command would. The AS in effect abstracts the VCS, so general team members do not have to know the particularities of the underlying VCS. On commit operations, the AS can also apply checks (e.g. decline to commit syntactically invalid PO files) and modifications (e.g. update the translator's data in the PO header).

Ascription System in Pology

Pology is a collection of various modular tools for supporting translation based on PO files. Among them is the script poascribe, which implements an ascription system (AS); at present, it can use Subversion or Git as the underlying VCS. poascribe is still in an experimental stage, so what follows is a brief description of how to use it in the context of the KDE translation project. However, very little is truly specific to KDE; the only major assumption is that there exists a VCS repository with PO files of a given language grouped together, and that the translation team can use it without special restrictions.

Very important for the AS is how branches are handled (in KDE, the rolling trunk and stable branches). The AS can in principle be deployed by branch, but then there is the added complexity of porting translations between branches, which ascriptions should follow. Therefore, the AS implemented by poascribe is currently limited to the assumption that there is a single branch of translations at all times. The article "Translating in Summit" explains how a KDE translation team can set up and operate such a single branch, the summit, and this is the prerequisite for the following instructions. (Note that the summit system is useful on its own, and should be conducive to any kind of review workflow.)

Setting Up

The summit branch for the language LANG is positioned like this in the KDE repository:

$KDEREPO/

   trunk/
       l10n-support/
           LANG/
               summit/
                   messages/
                   docmessages/

The team coordinator already has this part of the repository tree locally due to regular summit operations. For the same reason Pology is already set up. Setting up the ascription system is now simple. The file ascription-config is created in the parent directory of the summit:

...

   LANG/
       ascription-config
       summit/
           messages/
           docmessages/

with the following contents:

# ---------------------------
# Global ascription settings.

[global]

# Roots of the catalog and ascription trees.
catalog-root = summit
ascript-root = summit-ascript

# The underlying version control system.
version-control = svn

# Data for updating catalog headers.
# - language code
language = LANG
# - full language name
language-team = LANGUAGE
# - email address of the team
team-email = [email protected]

# Default commit message.
commit-message = Translation updates.

# -----------------------
# Registered translators.

[user-alice]
name = Alice Akmalryn
original-name = Алиса Акмалрин
email = [email protected]

[user-bob]
name = Bob Byomkin
original-name = Бобан Бјомкин
email = [email protected]

# ...and so on.

Some notes:

  • The ascript-root setting should be exactly summit-ascript, for the reason mentioned later.
  • commit-message field, if defined, allows team members to commit without providing a commit message. The value given by this field will be used by default, with translator's user name appended to the end in special syntax. For example: Translation updates. [>alice]. (Translator's user name is also appended to manually supplied commit messages.) Translators can still supply a commit message when they wish, as shown later. If this field is not set, the commit message is supplied as usual on committing.
  • Team members are defined by [user-USERNAME] sections. Ascription user names can be any valid ASCII identifier: ASCII letters, digits and underscores only, digit cannot be the first character. Ascription user names have no technical relation to the underlying VCS accounts, though it is mnemonically convenient if they are the same (in case of SVN). This means that a translator who does not have a VCS account (yet) can and should be added here, with assigned user name (best one suitable as SVN account name later); why this should be done will be explained later.
  • original-name field in user sections is there in case the preferred renderings of the name in English and in target language are not the same. When this is not the case, original-name can be omitted.
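The rules in these notes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not poascribe's own code: it assumes the configuration is readable by the standard configparser module, and the helper names load_users and commit_message are hypothetical. The user name pattern and the [>USERNAME] suffix follow the description above.

```python
import configparser
import re

# ASCII letters, digits and underscores; a digit cannot come first.
USER_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def load_users(config_text):
    """Parse ascription-config text and return (parser, users), where
    users maps each user name to the fields of its [user-...] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    users = {}
    for section in parser.sections():
        if not section.startswith("user-"):
            continue
        username = section[len("user-"):]
        if not USER_NAME_RE.fullmatch(username):
            raise ValueError("invalid ascription user name: %r" % username)
        users[username] = dict(parser[section])
    return parser, users


def commit_message(parser, username, message=None):
    """Build the final commit message, falling back to the commit-message
    field of [global] and appending the user name in [>...] syntax."""
    if message is None:
        message = parser.get("global", "commit-message", fallback=None)
    if message is None:
        raise ValueError("no commit message given and no default is set")
    return "%s [>%s]" % (message, username)
```

For the configuration shown earlier, commit_message(parser, "alice") would give "Translation updates. [>alice]", matching the example in the notes.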

As soon as the ascription-config file is committed, the ascription system is ready for operation. The only regular modifications to this file are those of adding new team members. (On the other hand, team members should never be removed, because even after they no longer contribute, their ascription records remain in the system.)

Initial Ascription

The most common situation at start of ascription workflow is that there already exists a body of translations, contributed to by many different people over time. The coordinator should ascribe all existing translations as initial modifications, but to whom? It cannot be said precisely who translated what. The solution is to introduce a generic user in ascription-config, suitably known as "Unknown Hero" (or "Lost Translator", you can be inventive):

[user-uhero]
name = Unknown Hero
original-name = Незнани јунак

and ascribe all existing translations as modified and reviewed by this user. The coordinator does this with the following command:

<syntaxhighlight lang="bash">
$ cd $KDEREPO/trunk/l10n-support
$ poascribe commit -u uhero --all-reviewed -C LANG/summit/
</syntaxhighlight>

The argument commit is the ascription mode, and the -u option provides the user name to which ascriptions are made. This is an important point: ascriptions are made to a user defined in ascription configuration, and have nothing to do with VCS accounts; someone who has the account can commit in the name of someone who does not. It is the --all-reviewed option that declares all messages to be reviewed as well (note that it is normally used only this once, and not for normal day to day reviewing). The -C option prevents automatic adding and committing to version control, which is useful for this initial step. Finally the paths which contain all summit catalogs are given.

When the poascribe command is issued, a progress bar will appear, and the following output will start to unfold:

LANG/summit/messages/extragear-base/rellinks.po  (50/50)
LANG/summit/messages/extragear-base/autorefresh.po  (13/13)
LANG/summit/messages/extragear-base/babelfish.po  (38/38)
...
LANG/summit/messages/qt/libphonon.po  (13/13)
LANG/summit/messages/qt/phonon-xine.po  (24/24)
LANG/summit/messages/qt/phonon_gstreamer.po  (12/12)
===== Ascription summary:
-            modified  reviewed
translated     111775    111775
fuzzy           26943     26943
obsolete/t       2965      2965
obsolete/f       1626      1626

The numbers in parentheses indicate how many messages have been ascribed in the given PO file (modified/reviewed), and at the end the totals are given. Ascribing the complete summit for the first time will take quite some time (on the order of 10-20 minutes).


After the initial ascription has been made, the ascription tree will appear next to the summit tree. This tree will contain one ascription PO file for each summit PO file, with the same name and relative location within the tree:

...

   LANG/
       ascription-config
       summit/
           messages/
               kdelibs/
                   kcertpart.po
                   kdelibs4.po
                   ...
               ...
           docmessages/
       summit-ascript/
           messages/
               kdelibs/
                   kcertpart.po
                   kdelibs4.po
                   ...
               ...
           docmessages/
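The correspondence between the two trees amounts to swapping one path component. A minimal sketch, assuming POSIX-style relative paths and the root names from the configuration above (the function name ascription_path is hypothetical, not part of Pology):

```python
from pathlib import PurePosixPath


def ascription_path(po_path, catalog_root="summit",
                    ascript_root="summit-ascript"):
    """Map a summit PO file path to its ascription PO file path by
    replacing the first catalog-root component with the ascript-root."""
    parts = list(PurePosixPath(po_path).parts)
    if catalog_root not in parts:
        raise ValueError("%s is not under the %s tree"
                         % (po_path, catalog_root))
    parts[parts.index(catalog_root)] = ascript_root
    return str(PurePosixPath(*parts))
```

For example, LANG/summit/messages/kdelibs/kdelibs4.po maps to LANG/summit-ascript/messages/kdelibs/kdelibs4.po, with the file name and relative location unchanged.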

During the ascription some summit PO files may have been modified as well, in that any previous fields (#| ...) on translated messages have been removed. (These fields are sometimes erroneously left in by older PO editors.)

The newly created ascription tree (and any modifications to summit tree) can now be committed as usual:

<syntaxhighlight lang="bash">
$ svn add LANG/summit-ascript
$ svn commit LANG/summit LANG/summit-ascript -m "Initial ascription."
</syntaxhighlight>

Daily Use for Translators

Team members other than the coordinator, whether translators or reviewers, need to keep around only the trunk/l10n-support/LANG/ directory. But they always need to update this directory fully (rather than just one particular module or file under .../*messages/), so that the summit tree and the ascription tree (and configuration) are kept in sync.

In order not to have to issue their own user name (-u option to poascribe) all the time, translators can set it in Pology user configuration ~/.pologyrc, in [poascribe] section:

[poascribe]
user = alice

Translators can then submit updated PO files simply by substituting svn commit (or whatever the VCS commit command is) with poascribe commit (co or ci for short):

<syntaxhighlight lang="bash">
$ poascribe ci LANG/summit/messages/kdefoo/*fooapp*.po
LANG/summit/messages/kdefoo/fooapp.po (144)
LANG/summit/messages/kdefoo/libfooapp.po (25)
===== Ascription summary:
-            modified
translated        169
>>>>> VCS is committing catalogs:
Sending        LANG/summit/messages/kdefoo/fooapp.po
Sending        LANG/summit/messages/kdefoo/libfooapp.po
Sending        LANG/summit-ascript/messages/kdefoo/fooapp.po
Sending        LANG/summit-ascript/messages/kdefoo/libfooapp.po
Transmitting file data ....
Committed revision 1267069.
$
</syntaxhighlight>

poascribe will add ascription records into the ascription catalogs corresponding to the summit catalogs to be committed, and commit them all. Like svn commit, poascribe ci can take any number of file or directory paths, and can be issued from any working directory (it will always find the ascription catalogs). If the default commit message has not been set in the ascription configuration, poascribe will ask for it; alternatively, it can be given on the command line through the -m option.

Translators Without Commit Access

With the ascription system in place, every regular team member should have commit access. However, there may be some period of time before new translators are given accounts, revision control may be too technical for some, and even those with an account may be temporarily unable to commit for some reason.

These translators may send in their work by email, to any team member with commit access (not necessarily the coordinator or a reviewer); this team member can commit received files without any review, as review can be conducted at any later time. If Bob sends some files to Alice, she can commit them immediately by stating Bob's user name:

<syntaxhighlight lang="bash">
$ poascribe ci -u bob ...
</syntaxhighlight>

For this to work, the translator who sent in the files has to be defined in the ascription configuration. There are no hidden costs or security issues to this (as opposed to opening a VCS account), so every new translator should be defined there before any work of that person is committed.

Daily Use for Reviewers

The ascription system opens up all sorts of possibilities for concrete review patterns. Reviewers should keep in mind that for each message the full modification and review history is available, so that the team can think about how to make good use of it. Therefore, what follows are some examples to illustrate the review facilities that poascribe provides.

Basic Reviewing

At the very basic level (which is the only level in classical review by stages), messages can be classified into simply unreviewed and reviewed, without further qualifiers. Alice now wants to review all unreviewed messages in a group of PO files, say kdetoys module. She issues (di is short for diff):

<syntaxhighlight lang="bash">
$ cd $KDEREPO/trunk/l10n-support
$ poascribe di LANG/summit/messages/kdetoys/
LANG/summit/messages/kdegames/bovo.po (2)
LANG/summit/messages/kdegames/kdiamond.po (7)
LANG/summit/messages/kdegames/palapeli.po (12)
===== Diffed for review: 21
</syntaxhighlight>

Unreviewed messages have now been marked and diffed, inside the listed PO files. What is this about "diffing"? If the files had already been reviewed before, some of the messages modified since then (those marked for review) may have changed very little (e.g. a few words in a paragraph-length message, or even just punctuation). Therefore, for each message marked for review, Alice also wants to see the diff from the last reviewed version to the current version. Here are two messages in typical review states added by poascribe di:

   #. +> trunk stable
   #. ascto: charlie:m
   #: gui/mainwindow.cc:372
   #, ediff
   msgid "GAME OVER. {-You won-}{+Tie+}!"
   msgstr "KRAJ IGRE. {-Pobeda-}{+Nerešeno+}!"

   #. +> trunk stable
   #. ascto: bob:m charlie:m
   #: game-state.cpp:117
   #, ediff-total
   msgid "Click the pause button again to resume the game."
   msgstr "Kliknite ponovo na dugme pauze da nastavite igru."

and the first one in Kate:

Message diffed for review by poascribe in Kate.

In the first message, the first thing to note is the #. ascto: comment. This comment succinctly lists who did what with the message since the last review; here charlie:m means that Charlie is the one who modified it. Then, there is the ediff flag, which Alice can use to jump through messages marked for review. Finally, the original and translation have been diffed; here they show that, since the last review, the message was fuzzied by changing "You won" to "Tie", and what Charlie did in the translation to unfuzzy it. Even on a message as short as this, the diff tells something useful to Alice: the phrase "Game over" likely has a formulaic translation, and the fact that it is not part of the diff means that the earlier reviewer had made sure it is consistent, so Alice does not have to check that.

The #. ascto: comment of the second message reveals that both Charlie and Bob had been translating it. The ediff-total flag instead of plain ediff means that this message had no review at all up to now, so there are no embedded diffs in text fields.

Alice can now go through marked files and messages, review translations, and possibly make modifications. When making changes in a message with embedded diffs, she can freely edit text outside of difference segments and within {+...+} segments (as these are the ones which belong to the current version of the text). While reviewing, Alice does not remove any of the added message elements (save for an occasional difference segment, when the translation should be modified), as these elements are needed later, when the review is committed. If a message is particularly hard and Alice wants to defer its review for later, she can add the unreviewed (or nrev for short) flag to it.
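
To make the meaning of the difference segments concrete, here is a minimal sketch that recovers the current and the previous version of a text from its embedded diff. It assumes that segments are never nested and that the text contains no escaped {-, -}, {+ or +} sequences; the actual embedded diff format in Pology may have rules which this sketch ignores. The function names are hypothetical:

```python
import re

# Removed segments look like {-...-}, added segments like {+...+}.
_REMOVED = re.compile(r"\{-(.*?)-\}", re.S)
_ADDED = re.compile(r"\{\+(.*?)\+\}", re.S)


def current_text(ediff):
    """Drop {-...-} segments and unwrap {+...+} segments."""
    return _ADDED.sub(r"\1", _REMOVED.sub("", ediff))


def previous_text(ediff):
    """Drop {+...+} segments and unwrap {-...-} segments."""
    return _REMOVED.sub(r"\1", _ADDED.sub("", ediff))


msgid = "GAME OVER. {-You won-}{+Tie+}!"
print(current_text(msgid))   # GAME OVER. Tie!
print(previous_text(msgid))  # GAME OVER. You won!
```

This is why edits made outside of difference segments or inside {+...+} segments survive: only that text belongs to the current version.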

Once the review is complete, Alice simply commits the reviewed files:

<syntaxhighlight lang="bash">
$ poascribe ci LANG/summit/messages/kdetoys/
LANG/summit/messages/kdegames/bovo.po (0/2)
LANG/summit/messages/kdegames/kdiamond.po (0/7)
LANG/summit/messages/kdegames/palapeli.po (3/12)
===== Ascription summary:
-            modified    reviewed
translated          3          21
>>>>> VCS is committing catalogs:
Sending        LANG/summit/messages/kdegames/palapeli.po
Sending        LANG/summit-ascript/messages/kdegames/bovo.po
Sending        LANG/summit-ascript/messages/kdegames/kdiamond.po
Sending        LANG/summit-ascript/messages/kdegames/palapeli.po
Transmitting file data ....
Committed revision 1284220.
$
</syntaxhighlight>

Three things have happened here. First, all review states (flags, embedded diffs, etc.) have been removed, restoring the PO file to normal. Then, any modifications that Alice has made during review are ascribed to her (here 3 out of 21 messages). Finally, all marked messages are ascribed as reviewed by Alice (any with unreviewed/nrev flags would have been omitted here). When committing, the only summit catalog that got committed is the one with modifications made during review, and all the ascription catalogs were committed because of the reviews recorded in them.

When many files with few changes in each are to be reviewed, it becomes burdensome to manually open each and every file diffed for review, and then to make sure that all are committed with poascribe ci. To make this easier, the -w toreview.out option can be added to poascribe di, which requests that paths of all diffed PO files are written into the toreview.out file. This file can then be used to batch open POs for review in the editor, as well as fed back to poascribe ci with -f toreview.out. There is also the -o option, which causes poascribe to directly open PO files in a PO editor (though this is currently applicable only to Lokalize). Putting it together, to efficiently review a whole bunch of small changes throughout many files, Alice can:

<syntaxhighlight lang="bash">
$ poascribe di PATHS... -w toreview.out -o lokalize
$ # ...only marked messages opened in Lokalize, review them...
$ poascribe ci -f toreview.out
</syntaxhighlight>

Selecting Messages for Review

Invocations of poascribe di without any options, as in the previous section, were actually equivalent to this:

<syntaxhighlight lang="bash">
$ poascribe di -s modar PATHS...
</syntaxhighlight>

Option -s specifies the message selector. modar is the default selector for diff mode, and stands for MODified-After-Review: it selects the earliest historical modification of the message after the last (or no) review of that message, if there is any such. By selecting a historical modification of the message, the diff from it to the current version can be computed and embedded into the PO file, as in previous examples.

There are various specialized selectors, and they fall into two groups: shallow selectors and history selectors. Shallow selectors look only into the current version of the message, and cannot select historical versions, which means that they cannot provide embedded diffs. History selectors (modar is of this type) can select messages from history and provide diffs. Several selectors can be issued on the command line, and the message is selected only if all selectors select it. Shallow selectors are then normally used as a pre-filter for history selectors. For example, to select messages modified after the last review, but only those found in the stable branch, the branch and modar selectors are chained:

<syntaxhighlight lang="bash">
$ poascribe di -s branch:stable -s modar PATHS...
</syntaxhighlight>

It is important that the history selector is given last, because the last selector determines which historical message is selected. If the ordering had been reversed here, the same messages would get selected, but they would not have embedded diffs, because branch is a shallow selector.
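
The chaining rule can be modeled in a few lines of Python. This is only a toy model and not how poascribe is implemented: messages are plain dictionaries, a shallow selector returns True or False, and a history selector returns a 1-based index of the selected historical version or None. All names in it are hypothetical:

```python
def chain(selectors, message):
    """Apply selectors in order with AND semantics.

    Returns None as soon as one selector rejects the message;
    otherwise returns the result of the LAST selector, which is
    what decides whether a historical version (and thus an
    embedded diff) is available.
    """
    result = None
    for select in selectors:
        result = select(message)
        if not result:
            return None
    return result


def branch_stable(message):
    # Toy shallow selector: looks only at the current message.
    return "stable" in message["branches"]


def modar(message):
    # Toy history selector: 1-based index of the selected version.
    return message.get("modified_after_review")


message = {"branches": ["trunk", "stable"], "modified_after_review": 3}
print(chain([branch_stable, modar], message))  # 3
print(chain([modar, branch_stable], message))  # True
```

With the history selector last, the result identifies a historical version to diff against; with the order reversed, the message is still selected, but only a bare True remains.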

Selectors can take parameters themselves, like branch:stable in the previous example. Parameters are separated from the selector name by any non-alphanumeric character; this is a colon by convention, but if a parameter contains a colon, something like a slash, tilde, etc. can be used. The number of parameters can vary, and modar in particular can take from none to three. If Alice wants to review only those messages modified by Charlie since the last review, she states this through the first parameter to modar:

<syntaxhighlight lang="bash">
$ poascribe di -s modar:charlie PATHS...
</syntaxhighlight>

If Alice does not give too much credit to other reviewers, she can request selection of messages modified after the last review by her, using the second parameter to modar:

<syntaxhighlight lang="bash">
$ poascribe di -s modar::alice PATHS...
</syntaxhighlight>

Here the first parameter ("modified by..."), which is not needed, must be explicitly skipped, before going to the second parameter ("reviewed by..."). The third optional parameter of modar will be mentioned in the next section.

When a selector parameter is a user name, normally it can also be a comma-separated list of user names (modar:bob,charlie) or prefixed with tilde to negate, i.e. select all other users (modar:~alice).
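
The parameter syntax described above can be illustrated with a small parser. This is a sketch under stated assumptions, not Pology's own parsing code: it assumes that the selector name consists only of alphanumeric characters and that the first character after the name is the separator. The function names `parse_selector` and `parse_users` are hypothetical:

```python
def parse_selector(spec):
    """Split a selector specification into (name, parameters).

    The first non-alphanumeric character after the name is taken
    as the parameter separator, so "modar::alice" and
    "modar//alice" are parsed the same way.
    """
    end = 0
    while end < len(spec) and spec[end].isalnum():
        end += 1
    name, rest = spec[:end], spec[end:]
    if not rest:
        return name, []
    separator = rest[0]
    return name, rest[1:].split(separator)


def parse_users(parameter):
    """Split a user parameter into (negated, list of user names)."""
    negated = parameter.startswith("~")
    if negated:
        parameter = parameter[1:]
    return negated, parameter.split(",")


print(parse_selector("branch:stable"))  # ('branch', ['stable'])
print(parse_selector("modar::alice"))   # ('modar', ['', 'alice'])
print(parse_users("~alice"))            # (True, ['alice'])
```

The empty string in the second result is the explicitly skipped first parameter.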

Any selector can be negated by prepending n to its name. For example, the history selector modafter:DATE selects the first modification after the given date; to select messages modified after the last review, but only if modified during June 2010:

<syntaxhighlight lang="bash">
$ poascribe di -s modafter:2010-06 -s nmodafter:2010-07 -s modar PATHS...
</syntaxhighlight>

Negating a history selector produces a shallow selector: while modafter is a history selector, nmodafter is shallow. But the order of the two in the previous command line is not important, as the last selector is the usual modar.

Selectors can be issued in other modes too. If the PO file is big and Alice has reviewed messages up to and including entry 246 when she has to pause until another day, she can commit reviews only up to this entry by issuing the espan selector:

<syntaxhighlight lang="bash">
$ poascribe ci -s espan::246 PATHS...
</syntaxhighlight>

(The first parameter to espan is the first entry number, given if messages are not to be selected from the first entry on.) There is also the counterpart lspan selector, which works with reference line numbers (those of msgid keywords) instead of entry numbers.
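
As an illustration of the span semantics, the following sketch checks whether an entry number falls within a span given by two optional string parameters, as they would appear in espan:FIRST:LAST. It assumes that both bounds are inclusive, which follows from the "up to and including entry 246" example but is not otherwise confirmed here; the function name is hypothetical:

```python
def in_entry_span(entry_number, first="", last=""):
    """Return True if entry_number lies within [first, last].

    An empty parameter means that the span is open on that side,
    as in "espan::246" (no first entry given).
    """
    lower = int(first) if first else 1
    upper = int(last) if last else None
    if entry_number < lower:
        return False
    return upper is None or entry_number <= upper


print(in_entry_span(246, "", "246"))  # True
print(in_entry_span(247, "", "246"))  # False
```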

Fine-Grained Reviews

In the introduction, several distinct types of what can go wrong in translation were described. Not all reviewers may be able to check translation against all those problems. Here is a typical scenario of this kind:

Alice is very computer-savvy and knows the translation project inside and out, which means that she can review well for context, terminology, and technical style. But, her language style leaves something to be desired, which shows through longer sentences and passages. Dan, on the other hand, is a very literary person, but not that much into the technical aspects. Dan's style reviews would thus be a perfect complement to Alice's general reviews.

poascribe can support this scenario in the following way. A review type tag for language style is defined in the ascription configuration, using the review-tags field:

   [global]
   ...
   review-tags = lstyle

(The value to review-tags is a space-separated list of identifiers, when more than one special review type is needed.) With this addition to configuration, Alice can continue to review as she did before, without any changes to her workflow.

Dan selects messages for review similarly to Alice, but additionally gives the lstyle tag as the third parameter of modar, and indicates that ascribed reviews should be tagged as lstyle:

<syntaxhighlight lang="bash">
$ poascribe di -s modar:::lstyle -t lstyle PATHS...
</syntaxhighlight>

After finishing the review, Dan commits as usual:

<syntaxhighlight lang="bash">
$ poascribe ci PATHS...
</syntaxhighlight>

If Dan is always going to review the language style, in order not to have to issue the selector and tag in the command line all the time, he can make them default per mode in ~/.pologyrc:

   [poascribe]
   user = dan
   selectors/diff = modar:::lstyle
   tags/diff = lstyle

With this Dan can use plain poascribe di just like Alice does.
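
For checking such a configuration by hand, the per-mode defaults can be read with Python's standard configparser module. This is only a sketch which assumes that ~/.pologyrc is a plain INI-style file; it does not use or reproduce Pology's own configuration code, and the helper name `mode_defaults` is hypothetical:

```python
import configparser

SAMPLE = """\
[poascribe]
user = dan
selectors/diff = modar:::lstyle
tags/diff = lstyle
"""


def mode_defaults(text, mode):
    """Return (user, selectors, tags) set for the given poascribe mode."""
    # Only "=" is a delimiter, so colons in values and keys are kept.
    config = configparser.ConfigParser(delimiters=("=",))
    config.read_string(text)
    section = config["poascribe"]
    return (
        section.get("user"),
        section.get("selectors/" + mode),
        section.get("tags/" + mode),
    )


print(mode_defaults(SAMPLE, "diff"))
# ('dan', 'modar:::lstyle', 'lstyle')
```

Fields which are not set for the requested mode come back as None.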

The important point of review tags is that they make reviews by types independent. For example, Dan may come around to review the language style of the given message after several modifications and general reviews have been ascribed to it -- modar:::lstyle will simply ignore all reviews except for lstyle reviews. This is going to be reflected in the ascto: comment to marked messages:

   # ...
   #. ascto: charlie:m alice:r bob:m
   # ...
   msgid "..."
   msgstr "..."

Here Alice has made one review between Charlie's and Bob's modifications, and that review, being general instead of lstyle, did not cause modar to stop at it. After Dan reviews this message for language style, Alice runs selection for review and gets this:

   # ...
   #. ascto: bob:m dan:r(lstyle)
   # ...
   msgid "..."
   msgstr "..."

Again, since lstyle reviews do not mix with general reviews, Dan's review did not hide Bob's modification that Alice did not check so far.
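
The independence of review types can be shown on a toy ascription history. The sketch below models the history as a list of (user, kind, tag) records, oldest first, where kind is "m" for a modification and "r" for a review. This data model and the function name are hypothetical simplifications and do not reflect how poascribe stores or processes ascriptions:

```python
def modified_after_review(history, review_tag=""):
    """Index of the earliest modification after the last review
    carrying review_tag, or None if there is no such modification.

    Reviews with any other tag are ignored; the empty string
    stands for the general review.
    """
    start = 0
    for index, (user, kind, tag) in enumerate(history):
        if kind == "r" and tag == review_tag:
            start = index + 1
    for index in range(start, len(history)):
        if history[index][1] == "m":
            return index
    return None


history = [
    ("charlie", "m", None),
    ("alice", "r", ""),       # general review
    ("bob", "m", None),
]
print(modified_after_review(history, "lstyle"))  # 0 (Charlie's modification)
print(modified_after_review(history, ""))        # 2 (Bob's modification)

history.append(("dan", "r", "lstyle"))
print(modified_after_review(history, ""))        # 2 (still Bob's modification)
print(modified_after_review(history, "lstyle"))  # None
```

Dan's lstyle selection reaches back to Charlie's modification because no lstyle review exists yet, while Alice's general selection keeps pointing at Bob's modification even after Dan's review is recorded.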

(General review too has a tag assigned, the empty string, in case the reviewer needs to explicitly issue it in some context.)

Daily Use for The Coordinator

After setting up the ascription system, the team coordinator should have to do very little to maintain it.

Ascribing Merges

Modifications made to summit catalogs by merging with templates must also be ascribed. Therefore, after merging the summit (posummit ... merge ...) the coordinator substitutes the VCS command:

<syntaxhighlight lang="bash">
$ svn commit LANG/summit/messages/ -m "Merged summit."
</syntaxhighlight>

with the poascribe command in commit mode:

<syntaxhighlight lang="bash">
$ poascribe ci LANG/summit/messages/ -m "Merged summit."
</syntaxhighlight>

Since the user is not explicitly given by -u option, this will ascribe merge modifications to the coordinator (more precisely, to the user set as default in ~/.pologyrc), which is just fine. It is also possible to define a special user only for ascribing merge modifications, though there is no known advantage to that.

Since -C option is not issued, poascribe will automatically commit all modified summit and ascription catalogs when done.

Shuffling Ascription Catalogs

Sometimes summit catalogs are shuffled in the repository: moved to another module, renamed, one catalog split into two, two catalogs merged into one. Such shuffling should be exactly mirrored in the ascription tree, and this too is done on the repository side, at the same time. This relies on the ascription root being set exactly to summit-ascript in the ascription configuration. So the team coordinator has nothing special to do here.

If the translation team is working in an external repository instead of the central KDE repository, the ascription system must consequently be set up in that repository. But as long as the process_orphans.sh script from trunk/l10n-support/scripts/ is also used to shuffle catalogs in the external repository, the ascription catalogs will be handled properly.

Filtering for Release

The last component of the ascription system is preventing insufficiently reviewed messages from leaking into a release. In the context of Pology and the summit workflow, poascribe itself is not used directly to this end. Instead, in the summit configuration (as opposed to the ascription configuration), the team coordinator defines filters which pass messages by applying selectors.

Each top level PO tree has its own summit configuration file, named MSGTREE.extras.summit:

...

   LANG/
       summit/
           messages/
           messages.extras.summit
           docmessages/
           docmessages.extras.summit

For the simple case of all reviews being general reviews, the filter is added to summit configuration like this (anywhere within *.extras.summit file):

<syntaxhighlight lang="python">
S.ascription_filters = [
    ("regular", ["nmodar"]),
]
</syntaxhighlight>

Here the filter is named regular, and is defined as application of nmodar selector, the negation of modar. This simply means: pass all messages not modified after the last review.

When the team coordinator scatters to branches (executes posummit scatter), messages from summit POs which do not pass this filter will not be sent to branch POs. The count of stopped messages by branch PO will be reported in the output as scattering proceeds.

Why did we have to name the filter regular? (Those knowing some Python will also notice that it is defined as a list element.) Because it is possible to define more than one filter, and select which one is used on each scattering. For example, the coordinator may wish that, when the release is near and time is short to review everything, messages from a few experienced translators can be passed into release without review. If those translators are Alice and Bob, an "emergency" filter can be defined like this:

<syntaxhighlight lang="python">
S.ascription_filters = [
    ("regular", ["nmodar"]),
    ("emergency", ["nmodar:~alice,bob"]),
]
</syntaxhighlight>

By default, posummit uses the first filter in the list. When the coordinator needs to do emergency scattering, he requests the emergency filter by the -a option:

<syntaxhighlight lang="bash">
$ cd $KDEREPO/trunk/l10n-support
$ posummit scripts/messages.summit LANG scatter -a emergency
</syntaxhighlight>
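
The rule for choosing a filter (the first one by default, a named one on request) can be sketched as follows. This is not posummit's code; the list below is a plain Python stand-in for S.ascription_filters, and the function name `pick_filter` is hypothetical:

```python
ascription_filters = [
    ("regular", ["nmodar"]),
    ("emergency", ["nmodar:~alice,bob"]),
]


def pick_filter(filters, name=None):
    """Return the (name, selectors) pair to use for scattering.

    Without a name the first filter in the list is used, which is
    why the order of definitions matters.
    """
    if not filters:
        raise ValueError("no ascription filters defined")
    if name is None:
        return filters[0]
    for candidate in filters:
        if candidate[0] == name:
            return candidate
    raise KeyError(name)


print(pick_filter(ascription_filters))
# ('regular', ['nmodar'])
print(pick_filter(ascription_filters, "emergency"))
# ('emergency', ['nmodar:~alice,bob'])
```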

What if several selectors are needed to pass the message? For example, the language style review (the earlier example with Alice and Dan) too may be requested for regular scattering, but omitted from emergency scattering. The filter setup for this scenario looks like this:

<syntaxhighlight lang="python">
S.ascription_filters = [
    ("regular", ["nmodar", "nmodar:::lstyle"]),
    ("emergency", ["nmodar:~alice,bob"]),
]
</syntaxhighlight>

The regular filter now reads: pass the message if it has not been modified after the last (general) review and has not been modified after the last style review.

Simple combination of predefined selectors by AND-conditions may not be sufficient for more involved scenarios. When this is the case, the coordinator may write (or ask someone to write) a custom selector in Python, and plug it in as the second element in the filter tuple (instead of the list of predefined selectors).

Writing a Selector Function

((To be written.))