Localization/Workflows/PO Ascription: Difference between revisions

From KDE TechBase
m (Text replace - "<code>" to "<syntaxhighlight lang="text">")
m (Text replace - "</code>" to "</syntaxhighlight>")
Line 66:
                     messages/
                     docmessages/
</syntaxhighlight>

The team coordinator already has this part of the repository tree locally due to regular summit operations. For the same reason Pology is already set up. Setting up the ascription system is now simple. The file <tt>ascription-config</tt> is created in the parent directory of the summit:
Line 77:
             messages/
             docmessages/
</syntaxhighlight>

with the following contents:
Line 119:

# ...and so on.
</syntaxhighlight>

Some notes:
Line 141:
name = Unknown Hero
original-name = Незнани јунак
</syntaxhighlight>

and ascribe all existing translations as modified and reviewed by this user. The coordinator does this with the following command:
Line 148:
$ cd $KDEREPO/trunk/l10n-support
$ poascribe commit -u uhero --all-reviewed -C LANG/summit/
</syntaxhighlight>

The argument <tt>commit</tt> is the ascription mode, and the <tt>-u</tt> option provides the user name to which ascriptions are made. This is an important point: ascriptions are made to a user defined in the ascription configuration, and have nothing to do with VCS accounts; someone who has a VCS account can commit in the name of someone who does not. It is the <tt>--all-reviewed</tt> option that declares all messages to be reviewed as well (note that it is normally used only this once, and not for normal day-to-day reviewing). The <tt>-C</tt> option prevents automatic adding and committing to version control, which is useful for this initial step. Finally, the path which contains all summit catalogs is given.
Line 168:
obsolete/t      2965      2965
obsolete/f      1626      1626
</syntaxhighlight>

The numbers in parentheses indicate how many messages have been ascribed in the given PO file (modified/reviewed), and at the end the totals are given. Ascribing the complete summit for the first time will take quite some time (on the order of 10-20 minutes).
Line 179:
$ poascribe commit -u bob --all-reviewed -C kdemultimedia/ kdeutils/
$ ...
</syntaxhighlight>//-->

After the initial ascription has been made, the ascription tree will appear next to the summit tree. This tree will contain one ascription PO file for each summit PO file, with the same name and relative location within the tree:
Line 203:
                 ...
             docmessages/
</syntaxhighlight>

During the ascription some summit PO files may have been modified as well, in that any previous fields (<tt>#| ...</tt>) on translated messages have been removed. (These fields are sometimes erroneously left in by older PO editors.)
Line 212:
$ svn add LANG/summit-ascript
$ svn commit LANG/summit LANG/summit-ascript -m "Initial ascription."
</syntaxhighlight>

== Daily Use for Translators ==
Line 223:
[poascribe]
user = alice
</syntaxhighlight>

Translators can then submit updated PO files simply by substituting <tt>svn commit</tt> (or whatever the VCS commit command is) with <tt>poascribe commit</tt> (<tt>co</tt> or <tt>ci</tt> for short):
Line 242:
Committed revision 1267069.
$
</syntaxhighlight>

<tt>poascribe</tt> will add ascription records into the ascription catalogs corresponding to the summit catalogs to be committed, and commit them all. Like <tt>svn commit</tt>, <tt>poascribe ci</tt> can take any number of file or directory paths, and can be issued from any working directory (it will always find the ascription catalogs). If a default commit message has not been set in the ascription configuration, <tt>poascribe</tt> will ask for it; alternatively, it can be given on the command line with the <tt>-m</tt> option.
Line 254:
<syntaxhighlight lang="bash">
$ poascribe ci -u bob ...
</syntaxhighlight>

For this to work, the translator who sent in the files has to be defined in the ascription configuration. There are no hidden costs or security issues to this (as opposed to opening a VCS account), so every new translator should be defined there before any work of that person is committed.
Line 273:
LANG/summit/messages/kdegames/palapeli.po  (12)
===== Diffed for review: 21
</syntaxhighlight>

Unreviewed messages have now been marked and diffed, inside the listed PO files. What is this about "diffing"? If the files had already been reviewed before, some of the messages modified since then (those marked for review) may have changed very little (e.g. a few words in a paragraph-length message, or even just punctuation). Therefore, for each message marked for review, Alice also wants to see the diff from the last review to the current version. Here are two messages in typical review states added by <tt>poascribe di</tt>:
Line 291:
msgid "Click the pause button again to resume the game."
msgstr "Kliknite ponovo na dugme pauze da nastavite igru."
</syntaxhighlight>

and the first one in Kate:
Line 321:
Committed revision 1284220.
$
</syntaxhighlight>

Three things have happened here. First, all review states (flags, embedded diffs, etc.) have been removed, restoring the PO file to normal. Then, any modifications that Alice has made during review are ascribed to her (here 3 out of 21 messages). Finally, all marked messages are ascribed as reviewed by Alice (any with <tt>unreviewed</tt>/<tt>nrev</tt> flags would have been omitted here). When committing, the only summit catalog that got committed is the one with modifications made during review, and all the ascription catalogs were committed because of the reviews recorded in them.
Line 331:
$ # ...only marked messages opened in Lokalize, review them...
$ poascribe ci -f toreview.out
</syntaxhighlight>

=== Selecting Messages for Review ===
Line 339:
<syntaxhighlight lang="bash">
$ poascribe di -s modar PATHS...
</syntaxhighlight>

The option <tt>-s</tt> specifies the message ''selector''. <tt>modar</tt> is the default selector for <tt>diff</tt> mode, and stands for MODified-After-Review: it selects the earliest historical modification of the message after the last (or no) review of that message, if there is any such. By selecting a historical modification of the message, the diff from it to the current version can be computed and embedded into the PO file, as in previous examples.
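The selection rule of <tt>modar</tt> can be pictured with a minimal Python sketch. This is not Pology's actual code: the <tt>Ascription</tt> record, its fields, and the oldest-first ordering of the history are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Ascription:
    """Hypothetical ascription record; not Pology's real data structure."""
    kind: str   # "modified" or "reviewed"
    user: str


def modar(history: Sequence[Ascription]) -> Optional[int]:
    """Return the index of the earliest modification after the last review.

    `history` is assumed to be ordered oldest first. None is returned when
    nothing was modified after the last review (message not selected).
    """
    last_review = -1  # -1 covers the "no review at all" case
    for i, asc in enumerate(history):
        if asc.kind == "reviewed":
            last_review = i
    for i in range(last_review + 1, len(history)):
        if history[i].kind == "modified":
            return i
    return None


history = [
    Ascription("modified", "charlie"),
    Ascription("reviewed", "alice"),
    Ascription("modified", "bob"),
    Ascription("modified", "charlie"),
]
selected = modar(history)  # Bob's modification, the first one after Alice's review
```

In this model a message with no review at all is selected at its very first modification, which corresponds to the "(or no) review" case in the description above.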
Line 347:
<syntaxhighlight lang="bash">
$ poascribe di -s branch:stable -s modar PATHS...
</syntaxhighlight>

It is important that the history selector is given last, because the last selector determines which historical message is selected. If the ordering had been reversed here, the same messages would get selected, but they would not have embedded diffs, because <tt>branch</tt> is a shallow selector.
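Why the ordering matters can be seen in a small Python model of selector chaining. Everything here (the <tt>Message</tt> and <tt>Selection</tt> types, the way selectors report their result) is a hypothetical simplification for illustration, not how <tt>poascribe</tt> is implemented.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class Message:
    """Hypothetical message model."""
    branches: FrozenSet[str]
    first_unreviewed: Optional[int]  # index of earliest modification after last review


@dataclass(frozen=True)
class Selection:
    """Result of a passing selector; index is None for shallow selectors."""
    index: Optional[int] = None


Selector = Callable[[Message], Optional[Selection]]


def branch(name: str) -> Selector:
    # Shallow selector: passes or fails, but picks no historical version.
    return lambda msg: Selection() if name in msg.branches else None


def modar(msg: Message) -> Optional[Selection]:
    # History selector: passes with the index of a historical version.
    if msg.first_unreviewed is None:
        return None
    return Selection(msg.first_unreviewed)


def select(msg: Message, selectors: Sequence[Selector]) -> Optional[Selection]:
    """All selectors must pass; the result of the last one is kept."""
    result = None
    for selector in selectors:
        result = selector(msg)
        if result is None:
            return None
    return result


msg = Message(frozenset({"stable"}), first_unreviewed=2)
with_diff = select(msg, [branch("stable"), modar])     # history index available
without_diff = select(msg, [modar, branch("stable")])  # selected, but nothing to diff against
```

Both orderings select the message, but only the first keeps a historical index from which a diff could be computed.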
Line 355:
<syntaxhighlight lang="bash">
$ poascribe di -s modar:charlie PATHS...
</syntaxhighlight>

If Alice does not give too much credit to other reviewers, she can request selection of messages modified after the last review ''by her'' with the second parameter to <tt>modar</tt>:
Line 361:
<syntaxhighlight lang="bash">
$ poascribe di -s modar::alice PATHS...
</syntaxhighlight>

Here the first parameter ("modified by..."), which is not needed, must be explicitly skipped, before going to the second parameter ("reviewed by..."). The third optional parameter of <tt>modar</tt> will be mentioned in the next section.
Line 371:
<syntaxhighlight lang="bash">
$ poascribe di -s modafter:2010-06 -s nmodafter:2010-07 -s modar PATHS...
</syntaxhighlight>

Negating a history selector produces a shallow selector: while <tt>modafter</tt> is a history selector, <tt>nmodafter</tt> is shallow. But the order of the two in the previous command line is not important, as the last selector is the usual <tt>modar</tt>.
Line 379:
<syntaxhighlight lang="bash">
$ poascribe ci -s espan::246 PATHS...
</syntaxhighlight>

(the first parameter to <tt>espan</tt> is the first entry number, given if messages are not to be selected from the first). There is also the counterpart <tt>lspan</tt> selector, which works with referent line numbers (those of <tt>msgid</tt> keywords) instead of entry numbers.
Line 395:
...
review-tags = lstyle
</syntaxhighlight>

(The value to <tt>review-tags</tt> is a space-separated list of identifiers, when more than one special review type is needed.) With this addition to configuration, Alice can continue to review as she did before, without any changes to her workflow.
Line 403:
<syntaxhighlight lang="bash">
$ poascribe di -s modar:::lstyle -t lstyle PATHS...
</syntaxhighlight>

After finishing the review, Dan commits as usual:
Line 409:
<syntaxhighlight lang="bash">
$ poascribe ci PATHS...
</syntaxhighlight>

If Dan is always going to review the language style, in order not to have to issue the selector and tag in the command line all the time, he can make them default per mode in <tt>~/.pologyrc</tt>:
Line 418:
selectors/diff = modar:::lstyle
tags/diff = lstyle
</syntaxhighlight>

With this Dan can use plain <tt>poascribe di</tt> just like Alice does.
Line 430:
msgid "..."
msgstr "..."
</syntaxhighlight>

Here Alice has made one review between Charlie's and Bob's modifications, and that review, being general instead of <tt>lstyle</tt>, did not cause <tt>modar</tt> to stop at it. After Dan reviews this message for language style, Alice runs selection for review and gets this:
Line 440:
msgid "..."
msgstr "..."
</syntaxhighlight>

Again, since <tt>lstyle</tt> reviews do not mix with general reviews, Dan's review did not hide Bob's modification that Alice did not check so far.
Line 456:
<syntaxhighlight lang="bash">
$ svn commit LANG/summit/messages/ -m "Merged summit."
</syntaxhighlight>

with the <tt>poascribe</tt> command in <tt>commit</tt> mode:
Line 462:
<syntaxhighlight lang="bash">
$ poascribe ci LANG/summit/messages/ -m "Merged summit."
</syntaxhighlight>

Since the user is not explicitly given by the <tt>-u</tt> option, this will ascribe merge modifications to the coordinator (more precisely, to the user set as default in <tt>~/.pologyrc</tt>), which is just fine. It is also possible to define a special user only for ascribing merge modifications, though there is no known advantage to that.
Line 488:
             docmessages/
             docmessages.extras.summit
</syntaxhighlight>

For the simple case of all reviews being general reviews, the filter is added to the summit configuration like this (anywhere within the <tt>*.extras.summit</tt> file):
Line 496:
     ("regular", ["nmodar"]),
]
</syntaxhighlight>

Here the filter is named <tt>regular</tt>, and is defined as an application of the <tt>nmodar</tt> selector, the negation of <tt>modar</tt>. This simply means: pass all messages ''not'' modified after the last review.
Line 509:
     ("emergency", ["nmodar:~alice,bob"]),
]
</syntaxhighlight>

By default, <tt>posummit</tt> uses the first filter in the list. When the coordinator needs to do emergency scattering, he requests the emergency filter by the <tt>-a</tt> option:
Line 516:
$ cd $KDEREPO/trunk/l10n-support
$ posummit scripts/messages.summit LANG scatter -a emergency
</syntaxhighlight>

What if several selectors are needed to pass the message? For example, the language style review (the earlier example with Alice and Dan) too may be requested for regular scattering, but omitted from emergency scattering. The filter setup for this scenario looks like this:
Line 525:
     ("emergency", ["nmodar:~alice,bob"]),
]
</syntaxhighlight>

The regular filter now reads: pass the message if it has not been modified after the last (general) review ''and'' has not been modified after the last style review.
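The "and" in that reading can be illustrated with a short Python sketch, in which a filter is a list of selectors that must all pass. The <tt>Ascription</tt> record, the <tt>nmodar</tt> factory, and the <tt>passes</tt> helper are hypothetical names for this illustration; they model the idea, not the actual <tt>posummit</tt> internals.

```python
from typing import Callable, List, NamedTuple, Sequence


class Ascription(NamedTuple):
    """Hypothetical ascription record, history ordered oldest first."""
    kind: str      # "modified" or "reviewed"
    user: str
    tag: str = ""  # "" stands for a general review


History = Sequence[Ascription]
Selector = Callable[[History], bool]


def nmodar(tag: str = "") -> Selector:
    """Build a selector passing messages NOT modified after the last `tag` review."""
    def selector(history: History) -> bool:
        pending = False
        for asc in history:
            if asc.kind == "modified":
                pending = True   # a modification awaits review
            elif asc.kind == "reviewed" and asc.tag == tag:
                pending = False  # a review of the matching type clears it
        return not pending
    return selector


def passes(history: History, selectors: List[Selector]) -> bool:
    """A message passes the filter only if every selector passes it."""
    return all(selector(history) for selector in selectors)


general_only = [nmodar()]
general_and_style = [nmodar(), nmodar("lstyle")]

history = [
    Ascription("modified", "bob"),
    Ascription("reviewed", "alice"),  # general review only
]
```

With this history the message passes <tt>general_only</tt> but not <tt>general_and_style</tt>; once a review tagged <tt>lstyle</tt> is appended, it passes both.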

Revision as of 20:57, 29 June 2011




Reviewing by Ascriptions
On Localization   Workflows
Prerequisites   Translating in Summit
Related Articles   Pology, Language Coordinator
External Reading   n/a

Why Review Translations?

Especially to new translators, it may not be obvious to what extent a translation needs to be reviewed. If the translator has exercised due diligence, how "wrong" can the translation be? Even if the translator has a good command of the source language (English in the context of this article), the answer is "very wrong", when all aspects are considered. Here are some of them.

Because English grammar is comparatively simple, the meaning of a short English sentence -- as typically encountered in application user interfaces -- depends heavily on the surrounding context. This context may not be obvious when the translator is going through isolated messages in the translation file, so he may commit the worst of errors from the user's viewpoint: the senseless translation. An experienced reviewer will have developed a sense for troublesome contexts, and will have several means to decisively determine the context (including, for example, running the development version of the application).

Even if the context is correctly established, the translator may use "wrong" terminology, which is the next worst thing for the user. A term used in translation does not need to be wrong by itself; in fact, it may be exactly the correct term -- in another translation project. The reviewer will have more experience with the terminology of the present project, and be able to bring the translation in line with it.

Style in the technical sense is a consistent choice between several perfectly valid constructs in the target language when applied to text in a given technical context, for example how to translate menu titles and items, button labels, or tooltips. The choices may include noun or verb forms, particular grammar categories, tone of address, and so on. There may be a style guide for the project which details such choices, and the reviewer will know it well.

Style in the linguistic sense is especially applicable to longer texts, such as tooltips in user interfaces and passages in documentation. A typical error of a new translator is to adhere closely to English style and grammar. This may produce a translation which is semantically and grammatically valid in the target language, but very out of style -- the "translationese". The reviewer is there to naturalize such passages.

Finally, the reviewer may be an experienced translator, but that does not mean that his own translations need no review. Immersion in the source language, distraction, and fatigue will lead the reviewer into any of the above errors in translation, only with less frequency. So reviewers should also review one another's translations.

Classical Reviewing by Stages

The classical review workflow by stages seems simple enough. A translator translates a PO file (or updates an existing translation), and declares it ready for review. A reviewer reviews it, and declares it ready to "commit". Committing here should be understood generally, as inclusion into the pool from which translations are periodically shipped to end users. A committer finally commits the file. The process is iterative: the reviewer may return the file to the translator, and the translator may later declare it ready for review again. There may be several stages of review (such as proof-reading, approving), each of which may return the translation to a previous stage, or forward it to some special stage. The process can also be finer-grained, where each message in the file goes through the stages separately.

Regardless of the particularities, workflows of this kind all have the following in common. Members of the translation team are assigned roles -- such as translator, reviewer, approver, committer -- by which they enter into the workflow (a single person can have several roles). The later review stages must wait for the earlier stages to complete, and the translation cannot be updated again before the current version clears the pipeline (or the pipeline is aborted). Most importantly, once the translation is committed, it becomes part of simply "admitted" translations, with no further qualifiers.

The system of prescribed roles requires that team members assign them between themselves, stick to them, and shuffle them along the way. The prescribed review pipeline requires a tool to enforce and keep track of the stages in which translations are. This makes the review workflow complex and rigid, most probably with bottlenecks that hurt efficiency. The distribution of roles may become unbalanced by people coming and going, or the workflow tool may be prohibitive for some scenarios (e.g. a single translator making small adjustments in dozens of files across the project, but having to upload each manually through a web interface).

Of course, "rigid", "complex", "inefficient", are comparative qualifications, so what is it that the classical review by stages can be compared to in this way?

Reviewing by Ascriptions

Reviewing by ascriptions is even simpler conceptually, and yet less rigid, less complex, and much more efficient than the review by stages. It works on the message level, rather than the file level. Anyone can simply translate some messages and directly commit the modified files, without any review, while ascribing the modifications to their own name. Anyone can review any committed messages at any moment, commit the modifications made on review, and ascribe the reviews to their own name and (possibly) to a certain class -- full review, review of context, of terminology, of style, etc. Only when the translation is to be shipped to end users are the insufficiently reviewed messages automatically omitted from the package, by evaluating the ascription history of each message.

Most importantly, based on the ascription history, the reviewer can select only some particular messages, and review only the difference between their historical and current versions. For example, Alice can select to review only messages modified since she or Bob had last reviewed them for style; she could see the difference from that last review to the current version, e.g. if in a whole paragraph only a single word was changed by Charlie when he reviewed the terminology. In terms of PO workflow, the ascription history propagates through merges, so the reviewer can compare the change in the original and the change in the translation since the last review, to judge if one fits the other.
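To give a feel for reviewing only the difference, here is a minimal Python sketch that computes a word-level diff between a historical and a current translation using the standard <tt>difflib</tt> module. The <tt>word_diff</tt> helper and the <tt>{-...-}</tt>/<tt>{+...+}</tt> markers are illustrative assumptions, not necessarily what an ascription tool would embed.

```python
import difflib


def word_diff(old: str, new: str) -> str:
    """Return `new` with word-level changes against `old` marked inline.

    Removed words are wrapped as {-...-} and added words as {+...+};
    this marker syntax is only an illustration.
    """
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    out = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            out.extend(old_words[i1:i2])
            continue
        if i1 != i2:  # words present only in the old version
            out.append("{-" + " ".join(old_words[i1:i2]) + "-}")
        if j1 != j2:  # words present only in the new version
            out.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(out)


reviewed = "Click the pause button"
current = "Click the stop button"
marked = word_diff(reviewed, current)
```

A reviewer looking at the marked text only needs to judge the one changed word instead of rereading the whole message.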

Since everyone just commits, translations can be efficiently kept in a version control repository, with the ascription system added on top. After having done some translating, the team member simply substitutes the commit command of the version control system (VCS) with the ascribe-modifications command of the ascription system (AS, which calls the underlying VCS internally). After reviewing, the team member uses the ascribe-reviews command of the AS to commit reviews to the ascription history (as well as modifications made during the review). To select messages for review, the team member issues the diff-for-review command of the AS (with suitable parameters to narrow the set); the selected messages are marked in place in the PO files and embedded with differences, and possibly opened in a PO editor.

When the translations are to be released, the team coordinator issues the filter-for-release command of the AS, which takes the working PO files and creates final PO files with insufficiently reviewed messages removed. "Release time" is used here only figuratively: this should be a fully automatic process, so it can be performed at any interval of convenience.

What constitutes "sufficient review" can be defined in fine detail. It could be specified that messages modified by Alice need to have only review for terminology, but not necessarily for style; Charlie may belong to the group which needs to be reviewed on style, but not necessarily on context; Bob's reviews for style may be nice to have, but never blocking if missing. These decisions do not preclude released messages from being reviewed later on the missing points, after higher priority reviews have been completed. The definition of sufficiency may be changed at any point, e.g. as team members get more experienced and require less review, without interfering with direct translation and review work.


In summary, with reviewing by ascriptions the lean efficiency of raw VCS operation is preserved while providing for great flexibility of review. All team members can be given commit access, and no web or email detours are needed. There are no prescribed roles, but an equivalent of role assignment happens automatically at the last possible moment, and can take into account both translators' and reviewers' abilities. There is no staging between completing and committing the translation, which enables the translator to keep on polishing the translation undisturbed until the reviewer comes around. There is no inefficiency in handling small changes throughout many files, since a single AS command commits all changes just as a single VCS command would. The AS in effect abstracts the VCS, so general team members do not have to know the particularities of the underlying VCS. On commit operations, the AS can also apply checks (e.g. decline to commit syntactically invalid PO files) and modifications (e.g. update the translator's data in the PO header).

Ascription System in Pology

Pology is a collection of various modular tools for supporting translation based on PO files. Among them is the script poascribe, which implements an ascription system (AS); at present, it can use Subversion or Git as the underlying VCS. poascribe is still in an experimental stage, so what follows is a brief description of how to use it in the context of the KDE translation project. However, very little is truly specific to KDE; the only major assumption is that there exists a VCS repository with PO files of a given language grouped together, and that the translation team can use it without special restrictions.

Very important for the AS is how branches are handled (in KDE, the rolling trunk and stable branches). The AS can in principle be deployed per branch, but then there is the added complexity of porting translations between branches, which ascriptions should follow. Therefore, the AS implemented by poascribe is currently limited to the assumption that there is a single branch of translations at all times. The article "Translating in Summit" explains how a KDE translation team can set up and operate such a single branch, the summit, and this is the prerequisite for the following instructions. (Note that the summit system is useful on its own, and should be conducive to any kind of review workflow.)

Setting Up

The summit branch for the language LANG is positioned like this in the KDE repository:

$KDEREPO/

   trunk/
       l10n-support/
           LANG/
               summit/
                   messages/
                   docmessages/


The team coordinator already has this part of the repository tree locally due to regular summit operations. For the same reason Pology is already set up. Setting up the ascription system is now simple. The file ascription-config is created in the parent directory of the summit:

...

   LANG/
       ascription-config
       summit/
           messages/
           docmessages/


with the following contents:

# ---------------------------
# Global ascription settings.

[global]

# Roots of the catalog and ascription trees.
catalog-root = summit
ascript-root = summit-ascript

# The underlying version control system.
version-control = svn

# Data for updating catalog headers.
# - language code
language = LANG
# - full language name
language-team = LANGUAGE
# - email address of the team
team-email = [email protected]

# Default commit message.
commit-message = Translation updates.

# -----------------------
# Registered translators.

[user-alice]
name = Alice Akmalryn
original-name = Алиса Акмалрин
email = [email protected]

[user-bob]
name = Bob Byomkin
original-name = Бобан Бјомкин
email = [email protected]

# ...and so on.

Some notes:

  • The ascript-root setting should be exactly summit-ascript, for the reason mentioned later.
  • The commit-message field, if defined, allows team members to commit without providing a commit message. Its value is used by default, with the translator's user name appended in a special syntax, for example: Translation updates. [>alice]. (The translator's user name is also appended to manually supplied commit messages.) Translators can still supply a commit message when they wish, as shown later. If this field is not set, the commit message must be supplied on each commit, as usual.
  • Team members are defined by [user-USERNAME] sections. An ascription user name can be any valid ASCII identifier: ASCII letters, digits and underscores only, and a digit cannot be the first character. Ascription user names have no technical relation to accounts in the underlying VCS, though it is mnemonically convenient if they are the same (in case of SVN). This means that a translator who does not have a VCS account (yet) can and should be added here, with an assigned user name (preferably one suitable as an SVN account name later); why this should be done is explained later.
  • The original-name field in user sections is there in case the preferred renderings of the name in English and in the target language are not the same. When they are the same, original-name can be omitted.
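The rules in these notes can be illustrated with a short Python sketch. It is not part of Pology: the function names are hypothetical, and reading the file with Python's standard configparser is only an assumption that happens to work for the INI-like layout shown above.

```python
import configparser
import re

# Hypothetical helpers, for illustration only (not Pology's own code).
# Rule from the notes: ASCII letters, digits and underscores, no leading digit.
USER_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def is_valid_user_name(name):
    """Check an ascription user name against the identifier rule."""
    return USER_NAME_RE.match(name) is not None


def registered_users(config_text):
    """Return {user name: display name} from the [user-USERNAME] sections."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    users = {}
    for section in parser.sections():
        if section.startswith("user-"):
            user = section[len("user-"):]
            if not is_valid_user_name(user):
                raise ValueError("invalid ascription user name: %r" % user)
            users[user] = parser[section].get("name", user)
    return users


def default_commit_message(message, user):
    """Append the user name in the [>USERNAME] syntax described above."""
    return "%s [>%s]" % (message, user)
```

For instance, default_commit_message("Translation updates.", "alice") gives the string Translation updates. [>alice] quoted in the notes.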

As soon as the ascription-config file is committed, the ascription system is ready for operation. The only regular modifications to this file are those of adding new team members. (On the other hand, team members should never be removed, because even after they no longer contribute, their ascription records remain in the system.)

Initial Ascription

The most common situation at the start of the ascription workflow is that there already exists a body of translations, contributed to by many different people over time. The coordinator should ascribe all existing translations as initial modifications, but to whom? It cannot be said precisely who translated what. The solution is to introduce a generic user in ascription-config, suitably known as "Unknown Hero" (or "Lost Translator", you can be inventive):

[user-uhero]
name = Unknown Hero
original-name = Незнани јунак

and ascribe all existing translations as modified and reviewed by this user. The coordinator does this with the following command:

$ cd $KDEREPO/trunk/l10n-support
$ poascribe commit -u uhero --all-reviewed -C LANG/summit/

The argument commit is the ascription mode, and the -u option provides the user name to which ascriptions are made. This is an important point: ascriptions are made to a user defined in the ascription configuration, and have nothing to do with VCS accounts; someone who has a VCS account can commit in the name of someone who does not. It is the --all-reviewed option that declares all messages to be reviewed as well (note that it is normally used only this once, and not for normal day-to-day reviewing). The -C option prevents automatic adding and committing to version control, which is useful for this initial step. Finally, the paths which contain all summit catalogs are given.

When the poascribe command is issued, a progress bar will appear, and the following output will start to unfold:

LANG/summit/messages/extragear-base/rellinks.po  (50/50)
LANG/summit/messages/extragear-base/autorefresh.po  (13/13)
LANG/summit/messages/extragear-base/babelfish.po  (38/38)
...
LANG/summit/messages/qt/libphonon.po  (13/13)
LANG/summit/messages/qt/phonon-xine.po  (24/24)
LANG/summit/messages/qt/phonon_gstreamer.po  (12/12)
===== Ascription summary:
-           modified  reviewed
translated    111775    111775
fuzzy          26943     26943
obsolete/t      2965      2965
obsolete/f      1626      1626

The numbers in parentheses indicate how many messages have been ascribed in the given PO file (modified/reviewed), and at the end the totals are given. Ascribing the complete summit for the first time will take quite some time (on the order of 10-20 minutes).


After the initial ascription has been made, the ascription tree will appear next to the summit tree. This tree will contain one ascription PO file for each summit PO file, with the same name and relative location within the tree:

...

   LANG/
       ascription-config
       summit/
           messages/
               kdelibs/
                   kcertpart.po
                   kdelibs4.po
                   ...
               ...
           docmessages/
       summit-ascript/
           messages/
               kdelibs/
                   kcertpart.po
                   kdelibs4.po
                   ...
               ...
           docmessages/

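The mirroring between the two trees amounts to swapping the root directory name in the path. A minimal sketch, assuming the catalog-root and ascript-root values from the configuration above (the function name is hypothetical and this is not how poascribe itself is implemented):

```python
from pathlib import PurePosixPath


def ascription_path(catalog_path, catalog_root="summit",
                    ascript_root="summit-ascript"):
    """Map a summit catalog path to the mirrored ascription catalog path.

    Illustrative only: replaces the first path component equal to the
    catalog root with the ascription root, keeping everything else.
    """
    parts = list(PurePosixPath(catalog_path).parts)
    if catalog_root not in parts:
        raise ValueError("%s is not under %s/" % (catalog_path, catalog_root))
    parts[parts.index(catalog_root)] = ascript_root
    return str(PurePosixPath(*parts))
```

With the tree above, LANG/summit/messages/kdelibs/kdelibs4.po maps to LANG/summit-ascript/messages/kdelibs/kdelibs4.po.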

During the ascription some summit PO files may have been modified as well, in that any previous fields (#| ...) on translated messages have been removed. (These fields are sometimes erroneously left in by older PO editors.)

The newly created ascription tree (and any modifications to summit tree) can now be committed as usual:

$ svn add LANG/summit-ascript
$ svn commit LANG/summit LANG/summit-ascript -m "Initial ascription."

Daily Use for Translators

Team members other than the coordinator, whether translators or reviewers, need to keep around only the trunk/l10n-support/LANG/ directory. But they always need to update this directory fully (rather than just one particular module or file under .../*messages/), so that the summit tree and the ascription tree (and configuration) are kept in sync.

In order not to have to issue their own user name (-u option to poascribe) all the time, translators can set it in Pology user configuration ~/.pologyrc, in [poascribe] section:

[poascribe]
user = alice

Translators can then submit updated PO files simply by substituting svn commit (or whatever the VCS commit command is) with poascribe commit (co or ci for short):

$ poascribe ci LANG/summit/messages/kdefoo/*fooapp*.po
LANG/summit/messages/kdefoo/fooapp.po  (144)
LANG/summit/messages/kdefoo/libfooapp.po  (25)
===== Ascription summary:
-           modified
translated       169
>>>>> VCS is committing catalogs:
Sending      LANG/summit/messages/kdefoo/fooapp.po
Sending      LANG/summit/messages/kdefoo/libfooapp.po
Sending      LANG/summit-ascript/messages/kdefoo/fooapp.po
Sending      LANG/summit-ascript/messages/kdefoo/libfooapp.po
Transmitting file data ....
Committed revision 1267069.
$

poascribe will add ascription records into the ascription catalogs corresponding to the summit catalogs to be committed, and commit them all. Like svn commit, poascribe ci can take any number of file or directory paths, and can be issued from any working directory (it will always find the ascription catalogs). If a default commit message has not been set in the ascription configuration, poascribe will ask for one; alternatively, it can be given on the command line through the -m option.

Translators Without Commit Access

With the ascription system in place, every regular team member should have commit access. But, there may be some period of time before new translators are given accounts, revision control may be too technical for some, and even those with the account may not be able to commit temporarily for some reason.

These translators may send in their work by email, to any team member with commit access (not necessarily the coordinator or a reviewer); this team member can commit received files without any review, as review can be conducted at any later time. If Bob sends some files to Alice, she can commit them immediately by stating Bob's user name:

$ poascribe ci -u bob ...

For this to work, the translator who sent in the files has to be defined in the ascription configuration. There are no hidden costs or security issues to this (as opposed to opening a VCS account), so every new translator should be defined there before any work of that person is committed.

Daily Use for Reviewers

The ascription system opens up all sorts of possibilities for concrete review patterns. Reviewers should keep in mind that for each message the full modification and review history is available, so that the team can think about how to make good use of it. Therefore, what follows are some examples to illustrate the review facilities that poascribe provides.

Basic Reviewing

At the very basic level (which is the only level in classical review by stages), messages can be classified simply into unreviewed and reviewed, without further qualifiers. Alice now wants to review all unreviewed messages in a group of PO files, say the kdegames module. She issues (di is short for diff):

$ cd $KDEREPO/trunk/l10n-support
$ poascribe di LANG/summit/messages/kdegames/
LANG/summit/messages/kdegames/bovo.po  (2)
LANG/summit/messages/kdegames/kdiamond.po  (7)
LANG/summit/messages/kdegames/palapeli.po  (12)
===== Diffed for review: 21

Unreviewed messages have now been marked and diffed, inside the listed PO files. What is this about "diffing"? If the files had already been reviewed before, some of the messages modified since then (those marked for review) may have changed very little (e.g. a few words in a paragraph-length message, or even just punctuation). Therefore, for each message marked for review, Alice also wants to see the diff from the last review to the current version. Here are two messages in typical review states added by poascribe di:

#. +> trunk stable
#. ascto: charlie:m
#: gui/mainwindow.cc:372
#, ediff
msgid "GAME OVER. {-You won-}{+Tie+}!"
msgstr "KRAJ IGRE. {-Pobeda-}{+Nerešeno+}!"

#. +> trunk stable
#. ascto: bob:m charlie:m
#: game-state.cpp:117
#, ediff-total
msgid "Click the pause button again to resume the game."
msgstr "Kliknite ponovo na dugme pauze da nastavite igru."

and the first one in Kate:

Message diffed for review by poascribe in Kate.

In the first message, the first thing to note is the #. ascto: comment. This comment succinctly lists who did what with the message since the last review; here charlie:m means that Charlie is the one who modified it. Then there is the ediff flag, which Alice can use to jump through messages marked for review. Finally, the original and the translation have been diffed; here they show that, since the last review, the message was fuzzied by changing "You won" to "Tie", and what Charlie did in the translation to unfuzzy it. Even on a message as short as this, the diff tells something useful to Alice: the phrase "Game over" likely has a formulaic translation, and the fact that it is not part of the diff means that the earlier reviewer had made sure it is consistent, so Alice does not have to check that.

The #. ascto: comment of the second message reveals that both Charlie and Bob had been translating it. The ediff-total flag instead of plain ediff means that this message had no review at all up to now, so there are no embedded diffs in text fields.
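To make the structure of the ascto: comment concrete, here is a small Python sketch that splits it into (user, action, tag) entries. It is an illustration based only on the forms that appear in this article (user:m, user:r, and user:r(tag) for tagged reviews, shown in a later section); the regular expression and function name are assumptions, not Pology's own parser.

```python
import re

# Assumed entry shape, derived from the examples in this article only:
# USER:m for a modification, USER:r or USER:r(TAG) for a review.
ASCTO_ENTRY_RE = re.compile(
    r"(?P<user>[A-Za-z_][A-Za-z0-9_]*):(?P<kind>[mr])(?:\((?P<tag>[^)]*)\))?\Z"
)


def parse_ascto(comment):
    """Split a '#. ascto: ...' comment into (user, kind, tag) tuples."""
    text = comment.strip()
    prefix = "#. ascto:"
    if not text.startswith(prefix):
        raise ValueError("not an ascto comment: %r" % comment)
    entries = []
    for token in text[len(prefix):].split():
        match = ASCTO_ENTRY_RE.match(token)
        if match is None:
            raise ValueError("unrecognized ascto entry: %r" % token)
        # tag is None when the entry carries no (TAG) suffix
        entries.append((match["user"], match["kind"], match["tag"]))
    return entries
```

Applied to the second message above, parse_ascto("#. ascto: bob:m charlie:m") yields two modification entries, one for bob and one for charlie.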

Alice can now go through marked files and messages, review translations, and possibly make modifications. When making changes in a message with embedded diffs, she can freely edit text outside of difference segments and within {+...+} segments (as these are the ones which belong to the current version of the text). While reviewing, Alice does not remove any of the added message elements (save for an occasional difference segment, when the translation should be modified), as these elements are needed later. If a message is particularly hard and Alice wants to defer its review for later, she can add the unreviewed (or nrev for short) flag to it.
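The relation between an embedded diff and the two versions of the text can be shown with a simplified Python sketch. It assumes only the {-...-} and {+...+} markers seen in the example above, and ignores any escaping or other details that the real embedded diff format may have; the function name is hypothetical.

```python
import re

# Simplified view of embedded diffs: {-removed-} and {+added+} segments.
EDIFF_RE = re.compile(r"\{-(.*?)-\}|\{\+(.*?)\+\}", re.DOTALL)


def split_ediff(text):
    """Return (old_text, new_text) recovered from a text with embedded diffs."""
    # Old version: keep removed segments, drop added ones.
    old = EDIFF_RE.sub(lambda m: m.group(1) or "", text)
    # New (current) version: keep added segments, drop removed ones.
    new = EDIFF_RE.sub(lambda m: m.group(2) or "", text)
    return old, new
```

For the first message above, split_ediff('GAME OVER. {-You won-}{+Tie+}!') returns ('GAME OVER. You won!', 'GAME OVER. Tie!'), which is why edits belong outside difference segments or inside {+...+} segments: only those parts survive into the current version.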

Once the review is complete, Alice simply commits the reviewed files:

$ poascribe ci LANG/summit/messages/kdegames/
LANG/summit/messages/kdegames/bovo.po  (0/2)
LANG/summit/messages/kdegames/kdiamond.po  (0/7)
LANG/summit/messages/kdegames/palapeli.po  (3/12)
===== Ascription summary:
-           modified  reviewed
translated         3        21
>>>>> VCS is committing catalogs:
Sending      LANG/summit/messages/kdegames/palapeli.po
Sending      LANG/summit-ascript/messages/kdegames/bovo.po
Sending      LANG/summit-ascript/messages/kdegames/kdiamond.po
Sending      LANG/summit-ascript/messages/kdegames/palapeli.po
Transmitting file data ....
Committed revision 1284220.
$

Three things have happened here. First, all review states (flags, embedded diffs, etc.) have been removed, restoring the PO files to normal. Then, any modifications that Alice has made during review are ascribed to her (here 3 out of 21 messages). Finally, all marked messages are ascribed as reviewed by Alice (any with unreviewed/nrev flags would have been omitted here). When committing, the only summit catalog that got committed is the one with modifications made during review, and all the ascription catalogs were committed because of the reviews recorded in them.

When many files with few changes in each are to be reviewed, it becomes burdensome to manually open each and every file diffed for review, and then to make sure that all are committed with poascribe ci. To make this easier, the -w toreview.out option can be added to poascribe di, which requests that the paths of all diffed PO files be written into the file toreview.out. This file can then be used to batch-open POs for review in the editor, and fed back to poascribe ci with -f toreview.out. There is also the -o option, which causes poascribe to directly open PO files in a PO editor (though this is currently applicable only to Lokalize). Putting it together, to efficiently review a whole bunch of small changes throughout many files, Alice can:

$ poascribe di PATHS... -w toreview.out -o lokalize
$ # ...only marked messages opened in Lokalize, review them...
$ poascribe ci -f toreview.out

Selecting Messages for Review

Invocations of poascribe di without any options, as in the previous section, were actually equivalent to this:

$ poascribe di -s modar PATHS...

The -s option specifies a message selector. modar is the default selector for diff mode, and stands for MODified-After-Review: it selects the earliest historical modification of the message after the last review of that message (or when the message has had no review at all), if there is any such modification. By selecting a historical modification of the message, the diff from it to the current version can be computed and embedded into the PO file, as in the previous examples.
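The selection rule of modar can be sketched in a few lines of Python. This is a conceptual model only: the history representation (a list of (user, kind) pairs, oldest first, with kind "m" for modification and "r" for review) and the function name are assumptions made for illustration, not Pology's data structures.

```python
def modar(history):
    """Index of the earliest modification after the last review, or None.

    history: list of (user, kind) tuples, oldest first; kind is "m" for
    a modification and "r" for a review (a hypothetical representation).
    """
    last_review = -1
    for index, (_user, kind) in enumerate(history):
        if kind == "r":
            last_review = index
    for index in range(last_review + 1, len(history)):
        if history[index][1] == "m":
            return index
    return None  # nothing was modified after the last review
```

With the history bob:m, alice:r, charlie:m, charlie:m the function returns index 2, Charlie's first modification after Alice's review; the diff shown to the reviewer would then run from that point to the current version. A message whose history ends with a review is not selected.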

There are various specialized selectors, which fall into two groups: shallow selectors and history selectors. Shallow selectors look only into the current version of the message, and cannot select historical versions, which means that they cannot provide embedded diffs. History selectors (modar is of this type) can select messages from history and provide diffs. Several selectors can be issued on the command line, and the message is selected only if all selectors select it. Shallow selectors are then normally used as a pre-filter for history selectors. For example, to select messages modified after the last review, but only those found in the stable branch, the branch and modar selectors are chained:

$ poascribe di -s branch:stable -s modar PATHS...

It is important that the history selector is given last, because the last selector determines which historical message is selected. If the ordering had been reversed here, the same messages would get selected, but they would not have embedded diffs, because branch is a shallow selector.
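Why the order matters can be seen in a small Python model of selector chaining. Everything here is hypothetical (the message dictionary, the CURRENT_ONLY marker, and the function names); it only mirrors the two rules just stated: all selectors must select the message, and the result of the last one decides whether a historical version is available for diffing.

```python
CURRENT_ONLY = "current-only"  # what a shallow selector reports on success


def branch_selector(branch):
    """Shallow selector: looks only at the current message."""
    def select(message):
        return CURRENT_ONLY if branch in message["branches"] else None
    return select


def modar_selector(message):
    """History selector: index of the earliest modification after last review."""
    history = message["history"]  # list of (user, kind), oldest first
    last_review = max(
        (i for i, (_user, kind) in enumerate(history) if kind == "r"),
        default=-1,
    )
    for i in range(last_review + 1, len(history)):
        if history[i][1] == "m":
            return i
    return None


def apply_selectors(selectors, message):
    """Return the last selector's result, or None if any selector rejects."""
    result = None
    for select in selectors:
        result = select(message)
        if result is None:
            return None
    return result
```

For a message in trunk and stable with history bob:m, alice:r, charlie:m, the chain [branch_selector("stable"), modar_selector] returns the history index 2 (a diff can be embedded), while the reversed chain returns only CURRENT_ONLY (selected, but without a historical version to diff against).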

Selectors can take parameters themselves, like branch:stable in the previous example. Parameters are separated from the selector name by any non-alphanumeric character; this is a colon by convention, but if a parameter contains a colon, something like a slash, tilde, etc. can be used. The number of parameters can vary, and modar in particular can take from none to three. If Alice wants to review only those messages modified by Charlie since the last review, she states this with the first parameter to modar:

$ poascribe di -s modar:charlie PATHS...

If Alice does not give too much credit to other reviewers, she can request selection of messages modified after the last review by her, with the second parameter to modar:

$ poascribe di -s modar::alice PATHS...

Here the first parameter ("modified by..."), which is not needed, must be explicitly skipped, before going to the second parameter ("reviewed by..."). The third optional parameter of modar will be mentioned in the next section.

When a selector parameter is a user name, normally it can also be a comma-separated list of user names (modar:bob,charlie) or prefixed with tilde to negate, i.e. select all other users (modar:~alice).
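The parameter syntax described in the last few paragraphs can be summarized by a short Python sketch. The two functions are hypothetical and written only from the rules stated here (the first non-alphanumeric character after the name is the separator; a user parameter may be a comma-separated list, optionally prefixed with a tilde); they are not taken from Pology's source.

```python
def parse_selector(spec):
    """Split a selector specification into (name, [parameters])."""
    end = 0
    while end < len(spec) and spec[end].isalnum():
        end += 1
    name = spec[:end]
    if not name:
        raise ValueError("selector name missing in %r" % spec)
    if end == len(spec):
        return name, []  # no parameters given
    separator = spec[end]  # colon by convention, but any such character works
    return name, spec[end + 1:].split(separator)


def parse_user_list(parameter):
    """Return ([user names], negated) for a user-name parameter."""
    negated = parameter.startswith("~")
    names = parameter[1:] if negated else parameter
    return [name for name in names.split(",") if name], negated
```

So parse_selector("modar::alice") gives ("modar", ["", "alice"]), where the empty first parameter is the explicitly skipped "modified by..." slot, and parse_user_list("~alice") gives (["alice"], True).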

Any selector can be negated by prepending n to its name. For example, the history selector modafter:DATE selects first modification after the given date; to select messages modified after last review, but only if modified during June 2010:

$ poascribe di -s modafter:2010-06 -s nmodafter:2010-07 -s modar PATHS...

Negating a history selector produces a shallow selector: while modafter is a history selector, nmodafter is shallow. But the order of the two in the previous command line is not important, as the last selector is the usual modar.

Selectors can be issued in other modes too. If the PO file is big and Alice has reviewed messages up to and including entry 246 when she has to pause until another day, she can commit reviews only up to this entry by issuing the espan selector:

$ poascribe ci -s espan::246 PATHS...

(the first parameter to espan is the first entry number, given if messages are not to be selected from the first). There is also the counterpart lspan selector, which works with referent line numbers (those of msgid keywords) instead of entry numbers.

Fine-Grained Reviews

In the introduction, several distinct types of what can go wrong in translation were described. Not all reviewers may be able to check translation against all those problems. Here is a typical scenario of this kind:

Alice is very computer-savvy and knows the translation project inside and out, which means that she can review well for context, terminology, and technical style. But, her language style leaves something to be desired, which shows through longer sentences and passages. Dan, on the other hand, is a very literary person, but not that much into the technical aspects. Dan's style reviews would thus be a perfect complement to Alice's general reviews.

poascribe can support this scenario in the following way. A review type tag for language style is defined in the ascription configuration, using the review-tags field:

[global]
...
review-tags = lstyle

(The value to review-tags is a space-separated list of identifiers, when more than one special review type is needed.) With this addition to configuration, Alice can continue to review as she did before, without any changes to her workflow.

Dan selects messages for review similarly to Alice, but additionally gives the lstyle tag as the third parameter of modar, and indicates that ascribed reviews should be tagged as lstyle:

$ poascribe di -s modar:::lstyle -t lstyle PATHS...

After finishing the review, Dan commits as usual:

$ poascribe ci PATHS...

If Dan is always going to review the language style, in order not to have to issue the selector and tag in the command line all the time, he can make them default per mode in ~/.pologyrc:

[poascribe]
user = dan
selectors/diff = modar:::lstyle
tags/diff = lstyle

With this Dan can use plain poascribe di just like Alice does.

The important point of review tags is that they make reviews by types independent. For example, Dan may come around to review the language style of the given message after several modifications and general reviews have been ascribed to it -- modar:::lstyle will simply ignore all reviews except for lstyle reviews. This is going to be reflected in the ascto: comment to marked messages:

# ...
#. ascto: charlie:m alice:r bob:m
# ...
msgid "..."
msgstr "..."

Here Alice has made one review between Charlie's and Bob's modifications, and that review, being general instead of lstyle, did not cause modar to stop at it. After Dan reviews this message for language style, Alice runs selection for review and gets this:

# ...
#. ascto: bob:m dan:r(lstyle)
# ...
msgid "..."
msgstr "..."

Again, since lstyle reviews do not mix with general reviews, Dan's review did not hide Bob's modification, which Alice has not checked so far.

(General review too has a tag assigned, the empty string, in case the reviewer needs to explicitly issue it in some context.)
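The independence of review types can be modeled with a short Python sketch that reproduces the two ascto: comments above. The history representation (tuples of user, kind and tag, oldest first, with the empty string as the tag of general reviews) and both function names are assumptions made for this illustration, not Pology's internals.

```python
def trail_since_review(history, tag=""):
    """Return the part of the history after the last review with this tag.

    history: list of (user, kind, tag), oldest first; kind is "m" or "r";
    tag is "" for general reviews and is ignored for modifications.
    """
    start = 0
    for index, (_user, kind, review_tag) in enumerate(history):
        if kind == "r" and review_tag == tag:
            start = index + 1
    return history[start:]


def format_ascto(trail):
    """Render a trail in the style of the '#. ascto:' comment."""
    parts = []
    for user, kind, review_tag in trail:
        suffix = "(%s)" % review_tag if kind == "r" and review_tag else ""
        parts.append("%s:%s%s" % (user, kind, suffix))
    return "#. ascto: " + " ".join(parts)
```

Starting from the history charlie:m, alice:r, bob:m, the trail for the lstyle tag contains all three entries, because Alice's general review does not count as an lstyle review. Once Dan's lstyle review is appended, the trail for the general tag (the empty string) starts after Alice's review and renders as bob:m dan:r(lstyle).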

Daily Use for The Coordinator

After setting up the ascription system, the team coordinator should have to do very little to maintain it.

Ascribing Merges

Modifications made to summit catalogs by merging with templates must also be ascribed. Therefore, after merging the summit (posummit ... merge ...) the coordinator substitutes the VCS command:

$ svn commit LANG/summit/messages/ -m "Merged summit."

with the poascribe command in commit mode:

$ poascribe ci LANG/summit/messages/ -m "Merged summit."

Since the user is not explicitly given by the -u option, this will ascribe merge modifications to the coordinator (more precisely, to the user set as default in ~/.pologyrc), which is just fine. It is also possible to define a special user only for ascribing merge modifications, though there is no known advantage to that.

Since the -C option is not issued, poascribe will automatically commit all modified summit and ascription catalogs when done.

Shuffling Ascription Catalogs

Sometimes summit catalogs are shuffled in the repository: moved to another module, renamed, one catalog split into two, two catalogs merged into one. Such shuffling should be exactly mirrored in the ascription tree, and this too is done on the repository side, at the same time. This relies on the ascription root being set exactly to summit-ascript in the ascription configuration. So the team coordinator has nothing special to do here.

If, instead of in the central KDE repository, the translation team is working in an external repository, then the ascription system must be set up in that repository. But so long as the process_orphans.sh script from trunk/l10n-support/scripts/ is used to shuffle catalogs in the external repository as well, the ascription catalogs will be properly handled.

Filtering for Release

The last component of the ascription system is how to prevent insufficiently reviewed messages from leaking into a release. In the context of Pology and the summit workflow, poascribe itself is not used directly to this end. Instead, in the summit configuration (as opposed to the ascription configuration), the team coordinator defines filters which pass messages by applying selectors.

Each top level PO tree has its own summit configuration file, named MSGTREE.extras.summit:

...

   LANG/
       summit/
           messages/
           messages.extras.summit
           docmessages/
           docmessages.extras.summit


For the simple case of all reviews being general reviews, the filter is added to summit configuration like this (anywhere within *.extras.summit file):

S.ascription_filters = [
    ("regular", ["nmodar"]),
]

Here the filter is named regular, and is defined as the application of the nmodar selector, the negation of modar. This simply means: pass all messages not modified after the last review.

When the team coordinator scatters to branches (executes posummit scatter), messages from summit POs which do not pass this filter will not be sent to branch POs. The count of stopped messages by branch PO will be reported in the output as scattering proceeds.

Why did we have to name the filter regular? (Those knowing some Python will also notice that it is defined as a list element.) Because it is possible to define more than one filter, and select which one is used on each scattering. For example, the coordinator may wish that, when the release is near and time is short to review everything, messages from a few experienced translators can be passed into release without review. If those translators are Alice and Bob, an "emergency" filter can be defined like this:

S.ascription_filters = [
    ("regular", ["nmodar"]),
    ("emergency", ["nmodar:~alice,bob"]),
]

By default, posummit uses the first filter in the list. When the coordinator needs to do emergency scattering, he requests the emergency filter by the -a option:

$ cd $KDEREPO/trunk/l10n-support
$ posummit scripts/messages.summit LANG scatter -a emergency

What if several selectors are needed to pass the message? For example, the language style review (the earlier example with Alice and Dan) too may be requested for regular scattering, but omitted from emergency scattering. The filter setup for this scenario looks like this:

S.ascription_filters = [
    ("regular", ["nmodar", "nmodar:::lstyle"]),
    ("emergency", ["nmodar:~alice,bob"]),
]

The regular filter now reads: pass the message if it has not been modified after the last (general) review and has not been modified after the last style review.
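How a named filter is chosen and applied can be sketched in Python as follows. The filter list has the same shape as S.ascription_filters above, but pick_filter, message_passes and the evaluate callback are hypothetical names introduced for illustration; how posummit actually evaluates selectors is not shown in this article.

```python
def pick_filter(filters, name=None):
    """Return the (name, selectors) entry; the first one when name is None."""
    if name is None:
        return filters[0]
    for entry in filters:
        if entry[0] == name:
            return entry
    raise KeyError("no ascription filter named %r" % name)


def message_passes(filters, evaluate, message, name=None):
    """A message passes only if every selector of the chosen filter selects it.

    evaluate(selector_spec, message) -> bool is a stand-in for real
    selector evaluation, supplied by the caller.
    """
    _name, selectors = pick_filter(filters, name)
    return all(evaluate(spec, message) for spec in selectors)


ascription_filters = [
    ("regular", ["nmodar", "nmodar:::lstyle"]),
    ("emergency", ["nmodar:~alice,bob"]),
]
```

A message that has a general review but still lacks a style review is stopped by the regular filter, since nmodar:::lstyle does not select it; if it was last modified by Alice or Bob, it passes when the emergency filter is requested by name.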

A simple combination of predefined selectors by AND-conditions may not be sufficient for more involved scenarios. When this is the case, the coordinator may write (or ask someone to write) a custom selector in Python, and plug it in as the second element in the filter tuple (instead of the list of predefined selectors).

Writing a Selector Function

((To be written.))