Revision as of 23:44, 23 November 2008 by Ilic (talk | contribs) (Figures centered.)

Localization/Workflows/PO Summit


Translating in Summit
On Localization   Workflows
Prerequisites   Language Coordinator
Related Articles   Pology
External Reading   n/a

Software Branches and Translation

The obvious approach to releasing software is for developers to focus on one central body of code, a single "branch", fixing bugs and adding new features to it, and from time to time taking a "snapshot" of the branch and packaging it as the next release. One way to make the release process more robust is for development to proceed in two parallel branches. Actual releases are made from the "stable" branch, mostly with bugs fixed and only very important features added. The other, "unstable" branch is used to develop new features and redesign existing ones, and at some point becomes the next stable branch. Depending on the project, releases may be made from the unstable branch as well, in parallel with stable releases, so that eager users can help test the novelties.

In KDE, all three release models may be present at any given moment. Core KDE modules, which are together labeled as "the KDE" and released in unison, follow the two-branch model, with a stable and a trunk branch, and releases from the stable branch only. KDE extragear applications may do the same, but they are not required to; they may instead make releases from the trunk branch only, or use both stable and trunk branches and make releases from both. This presents translators with the workflow shown in the following figure.

[Figure: Classic translation by branches]

The KDE repository automation (aka "Scripty") prepares template POs and merges them with language POs. From the point of view of language teams, real people who perform global actions (e.g. moving POs around for all languages when an application moves from one module to another) can also be grouped here. However, the translation team is still presented with two branches of POs to translate.

In general, this means that, just like programmers, translators at times have to work on both branches, and propagate fixes from one branch to the other ("backport" from trunk to stable, or "forwardport" from stable to trunk). This is prone to confusion in coordination, and demands extra effort and attention when updating translations. Keeping up with two branches is easier in core KDE modules, as they are released from the stable branch only and on a single schedule, but can be more taxing with extragear applications.

The exact amount of effort spent on parallel translation, porting of fixes, and coordination depends on the translation team. For example, a well-manned and well-coordinated team, with established style and terminology guides and custom automation to support these, may be able to keep all POs in both branches fully translated at all times with ease. Or a team may have worked out well-defined schedules, such that any given member of the team at one point switches from translating one branch to the other, without looking back, so that everyone is effectively always working on one branch (even if not the same one).

If, however, you as the team coordinator have said and thought "Should do that in trunk too, bugger", "Didn't we fix that already?", "No, you should take it from stable", "It was released from WHAT branch?", more often than you would have liked, the following presents one possibility for sidestepping such issues.

Translating in Summit

For a PO catalog which exists in both the stable and the trunk branch, a new catalog of the same name can be made by gathering all the unique messages from the branch PO files, i.e. the summit of the branch POs. Assuming that the branch POs are not all that different, since they are two versions of the same catalog, the summit PO should not have many more messages than either. Translators at all times work on the summit PO, from which the messages are periodically scattered back to the original branch POs, so they never have to do any parallel translation, branch switching or porting of fixes. The summit workflow is presented in the following figure.

[Figure: Translation in summit]
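
The gather and scatter operations described above can be pictured with a small sketch. This is not how posummit.py is implemented; it is only an illustration that assumes a catalog can be modeled as a dictionary from (msgctxt, msgid) to msgstr, with an empty string standing for "no context" or "untranslated". The function names and the data layout are hypothetical.

```python
# Hypothetical in-memory model: a catalog maps a message key
# (msgctxt, msgid) to its translation (msgstr); "" means untranslated.

def gather(branches):
    """Build a summit catalog as the union of messages from branch catalogs.

    `branches` maps branch name -> catalog. When a message exists in
    several branches, the first non-empty translation found is kept.
    """
    summit = {}
    for name, catalog in branches.items():
        for key, msgstr in catalog.items():
            entry = summit.setdefault(key, {"msgstr": "", "branches": []})
            entry["branches"].append(name)
            if not entry["msgstr"]:
                entry["msgstr"] = msgstr
    return summit


def scatter(summit, catalog):
    """Fill a branch catalog from the summit; summit translations win."""
    return {key: (summit[key]["msgstr"] if key in summit else msgstr)
            for key, msgstr in catalog.items()}


trunk = {("The destination url of a job", "Destination:"): "",
         ("", "&Keep this window open after transfer is complete"): ""}
stable = {("", "Destination:"): "",
          ("", "&Keep this window open after transfer is complete"): ""}

# Three unique messages: one only in trunk, one only in stable, one shared.
summit = gather({"trunk": trunk, "stable": stable})
```

In this model a translator edits only `summit`, and `scatter(summit, stable)` or `scatter(summit, trunk)` then yields the updated branch catalog.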

In summit mode, repository automation gathers summit PO templates from branch templates, and stores them separately from the real branches, at trunk/l10n-support/templates/summit/. The language team also has a collection of summit POs, at trunk/l10n-support/LANG/summit/, which is the sole location where translation happens. As before, repository automation handles merging of templates in branches, but merging of summit POs is done by the team coordinator; the team coordinator can opt to manually merge branch POs as well (more on the why and how of this later). From time to time, the team coordinator fills out branch POs by scattering from the summit. Not to worry: each of these special actions required of the team coordinator is done with a single command.

While it is in principle obvious that two branch POs can be made into one summit PO with the union of their messages, there are some important details that should be handled the right way by the summitting system:

  • What if a PO changes modules in one branch, so that it no longer belongs to the same module in both branches (e.g. application is moved from one to another module in trunk)?
  • What if a PO changes its name in one branch, but not in the other (e.g. application is renamed in trunk)?
  • What if a PO in one branch is split into two POs in another (e.g. extracting a library out of monolithic application in trunk)?
  • Where to place messages unique to one branch in the summit PO? The original file context of a message, which messages precede it and which follow it, should be kept as much as possible.
  • If source references are used to achieve good ordering of messages in summit PO, what to do if some source file paths change in one branch (e.g. application gets restructured in trunk)?
  • How to handle messages with different plurality across branches (since messages are identified only by their msgctxt and msgid fields, and not by msgid_plural)?

The present summitting system handles all of these details. This also means that teams working in the summit need not take care of the first three issues above, which affect manual branch handling too.
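
To make the last point in the list concrete, here is a minimal sketch of how a plurality mismatch between branches could be detected. It assumes a hypothetical model in which each branch catalog maps (msgctxt, msgid) to its msgid_plural, with None for non-plural messages; how the summitting system actually resolves such a mismatch is not described here.

```python
def plurality_conflicts(first, second):
    """Return keys of messages present in both catalogs whose
    msgid_plural differs (None means the message is not plural)."""
    return [key for key in first
            if key in second and first[key] != second[key]]


# Same (msgctxt, msgid) in both branches, but plural only in trunk.
trunk = {("", "%1 file"): "%1 files", ("", "Open"): None}
stable = {("", "%1 file"): None, ("", "Open"): None}

conflicts = plurality_conflicts(trunk, stable)
```

Because the key ignores msgid_plural, both variants of "%1 file" collapse into a single summit message, which is why the case needs special treatment.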

Summit POs are normal, fully valid POs in their own right. A message in a summit PO is different from branch PO only by being equipped with another comment, #. +> ..., showing in which branches the message exists:

#. +> trunk
msgctxt "The destination url of a job"
msgid "Destination:"
msgstr ""

#. +> stable
msgid "Destination:"
msgstr ""

#. +> trunk stable
msgid "&Keep this window open after transfer is complete"
msgstr ""

The first message above thus exists in trunk only, the second in stable only, and the third in both branches. The source reference always points to the source file in the first listed branch. Any extracted comments (#.) other than the branch list are also taken from the first listed branch.

Note that the first two messages above differ only by context; the context was added in trunk, but not in stable, in order not to break the message freeze. However, due to careful ordering of messages in summit POs, these two messages appear together, allowing the translator to immediately make a correction in the stable branch too, if the new context in trunk shows it to be necessary.
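
Since the branch list is an ordinary extracted comment, it is easy to read back with a few lines of script. The following is a minimal sketch, assuming the comment lines of one message are available as plain strings; the function name is hypothetical and not part of Pology.

```python
BRANCH_PREFIX = "#. +>"


def branches_of(comment_lines):
    """Return the branches named in the '#. +> ...' comment of a message,
    in the order listed, or an empty list if there is no such comment."""
    for line in comment_lines:
        if line.startswith(BRANCH_PREFIX):
            return line[len(BRANCH_PREFIX):].split()
    return []


in_both = branches_of(["#. +> trunk stable"])
```

The first element of the returned list is the branch from which the source reference and the other extracted comments were taken.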

Setting Up and Daily Operation

Before initializing language summit, the team coordinator has to have all the necessary paths checked out from the KDE repository, and structured on the local machine exactly as in the repository. If the path to the root of KDE repository on the local machine is $KDEREPO, and the language code LANG, then the structure should be as follows, with leaf directories checked out in full:



Summit operations are performed using the script posummit.py, which is part of Pology, residing in trunk/l10n-support/pology/. Therefore the first thing to do is to set up Pology, which amounts only to setting the proper path:

$ export PATH=$KDEREPO/trunk/l10n-support/pology/scripts:$PATH

To initialize the summit, by gathering from existing translation in branches, the team coordinator executes:

$ cd $KDEREPO/trunk/l10n-support
$ posummit.py scripts/messages.summit LANG gather --create --force

Depending on the amount of translation, the initial gathering will complete after some minutes, and the language summit will be located under $KDEREPO/trunk/l10n-support/LANG/summit/messages/. This is the only time the coordinator performs the gather operation on language POs; otherwise gathering is done only on templates, daily, by repository automation. Then, the newly created language summit should be merged with the current summit templates:

$ posummit.py scripts/messages.summit LANG merge

Merging the summit is something that the coordinator does periodically, as often as desired. For example, it can be done daily, or with increasing frequency as the last day for translation for the next release approaches.

After the first merging, language summit is ready for active translation. The coordinator should now commit $KDEREPO/trunk/l10n-support/LANG/, and, importantly, notify team members to stop working on branch POs and focus exclusively on summit POs.

To scatter the summit, i.e. fill out POs in stable and trunk branch from the summit POs, the coordinator periodically executes:

$ posummit.py scripts/messages.summit LANG scatter

As with merging, there is no fixed schedule when scattering should be done. Of course, it must necessarily be done before the next release is tagged, and in between it is useful to scatter for runtime testing, or to have translation statistics by branches on l10n.kde.org up to date.

Periodic scattering and merging of the complete summit are basically all that a language team coordinator needs to do specifically to operate the summit. Also, since l10n-support/scripts/ and l10n-support/pology/ contain scripts and settings critical for proper functioning of summit operations, and may be tweaked at any time, they should always be updated from the repository together with PO files and templates (in fact, it is best to always update at once the whole tree as outlined above).

As for documentation POs, the procedure is the same, only replacing every occurrence of messages with docmessages in the command lines above. The user interface and documentation summits are fully independent, so it is reasonable to work with the interface summit only for a trial period, and engage the documentation summit once the trial has been deemed successful.
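
A coordinator who scripts these periodic operations may find it convenient to build the command lines in one place. The following is a minimal sketch that only assembles the argument lists shown above; the helper name and its parameters are hypothetical, and it uses no posummit.py options beyond those already mentioned.

```python
def posummit_command(lang, operation, kind="messages", targets=()):
    """Build the argument list for one posummit.py invocation.

    kind is "messages" for the user interface summit or "docmessages"
    for the documentation summit; targets are optional extra arguments
    placed after the operation keyword.
    """
    return ["posummit.py", f"scripts/{kind}.summit", lang, operation, *targets]


merge_ui = posummit_command("LANG", "merge")
scatter_docs = posummit_command("LANG", "scatter", kind="docmessages")
```

Such a list can be passed to `subprocess.run()` with the working directory set to the l10n-support directory, so that the relative path to the summit configuration file resolves.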

Operation Targets

Sometimes it is advantageous to merge or scatter just a single catalog, a single module, a single branch, or any combination thereof. To this end, scatter and merge operations accept any number of operation targets after the operation keyword, specified as one of CATALOG, BRANCH:CATALOG, MODULE/, BRANCH:MODULE/, and BRANCH:. For example, to scatter just to Dolphin's PO in stable branch, in order to test translation at runtime, one would execute:

$ cd $KDEREPO/trunk/l10n-support
$ posummit.py scripts/messages.summit LANG scatter stable:dolphin

(note no .po ending on catalog name). Or, to scatter to every PO in kdeplasma-addons module in stable branch:

$ cd $KDEREPO/trunk/l10n-support
$ posummit.py scripts/messages.summit LANG scatter stable:kdeplasma-addons/

(the trailing slash is mandatory, or else posummit.py would think that kdeplasma-addons is a catalog name). Finally, to scatter to all catalogs in the stable branch (with the trailing colon for the same reason as earlier):

$ cd $KDEREPO/trunk/l10n-support
$ posummit.py scripts/messages.summit LANG scatter stable:
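
The role of the trailing slash and colon becomes clear when the five target forms are taken apart. The following sketch shows one way to interpret them; it is an illustration of the syntax described above, not the parsing code used by posummit.py, and the function name is hypothetical.

```python
def parse_target(target):
    """Split an operation target into (branch, module, catalog).

    Parts that are not given are returned as None.
    """
    branch = None
    if ":" in target:
        branch, target = target.split(":", 1)
    if not target:                   # "BRANCH:"
        return (branch, None, None)
    if target.endswith("/"):         # "MODULE/" or "BRANCH:MODULE/"
        return (branch, target[:-1], None)
    return (branch, None, target)    # "CATALOG" or "BRANCH:CATALOG"


single_catalog = parse_target("stable:dolphin")
whole_module = parse_target("stable:kdeplasma-addons/")
whole_branch = parse_target("stable:")
```

Without the trailing slash, "kdeplasma-addons" falls into the last case and is read as a catalog name, which is exactly the ambiguity the mandatory slash avoids.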

Full Local Merging

When scattering from the summit, there will sometimes be reports of "messages missing in the summit". This happens because of the time gaps between Scripty merging branch POs, Scripty gathering summit templates, and the team coordinator merging the language summit, so that some messages in branch POs are not always present in the summit. This condition is benign, as such warnings will start to disappear as the message freeze approaches, but it can be annoying. For this reason, the team coordinator can stop Scripty from merging branch POs, and have the same posummit.py ... merge command above merge not only summit POs, but stable and trunk POs as well, so that summit and branches are always in perfect sync.

First, to stop Scripty from merging branch POs, a file named no-auto-merge (with arbitrary content) should be committed to the roots of respective trees, e.g.:

$ touch $KDEREPO/trunk/l10n-kde4/LANG/messages/no-auto-merge
$ touch $KDEREPO/branches/stable/l10n-kde4/LANG/messages/no-auto-merge

Then, to make posummit.py ... merge merge everything, a summit customization file is placed in the summit root at $KDEREPO/trunk/l10n-support/LANG/summit/messages.extras.summit and committed, with the following content:

# -*- coding: UTF-8 -*-
# This file is included by scripts/messages.summit
# for language-specific additions/overrides.

# Set local merging for all branches.
for branch in S.branches:
    branch["merge_locally"] = True

Once local merging of all branches is set, the coordinator can also use operation targets for selective merging, e.g. to merge only stable branch:

$ cd $KDEREPO/trunk/l10n-support
$ posummit.py scripts/messages.summit LANG merge stable:

Disadvantages and Remedies

Although hopefully outweighed by the advantages, working in the summit is not without its disadvantages. These should be weighed when deciding whether to try out the summit workflow.

Obviously, while summit operations are made to be quite automatic, some extra aptitude is asked of the team coordinator. Reasonable shell handling, an understanding of version control operations, and a feel for the pulse of repository automation are all prerequisites, and some scripting ability is advantageous.

After the summit is put into operation, any changes made manually in branch POs will not propagate to the summit, and will soon be lost to scattering, since summit translations override everything in branches. This means that the whole team must work in the summit; it is not possible for some members to use the summit while others do not.

A summit PO file will generally have more messages than either of the branch files. For example, in the KDE 4.0/4.1 and 4.1/4.2 cycles, summit POs of core KDE modules had on average less than 5% more words than their stable counterparts. However, this percentage is an upper limit, never actually reached, on the workload wasted on trunk messages that come and go, since more and more trunk messages will find their way into the next KDE feature release as it approaches.

Another, more pressing issue with the increased size of summit POs is the following scenario: a stable release is around the corner, and the team has no time to update summit POs fully, but could update only the stable messages in them. E.g. there are 1000 incomplete (untranslated and fuzzy) messages, out of which only 100 are from the stable branch. A clever dedicated PO editor could allow jumping only through incomplete messages that also satisfy a general search criterion, which in this case would be that a comment matches the regular expression #\. \+>.*stable. Alternatively, with some external help, it is enough if the PO editor can merely search through comments. The posieve.py script (ready to use next to posummit.py) can equip incomplete stable messages with the incomplete flag (as in a #, ..., incomplete comment), and this flag can then be searched for in the PO editor:

$ posieve.py tag-incomplete -sbranch:stable PATH_TO_PO_FILES_OR_DIRS

The incomplete flag need not be removed manually when the message is updated. It will disappear automatically on the next merge, as it is not among the flags known to Gettext.
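
The selection criterion itself, an incomplete message whose branch comment mentions stable, can be sketched in a few lines. This is only an illustration of the idea and not what posieve.py does internally; the function and its arguments are hypothetical, and the pattern tolerates optional whitespace after "#." so that it matches the "#. +> ..." comments as written in summit POs.

```python
import re

# Matches a branch comment such as "#. +> trunk stable".
STABLE_COMMENT = re.compile(r"#\.\s*\+>.*stable")


def needs_work_for_stable(comments, msgstr, fuzzy):
    """True if the message is incomplete (untranslated or fuzzy)
    and its branch comment lists the stable branch."""
    incomplete = fuzzy or not msgstr
    return incomplete and any(STABLE_COMMENT.search(c) for c in comments)


urgent = needs_work_for_stable(["#. +> trunk stable"], "", False)
can_wait = needs_work_for_stable(["#. +> trunk"], "", False)
```

Only messages for which the function returns True would need attention before the stable release; the remaining incomplete messages exist in trunk only.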

There is also the organizational issue of starting to use the summit and, if it does not help as expected, of stopping. Team members have to be reminded at the start not to send in branch POs, and then be sent back to branch POs if the summit is disbanded. On the plus side, disbanding the summit is technically simple: just remove l10n-support/LANG/summit from the repository, and possibly also the no-auto-merge files if local merging was set up, and that is it.

Summit Customization

As briefly indicated when setting up local merging of branches, the summit customization file is located at $KDEREPO/trunk/l10n-support/LANG/summit/messages.extras.summit; in it, various per-language additions and overrides can be set relative to the default summit setup in $KDEREPO/trunk/l10n-support/scripts/messages.summit. The file messages.extras.summit is Python source, which is imported by messages.summit, and sets up summit options using the special object named S...

// TODO: scatter and merge hooks, checks, special header fields, etc.

Another Way to Improve Branch Handling

If the summit seems a lot to digest, or is simply overkill for the team's needs, but some improvement over manual handling of branches would still be welcome, KDE's dedicated PO editor Lokalize offers a branch sync mode. It works as follows.

In the Lokalize project definition, the local paths of the trunk and stable PO roots are set in the Translation directory: and Branch directory: fields. Then, when a trunk PO file is opened, if it has a stable counterpart with the same name and location as in trunk, this stable PO is opened too. For each trunk message in the main editing pane, if such a message exists in the stable PO too, the stable message is shown in the Secondary Sync pane; changes to the translation of the trunk message are reflected in the stable message, and the stable PO file is saved whenever the trunk file is saved.

Furthermore, any team member can personally choose to work like this; there is no need to change the workflow of the language team as a whole. When sending modifications to the coordinator, team members who rely on this feature of Lokalize simply send both the trunk and the stable POs that got modified.
