Localization/Tools/Lbundle Check: Difference between revisions

From KDE TechBase
(Describe greedy bundling.)
m (Removed extra empty lines.)
 
(7 intermediate revisions by 2 users not shown)
Line 13: Line 13:
== About ==


The <tt>lbundle_check.py</tt> script checks and records the state of localized non-text resources organized in lbundles, allowing translators to track their relation to original resources. It lives in <tt>l10n-kde4/scripts/</tt> directory in the appropriate branch in the KDE repository.
The <tt>lbundle_check.py</tt> script checks and records the state of localized non-text resources organized in lbundles, allowing translators to track their relation to original resources. Its latest version can be found in the <tt>trunk/l10n-support/scripts/</tt> directory in the KDE repository.


<tt>lbundle_check.py</tt> can be used to track both out-of-source and in-source lbundles, but the setup needed for these two modes is somewhat different. First the setup is explained for each type of lbundle, followed by the details of operation.
<tt>lbundle_check.py</tt> is in no way tied to the KDE Translation Project and can be used in any other environment. Only the instructions in this article are specific to using <tt>lbundle_check.py</tt> in the context of the KDE TP.


== Setup ==
== Checking Out and Updating Sources ==


Before going through the setup examples, refresh the examples on organization of [[Localization/Concepts/Non_Text_Resources#Inside The Repository|lbundles in the repository]] (localizing the splash screen of the imaginary KDingus application).
To be able to track lbundle states, the translator (one of the coordinators) needs to have all the sources to which localized resources correspond. This is done by checking out once from the KDE repositories, and then regularly updating local checkouts. This is hard to do and maintain manually for selected sources only.


As it will become apparent below, to be able to track lbundle states, the translator (one of the coordinators) needs to have all the original counterparts to localized resources checked out from the KDE repository, and in the same relative positions. E.g. for KDingus example as kdeutils application, checked out files could have the following structure:
<code text>
$HOME/kde-svn
    trunk/
        l10n-kde4/
            aa/
        KDE/
            kdeutils/
                kdingus/
</code>
To have it like this without checking out full parent directories, one can use <tt>-N</tt> option to <tt>svn checkout</tt>, which avoids recursion into subdirectories.
Instead, the easiest way is to check out and update all localization-relevant KDE sources using a single command. This command is <tt>populate_source.sh</tt>, found in the <tt>trunk/l10n-kde4/scripts/</tt> and <tt>branches/stable/l10n-kde4/scripts/</tt> directories; each copy knows how to check out and update the sources corresponding to its translation branch (trunk or stable). It is run simply like this:
<syntaxhighlight lang="bash">
$ cd $KDEREPO/trunk/l10n-kde4
$ scripts/populate_source.sh
$ cd $KDEREPO/branches/stable/l10n-kde4
$ scripts/populate_source.sh
</syntaxhighlight>
Running <tt>populate_source.sh</tt> will take quite some time on the first run, but subsequent runs will be much faster. After the run is complete, the current working directory will contain the <tt>source/</tt> directory, with all the checkouts in it. The local directory tree should then look like this:
<syntaxhighlight lang="text">
$KDEREPO/
    trunk/
        l10n-kde4/
            scripts/
            source/
            ...
    branches/
        stable/
            l10n-kde4/
                scripts/
                source/
                ...
</syntaxhighlight>
At the time of writing, <tt>source/</tt> in trunk occupies ~10 GiB of disk space, and in stable ~6 GiB. This is not a small amount, but it should not be a significant problem given typical contemporary disk sizes.
 
To quickly check out or update only one or a few modules, their names can be given as arguments to <tt>populate_source.sh</tt>:
<syntaxhighlight lang="bash">
$ scripts/populate_source.sh extragear-network_konversation kdegames_konquest
</syntaxhighlight>
Module names for use as arguments can be found in <tt>source/modules</tt> file. This file is generated anew whenever <tt>populate_source.sh</tt> is run without arguments, i.e. to update everything.
 
The <tt>source/</tt> directory contains the <tt>repo/</tt> subdirectory with actual local checkouts, and the <tt>link/</tt> subdirectory, with links to checkout directories suitable for later operations. If the imaginary KDingus application is part of kdeutils top module, and is kept in a Git repository, it will be located like this in <tt>source/</tt> in trunk:
<syntaxhighlight lang="text">
source/
    repo/
        git-unstable/
            kdeutils_kdingus
    link/
        kdeutils/
            kdingus --> ../../repo/git-unstable/kdeutils_kdingus
</syntaxhighlight>
And if KDingus were in the Subversion repository:
<syntaxhighlight lang="text">
source/
    repo/
        trunk/
            KDE/
                kdeutils/
                    kdingus
    link/
        kdeutils/
            kdingus --> ../../repo/trunk/KDE/kdeutils/kdingus
</syntaxhighlight>
Note that in both cases the link location stays the same, <tt>source/link/kdeutils/kdingus</tt>. It will also stay the same in <tt>source/</tt> of stable branch, where <tt>repo/</tt> will instead contain <tt>git-stable/</tt>, <tt>branches/KDE/4.x/</tt>, etc.
 
There is a special case when a top module name is a Git repository of its own, such as <tt>kde-baseapps</tt> or <tt>kdepim</tt>. The link then has to repeat the module name:
<syntaxhighlight lang="text">
source/
    link/
        kde-baseapps/
            kde-baseapps --> ../../repo/git-unstable/kde-baseapps
        kdepim/
            kdepim --> ../../repo/git-unstable/kdepim
</syntaxhighlight>
<tt>lbundle_check.py</tt> is capable of gracefully handling this case as well.
 
== Setup ==
 
Before going through the setup examples, refresh the examples on organization of [[Localization/Concepts/Non_Text_Resources#Inside The Repository|lbundles in the repository]], which dealt with localizing the splash screen of the imaginary KDingus application.
 
<tt>lbundle_check.py</tt> can be used to track both out-of-source and in-source lbundles, but the setup needed for these two modes is somewhat different. The setup is explained for both modes, followed by the details of operation.


=== Out-of-Source Lbundles ===


When it was assumed that KDingus is living in kdeutils, a core KDE module, its out-of-source lbundle for the language <tt>aa</tt> was organized like this:
When it was assumed that KDingus was part of kdeutils top module, its out-of-source lbundle for the language <tt>aa</tt> was organized like this:
<code text>
<syntaxhighlight lang="text">
/trunk/l10n-kde4/aa/data/kdeutils/
$KDEREPO/trunk/l10n-kde4/aa/data/kdeutils/
     CMakeLists.txt
     CMakeLists.txt
     kdingus/
     kdingus/
Line 47: Line 106:
                 aa/
                 aa/
                     kdingus-splash.png
                     kdingus-splash.png
</code>
</syntaxhighlight>
except that this listing includes one new file, the <tt>l10n-spec</tt>. This is the setup file for tracking the lbundle, or rather, all lbundles on its level and beneath. The files outside of <tt>l10n/</tt> subdirs are going to be called ''unbundled'', as they are not considered a part of lbundle proper.
except that this listing includes one new file, the <tt>l10n-spec</tt>. This is the setup file for tracking the lbundle, or rather, all lbundles on its level and beneath. The files outside of <tt>l10n/</tt> subdirs are going to be called ''unbundled'', as they are not considered a part of the lbundle proper.


The <tt>l10n-spec</tt> files are composed of <tt>key = value</tt> fields per line. For the example above the content of <tt>l10n-spec</tt> would be:
<code text>
# l10n-spec for KDingus' out-of-source bundle.
source-root = trunk/KDE/kdeutils/kdingus
source-vcs = svn
bundle-vcs = svn
languages = aa
track-unbundled = CMakeLists.txt
</code>
<tt>l10n-spec</tt> files are composed of <tt>key = value</tt> fields per line. For the example above the content of <tt>l10n-spec</tt> would be:
<syntaxhighlight lang="text">
# l10n-spec for KDingus' out-of-source bundle.
source-root = aa/data:source/link;^1
source-vcs = auto
bundle-vcs = auto
languages = aa
track-unbundled = CMakeLists.txt
</syntaxhighlight>


The fields used in the example are:


; source-root: the root of the KDingus' sources in the KDE repository. This is needed to link the original counterparts to localized files. The path of the original file relative to the repository top will be composed of this, plus the relative path of the localized file in the lbundle (relative to the <tt>l10n-spec</tt> file), and stripped of the <tt>l10n/aa/</tt> subpath.
; source-root: the path to the root directory of KDingus' sources. Every path-valued field can specify the path in several ways, which will be explained shortly. In this example, <tt>aa/data:source/link</tt> means that the path to the original file corresponding to a localized file is constructed by replacing <tt>aa/data</tt> in the absolute localized path with <tt>source/link</tt>; if the sources were [[#Checking Out and Updating Sources|checked out as explained earlier]], this will do exactly the right thing. The second path specification element, separated by a semicolon, is <tt>^1</tt>, which means that the original file may have one extra parent directory inserted somewhere in its path; this covers the special case of the top module name being equal to the submodule name (e.g. <tt>source/link/kdepim/kdepim</tt>). Note that this path specification nowhere refers to KDingus in particular: it should be applicable to all out-of-source bundles.


; source-vcs, bundle-vcs: the version control systems used by the source code and the lbundle, respectively. These are needed so that the checker script knows how to update the sources (if requested), and to avoid trying to track version control bookkeeping files in the lbundle (e.g. <tt>.svn</tt> subdirs for SVN). Currently the only known VCS is <tt>svn</tt>, but the script abstracts this internally, so that other may be added in the future.
; source-vcs, bundle-vcs: the version control systems used by the source code and the lbundle, respectively. These fields are needed so that <tt>lbundle-check.py</tt> knows how to perform VCS operations when necessary, as well as to avoid trying to track version control bookkeeping files in the lbundle (e.g. <tt>.svn</tt> subdirs for Subversion). Currently the two known VCS are <tt>svn</tt> and <tt>git</tt>, and <tt>auto</tt> can be set to let <tt>lbundle-check.py</tt> detect which one it is.


; languages: space-delimited list of languages expected in <tt>l10n</tt> subdirs. This is an optional field, which serves to check for naming errors with language subdirs. It may be left out, preventing such checks, but there is no reason to do so (especially for in-source bundles described below).


; track-unbundled: space-delimited list of relative file paths, or pairs of file paths, of unbundled files which should be tracked nevertheless. In this example, <tt>CMakeLists.txt</tt> should be tracked because the installation instructions for the lbundle may need to change when those for the original resources change. A pair of file paths, as two space-separated paths within parenthesis (e.g. <tt>(foo.txt bar.txt)</tt>), can be given instead of a single file path when the local relative path needs to be different from that of the tracked original. A shell glob can be specified instead of a particular path (e.g. <tt>*.txt</tt>), but only for single paths, not in pairs. Backslash serves as escape character, e.g. when the file name contains a space.
; track-unbundled: space-delimited list of relative file paths, or pairs of file paths, of unbundled files which should be tracked nevertheless. In this example, <tt>CMakeLists.txt</tt> should be tracked because the installation instructions for the lbundle may need to change when instructions for the original resources change. A pair of file paths, as two space-separated paths within parentheses (e.g. <tt>(foo.txt bar.txt)</tt>), can be given instead of a single file path when the local relative path needs to be different from that of the tracked original. A shell glob can be specified instead of a particular path (e.g. <tt>*.txt</tt>), but only for single paths, not in pairs. Backslash serves as the escape character, e.g. when the file name contains a space.
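
The value syntax of <tt>track-unbundled</tt> described above (single paths, parenthesized pairs, backslash escapes) can be illustrated with a small tokenizer. This is only a sketch written for this article, not code taken from <tt>lbundle-check.py</tt>; the function name <tt>parse_track_unbundled</tt> is hypothetical, and pairs are returned in the order in which they were written.

```python
def parse_track_unbundled(value):
    """Split a track-unbundled value into single paths and path pairs.

    Hypothetical helper, not part of lbundle-check.py. Returns a list whose
    items are either a string (single path or glob) or a 2-tuple (a pair).
    """
    items = []      # finished top-level items
    pair = None     # paths collected inside the current "( ... )", if any
    token = []      # characters of the path being read
    escaped = False

    def flush():
        # Close the current path, adding it to the open pair or to the list.
        if token:
            (pair if pair is not None else items).append("".join(token))
            token.clear()

    for ch in value:
        if escaped:
            token.append(ch)    # escaped character is taken literally
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch.isspace():
            flush()
        elif ch == "(":
            flush()
            if pair is not None:
                raise ValueError("nested parentheses are not allowed")
            pair = []
        elif ch == ")":
            flush()
            if pair is None or len(pair) != 2:
                raise ValueError("a pair must contain exactly two paths")
            items.append(tuple(pair))
            pair = None
        else:
            token.append(ch)
    flush()
    if pair is not None:
        raise ValueError("unclosed parenthesis")
    return items


print(parse_track_unbundled(r"CMakeLists.txt (foo.txt bar.txt) *.txt my\ file.txt"))
```

Shell globs are kept as plain strings here; expanding them against the lbundle directory would be a separate step.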


Several other fields, not used in this example, are available:


; source-root-rel: relative path to the sources, from <tt>l10n-spec</tt> in which it is defined. Useful when the repository organizes branches as subdirectories, since then the path relative to repository top (as provided by <tt>source-root</tt>) would change between branches.
; ignore-unbundled: space-delimited list of relative file paths which are to be completely ignored by <tt>lbundle-check.py</tt>. For an out-of-source bundle, a file which is unbundled should be either tracked (using <tt>track-unbundled</tt>) or ignored using this field, otherwise <tt>lbundle-check.py</tt> will complain about it. Shell globs are also possible.

; strict-state: a boolean value stating whether the tracking state is to be strictly imposed. Non-strict means that out-of-sync states between localized and original resources are recorded only in the track file, whereas strict state means that the localized file names will be changed too to reflect the out-of-sync state. More details are in the section on [[#Operation|operation]].

; ignore-substr: space-delimited list of substrings to ignore in names of bundled files when determining their original counterparts. This is never needed in KDE, but in other environments some localized files may be handled differently than a plain substitute for the original file, based on a substring in their name (e.g. a localized image may be overlaid on the original, rather than replacing it).

; track-by-subdir: by default there is only one tracking file per <tt>l10n-spec</tt> file (see [[#Operation|operation]] for tracking files); when this option is enabled (a boolean), there will instead be one tracking file per subdirectory (excluding <tt>l10n/</tt> directories) starting from the level of the <tt>l10n-spec</tt> file and below. Useful when frequent moving of subdirectories within the repository is expected, and there are a lot of localized resources to track.


; source-root-repl: construct path to sources by replacing a part of current bundle's path. The replacement specification is in form <tt>findstr:replstr</tt>. Useful when there are many bundles which would have exactly the same <tt>l10n-spec</tt> files when the path to sources is specified like this.
Paths in path-valued fields (such as <tt>source-root</tt>) can be specified in several modes. Where it makes sense, two or more modes can be chained to make up one path field value, separated by semicolons (<tt>;</tt>). Each subsequent mode refers to the path established up to that point. Path specification modes are as follows:


; ignore-unbundled: space-delimited list of relative file paths which are to be completely ignored by the lbundle checker. For an out-of-source bundle, a file which is unbundled should be either tracked too (using <tt>track-unbundled</tt>) or ignored using this field, otherwise the checker will complain about it. Shell globs also possible.
* Relative to a top root directory. This is represented by an ordinary relative path. When running <tt>lbundle-check.py</tt>, the top root directory will have to be given through the <tt>-s</tt> option.


; strict-state: a boolean value stating whether the tracking state is to be strictly imposed. Non-strict means that out-of-sync states between localized and original resources are recorded only in the track file, whereas strict state means that the localized files names will be changed too to reflact the out-of-sync state. More details in section on [[#Operation|operation]].
* Relative to the directory of current lbundle (i.e. to the parent directory of <tt>l10n-spec</tt> file). This is done by prefixing a relative path with exclamation mark (<tt>!</tt>).


; ignore-substr: space-delimited list of substrings to ignore in names of bundled files when determining their original counterparts. This is never needed in KDE, but in other environments some localized files may be handled differently than a plain substitute for the original file, based on a substring in their name (e.g. a localized image may be overlayed over the original, rather than replacing it).
* By replacing a part of the absolute path of the current lbundle directory. The replacement specification is of the form <tt>findstr:replstr</tt>.
 
* With allowed insertion of one or more consecutive parent directories when trying to find the path. This is given by <tt>^N</tt>, where N is the maximum number of inserted parents. This mode cannot be the first in a mode chain.
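
To make the chaining of modes more concrete, here is a minimal sketch that splits a path field value into its modes and applies the replacement mode. It is not the implementation used by <tt>lbundle-check.py</tt>: the function names, the mode labels, and the choice to replace only the first occurrence of <tt>findstr</tt> are all assumptions made for illustration.

```python
def classify_path_modes(value):
    """Split a path field value into (mode, argument) pairs.

    Hypothetical helper; the mode labels are invented for this sketch.
    """
    modes = []
    for index, part in enumerate(value.split(";")):
        part = part.strip()
        if part.startswith("^"):
            # ^N: allow up to N extra parent directories in the path.
            if index == 0:
                raise ValueError("^N cannot be the first mode in a chain")
            modes.append(("insert-parents", int(part[1:])))
        elif part.startswith("!"):
            # !path: relative to the directory holding l10n-spec.
            modes.append(("relative-to-bundle", part[1:]))
        elif ":" in part:
            # findstr:replstr: rewrite part of the absolute bundle path.
            find, repl = part.split(":", 1)
            modes.append(("replace", (find, repl)))
        else:
            # Ordinary relative path: relative to the top root directory.
            modes.append(("relative-to-top", part))
    return modes


def apply_replace(bundle_dir, find, repl):
    """Apply the findstr:replstr mode to an absolute lbundle directory path.

    Assumption: only the first occurrence of findstr is replaced.
    """
    if find not in bundle_dir:
        raise ValueError(f"{find!r} does not occur in {bundle_dir!r}")
    return bundle_dir.replace(find, repl, 1)


print(classify_path_modes("aa/data:source/link;^1"))
print(apply_replace("/repo/trunk/l10n-kde4/aa/data/kdeutils/kdingus",
                    "aa/data", "source/link"))
```

The <tt>/repo/...</tt> path in the usage lines is a made-up stand-in for a local checkout. Resolving the <tt>^N</tt> mode would additionally require looking at the file system, which this sketch leaves out.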


The usual #-comments are allowed in <tt>l10n-spec</tt> files, as well as line continuation by trailing backslashes (e.g. when several file paths are needed in <tt>track-unbundled</tt> field).
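
As an illustration of the file format (one <tt>key = value</tt> field per line, #-comments, and continuation by a trailing backslash), the following sketch reads an <tt>l10n-spec</tt> text into a dictionary. It is not the parser used by <tt>lbundle-check.py</tt>; the function name is hypothetical, and the sketch assumes that comments occupy whole lines.

```python
def parse_l10n_spec(text):
    """Parse l10n-spec text into a dict mapping field names to raw values.

    Hypothetical helper, not part of lbundle-check.py.
    """
    # Join continued lines: a backslash directly before the newline
    # means that the field continues on the next line.
    joined = text.replace("\\\n", "")
    fields = {}
    for line in joined.splitlines():
        line = line.strip()
        # Skip empty lines and whole-line comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a 'key = value' line: {line!r}")
        fields[key.strip()] = value.strip()
    return fields


spec = """\
# l10n-spec for KDingus' out-of-source bundle.
source-root = aa/data:source/link;^1
source-vcs = auto
bundle-vcs = auto
languages = aa
track-unbundled = CMakeLists.txt
"""
print(parse_l10n_spec(spec))
```

Values are left as raw strings; interpreting them (lists, booleans, path modes) depends on the field.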
Line 86: Line 153:
=== In-Source Lbundles ===


For KDingus living as an extragear app, its directory structure with the in-source lbundle having <tt>aa</tt> and <tt>bb</tt> languages, and an <tt>l10n-spec</tt> file, would be this:
If KDingus is an extragear application, its directory structure with the in-source lbundle having <tt>aa</tt> and <tt>bb</tt> languages, and an <tt>l10n-spec</tt> file, would be this:
<code text>
<syntaxhighlight lang="text">
/trunk/extragear/utils/
/source/link/
     kdingus/
     kdingus/
         CMakeLists.txt
         CMakeLists.txt
Line 100: Line 167:
                 bb/
                 bb/
                     kdingus-splash.png
                     kdingus-splash.png
</code>
</syntaxhighlight>
and the contents of <tt>l10n-spec</tt> somewhat simpler than for out-of-source lbundles:
<code text>
# l10n-spec for KDingus' in-source bundle.
source-vcs = svn
languages = aa bb
</code>
The contents of <tt>l10n-spec</tt> are simpler than those for out-of-source lbundles:
<syntaxhighlight lang="text">
# l10n-spec for KDingus' in-source bundle.
source-vcs = auto
languages = aa bb
</syntaxhighlight>


Since the lbundle is kept together with the application, sharing same root directory and version control system, there is no need for the <tt>source-root</tt> and <tt>bundle-vcs</tt> fields. In fact, presence or lack of <tt>source-root</tt> field identifies the bundle as out-of-source or in-source to the checker script.
Since the lbundle is kept together with the application, sharing the same root directory and version control system, there is no need for the <tt>source-root</tt> and <tt>bundle-vcs</tt> fields. In fact, the presence or lack of the <tt>source-root</tt> field identifies the lbundle as out-of-source or in-source.


Field <tt>track-unbundled</tt> (or <tt>ignore-unbundled</tt>) are not present either. All files which are unbundled are silently ignored. E.g. it is assumed that the KDingus' maintainer will modify and test install instructions so as to not break installation of localized resources.
The <tt>track-unbundled</tt> and <tt>ignore-unbundled</tt> fields are not present either: all unbundled files are silently ignored. It is assumed that KDingus' maintainer will modify and test installation instructions so as not to break installation of localized resources.


The <tt>languages</tt> field could be omitted here as well, but for in-source lbundles it is even more important to keep a tight check on wrongly named language subdirs.
Line 116: Line 183:
=== Greedy Bundling ===


The default assumption behind lbundles is that only a small part of all resources need to be localized. For example, if a new original resource is added, it is upon translator to notice if it needs localization, make the localized version and add it to lbundle where it will get tracked.
The default assumption behind lbundles is that only a small part of all resources needs to be localized. It is up to the translator to spot the resources which need localization, make their localized versions, and add them to an lbundle where they will get tracked.


When instead it would be better to track all original resources, the bundle tracking can be set to ''greedy'' mode: any original resource that does not have a localized counterpart yet, will be tracked as missing. I.e. when a new original resource gets added, translator will be warned of it by the tracking process.
When instead it would be better to track all of the original resources, lbundle tracking can be set to ''greedy'' mode: any original resource that does not have a localized counterpart yet will be tracked as missing. When a new original resource gets added, the translator will be notified of that by <tt>lbundle-check.py</tt>.


Fields in <tt>l10n-spec</tt> used in greedy mode are as follows:


; greedy-bundling: option to engage greedy mode (a boolean value, set to true). Other fields controlling greedy bundling have no effect if this option is not set.
; greedy-bundling: the option to engage greedy mode (a boolean value, set to true). Other fields controlling greedy bundling have no effect if this option is not set.


; greedy-monolingual: states whether the bundle is monolingual (boolean). This should make sense only for out-of-source bundles, where there may be one per language. Greedy mode needs this piece of information in order to know how to report and track missing localized files, as bundled (multilingual lbundle) or unbundled (monolingual).
; greedy-monolingual: states whether the lbundle is monolingual (boolean). This should make sense only for out-of-source bundles, where there may be one per language. Greedy mode needs this piece of information in order to know how to report and track missing localized files, as bundled (multilingual lbundle) or unbundled (monolingual).


; greedy-from-level: if the lbundle covers a tree of original subdirectories, this field can be used to limit the greedy collection and reporting of missing files only to subdirectories at this level and below. 0 is the level of the <tt>l10n-spec</tt> file itself, 1 is one level below, etc.


; greedy-only-started: normally all original resources missing in localization are reported as such (except the ignored ones), no matter where in the original subdirectory tree covered by the lbundle they reside. This field (a boolean) is used to limit greedy bundling only to those subdirectories already existing in the lbundle. Useful in several scenarios, e.g. when starting the resource localization to avoid being engulfed in listings of missing localized resources.
; greedy-only-started: normally all original resources missing in localization are reported as such (except the ignored ones), no matter where they reside in the original subdirectory tree covered by the lbundle. This field (a boolean) is used to limit greedy bundling only to those subdirectories already existing in the lbundle. Useful in several scenarios, e.g. when starting the resource localization to avoid being engulfed in listings of missing localized resources. Automatically enabled when <tt>track-by-subdir</tt> is in effect.
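
The idea behind greedy mode and <tt>greedy-from-level</tt> can be sketched as a set difference between original and localized resource paths. This is only an illustration, not how <tt>lbundle-check.py</tt> is implemented: the function name is hypothetical, paths are assumed to be relative to the directory of <tt>l10n-spec</tt> with the <tt>l10n/LANG/</tt> part already stripped, and the level of a file is taken to be the number of directories in its relative path.

```python
def missing_localized(original_files, localized_files, from_level=0):
    """List original resources which have no localized counterpart yet.

    Hypothetical helper. Level 0 is the directory of l10n-spec itself,
    level 1 is one directory below, and so on.
    """
    missing = set(original_files) - set(localized_files)
    # Keep only files residing at from_level or deeper.
    return sorted(path for path in missing if path.count("/") >= from_level)


originals = ["overview.png", "pics/kdingus-splash.png", "pics/about.png"]
localized = ["pics/kdingus-splash.png"]
print(missing_localized(originals, localized))
print(missing_localized(originals, localized, from_level=1))
```

Only <tt>kdingus-splash.png</tt> comes from the running example; the other file names are invented to make the difference visible.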


== Operation ==


If the repository is checked out as exemplified [[#Setup|above]], to check all lbundles for the language <tt>aa</tt> in trunk, <tt>lbundle-check.py</tt> can be run like this:
<code text>
$ cd $HOME/kde-svn/
$ lbundle-check.py -s $HOME/kde-svn -l aa trunk/
</code>
This will process all lbundles found in <tt>trunk/</tt>, whether out-of-source or in-source, containing the language <tt>aa</tt>. (If <tt>-l</tt> option weren't specified, it would check ''all'' languages in <tt>trunk/</tt>.) Option <tt>-s</tt> states the top local directory of the repository, which is needed to resolve original resource paths for out-of-source bundles (path in field <tt>source-root</tt> from <tt>l10n-spec</tt> files is appended to it).
Or, to check only out-of-source bundles in the <tt>trunk/l10n-kde4/aa/data</tt>, one would use:
<code text>
$ cd $HOME/kde-svn/trunk/l10n-kde4
$ lbundle-check.py -s $HOME/kde-svn aa/data
</code>
where <tt>-l aa</tt> is omitted since out-of-source bundles will contain only their own language.
To check all lbundles for the language <tt>aa</tt> in trunk, <tt>lbundle-check.py</tt> can be run like this:
<syntaxhighlight lang="bash">
$ cd $KDEREPO/trunk/l10n-kde4/
$ lbundle-check.py aa/ # check all out-of-source
$ lbundle-check.py source/link/ -l aa # check all in-source
</syntaxhighlight>

If the <tt>-l aa</tt> option were not specified in the second invocation, <tt>lbundle-check.py</tt> would check ''all'' languages in <tt>source/link/</tt> instead of only <tt>aa</tt>. In the first invocation <tt>-l aa</tt> was omitted because out-of-source bundles contain only their own language, but it would cause no problem if it were present. To check only particular lbundles, give their subdirectories as arguments.


So, what does <tt>lbundle-check.py</tt> actually do, and what does it report? For brevity, let's limit to the out-of-source lbundle for the KDingus as kdeutils app. After setting up the <tt>l10n-spec</tt> for that case (as detailed [[#Out-of-Source Lbundles|above]]), running <tt>lbundle-check.py</tt> for the first time would output:
So, what does <tt>lbundle-check.py</tt> actually do, and what does it report? For brevity, let us consider only the out-of-source bundle for KDingus as a kdeutils application. After setting up the <tt>l10n-spec</tt> for that case (as shown [[#Out-of-Source Lbundles|earlier]]), running <tt>lbundle-check.py</tt> for the first time will output:
<code text>
<syntaxhighlight lang="bash">
$ cd $HOME/kde-svn/trunk/l10n-kde4
$ cd $KDEREPO/trunk/l10n-kde4
$ lbundle-check.py -s $HOME/kde-svn aa/data
$ lbundle-check.py aa/data/
!  aa/data/kdeutils/kdingus/pics/l10n-track
!  aa/data/kdeutils/kdingus/pics/l10n-track
--------------------
--------------------
Line 156: Line 217:
   aa/data/kdeutils/kdingus/pics/l10n/aa/kdingus-splash.png
   aa/data/kdeutils/kdingus/pics/l10n/aa/kdingus-splash.png
$
$
</code>
</syntaxhighlight>
creating the file <tt>aa/data/kdeutils/kdingus/pics/l10n-track</tt> in the process, with the following content:
The file <tt>aa/data/kdeutils/kdingus/pics/l10n-track</tt> will be created, with the following contents:
<code text>
<syntaxhighlight lang="text">
# Do not edit manually, except to remove complete lines.
# Do not edit manually, except to remove complete lines.


Line 166: Line 227:
# aa
# aa
ok        ¦l10n/aa/kdingus-splash.png¦  37c4a7...8fe3a  755647
ok        ¦l10n/aa/kdingus-splash.png¦  37c4a7...8fe3a  755647
</code>
</syntaxhighlight>
(For a check, executing the same command line immediately for the second time would produce no output, nor change any files.)


For each <tt>l10n-spec</tt> file encountered, <tt>lbundle-check.py</tt> will create one of these <tt>l10n-track</tt> files. Each non-comment, non-empty line of an <tt>l10n-track</tt> file contains four fields, in order: the state of the localized resource against the original, the relative path to the bundled resource, the checksum of the ''original'' resource, and the VCS revision string of the original file which has this checksum.
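
The four-field layout can be read back with a few lines of code. The following is a sketch based only on the sample entry shown in this article, where the path is enclosed in <tt>¦</tt> characters and the remaining fields are separated by whitespace; the names <tt>TrackEntry</tt> and <tt>parse_track_line</tt> are hypothetical and do not come from <tt>lbundle-check.py</tt>.

```python
from typing import NamedTuple, Optional


class TrackEntry(NamedTuple):
    state: str      # e.g. "ok", "fuzzy", "obsolete"
    path: str       # relative path to the bundled resource
    checksum: str   # checksum of the original resource
    revision: str   # VCS revision string of the original file


def parse_track_line(line: str) -> Optional[TrackEntry]:
    """Parse one l10n-track line; return None for comments and empty lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    # The path is delimited by broken bars, so it may contain spaces.
    state, path, rest = stripped.split("¦", 2)
    checksum, revision = rest.split()
    return TrackEntry(state.strip(), path, checksum, revision)


# The checksum is abbreviated with "..." exactly as in the article.
entry = parse_track_line("ok        ¦l10n/aa/kdingus-splash.png¦  37c4a7...8fe3a  755647")
print(entry.state, entry.path, entry.revision)
```

A line that does not match this layout raises <tt>ValueError</tt>, which is a reasonable reaction for a file that should not be edited by hand.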


At this point, when <tt>l10n-spec</tt> file has been manually written, and <tt>l10n-track</tt> created by running the checker, both should be added and committed to version control.
At this point, when <tt>l10n-spec</tt> file has been manually written, and <tt>l10n-track</tt> created by running <tt>lbundle-check.py</tt>, both should be added and committed to version control.


So long as the original resource does not change, rerunning <tt>lbundle-check.py</tt> in the same way will do nothing, as everything is in sync. Obviously, the original resources should be updated to the latest repository version prior to checking, and <tt>lbundle-check.py</tt> can do that itself if started with <tt>-u</tt> option:
<code text>
$ lbundle-check.py -s $HOME/kde-svn -u aa/data
svn up /home/.../kdeutils/kdingus/pics/CMakeLists.txt
At revision 762512.
svn up /home/.../kdeutils/kdingus/pics/kdingus-splash.png
At revision 762512.
$
</code>
<syntaxhighlight lang="bash">
$ lbundle-check.py -u aa/data/
svn up .../kdeutils/kdingus/pics/CMakeLists.txt
At revision 762512.
svn up .../kdeutils/kdingus/pics/kdingus-splash.png
At revision 762512.
$
</syntaxhighlight>


Once the original resource is modified, e.g. KDingus' splash screen is souped up, <tt>lbundle-check.py</tt> will report the following:
Once the original resource is modified, e.g. KDingus' splash screen is tweaked up, <tt>lbundle-check.py</tt> will report the following:
<code text>
$ lbundle-check.py -s $HOME/kde-svn aa/data
!  aa/data/kdeutils/kdingus/pics/l10n-track
--------------------
Newly fuzzied: 1
   aa/data/kdeutils/kdingus/pics/l10n/aa/kdingus-splash.png
</code>
<syntaxhighlight lang="bash">
$ lbundle-check.py aa/data/
!  aa/data/kdeutils/kdingus/pics/l10n-track
--------------------
Newly fuzzied: 1
   aa/data/kdeutils/kdingus/pics/l10n/aa/kdingus-splash.png
</syntaxhighlight>
and its entry in <tt>l10n-track</tt> will state:
<code text>
fuzzy    ¦l10n/aa/kdingus-splash.png¦  37c4a7...8fe3a  755647
</code>
<syntaxhighlight lang="text">
fuzzy    ¦l10n/aa/kdingus-splash.png¦  37c4a7...8fe3a  755647
</syntaxhighlight>
i.e. only the status will have changed from <tt>ok</tt> to <tt>fuzzy</tt>. Now the translator has the information needed to compare what has changed in the original splash image: this entry still states the revision of the previous original splash, on which the localized one was based.
i.e. only the status will have changed from <tt>ok</tt> to <tt>fuzzy</tt>. Now the translator has the information needed to compare what has changed in the original splash image: this entry still states the revision of the previous original file on which the localized file was based.


If the original resource is renamed, moved, or deleted, the report will be slightly different:
<syntaxhighlight lang="bash">
$ lbundle-check.py aa/data/
!  aa/data/kdeutils/kdingus/pics/l10n-track
--------------------
Newly obsoleted: 1
   aa/data/kdeutils/kdingus/pics/l10n/aa/kdingus-splash.png
</syntaxhighlight>
and the entry in <tt>l10n-track</tt>:
<syntaxhighlight lang="text">
obsolete  ¦l10n/aa/kdingus-splash.png¦  37c4a7...8fe3a  755647
</syntaxhighlight>
Now the translator must find out, using the old revision string, what exactly happened to the original resource, and do the same for the localized resource.


When the fuzzy or obsolete state has been resolved, i.e. the localized resource has been updated to reflect changes in the original, it should be recorded as such. To do this, the entry for the resource is first manually deleted from the <tt>l10n-track</tt> file (the whole line is removed), and then <tt>lbundle-check.py</tt> is rerun. This will recreate the entry with a fresh <tt>ok</tt> state.


If the <tt>strict-state</tt> field has been set to true in <tt>l10n-spec</tt>, then not only is the state of the entry in <tt>l10n-track</tt> changed, but the tracked file itself is renamed to contain a <tt>~fuzzy</tt> or <tt>~obsolete</tt> marker just before the extension. This mode is called strict because unless the out-of-sync state is corrected, the file is "misnamed" from the point of view of the runtime system, and will not be used instead of the original (it may not even get installed, depending on the exact installation instructions).


To resolve the fuzzy or obsolete state in strict mode, it is not necessary to manually remove the entry from <tt>l10n-track</tt>. Instead, it is enough to put an updated version of the file (without the state marker in its name) next to the out-of-sync version, and rerun <tt>lbundle-check.py</tt>. It will remove the old version and update the entry in <tt>l10n-track</tt>. (The renaming and removal are version control aware, i.e. proper version control commands will be used for these operations if <tt>*-vcs</tt> fields have been set in the spec file.)


When [[#Greedy Bundling|greedy bundling]] is engaged, each original file which doesn't have a localized counterpart will be reported in the track file (either as bundled or unbundled, depending on the <tt>greedy-monolingual</tt> option), with its state set to <tt>missing</tt> (except for the ignored files, given by the <tt>ignore-unbundled</tt> option). When the localized file is placed at the appropriate location, the next run of <tt>lbundle-check.py</tt> will change the state to <tt>ok</tt>.
== Special Case: Tracking KDE Documentation Resources ==
{{note|This section should be mostly self-contained, providing all the information needed for resource tracking in documentation, for two reasons. Firstly, it can be reasonably assumed that many more KDE translation teams will want to track localized documentation resources, rather than some other, special resources. Secondly, structure of documentation resources is significantly different from that of normal lbundles, so having separate examples helps.}}
KDE documentation consists of two types of resources: the text, which is provided through Docbook files, and other data, mostly screenshots from applications. The text from original Docbook files is extracted into PO templates, translated through PO files, and localized Docbook files created out of them. Thus, when the original text changes, translators are automatically made aware of that through new and fuzzy entries in PO files. This is not so for screenshots. When an original screenshot changes, gets removed or added, translators get no notice of it. However, the tracking mechanism provided by <tt>lbundle-check.py</tt> can be used to help rectify this.
Localized resources of KDE documentation are not really organized into lbundles; their organization far precedes the lbundling system. Still, every language's documentation directory (<tt>aa/docs/</tt>) can be understood as one ''greedy monolingual'' lbundle composed solely of ''unbundled'' resources, and as such tracked by the <tt>lbundle-check.py</tt>. Docbook files are excluded from tracking, as they are kept up-to-date through Docbook-PO-Docbook roundtrip. At the moment, tracking is set up and operated by the coordinator of each language team that wants to have it.
To set up tracking, the coordinator of the language <tt>aa</tt> should maintain a local checkout of several modules, with the same structure as the KDE repository, as follows:
<syntaxhighlight lang="text">
$KDEREPO/
    trunk/
        l10n-kde4/
            scripts/
            templates/
            documentation/
            aa/
        l10n-support/
            scripts/
    branches/
        stable/
            l10n-kde4/
                scripts/
                templates/
                documentation/
                aa/
</syntaxhighlight>
In fact, it is a good idea anyway for a language coordinator to maintain such a checkout (the total space requirement is between 500 MB and 1 GB).
The <tt>documentation/</tt> subdirectory is not actually found in the Subversion repository; it is created and periodically updated by running the <tt>scripts/populate_documentation.sh</tt> script, in stable and trunk:
<syntaxhighlight lang="bash">
$ cd $KDEREPO/trunk/l10n-kde4
$ scripts/populate_documentation.sh
$ cd $KDEREPO/branches/stable/l10n-kde4
$ scripts/populate_documentation.sh
</syntaxhighlight>
<tt>populate_documentation.sh</tt> from trunk will get exactly the documentation modules' branches translated as trunk, and <tt>populate_documentation.sh</tt> from stable those translated as stable.
After this file structure is established, create the files:
<syntaxhighlight lang="text">
$KDEREPO/trunk/l10n-kde4/aa/docs/l10n-spec
$KDEREPO/branches/stable/l10n-kde4/aa/docs/l10n-spec
</syntaxhighlight>
both with the following, branch-independent content:
<syntaxhighlight lang="text">
source-root = aa/docs:documentation
source-vcs = file
bundle-vcs = auto
track-by-subdir = yes
greedy-bundling = yes
greedy-monolingual = yes
greedy-from-level = 2
ignore-unbundled = \
    CMakeLists.txt */CMakeLists.txt */*/CMakeLists.txt */*/*/CMakeLists.txt \
    */*/*.docbook */*/*/*.docbook \
</syntaxhighlight>
Note the language code <tt>aa</tt> in <tt>aa/docs:...</tt> at the top, and modify it accordingly.
Finally, run the <tt>lbundle-check.py</tt> script from <tt>trunk/l10n-support/scripts/</tt> on the documentation directories of your language, e.g. for the trunk:
<syntaxhighlight lang="bash">
$ lbundle-check.py $KDEREPO/trunk/l10n-kde4/aa/docs/
</syntaxhighlight>
After some time, depending on how many localized screenshots you have made already, a whole lot of output may appear: a listing of the files that have been added to tracking. More importantly, each subdirectory in <tt>aa/docs/</tt> which has (or could have) at least one localized screenshot will get a ''tracking file'', named <tt>l10n-track</tt>. Tracking files will be automatically added to version control as well, so you can commit <tt>aa/docs/</tt> right away.
What does a tracking file look like? For example, the original documentation for KRuler contains these files:
<syntaxhighlight lang="text">
documentation/kdegraphics/kruler/
    CMakeLists.txt
    index.docbook
    kruler.png
    kruler-settings.png
</syntaxhighlight>
Assume that your language's localized documentation contains:
<syntaxhighlight lang="text">
aa/docs/kdegraphics/kruler/
    CMakeLists.txt
    index.docbook
    kruler.png
</syntaxhighlight>
that is, the localized variant of <tt>kruler-settings.png</tt> is missing. Then, the <tt>aa/docs/kdegraphics/kruler/l10n-track</tt> that got created will have these contents:
<syntaxhighlight lang="text">
# Do not edit manually, except to remove complete lines.
ok        ¦kruler.png¦  58ba...872a  58ba...872a
missing  ¦kruler-settings.png¦  4dd2...5ebf  4dd2...5ebf
</syntaxhighlight>
Each line contains four pieces of data: the sync state and the name of the localized file, and the twice repeated checksum of the ''original'' file. (Instead of the second checksum normally there would be the repository revision of the original file, but <tt>populate_documentation.sh</tt> does not make proper repository checkouts). Since this is the first run of <tt>lbundle-check.py</tt>, it assumed that the present localized <tt>kruler.png</tt> is up-to-date (more later on this assumption), and so its state is set to <tt>ok</tt>. For <tt>kruler-settings.png</tt>, which is not there in the localized directory, the state is set as <tt>missing</tt>.
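The line layout just described can also be read mechanically. The following is a minimal Python sketch, not the parser used by <tt>lbundle-check.py</tt> itself; the names <tt>TrackEntry</tt> and <tt>parse_track_line</tt> are hypothetical, and the sketch assumes that the file name is always enclosed in broken-bar characters (¦), as in the listing above.
<syntaxhighlight lang="python">
from typing import NamedTuple, Optional

BAR = "\u00a6"  # the broken-bar character that encloses the file name


class TrackEntry(NamedTuple):
    state: str     # ok, fuzzy, obsolete or missing
    path: str      # file path relative to the tracking file
    checksum: str  # checksum of the original file
    revision: str  # revision of the original (a repeated checksum for docs)


def parse_track_line(line: str) -> Optional[TrackEntry]:
    """Parse one l10n-track line; return None for comments and blank lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    # The path sits between two broken bars, so it may contain spaces.
    state, path, rest = line.split(BAR, 2)
    checksum, revision = rest.split()
    return TrackEntry(state.strip(), path, checksum, revision)


entry = parse_track_line(f"missing  {BAR}kruler-settings.png{BAR}  4dd2...5ebf  4dd2...5ebf")
print(entry.state, entry.path)
</syntaxhighlight>
A helper like this could be used, for instance, to list all <tt>missing</tt> or <tt>fuzzy</tt> entries across a documentation tree without editing any tracking file.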
The day-to-day operation consists of running <tt>lbundle-check.py</tt> from time to time, in the same way as the first time. When a new original screenshot is added, the appropriate <tt>l10n-track</tt> file will get a new <tt>missing</tt> entry. When a localized screenshot is produced and put into the correct place, the <tt>missing</tt> state will change into <tt>ok</tt>:
<syntaxhighlight lang="bash">
$ cd $KDEREPO/trunk/l10n-kde4/
$ (...put localized kruler-settings.png into aa/docs/...)
$ lbundle-check.py aa/docs/
A  (bin)  aa/docs/kdegraphics/kruler/kruler-settings.png
M      aa/docs/kdegraphics/kruler/l10n-track
--------------------
New 'ok': 1
  aa/docs/kdegraphics/kruler/kruler-settings.png (...)
$
</syntaxhighlight>
Note that version control operations are done automatically too.
Assume now that, in the above example, after some time the original <tt>kruler.png</tt> is modified, and the original <tt>kruler-settings.png</tt> is removed. Then, running the tracker produces:
<syntaxhighlight lang="bash">
$ cd $KDEREPO/trunk/l10n-kde4/
$ lbundle-check.py aa/docs/
M      aa/docs/kdegraphics/kruler/l10n-track
--------------------
New 'fuzzy': 1
  aa/docs/kdegraphics/kruler/kruler.png (...)
New 'obsolete': 1
  aa/docs/kdegraphics/kruler/kruler-settings.png (...)
$
</syntaxhighlight>
and the content of the tracking file becomes:
<syntaxhighlight lang="text">
# Do not edit manually, except to remove complete lines.
fuzzy    ¦kruler.png¦  58ba...872a  3d94...2c6f
obsolete  ¦kruler-settings.png¦  4dd2...5ebf  ae31...b561
</syntaxhighlight>
Now the translator should remove the obsoleted localized <tt>kruler-settings.png</tt>, and check the new original <tt>kruler.png</tt> to see what to do with the localized <tt>kruler.png</tt>. When the fuzzy state of localized <tt>kruler.png</tt> is resolved -- by making a new localized screenshot, or deciding that the existing screenshot is still fine -- the translator edits <tt>l10n-track</tt> to remove its line. After rerunning <tt>lbundle-check.py</tt>, new localized <tt>kruler.png</tt> will be picked up with the default <tt>ok</tt> state, and the line for <tt>kruler-settings.png</tt> will be automatically removed.
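The way states follow from the original file can be summarized in a few lines of Python. This is only a sketch of the behaviour described in this section, not the actual logic of <tt>lbundle-check.py</tt>: the function names are hypothetical, and the use of MD5 is an assumption suggested by the 32-character checksums.
<syntaxhighlight lang="python">
import hashlib
from typing import Optional


def file_checksum(data: bytes) -> str:
    """Checksum of file contents (MD5 is an assumption, see above)."""
    return hashlib.md5(data).hexdigest()


def sync_state(recorded: str, original: Optional[bytes], localized_exists: bool) -> str:
    """Return the state of an already tracked entry.

    recorded: checksum of the original as stored in l10n-track
    original: current contents of the original file, or None if it is gone
    localized_exists: whether the localized file is present
    """
    if original is None:
        return "obsolete"   # original was removed, renamed or moved
    if not localized_exists:
        return "missing"    # original exists, nothing localized yet
    if file_checksum(original) != recorded:
        return "fuzzy"      # original changed since the entry was recorded
    return "ok"
</syntaxhighlight>
Because the recorded checksum is only refreshed when an entry is created, a <tt>fuzzy</tt> entry stays fuzzy on every run until its line is removed from the tracking file.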
{{warning|As the top comment in <tt>l10n-track</tt> files states, never manually edit these files except to unfuzzy entries by removing their complete lines. E.g. do not change the state manually, which would leave the original checksum and revision wrong.}}
Instead of running <tt>lbundle-check.py</tt> on the whole <tt>aa/docs/</tt> tree, it can also be run on any given subdirectory of it. It will then check only the files in that subdirectory and below.
=== Many Localized Screenshots at Start ===
There is a small issue with the fact that <tt>lbundle-check.py</tt> will set the state of new files to <tt>ok</tt>. If the documentation directory already has a lot of screenshots whose state compared to the original is unknown, it is more appropriate to consider them <tt>fuzzy</tt> than <tt>ok</tt>. To achieve this, after the ''first'' run of <tt>lbundle-check.py</tt> simply postprocess all <tt>l10n-track</tt> files:
<syntaxhighlight lang="bash">
$ find aa/docs/ -iname l10n-track | xargs perl -pi -e 's/^ok  /fuzzy/'
$ find aa/docs/ -iname l10n-track | xargs perl -pi -e 's/[0-9a-z]{32}/0/'
</syntaxhighlight>
(The second command replaces the recorded original checksum in each line with a zero, as otherwise the files would be again recognized as up-to-date on the next run.) Afterwards, in due time, existing screenshots can be inspected one by one, and their fuzzy state removed as explained previously.
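Where Perl is not at hand, the same post-processing can be sketched in Python. The function name <tt>fuzzy_all</tt> is hypothetical; it applies the two substitutions from the commands above to the text of one tracking file, and like them it would also touch a file name containing 32 or more consecutive lowercase letters and digits.
<syntaxhighlight lang="python">
import re


def fuzzy_all(track_text: str) -> str:
    """Turn every ok entry into fuzzy and zero the first checksum per line."""
    lines = []
    for line in track_text.splitlines(keepends=True):
        # Same as: perl -pi -e 's/^ok  /fuzzy/'
        line = re.sub(r"^ok  ", "fuzzy", line)
        # Same as: perl -pi -e 's/[0-9a-z]{32}/0/' (first match only)
        line = re.sub(r"[0-9a-z]{32}", "0", line, count=1)
        lines.append(line)
    return "".join(lines)
</syntaxhighlight>
The function works on text only; reading and writing the <tt>l10n-track</tt> files is left to the caller.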

Latest revision as of 16:05, 2 February 2013


Lbundle Checker
On Localization   Tools
Prerequisites   Subversion Ops, Localizing Non-Text Resources
Related Articles   n/a
External Reading   n/a

About

The lbundle_check.py script checks and records the state of localized non-text resources organized in lbundles, allowing translators to track their relation to original resources. Its latest version can be found in trunk/l10n-support/scripts/ directory in the KDE repository.

lbundle_check.py is in no way tied to KDE Translation Project, but can be used in any other environment. Only the instructions in this article are specifically about using lbundle_check.py in context of KDE TP.

Checking Out and Updating Sources

To be able to track lbundle states, the translator (one of the coordinators) needs to have all the sources to which localized resources correspond. This is done by checking out once from the KDE repositories, and then regularly updating local checkouts. This is hard to do and maintain manually for selected sources only.

Instead, the easiest approach is to check out and update all localization-relevant KDE sources, using a single command. This command is populate_source.sh, found in trunk/l10n-kde4/scripts/ and branches/stable/l10n-kde4/scripts/ directories; each knows how to check out and update sources corresponding to the given translation branch (trunk or stable). It is run simply like this:

$ cd $KDEREPO/trunk/l10n-kde4
$ scripts/populate_source.sh
$ cd $KDEREPO/branches/stable/l10n-kde4
$ scripts/populate_source.sh

Running populate_source.sh will take quite some time on the first run, but subsequent runs will be much faster. After the run is complete, the current working directory will contain the source/ directory, with all the checkouts in it. The local directory tree should then look like this:

$KDEREPO/
    trunk/
        l10n-kde4/
            scripts/
            source/
            ...
    branches/
        stable/
            l10n-kde4/
                scripts/
                source/
                ...

At the moment of this writing, the source/ in trunk occupies ~10 GiB of disk space, and in stable ~6 GiB. While this is not little space, it should not be a significant problem given contemporary typical disk sizes.

To quickly check out or update only one or few modules, their names can be given as arguments to populate_source.sh:

$ scripts/populate_source.sh extragear-network_konversation kdegames_konquest

Module names for use as arguments can be found in source/modules file. This file is generated anew whenever populate_source.sh is run without arguments, i.e. to update everything.

The source/ directory contains the repo/ subdirectory with actual local checkouts, and the link/ subdirectory, with links to checkout directories suitable for later operations. If the imaginary KDingus application is part of kdeutils top module, and is kept in a Git repository, it will be located like this in source/ in trunk:

source/
    repo/
        git-unstable/
            kdeutils_kdingus
    link/
        kdeutils/
            kdingus --> ../../repo/git-unstable/kdeutils_kdingus

And if KDingus were in the Subversion repository:

source/
    repo/
        trunk/
            KDE/
                kdeutils/
                    kdingus
    link/
        kdeutils/
            kdingus --> ../../repo/trunk/KDE/kdeutils/kdingus

Note that in both cases the link location stays the same, source/link/kdeutils/kdingus. It will also stay the same in source/ of stable branch, where repo/ will instead contain git-stable/, branches/KDE/4.x/, etc.

There is a special case when a top module name represents Git repository of its own, such as kde-baseapps or kdepim. The link then has to repeat the module name:

source/
    link/
        kde-baseapps/
            kde-baseapps --> ../../repo/git-unstable/kde-baseapps
        kdepim/
            kdepim --> ../../repo/git-unstable/kdepim

lbundle_check.py is capable of gracefully handling this case as well.

Setup

Before going through the setup examples, refresh the examples on organization of lbundles in the repository, which dealt with localizing the splash screen of the imaginary KDingus application.

lbundle_check.py can be used to track both out-of-source and in-source lbundles, but the setup needed for these two modes is somewhat different. The setup is explained for both modes, followed by the details of operation.

Out-of-Source Lbundles

When it was assumed that KDingus was part of kdeutils top module, its out-of-source lbundle for the language aa was organized like this:

$KDEREPO/trunk/l10n-kde4/aa/data/kdeutils/
    CMakeLists.txt
    kdingus/
        CMakeLists.txt
        pics/
            CMakeLists.txt
            l10n-spec
            l10n/
                aa/
                    kdingus-splash.png

except that this listing includes one new file, the l10n-spec. This is the setup file for tracking the lbundle, or rather, all lbundles on its level and beneath. The files outside of l10n/ subdirs are going to be called unbundled, as they are not considered a part of the lbundle proper.

l10n-spec files are composed of key = value fields per line. For the example above the content of l10n-spec would be:

# l10n-spec for KDingus' out-of-source bundle.
source-root = aa/data:source/link;^1
source-vcs = auto
bundle-vcs = auto
languages = aa
track-unbundled = CMakeLists.txt

The fields used in the example are:

source-root
the path to the root directory of the KDingus' sources. Every path-valued field can specify the path in several ways, which will be explained shortly. In this example, aa/data:source/link means to construct the path to original file corresponding to localized file by replacing aa/data in the absolute localized path with source/link; if sources were checked out as explained earlier, this will do exactly the right thing. Then there is the second path specification element separated by semi-colon, ^1, which means to assume that the original file may have one extra parent directory inserted somewhere in its path; this is used to cover the special case of top module name being equal to submodule name (e.g. source/link/kdepim/kdepim). Note that this path specification nowhere refers to KDingus in particular: it should be applicable to all out-of-source bundles.
source-vcs, bundle-vcs
the version control systems used by the source code and the lbundle, respectively. These fields are needed so that lbundle-check.py knows how to perform VCS operations when necessary, as well as to avoid trying to track version control bookkeeping files in the lbundle (e.g. .svn subdirs for Subversion). Currently the two known VCS are svn and git, and auto can be set to let lbundle-check.py detect which one it is.
languages
space-delimited list of languages expected in l10n subdirs. This is an optional field, which serves to check for naming errors with language subdirs. It may be left out, preventing such checks, but there is no reason to do so (especially for in-source bundles described below).
track-unbundled
space-delimited list of relative file paths, or pairs of file paths, of unbundled files which should be tracked nevertheless. In this example, CMakeLists.txt should be tracked because the installation instructions for the lbundle may need to change when instructions for the original resources change. A pair of file paths, as two space-separated paths within parenthesis (e.g. (foo.txt bar.txt)), can be given instead of a single file path when the local relative path needs to be different from that of the tracked original. A shell glob can be specified instead of a particular path (e.g. *.txt), but only for single paths, not in pairs. Backslash serves as escape character, e.g. when the file name contains a space.

Several other fields, not used in this example, are available:

ignore-unbundled
space-delimited list of relative file paths which are to be completely ignored by lbundle-check.py. For an out-of-source bundle, a file which is unbundled should be either tracked too (using track-unbundled) or ignored using this field, otherwise lbundle-check.py will complain about it. Shell globs are also possible.
strict-state
a boolean value stating whether the tracking state is to be strictly imposed. Non-strict means that out-of-sync states between localized and original resources are recorded only in the track file, whereas strict state means that the localized files names will be changed too to reflect the out-of-sync state. More details in section on operation.
ignore-substr
space-delimited list of substrings to ignore in names of bundled files when determining their original counterparts. This is never needed in KDE, but in other environments some localized files may be handled differently than a plain substitute for the original file, based on a substring in their name (e.g. a localized image may be overlaid on the original, rather than replacing it).
track-by-subdir
by default there is only one tracking file per l10n-spec file (see operation for tracking files); when this option is enabled (a boolean), there will be instead one tracking file per subdirectory (excluding l10n/ directories) starting from the level of l10n-spec file and below. Useful when frequent moving of subdirectories within the repository is expected, and there are a lot of localized resources to track.

Paths in path-valued fields (such as source-root) can be specified in several modes. When it makes sense, to make up one path field value two or more modes can be chained, separated with semi-colon (;). Each subsequent mode refers to the path established up to that point. Path specification modes are as follows:

  • Relative to a top root directory. This is represented by ordinary relative path. When running lbundle-check.py, the top root directory will have to be given through -s option.
  • Relative to the directory of current lbundle (i.e. to the parent directory of l10n-spec file). This is done by prefixing a relative path with exclamation mark (!).
  • By replacing a part of the absolute path of the current lbundle directory. The replacement specification is of the form findstr:replstr.
  • With allowed insertion of one or more consecutive parent directories when trying to find the path. This is given by ^N, where N is the maximum number of inserted parents. This mode cannot be the first in mode chain.
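
To make the first three modes concrete, here is a rough Python sketch of how one path field value could be turned into a directory path. It is an illustration under stated assumptions, not the code of lbundle-check.py: the function name resolve_source_dir and its parameters are hypothetical, chained modes are simply applied in order, and the ^N parent-insertion mode is skipped because it needs a search of the file system.

<syntaxhighlight lang="python">
import posixpath
from typing import Optional


def resolve_source_dir(spec_value: str, bundle_dir: str,
                       top_root: Optional[str] = None) -> Optional[str]:
    """Resolve a path field value for the lbundle directory bundle_dir."""
    path = None
    for mode in spec_value.split(";"):
        mode = mode.strip()
        if mode.startswith("^"):
            # Parent-insertion mode: not covered by this sketch.
            continue
        if mode.startswith("!"):
            # Relative to the directory of the current lbundle.
            path = posixpath.normpath(posixpath.join(bundle_dir, mode[1:]))
        elif ":" in mode:
            # findstr:replstr replacement in the absolute lbundle path.
            find, repl = mode.split(":", 1)
            path = bundle_dir.replace(find, repl, 1)
        else:
            # Ordinary relative path: needs the top root (the -s option).
            if top_root is None:
                raise ValueError("relative path needs a top root directory")
            path = posixpath.join(top_root, mode)
    return path
</syntaxhighlight>

With the source-root value from the KDingus example, a bundle directory ending in aa/data/kdeutils/kdingus/pics would map to the same path with aa/data replaced by source/link.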

The usual #-comments are allowed in l10n-spec files, as well as line continuation by trailing backslashes (e.g. when several file paths are needed in track-unbundled field).
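
The file syntax described so far (key = value fields, #-comments, trailing-backslash continuation) is simple enough to read with a few lines of Python. The sketch below is only an approximation for illustration; the function name parse_spec is hypothetical, only whole-line comments are recognized, and backslash escapes inside values are left untouched.

<syntaxhighlight lang="python">
def parse_spec(text: str) -> dict:
    """Parse l10n-spec text into a {field: value} dictionary of strings."""
    logical = []   # physical lines joined over backslash continuations
    pending = ""
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("\\"):
            # Trailing backslash: the field continues on the next line.
            pending += line[:-1].strip() + " "
            continue
        logical.append(pending + line)
        pending = ""
    if pending:
        logical.append(pending)  # dangling continuation on the last line

    fields = {}
    for line in logical:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        fields[key.strip()] = value.strip()
    return fields
</syntaxhighlight>

Values are returned as plain strings; splitting list-valued fields such as track-unbundled into individual paths would additionally have to honor parenthesized pairs and escaped spaces.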

In-Source Lbundles

If KDingus is an extragear application, its directory structure with the in-source lbundle having aa and bb languages, and an l10n-spec file, would be this:

/source/link/
    kdingus/
        CMakeLists.txt
        pics/
            CMakeLists.txt
            kdingus-splash.png
            l10n-spec
            l10n/
                aa/
                    kdingus-splash.png
                bb/
                    kdingus-splash.png

The contents of l10n-spec is simpler than that of out-of-source lbundles:

# l10n-spec for KDingus' in-source bundle.
source-vcs = auto
languages = aa bb

Since the lbundle is kept together with the application, sharing same root directory and version control system, there is no need for the source-root and bundle-vcs fields. In fact, the presence or lack of the source-root field identifies the lbundle as out-of-source or in-source.

The track-unbundled and ignore-unbundled fields are not present either. All unbundled files are silently ignored. It is assumed that KDingus' maintainer will modify and test installation instructions so as not to break installation of localized resources.

languages field could be omitted here as well, but for in-source lbundles it is even more important to keep tight check of wrongly named language subdirs.

Greedy Bundling

The default assumption behind lbundles is that only a small part of all resources need to be localized. It is up to the translator to spot the resources which need localization, make their localized versions, and add them to an lbundle where they will get tracked.

When instead it would be better to track all of the original resources, lbundle tracking can be set to greedy mode: any original resource that does not have a localized counterpart yet will be tracked as missing. When a new original resource gets added, the translator will be notified of that by lbundle-check.py.

Fields in l10n-spec used in greedy mode are as follows:

greedy-bundling
the option to engage greedy mode (a boolean value, set to true). Other fields controlling greedy bundling have no effect if this option is not set.
greedy-monolingual
states whether the lbundle is monolingual (boolean). This should make sense only for out-of-source bundles, where there may be one per language. Greedy mode needs this piece of information in order to know how to report and track missing localized files, as bundled (multilingual lbundle) or unbundled (monolingual).
greedy-from-level
if the lbundle covers a tree of original subdirectories, this field can be used to limit the greedy collection and reporting of missing files only to subdirectories at this level and below. 0 is the level of the l10n-spec file itself, 1 is one level below, etc.
greedy-only-started
normally all original resources missing in localization are reported as such (except the ignored ones), no matter where they reside in the original subdirectory tree covered by the lbundle. This field (a boolean) is used to limit greedy bundling only to those subdirectories already existing in the lbundle. Useful in several scenarios, e.g. when starting the resource localization to avoid being engulfed in listings of missing localized resources. Automatically enabled when track-by-subdir is in effect.
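
The level counting used by greedy-from-level can be illustrated with a short Python sketch. The function name greedy_applies is hypothetical, and the sketch assumes that subdirectories are given relative to the directory holding the l10n-spec file, with "." standing for that directory itself.

<syntaxhighlight lang="python">
from pathlib import PurePosixPath


def greedy_applies(subdir: str, from_level: int = 0) -> bool:
    """Tell whether missing files in subdir are collected in greedy mode."""
    # Level 0 is the directory of l10n-spec, 1 is one level below, etc.
    level = len(PurePosixPath(subdir).parts)
    return level >= from_level
</syntaxhighlight>

For example, with greedy-from-level = 2 a subdirectory such as kdegraphics/kruler is covered, while kdegraphics itself is not.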

Operation

To check all lbundles for the language aa in trunk, lbundle-check.py can be run like this:

$ cd $KDEREPO/trunk/l10n-kde4/
$ lbundle-check.py aa/ # check all out-of-source
$ lbundle-check.py source/link/ -l aa # check all in-source

If -l aa option weren't specified in the second invocation, lbundle-check.py would check all languages in source/link/ instead of only aa. In the first invocation -l aa was omitted because out-of-source bundles contain only their own language, but it wouldn't cause any problem if it were present. Only particular lbundles can be checked by giving their subdirectories as arguments.

So, what does lbundle-check.py actually do, and what does it report? For brevity, let's limit the discussion to the out-of-source bundle for KDingus as a kdeutils application. After setting up the l10n-spec for that case (as shown earlier), running lbundle-check.py for the first time will output:

$ cd $KDEREPO/trunk/l10n-kde4
$ lbundle-check.py aa/data/
!  aa/data/kdeutils/kdingus/pics/l10n-track
--------------------
Added to tracking: 2
  aa/data/kdeutils/kdingus/pics/CMakeLists.txt
  aa/data/kdeutils/kdingus/pics/l10n/aa/kdingus-splash.png
$

The file aa/data/kdeutils/kdingus/pics/l10n-track will be created, with the following contents:

# Do not edit manually, except to remove complete lines.

# -
ok        ¦CMakeLists.txt¦  865f7...e926d  755260

# aa
ok        ¦l10n/aa/kdingus-splash.png¦  37c4a7...8fe3a  755647

For each l10n-spec file encountered, lbundle-check.py will create one of these l10n-track files. Each non-comment, non-empty line of an l10n-track file contains four fields, in order: the state of the localized resource against the original, the relative path to the bundled resource, the checksum of the original resource, and the VCS revision string of the original file which has this checksum.

At this point, when l10n-spec file has been manually written, and l10n-track created by running lbundle-check.py, both should be added and committed to version control.

So long as the original resource does not change, rerunning lbundle-check.py in the same way will do nothing, as everything is in sync. Obviously, the original resources should be updated to the latest repository version prior to checking, and lbundle-check.py can do that itself if started with -u option:

$ lbundle-check.py -u aa/data/
svn up .../kdeutils/kdingus/pics/CMakeLists.txt
At revision 762512.
svn up .../kdeutils/kdingus/pics/kdingus-splash.png
At revision 762512.
$

Once the original resource is modified, e.g. KDingus' splash screen is tweaked up, lbundle-check.py will report the following:

$ lbundle-check.py aa/data/
!  aa/data/kdeutils/kdingus/pics/l10n-track
--------------------
Newly fuzzied: 1
  aa/data/kdeutils/kdingus/pics/l10n/aa/kdingus-splash.png

and its entry in l10n-track will state:

fuzzy     ¦l10n/aa/kdingus-splash.png¦  37c4a7...8fe3a  755647

i.e. only the status will have changed from ok to fuzzy. Now the translator has the information needed to compare what has changed in the original splash image: this entry still states the revision of the previous original file on which the localized file was based.

If the original resource is renamed, moved, or deleted, the report will be slightly different:

$ lbundle-check.py aa/data/
!  aa/data/kdeutils/kdingus/pics/l10n-track
--------------------
Newly obsoleted: 1
  aa/data/kdeutils/kdingus/pics/l10n/aa/kdingus-splash.png

and the entry in l10n-track:

obsolete  ¦l10n/aa/kdingus-splash.png¦  37c4a7...8fe3a  755647

Now the translator must find out, using the old revision string, what exactly happened to the original resource, and do the same for the localized resource.

When the fuzzy or obsolete state has been resolved, that is, once the localized resource has been updated to reflect the changes in the original, this should be recorded. To do so, the entry for the resource is first manually deleted from the l10n-track file (the whole line is removed), and then lbundle-check.py is rerun. This recreates the entry with a fresh ok state.
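Removing the line is normally done in a text editor. For teams that prefer to script it, the step could look roughly like the following. This is a hedged sketch: drop_entry is a hypothetical helper, not part of lbundle-check.py, and it assumes the path field is enclosed in broken-bar characters as in the samples above.

```python
def drop_entry(track_text: str, resource_path: str) -> str:
    """Return the track file text without the entry for resource_path.

    Comment lines and all other entries are kept unchanged.
    """
    marker = "\u00a6" + resource_path + "\u00a6"
    kept = [
        line
        for line in track_text.splitlines(keepends=True)
        if line.lstrip().startswith("#") or marker not in line
    ]
    return "".join(kept)
```

The result would be written back to the l10n-track file, after which lbundle-check.py is rerun as described above.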

If the strict-state field has been set to true in l10n-spec, then not only is the state of the entry in l10n-track changed, but the tracked file itself is renamed to contain a ~fuzzy or ~obsolete marker just before the extension. This mode is called strict because, unless the out-of-sync state is corrected, the file is "misnamed" from the point of view of the runtime system, and will not be used instead of the original (it may not even get installed, depending on the exact installation instructions).
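To make the renaming concrete, the following sketch computes such a name by placing the marker just before the extension, as described above. The function strict_name is hypothetical, and how the script itself treats names with several extensions or none is not covered here.

```python
import os.path


def strict_name(filename: str, state: str) -> str:
    """Insert a ~state marker just before the file extension."""
    root, ext = os.path.splitext(filename)
    return root + "~" + state + ext
```

Under this reading, kdingus-splash.png in the fuzzy state would become kdingus-splash~fuzzy.png.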

To resolve the fuzzy or obsolete state in strict mode, it is not necessary to manually remove the entry from l10n-track. Instead, it is enough to put an updated version of the file (without the state marker in its name) next to the out-of-sync version, and rerun lbundle-check.py. It will remove the old version and update the entry in l10n-track. (The renaming and removal are version control aware, i.e. proper version control commands will be used for these operations if the *-vcs fields have been set in the spec file.)

When greedy bundling is engaged, each original file without a localized counterpart is reported in the track file (either as bundled or unbundled, depending on the greedy-monolingual option) with its state set to missing, except for ignored files, given by the ignore-unbundled option. When the localized file is placed at the appropriate location, the next run of lbundle-check.py will change the state to ok.

Special Case: Tracking KDE Documentation Resources

Note
This section should be mostly self-contained, providing all the information needed for resource tracking in documentation, for two reasons. Firstly, it can be reasonably assumed that many more KDE translation teams will want to track localized documentation resources, rather than some other, special resources. Secondly, structure of documentation resources is significantly different from that of normal lbundles, so having separate examples helps.


KDE documentation consists of two types of resources: the text, which is provided through Docbook files, and other data, mostly screenshots from applications. The text from original Docbook files is extracted into PO templates, translated through PO files, and localized Docbook files created out of them. Thus, when the original text changes, translators are automatically made aware of that through new and fuzzy entries in PO files. This is not so for screenshots. When an original screenshot changes, gets removed or added, translators get no notice of it. However, the tracking mechanism provided by lbundle-check.py can be used to help rectify this.

Localized resources of KDE documentation are not really organized into lbundles; their organization far precedes the lbundling system. Still, every language's documentation directory (aa/docs/) can be understood as one greedy monolingual lbundle composed solely of unbundled resources, and as such tracked by lbundle-check.py. Docbook files are excluded from tracking, as they are kept up-to-date through the Docbook-PO-Docbook roundtrip. At the moment, tracking is set up and operated by the coordinator of each language team that wants to have it.

To set up tracking, the coordinator of the language aa should maintain a local checkout of several modules, with the same structure as the KDE repository, as follows:

$KDEREPO/
    trunk/
        l10n-kde4/
            scripts/
            templates/
            documentation/
            aa/
        l10n-support/
            scripts/
    branches/
        stable/
            l10n-kde4/
                scripts/
                templates/
                documentation/
                aa/

In fact, it is a good idea anyway for a language coordinator to maintain such a checkout (the total space requirement is between 500 MB and 1 GB).

The documentation/ subdirectory is not actually found in the Subversion repository; it is created and periodically updated by running the scripts/populate_documentation.sh script, in stable and trunk:

$ cd $KDEREPO/trunk/l10n-kde4
$ scripts/populate_documentation.sh
$ cd $KDEREPO/branches/stable/l10n-kde4
$ scripts/populate_documentation.sh

populate_documentation.sh from trunk will get exactly the documentation modules' branches translated as trunk, and populate_documentation.sh from stable those translated as stable.

After this file structure is established, create the files:

$KDEREPO/trunk/l10n-kde4/aa/docs/l10n-spec
$KDEREPO/branches/stable/l10n-kde4/aa/docs/l10n-spec

both with the following, branch-independent content:

source-root = aa/docs:documentation
source-vcs = file
bundle-vcs = auto
track-by-subdir = yes

greedy-bundling = yes
greedy-monolingual = yes
greedy-from-level = 2

ignore-unbundled = \
    CMakeLists.txt */CMakeLists.txt */*/CMakeLists.txt */*/*/CMakeLists.txt \
    */*/*.docbook */*/*/*.docbook \

Note the language code aa in aa/docs:... at the top, and modify it accordingly.

Finally, run the lbundle-check.py script from trunk/l10n-support/scripts/ on the documentation directories of your language, e.g. for the trunk:

$ lbundle-check.py $KDEREPO/trunk/l10n-kde4/aa/docs/

After some time, depending on how many localized screenshots you have already made, a lot of output may appear: a listing of the files that have been added to tracking. More importantly, each subdirectory in aa/docs/ which has (or could have) at least one localized screenshot will get a tracking file, named l10n-track. Tracking files are automatically added to version control as well, so you can commit aa/docs/ right away.

What does a tracking file look like? For example, the original documentation for KRuler contains these files:

documentation/kdegraphics/kruler/
    CMakeLists.txt
    index.docbook
    kruler.png
    kruler-settings.png

Assume that your language's localized documentation contains:

aa/docs/kdegraphics/kruler/
    CMakeLists.txt
    index.docbook
    kruler.png

that is, the localized variant of kruler-settings.png is missing. Then, the aa/docs/kdegraphics/kruler/l10n-track that got created will have these contents:

# Do not edit manually, except to remove complete lines.

ok        ¦kruler.png¦  58ba...872a  58ba...872a
missing   ¦kruler-settings.png¦  4dd2...5ebf  4dd2...5ebf

Each line contains four pieces of data: the sync state and the name of the localized file, and the twice repeated checksum of the original file. (Instead of the second checksum normally there would be the repository revision of the original file, but populate_documentation.sh does not make proper repository checkouts). Since this is the first run of lbundle-check.py, it assumed that the present localized kruler.png is up-to-date (more later on this assumption), and so its state is set to ok. For kruler-settings.png, which is not there in the localized directory, the state is set as missing.

Day-to-day operation consists of running lbundle-check.py from time to time, in the same way as the first time. When a new original screenshot is added, the appropriate l10n-track file will get a new missing entry. When a localized screenshot is produced and put into the correct place, the missing state will change to ok:

$ cd $KDEREPO/trunk/l10n-kde4/
$ (...put localized kruler-settings.png into aa/docs/...)
$ lbundle-check.py aa/docs/
A  (bin)  aa/docs/kdegraphics/kruler/kruler-settings.png
M      aa/docs/kdegraphics/kruler/l10n-track
--------------------
New 'ok': 1
  aa/docs/kdegraphics/kruler/kruler-settings.png (...)
$

Note that version control operations are done automatically too.

Assume now that, in the above example, after some time the original kruler.png is modified, and the original kruler-settings.png is removed. Then, running the tracker produces:

$ cd $KDEREPO/trunk/l10n-kde4/
$ lbundle-check.py aa/docs/
M      aa/docs/kdegraphics/kruler/l10n-track
--------------------
New 'fuzzy': 1
  aa/docs/kdegraphics/kruler/kruler.png (...)
New 'obsolete': 1
  aa/docs/kdegraphics/kruler/kruler-settings.png (...)
$

and the content of tracking file becomes:

# Do not edit manually, except to remove complete lines.

fuzzy     ¦kruler.png¦  58ba...872a  3d94...2c6f
obsolete  ¦kruler-settings.png¦  4dd2...5ebf  ae31...b561

Now the translator should remove the obsoleted localized kruler-settings.png, and check the new original kruler.png to see what to do with the localized kruler.png. When the fuzzy state of the localized kruler.png is resolved -- by making a new localized screenshot, or deciding that the existing screenshot is still fine -- the translator edits l10n-track to remove its line. After rerunning lbundle-check.py, the localized kruler.png will be picked up again with the default ok state, and the line for kruler-settings.png will be automatically removed.

Warning
As the top comment in l10n-track files states, never manually edit these files except to unfuzzy entries by removing their complete lines. E.g. do not change the state manually, which would leave the original checksum and revision wrong.


Instead of running lbundle-check.py on the whole aa/docs/ tree, it can also be run on any given subdirectory of it. It will then check only the files in that subdirectory and below.
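To get a quick overview of how much work is pending in a tree or subdirectory, the tracking files can also be read directly. The sketch below counts entries per state; count_states is a hypothetical helper, not a feature of lbundle-check.py, and it only assumes that the state is the first whitespace-separated word of each non-comment line. It reads files in binary mode, so it makes no assumption about their encoding.

```python
import os
from collections import Counter


def count_states(root: str) -> Counter:
    """Count l10n-track entries per state in root and all directories below it."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        if "l10n-track" not in filenames:
            continue
        with open(os.path.join(dirpath, "l10n-track"), "rb") as track:
            for raw in track:
                line = raw.strip()
                # Skip blank lines and comments; the state is the first word.
                if line and not line.startswith(b"#"):
                    counts[line.split()[0].decode("ascii")] += 1
    return counts
```

For example, count_states("aa/docs/kdegraphics") would report only the states recorded under that module's directory.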

Many Localized Screenshots at Start

There is a small issue with the fact that lbundle-check.py sets the state of new files to ok. If the documentation directory already contains many screenshots whose state relative to the originals is unknown, it is more appropriate to consider them fuzzy than ok. To achieve this, simply postprocess all l10n-track files after the first run of lbundle-check.py:

$ find aa/docs/ -iname l10n-track | xargs perl -pi -e 's/^ok   /fuzzy/'
$ find aa/docs/ -iname l10n-track | xargs perl -pi -e 's/[0-9a-z]{32}/0/'

(The second command replaces the recorded checksum of the original with a zero, as otherwise the files would again be recognized as up-to-date on the next run.) Afterwards, in due time, existing screenshots can be inspected one by one, and their fuzzy state removed as explained previously.