Localization/Tools/Lbundle Check

    From KDE TechBase
    Revision as of 20:57, 29 June 2011 by Neverendingo


    Lbundle Checker
    On Localization   Tools
    Prerequisites   Subversion Ops, Localizing Non-Text Resources
    Related Articles   n/a
    External Reading   n/a

    About

    The lbundle-check.py script checks and records the state of localized non-text resources organized in lbundles, allowing translators to track their relation to the original resources. Its latest version can be found in the trunk/l10n-support/scripts/ directory of the KDE repository.

    lbundle-check.py can be used to track both out-of-source and in-source lbundles, but the setup needed for these two modes is somewhat different. The setup is explained first for each type of lbundle, followed by the details of operation.

    Setup

    Before going through the setup examples, review the examples on the organization of lbundles in the repository (localizing the splash screen of the imaginary KDingus application).

    As will become apparent below, to be able to track lbundle states, the translator (one of the coordinators) needs to have all the original counterparts of the localized resources checked out from the KDE repository, in the same relative positions. E.g. for the KDingus example as a kdeutils application, the checked-out files could have the following structure:

       $KDEREPO/
           trunk/
               l10n-kde4/
                   aa/
               KDE/
                   kdeutils/
                       kdingus/

    To get this layout without checking out full parent directories, one can use the -N option to svn checkout, which avoids recursion into subdirectories.

    Out-of-Source Lbundles

    In the earlier example, where KDingus was assumed to live in kdeutils, a core KDE module, its out-of-source lbundle for the language aa was organized like this:

       /trunk/l10n-kde4/aa/data/kdeutils/
           CMakeLists.txt
           kdingus/
               CMakeLists.txt
               pics/
                   CMakeLists.txt
                   l10n-spec
                   l10n/
                       aa/
                           kdingus-splash.png

    except that this listing includes one new file, l10n-spec. This is the setup file for tracking the lbundle, or rather, all lbundles at its level and beneath. Files outside of l10n/ subdirs will be called unbundled, as they are not considered part of the lbundle proper.

    The l10n-spec files are composed of key = value fields, one per line. For the example above, the content of l10n-spec would be:

       # l10n-spec for KDingus' out-of-source bundle.
       source-root = trunk/KDE/kdeutils/kdingus
       source-vcs = svn
       bundle-vcs = svn
       languages = aa
       track-unbundled = CMakeLists.txt

    The fields used in the example are:

    source-root
    the root of KDingus' sources in the KDE repository. This is needed to link the original counterparts to localized files. The path of the original file relative to the repository top is composed of this root plus the relative path of the localized file in the lbundle (relative to the l10n-spec file), stripped of the l10n/aa/ subpath.
    source-vcs, bundle-vcs
    the version control systems used by the source code and the lbundle, respectively. These are needed so that the checker script knows how to update the sources (if requested), and to avoid trying to track version control bookkeeping files in the lbundle (e.g. .svn subdirs for SVN). Currently the only known VCS is svn, but the script abstracts this internally, so that others may be added in the future.
    languages
    space-delimited list of languages expected in l10n subdirs. This is an optional field, which serves to check for naming errors with language subdirs. It may be left out, preventing such checks, but there is no reason to do so (especially for in-source bundles described below).
    track-unbundled
    space-delimited list of relative file paths, or pairs of file paths, of unbundled files which should be tracked nevertheless. In this example, CMakeLists.txt should be tracked because the installation instructions for the lbundle may need to change when those for the original resources change. A pair of file paths, written as two space-separated paths within parentheses (e.g. (foo.txt bar.txt)), can be given instead of a single file path when the local relative path needs to be different from that of the tracked original. A shell glob can be specified instead of a particular path (e.g. *.txt), but only for single paths, not in pairs. Backslash serves as the escape character, e.g. when the file name contains a space.
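
    The path composition described for source-root can be sketched in Python as follows. This is only an illustration of the rule as stated above, not the script's actual code; the function name original_path is hypothetical, and the example assumes an l10n-spec placed in the kdingus/ directory.

```python
from pathlib import PurePosixPath


def original_path(source_root: str, localized_rel: str) -> str:
    """Map a localized file to its original, relative to the repository top.

    localized_rel is the path of the localized file relative to the
    l10n-spec file. The l10n/<lang>/ subpath, if present, is dropped and
    the remainder is appended to source_root.
    """
    parts = PurePosixPath(localized_rel).parts
    if "l10n" in parts:
        i = parts.index("l10n")
        # Skip "l10n" itself and the language directory that follows it.
        parts = parts[:i] + parts[i + 2:]
    return str(PurePosixPath(source_root, *parts))


# Hypothetical layout: the l10n-spec file sits in the kdingus/ directory.
print(original_path("trunk/KDE/kdeutils/kdingus",
                    "pics/l10n/aa/kdingus-splash.png"))
print(original_path("trunk/KDE/kdeutils/kdingus", "pics/CMakeLists.txt"))
```

    Unbundled files such as CMakeLists.txt have no l10n/aa/ subpath, so their relative path is appended unchanged.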

    Several other fields, not used in this example, are available:

    ignore-unbundled
    space-delimited list of relative file paths which are to be completely ignored by the lbundle checker. For an out-of-source bundle, an unbundled file should be either tracked (using track-unbundled) or ignored using this field, otherwise the checker will complain about it. Shell globs are also possible.
    strict-state
    a boolean value stating whether the tracking state is to be strictly imposed. Non-strict means that out-of-sync states between localized and original resources are recorded only in the track file, whereas strict means that the names of the localized files will be changed too, to reflect the out-of-sync state. More details are given in the section on operation.
    ignore-substr
    space-delimited list of substrings to ignore in names of bundled files when determining their original counterparts. This is never needed in KDE, but in other environments some localized files may be handled differently than as a plain substitute for the original file, based on a substring in their name (e.g. a localized image may be overlaid on the original, rather than replacing it).
    track-by-subdir
    by default there is only one tracking file per l10n-spec file (see the section on operation for tracking files); when this option is enabled (a boolean), there will instead be one tracking file per subdirectory (excluding l10n/ directories), starting from the level of the l10n-spec file and below. Useful when frequent moving of subdirectories within the repository is expected, and there are a lot of localized resources to track.

    Paths to external sources (such as source-root) can be specified in three ways:

    • relative to the repository top, represented by an ordinary relative path
    • relative to the current bundle directory (that of the l10n-spec file), by prefixing the path with an exclamation mark (!). Useful when the repository organizes branches as subdirectories, since then the path relative to the repository top would change between branches.
    • by replacing a part of the path of the current bundle directory, where the path is given as a replacement specification of the form findstr:replstr. Useful when there are many bundles which would have exactly the same l10n-spec files when the path to sources is specified like this.
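
    The three forms can be sketched as a small resolver. This is a guess at the logic rather than the script's implementation: the function name resolve_source_root is hypothetical, and replacing only the first occurrence of findstr is an assumption.

```python
import posixpath


def resolve_source_root(value: str, repo_top: str, bundle_dir: str) -> str:
    """Resolve a source path from l10n-spec to a local directory.

    value      -- the field value as written in l10n-spec
    repo_top   -- local top directory of the repository
    bundle_dir -- local directory containing the l10n-spec file
    """
    if value.startswith("!"):
        # Form 2: relative to the bundle directory.
        return posixpath.normpath(posixpath.join(bundle_dir, value[1:]))
    if ":" in value:
        # Form 3: findstr:replstr applied to the bundle directory path
        # (first occurrence only, which is an assumption).
        findstr, replstr = value.split(":", 1)
        return bundle_dir.replace(findstr, replstr, 1)
    # Form 1: ordinary path, relative to the repository top.
    return posixpath.normpath(posixpath.join(repo_top, value))


# "/r" stands in for a local checkout directory.
print(resolve_source_root("trunk/KDE/kdeutils/kdingus", "/r",
                          "/r/trunk/l10n-kde4/aa/data/kdeutils/kdingus/pics"))
print(resolve_source_root("ll/docs:documentation", "/r",
                          "/r/trunk/l10n-kde4/ll/docs"))
```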

    The usual #-comments are allowed in l10n-spec files, as well as line continuation by trailing backslashes (e.g. when several file paths are needed in track-unbundled field).
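
    A minimal sketch of reading this format in Python is shown below, covering key = value fields, whole-line #-comments and trailing-backslash continuation. The function names are hypothetical, README is a made-up file used only to show continuation, and details such as comments after a value are not handled.

```python
from collections.abc import Iterator


def logical_lines(text: str) -> Iterator[str]:
    """Join physical lines that end with a backslash into logical lines."""
    pending = ""
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("\\"):
            pending += line[:-1] + " "
            continue
        yield pending + line
        pending = ""
    if pending:
        # The last physical line ended with a backslash.
        yield pending


def parse_l10n_spec(text: str) -> dict[str, str]:
    """Return the fields of an l10n-spec file as a dictionary."""
    fields: dict[str, str] = {}
    for line in logical_lines(text):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or whole-line comment
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key = value line: {line!r}")
        fields[key.strip()] = " ".join(value.split())
    return fields


SPEC = """\
# l10n-spec for KDingus' out-of-source bundle.
source-root = trunk/KDE/kdeutils/kdingus
source-vcs = svn
bundle-vcs = svn
languages = aa
track-unbundled = CMakeLists.txt \\
    README
"""

print(parse_l10n_spec(SPEC))
```

    List-valued fields such as languages or track-unbundled are left as strings here; splitting them would additionally have to honor backslash escapes and parenthesized pairs.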

    In-Source Lbundles

    For KDingus living as an extragear app, its directory structure with the in-source lbundle having the aa and bb languages, and an l10n-spec file, would be this:

       /trunk/extragear/utils/
           kdingus/
               CMakeLists.txt
               pics/
                   CMakeLists.txt
                   kdingus-splash.png
                   l10n-spec
                   l10n/
                       aa/
                           kdingus-splash.png
                       bb/
                           kdingus-splash.png

    and the contents of l10n-spec are somewhat simpler than for out-of-source lbundles:

       # l10n-spec for KDingus' in-source bundle.
       source-vcs = svn
       languages = aa bb

    Since the lbundle is kept together with the application, sharing the same root directory and version control system, there is no need for the source-root and bundle-vcs fields. In fact, the presence or absence of the source-root field identifies the bundle as out-of-source or in-source to the checker script.

    The track-unbundled and ignore-unbundled fields are not present either. All files which are unbundled are silently ignored. E.g. it is assumed that KDingus' maintainer will modify and test the install instructions so as not to break installation of localized resources.

    The languages field could be omitted here as well, but for in-source lbundles it is even more important to keep a tight check on wrongly named language subdirs.

    Greedy Bundling

    The default assumption behind lbundles is that only a small part of all resources needs to be localized. For example, if a new original resource is added, it is up to the translator to notice whether it needs localization, make the localized version, and add it to the lbundle, where it will get tracked.

    When it would instead be better to track all original resources, the bundle tracking can be set to greedy mode: any original resource that does not yet have a localized counterpart will be tracked as missing. I.e. when a new original resource gets added, the translator will be warned of it by the tracking process.

    Fields in l10n-spec used in greedy mode are as follows:

    greedy-bundling
    option to engage greedy mode (a boolean value, set to true). Other fields controlling greedy bundling have no effect if this option is not set.
    greedy-monolingual
    states whether the bundle is monolingual (boolean). This should make sense only for out-of-source bundles, where there may be one per language. Greedy mode needs this piece of information in order to know how to report and track missing localized files, as bundled (multilingual lbundle) or unbundled (monolingual).
    greedy-from-level
    if the lbundle covers a tree of original subdirectories, this field can be used to limit the greedy collection and reporting of missing files only to subdirectories at this level and below. 0 is the level of the l10n-spec file itself, 1 is one level below, etc.
    greedy-only-started
    normally all original resources missing in localization are reported as such (except the ignored ones), no matter where in the original subdirectory tree covered by the lbundle they reside. This field (a boolean) is used to limit greedy bundling only to those subdirectories already existing in the lbundle. Useful in several scenarios, e.g. when starting the resource localization to avoid being engulfed in listings of missing localized resources. Automatically enabled when track-by-subdir is in effect.
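
    How greedy-from-level and greedy-only-started narrow the set of reported files can be sketched as below, for a monolingual bundle. This is an interpretation of the field descriptions, not the script's code: the function name is hypothetical, ignored files are not handled, "already existing in the lbundle" is approximated by directories that contain at least one localized file, and kdegraphics/logo.png is a made-up file.

```python
from pathlib import PurePosixPath


def missing_resources(originals, localized, from_level=0, only_started=False):
    """Return original files that lack a localized counterpart.

    originals, localized -- file paths relative to the l10n-spec directory
    from_level   -- report only files in directories at this depth or deeper
    only_started -- report only files whose directory already holds
                    at least one localized file
    """
    localized = set(localized)
    started_dirs = {str(PurePosixPath(p).parent) for p in localized}
    missing = []
    for path in sorted(set(originals) - localized):
        parent = PurePosixPath(path).parent
        depth = len(parent.parts)  # 0 for files next to l10n-spec
        if depth < from_level:
            continue
        if only_started and str(parent) not in started_dirs:
            continue
        missing.append(path)
    return missing


originals = [
    "kdegraphics/okular/configure.png",
    "kdegraphics/okular/embedded-files-bar.png",
    "kdegraphics/logo.png",  # hypothetical file one level above okular/
]
localized = ["kdegraphics/okular/configure.png"]

print(missing_resources(originals, localized))
print(missing_resources(originals, localized, from_level=2))
```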

    Operation

    If the repository is checked out as exemplified above, to check all lbundles for the language aa in trunk, lbundle-check.py can be run like this:

       $ cd $HOME/kde-svn/
       $ lbundle-check.py -s $HOME/kde-svn -l aa trunk/

    This will process all lbundles found in trunk/, whether out-of-source or in-source, containing the language aa. (If the -l option were not specified, it would check all languages in trunk/.) The -s option states the top local directory of the repository, which is needed to resolve original resource paths for out-of-source bundles. It is prepended to paths of external resources (e.g. the source-root field) in l10n-spec files when they are specified as relative to the repository top; when they are specified relative to the l10n-spec file itself, or by a path replacement, the top local directory is not used.

    Or, to check only the out-of-source bundles in trunk/l10n-kde4/aa/data, one would use:

       $ cd $HOME/kde-svn/trunk/l10n-kde4
       $ lbundle-check.py -s $HOME/kde-svn aa/data

    where -l aa is omitted since out-of-source bundles will contain only their own language.

    So, what does lbundle-check.py actually do, and what does it report? For brevity, let us limit the discussion to the out-of-source lbundle for KDingus as a kdeutils app. After setting up the l10n-spec for that case (as detailed above), running lbundle-check.py for the first time would output:

       $ cd $HOME/kde-svn/trunk/l10n-kde4
       $ lbundle-check.py -s $HOME/kde-svn aa/data
       ! aa/data/kdeutils/kdingus/pics/l10n-track

       Added to tracking: 2

        aa/data/kdeutils/kdingus/pics/CMakeLists.txt
        aa/data/kdeutils/kdingus/pics/l10n/aa/kdingus-splash.png

       $

    creating the file aa/data/kdeutils/kdingus/pics/l10n-track in the process, with the following content:

       # Do not edit manually, except to remove complete lines.
       # -
       ok ¦CMakeLists.txt¦ 865f7...e926d 755260
       # aa
       ok ¦l10n/aa/kdingus-splash.png¦ 37c4a7...8fe3a 755647

    (As a check, executing the same command line a second time immediately afterwards would produce no output and change no files.)

    For each l10n-spec file encountered, lbundle-check.py will create one of these l10n-track files. Each non-comment, non-empty line of an l10n-track file contains four fields, in order: the state of the localized resource against the original, the relative path to the bundled resource, the checksum of the original resource, and the VCS revision string of the original file which has this checksum.
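
    Reading one such line can be sketched in Python as below. The names TrackEntry and parse_track_line are hypothetical; the sketch assumes the path is delimited by the broken-bar character (¦) exactly as in the listing, and that the checksum and revision contain no whitespace.

```python
from typing import NamedTuple, Optional


class TrackEntry(NamedTuple):
    state: str     # e.g. "ok"
    path: str      # path of the tracked file, relative to l10n-track
    checksum: str  # checksum of the original resource
    revision: str  # VCS revision of the original with that checksum


def parse_track_line(line: str) -> Optional[TrackEntry]:
    """Parse one l10n-track line; return None for comments and blanks."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    # The path sits between two broken bars, so it may contain spaces.
    state, path, rest = line.split("¦", 2)
    checksum, revision = rest.split()
    return TrackEntry(state.strip(), path, checksum, revision)


print(parse_track_line("ok ¦l10n/aa/kdingus-splash.png¦ 37c4a7...8fe3a 755647"))
```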

    At this point, once the l10n-spec file has been written by hand and l10n-track created by running the checker, both should be added and committed to version control.

    As long as the original resource does not change, rerunning lbundle-check.py in the same way will do nothing, as everything is in sync. Obviously, the original resources should be updated to the latest repository version prior to checking, and lbundle-check.py can do that itself if started with the -u option:

       $ lbundle-check.py -s $HOME/kde-svn -u aa/data
       svn up /home/.../kdeutils/kdingus/pics/CMakeLists.txt
       At revision 762512.
       svn up /home/.../kdeutils/kdingus/pics/kdingus-splash.png
       At revision 762512.
       $

    Once the original resource is modified, e.g. KDingus' splash screen is souped up, lbundle-check.py will report the following:

       $ lbundle-check.py -s $HOME/kde-svn aa/data
       ! aa/data/kdeutils/kdingus/pics/l10n-track

       Newly fuzzied: 1

        aa/data/kdeutils/kdingus/pics/l10n/aa/kdingus-splash.png

    and its entry in l10n-track will state:

       fuzzy ¦l10n/aa/kdingus-splash.png¦ 37c4a7...8fe3a 755647

    i.e. only the status will have changed from ok to fuzzy. Now the translator has the information needed to see what has changed in the original splash image: the entry still states the revision of the previous original splash, on which the localized one was based.

    If the original resource is renamed, moved, or deleted, the report will be slightly different:

       $ lbundle-check.py -s $HOME/kde-svn aa/data
       ! aa/data/kdeutils/kdingus/pics/l10n-track

       Newly obsoleted: 1

        aa/data/kdeutils/kdingus/pics/l10n/aa/kdingus-splash.png

    and the entry in l10n-track:

       obsolete ¦l10n/aa/kdingus-splash.png¦ 37c4a7...8fe3a 755647

    Now the translator must find out, using the old revision string, what exactly happened to the original resource, and do the same for the localized one too.
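
    The decision between the ok, fuzzy and obsolete states can be sketched as a checksum comparison. The function names are hypothetical, and the use of MD5 is an assumption; this page does not name the checksum algorithm.

```python
import hashlib
import os


def file_checksum(path: str) -> str:
    """Hex digest of the file content (MD5 is an assumption)."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


def current_state(original_path: str, recorded_checksum: str) -> str:
    """Classify a tracked resource against its original.

    original_path     -- local path of the original resource
    recorded_checksum -- checksum stored in the l10n-track entry
    """
    if not os.path.isfile(original_path):
        # Original renamed, moved or deleted.
        return "obsolete"
    if file_checksum(original_path) != recorded_checksum:
        # Original content changed since the entry was recorded.
        return "fuzzy"
    return "ok"
```

    Note that the recorded checksum and revision are left untouched when an entry turns fuzzy or obsolete, which is what lets the translator look up the old original.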

    When the fuzzy or obsolete state has been resolved, i.e. the localized resource has been updated to reflect the changes in the original, this should be recorded. To do so, the entry for the resource is first manually deleted from the l10n-track file (the whole line is removed), and then lbundle-check.py is rerun. This will recreate the entry with a fresh ok state.

    If the strict-state field has been set to true in l10n-spec, then not only is the state of the entry in l10n-track changed, but the tracked file itself is renamed to contain a ~fuzzy or ~obsolete marker just before the extension. This mode is called strict because, unless the out-of-sync state is corrected, the file is "misnamed" from the point of view of the runtime system, and will not be used instead of the original (it may not even get installed, depending on the precise install instructions).
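
    The renaming rule can be illustrated with a short helper. The name marked_name is hypothetical, and the handling of files without an extension or with several dots is an assumption.

```python
import os


def marked_name(filename: str, state: str) -> str:
    """Insert a ~state marker just before the file extension."""
    if state not in ("fuzzy", "obsolete"):
        raise ValueError(f"no marker for state {state!r}")
    # splitext splits at the last dot of the final path component.
    stem, ext = os.path.splitext(filename)
    return f"{stem}~{state}{ext}"


print(marked_name("l10n/aa/kdingus-splash.png", "fuzzy"))
```

    With this naming, a file such as kdingus-splash~fuzzy.png no longer matches the name of the original, which is what keeps the runtime system from picking it up.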

    To resolve the fuzzy or obsolete state in strict mode, it is not necessary to manually remove the entry from l10n-track. Instead, it is enough to put an updated version of the file (without the state marker in its name) next to the out-of-sync version, and rerun lbundle-check.py. It will remove the old version and update the entry in l10n-track. (The renaming and removal are version control aware, i.e. proper version control commands will be used for these operations if the *-vcs fields have been set in the spec file.)

    When greedy bundling is engaged, each original file which does not have a localized counterpart will be reported in the track file (either as bundled or unbundled, depending on the greedy-monolingual option), with its state set to missing (except for the ignored files, given by the ignore-unbundled option). When the localized file is placed at the appropriate location, the next run of lbundle-check.py will change the state to ok.

    Special Case: Tracking KDE Documentation Resources

    Note
    This section should be mostly self-contained, providing all the information needed for resource tracking in documentation, for two reasons. Firstly, it can be reasonably assumed that many more KDE translation teams will want to track localized documentation resources, rather than some other, custom resources. Secondly, structure of documentation resources is significantly different from that of normal localized bundles, so having separate examples helps.


    KDE documentation consists of two types of resources: the text, which is provided through Docbook files, and other data, mostly screenshots from applications. The text from the original Docbook files is extracted into PO templates and translated through PO files, and localized Docbook files are created from them. Thus, when the original text changes, the translators are automatically made aware of that through new and fuzzy entries in their POs. This is not so for screenshots. When an original screenshot changes, gets removed or added, translators get no notice of it. However, the tracking mechanism provided by lbundle-check.py can be used to correct this procedural deficiency.

    Localized resources of KDE documentation are not really organized into lbundles; their organization long predates the lbundling system. Still, every language's documentation directory (ll/docs/) can be understood as one greedy monolingual lbundle composed solely of unbundled resources, and as such tracked by lbundle-check.py. Docbook files are excluded from tracking, as they are kept up-to-date through the Docbook-PO-Docbook roundtrip. At the moment, tracking is set up and operated by a coordinator of each language team that wants to have it.

    To set up tracking, the coordinator of language ll should maintain a local checkout of several modules, with the same structure as the KDE repository, as follows:

       $KDEREPO/
           trunk/
               l10n-kde4/
                   scripts/
                   templates/
                   documentation/
                   ll/
           branches/
               stable/
                   l10n-kde4/
                       scripts/
                       templates/
                       documentation/
                       ll/

    In fact, it is a good idea anyway for a language coordinator to maintain such a checkout (the total space requirement is between 500 MB and 1 GB).

    Then, in the trunk and stable branches, create the files:

       $KDEREPO/trunk/l10n-kde4/ll/docs/l10n-spec
       $KDEREPO/branches/stable/l10n-kde4/ll/docs/l10n-spec

    both with the following, branch-independent content:

       source-root = ll/docs:documentation
       source-vcs = svn
       bundle-vcs = svn
       track-by-subdir = yes

       greedy-bundling = yes
       greedy-monolingual = yes
       greedy-from-level = 2

       ignore-unbundled = \
           CMakeLists.txt */CMakeLists.txt */*/CMakeLists.txt */*/*/CMakeLists.txt \
           */*/*.docbook */*/*/*.docbook \

    (note the language code in ll/docs:... at the top, which you should change to your own).
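
    The effect of the ignore-unbundled globs above can be sketched as follows. The sketch assumes that * does not match across directory separators, which is inferred from the spec listing a separate pattern for each directory depth; the function names are hypothetical and the script may match differently.

```python
from fnmatch import fnmatchcase


def matches(path: str, pattern: str) -> bool:
    """Match a relative path against a glob, one path segment at a time."""
    path_parts = path.split("/")
    pattern_parts = pattern.split("/")
    if len(path_parts) != len(pattern_parts):
        return False
    return all(fnmatchcase(part, pat)
               for part, pat in zip(path_parts, pattern_parts))


def is_ignored(path: str, patterns: list[str]) -> bool:
    """True if any of the ignore patterns matches the path."""
    return any(matches(path, p) for p in patterns)


PATTERNS = (
    "CMakeLists.txt */CMakeLists.txt */*/CMakeLists.txt "
    "*/*/*/CMakeLists.txt */*/*.docbook */*/*/*.docbook"
).split()

for name in ("kdegraphics/okular/index.docbook",
             "kdegraphics/okular/CMakeLists.txt",
             "kdegraphics/okular/configure.png"):
    print(name, is_ignored(name, PATTERNS))
```

    Under this reading, Docbook and CMake files are skipped while screenshots such as configure.png remain subject to greedy tracking.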

    Finally, run lbundle-check.py (which lives in .../scripts/) on the documentation directories of your language, e.g. for the trunk:

       $ lbundle-check.py $KDEREPO/trunk/l10n-kde4/ll/docs/

    After some disk churning, depending on how many localized screenshots you have already made, a whole lot of output may appear, listing the files that have been added to tracking. More importantly, each subdirectory in ll/docs/ which has (or could have) at least one localized screenshot will get a tracking file named l10n-track (these are also automatically added to version control, so you can commit ll/docs/ right away).

    What does a tracking file look like? For example, the original documentation for Okular contains these files:

       documentation/kdegraphics/okular/
           CMakeLists.txt
           configure.png
           embedded-files-bar.png
           index.docbook

    Assume that your language's localized documentation contains:

       ll/docs/kdegraphics/okular/
           CMakeLists.txt
           configure.png
           index.docbook

    i.e. the localized variant of embedded-files-bar.png is missing. Then the ll/docs/kdegraphics/okular/l10n-track that got created will have the following content:

       # Do not edit manually, except to remove complete lines.
       ok ¦configure.png¦ 58bab833ed1363b27a20e08d63de872a 756845
       missing ¦embedded-files-bar.png¦ 4dd248d1130d3bb7dc64cafa34a35ebf 756845

    Each line contains four pieces of data: the sync state and the name of the localized file, and the checksum and revision of the original file. Since this is the first run of lbundle-check.py, it assumed that the present localized configure.png is up-to-date (more on this assumption below) and set its state to ok, while it set the missing state for embedded-files-bar.png, which is not there.

    Day-to-day operation consists of running lbundle-check.py from time to time, in the same way as the first time. When a new original screenshot is added, the appropriate l10n-track file will get a new missing entry. When a localized screenshot is produced and put into the correct place, the missing state will change into ok:

       $ cd $KDEREPO/trunk/l10n-kde4/
       $ (...put localized embedded-files-bar.png into ll/docs...)
       $ lbundle-check.py ll/docs/
       A (bin) ll/docs/kdegraphics/okular/embedded-files-bar.png
       M ll/docs/kdegraphics/okular/l10n-track

       New 'ok': 1

        ll/docs/kdegraphics/okular/embedded-files-bar.png (...)

       $

    Note that version control operations are done automatically too.

    Assume now that, in the above example, after some time the original configure.png is modified, and the original embedded-files-bar.png is removed. Then, running the tracker produces:

       $ cd $KDEREPO/trunk/l10n-kde4/
       $ lbundle-check.py ll/docs/
       M ll/docs/kdegraphics/okular/l10n-track

       New 'fuzzy': 1

        ll/docs/kdegraphics/okular/configure.png (...)

       New 'obsolete': 1

        ll/docs/kdegraphics/okular/embedded-files-bar.png (...)

       $

    and the content of the tracking file becomes:

       # Do not edit manually, except to remove complete lines.
       fuzzy ¦configure.png¦ 58bab833ed1363b27a20e08d63de872a 756845
       obsolete ¦embedded-files-bar.png¦ 4dd248d1130d3bb7dc64cafa34a35ebf 756845

    Now the translator should remove the localized embedded-files-bar.png, and check the difference between the new and old original configure.png (the revision in the fourth column is still the old one). When the fuzzy state of the localized configure.png is resolved (a new localized screenshot is made, or the difference in the original can be ignored), the translator edits l10n-track to remove its line. After rerunning lbundle-check.py, the new localized configure.png will be picked up with the default ok state (and the checksum and revision of the current original file), and the line for embedded-files-bar.png will be automatically removed.

    Warning
    As the top comment in l10n-track files states, never manually edit them except to unfuzzy entries by removing their complete lines. E.g. do not change the state manually, which would leave the original checksum and revision wrong.


    Instead of running lbundle-check.py on the whole ll/docs tree, it can also be run on any given subdirectory of it. It will then check only the files in that subdirectory and below.

    Many Localized Screenshots at Start

    There is a small issue with the fact that lbundle-check.py sets the state of new files to ok. If the documentation directory already has a lot of screenshots whose state compared to the original is unknown, it is appropriate to consider them fuzzy rather than ok. To achieve this, after the first run of lbundle-check.py simply postprocess all l10n-track files:

       $ find ll/docs/ -iname l10n-track | xargs perl -pi -e 's/^ok /fuzzy/'
       $ find ll/docs/ -iname l10n-track | xargs perl -pi -e 's/[0-9a-z]{32}/0/'

    (The second command kills the original checksums, as otherwise the files would again be recognized as up-to-date on the next run.) Afterwards, in due time, existing screenshots can be inspected one by one, and their fuzzy state removed as explained previously.
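
    For those who prefer not to rely on perl, roughly the same postprocessing can be sketched in Python. The function names are hypothetical. Two deliberate differences from the commands above should be noted: a separating space is kept after the new fuzzy state, and file names are matched case-sensitively. The files are handled as bytes so that no text encoding has to be assumed.

```python
import re
from pathlib import Path


def reset_track_data(data: bytes) -> bytes:
    """Turn ok entries into fuzzy ones and blank the checksums.

    Mirrors the two substitutions line by line: a leading "ok " becomes
    "fuzzy ", and the first run of 32 characters from [0-9a-z] on each
    line is replaced by 0.
    """
    lines = []
    for line in data.splitlines(keepends=True):
        line = re.sub(rb"^ok ", b"fuzzy ", line)
        line = re.sub(rb"[0-9a-z]{32}", b"0", line, count=1)
        lines.append(line)
    return b"".join(lines)


def reset_tree(top: str) -> None:
    """Rewrite every l10n-track file below top in place."""
    for track in Path(top).rglob("l10n-track"):
        track.write_bytes(reset_track_data(track.read_bytes()))


if __name__ == "__main__":
    sample = ("ok ¦configure.png¦ "
              "58bab833ed1363b27a20e08d63de872a 756845\n").encode("utf-8")
    print(reset_track_data(sample).decode("utf-8"), end="")
```

    As with the perl commands, it is safest to run this on a clean working copy so that the changes can be reviewed with the version control diff before committing.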