Localization/Concepts/Transcript

Translation Scripting?

The current state of localization of user-visible strings (messages) in application interfaces is such that the translator is sometimes forced to supply an inadequate translation. This problem typically occurs when a message contains a placeholder to be substituted at runtime, or when two unrelated strings become related by placement in the interface. In either case, English makes such modest demands on agreement between the words of a sentence that the original string remains grammatically correct, while its translation into many other languages does not.

Translators and programmers sometimes try to work out a change in the code which would provide a more workable alternative. However, this process is difficult, and worse yet, the outcome is still language-dependent: solving the problem for a few languages does not necessarily solve it for all others.

One way to overcome these problems in a more general and compartmentalized manner is to provide translators with a way to modify translated strings at runtime, depending on the context (e.g. the particular placeholder substitutes). In other words, to script the translation. The translator should be able to operate on any interface string he wishes, while the programmer shouldn't bear any extra burden (or know about translation scripting at all).

The Transcript Engine

KDE4 comes ready with a translation scripting system, Transcript. Several strategic choices were made in its design:

programmers are unaware of scripting, which means that the translator can script any message without outside coordination
unless the translator wants to script a message, he is faced with the familiar, standard Gettext PO environment (i.e. translators too can be scripting-agnostic)
scripting is a low-level bolt-on to the Gettext environment, to keep existing PO tools in the game
to have enough power for unforeseen needs, a general-purpose scripting language is provided: JavaScript (with extensions for interfacing with Transcript)

Note

For comparison, another translation scripting approach taking some different decisions (new translation environment down to the file format, scripting facilities more specialized, etc.) is brewing at http://wiki.mozilla.org/L20n

To script a particular message, the translator writes short scripting calls into the msgstr in the PO file, which expand into parts of the msgstr (interpolations), and puts the JavaScript code which defines these calls into an accompanying Transcript module file (e.g. foo.po can be augmented with foo.js).

In case the application draws translations from several PO files, scripting calls defined in one of the Transcript modules are available in all used PO files. Since every KDE app uses kdelibs.po, calls defined in kdelibs.js are available everywhere.

The scripting process is illustrated by several examples. More detailed explanations of the elements are given in following sections.

A Useless Example

In Nevernessian it is impolite to speak a greeting in the same tone of voice throughout; instead, the name of the person must be shouted out. Hence, the translator wants to capitalize the placeholder substitute in the following login greeting in neverness.po:

#: neverness_login.cpp:10
msgid "Hello, %1!"
msgstr "Heelyy, %1!"

So the translator adds a scripted msgstr, with an interpolation:

#: neverness_login.cpp:10
msgid "Hello, %1!"
msgstr "Heelyy, %1!"
"|/|"
"Heelyy, $[shout %1]!"

The first thing to note is that, while a bit longer, msgstr is still a proper PO msgstr, which means that it can be edited and processed by the usual PO tools.

The first part of msgstr is the same as before, and is called the fallback in this context: if the scripted part happens to fail in some way, the fallback translation is used. The fallback is followed by the fence |/|, which separates the fallback from the scripted translation (and, for that matter, indicates that this message is scripted).

Finally, there is the scripted translation after the fence. Compared to the fallback, it contains the interpolation $[shout %1], which is supposed to evaluate to a capitalized version of the placeholder substitute. It is composed of the call name, shout, and one argument to it, the %1 placeholder, which will be replaced by its substitute. The syntax and expansion rules for interpolations are similar to those of a Unix shell.

The call shout itself is defined in the Transcript module neverness.js, which contains only these lines:

function capitalize (str) {
   return str.toUpperCase();
}
Ts.setcall("shout", capitalize);

Here the function capitalize is an ordinary JavaScript function which takes a string argument and returns an all-caps version of it.

The link with the PO file is established by the call to Ts.setcall() -- the Transcript interface is represented by the function properties of the Ts object. In this variant, Ts.setcall() takes the name of the call as used in the interpolations in the PO messages (a string), and the JavaScript function which will actually be invoked (bound to the call).

That's it, now the fair Nevernesse folks are greeted properly.
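To try a module like this outside KDE, the binding can be imitated with a stand-in for the Ts object. The sketch below is only an illustration: the calls table and the invoke helper are hypothetical and say nothing about how the Transcript engine is actually implemented. Only Ts.setcall(), capitalize and the shout call come from the example above.

```javascript
// Hypothetical stand-in for the Transcript interface (not the real engine):
// it only remembers which function is bound to which call name.
var calls = {};
var Ts = {
    setcall: function (name, func, obj) {
        calls[name] = { func: func, obj: obj };
    }
};

// Contents of neverness.js, as in the example.
function capitalize (str) {
    return str.toUpperCase();
}
Ts.setcall("shout", capitalize);

// Hypothetical helper: what evaluating $[shout %1] amounts to
// once %1 has been replaced by its substitute.
function invoke (name, args) {
    var call = calls[name];
    return call.func.apply(call.obj, args);
}

var greeting = "Heelyy, " + invoke("shout", ["Alice"]) + "!";
console.log(greeting); // prints "Heelyy, ALICE!"
```

A stand-in of this kind is handy for checking the JavaScript side of a module before installing it, but it cannot confirm that the msgstr itself is parsed as intended.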

Tip

The source (repository) and install locations of Transcript modules are not quite decided yet. For the moment, if the PO file is installed as $KDEDIR/share/locale/lang/LC_MESSAGES/foo.mo, the Transcript module will be looked for as $KDEDIR/share/locale/lang/LC_SCRIPTS/foo/foo.js -- note the extra subfolder named after the base filename. Debug shell output will show when the module has been loaded.

Basic Case Resolution

One frequently encountered problem is a wrong noun case when a placeholder is substituted in the msgstr. For example, in many languages every KDE app has such a problem in the Help menu, with one or both of "About %1..." and "%1 &Handbook". This can be scripted in kdelibs.po like this:

msgid "&About %1"
msgstr "&O %1"
"|/|"
"&O $[get-case dative %1]"

The get-case interpolation is supposed to get the dative case of whatever app name the %1 happens to be. The Transcript module kdelibs.js contains the definition of get-case, as well as the dictionary of cases:

function getProperty (prop, key) {
   return _dict_[key][prop];
}
Ts.setcall("get-case", getProperty);

_dict_ = {};
function addDictCases (key, gen, dat, acc, ins) {
   if (!_dict_[key])
       _dict_[key] = {};
   _dict_[key]["genitive"]     = gen;
   _dict_[key]["dative"]       = dat;
   _dict_[key]["accusative"]   = acc;
   _dict_[key]["instrumental"] = ins;
}

// dictionary entries follow:
addDictCases("KWrite", "KWritea", "KWriteu", "KWrite", "KWriteom");
addDictCases("Konsole", "Konsole", "Konsoli", "Konsolu", "Konsolom");
...

Function getProperty, bound to the get-case call, simply returns the entry from the dictionary of forms. Function addDictCases is responsible for adding the static entries (a name and its cases) into the dictionary, which is done in the final few lines for all apps of interest.
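The getProperty function assumes that every app name has an entry in the dictionary. A more defensive variant might check for the entry and explicitly ask for the ordinary translation through Ts.fallback() when it is missing. The following is a sketch only: the Ts object here is a hypothetical stand-in so that the snippet can run outside KDE, and getCaseOrFallback, fallbackRequested and the app name "Kalzium" are illustrative choices, not part of the example above.

```javascript
// Hypothetical stand-in for the Transcript interface, so that the sketch
// can run outside KDE; it only records whether fallback was requested.
var Ts = {
    fallbackRequested: false,
    setcall: function (name, func) {},
    fallback: function () { this.fallbackRequested = true; }
};

var _dict_ = {};
function addDictCases (key, gen, dat, acc, ins) {
    _dict_[key] = {
        genitive: gen, dative: dat, accusative: acc, instrumental: ins
    };
}

// Illustrative variant of getProperty: asks for the ordinary translation
// when the name or the requested form is not in the dictionary.
function getCaseOrFallback (prop, key) {
    var has = Object.prototype.hasOwnProperty;
    if (!has.call(_dict_, key) || !has.call(_dict_[key], prop)) {
        Ts.fallback();
        return "";
    }
    return _dict_[key][prop];
}
Ts.setcall("get-case", getCaseOrFallback);

addDictCases("KWrite", "KWritea", "KWriteu", "KWrite", "KWriteom");
var known = getCaseOrFallback("dative", "KWrite");    // found in the dictionary
var unknown = getCaseOrFallback("dative", "Kalzium"); // not found, fallback requested
```

Whether the explicit check is worth the extra lines is a matter of taste, since a failing scripted part is described as falling back to the ordinary translation anyway.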

This completes the example, but for better modularization, it is also possible to split out the dictionary insertion into a separate file, e.g. appdict.js:

// appdict.js
addDictCases("KWrite", "KWritea", "KWriteu", "KWrite", "KWriteom");
addDictCases("Konsole", "Konsole", "Konsoli", "Konsolu", "Konsolom");
...

and use the Transcript interface to load this file in kdelibs.js:

// kdelibs.js
...
...
...
Ts.load("appdict");

Note that Ts.load() takes the filename without extension, and assumes its location is relative to the folder of the parent file (i.e. in this case kdelibs.js and appdict.js should be in the same folder).

Dynamic Case Setting

The previous scripted example solves the original problem, but introduces the burden of maintaining the dictionary insertion file. There is no way around this when the placeholder substitutes are "dead" strings from outside (eg. from .desktop files), but when they are coming from KDE's PO files at runtime, this burden can be removed.

The app name in KDE's Help menu indeed comes from the app PO file, and it is of course encountered at runtime before the menu strings come into focus. This allows setting the cases of app name in the PO msgstr which contains it. For example, katepart.po contains the "KWrite" string, and the forms could be set at that point:

msgid "KWrite"
msgstr "KWrite"
"|/|"
"$[set-cases KWritea KWriteu KWrite KWriteom]"

set-cases is a side-effect interpolation: it should set the dictionary entries, but this particular message should in any case use the ordinary translation. Assuming all the definitions from the previous example are still in effect, here is how set-cases could be defined in kdelibs.js:

function dynamicSetCases (gen, dat, acc, ins) {
   addDictCases(Ts.msgstrf(), gen, dat, acc, ins);
   Ts.fallback();
}
Ts.setcall("set-cases", dynamicSetCases);

In other words, this is little more than a wrapper around the "static" addDictCases from the previous example, but two new elements of the Transcript interface appear. The first is the Ts.msgstrf() function, which returns the finalized ordinary translation (placeholders substituted), and which is needed in this case as the dictionary key. The second is the Ts.fallback() function, which signals the Transcript engine to disregard the result of the scripted part of msgstr and use the ordinary translation.

Admittedly, the use of Ts.fallback() is not necessary in this case; it is shown for introductory purposes. dynamicSetCases might as well return the ordinary translation via Ts.msgstrf().
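The alternative without Ts.fallback() could look roughly as follows. This is a sketch under stated assumptions: the Ts object is a hypothetical stand-in which pretends that the message being translated is "KWrite", so that the snippet can be run outside KDE; the function bodies follow the examples above.

```javascript
// Hypothetical stand-in for the Transcript interface; it pretends that
// the finalized ordinary translation of the current message is "KWrite".
var Ts = {
    setcall: function (name, func) {},
    msgstrf: function () { return "KWrite"; }
};

var _dict_ = {};
function addDictCases (key, gen, dat, acc, ins) {
    if (!_dict_[key])
        _dict_[key] = {};
    _dict_[key]["genitive"]     = gen;
    _dict_[key]["dative"]       = dat;
    _dict_[key]["accusative"]   = acc;
    _dict_[key]["instrumental"] = ins;
}

// Variant without Ts.fallback(): the interpolation evaluates
// to the ordinary translation itself.
function dynamicSetCases (gen, dat, acc, ins) {
    var text = Ts.msgstrf();
    addDictCases(text, gen, dat, acc, ins);
    return text;
}
Ts.setcall("set-cases", dynamicSetCases);

// What $[set-cases KWritea KWriteu KWrite KWriteom] would amount to:
var result = dynamicSetCases("KWritea", "KWriteu", "KWrite", "KWriteom");
```

With this variant the scripted msgstr consists of the interpolation alone, so its result is exactly the text returned by the function.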

The PO Shell

This section gives the details of how the interpolations in the PO msgstr are expanded before evaluation.

The interpolations are the parts of msgstr enclosed in $[...], and each is parsed into a number of strings. The first string is the name of the call registered in the scripting module via Ts.setcall(), and the rest are arguments to the bound JavaScript function. This means that all arguments passed by Transcript to the bound function are of JavaScript type String.

The special characters in the interpolation are whitespace, single quote (') and backslash (\). Whitespace separates arguments, whereas single quotes enclose arguments which contain whitespace. The backslash is used as the escape character; it can escape whitespace in non-quoted arguments, or single quotes in quoted arguments. This is pretty much like a typical Unix shell.

Double quotes are not special. Single quotes are used instead of double quotes because it makes it easier to edit interpolations in PO files, where double quotes would have to be escaped. This also means that when a backslash escape is needed in the interpolation, the backslash must itself be escaped once more for the PO msgstr (i.e. written as \\ in the PO file).

The biggest difference from the shell expansion is that unlike with shell variables, the placeholders are expanded such that whitespace inside them is also not special. Otherwise, placeholders would always have to be quoted for safety, which is made unnecessary by this feature.

The call name bound to a JavaScript function using Ts.setcall() does not have to be a proper JavaScript identifier, but any Unicode string not containing the interpolation-special characters. This means that more "natural" call names can be used inside the msgstr, like those with dashes or non-Latin1 characters.

Sub-interpolations (along the lines of shell backticks) are not implemented, partly because it may be good style to keep the scripted msgstr simple in the PO file, and provide all the logic in the accompanying Transcript module.

If a closing square bracket is needed as an argument inside the interpolation, it can be given within single quotes.
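The quoting rules described in this section can be approximated in a few lines of JavaScript. The sketch below is not the engine's parser: splitInterpolation and expandPlaceholders are hypothetical names, the backslash is simplified to escape any following character, and placeholders are substituted only after the splitting, which is one way of modelling the rule that whitespace inside a substitute is not special.

```javascript
// Splits the text between $[ and ] into the call name and its arguments.
function splitInterpolation (body) {
    var args = [];
    var current = "";
    var inArg = false;    // true once the current argument has started
    var inQuote = false;  // true while inside single quotes
    for (var i = 0; i < body.length; i++) {
        var c = body[i];
        if (c === "\\" && i + 1 < body.length) {
            // Simplification: backslash takes the next character literally.
            current += body[++i];
            inArg = true;
        } else if (c === "'") {
            inQuote = !inQuote;
            inArg = true;
        } else if (/\s/.test(c) && !inQuote) {
            // Unquoted whitespace ends the current argument, if any.
            if (inArg) {
                args.push(current);
                current = "";
                inArg = false;
            }
        } else {
            current += c;
            inArg = true;
        }
    }
    if (inArg)
        args.push(current);
    return args;
}

// Replaces %1, %2, ... in already split arguments; subs[0] stands for %1.
function expandPlaceholders (args, subs) {
    return args.map(function (arg) {
        return arg.replace(/%(\d+)/g, function (match, n) {
            var s = subs[Number(n) - 1];
            return s === undefined ? match : s;
        });
    });
}

var quoted = splitInterpolation("get-case dative 'Plasma Workspace'");
var escaped = splitInterpolation("shout Plasma\\ Workspace");
var expanded = expandPlaceholders(splitInterpolation("get-case dative %1"),
                                  ["Plasma Workspace"]);
```

In all three cases the last element is the single string "Plasma Workspace": the quoted and escaped forms protect the space explicitly, while the placeholder needs no quoting at all.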

The Transcript Interface

Transcript provides extensions to JavaScript which interface with the PO file and the Transcript environment. They are all function properties of the Ts object, accessible as Ts.func(args).

setcall (name, func, obj)

Binds the call name to the JavaScript function, for use in the interpolations inside the PO file.

name name of the call. Can be any Unicode string, for ease of use in the msgstr

func function object

obj object to act as this in the function call

Returns Undefined.

setcall (name, func)

Binds the call name to the JavaScript function, for use in the interpolations inside the PO file. Inside the function, this refers to the global object.

name name of the call. Can be any Unicode string, for easy use in the PO file

func function object

Returns Undefined.
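The three-argument form is useful when the bound function is a method which keeps its data in an object. A minimal sketch, assuming a hypothetical stand-in for Ts (the calls table and the invoke helper are illustrative and not part of Transcript), and a made-up appNames object:

```javascript
// Hypothetical stand-in for the Transcript interface: remembers the
// function and the object which should act as "this" when it is called.
var calls = {};
var Ts = {
    setcall: function (name, func, obj) {
        calls[name] = { func: func, obj: obj };
    }
};
function invoke (name, args) {
    var call = calls[name];
    return call.func.apply(call.obj, args);
}

// Illustrative object which keeps the dictionary to itself.
var appNames = {
    forms: { "Konsole": { dative: "Konsoli" } },
    getCase: function (prop, key) {
        return this.forms[key][prop];
    }
};

// Without the third argument, "this" inside getCase would not be appNames.
Ts.setcall("get-case", appNames.getCase, appNames);

var dative = invoke("get-case", ["dative", "Konsole"]);
```

Keeping the dictionary inside an object avoids global names such as _dict_, which may matter when several modules are loaded into the same environment.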

load (file*)

Evaluates the code in the specified files, in left-to-right order. File paths are expected to be relative to the current module's folder.

file file name without extension

Returns Undefined.

fallback ()

Forces Transcript to use the ordinary translation, regardless of whether the interpolation evaluates successfully or fails.

Returns Undefined.

msgid (): Returns msgid of the last message, with placeholders intact.

msgstrf (): Returns finalized ordinary translation of the last message, with placeholders substituted.

msgctxt (): Returns msgctxt of the last message, with placeholders intact.

msgkey (): Returns a String which is an implementation-dependent combination of msgctxt and msgid, with placeholders intact. It uniquely identifies the message within the PO file, which makes it useful as a dictionary key.

nsubs (): Returns the number of substitutes provided for placeholders in the last message. For a proper i18n call in the application code this equals the highest-numbered placeholder, but i18n calls are not guaranteed to be proper.

subs (index)

Used to access values of placeholder substitutes provided to the last message. Numbering is zero-based.

index index of placeholder substitute

Returns String, regardless of what original substitute is in the application code.
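Together, nsubs and subs allow a bound function to inspect all substitutes without receiving them as arguments. The sketch below uses a hypothetical stand-in for Ts which pretends that the last message was given two substitutes; allSubs is an illustrative name. It assumes that, with zero-based numbering, index 0 corresponds to the %1 placeholder.

```javascript
// Hypothetical stand-in for the Transcript interface: pretends that the
// last message received the substitutes "3" (for %1) and "report.txt" (for %2).
var Ts = {
    _subs: ["3", "report.txt"],
    nsubs: function () { return this._subs.length; },
    subs: function (index) { return this._subs[index]; }
};

// Collects all substitutes of the last message into an array of strings.
function allSubs () {
    var result = [];
    for (var i = 0; i < Ts.nsubs(); i++)
        result.push(Ts.subs(i));
    return result;
}

var collected = allSubs();
```

Note that the count is given as "3" rather than 3, in line with subs always returning a String.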

dbgputs (msg)

Outputs a debug message in the shell when KDE has been compiled with the debug option.

msg message string

Returns Undefined.
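One possible use of dbgputs is to trace what a call receives and returns while a module is being developed. In the sketch below, traced is a hypothetical helper and the Ts object is a stand-in which collects the debug messages in an array instead of printing them; the shout call is borrowed from the first example.

```javascript
// Hypothetical stand-in for the Transcript interface: debug messages
// are collected in an array so that they can be inspected afterwards.
var Ts = {
    log: [],
    setcall: function (name, func) {},
    dbgputs: function (msg) { this.log.push(msg); }
};

// Illustrative helper: wraps a function so that every invocation
// reports its arguments and its result through Ts.dbgputs().
function traced (name, func) {
    return function () {
        var args = Array.prototype.slice.call(arguments);
        var result = func.apply(this, args);
        Ts.dbgputs(name + "(" + args.join(", ") + ") -> " + result);
        return result;
    };
}

function capitalize (str) {
    return str.toUpperCase();
}
var tracedShout = traced("shout", capitalize);
Ts.setcall("shout", tracedShout);

var shouted = tracedShout("Alice");
```

Once the module works as intended, binding capitalize directly instead of the traced wrapper removes the extra output.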

Real-Life Examples

None yet

Tip

Feel free to add Transcript snippets here which illustrate different takes on already described problems, or present solutions to entirely new problems -- you never know what other language might be in the same trouble.