Development/Tutorials/Localization/i18n Krazy

From KDE TechBase
Revision as of 19:00, 6 August 2007 by Ilic (Add contact data.)

Abstract

There are small technical details of i18n that are not easy to keep in mind at all times, as well as a number of i18n recommendations to uphold during development. To help with this, the Krazy code checker also looks for some frequently encountered i18n issues. This article documents these issues as reported by Krazy, for cases when you are not sure what the remedy should be.

Placeholders and Arguments

The i18n API is very strict about congruence between the %number placeholders in the message and the arguments actually supplied to substitute them. Effectively, the placeholders directly index the arguments, albeit one-based rather than zero-based.
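As a rough mental model of this indexing, the following plain C++ sketch replaces %N with the N-th argument. The function name substitute and its choice to leave placeholders without a matching argument untouched are assumptions of the sketch; the real KLocalizedString substitution also formats numbers, resolves plurals and processes markup, none of which is modeled here.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch of one-based placeholder substitution:
// %1 takes args[0], %2 takes args[1], and so on.
// In this sketch, a placeholder with no matching argument is copied verbatim.
std::string substitute(const std::string& msg, const std::vector<std::string>& args) {
    std::string out;
    std::size_t i = 0;
    while (i < msg.size()) {
        if (msg[i] == '%' && i + 1 < msg.size() &&
            std::isdigit(static_cast<unsigned char>(msg[i + 1]))) {
            // Read all digits following '%' to get the placeholder number.
            std::size_t j = i + 1;
            std::size_t n = 0;
            while (j < msg.size() && std::isdigit(static_cast<unsigned char>(msg[j])))
                n = n * 10 + static_cast<std::size_t>(msg[j++] - '0');
            if (n >= 1 && n <= args.size())
                out += args[n - 1];         // one-based index into the arguments
            else
                out.append(msg, i, j - i);  // keep the unmatched placeholder as-is
            i = j;
        } else {
            out += msg[i++];
        }
    }
    return out;
}
```

For example, substitute("%2 of %1", {"5", "2"}) yields "2 of 5": because arguments are indexed rather than consumed in order, a translation may reorder the placeholders freely.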

"wrong argument count, have num1 need num2"
Either some of the arguments have not been provided, or there is a stray placeholder inside the message string. A likely cause is forgetting that in KDE4, arguments are added as parameters to the i18n call itself, rather than appended via arg() methods as in KDE3:

i18n("Found key: %1", key);        // correct
i18n("Found key: %1").arg(key);    // ***wrong
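The idea behind this check can be sketched in plain C++: the highest placeholder number tells how many arguments the message needs. This is a hypothetical illustration, not Krazy's actual code, and the helper names highestPlaceholder and checkArgumentCount are made up for it.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the highest %N placeholder number in msg (0 if none).
int highestPlaceholder(const std::string& msg) {
    int highest = 0;
    for (std::size_t i = 0; i + 1 < msg.size(); ++i) {
        if (msg[i] != '%' || !std::isdigit(static_cast<unsigned char>(msg[i + 1])))
            continue;
        int n = 0;
        std::size_t j = i + 1;
        while (j < msg.size() && std::isdigit(static_cast<unsigned char>(msg[j])))
            n = n * 10 + (msg[j++] - '0');
        if (n > highest)
            highest = n;
        i = j - 1;  // continue scanning after the digits
    }
    return highest;
}

// Returns a Krazy-style diagnostic, or an empty string if the counts agree.
std::string checkArgumentCount(const std::string& msg, int numArgs) {
    int need = highestPlaceholder(msg);
    if (need == numArgs)
        return "";
    return "wrong argument count, have " + std::to_string(numArgs) +
           " need " + std::to_string(need);
}
```

Applied to the KDE3-style call above, the message "Found key: %1" is passed to i18n with no arguments, so the sketch reports "wrong argument count, have 0 need 1".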

"too many arguments, have num max 9"
i18n calls can take at most 9 arguments as parameters to the call. If more are needed, the ki18n*() series of calls must be used; see the documentation of the KLocalizedString class. Messages with more than 9 arguments are extremely rare, though.
"gaps in placeholder numbering, ..."
Except in plural i18n calls, there must be no gaps in the placeholder sequence, starting from %1. In plural calls, the placeholder of the first number argument (which determines the plural form) may be omitted, in both the singular and the plural string:

i18n("Line: %1 Column: %2", lineNo, colNo);   // correct
i18n("Line: %1 Column: %3", lineNo, colNo);   // ***wrong

i18np("Found a file in folder %2",
      "Found %1 files in folder %2", nfiles, folder);    // correct
i18np("Found a file in folder %2",
      "Found some files in folder %2", nfiles, folder);  // also correct
i18np("Found a file in folder %1",
      "Found some files in folder %1", folder, nfiles);  // ***wrong

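A hypothetical C++ sketch of the gap rule follows. It assumes, as in the examples above, that %1 is the placeholder of the plural-deciding number in plural calls, and it checks numbering only: the argument-order mistake in the last example is not something this sketch can detect. The function name hasPlaceholderGaps is made up for the illustration.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical sketch: true if the %N placeholders in msg do not cover 1..max.
// In plural calls (isPlural), %1 is assumed to be the plural number and may be omitted.
bool hasPlaceholderGaps(const std::string& msg, bool isPlural) {
    std::set<int> seen;
    for (std::size_t i = 0; i + 1 < msg.size(); ++i) {
        if (msg[i] != '%' || !std::isdigit(static_cast<unsigned char>(msg[i + 1])))
            continue;
        int n = 0;
        std::size_t j = i + 1;
        while (j < msg.size() && std::isdigit(static_cast<unsigned char>(msg[j])))
            n = n * 10 + (msg[j++] - '0');
        seen.insert(n);
        i = j - 1;
    }
    if (seen.empty())
        return false;
    int highest = *seen.rbegin();
    for (int k = 1; k <= highest; ++k) {
        if (k == 1 && isPlural)
            continue;  // the plural number's placeholder is optional
        if (seen.count(k) == 0)
            return true;
    }
    return false;
}
```

Each string of a plural call would be checked separately, which matches the rule that %1 may be omitted in the singular and the plural form alike.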
"legacy %n placeholder in plural call"
This is a remnant from KDE3, where in plural i18n calls the argument determining the plural form had a special %n placeholder. In KDE4, all arguments have ordinary %number placeholders, as in the examples above (the plural form is decided by the lowest-numbered integer argument).

Ambiguous Short Messages

English is a rather uninflected language compared to many others; a single English word can frequently be a noun, verb, or adjective while retaining its original form. This presents a frequent problem for translators into inflected languages when the original message is short, especially when it is a single word. The solution is to add context to the message via an i18nc() call.

"single adjective as message, probably ambiguous; ..."
Words that can be used as adjectives are especially prone to ambiguity. Therefore Krazy checks single-word messages against a list of adjectives collected from the KDE codebase, and issues this warning if a matching message does not have a context. For example:

titleFinal = title.isEmpty() ?
             i18n("Unknown") : title;                      // ambiguous
titleFinal = title.isEmpty() ?
             i18nc("An unknown title", "Unknown") : title; // clarified
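The shape of this check can be sketched in C++: a message without context that consists of one word found in an adjective list is flagged. The adjective set below is a tiny illustrative sample chosen for this example, not the list Krazy actually collected, and isAmbiguousAdjective is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <set>
#include <string>

// Illustrative sample only (assumption), not Krazy's real adjective list.
const std::set<std::string> kAdjectives = {"unknown", "empty", "new", "open", "default"};

// Hypothetical sketch: true if an uncontexted message is a single listed adjective.
bool isAmbiguousAdjective(const std::string& context, const std::string& msg) {
    if (!context.empty())
        return false;  // a context, as given via i18nc(), resolves the ambiguity
    // Trim surrounding whitespace.
    auto first = msg.find_first_not_of(" \t");
    if (first == std::string::npos)
        return false;
    auto last = msg.find_last_not_of(" \t");
    std::string word = msg.substr(first, last - first + 1);
    if (word.find_first_of(" \t") != std::string::npos)
        return false;  // more than one word
    // Compare case-insensitively against the list.
    std::transform(word.begin(), word.end(), word.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return kAdjectives.count(word) > 0;
}
```

With this sketch, i18n("Unknown") is flagged, while i18nc("An unknown title", "Unknown") and multi-word messages are not.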

"reported ambiguous message by translators; ..."
There are other troublesome words, or even phrases, which were explicitly reported by translators as ambiguous. This warning means that such a message without context has been detected.

While you are adding contexts, consider providing the appropriate KUIT context marker as well, which will help translators zero in on the meaning even further:

titleFinal = title.isEmpty() ?
             i18nc("@item:intable An unknown title", "Unknown") : title; // way to go!

Number Formatting

Number-valued (integer or real) arguments to i18n messages are automatically formatted according to the conventions of the given language, without the programmer's intervention. Using other methods to format numbers into strings may circumvent proper formatting for the language.

"use of QString::number() on an argument"
QString::number() should never be used to format "amount" numbers, because within KDE code it always uses English conventions. However, sometimes the number is not an amount; e.g. port number 15000 should not be formatted as "15,000" in English. Use the <numid> KUIT tag in this situation:

i18n("Number of pages: %1", numPages);                 // good, localized amount format
i18n("Connected to port %1.", port);                   // bad, amount format not desired
i18n("Connected to port %1.", QString::number(port));  // bad, not localized
i18n("Connected to port <numid>%1</numid>.", port);    // good
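To make the difference between an amount and an identifier concrete, here is a plain C++ sketch of English-style digit grouping, the kind of formatting an amount receives and that <numid> suppresses. The function groupDigits and its hard-coded ',' separator are assumptions for illustration only; in KDE the grouping is applied by i18n itself according to the language, not written by hand like this.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch: group digits in threes, English style (15000 -> "15,000").
std::string groupDigits(long long value, char separator = ',') {
    // Take the magnitude in unsigned arithmetic so the most negative value is safe.
    unsigned long long magnitude = value < 0
        ? 0ULL - static_cast<unsigned long long>(value)
        : static_cast<unsigned long long>(value);
    std::string digits = std::to_string(magnitude);
    std::string reversed;
    for (std::size_t k = 0; k < digits.size(); ++k) {
        if (k > 0 && k % 3 == 0)
            reversed.push_back(separator);  // separator before every third digit from the right
        reversed.push_back(digits[digits.size() - 1 - k]);
    }
    if (value < 0)
        reversed.push_back('-');
    return std::string(reversed.rbegin(), reversed.rend());
}
```

groupDigits(15000) gives "15,000", which suits a page count but not a port, where the plain "15000" is wanted.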

"use of KLocale::formatNumber() on an argument (...)"
A smarter way to format numbers is KLocale::formatNumber(), which honors the user's settings. However, the format is then decided by those settings rather than by the language of the particular message (some applications may not have translations), so it is best avoided in i18n arguments. Use it instead for "live numbers", e.g. in spreadsheet tables and calculator displays, where the format should match the user's number-typing habits.

Even when the complete message is a single number, it should be i18n'd with a proper context:

result = QString::number(z);               // bad
result = i18nc("Atomic number", "%1", z);  // good

When the number needs special formatting within the message (field width, number of decimals, etc.), neither QString::number() nor KLocale::formatNumber() should be used either; use the ki18n*() series of calls with subs() methods instead (see the KLocalizedString documentation):

i18n("Percent complete: %1", QString::number(percent, 'f', 1));       // bad
ki18n("Percent complete: %1").subs(percent, 0, 'f', 1).toString();    // good

HTML and KUIT Markup

Every i18n message in KDE4 is effectively XML markup. HTML tags come from Qt's rich text and can be used only in rich-text capable widgets; KUIT tags are the new KDE4 semantic markup, which should be preferred to HTML and can be used in any i18n message (whether the output is plain or rich text is decided on the basis of the context marker).

"malformed markup (unmatched tags, etc.)"
Since every message is XML, all tags must be properly closed: an opening <p> must have a closing </p>, and so on. This also holds for HTML break tags like <br> and <hr>, which must be closed in place: <br/>, <hr/>.
A verbatim less-than sign means the opening of a tag, which is not always what is intended. This can be avoided by using the predefined XML &lt; entity (the other predefined entities are &gt;, &amp;, &apos;, and &quot;), but for the frequent case of marking generic or user-replaceable text, it is better to use the <placeholder> KUIT tag:

i18n("headers go into <includes>");                            // ***error in XML markup
i18n("headers go into &lt;includes&gt;");                      // no markup problem, but...
i18n("headers go into <placeholder>includes</placeholder>");   // better

Given their frequency, shortcut markers do not need to be written as &amp;; some heuristics in the XML parsing allow the presence of a naked &. Basically, &amp; is needed only in the rare case of an &no_whitespace_sequence; pattern that is not meant as an XML entity.
"unclosed <br> ... use proper paragraphs <p>...</p> instead"
This signals a particularly frequent markup problem: <br> not closed in place, as <br/>. Also, <br> is sometimes used to split logical paragraphs (especially when doubled, <br><br>), where proper paragraph tags should be used instead.
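The closing requirement can be illustrated with a small stack-based checker in C++. It is a simplified sketch under stated assumptions: it treats every <...> as a tag and knows nothing about entities or the naked-& heuristic, it is not how Krazy or the KDE runtime parser is implemented, and checkTagBalance is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: returns an empty string if all tags are properly nested
// and closed, otherwise a short description of the first problem found.
std::string checkTagBalance(const std::string& msg) {
    std::vector<std::string> open;  // stack of currently open tag names
    std::size_t pos = 0;
    while ((pos = msg.find('<', pos)) != std::string::npos) {
        std::size_t end = msg.find('>', pos);
        if (end == std::string::npos)
            return "unterminated tag";
        std::string body = msg.substr(pos + 1, end - pos - 1);
        pos = end + 1;
        if (body.empty())
            return "empty tag";
        if (body.back() == '/')
            continue;  // closed in place, e.g. <br/>
        if (body.front() == '/') {
            std::string name = body.substr(1);
            if (open.empty() || open.back() != name)
                return "unexpected </" + name + ">";
            open.pop_back();
            continue;
        }
        // Opening tag: the name ends at the first whitespace (attributes follow).
        open.push_back(body.substr(0, body.find_first_of(" \t\n")));
    }
    if (!open.empty())
        return "unclosed <" + open.back() + ">";
    return "";
}
```

A lone <br> stays on the stack and is reported as "unclosed <br>", while <br/> passes, which mirrors the warning above.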
"tag is neither KUIT nor HTML tag"
Some tag in the message is simply unknown and will not be understood at runtime. If it is not a typo, but the message really is telling the user about tags (e.g. in an HTML editing application), use the &lt;foo&gt; pattern.
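A lookup against known tag names is enough to sketch this check in C++. The tag set below is a small illustrative subset of KUIT and HTML tags chosen for the example (an assumption, not the complete lists), and unknownTags is a hypothetical name. Escaped text such as &lt;foo&gt; contains no literal '<', so it is never reported.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Illustrative subset only (assumption): the real KUIT and HTML tag lists are longer.
const std::set<std::string> kKnownTags = {
    // KUIT
    "filename", "emphasis", "placeholder", "numid", "command", "email", "link",
    // HTML (Qt rich text)
    "p", "br", "hr", "b", "i", "tt", "ul", "ol", "li"};

// Hypothetical sketch: returns the names of tags that are neither KUIT nor HTML.
std::vector<std::string> unknownTags(const std::string& msg) {
    std::vector<std::string> unknown;
    std::size_t pos = 0;
    while ((pos = msg.find('<', pos)) != std::string::npos) {
        std::size_t start = pos + 1;
        if (start < msg.size() && msg[start] == '/')
            ++start;  // closing tag: skip the slash
        std::size_t end = start;
        while (end < msg.size() && std::isalnum(static_cast<unsigned char>(msg[end])))
            ++end;
        std::string name = msg.substr(start, end - start);
        if (!name.empty() && kKnownTags.count(name) == 0)
            unknown.push_back(name);
        pos = end;  // always advances past the '<'
    }
    return unknown;
}
```

For "headers go into <includes>" the sketch reports "includes", whereas the same text written with &lt;includes&gt; passes.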
"HTML tag tag is not advised with KUIT markup"
When Krazy encounters a KUIT context marker in a message, it assumes that the message is semantically tagged, and disapproves of some HTML tags which are best replaced with semantic equivalents. For example:

i18nc("@info:whatsthis",
      "...this cannot be undone.");                        // Krazy complains
i18nc("@info:whatsthis",
      "...this <emphasis>cannot</emphasis> be undone.");   // fine

i18nc("@info",
      "Really delete %1?", filename);                      // complains
i18nc("@info",
      "Really delete <filename>%1</filename>?", filename); // fine

"tag tag1 cannot be subtag of tag2"
"tag tag has no att attribute"
"tag tag cannot have text content"
These are validity checks for KUIT markup, a somewhat relaxed form of formal XML validation. The rules on which KUIT tags can contain which, which attributes they take, and so on, are given with the tag descriptions.

KUIT Context Markers

Semantic context markers give translators a great deal of information about where and how the message is used at runtime. Providing them for all new messages is strongly encouraged, and adding them to existing messages outside of message freeze is welcome too. Krazy helps with the following checks.

"missing KUIT context marker"
If the number of messages in a source file equipped with KUIT context markers is above some threshold, Krazy assumes that the developer's intention was to have all messages marked (as is recommended), and issues this warning for every unmarked message.
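The threshold logic can be sketched in C++. Whether Krazy's threshold is an absolute count or a fraction is not stated above; this sketch assumes a fraction of marked messages, passed in as the hypothetical parameter threshold, and treats a context as marked when it starts with '@'. missingMarkers is a made-up name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: contexts[i] is the context of the i-th message in a file
// (empty for plain i18n calls). If the fraction of KUIT-marked messages exceeds
// `threshold` (an assumed parameter), return the indices of unmarked messages.
std::vector<std::size_t> missingMarkers(const std::vector<std::string>& contexts,
                                        double threshold) {
    if (contexts.empty())
        return {};
    std::vector<std::size_t> unmarked;
    for (std::size_t i = 0; i < contexts.size(); ++i) {
        if (contexts[i].empty() || contexts[i][0] != '@')
            unmarked.push_back(i);  // no context, or a plain context without a marker
    }
    double markedFraction =
        static_cast<double>(contexts.size() - unmarked.size()) / contexts.size();
    if (markedFraction > threshold)
        return unmarked;  // the file looks intended to be fully marked
    return {};
}
```

With three of four messages marked and a threshold of 0.5, the single unmarked message is reported; with a threshold of 0.9, the file is left alone.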
"invalid semantic role role"
"invalid interface subcue cue to role role"
"invalid visual format fmt"
Krazy checks that all ingredients of the context marker are defined, and that a particular combination of them is valid.
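A context marker has the form @role:subcue/format, followed by the free-text context, which lends itself to a simple parse-and-lookup check. In the C++ sketch below, the role, subcue and format tables are small illustrative subsets chosen for the example (an assumption, not the complete KUIT specification), and validateContextMarker is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Illustrative subsets only (assumption): the real KUIT specification defines
// more roles, subcues and visual formats than listed here.
const std::map<std::string, std::set<std::string>> kRoleCues = {
    {"action", {"button", "inmenu", "intoolbar"}},
    {"title",  {"window", "menu", "tab", "group"}},
    {"item",   {"intable", "inlistbox", "inmenu"}},
    {"info",   {"whatsthis", "tooltip", "status"}},
};
const std::set<std::string> kFormats = {"plain", "rich"};

// Hypothetical sketch: returns an empty string if the marker is valid (or absent),
// otherwise a Krazy-style diagnostic.
std::string validateContextMarker(const std::string& context) {
    if (context.empty() || context[0] != '@')
        return "";  // no KUIT marker, nothing to validate
    // The marker runs from after '@' up to the first whitespace.
    std::string marker = context.substr(1, context.find_first_of(" \t") - 1);
    std::string format;
    std::size_t slash = marker.find('/');
    if (slash != std::string::npos) {
        format = marker.substr(slash + 1);
        marker.erase(slash);
    }
    std::string role = marker, cue;
    std::size_t colon = marker.find(':');
    if (colon != std::string::npos) {
        role = marker.substr(0, colon);
        cue = marker.substr(colon + 1);
    }
    auto it = kRoleCues.find(role);
    if (it == kRoleCues.end())
        return "invalid semantic role " + role;
    if (!cue.empty() && it->second.count(cue) == 0)
        return "invalid interface subcue " + cue + " to role " + role;
    if (!format.empty() && kFormats.count(format) == 0)
        return "invalid visual format " + format;
    return "";
}
```

Each role keeps its own set of subcues, so the combination check comes for free: a subcue that is valid for one role can still be rejected for another.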
"expected context marker ctxmark1, got ctxmark2"
For messages used in some boilerplate situations, canonical context markers have been established. Krazy warns if the context marker of such a message, although valid by itself, is not the expected one. Such messages are encountered, for example, when setting up the KAboutData information.

Contact

Krazy i18n checks are presently maintained by Chusslove Illich <[email protected]>, to whom any questions or suggestions can be addressed.