Development/Tutorials/Localization/i18n Krazy

From KDE TechBase

Latest revision as of 13:06, 19 August 2017

Abstract

There are small technical details of i18n which are not easy to keep in mind at all times, as well as a number of i18n recommendations to uphold during development. To help you with this, the Krazy code checker (https://community.kde.org/Guidelines_and_HOWTOs/Code_Checking) also looks for some frequently encountered i18n issues. This article documents these issues as reported by Krazy, for cases when you are not sure what the remedy should be.

Placeholders and Arguments

The i18n API is very strict about congruence between the %number placeholders in the message and the arguments actually supplied to substitute them. Effectively, the placeholders directly index the arguments, albeit one-based rather than zero-based.
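To make the one-based indexing concrete, here is a minimal sketch in plain standard C++ (no Qt or KDE libraries) of how placeholder substitution behaves. The function substitute is hypothetical and models only the indexing rule, not the actual KLocalizedString implementation. Leaving an unmatched placeholder untouched is a choice made by this sketch.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of argument substitution: each %N in the message is
// replaced by args[N - 1], i.e. placeholders are one-based indices into the
// argument list. Not the KLocalizedString implementation.
std::string substitute(const std::string& msg,
                       const std::vector<std::string>& args) {
    std::string out;
    for (std::size_t i = 0; i < msg.size(); ++i) {
        if (msg[i] == '%' && i + 1 < msg.size() &&
            std::isdigit(static_cast<unsigned char>(msg[i + 1]))) {
            // Read the full placeholder number after '%'.
            std::size_t j = i + 1;
            std::size_t n = 0;
            while (j < msg.size() &&
                   std::isdigit(static_cast<unsigned char>(msg[j]))) {
                n = n * 10 + static_cast<std::size_t>(msg[j] - '0');
                ++j;
            }
            if (n >= 1 && n <= args.size()) {
                out += args[n - 1];         // one-based: %1 -> args[0]
            } else {
                out.append(msg, i, j - i);  // no such argument: keep "%N" as-is
            }
            i = j - 1;                      // loop increment moves past the number
        } else {
            out += msg[i];
        }
    }
    return out;
}
```

Because placeholders are indices rather than positions, a translation may reorder them freely. For example, substitute("%2 before %1", {"a", "b"}) yields "b before a".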

"wrong argument count, have num1 need num2"
Either some of the arguments have not been provided, or there is a stray placeholder inside the message string. A likely cause is forgetting that in KDE4, arguments are added as parameters to the i18n call itself, rather than appended via arg() methods as in KDE3:
i18n("Found key: %1", key);      // correct
i18n("Found key: %1").arg(key);  // ***wrong
"too many arguments, have num max 9"
i18n calls can take at most 9 arguments as parameters to the call. If more are needed, the ki18n*() series of calls must be used; see the documentation of the KLocalizedString class. Calls needing more than 9 arguments are extremely rare, though.
"gaps in placeholder numbering, ..."
Except in plural i18n calls, there must be no gaps in the placeholder sequence, starting from %1. In plural calls, the placeholder of the first number (the one that determines the plural form) may be omitted, in both the singular and the plural string:
i18n("Line: %1 Column: %2", lineNo, colNo);  // correct
i18n("Line: %1 Column: %3", lineNo, colNo);  // ***wrong

i18np("Found a file in folder %2",
      "Found %1 files in folder %2", nfiles, folder);    // correct
i18np("Found a file in folder %2",
      "Found some files in folder %2", nfiles, folder);  // also correct
i18np("Found a file in folder %1",
      "Found some files in folder %1", folder, nfiles);  // ***wrong
"legacy %n placeholder in plural call"
This is a remnant from KDE3, where in plural i18n calls the argument determining the plural form had the special %n placeholder. In KDE4, all arguments have ordinary %number placeholders, as in the examples above (the plural form is decided by the lowest-numbered argument that is an integer).
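The argument-count and gap rules above can be modelled in a few lines of standard C++. This sketch is hypothetical and is not Krazy's own code. It encodes one reading of the rules:

- every argument from 1 to the argument count needs a placeholder;
- no placeholder may exceed the argument count;
- in plural calls, only %1 (taken here to be the plural-deciding number) may be omitted.

Each string of a plural call is checked separately.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

// Collect the numbers of all %N placeholders in a message.
std::set<int> placeholders(const std::string& msg) {
    std::set<int> found;
    for (std::size_t i = 0; i + 1 < msg.size(); ++i) {
        if (msg[i] != '%') continue;
        int n = 0;
        std::size_t j = i + 1;
        while (j < msg.size() &&
               std::isdigit(static_cast<unsigned char>(msg[j]))) {
            n = n * 10 + (msg[j] - '0');
            ++j;
        }
        if (j > i + 1) found.insert(n);  // at least one digit followed '%'
    }
    return found;
}

// Hypothetical check: placeholders must be exactly 1..argCount, except that
// in plural calls %1 may be left out.
bool placeholdersOk(const std::string& msg, int argCount, bool plural) {
    const std::set<int> found = placeholders(msg);
    for (int n : found)
        if (n < 1 || n > argCount) return false;  // stray placeholder or gap
    for (int n = 1; n <= argCount; ++n)
        if (!found.count(n) && !(plural && n == 1)) return false;  // argument unused
    return true;
}
```

Under this model, the plural examples behave as described above. "Found a file in folder %2" with two arguments passes. "Found some files in folder %1" with the arguments (folder, nfiles) fails because %2 is never used.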

Ambiguous Short Messages

English is a rather uninflected language compared to many others; a single English word can often be a noun, a verb, or an adjective without changing its form. This frequently causes problems for translators working into inflected languages when the original message is short, especially a single word. The solution is to add context to the message via the i18nc() call.

"single adjective as message, probably ambiguous; ..."
Words that can be treated as adjectives are especially prone to ambiguities. Therefore Krazy checks single-worded messages against a list of adjectives collected from the KDE codebase, and issues this warning if the matching message does not have a context. For example:
titleFinal = title.isEmpty() ?
             i18n("Unknown") : title; // ambiguous
titleFinal = title.isEmpty() ?
             i18nc("An unknown title", "Unknown") : title; // clarified
"reported ambiguous message by translators; ..."
There are other troublesome words, or even phrases, which translators have explicitly reported as ambiguous. This warning means that such a message was found without a context.
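A rough sketch of the single-adjective check follows. The adjective list is a tiny hypothetical stand-in for the list Krazy collects from the KDE codebase. An empty context stands for a plain i18n() call. The sketch also ignores details such as accelerator markers (&) and punctuation.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <set>
#include <string>

// Hypothetical stand-in for the adjective list; the real one is much longer.
const std::set<std::string> kAdjectives = {"unknown", "new", "empty", "normal", "custom"};

// Flag a context-less message that consists of a single listed adjective.
bool likelyAmbiguous(const std::string& context, const std::string& msg) {
    if (!context.empty()) return false;  // i18nc() call: already clarified
    // Only single-word messages are checked: non-empty, no whitespace.
    if (msg.empty() ||
        std::any_of(msg.begin(), msg.end(),
                    [](unsigned char c) { return std::isspace(c) != 0; }))
        return false;
    std::string lower;
    for (unsigned char c : msg) lower += static_cast<char>(std::tolower(c));
    return kAdjectives.count(lower) > 0;
}
```

With this sketch, likelyAmbiguous("", "Unknown") is true, while adding the context "An unknown title" (as i18nc() does) clears the warning.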

While you are adding contexts, consider also providing the appropriate KUIT context marker, which helps translators zero in on the meaning even further:

titleFinal = title.isEmpty() ?
             i18nc("@item:intable An unknown title", "Unknown") : title;
// way to go!

The ambiguity warning can also be issued for .ui, .rc and .kcfg files. In .ui files, text labels can have the comment attribute (accessible in Qt Designer as the "disambiguation" property of the label, or "comment" prior to Qt 4.5), which adds context in the same way as the first argument of the i18nc() call. In .rc and .kcfg files, contexts are added via the context attribute.

Number Formatting

Number-valued (integer or real) arguments to i18n messages are formatted automatically according to the language of the message, without the programmer's intervention. Using other methods to format numbers into strings may circumvent proper formatting for the language.

"use of QString::number() on an argument"
QString::number() should never be used to format "amount" numbers, because within KDE code it will do so using English conventions. However, sometimes the number is not an amount. For example, port number 15000 should not be formatted as "15,000" in English; use the <numid> KUIT tag in this situation:
i18n("Number of pages: %1", numPages); // good, localized amount format
i18n("Connected to port %1.", port);   // bad, amount format not desired
i18n("Connected to port %1.", QString::number(port)); // bad, not localized
i18n("Connected to port <numid>%1</numid>.", port);   // good
"use of KLocale::formatNumber() on an argument (...)"
A smarter way to format numbers is KLocale::formatNumber(), which honors the user's settings. However, the format then cannot depend on the language of the particular message (some applications may not have translations), so it is best avoided in i18n arguments. Use it for "live numbers", e.g. in spreadsheet tables and calculator displays, where the format should match the user's number-typing habits.
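To see why a port number should not go through amount formatting, the following hypothetical helpers contrast two kinds of output. formatAmount groups digits, which is what happens to an "amount" argument; the group separator depends on the language. formatId emits the digits as-is, which is what <numid> asks for. The separators ',' (English) and '.' (e.g. German) are examples only. Real formatting is done by the KDE localization libraries, not by code like this.

```cpp
#include <cassert>
#include <string>

// Hypothetical amount formatting: group digits in threes with the target
// language's group separator.
std::string formatAmount(unsigned long long n, char groupSep) {
    const std::string digits = std::to_string(n);
    std::string out;
    int count = 0;
    // Walk from the least significant digit, inserting a separator every 3 digits.
    for (auto it = digits.rbegin(); it != digits.rend(); ++it) {
        if (count > 0 && count % 3 == 0) out.insert(out.begin(), groupSep);
        out.insert(out.begin(), *it);
        ++count;
    }
    return out;
}

// Hypothetical identifier formatting: plain digits, no grouping.
std::string formatId(unsigned long long n) {
    return std::to_string(n);
}
```

A port number passed through formatAmount becomes "15,000" or "15.000" depending on the language, which is exactly the result <numid> is meant to prevent.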

Even when the complete message is a single number, it should be i18n'd, with a proper context:

result = QString::number(z);              // bad
result = i18nc("Atomic number", "%1", z); // good

When the number is to be formatted in a special way (field width, number of decimals, etc.) into the message, neither QString::number() nor KLocale::formatNumber() should be used either. Instead, use the ki18n*() series of calls with subs() methods (see the KLocalizedString documentation):

i18n("Percent complete: %1", QString::number(percent, 'f', 1));    // bad
ki18n("Percent complete: %1").subs(percent, 0, 'f', 1).toString(); // good
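What subs(percent, 0, 'f', 1) requests can be illustrated with a hypothetical analogue in standard C++. The request is field width 0, fixed ('f') notation and one decimal, followed by the language's decimal separator. In KDE the separator comes from the message's language via KLocalizedString; here it is simply a parameter, and the name formatFixed is made up for this sketch.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical analogue of subs(value, fieldWidth, 'f', precision).
// snprintf uses the "C" locale unless setlocale() was called, so it always
// produces '.', which is then swapped for the target language's separator.
std::string formatFixed(double value, int fieldWidth, int precision, char decimalSep) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%*.*f", fieldWidth, precision, value);
    std::string s(buf);
    const auto dot = s.find('.');
    if (dot != std::string::npos) s[dot] = decimalSep;
    return s;
}
```

The key point is that width, notation and precision are properties of the call, while the separator belongs to the language. This is why the real API keeps the formatting inside the i18n machinery instead of pre-formatting with QString::number().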

HTML and KUIT Markup

Every i18n message in KDE4 is effectively XML markup. HTML tags come from Qt's rich text and can be used only in rich-text capable widgets. KUIT tags are KDE4's new semantic markup, which should be preferred to HTML and can be used in any i18n message (whether the output is plain or rich text is decided on the basis of the context marker).

"malformed markup (unmatched tags, etc.)"
Since every message is XML, all tags must be properly closed: an opening <p> must have a matching closing </p>, and so on. This also holds for empty HTML elements like <br> and <hr>, which must be closed in place: <br/>, <hr/>.
A verbatim less-than sign opens a tag, which is not always what is meant. This can be avoided by using the predefined XML &lt; entity (other predefined entities are &gt;, &amp;, &apos;, and &quot;). However, for the frequent case of marking generic or user-replaceable text, it is better to use the <placeholder> KUIT tag:
i18n("headers go into <includes>");       // ***error in XML markup
i18n("headers go into &lt;includes&gt;"); // no markup problem, but...
i18n("headers go into <placeholder>includes</placeholder>"); // better
Since shortcut markers are so frequent, there is no need to write them as &amp;; some heuristics around XML parsing allow a naked & to be present. Basically, &amp; is needed only in the rare case of a &no_whitespace_sequence; pattern which is not meant as an XML entity.
"unclosed <br> ... use proper paragraphs <p>...</p> instead"
This just signals a particularly frequent markup problem, that of <br> not closed in place as <br/>. Also, <br> is sometimes used to split logical paragraphs (especially when doubled, <br><br>), where proper paragraph tags should be used instead.
"tag is neither KUIT nor HTML tag"
Some tag in the message is simply unknown and will not be understood at runtime. If it is not a typo, but the message really speaks about tags to the user (e.g. in an HTML editing application), use the &lt;foo&gt; pattern.
"HTML tag tag is not advised with KUIT markup"
When Krazy encounters the KUIT context marker in a message, it assumes that message is semantically tagged, and disapproves of some HTML tags which are best replaced with semantic equivalents. For example:
i18nc("@info:whatsthis",
      "...this <i>cannot</i> be undone.");               // Krazy complains
i18nc("@info:whatsthis",
      "...this <emphasis>cannot</emphasis> be undone."); // fine

i18nc("@info",
      "Really delete <b>%1</b>?", filename);               // complains
i18nc("@info",
      "Really delete <filename>%1</filename>?", filename); // fine
"tag tag1 cannot be subtag of tag2"
"tag tag has no att attribute"
"tag tag cannot have text content"
These are validity checks for KUIT markup, a somewhat relaxed form of formal XML validation. The rules on which KUIT tag can contain which, and so on, are given with the tag descriptions.
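The core of the "malformed markup" check is tag balancing. This hypothetical sketch is far simpler than a real XML parser. It tracks open tags on a stack, accepts self-closed tags such as <br/> and ignores attributes. It does not model entities or the & heuristics described above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Return true if every opening tag is closed, in order.
bool tagsBalanced(const std::string& msg) {
    std::vector<std::string> open;  // stack of currently open tag names
    std::size_t i = 0;
    while ((i = msg.find('<', i)) != std::string::npos) {
        const std::size_t end = msg.find('>', i);
        if (end == std::string::npos) return false;  // "<" never closed
        std::string tag = msg.substr(i + 1, end - i - 1);
        i = end + 1;
        if (tag.empty()) return false;
        if (tag.back() == '/') continue;             // self-closed, e.g. <br/>
        const bool closing = tag.front() == '/';
        std::string name = tag.substr(closing ? 1 : 0);
        name = name.substr(0, name.find(' '));       // drop attributes
        if (!closing) {
            open.push_back(name);
        } else {
            if (open.empty() || open.back() != name) return false;  // mismatched close
            open.pop_back();
        }
    }
    return open.empty();  // anything left open is an error
}
```

With this sketch, "headers go into <includes>" is rejected because <includes> is never closed. The &lt;includes&gt; and <placeholder> variants are accepted.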

KUIT Context Markers

Semantic context markers give translators a great deal of information about where and how the message is used at runtime. Providing them for all new messages is strongly encouraged, and adding them to existing messages when not in message freeze is welcome. Krazy helps with the following checks.

"missing KUIT context marker"
If Krazy detects that the number of messages in a source file equipped with KUIT context markers is above some threshold, it assumes that the developer's intention was to have all messages marked (as is recommended), and issues this warning for any unmarked message.
"invalid semantic role role"
"invalid interface subcue cue to role role"
"invalid visual format fmt"
Krazy checks that all ingredients of the context marker are defined, and that a particular combination of them is valid.
"expected context marker ctxmark1, got ctxmark2"
For messages used in some boilerplate situations, canonical context markers have been established. Krazy warns if the context marker of such a message, although valid by itself, is not the expected one. Such messages are encountered, for example, when setting up the KAboutData information.
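The following sketch shows how a context marker such as @item:intable can be split into role, interface subcue and visual format and then checked. The role/subcue table and the format list are a partial, illustrative subset, so consult the KUIT documentation for the authoritative lists. The names checkMarker and MarkerCheck are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Partial, illustrative subset of KUIT roles and their interface subcues.
const std::map<std::string, std::set<std::string>> kRoleCues = {
    {"action", {"button", "inmenu", "intoolbar"}},
    {"title",  {"window", "menu", "tab", "group", "column"}},
    {"option", {"check", "radio"}},
    {"label",  {"slider", "spinbox", "listbox", "textbox", "chooser"}},
    {"item",   {"inmenu", "inlistbox", "intable", "inrange", "intext"}},
    {"info",   {"tooltip", "whatsthis", "status", "progress", "credit"}},
};
const std::set<std::string> kFormats = {"plain", "rich"};

enum class MarkerCheck { NoMarker, Ok, BadRole, BadCue, BadFormat };

// Check the "@role:cue/format" token at the start of an i18nc() context.
MarkerCheck checkMarker(const std::string& context) {
    if (context.empty() || context[0] != '@') return MarkerCheck::NoMarker;
    // The marker runs from after '@' up to the first whitespace.
    std::string marker = context.substr(1, context.find_first_of(" \t") - 1);
    std::string format;
    const auto slash = marker.find('/');
    if (slash != std::string::npos) {
        format = marker.substr(slash + 1);
        marker.erase(slash);
    }
    std::string cue;
    const auto colon = marker.find(':');
    if (colon != std::string::npos) {
        cue = marker.substr(colon + 1);
        marker.erase(colon);
    }
    const auto role = kRoleCues.find(marker);
    if (role == kRoleCues.end()) return MarkerCheck::BadRole;
    if (colon != std::string::npos && role->second.count(cue) == 0)
        return MarkerCheck::BadCue;
    if (slash != std::string::npos && kFormats.count(format) == 0)
        return MarkerCheck::BadFormat;
    return MarkerCheck::Ok;
}
```

Each failure value corresponds to one of the warnings listed above: invalid role, invalid subcue for the role, or invalid visual format.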

UI files

When Krazy complains about an ambiguous message in a UI file, add a context to it through a comment="" attribute, either using Qt Designer or using a text editor.

Contact

For any questions or suggestions, Krazy i18n checks are presently maintained by Chusslove Illich <[email protected]>.