Development/Tutorials/Localization/i18n Semantics

From KDE TechBase

This article describes a system that is currently in the RFC phase and may still undergo design changes. Skip it unless you would like to make comments and proposals yourself.

Abstract

For a long time, the typical way of formatting user-visible strings in application interfaces has been plain text or, at most, visual markup such as HTML tags. In most environments with textual content, the shift to semantic markup has been recognized as superior to visual markup (for example, Docbook XML for documentation). Why not go down the same road for UI strings?

Semantic Markup by Examples

In the semantic model, text elements are marked for their meaning, rather than for their visual appearance. Consider a few i18n examples of visual formatting:

i18n("Move");

i18n("Descending");

i18n("<qt>%1 does not exist</qt>", fname);

i18n("<h3>History Sidebar</h3>"
     "You can configure the history sidebar here.");

Using KDE UI text semantic markup (KUIT for short), these strings would be formatted like this:

i18nc("@action", "Move");

i18nc("@item", "Descending");

i18nc("@info", "<filename>%1</filename> does not exist", fname);

i18nc("@info",
     "<title>History Sidebar</title>"
     "<para>You can configure the history sidebar here.</para>");

Two distinct differences between visual and KUIT markup can be observed.

The first is the use of the context-aware i18n call, i18nc(), to convey the broad semantic context of a string. "Move" has been assigned the @action context, which means that clicking on its widget in the interface will cause something to happen (e.g. a button text). The string "Descending" is in the @item context, which indicates that it is one of several possible choices (e.g. an item in a drop-down list).

The other difference is the use of semantic tags, which convey the meaning of the element. For example, the <filename>%1</filename> bit tells that the substituted text is the name of a file, and the <title> and <para> tags in the other message lay out a clear structure of longer informational texts.

Note
The semantic context @-markers can be added when working with Qt Designer too. Each text label of a widget has a comment attribute, which can be used in the same manner as the context argument of an i18nc() call.


The context markers do not prevent programmers from providing free-form context to translators. The free-form text is simply separated from the context marker by whitespace, like this:

i18nc("@item Sorting order", "Descending");
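
To make the split between marker and free-form context concrete, here is a minimal sketch in plain C++. The helper name splitContext is hypothetical and not part of the KDE API; it only illustrates the convention described above, under the assumption that the marker is the first whitespace-delimited word and starts with "@".

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper (not a KDE function): splits an i18nc() context
// string into the KUIT marker and the free-form note for translators.
std::pair<std::string, std::string> splitContext(const std::string &context)
{
    // A context that does not start with '@' carries no marker at all.
    if (context.empty() || context[0] != '@') {
        return {std::string(), context};
    }
    // The marker ends at the first whitespace character.
    const std::size_t gap = context.find_first_of(" \t");
    if (gap == std::string::npos) {
        return {context, std::string()};
    }
    // Everything after the whitespace run is the free-form note.
    const std::size_t noteStart = context.find_first_not_of(" \t", gap);
    if (noteStart == std::string::npos) {
        return {context.substr(0, gap), std::string()};
    }
    return {context.substr(0, gap), context.substr(noteStart)};
}
```

With this sketch, splitContext("@item Sorting order") yields the marker "@item" and the note "Sorting order", while a bare "@action" yields an empty note.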

Advantages of Semantic Markup

In general, KUIT markup has advantages both to users and to translators of the application.

For users, the use of semantic tags means consistent formatting of the same kinds of text. A notorious example of inconsistent visual formatting is filenames and paths, which are sometimes put in as is, sometimes in quotes (and ordinary quotes at that, rather than proper English fancy quotes), and sometimes in bold tags. Furthermore, the text within a tag may be modified according to its meaning; for example, the "/" path delimiters in <filename> text are converted to platform-specific ones.

Translators will benefit from both semantic context markers and tags. The "Move" string in the example above had the @action context, for which the translator may use the command form of the verb, while the gerund form (like "Moving") may be more appropriate in the @title context, which would be used if the string were the title of a menu, option group, etc. Tags will benefit translators in a similar way, as they may clear up the structure of the sentence, especially in the presence of placeholder substitutions.

The context markers also serve a technical purpose: they decide which form of visual formatting is used. For example, the @title context will use plain text, whereas @info will be formatted with HTML tags.

Last but not least, while contexts and tags may be one more thing for programmers to learn, they remove the burden of deciding which visual formatting to apply: "Should I put the path in quotes or <b>?", "Should the title be <h2> or <h3>?", and so on.

Context Markers

KUIT presently defines the following semantic contexts:

@action
Text of all clickable widgets that cause some action to be performed, like an operation on the data, view restructuring, or opening a dialog. Button texts and menu items (except submenus) all fall into this category.
@title
Text that is semantically a title in the interface. This includes window titles, menu titles, option group names and tab names.
@option
Text of yes/no or on/off logical choices. These are the labels of GUI checkboxes.
@item
Strings that can be considered one of a list of possible choices. Items in drop-down lists and combo boxes are the obvious case, but some menu items (like encoding selection or sort orderings) and especially radio buttons are included here as well. (Radio buttons are just another way to convey a list of possibilities.)
@label
Text of other widgets in the interface that are none of @action, @title, @option or @item. This includes labels of sliders, counters, font and color choosers, combo, edit and text boxes.
@info
Any general body of text for the user's information. These are the texts of message boxes, tooltips and "What's This" entries.
@process
Similar to @info, but more specific in that the strings are messages output by an ongoing process. For example, "Copying files..." in a file-copy progress dialog, or "Computing checksum..." in a CD burning application.

Semantic Tags

Terminal tags

The following KUIT tags are mostly-terminal, meaning that they will not admit any subtags (or just a few selected, where indicated):

<application>
Name of an application.

i18nc("@action", "Open with <application>%1</application>", appName);

<bcode>
Line-breaking body of code, for short listings.

i18nc("@info",
     "You can try the following snippet:<bcode>"
     "\\begin{equation}\n"
     "  C_{D_i} = \\frac{C_z^2}{e \\pi \\lambda}\n"
     "\\end{equation}"
     "</bcode>");

<command>
Name of a shell command or system call. The man section can be provided via the section attribute.

i18nc("@info", "This will call <command>%1</command> internally.", cmdName);

i18nc("@info",
     "Consult man entry for <command section='1'>%1</command>", cmdName);

<email>
Email address. Without attributes, the tag text is the address. The address can also be given with the address attribute, in which case the tag text is a name or description attached to the address.

i18nc("@info", "Send bug reports to <email>%1</email>.", emailNull);

i18nc("@info",
     "Send praises to <email address='%1'>the author</email>.", emailMy);

The construct will be hyperlinked in rich text format.

<emphasis>
Emphasize a word or phrase in the text.

i18nc("@process", "Checking <emphasis>feedback</emphasis> circuits...");

<envar>
Environment variable. The $ sign will be prepended automatically in formatted text.

i18nc("@info", "Ensure that your <envar>PATH</envar> is properly set.");

<filename>
File or folder name or path. The path separators will be transformed into what is native to the platform.

i18nc("@info", "Cannot read <filename>%1</filename>.", filename);

i18nc("@info",
     "<filename><envar>HOME</envar>/.foorc</filename> does not exist.");

The <envar> tag can be used as a subtag.
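
The two transformations described for <envar> and <filename> can be sketched in plain C++ as follows. Both function names are hypothetical and only illustrate the described behaviour; this is not how KDE implements it, and the native separator is passed in explicitly rather than detected from the platform.

```cpp
#include <algorithm>
#include <string>

// Hypothetical illustration of <envar>: the '$' sign is prepended.
std::string formatEnvar(const std::string &name)
{
    return "$" + name;
}

// Hypothetical illustration of <filename>: every '/' delimiter is
// replaced by the separator native to the target platform.
std::string formatFilename(std::string path, char nativeSeparator)
{
    std::replace(path.begin(), path.end(), '/', nativeSeparator);
    return path;
}
```

For example, formatFilename(formatEnvar("HOME") + "/.foorc", '\\') would give a backslash-separated path on a platform that uses backslashes, while passing '/' leaves the path unchanged.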

<icode>
Inline code, like shell command lines.

i18nc("@info", "Executes <icode>svn merge</icode> with given revisions.");

<interface>
Path to GUI interface element. Use "/", "|" or "->" to delimit elements, which will be converted into canonical form.

i18nc("@info",
     "The line colors can be changed under "
     "<interface>Settings->Visuals</interface>.");

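As an illustration of how the three accepted delimiters of <interface> could be reduced to a single form, here is a minimal sketch in plain C++. The function names are hypothetical, and since this article does not say what the canonical form looks like, the output separator is a parameter whose default "->" is purely an assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits an interface path on "/", "|" or "->".
std::vector<std::string> splitInterfacePath(const std::string &path)
{
    std::vector<std::string> parts;
    std::string current;
    for (std::size_t i = 0; i < path.size(); ++i) {
        const bool arrow =
            path[i] == '-' && i + 1 < path.size() && path[i + 1] == '>';
        if (path[i] == '/' || path[i] == '|' || arrow) {
            parts.push_back(current);
            current.clear();
            if (arrow) {
                ++i; // also skip the '>' of the "->" delimiter
            }
        } else {
            current += path[i];
        }
    }
    parts.push_back(current);
    return parts;
}

// Joins the elements with one separator (assumed, not KDE's actual choice).
std::string canonicalInterfacePath(const std::string &path,
                                   const std::string &separator = "->")
{
    const std::vector<std::string> parts = splitInterfacePath(path);
    std::string result;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i > 0) {
            result += separator;
        }
        result += parts[i];
    }
    return result;
}
```

A lone hyphen, as in a hypothetical element named "Auto-save", is kept as part of the element, because only the two-character sequence "->" is treated as a delimiter.
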
<link>
Link to a URL-addressable resource. Without attributes, the tag text is the URL; alternatively, URL can be given by url attribute, and then the tag text serves as description.

i18nc("@info", "Check the <link>%1</link> website.", urlKDE);

i18nc("@info", "Check <link url='%1'>the KDE website</link>.", urlKDE);

The variant with URL/description separation is preferred when applicable. The construct will be hyperlinked in rich text format.

<message>
An external message to be reported to the user.

i18nc("@info", "Fortune cookie says: <message>%1</message>", trouble);

<numid>
By default, numbers supplied as arguments to i18n calls are formatted into localized form. If the number is supposed to be a numeric identifier instead, like a port number, use this tag to signal numeric-id context.

i18nc("@process", "Connecting to <numid>%1</numid>...", portNo);
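
The difference between a localized number and a numeric identifier can be sketched in plain C++ like this. The function names are hypothetical, and the comma used for digit grouping is an assumption standing in for whatever the user's locale prescribes; the real i18n subsystem does its own locale-aware formatting.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for localized formatting: groups digits in threes.
// (The most negative long long value is not handled in this sketch.)
std::string groupThousands(long long value, char separator = ',')
{
    std::string digits = std::to_string(value < 0 ? -value : value);
    for (int pos = static_cast<int>(digits.size()) - 3; pos > 0; pos -= 3) {
        digits.insert(static_cast<std::size_t>(pos), 1, separator);
    }
    return value < 0 ? "-" + digits : digits;
}

// A numeric identifier (as marked by <numid>) is printed digit for digit,
// while an ordinary number goes through the grouping above.
std::string formatArgument(long long value, bool isNumericId)
{
    return isNumericId ? std::to_string(value) : groupThousands(value);
}
```

Under these assumptions, a port number 8080 passed as an ordinary argument would be shown with a grouping separator, which is exactly what <numid> is meant to prevent.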

<placeholder>
A placeholder text, either something to be replaced by the user, or a generic item in a list.

i18nc("@info", "Replace <placeholder>name</placeholder> with your name.");

i18nc("@item", "<placeholder>All images</placeholder>");

<resource>
General named resource. Names of documents, sessions, projects, toolbars, plugins, schemes and themes, accounts, etc.

i18nc("@info", "Apply color scheme <resource>%1</resource>?", colScheme);

<shortcut>
Combination of keys to press. Separate the keys by "+" or "-", and the shortcut will be converted into canonical form.

i18nc("@info",
     "Cycle through layouts by <shortcut>Alt+Space</shortcut>.");

Sentence tags

Some sentences can be given a special meaning, by using the sentence tags. These tags will admit any terminal tags for subtags.

<note>
The sentence is a side note of significance to the topic.

i18nc("@info",
     "Probably the best known of all duck species is the Mallard. "
     "It breeds throughout the temperate areas around the world. "
     "<note>Most domestic ducks are derived from Mallard.</note>");

Do not add "Note:" explicitly; it will be added automatically by the formatting.

<warning>
The sentence is a warning.

i18nc("@info",
     "Really delete this key? "
     "<warning>This cannot be undone.</warning>");

Do not add "Warning:" explicitly; it will be added automatically by the formatting.

Structuring tags

For structuring longer texts, the following tags are available:

<para>
Text paragraph.
<title>
The title of the text. Must be the first tag if present, but can be omitted.
<subtitle>
Subtitle in the text. Must be followed by at least one <para>.
<list>
List of items. Can contain only <item> as subtags. List is considered an element of the paragraph, so the <list> must be found inside <para>.
<item>
List item.

Structuring tags (other than <list>) can contain any terminal or sentence tags.

If any of the structuring tags is present, then there must be no text outside of these elements. The following is not valid KUIT markup:

// invalid markup
i18nc("@info",
     "<title>History Sidebar</title>"
     "You can configure the history sidebar here."); // <para> missing

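The rule about text outside structuring elements can be checked mechanically. The following plain C++ sketch is a hypothetical checker, not KDE's validator. It assumes well-formed tags without attributes and treats <title>, <subtitle> and <para> as the top-level structuring tags, since <list> and <item> must appear inside <para>.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Top-level structuring tags considered by this sketch.
bool isStructuringTag(const std::string &name)
{
    return name == "title" || name == "subtitle" || name == "para";
}

// Returns true if structuring tags are used and some non-whitespace
// text lies outside all of them.
bool hasTextOutsideStructure(const std::string &markup)
{
    int depth = 0;             // nesting depth of structuring tags only
    bool sawStructure = false;
    bool sawStrayText = false;
    std::size_t i = 0;
    while (i < markup.size()) {
        if (markup[i] == '<') {
            const std::size_t end = markup.find('>', i);
            if (end == std::string::npos) {
                break; // unterminated tag: stop scanning
            }
            const bool closing = markup[i + 1] == '/';
            const std::size_t start = i + (closing ? 2 : 1);
            const std::string name = markup.substr(start, end - start);
            if (isStructuringTag(name)) {
                sawStructure = true;
                depth += closing ? -1 : 1;
            }
            i = end + 1;
        } else {
            const unsigned char c = static_cast<unsigned char>(markup[i]);
            if (depth == 0 && !std::isspace(c)) {
                sawStrayText = true;
            }
            ++i;
        }
    }
    return sawStructure && sawStrayText;
}
```

A string that uses no structuring tags at all, such as a single sentence with a <filename> tag, is not flagged, because the rule only applies once a structuring tag is present.
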
Qt's rich text HTML tags can be used concurrently with KUIT tags, but this is not advised unless necessary. They may be needed, for example, to create tables or insert images, as KUIT does not implement this functionality at the moment.

Limitations to Use of Semantic Markup

Semantic markup cannot be used in "dumb" strings, which do not pass through KDE's i18n subsystem, such as strings in .desktop files. Strings in UI files are not affected by this limitation: in Qt Designer they can be given both context markers (via the comment field of text properties) and semantic tags.

Sometimes the default visual formatting is not appropriate for the output device. For example, if the @info context is applied to a string that is output to a terminal window, it will come out with HTML tags. To handle this, the format can be stated explicitly by adding a /format modifier to the context marker:

i18nc("@info/plain", "<filename>%1</filename> does not exist", fname);

Presently, the possible format modifiers are /plain and /rich.
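
How a format modifier might override the default of a context marker can be sketched in plain C++. The names Format and resolveFormat are hypothetical. Only two defaults are stated in this article (@title is plain, @info is rich); treating every other marker as plain, and ignoring unknown modifiers, are assumptions made for this sketch.

```cpp
#include <cstddef>
#include <string>

enum class Format { Plain, Rich };

// Hypothetical helper: picks the visual format for a context marker
// such as "@info", "@info/plain" or "@title/rich".
Format resolveFormat(const std::string &marker)
{
    const std::size_t slash = marker.find('/');
    if (slash != std::string::npos) {
        // An explicit modifier wins over the default of the marker.
        const std::string modifier = marker.substr(slash + 1);
        if (modifier == "plain") {
            return Format::Plain;
        }
        if (modifier == "rich") {
            return Format::Rich;
        }
    }
    // No usable modifier: fall back to the default of the marker.
    const std::string role = marker.substr(0, slash);
    return role == "@info" ? Format::Rich : Format::Plain;
}
```

With this sketch, "@info" resolves to rich text, while "@info/plain" resolves to plain text, matching the terminal-output example above.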