
    From KDE TechBase
    Avoiding Common Localization Pitfalls
    Tutorial Series: Localization
    Previous: Writing Applications With Localization in Mind
    What's Next: Incorporating i18n Into the Build System

    Abstract

    There are a few common pitfalls that prevent applications from being properly translated or otherwise localized. These include pixel-based layouts, "word puzzles", code that does not handle Unicode characters properly, and assumptions about text flow. This tutorial covers each of these issues, explaining what to avoid and how to do it right.


    Pitfall #1: Pixel-Based Layouts

    English text is often very compact compared to other languages, in which translations are frequently substantially longer. The interface must therefore be able to resize itself to fit the translations loaded at runtime. If it cannot, messages end up misaligned or truncated.

    The answer is to use layout managers. Qt provides a number of ready-made layout managers, including QHBoxLayout, QVBoxLayout, QGridLayout and QStackedLayout, all of which are subclasses of QLayout. You can also write your own QLayout-based classes, but this is rarely necessary.

    These layout classes manage the pixel positioning of widgets for you at runtime, so your interface adjusts properly no matter how long the translated strings are. For more information, see the QLayout documentation.
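    The difference can be illustrated with a deliberately simplified model in plain C++. This is not Qt code and not how QLayout computes geometry: real layout managers ask each widget for its size hint, which for a label depends on font metrics. The sketch assumes every character is exactly 7 px wide, and all function names in it are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: real layout managers ask widgets for size hints based on
// font metrics. Here every character is ASSUMED to be exactly 7 px wide, and
// text.size() counts bytes, so the model only makes sense for ASCII text.
constexpr int kAssumedCharWidthPx = 7;

// Width a label would need to show all of its text.
int preferredWidth(const std::string& text) {
    return static_cast<int>(text.size()) * kAssumedCharWidthPx;
}

// Pixel-based approach: the label is always boxWidthPx wide, whatever the
// translation. Returns how many characters are still visible.
std::size_t visibleCharsInFixedBox(const std::string& text, int boxWidthPx) {
    if (boxWidthPx <= 0) return 0;
    const std::size_t fits = static_cast<std::size_t>(boxWidthPx / kAssumedCharWidthPx);
    return text.size() < fits ? text.size() : fits;
}

// Layout-manager approach: place items left to right, each as wide as its
// text needs, separated by spacingPx. Returns the x offset of each item.
std::vector<int> layoutHorizontally(const std::vector<std::string>& texts, int spacingPx) {
    std::vector<int> xs;
    int x = 0;
    for (const std::string& t : texts) {
        xs.push_back(x);
        x += preferredWidth(t) + spacingPx;
    }
    return xs;
}
```

    With a fixed 70 px box, visibleCharsInFixedBox("Speichern unter", 70) returns 10, so the last five characters of the German "Save As" label are cut off, while layoutHorizontally() simply moves the next item further to the right.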

    Pitfall #2: Word Puzzles

    Another pitfall is concatenating fragments of sentences, like this:

    // Don't do this: the sentence is split into fragments
    QString msg = i18n("Do you want to replace ") +
                  oldFile + i18n(" with ") +
                  newFile + "?";
    

    Such "word puzzles" are very hard or even impossible to translate, because the structure of the sentence is often completely different in another language and must therefore be controlled by the translator. When the order of words and phrases is hard-coded, as in the example above, the translator cannot create a proper translation.

    Worse, the translator sees only fragments of the sentence while translating and has to guess which parts belong together.

    Thankfully, the solution is quite simple: use %number placeholder substitution. Because translators see the entire sentence, they can produce good translations, and they can also change the order of the arguments freely. The arguments themselves are passed as extra parameters to i18n().

    Written properly, the example above looks like this:

    QString msg = i18n("Do you want to replace %1 with %2?",
                       oldFile, newFile);
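    To see why placeholders give translators this freedom, here is a much simplified substitution function in plain C++. substitute() is a hypothetical helper, not KDE's implementation of i18n(); it only replaces each %N with the N-th argument, so the position of %1 and %2 in the (translated) string decides where the arguments end up.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for i18n()'s %N handling (not KDE code).
// Each "%N" with 1 <= N <= args.size() is replaced by args[N - 1];
// anything else, including out-of-range placeholders, is copied unchanged.
std::string substitute(const std::string& pattern, const std::vector<std::string>& args) {
    std::string out;
    std::size_t i = 0;
    while (i < pattern.size()) {
        if (pattern[i] == '%' && i + 1 < pattern.size() &&
            std::isdigit(static_cast<unsigned char>(pattern[i + 1]))) {
            // Read the full placeholder number after '%'.
            std::size_t j = i + 1;
            std::size_t n = 0;
            while (j < pattern.size() && std::isdigit(static_cast<unsigned char>(pattern[j]))) {
                n = n * 10 + static_cast<std::size_t>(pattern[j] - '0');
                ++j;
            }
            if (n >= 1 && n <= args.size()) {
                out += args[n - 1];
                i = j;
                continue;
            }
        }
        out += pattern[i];  // ordinary character (or unmatched '%')
        ++i;
    }
    return out;
}
```

    Given a plausible German rendering such as "Möchten Sie %2 anstelle von %1 verwenden?" ("Do you want to use %2 instead of %1?"), substitute() with the arguments {"a.txt", "b.txt"} yields "Möchten Sie b.txt anstelle von a.txt verwenden?": the translator reordered the arguments without any change to the code.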
    

    It is a good idea to always explain what each %number placeholder means, because in some languages the translation depends on what it contains. For example, a preposition might have to be replaced by a specific grammatical form of the noun that describes the placeholder's content. It is therefore important to know that %1 refers to a file and not to a folder or something else.

    It is even better to write the example more explicitly:

    QString msg = i18n("Do you want to replace file %1 with file %2?",
                       oldFile, newFile);
    

    Or like this:

    QString msg = i18nc("%1 and %2 are file names",
                        "Do you want to replace %1 with %2?",
                        oldFile, newFile);
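    KDE translations are based on gettext-style message catalogs, in which the context string given to i18nc() is shown to the translator and becomes part of the key that identifies a message. The toy model below (a std::map, not a real catalog file; lookup() and makeGermanCatalog() are hypothetical names, and the second context is invented for illustration) shows the consequence: the same English text can receive one translation when it is about files and another when it is about folders.

```cpp
#include <map>
#include <string>
#include <utility>

// Toy model of a gettext-style catalog: NOT a real catalog file or KDE API.
// Each entry is identified by the pair (context, English message).
using CatalogKey = std::pair<std::string, std::string>;
using Catalog = std::map<CatalogKey, std::string>;

// Hypothetical lookup: returns the translation for (context, msgid), or the
// English msgid itself when no translation exists, as gettext does.
std::string lookup(const Catalog& catalog, const std::string& context,
                   const std::string& msgid) {
    auto it = catalog.find(CatalogKey{context, msgid});
    return it != catalog.end() ? it->second : msgid;
}

// Example German catalog: the same English message gets two entries,
// because the contexts tell the translator whether files or folders are meant.
Catalog makeGermanCatalog() {
    const std::string msg = "Do you want to replace %1 with %2?";
    return Catalog{
        {{"%1 and %2 are file names", msg}, "Datei %1 durch %2 ersetzen?"},
        {{"%1 and %2 are folder names", msg}, "Ordner %1 durch %2 ersetzen?"},
    };
}
```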
    

    It is also possible to put a comment above the message, although this is not common practice in KDE:

    /* TRANSLATORS: Replacing an old file (%1) with a new one (%2). */
    QString msg = i18n("Do you want to replace %1 with %2?",
                       oldFile, newFile);
    
    Note
    Avoid inserting anything other than numbers or nouns through placeholders, since in some languages the rest of the translation depends on the inserted words. It is therefore best to make each string as close to a complete sentence as possible.


    A related mistake is leaving rich-text markup, such as <b></b> or <i></i>, out of the translatable string. Not all languages use such markup the same way English does, so translators must be able to "translate" the markup as well.

    Similarly, a version string or any other frequently changing part of a message should be inserted through a placeholder. This avoids needless changes to the message itself, each of which forces translators to update their translations.

    Since KDE is translated into more than 65 languages, a single string change causes at least 65 people to open the file, find the changed message, check carefully whether that is the only change, update the translation, save the file and commit it to the code repository. All in all, such a small change can cost hours of work that could easily have been avoided.

    Pitfall #3: Lack of Unicode Support

    Whenever source code handles strings using a data type (such as char) or class (such as std::string) that is not Unicode-aware, translations will break.

    To avoid this, never call QString::toLatin1() or QString::toAscii() (latin1() and ascii() in Qt 3) on translated strings. The same applies to strings that come from user input, such as passwords, URLs and file names. If you really need a plain char* representation of a string, use QString::toUtf8() (utf8() in Qt 3) instead.
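    The plain C++ sketch below shows what goes wrong; std::u32string stands in for QString, and toLatin1Lossy() and toUtf8() are hypothetical helpers, not Qt functions. Qt does not guarantee what a Latin-1 conversion produces for characters outside Latin-1; this sketch assumes they turn into '?', which is one possible outcome. UTF-8, in contrast, can encode every Unicode code point.

```cpp
#include <string>

// Hypothetical Latin-1 conversion: code points above U+00FF cannot be
// represented, and this sketch ASSUMES they are replaced by '?'.
std::string toLatin1Lossy(const std::u32string& text) {
    std::string out;
    for (char32_t c : text)
        out += (c <= 0xFF) ? static_cast<char>(c) : '?';
    return out;
}

// Hypothetical UTF-8 encoder: every code point survives, using 1 to 4 bytes.
// Assumes valid Unicode scalar values (no surrogate or range checks).
std::string toUtf8(const std::u32string& text) {
    std::string out;
    for (char32_t c : text) {
        if (c < 0x80) {                      // 1 byte: ASCII
            out += static_cast<char>(c);
        } else if (c < 0x800) {              // 2 bytes
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {            // 3 bytes
            out += static_cast<char>(0xE0 | (c >> 12));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else {                             // 4 bytes
            out += static_cast<char>(0xF0 | (c >> 18));
            out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

    Converting the Russian word "Привет" with toLatin1Lossy() yields "??????", and the original text cannot be recovered from that; toUtf8() keeps all six characters, using two bytes for each.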

    Note
    For more information on character sets and Unicode, see the Unicode tutorial.


    KIO slaves may also provide paths and file names encoded in UTF-8. It is up to the programmer, however, to pass properly encoded file names to any KIO method. The correct way to do this is not to guess the user's file system encoding, but to use QFile::encodeName() and QFile::decodeName().

    Tip
    You can turn KIO's UTF-8 file name support on for testing by exporting the KDE_UTF8_FILENAMES environment variable in your shell's startup file (e.g. ~/.bashrc).


    Pitfall #4: Complex Text Flow

    When designing an application that needs non-standard text flow, don't assume that the same rules apply to all languages. Take vertical writing as an example: East Asian languages that use Chinese characters have a long history of vertical writing, even longer than that of horizontal writing. Strings are not simply rotated by 90 degrees; instead, individual characters are placed one below another. Other scripts may behave differently again, so expect to need specialized implementations.

    Success!

    If you avoid the four common categories of pitfalls described in this tutorial, your application should be fully localizable by the KDE translation teams around the world, opening it up to the majority of people on the planet.