Jump to content

Development/Tutorials/Localization/i18n Build Systems

From KDE TechBase
Revision as of 20:51, 29 January 2015 by Albert Astals Cid (talk | contribs) (We don't have svn2dist around anymore)
Building KDE's l10n Module
Tutorial Series   Localization
Previous   Writing Applications With Localization in Mind
What's Next   Dealing with language changes
Further Reading   n/a

Abstract

Now that your application is ready to be localized, we next look at how to incorporate the necessary mechanisms into the CMake build system of your application.

First we'll explain the "theory" of the steps that need to happen from extracting message strings to installing the generated .po files. After that we'll look at how to implement those steps (#Theory: The xgettext toolchain). If your application is developed in KDE's repositories, many of those steps will happen automatically (see #Handling i18n in KDE's repositories). Else you will want to read on in the #Handling i18n outside KDE's repositories section.

Theory: The xgettext toolchain

Making translations work consists of the following steps:

  1. Extract the translatable strings
  2. Merge the new or changed strings with existing translations
  3. Compile the translations into message catalogs
  4. Install the message catalogs
  5. Use the message catalogs in the application

Extracting the strings

In this step, all the strings you marked as i18n()/ki18n()/etc. in your sources need to be collected into a translation template (.pot) file. Some translateable strings are also contained in .ui, .rc, or .kcfg files. Also tips-of-the-day need to be collected into the .pot file.

These steps are handled by xgettext, extractrc, and preparetips programs, respectively. In some cases you may also need extractattr.

Merging translations

Generally, only a few translatable string will change at a time, in an application. Some will be removed, some will be added, some will be changed into different strings, and some will be moved around in the source files. These changes need to be reflected in the translations, but of course it would be a huge effort to redo the entire translation every time a string was changed. Rather the changes should be merged with the existing translations. This is the job of the msgmerge tool.

Compiling the translations

In order to make message lookup fast, the .po files need to be compiled into so-called "message catalogs" (.mo / .gmo). This is done using the msgfmt tool.

Installing the message catalogs

The compiled message catalogs need to be installed alongside the application. In KDE, the standard location for message catalogs is $KDEDIR/share/locale/xx/LC_MESSAGES/.

Using the message catalogs

Finally, when the application is run, it needs to load the appropriate message catalog in order to be able to look up and show translated strings. In KDE applications, this is the job of the KLocale class, and in the great majority of cases happens automatically.

For some special cases, such as plugins, look at #Runtime Loading Of Catalogs.

Handling i18n in KDE's repositories

If your application is developed inside KDE's subversion repository, most of the steps outlined above are automated. In this case, generally, all you will need to do is provide a simple script called Messages.sh, which we will look at below.

Of course, for the curious, a more detailed account of what happens behind the scenes is also provided.

Writing a Messages.sh script

Basically, the only thing that is necessary to prepare and install translations for applications in KDE's subversion repository, is to provide information, which sources, ui-files or tips need to be translated. For this purpose, you write a small script called Messages.sh and place it in your sources. Here is an example with inline comments:

#!bin/sh

# invoke the extractrc script on all .ui, .rc, and .kcfg files in the sources
# the results are stored in a pseudo .cpp file to be picked up by xgettext.
$EXTRACTRC `find . -name \*.rc -o -name \*.ui -o -name \*.kcfg` >> rc.cpp
# if your application contains tips-of-the-day, call preparetips as well.
$PREPARETIPS > tips.cpp
# call xgettext on all source files. If your sources have other filename
# extensions besides .cc, .cpp, and .h, just add them in the find call.
$XGETTEXT `find . -name \*.cc -o -name \*.cpp -o -name \*.h -name \*.qml` -o $podir/APPNAME.pot

As you can see, this script contains only three actual lines of code, and not all may even be needed. The $XGETTEXT, $PREPARETIPS, $EXTRACTRC, and $podir environment variables are predefined, you do not need to worry about setting these. The only thing that you will need to do is to replace "APPNAME" with the name of your application (but see #Naming .pot Files for exceptions).

Translatable strings are automatically extracted from C++ files. For non-C++ files, the following scripts are used

  • $EXTRACTRC - Extract translatable strings from xml configuration and ui files.
  • $EXTRACT_GRANTLEE_TEMPLATE_STRINGS - Extract translatable strings from Grantlee template files.

Try to make sure your Messages.sh script collects only those messages that are really needed. If in doubt, look at other Messages.sh files in the KDE subversion repository for inspiration, or -- of course -- ask.

What's happening behind the scenes

Tip
If your application is in the KDE code repository, all this happens automatically. If your code is not in the KDE repository, you must perform these steps yourself. See #handling i18n in third party applications.


Periodically, the script extract-messages.sh (a.k.a. scripty) runs on the KDE server. This program basically calls all Messages.sh scripts in the repository with the appropriate parameters. The extracted messages are stored in template (.pot) files in the templates folder of the l10n module. See What Is Scripty for more information.

The KDE translation teams translate the messages and commit them into a messages folder corresponding to the language, which is further broken down by module. For example, German translated messages for konqueror are committed to l10n/de/messages/kdebase/konqueror.po.

When the l10n module is built, the .po files are compiled into a binary format for fast lookup and installed as .mo files to $KDEDIR/share/locale/xx/LC_MESSAGES/, where xx is the two-letter ISO 639 code for the language. These are called the message catalogs.

At runtime, the i18n(...) function, using the original string you coded, looks up the string in the message catalog of the user's desktop language and returns the translated string. If the message catalog is missing, or the specific string is not found, i18n(...) falls back to the original string in your code.

.desktop files in your project are handled separately. makemessages extracts strings, such as Name and Comment from the .desktop files and places them into a file named desktop_mmmm.pot, where mmmm is the module name, in the templates folder. Once translators have translated this file, makemessages inserts the translated strings back into the .desktop files. The list of strings extracted is in l10n/scripts/apply.cc. Here's the code that checks for them:

if (checkTag("Name", in, argc, argv, newFile))
    continue;
if (checkTag("Comment", in, argc, argv, newFile))
    continue;
if (checkTag("Language", in, argc, argv, newFile))
    continue;
if (checkTag("Keywords", in, argc, argv, newFile))
    continue;
if (checkTag("About", in, argc, argv, newFile))
    continue;
if (checkTag("Description", in, argc, argv, newFile))
    continue;
if (checkTag("GenericName", in, argc, argv, newFile))
    continue;
if (checkTag("Query", in, argc, argv, newFile))
    continue;
if (checkTag("ExtraNames", in, argc, argv, newFile))
    continue;

Testing your Messages.sh script

To test your Messages.sh script, you need a checkout of the l10n scripts. To get one:

   svn checkout svn://anonsvn.kde.org/home/kde/trunk/l10n-kde4/scripts

You can then go to your project dir and run:

   mkdir po
   PATH=/path/to/l10n-kde4/scripts:$PATH bash /path/to/l10n-kde4/scripts/extract-messages.sh

When it is done, the po/ dir should contain the generated .pot files.

Handling i18n outside KDE's repositories

If your application is developed outside of KDE's repositories, you need to take care of generating and installing message catalogs yourself. This is not too hard to do, but you should make sure you have a basic understanding of all steps involved, so you will probably want to read #Theory: The xgettext toolchain, first.

Also, there is more than one way to deal with i18n in your application. We will outline one approach to do so for simple applications, but you may want to diverge from this, if you like to.

General considerations

The i18n generation process can roughly be divided into two steps:

  1. Extracting and merging messages
  2. Compiling and installing message catalogs

The first step really concerns the sources, while the second step is a natural part of compiling and installing an application. Hence, while the second step should definitely be incorporated into the build system for your application (we assume CMake, here), there is no striking reason for the first step to be handled by the build system.

In fact, the extraction and merging of messages does not map well onto the CMake concept of out-of-source builds, and CMake can't provide too much help for this step, either.

Hence, in this tutorial, we will handle the first step with a standalone shell script, and only the second step will be handled by CMake.

Extracting and merging messages

If you have read the section #handling i18n in KDE's subversion repository you will know that in the KDE repository, message extraction and merging is handled by a simple script called Messages.sh, which is invoked by a script called extract-messages.sh. Outside of KDE's repository, you will need to take care of both parts, but fortunately, this is easy enough. Here's a sample script:

#!/bin/sh
BASEDIR="../rkward/"	# root of translatable sources
PROJECT="rkward"	# project name
BUGADDR="http://sourceforge.net/tracker/?group_id=50231&atid=459007"	# MSGID-Bugs
WDIR=`pwd`		# working dir


echo "Preparing rc files"
cd ${BASEDIR}
# we use simple sorting to make sure the lines do not jump around too much from system to system
find . -name '*.rc' -o -name '*.ui' -o -name '*.kcfg' | sort > ${WDIR}/rcfiles.list
xargs --arg-file=${WDIR}/rcfiles.list extractrc > ${WDIR}/rc.cpp
# additional string for KAboutData
echo 'i18nc("NAME OF TRANSLATORS","Your names");' >> ${WDIR}/rc.cpp
echo 'i18nc("EMAIL OF TRANSLATORS","Your emails");' >> ${WDIR}/rc.cpp
cd ${WDIR}
echo "Done preparing rc files"


echo "Extracting messages"
cd ${BASEDIR}
# see above on sorting
find . -name '*.cpp' -o -name '*.h' -o -name '*.c' | sort > ${WDIR}/infiles.list
echo "rc.cpp" >> ${WDIR}/infiles.list
cd ${WDIR}
xgettext --from-code=UTF-8 -C -kde -ci18n -ki18n:1 -ki18nc:1c,2 -ki18np:1,2 -ki18ncp:1c,2,3 -ktr2i18n:1 \
	-kI18N_NOOP:1 -kI18N_NOOP2:1c,2 -kaliasLocale -kki18n:1 -kki18nc:1c,2 -kki18np:1,2 -kki18ncp:1c,2,3 \
	--msgid-bugs-address="${BUGADDR}" \
	--files-from=infiles.list -D ${BASEDIR} -D ${WDIR} -o ${PROJECT}.pot || { echo "error while calling xgettext. aborting."; exit 1; }
echo "Done extracting messages"


echo "Merging translations"
catalogs=`find . -name '*.po'`
for cat in $catalogs; do
  echo $cat
  msgmerge -o $cat.new $cat ${PROJECT}.pot
  mv $cat.new $cat
done
echo "Done merging translations"


echo "Cleaning up"
cd ${WDIR}
rm rcfiles.list
rm infiles.list
rm rc.cpp
echo "Done"

Of course you will want to adjust the variable definitions at the top, and -- if needed -- add code to extract tips-of-the-day or other additional strings.

The example script assumes that all .po files and the .pot file are kept in a single directory, which is appropriate for most projects.

Compiling and installing message catalogs

Assuming you use the script from the previous section, you can place the following CMakeLists.txt in the directory containing the .po files:

FIND_PROGRAM(GETTEXT_MSGFMT_EXECUTABLE msgfmt)

IF(NOT GETTEXT_MSGFMT_EXECUTABLE)
	MESSAGE(
"------
                 NOTE: msgfmt not found. Translations will *not* be installed
------")
ELSE(NOT GETTEXT_MSGFMT_EXECUTABLE)

        SET(catalogname rkward)

        FILE(GLOB PO_FILES *.po)
        SET(GMO_FILES)

        FOREACH(_poFile ${PO_FILES})
                GET_FILENAME_COMPONENT(_poFileName ${_poFile} NAME)
                STRING(REGEX REPLACE "^${catalogname}_?" "" _langCode ${_poFileName} )
                STRING(REGEX REPLACE "\\.po$" "" _langCode ${_langCode} )

                IF( _langCode )
                        GET_FILENAME_COMPONENT(_lang ${_poFile} NAME_WE)
                        SET(_gmoFile ${CMAKE_CURRENT_BINARY_DIR}/${_lang}.gmo)

                        ADD_CUSTOM_COMMAND(TARGET ${_gmoFile}
                                COMMAND ${GETTEXT_MSGFMT_EXECUTABLE} --check -o ${_gmoFile} ${_poFile}
                                DEPENDS ${_poFile})
                        INSTALL(FILES ${_gmoFile} DESTINATION ${LOCALE_INSTALL_DIR}/${_langCode}/LC_MESSAGES/ RENAME ${catalogname}.mo)
                        LIST(APPEND GMO_FILES ${_gmoFile})
                ENDIF( _langCode )

        ENDFOREACH(_poFile ${PO_FILES})

        ADD_CUSTOM_TARGET(translations ALL DEPENDS ${GMO_FILES})

ENDIF(NOT GETTEXT_MSGFMT_EXECUTABLE)

This iterates over all .po files in the directory, compiles them using msgfmt, and installs them to the standard location. You will want to change "catalogname" to the name of your application.

Getting translations

Now you just have to find translators to translate the extracted messages into the various languages. As your application gets used by more people, you will find that translators will volunteer to do this. Translated PO files then have to be stored in the po folder with the naming scheme <languagecode>.po.

Translating .desktop Files

Translating .desktop files is a bit tricky because the translation of .desktop files isn't fetched from .po/.gmo files, but must be part of the .desktop file itself. So we have 2 tasks:

  • extract messages from the .desktop file
  • merge the translations in the .desktop file

Our friends from Gnome provide the intltool package. While this tools is mainly written for the old autotools build system (but KDE uses CMake since KDE 4), we can use this package nevertheless for our purpose. Here is a step-by-step guide:

  1. Install the intltool package
  2. We assume that your .desktop file is named "MYPROGRAM.desktop".
  3. Make a copy of "MYPROGRAM.desktop" which is named "MYPROGRAM.desktop.template".
  4. In MYPROGRAM.desktop.template substitute the keys "Name" by "_Name", "GenericName" by "_GenericName" and "Comment" by "_Comment".
  5. Remove all translations from "MYPROGRAM.desktop.template".
  6. Add the following lines to the above described build script, at the end of the section "Preparing rc files":
intltool-extract --quiet --type=gettext/ini PATH/MYPROGRAM.desktop.template
cat PATH/MYPROGRAM.desktop.template.h >> ${WDIR}/rc.cpp
rm PATH/MYPROGRAM.desktop.template.h
cd ${WDIR}

intltool-extract will create a dummy C++ header (PATH/MYPROGRAM.desktop.template.h) with the strings whose key starts with "_". We append the content of this header file into our "rc.cpp" and remove remove the file than. (We assume that "PATH" is the path to "MYPROGRAM.desktop.template", relative to the working directory of the script.)

  1. Add "-kN_:1" as additional parameter to the xgettext call in the section "Extracting messages". This is necessary because dummy code that intltool-extract produces uses "N_()" instead of "i18n()".
  2. Add the following lines to the above described build script, at the end of the section "Merging translations":
cd ${WDIR}
intltool-merge --quiet --desktop-style ${WDIR} PATH/MYPROGRAM.desktop.template PATH2/MYPROGRAM.desktop

This will produce MYPROGRAM.desktop from MYPROGRAM.desktop.template, inserting all available translations in the .po files in the directory ${WDIR}. (We assume that "PATH" is the path to "MYPROGRAM.desktop.template", relative to the working directory of the script. We assume that "PATH2" is the path to the final location of "MYPROGRAM.desktop", relative to the working directory of the script.)

Note
Names of .po files):For intltool, it is required that the .po files have the name <languagecode>.po without other characters (NOT programname_<languagecode>.po). This is important because intltool-merge simply uses the .po file name (without ".po") as language key in the .desktop file.


Special cases (plugins, multiple catalogs, etc.)

Runtime Loading Of Catalogs

To have translations show up properly in an application, the name of the .pot file must match the name of the message catalog (.mo) file that your application will read at runtime. But what name will be used at runtime?

In general, it will be the value returned by KGlobal::instance()->instanceName(). But what will that be? The general rules for standalone applications are:

  • the message catalog will default to the name of the application passed as the first argument to KAboutData
  • if your code calls KLocale::setMainCatalog(), that name will be used instead
  • if your code calls KLocale::insertCatalog(const QString&), the specified catalog will also be searched in addition to the main catalog

If your code does not call either of the KLocale methods mentioned above and your application is a plugin, it gets a little more complicated. If your code exports your plugin using the K_EXPORT_COMPONENT_FACTORY macro, like this:

K_EXPORT_COMPONENT_FACTORY( libkhtmlkttsdplugin, KGenericFactory<KHTMLPluginKTTSD>("khtmlkttsd") )

then the catalog name will be the name passed in the KGenericFactory constructor - in the example above that woudl be khtmlkttsd. This is because the macro creates a Kinstance for you with the specified name.

However, some classes, such as KTextEditor create a KPart containing your component and the name passed in the macro is not used. In this case, you should call KLocale::insertCatalog(const QString&) in the component's constructor.

If in doubt, the safest thing to do is to call KLocale::insertCatalog(const QString&).

Tip
If you are not using the export macro for your plugin, you probably should be. See the API documentation of KGenericFactory for more information.


Calling KLocale::insertCatalog(const QString&) if the catalog has already been inserted does nothing. However, calling KLocale::removeCatalog(const QString&) removes the catalog no matter how many times it has been inserted. Therefore, take care that you do not inadvertently remove the catalog when multiple components are sharing a single catalog. For example, the following sequence will break translations:

// Component A loads and uses catalog xyz by calling
insertCatalogue("xyz");
// Component B loads and also uses the same catalog.
insertCatalogue("xyz");
// component B unloads and calls
removeCatalogue("xyz")
// Component A now doesn't translate properly.

Notice that a sequence such as above will occur automatically if both components A and B are loaded using the K_EXPORT_COMPONENT_FACTORY macro and pass the same name argument to the KGenericFactory constructor.

Naming .pot Files

The name that you settle on for both the .pot file and catalog should be governed by where your code resides. If it is a component of another application and resides in that application's source tree, you'll want to share the main catalog of the application.

If is a component of another application but resides elsewhere in the code repository, you will probably have to create a separate .pot, but keep in mind that .pot filenames must be unique across all of KDE.

If it is a component of another application, but resides in the source tree of a second application, you should share with the second application. For example, the KDE text-to-speach daemon (KTTSD) includes a plugin for embedded Kate in kdeaccessibility/kttsd source tree. Since the plugin is not installed unless kttsd is installed, it shares its catalog with kttsd and calls KLocale::insertCatalog("kttsd") to do this.

Declarative plasmoids

For plasmoids written in pure qml, the .pot file must be named the same as its plugin name (with plasma_applet_ prefix), specified in its desktop file as X-KDE-PluginInfo-Name. For instance if we have X-KDE-PluginInfo-Name=org.kde.active.activityscreen the .pot file will be called plasma_applet_org.kde.active.activityscreen.pot. The KLocale::insertCatalog() call is performed by the declarative scriptengine itself, so the developer of the qml plasmoid doesn't have to worry about that.

If the qml files are developed in the form of package instead of an autonomous plasmoid (for instance a package loaded by a C++ plasmoid) the .pot file will have the prefix plasma_package_ instead.

Akonadi Agents

Akonadi agents get the catalog loaded based on the name of the binary, so your .pot file should be named the same way your binary adding .pot at the end You can check that in AgentBase::parseArguments in kdepimlibs

Qt5-only: Code using Qt translation system

Some Qt5-based applications and libraries use the Qt translation system instead of KI18n. Examples of such code are:

  • Tier 1 KF5 frameworks because they cannot depend on KI18n.
  • Applications which provide a Qt-only version.

To make translators aware of this distinction, your .pot file should follow these naming schemes:

  • *.po: code using gettext with KDE translation extensions. .po files are compiled into .mo files (most common case).
  • *_qt.po: code using Qt translation system. The .po files are compiled into .qm files (some KF5 frameworks, applications providing a Qt-only version).
  • *_qt-mo.po: code using pure gettext. The .po files are compiled into .mo files (legacy).

Handbooks

Handbooks, which are written in docbook format, are handled different from applications. The translated Handbooks are committed into the l10n module in the docs folder under each language. In addition, the docs folder is broken down by module and application. For example, the German Kate Handbook is committed to the l10n-kde4/de/docs/kde-baseapps/kate/ folder. When compiled, the German Kate Handbook is installed to $KDEDIR/share/doc/HTML/de/kate/index.cache.bz2.

Note that it is up to each translation team to generate the translated Handbook when they feel it is complete.