Wednesday May 28, 2008

Internationalisation perl gotcha

One of the issues to look out for when applying gettext to perl programs is variable substitution. For example:

  print "My name is $name\n";

Should not be internationalised as:

  print gettext("My name is $name\n");

Because the variable substitution for $name occurs before the call to gettext - hence gettext would get the text including the value of the variable $name, and would require a separate translation for every value of $name.

There are a number of alternatives to this, such as printf or string concatenation. These work for many situations, but can present a problem when there are multiple variables being substituted and the order of the variables is different in different locales.
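One possible way round the ordering problem is perl's explicit-index format parameters (%1$s, %2$s), which let the translated string decide the order in which the arguments appear. The following is only a sketch: the %catalogue hash and the translate() routine are hypothetical stand-ins for a real message catalogue and a call to gettext, the $age variable is made up for illustration, and the "translation" is just a reordered English sentence.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a message catalogue. A real program would
# call gettext() rather than look the string up in a hash.
my %catalogue = (
    'My name is %1$s and I am %2$s years old' . "\n"
        => '%2$s years old, and the name is %1$s' . "\n",
);

# Return the "translated" format string, or the original if none exists.
sub translate {
    my ($msgid) = @_;
    return exists $catalogue{$msgid} ? $catalogue{$msgid} : $msgid;
}

my ($name, $age) = ('Alice', 30);

# Single quotes stop perl from interpolating $s, so the string handed to
# translate() is the same whatever the values of $name and $age are.
my $format = translate('My name is %1$s and I am %2$s years old' . "\n");

# The values are substituted only after the lookup, in the order that the
# translated format string asks for.
my $text = sprintf($format, $name, $age);
print $text;    # prints "30 years old, and the name is Alice"
```

Because the lookup key never contains the values, a single catalogue entry covers every name.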

Tuesday May 27, 2008

Internationalising perl with gettext

The previous blog entry showed how to do internationalisation using an example in C. Unsurprisingly, very similar code can be used in perl.

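#!/bin/perl -w

use POSIX;
use Sun::Solaris::Utils qw(textdomain gettext bindtextdomain);

# Same locale, binding and text domain as the C example
setlocale(LC_MESSAGES, "fr");
bindtextdomain("EX", "/export/home/test/locale");
textdomain("EX");

printf gettext("test\n");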

The perl code uses the same text domain, and same binding to end up using the same message catalogue as the C example. So again this code will output "notest" rather than "test" when run.

One thing that's caught me a couple of times is the use of LC_MESSAGES as the category in the call to setlocale. It's tempting to use LC_ALL, on the assumption that it means that all text should be translated, but that did not work for me. I believe the reason is that a sublocale (a specific category) needs to be specified. The higher level locale may have sublocales with conflicting settings. For example, an application that outputs messages in English could have Pounds or Dollars as currency.

The other thing to be aware of is that the call to bindtextdomain needs to be performed before that domain is used by the textdomain call.

Internationalisation with gettext

One of my projects for this week is internationalising the tool spot. spot is coded in perl, but it's useful to prototype with a C version. You can find resources on the GNU version of gettext and the Solaris version. There are also some helpful tutorials and some examples, including a perl example.

Anyway, assume that the application initially looks like:

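#include <stdio.h>
int main (int argc, char *argv[]) {
  printf("test\n");
}

The first step is to wrap the string in a call to gettext, giving printf(gettext("test\n")). The call to gettext will return a translated version of the string "test", but there are a couple of other calls that need to be performed to set things up.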

The call to setlocale will set the locale (language, currency, etc.) that the application will use. The call takes two parameters. The first is the category that should be localised; we want to translate only the messages, so this is LC_MESSAGES. The second is the locale to translate into (if this is an empty string then the locale is obtained from the system setting).

The call to bindtextdomain takes a name of a domain, and then binds this to a directory where the translations will reside.

The call to textdomain will use the binding for an existing domain to look up the translations.

Probably the best way of seeing this is to look at the modified source:

#include <stdio.h>
#include <locale.h>
#include <libintl.h>

int main (int argc, char *argv[])
{
  setlocale(LC_MESSAGES, "fr");
  bindtextdomain("EX", "/export/home/test/locale");
  textdomain("EX");
  printf(gettext("test\n"));
  return 0;
}

The call to setlocale requests that the code use the "fr" locale. The call to bindtextdomain tells the application to look in the root directory /export/home/test/locale/ for the translations.

The program can now be compiled:

$ cc test.c

However, there's no message catalogue to provide the translations. So the next step is to identify the text which requires translation. The command to do this is xgettext:

$ xgettext test.c

This command generates a file messages.po which contains all the messages that need translation:

$ more messages.po
domain "messages"
# File:m.c, line:10, textdomain("EX");
msgid  "test\n"

The messages in the file can then be translated:

$ more messages.po
domain "messages"
# File:m.c, line:10, textdomain("EX");
msgid  "test\n"
msgstr "notest\n"

Once the translated message file exists, it needs to be converted into a message catalogue that gettext can read. This is performed by the command msgfmt:

$ msgfmt -o /export/home/test/locale/fr/LC_MESSAGES/ messages.po

The msgfmt command has an optional -o parameter to specify the location of the output file. The place where the file needs to reside and the name of the file are determined by the directory passed to bindtextdomain, the locale, and the name given to the domain.
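As a rough illustration of how those three pieces combine, the conventional gettext layout is directory/locale/LC_MESSAGES/domain.mo. The helper below is a hypothetical sketch that only builds the path string for a given binding, locale, and domain; it does not call gettext or check that the file exists.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: build the path where gettext would conventionally
# look for the catalogue of $domain in $locale, given the directory that
# was passed to bindtextdomain.
sub catalogue_path {
    my ($dir, $locale, $domain) = @_;
    return File::Spec->catfile($dir, $locale, 'LC_MESSAGES', "$domain.mo");
}

# The values used in the example program above.
print catalogue_path('/export/home/test/locale', 'fr', 'EX'), "\n";
```

If the translated text does not appear when the program is run, comparing this path with where msgfmt actually wrote its output is a reasonable first check.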

When the program is run the output is:

$ a.out
notest

Which shows that although the text was "test" the output from the program is the localised "notest".

Friday May 23, 2008

Internationalising perl using gettext

Just looking at using gettext to internationalise a perl program. Some Solaris commands that are coded in perl are kstat and psrinfo.

Wednesday Jan 09, 2008

Internationalising a perl program

I have a number of useful perl scripts. They all have English messages, and to be friendly they need to be internationalised (or internationalized, depending on your locale). Fortunately, there's a perl module called Maketext (Locale::Maketext) which appears to do exactly what I need. In fact it seems to be deceptively simple, compared to the horror described in this article. Here's an example program:

$ more
#!/bin/perl -w

printf ("%i of %i optimised\n",1,7);

To make the internationalisation happen the print statements need to be converted into calls to maketext. Maketext has a slightly different syntax since the order that the parameters are written depends on the language that they are written in. The converted version of the code looks like:

$ more
#!/bin/perl -w

use Test::L10N;

my $lang = Test::L10N->get_handle() || die "Language support error";

print $lang->maketext("[_1] of [_2] optimised\n",1,7);

Rather than use the above version of the code, the behaviour of which will depend on the locale that it's run in, I'll modify it to assume that it's run in a US English locale.

$ more
#!/bin/perl -w

use Test::L10N;

my $lang = Test::L10N->get_handle('en_us') || die "Language support error";

print $lang->maketext("[_1] of [_2] optimised\n",1,7);

The code now makes use of a perl module, Test::L10N, that is specific to my "Test" program. This module needs to be created in a Test subdirectory, and the module basically exists to include the Maketext module from the Locale package:

$ more Test/
package Test::L10N;
use base qw(Locale::Maketext);
1;

For English locales, we'll default to keeping the original phrase as the output phrase.

$ more Test/L10N/
package Test::L10N::en;

use base qw(Test::L10N);
# _AUTO makes any phrase without an entry translate to itself
%Lexicon=(_AUTO=>1);
1;

If we want to translate the message into US English, then we add a file:

$ more Test/L10N/
package Test::L10N::en_us;

use base qw(Test::L10N);
%Lexicon=("[_1] of [_2] optimised\n"=>"[_1] of [_2] optimized\n",);
1;


The output of the program is 1 of 7 optimized rather than 1 of 7 optimised. Much clearer.
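For experimenting, the same pieces can be squeezed into a single file. This is only a sketch under a few assumptions: the Demo:: package names are made up so that they cannot clash with the Test:: modules above, the classes are defined inline rather than in separate files, and the US English entry has been changed to swap the two parameters, to show that the bracket notation lets a lexicon entry reorder them.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Locale::Maketext;

# Project base class, playing the role of Test::L10N but under a
# hypothetical Demo:: namespace so everything fits in one file.
package Demo::L10N;
our @ISA = ('Locale::Maketext');

# English: _AUTO makes any phrase without an entry translate to itself.
package Demo::L10N::en;
our @ISA = ('Demo::L10N');
our %Lexicon = (_AUTO => 1);

# US English: one entry, which also swaps the order of the parameters.
package Demo::L10N::en_us;
our @ISA = ('Demo::L10N');
our %Lexicon = (
    "[_1] of [_2] optimised\n" => "Out of [_2], [_1] optimized\n",
);

package main;

my $en = Demo::L10N->get_handle('en')    or die "Language support error";
my $us = Demo::L10N->get_handle('en-US') or die "Language support error";

# The calling code is identical for both handles; only the lexicon differs.
my $en_text = $en->maketext("[_1] of [_2] optimised\n", 1, 7);
my $us_text = $us->maketext("[_1] of [_2] optimised\n", 1, 7);

print $en_text;    # 1 of 7 optimised
print $us_text;    # Out of 7, 1 optimized
```

The calling code never has to know the order in which a particular language wants the numbers, which is the ordering problem that plain printf runs into.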


Darryl Gove is a senior engineer in the Solaris Studio team, working on optimising applications and benchmarks for current and future processors. He is also the author of the books:
Multicore Application Programming
Solaris Application Programming
The Developer's Edge

