Monday Jul 23, 2007

OpenInstaller: Internationalization for Serviceability

If you have ever written internationalized programs for computers, you have invariably had to write i18n'd strings like

CANT_FIND_FILE=Cannot find file {0} because {1}
FILE_IS_NOT_READABLE=The file {0} is not readable. 

This could be a typical message seen in a UI (e.g. a popup error), in a log file, or anywhere else a user may encounter such an error.  Pretty straightforward, right?  Well, not when you think about how this string may be localized by a translation team working in a language or locale you don't know.  To understand the problems, you need to think about how programs are designed so that they can easily be used in other locales, where the spoken and written language is different, and where punctuation and symbols such as date/time separators and currency symbols are different too.

How does a program get designed to be internationalized?

Typically the program's executable content is separated from the content that needs to be localized later (like user interface strings, images, audio, etc).  This allows the base program to be produced and re-produced at will, without depending on the translations of localizable content being available.  The to-be-localized content is typically kept in separate files which are produced along with the base content.  It is then sent to one or more entities which perform the localization by translating it.  This is typically done by giving the content to a human being who is familiar with both locales (e.g. a person who is fluent in Japanese and American English, and is also familiar with customs in both locales).  This person's job is to translate between languages/locales.  They are given a giant list of strings/images/audio files, and produce an equally giant list as a result.  Each individual item is considered independently of the others.

The problems: Context

So, when the French translator is faced with

CANT_FIND_FILE=Cannot find file {0} because {1}

They may translate this as

Impossible de trouver le fichier {0} car {1}

The problem is that the word "because" is stuck in there between two contextual items, but the translator has no idea what the contents of {0} and {1} are (or will be).  For some (most) languages, the phrase is going to read wrong to a native speaker once it is re-constructed.  If the "because" part was "out of memory", translated to "capacite memoire insuffisante", then the final phrase a French-speaking user would see is "Impossible de trouver le fichier /tmp/foo.txt car capacite memoire insuffisante", which is improper French.  A French-speaking person could figure it out, but it makes your application look a little childish.  It gets even worse in Asian languages.
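To see how this plays out in code, here is a minimal sketch using the standard java.text.MessageFormat class.  The class name AwkwardSubstitution and its render method are made up for this illustration; the pattern and the translated reason are the ones from the example above.

```java
import java.text.MessageFormat;

// Illustrative only: AwkwardSubstitution is a hypothetical name, not part of any library.
class AwkwardSubstitution {
    // The pattern as a translator might return it, without knowing what {0} and {1} will hold.
    static final String FRENCH_PATTERN = "Impossible de trouver le fichier {0} car {1}";

    static String render(String file, String reason) {
        // MessageFormat only drops the arguments into the placeholders;
        // it knows nothing about French grammar.
        return MessageFormat.format(FRENCH_PATTERN, file, reason);
    }

    public static void main(String[] args) {
        System.out.println(render("/tmp/foo.txt", "capacite memoire insuffisante"));
    }
}
```

The two pieces were translated independently, and the substitution simply glues them together, which is exactly where the grammar falls apart.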

Taking this to an extreme, what if someone thought they were clever, and produced this in their to-be-localized file:


The coder was thinking "If I can get these 5 words to be translated I can use them over and over again and only require 5 actual strings to be localized, thereby saving money and complexity!" (typically, localization costs money on a per-word basis).  With these 5 words, one could produce any number of phrases in the program:


The French translator is going to translate the 5 words to:

{Un} {Mauvais} {Exemple} {A} {suivre}

Now, when the program is run in the fr (French) locale and the {A} {BAD} {EXAMPLE} {TO} {FOLLOW} string is needed, the user is going to see "Un mauvais exemple a suivre".  That doesn't make much sense to a French-speaking person.  This is an extreme example, but it illustrates the problem of "context".

Dynamic substitution

Most applications that deal with strings like those above store the translations in a file containing a bunch of key/value pairs.  During execution, when a string needs to be shown, a lookup is performed on that table to find the translation of a particular string for a particular locale/language.  The key used to perform the lookup is specified in the program.  For example, in Java, to create a button, one might put:

JButton b = new JButton( "SOME_KEY" ) ;

The SOME_KEY is used to look up the string to show to the user.
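For the curious, here is roughly what such a lookup looks like with the standard java.util.ResourceBundle API.  This is only a sketch: the KeyLookup class, the inline one-line "file", and the French value "Annuler" are assumptions made up for the illustration, and a real program would load its translations from files on disk.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

// Illustrative only: KeyLookup is a hypothetical name.
class KeyLookup {
    // Stand-in for a translated key/value file; the content here is invented.
    static ResourceBundle frenchBundle() {
        try {
            return new PropertyResourceBundle(new StringReader("SOME_KEY=Annuler\n"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Resolve the key to the text the user should see.
    static String label(ResourceBundle bundle, String key) {
        return bundle.getString(key);
    }

    public static void main(String[] args) {
        String text = label(frenchBundle(), "SOME_KEY");
        // The resolved text, not the key, is what ends up on the widget,
        // e.g. new JButton(text).
        System.out.println(text);
    }
}
```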

A common error is including dynamic values in the key to be used in a lookup.  For example, in a Unix shell script, one might use the gettext utility in this way:

echo `${GETTEXT} "${JAVA_HOME} must be the root directory of a valid JVM installation"`

See the problem?  The key used to look up the value in the translations will contain a dynamic pathname, based on the user's local system.  This key will never be found in the translation table, because the table contains only ONE entry for this message.  (Incidentally, that one entry will never match either: the value of ${JAVA_HOME} was probably "" (empty string) when the translation table was created.)

The solution here is to remove the dynamic stuff from the string.  For example:

printf "`${GETTEXT} '%s must be the root directory of a valid JVM installation'`\n" "${JAVA_HOME}"
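The same mistake and the same fix can be sketched in Java.  Everything here is hypothetical: the DynamicKeyLookup class, its one-entry translation table, and the (unaccented) French text are invented for the illustration.  The gettext method only mimics the real utility's habit of returning the key unchanged when no translation is found.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a made-up table standing in for a real message catalog.
class DynamicKeyLookup {
    static final String KEY = "%s must be the root directory of a valid JVM installation";

    // One entry, keyed by the static message text. The translation is invented.
    static final Map<String, String> FRENCH = new HashMap<>();
    static {
        FRENCH.put(KEY, "%s doit etre le repertoire racine d'une installation JVM valide");
    }

    // Mimics gettext: return the translation, or the key itself when none is found.
    static String gettext(String key) {
        return FRENCH.getOrDefault(key, key);
    }

    // Wrong: the path is baked into the key, so the lookup never matches.
    static String wrong(String javaHome) {
        return gettext(javaHome + " must be the root directory of a valid JVM installation");
    }

    // Right: look up the static text first, then substitute the path.
    static String right(String javaHome) {
        return String.format(gettext(KEY), javaHome);
    }
}
```

With a path such as /opt/jdk, wrong() silently falls back to the English text, while right() finds the translation and then fills in the path.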

Better still, to eliminate the problems of context (as explained above), one might:


printf "%s %s=%s\n" "`${GETTEXT} 'Invalid JVM installation directory.'`" "`${GETTEXT} 'directory'`" "${JAVA_HOME}"

The more general solution is to completely avoid doing parameterized substitution in error messages, or in any other message that needs to be localized.  This avoids the problems of lack of context and dynamic substitution illustrated above.  For example, instead of

FILE_NOT_FOUND=The file {0} could not be found because {1}.

This is instead written as:

FILE_NOT_FOUND=The specified file could not be found.

The "because" part (the reason the file could not be found) is not included in the original message.  Instead, it is associated with the error using a context object which is attached to the error message and optionally shown to the user when the final string is constructed for display (or logging).  The context items are shown with the error message, but not as part of the message.  They are typically shown after the message.  For example:

The specified file could not be found.  File=c:\temp\foo.txt Reason=Out Of Memory

Again, not all parameterized messages suffer from context problems.  However, following this as a best practice results in more serviceable error messages and logs, especially when they are being read by service personnel who aren't as fluent in a particular language or locale as a native speaker.


Internally, openInstaller uses the org.openinstaller.util.EnhancedException class as a superclass for all project-specific exceptions thrown.  This class has the ability to attach one or more contexts.  For example:

 throw new EnhancedException("FILE_NOT_FOUND", "file=" + file, "reason=" + theReason);

You'll notice that there are no Resource Bundle lookups and no substitutions occurring here.  The information is attached to the exception object in its raw form.  Only when the content is shown to the user (e.g. when it is displayed in a popup, or written to a persistent log file) is the final message formed, using the techniques detailed above.  In the above example, there are two strings attached to the exception (each representing a piece of context that is associated with the error message).  The first string, "file=" + file, denotes the file that has the problem.  openInstaller will attempt to translate the left-hand side of the = sign ("file").  This means that the final message may appear as:

File Not Found.  File=/tmp/foo.txt Reason=Out Of Memory

In French, this may be shown as:

Fichier non trouve.  Fichier=/tmp/foo.txt Raison=pas assez de memoire

By using this throughout the project, openInstaller avoids translation artifacts and phrases that sound as though a three-year-old child spoke them.
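To make the idea concrete, here is a small sketch of an exception that carries its context in raw form and only builds the display text on request.  This is NOT the real org.openinstaller.util.EnhancedException: the ContextualException class, its render method, and the plain Map used as a translation table are all assumptions made up for the illustration, and the real class may work quite differently.  The sketch also leaves the right-hand side of each context untranslated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, not the real openInstaller class.
class ContextualException extends Exception {
    private final String key;
    private final List<String> contexts = new ArrayList<>();

    ContextualException(String key, String... contexts) {
        super(key);
        this.key = key;
        // Contexts are stored as given; nothing is looked up or substituted here.
        this.contexts.addAll(Arrays.asList(contexts));
    }

    // Build the display text only when it is needed, translating the message
    // key and the left-hand side of each "name=value" context.
    String render(Map<String, String> translations) {
        StringBuilder sb = new StringBuilder(translations.getOrDefault(key, key));
        for (String c : contexts) {
            int eq = c.indexOf('=');
            String name = eq < 0 ? c : c.substring(0, eq);
            String rest = eq < 0 ? "" : c.substring(eq); // "=" plus the raw value
            sb.append(' ').append(translations.getOrDefault(name, name)).append(rest);
        }
        return sb.toString();
    }
}
```

Because the message and its contexts stay separate until render is called, the same exception object can be shown in one locale and logged in another.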

openInstaller also uses an emerging format for logging messages which allows logs to be translated and re-translated independently of the programs that produced the original logs.  More on this in a future blog.

How can you use EnhancedException in openInstaller?

As openInstaller is fully declarative, in most cases you won't even need to worry about this.  However, if you are writing custom validation code for a configuration parameter (e.g. asking for a port number, and the port number must be > 1023), then you can throw an EnhancedException when a failure occurs.  This allows the openInstaller engine to log the failure, as well as produce a nicely-formatted message for the user (in all display modes, even CUI and Script/Batch/Silent mode).  For example:

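// Compare the newly entered value against the main password.
theValue = (String) thisProperty.getUnconfirmedValue();
if (theMainPassword != null && !theMainPassword.equals(theValue)) {
    // Attach the reason as context rather than substituting it into the message.
    throw new EnhancedException("PASSWORDS_MISMATCH", new String[] {"reason=" + reason});
}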


This would appear in your configuration schema (xcs) file which describes the configuration parameters.  When the user enters a value on the associated UI screen, and clicks "Next", this code snippet is run (in addition to any basic validation parameters, like whether a string is really a string, or whether it is an integer within a desired range).  More details on configuration validation can be found at Sandeep's blog.




