Wednesday Jul 25, 2007

Merging with tkdiff and Mercurial

I first encountered Mercurial last year when my project (openInstaller) decided to switch from TeamWare to Mercurial because everyone else was doing it :)  I had many years of TeamWare experience and was reluctant to switch.  Well, I got over it, and now I see why everyone was all fired up.  It's a pretty sweet system.  But that's not the point of this article.

Merging sucks

I personally hate to merge.  I avoid it whenever possible, because in a complex merge, you'll usually make a mistake or two.  I've also done my share of silent mismerges.  Usually, if I have a large-ish change and have to merge it with another large-ish change, I'll do it by hand (e.g. take the parent repository and re-introduce my changes into the conflicting files).  Mercurial and TeamWare both support the concept of merging, but Mercurial does it better under the covers (heh, sounds like a great bumper sticker).  The problem is, the stock version of Mercurial on my Solaris Express (and probably OpenSolaris) doesn't find any GUI utilities like tkdiff or kdiff3 because they aren't installed on the stock version of the OS.

If I install tkdiff, things get better, but I'm faced with a ghastly set of default color choices:

I am blinded by black-on-white (or anything-on-white for that matter).  I prefer the "blackout windows" color themes, where the majority of your screen's pixels are black (or close to black) most of the time.  Easier on my eyes, anyway.  So, using this .tkdiffrc:

# This file was generated by TkDiff 4.1.3
# Mon Mar 26 12:14:03 EDT 2007

set prefsFileVersion {4.1.3}

# Automatically center current diff region
define autocenter {1}

# Automatically select the nearest diff region while scrolling
define autoselect {0}

# Tag options for characters in line view
define bytetag {-background blue -foreground white}

# Tag options for changed diff region
define chgtag {-background blue}

# Color change bars to match the diff map
define colorcbs {1}

# Tag options for the current diff region
define currtag {-background #333300}

# Tag options for deleted diff region
define deltag {-background #660000 -font {Courier -12 bold}}

# diff command
define diffcmd {diff}

# Tag options for diff regions
define difftag {-background #222222}

# Program for editing files
define editor {}

# Windows-style toolbar buttons
define fancyButtons {0}

# Text window size
define geometry {80x30}

# Ignore blanks when diffing
define ignoreblanks {1}

# Ignore blanks option
define ignoreblanksopt {-w}

# Tag options for diff region inline differences
define inlinetag {-background DodgerBlue -font {Courier -12 bold}}

# Tag options for inserted diff region
define instag {-foreground green -font {Courier -12 bold}}

# Tag options for overlap diff region
define overlaptag {-background yellow}

# Show change bars
define showcbs {1}

# Show inline diffs (byte comparisons)
define showinline1 {0}

# Show inline diffs (recursive matching algorithm)
define showinline2 {0}

# Show current line comparison window
define showlineview {1}

# Show line numbers
define showln {1}

# Show graphical map of diffs
define showmap {1}

# Synchronize scrollbars
define syncscroll {1}

# Tab stops
define tabstops {8}

# Highlight change bars
define tagcbs {0}

# Highlight line numbers
define tagln {1}

# Highlight file contents
define tagtext {1}

# Text widget options
define textopt {-background black -foreground gray -font {Courier -14} -wrap none}

# Directory for scratch files
define tmpdir {/tmp}

# Use icons instead of labels in the toolbar
define toolbarIcons {1}

I get a nice-looking mostly-black window.  Additions are represented with green, changes with blue, and deletions with red.


To install tkdiff on recent versions of Solaris, I had to:

1. Download it, extract it, and ensure that the 'tkdiff' binary was on my $PATH.

2. Solaris, for some bizarre reason, does not include wish (the Tk windowing shell, which tkdiff requires).  Oh, my mistake, it does include it, but the command is called wish8.3!  I brought this up earlier, but no satisfactory answer has come my way.  So, I simply created a symlink:

[jhf@spirit]:bin$> pwd

[jhf@spirit]:bin$> ls -l wish
lrwxrwxrwx 1 jhf other 20 Nov 9 2006 wish -> /usr/sfw/bin/wish8.3

Problem solved!


Monday Jul 23, 2007

openInstaller: Internationalization for Serviceability

If you have ever written internationalized programs for computers, you have invariably had to write i18n'd strings like

CANT_FIND_FILE=Cannot find file {0} because {1}
FILE_IS_NOT_READABLE=The file {0} is not readable. 

This could be a typical message seen in a UI (e.g. a popup error), or in a log file, or any other place where a user may encounter such an error.  Pretty straightforward, right?  Well, not when you think about how this string may be localized by a translation team working in a locale or language you don't know.  To understand the problems here, you need to think about how programs of all kinds are designed so that they can easily be used by people in other locales, where the spoken and written language is different and where the punctuation and symbols for things like date/time separators and currency are different.

How does a program get designed to be internationalized?

Typically the program's executable content is separated from the content that needs to be localized later (like user interface strings, images, audio, etc).  This allows the base program to be produced and re-produced at will, without depending on the translations of localizable content being available.  The to-be-localized content is typically kept in separate files which are produced along with the base content.  It is then sent to one or more entities which perform the localization by translating it.  This is typically done by handing the content to a human being who is familiar with both locales (e.g. a person who is fluent in Japanese and American English, and is also familiar with customs in both locales).  This person's job is to translate between languages/locales.  They are given a giant list of strings/images/audio files, and produce an equally giant list as a result.  Each individual item is considered independent of the others.

The problems: Context

So, when the French translator is faced with

CANT_FIND_FILE=Cannot find file {0} because {1}

they may translate this as

Impossible de trouver le fichier {0} car {1}

The problem is that the word "because" is stuck in there between two contextual items, but the translator has no idea what the content of {0} and {1} are (or will be).  For some (most) languages, the phrase is going to read wrong to a native speaker when it is re-constructed.  If the "because" part was "out of memory", translated to "capacite memoire insuffisante", then the final phrase a French-speaking user would see is "Impossible de trouver le fichier /tmp/foo.txt car capacite memoire insuffisante", which is improper French.  A French-speaking person could figure it out, but it makes your application look a little childish.  It gets even worse in Asian languages.
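To see how the awkward phrase comes about, here is a minimal Java sketch of the substitution step.  It assumes the translated template is filled in with java.text.MessageFormat; the class and method names are made up for illustration and are not part of openInstaller.

```java
import java.text.MessageFormat;

final class ContextProblemDemo {
    // The French template as the translator returned it, written without
    // knowing what {0} and {1} would contain at run time.
    static final String TEMPLATE_FR = "Impossible de trouver le fichier {0} car {1}";

    // Substitutes the run-time values into the already-translated template.
    static String render(String file, String reason) {
        return MessageFormat.format(TEMPLATE_FR, file, reason);
    }

    public static void main(String[] args) {
        // The reason was translated separately, as an independent item.
        System.out.println(render("/tmp/foo.txt", "capacite memoire insuffisante"));
    }
}
```

Each piece was translated correctly on its own; the grammar only breaks when the program glues them together.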

Taking this to an extreme, what if someone thought they were clever, and produced this in their to-be-localized file:


The coder was thinking "If I can get these 5 words to be translated I can use them over and over again and only require 5 actual strings to be localized, thereby saving money and complexity!" (typically, localization costs money on a per-word basis).  With these 5 words, one could produce any number of phrases in the program:


The French translator is going to translate the 5 words to:

{Un} {Mauvais} {Example} {A} {suivre}

Now, when the program is run in the fr (French) locale and the {A} {BAD} {EXAMPLE} {TO} {FOLLOW} string is needed, the user is going to see "Un mauvais example a suivre".  Doesn't make much sense to a French-speaking person.  This is an extreme example, but it illustrates the problem of "context".

Dynamic substitution

Most applications that deal with Strings (like the above)  store the translations in a file that has a bunch of key/value pairs.  During execution, when a string needs to be shown, a lookup is performed on that table, to find the translation of a particular string for a particular locale/language.  The key used to perform the lookup is specified in the program.  e.g. in Java, to create a button, one might put:

JButton b = new JButton( "SOME_KEY" ) ;

The SOME_KEY is used to look up the string to show to the user.
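The lookup itself can be sketched in a few lines of Java.  This is only an illustration: the two in-memory "files" and their values (Cancel, Annuler) are invented, and a real program would load per-locale .properties files rather than strings.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

final class KeyLookupDemo {
    // Stand-ins for per-locale translation files; keys and values are invented.
    static final String EN = "SOME_KEY=Cancel\n";
    static final String FR = "SOME_KEY=Annuler\n";

    // Parses key/value pairs into a lookup table.
    static ResourceBundle load(String properties) {
        try {
            return new PropertyResourceBundle(new StringReader(properties));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // The program only ever mentions the key; the table supplies the text.
    static String label(ResourceBundle table, String key) {
        return table.getString(key);
    }

    public static void main(String[] args) {
        System.out.println(label(load(EN), "SOME_KEY"));
        System.out.println(label(load(FR), "SOME_KEY"));
    }
}
```

The same key yields different text depending on which table is loaded, which is why the key has to be a constant.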

A common error is including dynamic values in the key to be used in a lookup.  For example, in Unix shell script, one might use the gettext utility in this way:

echo `${GETTEXT} "${JAVA_HOME} must be the root directory of a valid JVM installation"`

See the problem?  The key used to look up the value in the translations will contain a dynamic pathname, based on the user's local system.  This key will obviously never be found in the translation table, because the translation table only contains ONE entry for this message (which, incidentally, will never actually be found, as the value of ${JAVA_HOME} at the time the translation table was created was probably "" (empty string)). 

The solution here is to remove the dynamic stuff from the string.  For example:

printf "`${GETTEXT} '%s must be the root directory of a valid JVM installation'`\n" "${JAVA_HOME}"

Better still, to eliminate the problems of context (as explained above), one might:


printf "%s %s=%s\n" "`${GETTEXT} 'Invalid JVM installation directory.'`" "`${GETTEXT} 'directory'`" "${JAVA_HOME}"
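The same mistake and its fix look like this in Java.  This is a rough sketch: the gettext method below is a hypothetical stand-in that, like the real gettext utility, returns the key unchanged when no translation is found, and the French text in the table is illustrative only.

```java
import java.util.Map;

final class DynamicKeyDemo {
    // Hypothetical French translation table with exactly one entry for this message.
    static final Map<String, String> FR = Map.of(
        "%s must be the root directory of a valid JVM installation",
        "%s doit etre le repertoire racine d'une installation JVM valide");

    // Stand-in for gettext: the translation, or the key itself when none exists.
    static String gettext(String msgid) {
        return FR.getOrDefault(msgid, msgid);
    }

    // Wrong: the path becomes part of the lookup key, so the lookup always misses.
    static String dynamicKey(String javaHome) {
        return gettext(javaHome + " must be the root directory of a valid JVM installation");
    }

    // Better: look up the constant template first, then substitute the path.
    static String constantKey(String javaHome) {
        return String.format(
            gettext("%s must be the root directory of a valid JVM installation"), javaHome);
    }

    public static void main(String[] args) {
        System.out.println(dynamicKey("/usr/java"));
        System.out.println(constantKey("/usr/java"));
    }
}
```

The first call silently falls back to English, which is exactly the kind of bug that nobody notices until a localized build ships.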

The more general solution is to avoid parameterized substitution entirely in error messages, or in any other message that needs to be localized.  This avoids the problems of lack of context and dynamic substitution illustrated above.  For example, instead of

FILE_NOT_FOUND=The file {0} could not be found because {1}.

This is instead written as:

FILE_NOT_FOUND=The specified file could not be found.

The "because" part (the reason the file could not be found) is not included in the original message.  Instead, it is associated with the error using a context object which is attached to the error message and optionally shown to the user when the final string is constructed for display (or logging).  The context items are shown with the error message, but not as part of the message.  They are typically shown after the message.  For example:

The specified file could not be found.  File=c:\temp\foo.txt Reason=Out Of Memory

Again, not all parameterized messages suffer from context problems.  However, as a best practice, keeping the context out of the message results in more serviceable error messages and logs, especially when they are being serviced by personnel who aren't as fluent in a particular language or locale as a native speaker.


Internally, openInstaller uses the org.openinstaller.util.EnhancedException class as a superclass for all project-specific exceptions thrown.  This class has the ability to attach one or more contexts.  For example:

 throw new EnhancedException("FILE_NOT_FOUND", "file=" + file, "reason=" + theReason);

You'll notice that there are no resource bundle lookups and no substitutions occurring here.  The information is attached to the exception object in its raw form.  Only when the content is shown to the user (e.g. when it is displayed in a popup, or written to a persistent log file) is the final message formed, using the techniques detailed above.  In the above example, there are two strings attached to the exception (each representing a piece of context that is associated with the error message).  The first string, "file=" + file, denotes the file that has the problem.  openInstaller will attempt to translate the left-hand side of the = sign ("file").  This means that the final message may appear as:

File Not Found.  File=/tmp/foo.txt Reason=Out Of Memory

In French, this may be shown as:

Fichier non trouve.  Fichier=/tmp/foo.txt Raison=pas assez de memoire

By using this throughout the project, openInstaller avoids translation artifacts and phrases that read as though a three-year-old child spoke them.
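To make the idea concrete, here is a minimal sketch of an exception that carries raw context and is only turned into text at display time.  It is not the real org.openinstaller.util.EnhancedException; the class name, the render method, and the French table are all hypothetical.  In this sketch only the message key and the left-hand side of each "name=value" pair are translated, and the value is left as it was attached.

```java
import java.util.List;
import java.util.Map;

final class ContextDemo {
    // Hypothetical French table covering the message key and the context names.
    static final Map<String, String> FR = Map.of(
        "FILE_NOT_FOUND", "Fichier non trouve.",
        "file", "Fichier",
        "reason", "Raison");

    // Stand-in for an exception that carries raw "name=value" context strings.
    static final class ContextualException extends RuntimeException {
        private static final long serialVersionUID = 1L;
        private final String key;
        private final List<String> contexts;

        ContextualException(String key, String... contexts) {
            super(key);
            this.key = key;
            this.contexts = List.of(contexts);
        }

        // Builds the display string only now, at presentation time.
        String render(Map<String, String> table) {
            StringBuilder out = new StringBuilder(table.getOrDefault(key, key));
            String separator = "  ";
            for (String context : contexts) {
                int eq = context.indexOf('=');
                String name = eq < 0 ? context : context.substring(0, eq);
                String rest = eq < 0 ? "" : context.substring(eq);
                // Translate the name if the table knows it; keep "=value" untouched.
                out.append(separator).append(table.getOrDefault(name, name)).append(rest);
                separator = " ";
            }
            return out.toString();
        }
    }

    public static void main(String[] args) {
        try {
            throw new ContextualException(
                "FILE_NOT_FOUND", "file=/tmp/foo.txt", "reason=Out Of Memory");
        } catch (ContextualException e) {
            System.out.println(e.render(FR));
        }
    }
}
```

Because nothing is translated at throw time, the same exception object can be rendered once for the user's locale and again, differently, for a log that a support engineer will read.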

openInstaller also uses an emerging format for logging messages which allows logs to be translated and re-translated independent of the programs that produced the original logs.  More on this in a future blog.

How can you use EnhancedException in openInstaller?

As openInstaller is fully declarative, in most cases you won't even need to worry about this.  However, if you are writing custom validation code for a configuration parameter (e.g. asking for a port number, and the port number must be > 1023), then you can throw an EnhancedException when a failure occurs.  This allows the openInstaller engine to log the failure, as well as produce a nicely-formatted message for the user (in all display modes, even CUI and Script/Batch/Silent mode).  For example:

theValue = (String) thisProperty.getUnconfirmedValue();
// Fail validation when the entered value differs from the main password.
if (theMainPassword != null && !theMainPassword.equals(theValue)) {
    throw new EnhancedException("PASSWORDS_MISMATCH", new String[] {"reason=" + reason});
}


This would appear in your configuration schema (xcs) file which describes the configuration parameters.  When the user enters a value on the associated UI screen, and clicks "Next", this code snippet is run (in addition to any basic validation parameters, like whether a string is really a string, or whether it is an integer within a desired range).  More details on configuration validation can be found at Sandeep's blog.
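The port number case mentioned above could be sketched along the same lines.  This is a standalone illustration, not code from openInstaller: ValidationException is a hypothetical stand-in for EnhancedException, the message keys are invented, and the upper bound of 65535 is my own addition (the only rule stated above is that the port must be > 1023).

```java
final class PortValidationDemo {
    // Hypothetical stand-in for EnhancedException: a message key plus raw context strings.
    static final class ValidationException extends RuntimeException {
        private static final long serialVersionUID = 1L;
        final String key;
        final String[] contexts;

        ValidationException(String key, String... contexts) {
            super(key + " " + String.join(" ", contexts));
            this.key = key;
            this.contexts = contexts.clone();
        }
    }

    // Returns the parsed port, or throws with a key and context instead of a pre-built sentence.
    static int validatePort(String raw) {
        int port;
        try {
            port = Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new ValidationException("PORT_NOT_A_NUMBER", "value=" + raw);
        }
        // Ports up to 1023 are rejected per the rule above; 65535 is an assumed upper limit.
        if (port <= 1023 || port > 65535) {
            throw new ValidationException(
                "PORT_OUT_OF_RANGE", "value=" + port, "minimum=1024", "maximum=65535");
        }
        return port;
    }

    public static void main(String[] args) {
        System.out.println(validatePort("8080"));
        try {
            validatePort("80");
        } catch (ValidationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The validator never builds a sentence; it only reports which rule failed and the values involved, leaving the wording to whatever displays or logs the failure.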


Thursday Jul 12, 2007

openInstaller: GUI and CUI parity, made easy

openInstaller has come up with a revolutionary way for developers to achieve GUI and CUI functional parity virtually for free. By using a combination of several open source projects, including SwiXML, Charva, and nCurses, the install developer can simply create their UI layout once, in a single descriptive XML file (more to come there, a GUI designer tool is in the works).

For example, here's a typical license screen in GUI:


Here's the same screen rendered in CUI:


