Unbearable Lightness of Localizing Java Applications
By Denis GURCHENKOV
Preparing a localized version of a Java desktop application is supposed to be an easy and well documented task. This article highlights the small problems that typically stay off screen.
Back in 2001, a company that resells software development tools to the Japanese market expressed interest in localizing and distributing our flagship product. I was hacking on a Java-to-native compiler core at the time, so the assignment to create Japanese versions of our Swing-based wizards looked like an easy ride.
Everybody knows that Java is an excellent platform: it addresses most localization issues, and Sun provides detailed tutorials on the subject. This is exactly what I thought in the beginning. However, as Murphy's law has it, "If everything seems to be going well, you have obviously overlooked something." This article describes what I overlooked. May it protect you from repeating my mistakes and save your time for more interesting ones ;-).
Localization vs. Internationalization
Obviously, you should think about localization as early as possible, and ideally read some documentation before coding. A good place to start is Sun's Java Internationalization page. In particular, it outlines the difference between "localization" and "internationalization":
Internationalization is the process of designing software so that it can be adapted (localized) to various languages and regions easily, cost-effectively, and in particular without engineering changes to the software.
Localization of internationalized software is done by simply adding locale-specific components, such as translated messages, data describing locale-specific behavior, fonts, and input methods, etc.
|Sun Java Internationalization Pages|
|Sun Java Localization Tutorial|
|MSDN Localization Article|
|"Internationalization road hazards" at IBM DeveloperWorks web site|
Before you code your app, read Sun's internationalization tutorial. It is short and informative, and it covers all those obvious but often-forgotten issues such as separating strings from code, using formatted output instead of string concatenation, separating currency symbols, etc. Perhaps you know all this well, but a quick review won't hurt.
I always pay attention to what the IBM Java guys do and say. A recent article on the IBM DeveloperWorks web site outlines typical problems in a short and lively manner.
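As a reminder of what "formatted output instead of string concatenation" buys you, here is a minimal sketch. The key `copy.done`, the class name, and the two message texts are invented for illustration. In a real application the patterns would live in `.properties` files loaded through `ResourceBundle.getBundle()`. Here they are parsed from in-memory strings only to keep the example self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.text.MessageFormat;
import java.util.Locale;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class MessageDemo {
    // Stand-ins for Messages.properties and Messages_de.properties (hypothetical files).
    static final String EN = "copy.done=Copied {0} files to {1}\n";
    static final String DE = "copy.done={0} Dateien nach {1} kopiert\n";

    // Picks the "file" for the locale; a real app would call ResourceBundle.getBundle().
    static ResourceBundle bundleFor(Locale locale) {
        String text = "de".equals(locale.getLanguage()) ? DE : EN;
        try {
            return new PropertyResourceBundle(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The translator controls word order through the pattern; the code only supplies arguments.
    static String copyDone(Locale locale, int count, String target) {
        String pattern = bundleFor(locale).getString("copy.done");
        return new MessageFormat(pattern, locale).format(new Object[] { count, target });
    }

    public static void main(String[] args) {
        System.out.println(copyDone(Locale.ENGLISH, 3, "backup"));
        System.out.println(copyDone(Locale.GERMAN, 3, "backup"));
    }
}
```

Note that the German pattern moves the verb to the end of the sentence. With `"Copied " + count + " files to " + target` in the code, the translator would have no way to do that.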
Yet these resources do not cover everything. Below I describe a few issues that I faced when localizing our apps to Japanese and later to German.
Abbrechen, Wiederholen, Ignorieren?
|Size of translated text|
|Compare the original English button text and the translated ones. Japanese was ok, but I had to decrease the font size for the German text to fit the button.|
The English language is more compact than other European languages. This is a well-known problem for GUI designers - the translated text just does not fit, or, if layout managers are used, some dialogs look crooked.
Our applications employ custom GUI layouts crafted by a professional graphic designer, which made things much worse than average. Despite having the power of all the Java layout managers at my disposal, I had to deal with places where exactly 120 pixels were available for a form field label, and not a pixel more. The form was designed with English labels, so the English text fitted perfectly, and Japanese was OK in most cases. As a rule, German text did not fit at all.
In our case, translating the text strings into German required several iterations, because the first version was so long that there was no way to fit it into the GUI layout. With our help, the German translators managed to rephrase the text, select shorter synonyms, etc. This took more time than expected, so make sure to add some extra days to this stage of your schedule.
To avoid problems of this kind, read the excellent MSDN article. It states what a developer must think about in order to make future localization easy BEFORE starting to write code and design GUI. In particular, the article contains concise, specific recommendations on text sizes and free space reservation.
Printing messages to System.out
Windows uses two codepages for each language: ANSI and OEM, except in Asian, Unicode-based locales. When non-Unicode text is rendered in a graphic context, it is assumed to be encoded as ANSI. When text is printed to the standard output or console, it is supposed to be OEM-encoded. For the English language, these codepages are identical, so an English string will look the same whether you print it to the standard output or display it in a GUI control. Yet for many languages these codepages are different. So if your app prints a Cyrillic string to the standard output and displays the same string in a MessageBox, one of them will be displayed incorrectly.
One of our Swing-based GUI applications also has a command-line interface. In the command-line mode, status messages and errors are printed to the standard output. So we had to translate messages and file names from ANSI to OEM encoding.
As far as I know, there is no Java API that translates a string from ANSI to OEM. I ended up writing a custom native method that invokes the AnsiToOem() Win32 API function (known as CharToOem() in current versions of the Windows API).
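For console output specifically, a pure-Java route may be worth trying before resorting to native code: wrap `System.out` in a `PrintStream` that encodes characters with the OEM code page. The sketch below is not the native method described above. It assumes a Russian Windows console using code page 866, and the class and helper names are made up. The JVM does not necessarily tell you which OEM code page the console uses, so the charset name would have to be chosen per target locale.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class OemConsole {
    // Wraps a stream so that characters are encoded with the given code page.
    static PrintStream withCharset(OutputStream out, String charsetName) {
        try {
            return new PrintStream(out, true, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    // Helper for inspecting the bytes a given code page produces for a string.
    static byte[] encode(String text, String charsetName) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        PrintStream ps = withCharset(buf, charsetName);
        ps.print(text);
        ps.flush();
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        // Assumption: the console expects OEM code page 866 (Russian).
        PrintStream console = withCharset(System.out, "Cp866");
        // "Privet" in Cyrillic, written with escapes to avoid source-encoding issues.
        console.println("\u041f\u0440\u0438\u0432\u0435\u0442");
    }
}
```

The same Cyrillic letter maps to different bytes in the two code pages (for example, U+041F is 0x8F in Cp866 and 0xCF in Cp1251), which is exactly why a string that looks right in a GUI control looks wrong on the console.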
Java Platform API Behavior
|Japanese Text Layout|
|The last character of the label gets replaced with "..." even though there is enough space for it:|
Be prepared for bugs specific to localized versions of the J2SE platform. Well, "bugs" is not the right word here. Just be prepared to modify your software, because J2SE API behavior depends on the current locale.
As an example, I discovered that on a Japanese Windows PC the JLabel rendering code may sometimes replace the end of the label string with "..." even if there is enough space to display the entire string. The problem is easy to work around by adjusting the layout managers, but it did require changes in my code.
|Turkish "i" problem|
|If the default locale on your PC is Turkish,|
Another well-known example is the uppercase "i" problem in Turkish Java, explained in detail in this JDJ article. In the Turkish alphabet, there are two letters for "i," dotless and dotted. In J2SE 1.4, the dotted "i" in lowercase becomes the dotted "I" in uppercase.
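The usual defense is to pass an explicit locale whenever case conversion is applied to something that is not user-visible text, such as keys, identifiers, or file extensions. A minimal sketch follows. The class and method names are made up for illustration, and `Locale.ROOT` and `Locale.forLanguageTag()` require Java 6 and Java 7 respectively, so they are newer than the J2SE 1.4 discussed here.

```java
import java.util.Locale;

public class TurkishCase {
    static final Locale TURKISH = Locale.forLanguageTag("tr-TR");

    // For keys, identifiers, file extensions: the result does not depend on the default locale.
    static String upperInvariant(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    // For text shown to a user: follow the rules of that user's language.
    static String upperFor(Locale locale, String s) {
        return s.toUpperCase(locale);
    }

    public static void main(String[] args) {
        System.out.println(upperInvariant("title"));    // plain ASCII "I"
        System.out.println(upperFor(TURKISH, "title")); // contains U+0130, the dotted capital I
    }
}
```

Code that compares `s.toUpperCase()` against a constant such as `"TITLE"` works everywhere except on a machine with a Turkish default locale, which is why this bug tends to surface only in the field.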
|Localized Tooltip in JavaHelp|
Even if you are not going to create localized versions of your English software, try running it with non-English locales set as default and watch out for surprises. For instance, JavaHelp displays localized tooltips according to the default locale. This can be switched off, but you would not know about the problem unless you ran the application in a non-English environment.
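You do not need a foreign-language Windows installation for a first check. One option is to start the JVM with the `user.language` and `user.country` system properties set. Another is to switch the default locale programmatically around the code under test, as in the sketch below. The helper name `withDefaultLocale` is invented for this example, and the lambda syntax assumes Java 8 or later.

```java
import java.util.Locale;
import java.util.function.Supplier;

public class LocaleSmokeTest {
    // Runs the body with the given default locale, then restores the previous one.
    static <T> T withDefaultLocale(Locale locale, Supplier<T> body) {
        Locale saved = Locale.getDefault();
        Locale.setDefault(locale);
        try {
            return body.get();
        } finally {
            Locale.setDefault(saved);
        }
    }

    public static void main(String[] args) {
        // Exercise the same locale-sensitive call under several default locales.
        for (String tag : new String[] { "ja-JP", "de-DE", "tr-TR" }) {
            String shown = withDefaultLocale(Locale.forLanguageTag(tag),
                    () -> "quit".toUpperCase()); // uses the default locale implicitly
            System.out.println(tag + " -> " + shown);
        }
    }
}
```

This only covers behavior driven by `Locale.getDefault()`. Issues that depend on the operating system itself, such as fonts, input methods, and console code pages, still require a genuinely localized machine.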
Was It All Worth It?
|Japanese localization service|
You may wish to ask me now whether it all paid off. Well, I can say that beforehand we had just a handful of sales of the English version to Japan, whereas in 2004 Japanese customers accounted for about 10% of our net sales. I have to make a few remarks here:
- What we did was internationalization. It was our partner who localized the product, at no cost to us (see the section "Localization vs. Internationalization" above if you have forgotten the difference).
- It took us dozens of person-years to create the product, so its internationalization was a minor effort. Moreover, it would have required far fewer resources had we thought about it before coding.
- Our partner does sales, marketing and first-line support in Japan at no up-front cost to us (but their reseller commission is high).
To summarize, with very little effort we gained a market that now accounts for about 10% of our sales, so our investment in internationalization was paid back many times over.
But above all, I have had quite some fun doing that work. Have you ever tried to set up a Japanese Linux box? If not, try to guess in what language it prints all error messages...
Localizing a Java desktop application requires solving a series of small problems originating from different aspects of the J2SE API. These problems are somewhat unexpected, but easy to solve.
You will be rewarded not only with new users and increased sales: the process itself is interesting and may add something new to your experience.