Friday Apr 01, 2011

Context for All

Interesting post over on the Content Rules blog, discussing the issue of context (or the lack of it, really) for translators, and how it relates to granularity of content. It's great to see this issue raised, and I think we need to be a lot more hardcore about examining the claims about how intractable the problem of lack of context for translation can be.

For me, the problem with this context debate is that it is decontextualized (ho, ho) from the total content lifecycle and the tools and process side of things. For one thing, has anyone considered what context an application developer or technical writer has when reusing content (whether burst DocBook objects in a CMS or DITA conrefs)? Is it any worse or better than what translation teams have? Surely, if content developers have context from their CMS (or development environment), then isn't the issue why translators aren't accessing the CMS/environment and working in there too? Why ever remove content from a database just to translate it? What is very clear to me is that context can be easily and automatically derived from the development environment and included within a translatable file format, if you so design it. Here's a simple Oracle-related example:

xliff_context.png
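
In text form, the same idea can be sketched roughly as follows. This is a minimal illustration, not the actual Oracle format: it assumes an XLIFF 1.2 style trans-unit, and the function name, the unit ID, and the custom "x-" context types (page, component, description) are all hypothetical stand-ins for whatever metadata the development environment already holds.

```python
import xml.etree.ElementTree as ET


def build_trans_unit(unit_id, source_text, context):
    """Build an XLIFF 1.2 style <trans-unit> that carries derived context.

    `context` maps a context-type name to its value. The "x-" names used
    below are hypothetical custom types, not a published vocabulary.
    """
    unit = ET.Element("trans-unit", {"id": unit_id})
    ET.SubElement(unit, "source").text = source_text
    # purpose="information" marks the group as background for the translator
    group = ET.SubElement(unit, "context-group", {"purpose": "information"})
    for context_type, value in context.items():
        entry = ET.SubElement(group, "context", {"context-type": context_type})
        entry.text = value
    return unit


# Hypothetical metadata that a repository could supply without anyone
# having to write a translation note by hand.
unit = build_trans_unit(
    "BTN_POST",
    "Post",
    {
        "x-page": "Journal Entry",
        "x-component": "button",
        "x-description": "Posts the journal entry to the general ledger",
    },
)
xliff_fragment = ET.tostring(unit, encoding="unicode")
print(xliff_fragment)
```

The point of the sketch is that the translator sees "Post" as a button on a journal entry page, not as a bare four-letter string, and nobody had to type that context in.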

Information quality helps a lot here too. A term should only have one meaning in that context. So, derived context, information quality, and repository-based operations can solve the problem. Sure, a badly written little piece of text copied and pasted into Notepad and sent around the world via e-mail is going to lead to trouble. Duh.

What translation teams should not be pushing for, however, is dumbed-down text devoid of any real style or necessary references just to make translation easier. Contextual information is a critical part of the UX. Bland, generic content is not; that stuff damages the UX in all languages. Nor should content developers have to "write in" context in the form of translation notes for translation teams. That is a waste of development time and resources. Derive the context automatically, instead.

Regardless of how this context is provided, the most frustrating part of this context debate is the lack of insight displayed by advocates about the application lifecycle. Context has been positioned by internationalization and translation teams as something exclusively required on translatability grounds. It's not. In fact, context is a critical part of any UX-effective customization or extensibility efforts.

Less than a quarter of enterprise application deployments stay 'vanilla'. The rest are customized: modified for customer needs, and with extra bits of functionality added on that need to look the same as the rest. Without context for developers and functional users, such customization/extensibility efforts can be very tough indeed in UX terms. In fact, 'translation context' columns in databases are usually repurposed description columns intended for development, implementor and customization team notes. That internationalization and translation teams never leveraged a wider argument in support of improved context doesn't surprise me.

I believe the "context for all" requirement is one that can be met. But it will be met by UX people, and not by translation teams.

Wednesday Jan 05, 2011

Translation and Localization Resources for UX Designers

Here is a handy list of translation and localization-related resources for user experience professionals. Following some basic guidelines will help you design an easily translatable user experience.

Most of the references here are for web pages or software. Fundamentally, remember your designs will be consumed globally, and never divorce the design process from the development or deployment effort that goes into bringing your designs to life in code. Designers, ask yourself today: Do you know how the text you are using in your designs is delivered to the customer, even in English?

Key areas that UX designers always seem to fall foul of, in the enterprise applications space anyway, are:

  • Terminology that is impossible to translate (jargon, multiple modifiers, gerunds) or is used inconsistently.
  • Poorly written, verbose text (really, just write well in English, no special considerations).
  • String construction (concatenation of parts, assembled dynamically). This seems particularly problematic in search or calendar user interfaces. Days, weeks, months, and years are gender dependent in some languages. Thus, we have the composite messaging and positioning issue (my favorite):
concat_calendar.png
  • Hard-coded fonts, small font sizes, or character formatting or casing that doesn't work globally.
  • Format that is not separate from content. 
  • Restricted real estate not allowing for text expansion in translation.
  • Forcing formatting with breaks, and hard-coding alphabetical sorting in one language.
  • Graphics that do not work for bi-di languages (because they indicate directionality and can't flip) or contain embedded text. The problems of culturally offensive icons are well known by now in the enterprise applications space, though some dangers remain, such as the use of flags to indicate languages.
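
To make the string construction point concrete, here is a small sketch of why assembling calendar phrases from parts goes wrong, and what the alternative looks like. The catalog structure, keys, and function names are hypothetical, and the French is only there to show gender agreement at work.

```python
# Hypothetical fragments of the kind a design might glue together at run time.
PREFIX = {"en": "Last", "fr": "Dernier"}
UNITS = {
    "en": {"week": "week", "month": "month", "year": "year"},
    "fr": {"week": "semaine", "month": "mois", "year": "année"},
}

# Hypothetical catalog of complete phrases: one translatable string per
# message, so the translator controls agreement and word order.
MESSAGES = {
    "en": {
        "last_week": "Last week",
        "last_month": "Last month",
        "last_year": "Last year",
    },
    "fr": {
        "last_week": "La semaine dernière",
        "last_month": "Le mois dernier",
        "last_year": "L'année dernière",
    },
}


def concatenated(locale, unit):
    """Anti-pattern: build the phrase from separately translated parts."""
    return PREFIX[locale] + " " + UNITS[locale][unit]


def whole_message(locale, key):
    """Look up a complete, separately translated phrase."""
    return MESSAGES[locale][key]


print(concatenated("en", "week"))         # fine in English
print(concatenated("fr", "week"))         # wrong gender and word order in French
print(whole_message("fr", "last_week"))   # correct
```

The concatenated version happens to work in English, which is exactly why it survives design review. It only falls over once translation starts, when the adjective has to agree with a feminine noun and move after it.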

Resources

Doc and help considerations I can deal with later.

Saturday Jan 01, 2011

Where Next for Google Translate? And What of Information Quality?

Fascinating article in the UK Guardian newspaper called "Can Google break the computer language barrier?" In the article, Andreas Zollmann, who works on Google Translate, comments that the quality of Google Translate's output relative to the amount of data required to create that output is clearly now falling foul of the law of diminishing returns. He says:

"Each doubling of the amount of translated data input led to about a 0.5% improvement in the quality of the output," he suggests, but the doublings are not infinite. "We are now at this limit where there isn't that much more data in the world that we can use," he admits. "So now it is much more important again to add on different approaches and rules-based models."

The Translation Guy has a further discussion on this, called "Google Translate is Finished". He says: 

"And there aren't that many doublings left, if any. I can't say how much text Google has assimilated into their machine translation databases, but it's been reported that they have scanned about 11% of all printed content ever published. So double that, and double it again, and once more, shoveling all that into the translation hopper, and pretty soon you get the sum of all human knowledge, which means a whopping 1.5% improvement in the quality of the engines when everything has been analyzed. That's what we've got to look forward to, at best, since Google spiders regularly surf the Web, which in its vastness dwarfs all previously published content. So to all intents and purposes, the statistical machine translation tools of Google are done. Outstanding job, Googlers. Thanks."
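
The arithmetic behind that 1.5% figure is easy to check. A rough sketch, assuming (as the quotes imply) that each doubling adds a flat 0.5 percentage points and that the 11% coverage figure is the starting point; the function and variable names are mine:

```python
def doublings_left(coverage_pct):
    """Count how many times coverage can double without passing 100%."""
    count = 0
    while coverage_pct * 2 <= 100:
        coverage_pct *= 2
        count += 1
    return count


GAIN_PER_DOUBLING = 0.5  # percentage points per doubling, per the Guardian quote

remaining = doublings_left(11)            # 11 -> 22 -> 44 -> 88
total_gain = remaining * GAIN_PER_DOUBLING

print(remaining, total_gain)
```

Three doublings take 11% to 88%, and a fourth would need more printed content than exists, so the ceiling works out at 3 × 0.5 = 1.5 points.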

Surprisingly, all this analysis hasn't raised that much comment from the fans of machine translation (MT), or from its detractors either, for that matter. Perhaps it's the season of goodwill? What is clear to me, however, is that Google Translate isn't really finished (in any sense of the word). I am sure Google will investigate and come up with new rule-based translation models to enhance what they have already, and that these will scale effectively where others didn't. So too will they harness human input and guidance, which really is the way to go in training MT in the right quality direction.

But that aside, what does it say about the quality of the data that is being used for statistical machine translation in the first place? From the Guardian article it's clear that a huge human-translated corpus drove the gains for Google Translate and now what's left is the dregs of badly translated and poorly created source materials that just can't deliver quality translations. There's a message about information quality there, surely.

In the enterprise applications space, where we have some control over content, this whole debate reinforces the relationship between information quality at source and translation efficiency, regardless of the technology used to do the translation. But as more automation comes to the fore, that information quality is even more critical if you want anything approaching a scalable solution. This is important for user experience professionals. Issues like user-generated content translation, multilingual personalization, and scalable language quality are central to a superior global UX; it's a competitive issue we cannot ignore.

About

Oracle applications global user experience (UX): Culture, localization, internationalization, language, personalization, more. For globally-savvy UX people, so that it all fits together for Oracle's worldwide customers.

Audience: Enterprise applications translation and localization topics for the user experience professional (designers, engineers, developers, researchers)!

Profile

Ultan Ó Broin. Director, Global Applications User Experience, Oracle Corporation. On Twitter: @localization
