Saturday Feb 09, 2008

Sorting with different locale

Sorting is always a tricky game, whatever the language. Java has its own
excellent sorting APIs. But have you ever thought about how sorting
works in a different locale? How does it work in French, or in Spanish? Let's
have a look...

This is the String array I want to sort:
String[] names = {"fácil", "facil", "fast", "Où", "êtes-vous", "spécifique", "specific", "ou"};
It contains mostly French words (including some of my favourites, like "Où" :-) ).
And here is a typical sorting program:

String[] names = {"fácil", "facil", "fast", "Où", "êtes-vous", "spécifique", "specific", "ou"};
List<String> list = Arrays.asList(names);
// No Comparator given: Collections.sort uses String.compareTo (code unit order)
Collections.sort(list);
for (String name : list) {
    System.out.print(name + " ");
}

And the result:
Où facil fast fácil ou specific spécifique êtes-vous

The result may surprise you, and it may annoy your French friend :-), who
would never expect "fast" to come before "fácil" just because of one
accented 'á'. The order is correct according to Unicode code values, but
not according to the rules of the locale.
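Here is why. When no Comparator is passed, Collections.sort uses String.compareTo, which compares the strings char by char by their UTF-16 code unit values and returns the difference at the first mismatch. The small sketch below (the class CodePointOrder and its helper naturalCompare are hypothetical names, used only for illustration) prints the values involved: 'á' is U+00E1 (225), larger than any unaccented lowercase letter, while 'O' (79) is smaller than 'o' (111).

```java
// Hypothetical demo class: shows why natural String order ignores locale rules
class CodePointOrder {

    // String.compareTo returns the difference of the first mismatching chars
    // (or the length difference if one string is a prefix of the other)
    static int naturalCompare(String a, String b) {
        return a.compareTo(b);
    }

    public static void main(String[] args) {
        // Code unit values of the characters that decide the order
        System.out.println((int) 'a' + " " + (int) 's' + " " + (int) 'á'); // 97 115 225
        System.out.println((int) 'O' + " " + (int) 'o' + " " + (int) 'ê'); // 79 111 234

        // 'á' (225) vs 'a' (97) at index 1, so "fácil" sorts after "fast"
        System.out.println(naturalCompare("fácil", "fast")); // 128
        // 'O' (79) vs 'o' (111) at index 0, so "Où" sorts before "ou"
        System.out.println(naturalCompare("Où", "ou"));      // -32
    }
}
```

The same rule explains why "êtes-vous" ends up last: 'ê' (234) is larger than every plain ASCII letter.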

To deal with this problem, the JDK provides a class called java.text.Collator (available since JDK 1.1), which compares strings according to the rules of a locale.
Collator is an abstract class. You can find its source in OpenJDK at jdk\src\share\classes\java\text\Collator.java; the file is very well documented.
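Because Collator is abstract, you do not create it with new; the static factory Collator.getInstance(Locale) returns a concrete implementation (in the JDK this is normally a RuleBasedCollator, but treat that as an implementation detail). Collator also implements Comparator<Object>, so an instance can be handed straight to Collections.sort or List.sort. A minimal sketch, where the class CollatorBasics and the helper frenchSort are hypothetical names:

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class CollatorBasics {

    // Hypothetical helper: returns a copy of the words sorted with French rules
    static List<String> frenchSort(String... words) {
        // Factory method instead of "new", because Collator itself is abstract
        Collator collator = Collator.getInstance(Locale.FRENCH);
        List<String> sorted = new ArrayList<>(Arrays.asList(words));
        // Collator implements Comparator<Object>, so it works as a String comparator
        sorted.sort(collator);
        return sorted;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.FRENCH);
        // Concrete class picked by the JDK (usually java.text.RuleBasedCollator)
        System.out.println(collator.getClass().getName());
        // Negative result: with French rules "fácil" sorts before "fast"
        System.out.println(collator.compare("fácil", "fast") < 0);
        System.out.println(frenchSort("fast", "fácil", "facil"));
    }
}
```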

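Collator supports several strength levels: PRIMARY, SECONDARY, TERTIARY
and IDENTICAL. The strength decides which differences between characters
count when comparing: roughly, PRIMARY looks only at base letters, SECONDARY
also considers accents, TERTIARY also considers case, and IDENTICAL takes
every remaining difference into account. The exact assignment is locale
dependent, and the default strength is TERTIARY. Please read the javadoc for details.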
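To get a feel for the levels, the sketch below (StrengthDemo and equalAt are hypothetical names) compares the same pairs with a French Collator at each strength. Assuming the JDK's French rules, where an accent is a secondary difference and case is a tertiary one, "e" and "é" become distinct from SECONDARY upward, and "e" and "E" only from TERTIARY upward.

```java
import java.text.Collator;
import java.util.Locale;

class StrengthDemo {

    // Returns true if a and b compare as equal at the given Collator strength
    static boolean equalAt(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.FRENCH);
        collator.setStrength(strength);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        int[] strengths = {Collator.PRIMARY, Collator.SECONDARY, Collator.TERTIARY};
        String[] labels = {"PRIMARY", "SECONDARY", "TERTIARY"};
        for (int i = 0; i < strengths.length; i++) {
            // Accent difference (e vs é) and case difference (e vs E)
            System.out.println(labels[i]
                    + ": e == é " + equalAt(strengths[i], "e", "é")
                    + ", e == E " + equalAt(strengths[i], "e", "E"));
        }
    }
}
```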

Now here is my simple code:

import java.text.*;
import java.util.*;

class CollatorTest {

    // Prints the list on one line, separated by spaces
    private static void print(List<String> list) {
        for (String name : list) {
            System.out.print(name + " ");
        }
        System.out.println();
    }

    public static void main(String[] args) {
        String[] names = {"fácil", "facil", "fast", "Où", "êtes-vous", "spécifique", "specific", "ou"};
        List<String> list = Arrays.asList(names);

        // 1. Natural String order (UTF-16 code units), no locale rules
        Collections.sort(list);
        print(list);

        Locale[] loc = Collator.getAvailableLocales();
        /* Uncomment to print every locale for which a Collator is available:
        for (int i = 0; i < loc.length; i++) {
            System.out.println(loc[i].getDisplayName());
        }
        */

        // 2. French Collator (Locale.FRENCH is the same as new Locale("fr"))
        //    at PRIMARY strength: roughly, only base letters count
        Collator myCollator = Collator.getInstance(Locale.FRENCH);
        myCollator.setStrength(Collator.PRIMARY);
        Collections.sort(list, myCollator);
        print(list);

        // 3. Same Collator at TERTIARY strength: accents and case count too
        myCollator.setStrength(Collator.TERTIARY);
        Collections.sort(list, myCollator);
        print(list);
    }
}

And here is the result:
Où facil fast fácil ou specific spécifique êtes-vous
êtes-vous facil fácil fast Où ou specific spécifique
êtes-vous facil fácil fast ou Où specific spécifique

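The first line is the normal sort; the second and third lines use the
French Collator with two different strengths (PRIMARY and TERTIARY). You
can easily see that the locale's rules are now respected: "fácil" sits next
to "facil" instead of after "fast", and "êtes-vous" moves from last to first.
At PRIMARY strength "Où" and "ou" compare as equal, so they keep their
previous relative order (Collections.sort is stable); at TERTIARY strength
accents and case are also considered, and "ou" comes before "Où". The few
commented-out lines in the code print the locales for which a Collator is available.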

About

Hi, I am Vaibhav Choudhary, and I work at Sun. This blog is all about simple Java and JavaFX concepts.
