New in OpenDS 2.0: I18N Collation Matching Rules

OpenDS 2.0 has just been released, and it comes with several new and exciting features.

Today we will be taking a closer look at the I18N Collation Matching Rules.

In LDAP, most data consists of DirectoryString values, which are UTF-8 encoded strings. The LDAPv3 specifications, and more precisely RFC 4518, define how UTF-8 strings are prepared for comparison in LDAP. OpenDS is fully compliant with LDAPv3 and implements this RFC.
This means that the server properly case-folds non-ASCII characters and can compare them, in a case-insensitive way, whether they are accented letters like the French é or Japanese characters.
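
To make the case-folding step concrete, here is a minimal Python sketch. It is only an approximation: RFC 4518 string preparation has more steps than shown here (prohibited-character checks and insignificant-space handling, for instance), and `prepare` is a hypothetical helper name, not something OpenDS exposes. It applies Unicode case folding followed by NFKC normalization.

```python
import unicodedata


def prepare(value):
    """Rough stand-in for RFC 4518 preparation: case-fold, then NFKC normalize.

    Assumption: the remaining steps of the real algorithm are skipped.
    """
    return unicodedata.normalize("NFKC", value.casefold())


# Case differences disappear, even for non-ASCII letters...
print(prepare("HÉLÈNE") == prepare("hélène"))  # True
# ...but accents are preserved, so the unaccented spelling stays different.
print(prepare("helene") == prepare("hélène"))  # False
```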

Let's work with an example: an entry with the givenName "Hélène", as shown in the OpenDS Entry Editor panel.

If I search the directory for that givenname, I can retrieve the entry:

$ bin/ldapsearch -p 2389 -b "dc=example,dc=com" '(givenname=hélène)'
dn:: Y249SMOpbMOobmUgRGVUcm9pZSxvdT1QZW9wbGUsZGM9ZXhhbXBsZSxkYz1jb20=
mail: Helene.Detroy-AT-example-DOT-com
givenName:: SMOpbMOobmU=
objectClass: person
objectClass: inetOrgPerson
objectClass: organizationalPerson
objectClass: top
uid: hdetroie
cn:: SMOpbMOobmUgRGVUcm9pZQ==
sn: DeTroie

$ bin/ldapsearch -p 2389 -b "dc=example,dc=com" '(givenname=HÉLÈNE)' givenName
dn:: Y249SMOpbMOobmUgRGVUcm9pZSxvdT1QZW9wbGUsZGM9ZXhhbXBsZSxkYz1jb20=
givenName:: SMOpbMOobmU=

Note: The DN, cn and givenName values are base64 encoded in the result, as the LDIF specification requires for values containing non-ASCII characters.
Note: To type the string "hélène" correctly in a terminal (as in the filters above), make sure LANG is set to a UTF-8 locale (for example LANG=en_US.UTF-8).
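
If you want to read the base64 values from the output above, a few lines of Python are enough. This is a small sketch, not a full LDIF parser: `decode_ldif_line` is a hypothetical helper that only handles the `attr: value` and `attr:: base64` forms and ignores line folding and `:<` URL references.

```python
import base64


def decode_ldif_line(line):
    """Return (attribute, value) for one unfolded LDIF line.

    'attr:: xxx' carries a base64 encoded UTF-8 value, 'attr: xxx' a plain one.
    """
    attr, _, rest = line.partition(":")
    if rest.startswith(":"):
        # Double colon: the value is base64 encoded.
        return attr, base64.b64decode(rest[1:].strip()).decode("utf-8")
    return attr, rest.strip()


for line in (
    "dn:: Y249SMOpbMOobmUgRGVUcm9pZSxvdT1QZW9wbGUsZGM9ZXhhbXBsZSxkYz1jb20=",
    "givenName:: SMOpbMOobmU=",
    "cn:: SMOpbMOobmUgRGVUcm9pZQ==",
    "sn: DeTroie",
):
    print(decode_ldif_line(line))
# ('dn', 'cn=Hélène DeTroie,ou=People,dc=example,dc=com')
# ('givenName', 'Hélène')
# ('cn', 'Hélène DeTroie')
# ('sn', 'DeTroie')
```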

Let's see what happens if I search for the same user without the accented letters.

$ bin/ldapsearch -p 2389 -b "dc=example,dc=com" '(givenname=helene)'
$

Nothing is returned. This is because in Unicode the letters e and é are distinct characters that do not normalize to the same value. This is a big problem, especially in Europe: we do not like our names to be written incorrectly, and the person searching may not remember exactly how the name is spelled, or may not know how to type the accented character on their machine. Moreover, in French (and in other locales as well), the letters e, é and É are considered equal when comparing.

That's where the I18N Collation Matching Rules come to the rescue.
OpenDS 2.0, like its distant ancestor Sun Directory Server, supports by default a set of locale-specific extensible matching rules.
This means I can now search for the givenName according to the collation rules associated with French, German, Norwegian or Japanese.
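
To give an intuition of what such a comparison does, here is a rough Python sketch. It is not how OpenDS implements its rules, and it ignores everything locale-specific: it merely approximates an accent-insensitive and case-insensitive comparison for Latin text by decomposing each character (NFD) and dropping the combining marks. `primary_key` is a hypothetical name chosen for this illustration.

```python
import unicodedata


def primary_key(value):
    """Approximate comparison key: base letters only, case-folded.

    Assumption: a real collation rule relies on locale data; this only
    strips combining marks after canonical decomposition.
    """
    decomposed = unicodedata.normalize("NFD", value)
    base_letters = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base_letters.casefold()


# "é" decomposes into "e" followed by U+0301 COMBINING ACUTE ACCENT.
print(unicodedata.normalize("NFD", "é") == "e\u0301")  # True
# Once the accents are set aside, both spellings yield the same key.
print(primary_key("Hélène") == primary_key("helene"))  # True
print(primary_key("HÉLÈNE"))  # helene
```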

Each locale has been assigned an OID, and there are 6 different matching rules per locale: LowerOrEqual, LowerThan, Equality, GreaterOrEqual, GreaterThan and Substring.

So to match givenName for equality according to the French collation rules, the filter would be the following:
(givenname:1.3.6.1.4.1.42.2.27.9.4.76.1.3:=Helene)

$ bin/ldapsearch -p 2389 -b "dc=example,dc=com" '(givenname:1.3.6.1.4.1.42.2.27.9.4.76.1.3:=helene)' givenName
dn:: Y249SMOpbMOobmUgRGVUcm9pZSxvdT1QZW9wbGUsZGM9ZXhhbXBsZSxkYz1jb20=
givenName:: SMOpbMOobmU=

Or for a substring match, still according to the French collation rules:

$ bin/ldapsearch -p 2389 -b "dc=example,dc=com" '(givenname:1.3.6.1.4.1.42.2.27.9.4.76.1.6:=hel*)' givenName
dn:: Y249SMOpbMOobmUgRGVUcm9pZSxvdT1QZW9wbGUsZGM9ZXhhbXBsZSxkYz1jb20=
givenName:: SMOpbMOobmU=

But remembering the OID for each locale and type of matching is not easy. So we've also provided shortcuts made of the locale name and a short string representing the type of matching: lte, lt, eq, gte, gt, sub.

Examples:

$ bin/ldapsearch -p 2389 -b "dc=example,dc=com" '(givenname:fr.eq:=helene)' givenName
dn:: Y249SMOpbMOobmUgRGVUcm9pZSxvdT1QZW9wbGUsZGM9ZXhhbXBsZSxkYz1jb20=
givenName:: SMOpbMOobmU=

$ bin/ldapsearch -p 2389 -b "dc=example,dc=com" '(givenname:fr.sub:=hel*)' givenName
dn:: Y249SMOpbMOobmUgRGVUcm9pZSxvdT1QZW9wbGUsZGM9ZXhhbXBsZSxkYz1jb20=
givenName:: SMOpbMOobmU=

$ bin/ldapsearch -p 2389 -b "dc=example,dc=com" '(givenname:de.eq:=helene)' givenName
dn:: Y249SMOpbMOobmUgRGVUcm9pZSxvdT1QZW9wbGUsZGM9ZXhhbXBsZSxkYz1jb20=
givenName:: SMOpbMOobmU=
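
If you build such filters from a program, a tiny helper can assemble the shortcut form. The sketch below is an illustration under assumptions: `collation_filter` is a hypothetical function, it escapes the special characters listed in RFC 4515, and it leaves `*` untouched for the sub rule so that it can act as a wildcard, as in the examples above.

```python
RULE_SUFFIXES = ("lte", "lt", "eq", "gte", "gt", "sub")


def collation_filter(attr, locale, rule, value):
    """Build an extensible-match filter such as (givenname:fr.eq:=helene)."""
    if rule not in RULE_SUFFIXES:
        raise ValueError("unknown matching rule suffix: %s" % rule)
    # Escape the backslash first so later escapes are not escaped twice.
    escaped = (
        value.replace("\\", "\\5c")
        .replace("(", "\\28")
        .replace(")", "\\29")
        .replace("\0", "\\00")
    )
    if rule != "sub":
        # Assumption: outside substring matching, '*' is a literal character.
        escaped = escaped.replace("*", "\\2a")
    return "(%s:%s.%s:=%s)" % (attr, locale, rule, escaped)


print(collation_filter("givenname", "fr", "eq", "helene"))  # (givenname:fr.eq:=helene)
print(collation_filter("givenname", "fr", "sub", "hel*"))   # (givenname:fr.sub:=hel*)
```

The resulting string can be passed as the filter argument of bin/ldapsearch, exactly like the hand-written filters.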

These I18N Collation Matching Rules can be used not only in search filters, but also for indexing and for server-side sorting.
Unfortunately, extensible matching rules for indexes cannot be set from the Control Panel, so it has to be done with dsconfig.

$ dsconfig set-local-db-index-prop \
--backend-name userRoot \
--index-name givenName \
--add index-extensible-matching-rule:fr.eq \
--hostname ludovic-poitous-computer-2.local \
--port 5444 \
--trustStorePath /Users/ludo/dev/Tests/OpenDS2rc4/config/admin-truststore \
--bindDN cn=Directory\ Manager \
--bindPassword ****** \
--no-prompt

Don't forget to rebuild the index for the givenName attribute (bin/rebuild-index -b dc=example,dc=com -i givenname).

You can find more information about the I18N Collation Matching Rules on the OpenDS 2.0 Documentation Wiki.

Comments:

[Trackback] Read their highlights, complete with configuration steps and other nuances, thanks to Ludo Poitou.

Posted by Marina Sum's Blog on July 27, 2009 at 03:22 PM CEST #

About

This is the blog of a senior software engineer, specialized in LDAP, Directory Server and OpenDS. Ludovic Poitou works in France at the Grenoble Engineering Center, in the Directory Services Engineering team. Outside work, I love skiing and taking photo
