Most applications store a wide variety of text such as names, addresses, and descriptions. When searching on these values, often you want letters in the values to match the search pattern disregarding the case.
For example, the Olympic data set stores athlete names in this format:
FAMILY NAME, Given Names
To find all the people who have Barry for any of their names, you can (upper- or) lowercase the column and search string:
select athlete_name from olym_athletes where upper ( athlete_name ) like upper ( '%barry%' ) order by athlete_name; ATHLETE_NAME BARRY, Kevin BARRY, William Louis DAGGER, Barry Edward DANCER, Barry DAVIS, Barry Alan DEMET-BARRY, Deirdre KELLY, Barry MAGEE, Arthur Barry MAISTER, Barry John WEITZENBERG, Charles Barry
While simple, it’s easy to forget to standardize the case and makes it harder for the optimizer to use indexes.
Plus it only solves some of the text searching challenges. What if you want accent insensitive searches too? Or to ignore case when sorting the values? Or take language-specific cases into consideration, like German ‘ß’ matching ‘SS’?
From Oracle Database 12.2 these are all easy. In this post you’ll see how to do:
- Case-insensitive search
- Accent-insensitive search
- Sort rows regardless of case or accents
- Define search and sorting rules for a column
- Ignore case or diacritics before Oracle Database 12.2
- Create indexes for fast searches
- A word of caution when collating long text
If you want to try the examples yourself, you can get the scripts on Live SQL.

Photo by Andrea Piacquadio from Pexels
How to do case-insensitive searches
Oracle Database 12.2 added the collate operator. This allows you to specify the search and sort semantics for text in a query. This comes after the expression you want to apply these to.
The format for this is:
<expression> collate <collation name>
The collation defines the comparison rules; binary_ci does a case-insensitive comparison using the numeric value of the characters. So to find all the athletes with “barry” in their name in any case, write:
select athlete_name
from olym_athletes
where athlete_name like '%barry%'
collate binary_ci
order by athlete_name;
ATHLETE_NAME
BARRY, Kevin
BARRY, William Louis
DAGGER, Barry Edward
...
This applies the collation to the literal %barry%. This takes priority over any object or session collation settings.
You can use collations anywhere you compare character values (varchar2, char, nvarchar2, and nchar). This includes order by, group by, set operators, and many character functions. Refer to the documentation for a complete list.
They also apply to the standard comparison conditions: =, <, >=, in, between, etc. So to find how many athletes whose family name starts with the letter “n” or later in the alphabet, use:
select count (*)
from olym_athletes
where athlete_name >= 'n'
collate binary_ci;
COUNT(*)
8203
Without the collate clause this returns:
select count (*)
from olym_athletes
where athlete_name >= 'n';
COUNT(*)
12
Lowercase comes after uppercase in a binary collation and all the names in the table start with uppercase letters. So who are the athletes with lowercase family names?
select athlete_name from olym_athletes where athlete_name >= 'n'; ATHLETE_NAME ÖRSTED, Hans-Henrik ÖRTEGREN, Ruben ÖRVIG, Erik ÖRVIG, Olav ÖRVIG, Thor ÖSTENSEN, Östen ÖSTERVOLD, Henrik ÖSTERVOLD, Jan Olsen ÖSTERVOLD, Kristian Olsen ÖSTERVOLD, Ole Olsen ÖSTMO, Ole ÖSTRAND, Per-Olof
Interesting.
Their names are in uppercase, but start with “Ö” (O-umlaut). Characters with diacritics (acute, cedilla, etc.) are after all other letters in a binary collation. So O < n, but Ö > n.
This raises a couple of questions: how can we do accent-insensitive searches and how can we sort diacritics to the correct location in an alphabet?
How to do accent-insensitive searches
To ignore diacritics in comparisons, use an accent-insensitive collation. Oracle Database has a wide range of collations. These are of the form:
<collation name>{_(CI|AI)}
The suffix determines the comparison rules:
- CI = case-insensitive, accent-sensitive
- AI = accent- and case-insensitive
If you omit the suffix, the collation is case and accent sensitive. There are no case-sensitive but accent insensitive collations.
So to find all athletes with a name like “Helene” including the accented variations (é, etc.), use:
select substr (
athlete_name,
instr ( athlete_name, ', ' ) + 2
) given_names,
athlete_name
from olym_athletes
where athlete_name like '%helene%'
collate binary_ai
order by given_names;
GIVEN_NAMES ATHLETE_NAME
Helene CORTIN, Helene
Helene JUNKER, Helene
Helene MAYER, Helene
Helene MADISON, Helene
Helene "Leni" SCHMIDT, Helene "Leni"
Hélène PREVOST, Hélène
Marie-Helene PREMONT, Marie-Helene
Note that Hélène appears last in this list of Helenes; Helene “Leni” appears above it.
You can change Hélène’s position in the results by defining a collation for the sort.
Image by Michael Schwarzenberger from Pixabay
How to do case- or accent-insensitive sorting
You can state which collation to use when sorting data by either:
- Including the collation in the
order by - Defining the collation in the
selectlist, alias the expression, reference the alias in theorder by
For example:
select substr (
athlete_name, instr ( athlete_name, ',' ) + 2
) given_names,
athlete_name
from olym_athletes
where athlete_name like '%helene%'
collate binary_ai
order by given_names
collate binary_ai;
GIVEN_NAMES ATHLETE_NAME
Helene CORTIN, Helene
Helene JUNKER, Helene
Helene MAYER, Helene
Helene MADISON, Helene
Hélène PREVOST, Hélène
Helene "Leni" SCHMIDT, Helene "Leni"
Marie-Helene PREMONT, Marie-Helene
-- this returns the same output as above
select substr (
athlete_name, instr ( athlete_name, ',' ) + 2
) collate binary_ai given_names,
athlete_name
from olym_athletes
where athlete_name like '%helene%'
collate binary_ai
order by given_names;
With the collation binary_ai, Helene has the same value as Hélène. So the output of these queries are non-deterministic. Hélène could appear anywhere in the list of Helenes.
In many languages, some accented characters have a specific position in their alphabet. To ensure these letters appear in the correct location for that language, you can use a linguistic collation. In general these match the name of the target language, for example:
- French (_ci, _ai)
- German (_ci, _ai)
- Czech (_ci, _ai)
Using linguistic accent-sensitive collations you can change where Hélène appears. This shows the difference between French and Czech ordering:
select substr (
athlete_name, instr ( athlete_name, ',' ) + 2
) collate czech as given_names,
athlete_name
from olym_athletes
where athlete_name like '%, helene%'
collate binary_ai
order by given_names;
GIVEN_NAMES ATHLETE_NAME
Helene CORTIN, Helene
Helene JUNKER, Helene
Helene MAYER, Helene
Helene MADISON, Helene
Helene "Leni" SCHMIDT, Helene "Leni"
Hélène PREVOST, Hélène
select substr (
athlete_name, instr ( athlete_name, ',' ) + 2
) collate french as given_names,
athlete_name
from olym_athletes
where athlete_name like '%, helene%'
collate binary_ai
order by given_names;
GIVEN_NAMES ATHLETE_NAME
Helene CORTIN, Helene
Helene JUNKER, Helene
Helene MAYER, Helene
Helene MADISON, Helene
Hélène PREVOST, Hélène
Helene "Leni" SCHMIDT, Helene "Leni"
In French, Hélène is above Helene “Leni”; in Czech the opposite is true.
If you’re unsure which order diacritic characters appear in a language, you can check their sort key using nlssort. Just pass the characters to this with your desired collation:
select substr (
athlete_name, instr ( athlete_name, ',' ) + 2
) collate french as given,
nlssort (
substr (
athlete_name, instr ( athlete_name, ',' ) + 2
) collate french
) as french_sort_value,
nlssort (
substr (
athlete_name, instr ( athlete_name, ',' ) + 2
) collate czech
) as czech_sort_value
from olym_athletes
where athlete_name like '%, helene%'
collate binary_ai
order by given;
GIVEN FRENCH_SORT_VALUE CZECH_SORT_VALUE
Helene 37284B2855280001020202020200 37284B2855280002010101010100
Helene 37284B2855280001020202020200 37284B2855280002010101010100
Helene 37284B2855280001020202020200 37284B2855280002010101010100
Helene 37284B2855280001020202020200 37284B2855280002010101010100
Hélène 37284B2855280001040206020200 37284B55280002030100E8010100
...
The collate operator gives you lots of flexibility to search and sort text exactly as you want. But it suffers from the same issues as using upper or lower. Namely you have to remember to do it and it can limit the optimizer’s ability to use indexes.
To overcome these you can push collation definitions down to the column itself.
Make case-insensitive the default with column-level collation
When creating tables you can specify a collation for each text column and a default for the whole table:
create table olym_athletes (
id number
not null,
athlete_name varchar2(255)
collate binary_ai
not null,
athlete_gender varchar2(10)
collate binary_ci
not null
) default collation binary_ai;
NOTE: To define collations at the column, table or schema level you need to enable extended types. You must also set compatible to 12.2 or higher.
Any columns with unspecified collations inherit this from the table default. If you omit this, the database takes the user’s default collation.
You can check the default collation for a table and any column-level overrides with this query:
select table_name,
default_collation,
column_name,
collation
from user_tables
join user_tab_cols
using ( table_name )
where table_name = 'OLYM_ATHLETES';
TABLE_NAME DEFAULT_COLLATION COLUMN_NAME COLLATION
OLYM_ATHLETES BINARY_AI ID <null>
OLYM_ATHLETES BINARY_AI ATHLETE_NAME BINARY_AI
OLYM_ATHLETES BINARY_AI ATHLETE_GENDER BINARY_CI
You can also change these at any time with alter table:
alter table olym_athletes modify athlete_name collate french_ci;
These are metadata changes, so are instant no matter how many rows are in the table.
NOTE: You must drop any indexes on the target column before you can change its collation and recreate them afterwards. Creating an index on a large table can take a long time. There are also several other restrictions for when you can change the collation of a column.
With column-level collation in place, comparisons use the specified collation by default:
select athlete_name from olym_athletes where athlete_name like '%barry%'; ATHLETE_NAME WEITZENBERG, Charles Barry DAGGER, Barry Edward DANCER, Barry DAVIS, Barry Alan DEMET-BARRY, Deirdre KELLY, Barry MAGEE, Arthur Barry MAISTER, Barry John BARRY, Kevin BARRY, William Louis
As with columns, you can change the default collation for a table or user with alter table and alter user respectively:
alter table <table_name> default collation <collation_name>; alter user <username> default collation <collation_name>;
When you change the default for a table or schema, existing columns keep their current collation. Only new columns use the new default. So if you want to make all the character columns in an existing schema case-insensitive, you should:
- Change the default for the user
- Change the default for all the tables
- Drop indexes on all character columns
- Change the collation for every column
- Re-create the indexes
It is rare you’ll want to do this for an entire schema. While it’s handy to default people’s names, addresses, product descriptions, etc. to a case-insensitive collation, applications typically store many other types of text. In some cases, these need to be case-sensitive, for example, Oracle usernames.
A better default collation is binary. This is the fastest collation for comparisons and can use standard indexes. It also makes comparisons of column values immune to session settings, which are used in the legacy method for case-insensitive search. This is often critical for correct results.
If you’ve never used the collate keyword in any create or alter statements, all columns have the collation using_nls_comp, which means the columns use the legacy method. When
You can also define collations in virtual columns and views. So if you often search a column with different language or sensitivity rules, you could add these for each common collation you’ll work with:
alter table olym_athletes
add (
athlete_name_french as
( athlete_name collate french ),
athlete_name_czech as
( athlete_name collate czech ),
athlete_name_binary_ci as
( athlete_name collate binary_ci )
);
NOTE: You must omit the data type of the virtual column when defining a collation. If you include the data type, the virtual column will use the table’s default collation.
So this is all great if you’re on 12.2 or newer, but what if you’re stuck on an ancient version of Oracle Database?
Image by Andreas Lischka from Pixabay
How to do accent- and case-insensitive search on Oracle Database 12.1 and earlier
At this point you may be wondering: how do I do all this on vintage versions of Oracle Database I have?
We’ve already seen the most common workaround – converting all the characters to a standard case. You can also do case-insensitive comparisons using regexp_like.
But what if you want this to happen transparently, like with column-level collation? Or need accent-insensitive searches?
You can do this by setting the nls_sort and nls_comp parameters. Check their current values with this query:
select * from nls_session_parameters where parameter in ( 'NLS_COMP', 'NLS_SORT' ); BINARY BINARY
To use a different collation for comparisons, set nls_comp to linguistic and nls_sort to the collation you want to use:
alter session set nls_sort = binary_ai; alter session set nls_comp = linguistic;
Use alter system to change the default settings for your database:
alter system set nls_sort = binary_ai scope = spfile; alter system set nls_comp = linguistic scope = spfile;
You must restart your instance for these settings to take effect.
You can continue to use these parameters to control searching and sorting from 12.2 onwards. If you do, the collate operator and any column-level collations you’ve specified take precedence over these parameters.
Note that setting nls_comp and nls_sort parameters affect all comparisons in a query. For complex queries, with many comparisons and joins, this may be a big performance problem. The collate operator gives you the flexibility to apply more expansive linguistic processing only where it is really needed.
If you’ve defined column-level collations and want to revert back to using the NLS parameters, change the column to use using_nls_comp:
alter table olym_athletes modify athlete_name collate using_nls_comp;
Whichever method you use to do insensitive queries, you’ll want the database to use an index to make them fast. The question is: how?
How to index case-insensitive queries
If you’re relying on NLS settings to do case-insensitive search, create a function-based index using nlssort:
create index alth_name_nls
on olym_athletes (
nlssort ( athlete_name, 'nls_sort = binary_ai')
);
select *
from olym_athletes
where athlete_name = 'latynina, larisa';
ID ATHLETE_NAME ATHLETE_GENDER
10549 LATYNINA, Larisa Women
-------------------------------------------------------------
| Id | Operation | Name |
-------------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| OLYM_ATHLETES |
| 2 | INDEX RANGE SCAN | ALTH_NAME_NLS |
-------------------------------------------------------------
Queries using the collate operator can also range scan these indexes. So if you have an existing application with lots of nlssort indexes you can start using collate and the optimizer will still be able to use these indexes.
But once you’re on 12.2 it’s easier and more obvious to use collate within the index itself! For example:
alter session set nls_comp = binary;
alter session set nls_sort = binary;
select *
from olym_athletes
where athlete_name = 'latynina, larisa'
collate binary_ai;
-------------------------------------------------------------
| Id | Operation | Name |
-------------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| OLYM_ATHLETES |
| 2 | INDEX RANGE SCAN | ALTH_NAME_NLS |
-------------------------------------------------------------
create index athlete_name_ci
on olym_athletes (
athlete_name
collate binary_ci
);
select *
from olym_athletes
where athlete_name = 'latynina, larisa'
collate binary_ci;
---------------------------------------------------------------
| Id | Operation | Name |
---------------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| OLYM_ATHLETES |
| 2 | INDEX RANGE SCAN | ATHLETE_NAME_CI |
---------------------------------------------------------------
So however you choose to do your case-insensitive searches, you can index them for fast access.
There is one final thing to be aware of – using collate on long strings can lead to unexpected outcomes.
Insensitive searches on long text strings
Collation keys are raw values. This presents a couple of problems:
- The maximum size of
rawis 2,000 bytes ifmax_string_size = standardand 32,767 bytes if it’sextended - Collation keys are typically much longer than the source text
So when comparing long strings with the collate operator, there’s a good chance the collated values will go beyond the raw limit. What happens in these cases depends on whether you have extended data types enabled or not.
If these aren’t (the default), then the collate operator will only convert characters at the start of the text. Trailing parts of it are ignored.
This can lead to surprising results where strings with different endings are considered equal:
select *
from olym_athletes
where rpad ( athlete_name, 2000, 'trailing chars' ) =
rpad ( 'LATYNINA, Larisa', 2000, 'trailing chars' )
|| 'extra chars';
no rows selected
select *
from olym_athletes
where rpad ( athlete_name, 2000, 'trailing chars' ) =
rpad ( 'LATYNINA, Larisa', 2000, 'trailing chars' )
|| 'extra chars'
collate binary_ai;
ID ATHLETE_NAME ATHLETE_GENDER
10549 LATYNINA, Larisa Women
As collations apply to all comparisons, this goes beyond basic equality checks. You can find that group by and order by queries give incorrect results.
Remember this happens silently. Beware!
If you are using extended data types, the behaviour changes. When the sort key overflows the (much larger) raw limit, you’ll get an ORA-12742 error:
select *
from olym_athletes
where rpad ( athlete_name, 30000, 'trailing chars' ) =
rpad ( 'LATYNINA, Larisa', 30000, 'trailing chars' )
|| 'extra chars'
collate french_ai;
ORA-12742: unable to create the collation key
This can lead to a poor user experience, but there’s a bigger problem lurking. Hackers may be able to exploit this to carry out DoS attacks! To avoid this, ensure that the maximum length of values you collate are:
- 21,844 for the collation
binary_ci - 4,094 bytes for monolingual or multilingual collations
- 1,560 bytes for a UCA collation
In practice, this means you need to take one or more of these defensive actions:
- Only use
collateon columns storing short strings such as names - Design your applications to ensure one user can’t insert values that causes another’s to overflow
- Verify only safe strings are inserted into the table
You can do the final check in code or by adding a check constraint to the table that rejects values that could lead to an error, for example:
alter table olym_athletes
add constraint max_collation_c
check (
vsize (
nlssort ( athlete_name collate UCA1210_DUCET )
) != -1
);
The comparison in the above condition is irrelevant and is always true. However, as nlssort is evaluated when you add rows, the insert fails with etc.ORA-12742 instead of the later queries.
Note that the above issue only affects operations that use materialized collation keys behind the scene (implicitly add calls to nlssort into expression evaluation) and does not affect many other operations such as regular expression matching, select distinct, max, min, replace, instr, etc.
For more details on this issue, read the documentation on avoiding ORA-12742 errors.
Summary
So there you have it. Collations are an easy way to find and sort text according to the case and accent rules you want to use. Using the collate operator gives you fine-grained control over the rules you use in any textual comparison (searching, sorting, grouping, etc.).
To simplify your code, you can push collation definitions to the columns themselves. Generally you’ll want to define columns using the collation binary. Reserve linguistic collations for the few columns where all searches need to be accent- or case-insensitive. If you regularly search data with different language settings, you can add virtual columns for each you use.
If you’re on Oracle Database 12.1 or earlier, you can use the session parameters nls_comp and nls_sort to enable case-insensitivity. These – along with their nlssort indexes – will continue to work when you upgrade. So you can gradually migrate your code to use collate.
Remember that however you do linguistic comparisons, they’re only safe on short strings. When used on long text, you can get wrong results. Ensure you write your application to defend against this!
We hope you enjoy the new power these options give you; we’d love to hear how you’re using these.
Do you work with text in many languages? Have you used column-level collations yet? Let us know the comments!
Read more about linguistic sorting and matching in the documentation.
Get the code for this post on Live SQL
Looking for more SQL tips and tricks? Take Databases for Developers: Next Level to improve your SQL skills.
A massive thanks to Sergiusz Wolicki, Globalization PM, for reviewing this article and making many improvements.
UPDATE 23 May 2022: Added alter table and alter user examples
UPDATE 14 Apr 2025: Changed Live SQL script link to new platform
