Non-ASCII filenames and UTF-8

The name of the UTF-8 character set for Oracle Database used to include the acronym FSS, which stands for "File System Safe".

5 Compatibility and Interoperability

AL24UTFFSS Character Set Desupported

 

UTF-8 may be "safe" for file systems, but one may still want to avoid Japanese or Chinese filenames because they are
not easy to handle when logged in from a tty console.

Here are some tips for dealing with such files while typing only ASCII characters. I mostly use a German character (ö) here instead of CJK characters.

  1. Start a new bash session with LANG=C. With this, filename expansion works in 'byte' semantics.
  2. LANG=C also changes the output of 'ls'. With other LANG or LC_* environment values, even 'ls -q' doesn't show '?' for non-ASCII characters.
  3. One can use '?' in bash filename expansion to stand for each non-ASCII byte, as long as there's no collision, that is, as long as the pattern doesn't expand to multiple filenames.
  4. One can use the '[:ascii:]' character class (negated, as in '[^[:ascii:]]') for filename expansion, but it may still collide.
$ LANG=C bash
$ echo *
L??w.txt Lo:w.txt Low.txt Löw.txt テスト.txt
$ echo L??w.txt
L??w.txt Lo:w.txt Löw.txt
$ echo *[^[:ascii:]]*
Löw.txt テスト.txt
$ ls -q L*[^[:ascii:]]* | sed 's/\?/[^[:ascii:]]/g'
L[^[:ascii:]][^[:ascii:]]w.txt
$ rm -i L[^[:ascii:]][^[:ascii:]]w.txt
rm: remove regular empty file `L\303\266w.txt'? n
$
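
The same steps can also be put into a small script. This is only a minimal sketch under some assumptions: it sets LC_ALL=C inside the script instead of starting a new session with LANG=C, it works in a throwaway directory created with mktemp, and the helper names list_non_ascii and expands_to_one are made up for this example. It checks that a pattern expands to exactly one file before handing it to rm.

```shell
#!/usr/bin/env bash
# Sketch only: tries the tips above in a throwaway directory.
# The helper names and the scratch directory are made up for this example.

# Byte semantics for globbing. LC_ALL takes precedence over LANG and the
# other LC_* variables, so this also works when one of them is already set.
export LC_ALL=C

# Print every name in the current directory that contains a non-ASCII byte.
list_non_ascii() {
    local f
    for f in *[^[:ascii:]]*; do
        # An unmatched pattern is left as-is by bash; skip that case.
        [ -e "$f" ] || continue
        printf '%s\n' "$f"
    done
}

# Succeed only if the glob given as $1 expands to exactly one existing file.
# Assumes the pattern itself contains no whitespace.
expands_to_one() {
    local matches
    # Unquoted on purpose: the pattern must undergo filename expansion.
    matches=( $1 )
    [ "${#matches[@]}" -eq 1 ] && [ -e "${matches[0]}" ]
}

demo_dir=$(mktemp -d) && cd "$demo_dir" || exit 1

# $'\303\266' is the UTF-8 encoding of "ö", typed with ASCII octal escapes.
touch 'Low.txt' 'Lo:w.txt' $'L\303\266w.txt'

list_non_ascii

# 'L??w.txt' also matches 'Lo:w.txt', so it is not safe to pass to rm.
if expands_to_one 'L??w.txt'; then
    echo "L??w.txt is unambiguous"
else
    echo "L??w.txt is ambiguous"
fi

# The negated [:ascii:] class is stricter; remove the file only if the
# pattern matches exactly one name.
pattern='L[^[:ascii:]][^[:ascii:]]w.txt'
if expands_to_one "$pattern"; then
    rm -- $pattern
fi

ls
```

The $'...' quoting with octal escapes uses the same byte values that rm printed in its prompt above, so a name can be typed in ASCII once its bytes are known.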
About

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.
