Row-oriented vs. column-oriented text file: compression ratio comparison

In my last post, I converted a small row-store text file to column-store format.

Row-oriented plain text data to Column-oriented conversion (INOUE Katsumi @ Tokyo)

$ sed -n '5,$p' ../a-employees.out | wc
107 1212 19781
$ cat * | wc
1177 1212 19782

In the above, "a-employees.out" is a SQL*Plus spool file, so it is in row-store format.
The column-store data is actually stored as separate files, as you can see in the 2nd command above,
but when they are conCATenated, they are virtually one file.
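
The two line counts can be cross-checked with a little arithmetic. This is a minimal sketch that assumes every column file holds exactly one line per row; the number of column files is inferred from the `wc` output, not stated in the post.

```shell
rows=107        # lines in the row-store file (first wc above)
col_lines=1177  # lines in the concatenated column files (second wc above)

# If each column file has one line per row, the division should be exact.
echo "columns: $((col_lines / rows)), remainder: $((col_lines % rows))"
```

A remainder of 0 is consistent with the concatenated output being a whole number of equally long column files.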

Wikipedia says there may be a difference in compression ratio.
Although my data is all text, I thought the column-store format would compress better.

Column-oriented DBMS - Wikipedia, the free encyclopedia

Column data is of uniform type; therefore, there are some opportunities for storage size optimizations
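
To try the idea from the quote on a toy input, the sketch below builds a made-up three-column table, writes it once row by row and once column by column, and compresses both with gzip. The sample data (`emp`, `dept` values, 200 rows) is hypothetical and is not the employees file used in this post, so the sizes it prints say nothing about the results below.

```shell
# Hypothetical sample: 200 rows of "id name dept" (not the author's data)
rows=$(awk 'BEGIN { for (i = 1; i <= 200; i++) printf "%d emp%d dept%d\n", i, i, i % 5 }')

# Row-oriented: one record per line
row_bytes=$(printf '%s\n' "$rows" | gzip -c | wc -c)

# Column-oriented: all values of column 1, then column 2, then column 3
col_bytes=$(for c in 1 2 3; do
                printf '%s\n' "$rows" | awk -v c="$c" '{ print $c }'
            done | gzip -c | wc -c)

echo "row: $row_bytes bytes, column: $col_bytes bytes"
```

Which layout wins depends on the data and the compressor, so this is only a way to experiment, not a prediction.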

Below is the result. The column-oriented file compressed slightly better.
The difference in space saved was 0.15 to 2 percentage points, depending on the command used.

$ sed -n '5,$p' ../a-employees.out | bzip2 -vc | wc --bytes
  (stdin):  6.494:1,  1.232 bits/byte, 84.60% saved, 19781 in, 3046 out.
3046
$ cat * | bzip2 -vc | wc --bytes
  (stdin):  6.559:1,  1.220 bits/byte, 84.75% saved, 19782 in, 3016 out.
3016
$ sed -n '5,$p' ../a-employees.out | gzip -vc | wc --bytes
 80.5%
3879
$ cat * | gzip -vc | wc --bytes
 82.1%
3563
$ sed -n '5,$p' ../a-employees.out | zip | wc --bytes
  adding: - (deflated 80%)
4005
$ cat * | zip | wc --bytes
  adding: - (deflated 82%)
3689
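
The "saved" percentages can be recomputed from the byte counts that `wc` printed above. This is a minimal sketch; `saved_pct` is a helper name introduced here for illustration, not part of any of the tools.

```shell
# saved_pct IN OUT: print the percentage of bytes saved by compression
saved_pct() {
    awk -v in_bytes="$1" -v out_bytes="$2" \
        'BEGIN { printf "%.2f\n", (1 - out_bytes / in_bytes) * 100 }'
}

saved_pct 19781 3046   # bzip2, row-oriented
saved_pct 19782 3016   # bzip2, column-oriented
saved_pct 19781 3879   # gzip,  row-oriented
saved_pct 19782 3563   # gzip,  column-oriented
saved_pct 19781 4005   # zip,   row-oriented
saved_pct 19782 3689   # zip,   column-oriented
```

The figures computed this way may differ slightly from what a tool prints with -v, since each tool calculates its percentage in its own way.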
