Friday Dec 20, 2013

Row-oriented plain text data to Column-orienterd conversion

I think another way to look at my sqlplus spool file processing script is that it converts rowstore to columnstore, so to speak.

Column-oriented DBMS - Wikipedia, the free encyclopedia

A column-oriented database serializes all of the values of a column together, then the values of the next column, and so on.

I'll use sample file named "a-employees.out" from Tyler's blog page.

Convert sqlplus spool output to CSV « Tyler Muth’s Blog

Here are my test case files used in the following examples.

Here are commands that I ran.

 

$ head -5 a-employees.out    //3rd line is column list
hr@orcl> select * from employees;

EMPLOYEE_ID FIRST_NAME           LAST_NAME                 EMAIL                     PHONE_NUMBER         HIRE_DATE        JOB_ID         SALARY COMMISSION_PCT MANAGER_ID DEPARTMENT_ID
----------- -------------------- ------------------------- ------------------------- -------------------- ---------------- ---------- ---------- -------------- ---------- -------------
        198 Donald               OConnell                  DOCONNEL                  650.507.9833  ...
$ tail -5 a-employees.out   //line #5 to #111 are real data
        197 Kevin                Feeney                    KFEENEY                   650.507.9822         05/23/2006 00:00 SH_CLERK         3000                       124            50

107 rows selected.

hr@orcl> spool off
$ sed -i.bak '112,$d' a-employees.out   //removed spurious lines at the end.
$ ./plusSpool.sh a-employees.out 3    //ran my bash script with 'column list line#' as 2nd arg
$ cd out.4992    //cd to output dir. number part varies each time.
$ wc -l *           //looks like all file have correct line counts.
  107 COMMISSION_PCT
  107 DEPARTMENT_ID
  107 EMAIL
  107 EMPLOYEE_ID
  107 FIRST_NAME
  107 HIRE_DATE
  107 JOB_ID
  107 LAST_NAME
  107 MANAGER_ID
  107 PHONE_NUMBER
  107 SALARY
$ paste -d ' ' $(sed -n 3p ../a-employees.out) | diff  ../a-employees.out -     //The data part differs by 1 line but it's whitespace issue.
1,4d0
< hr@orcl> select * from employees;
<
< EMPLOYEE_ID FIRST_NAME           LAST_NAME                 EMAIL                     PHONE_NUMBER         HIRE_DATE        JOB_ID         SALARY COMMISSION_PCT MANAGER_ID DEPARTMENT_ID
< ----------- -------------------- ------------------------- ------------------------- -------------------- ---------------- ---------- ---------- -------------- ---------- -------------
92c88
<         178 Kimberely            Grant                     KGRANT                    011.44.1644.429263   05/24/2007 00:00 SA_REP           7000            .15        149
---
>         178 Kimberely            Grant                     KGRANT                    011.44.1644.429263   05/24/2007 00:00 SA_REP           7000            .15        149 
$ sed -n '5,$p' ../a-employees.out | wc
    107    1212   19781
$ cat * | wc
   1177    1212   19782

I changed one line from original posting to make the output filesize the same.
Complete script is below. Please be aware that my script doesn't handle any corner case so your sqlplus spool file may not work.

#!/bin/bash
mkdir out.$$
sed -n "${2}p" $1 > /tmp/_columns.$$
sed -n "$(($2+1))p" $1 > /tmp/_seperator.$$
sed "1,$(($2+1))d;/^$(cat /tmp/_columns.$$)$/d;/^$(cat /tmp/_seperator.$$)$/d" $1 > /tmp/_rows.$$
pos=0
for i in $(cat /tmp/_columns.$$)
do
  echo -n "$i "
  echo -n "$(($pos+1))-"
  pos=$(grep -Eo "^.{$pos} {0,1}-+" /tmp/_seperator.$$|wc -c|tr -d ' ')
  echo $(($pos-1))
done |
while read colname posrange
do
  sed '/^$/d' /tmp/_rows.$$ | cut -c$posrange > out.$$/$colname
done
About

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Search

Archives
« December 2013 »
SunMonTueWedThuFriSat
1
2
3
4
5
6
7
8
9
10
13
14
15
16
21
22
23
24
25
26
28
29
30
31
    
       
Today