Subscribe

Share

Database, SQL and PL/SQL

All Aboard the SQL*Loader Express

A new processing mode takes the hassle out of dealing with flat file loading.

By Connor McDonald

November/December 2015

Way back in 2011, database evangelist Tom Kyte wrote, “As far as I’m concerned, SQL*Loader is dead—because external tables rule.” And Kyte was right, in that using external tables opens up a world of possibilities that Oracle Database’s SQL*Loader feature does not address, such as

  • Using all the power of WHERE clause predicates to selectively load subsets of file data
  • Joining to other tables to perform lookups while loading data
  • Sorting data as it’s loaded for better attribute clustering and subsequent query performance
  • Applying preprocessing commands to data as it’s loaded
  • And many other benefits

However, one point of frustration remained for developers and DBAs alike. The syntax for external tables, although similar to that of the SQL*Loader control file, had some subtle differences. Getting that syntax correct often took several attempts, and error messages were not always helpful.

For example, let’s consider the case of Cameron, an intrepid developer who has been given a text file with the following departmental data:

12,RESEARCH,SARATOGA
10,ACCOUNTING,CLEVELAND
11,ART,SALEM
13,FINANCE,BOSTON
21,SALES,PHILADELPHIA
22,SALES,ROCHESTER
42,INTERNATIONAL,SAN FRAN

Cameron needs to load this data into a database. She has used SQL*Loader before, and because the file is a simple delimited list, she quickly whips up a SQL*Loader control file to load the data:

LOAD DATA
INFILE "dept.dat"
INTO TABLE dept
REPLACE
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(deptno, dname, loc)

But Cameron has been a keen follower of the Ask Tom site and a fan of Kyte’s books over the years, so she decides that using an external table is the more modern approach. She converts the SQL*Loader control syntax into the equivalent external table syntax, after first creating database directory objects, a prerequisite for external tables:

SQL> CREATE TABLE EXT_DEPT
2 ( DEPTNO NUMBER(2),
3 DNAME VARCHAR2(14),
4 LOC VARCHAR2(14)
5 )
6 ORGANIZATION EXTERNAL
7 ( ( FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
8 ( DEPTNO,DNAME,LOC )
9 )
10 location ('dept.dat' )
11 )
12 /
( ( FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
*
ERROR at line 7:
ORA-30654: missing DEFAULT keyword

Her first external table load attempt does not work. Cameron has discovered that porting the SQL*Loader syntax to external table syntax is not as intuitive as she first thought. The error message suggests that a DEFAULT keyword is missing, so after a quick web search, she tries this external table syntax:

SQL> CREATE TABLE EXT_DEPT
2 ( DEPTNO NUMBER(2),
3 DNAME VARCHAR2(14),
4 LOC VARCHAR2(14)
5 )
6 ORGANIZATION EXTERNAL
7 ( DEFAULT DIRECTORY LOAD_DIR
8 ( FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
9 ( DEPTNO,DNAME,LOC )
10 )
11 location ('dept.dat' )
12 )
13 /
( FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
*
ERROR at line 8:
ORA-30648: missing LOCATION keyword

This attempt produces an even more puzzling error, because there actually does seem to be a LOCATION keyword already.

With trial and error, she eventually arrives at the correct syntax (see Listing 1). The listing shows that Cameron needed to add three clauses after ORGANIZATION EXTERNAL. Also, she added REJECT LIMIT UNLIMITED, because, by default, if even a single row is deemed invalid for loading, no rows will be loaded.

Listing 1 demonstrates that the exercise of porting existing SQL*Loader control files to external table scripts is not trivial. The syntax can be so close to being correct but still not work correctly. Cameron decides that using the external table syntax might best be left for another day.

Code Listing 1: Correct external table definition

SQL> CREATE TABLE EXT_DEPT
2 ( DEPTNO NUMBER(2),
3 DNAME VARCHAR2(14),
4 LOC VARCHAR2(14)
5 )
6 ORGANIZATION EXTERNAL
7 ( TYPE ORACLE_LOADER
8 DEFAULT DIRECTORY LOAD_DIR
9 ACCESS PARAMETERS
10 ( FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
11 ( DEPTNO,DNAME,LOC )
12 )
13 location ('dept.dat' )
14 ) REJECT LIMIT UNLIMITED
15 /
Table created.
SQL> select * from EXT_DEPT;
DEPTNO DNAME LOC
—————————— —————————————— —————————
12 RESEARCH SARATOGA
10 ACCOUNTING CLEVELAND
11 ART SALEM
13 FINANCE BOSTON
21 SALES PHILA.
22 SALES ROCHESTER
42 INT'L SAN FRAN
7 rows selected.

The quirks of migrating a SQL*Loader control file to external tables were recognized as early as 2001 in Oracle9i Database. In an attempt to address the issue, SQL*Loader in Oracle9i Database added the EXTERNAL_TABLE command-line facility, with which an existing control file can be used to generate a template for the equivalent external table.

Cameron could have utilized that for her department data by first running SQL*Loader with the GENERATE_ONLY option for EXTERNAL_TABLE:

sqlldr control=dept.ctl userid=SCOTT/**** external_table=GENERATE_ONLY

Then, if she then looked in the SQL*Loader log file, she would have seen all the required data definition language (DDL) commands for creating and using the external table, as shown in Listing 2.

Code Listing 2: External_table-generated template for an equivalent external table

CREATE DIRECTORY statements needed for files
—————————————————————————————————————————————————————————
CREATE DIRECTORY SYS_SQLLDR_XT_TMPDIR_00001 AS 'C:\temp'
CREATE DIRECTORY SYS_SQLLDR_XT_TMPDIR_00000 AS 'c:\temp\'
CREATE TABLE statement for external table:
—————————————————————————————————————————————————————————
CREATE TABLE "SYS_SQLLDR_X_EXT_DEPT"
(
"DEPTNO" NUMBER(2),
"DNAME" VARCHAR2(14),
"LOC" VARCHAR2(13)
)
ORGANIZATION external
(
TYPE oracle_loader
DEFAULT DIRECTORY SYS_SQLLDR_XT_TMPDIR_00000
ACCESS PARAMETERS
(
RECORDS DELIMITED BY NEWLINE CHARACTERSET WE8MSWIN1252
BADFILE 'SYS_SQLLDR_XT_TMPDIR_00001':'dept.bad'
LOGFILE 'SYS_SQLLDR_XT_TMPDIR_00001':'dept.log_xt'
READSIZE 1048576
FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"' LDRTRIM
REJECT ROWS WITH ALL NULL FIELDS
(
"DEPTNO" CHAR(255)
TERMINATED BY "," OPTIONALLY ENCLOSED BY '"',
"DNAME" CHAR(255)
TERMINATED BY "," OPTIONALLY ENCLOSED BY '"',
"LOC" CHAR(255)
TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
)
)
location
(
'dept.dat'
)
)REJECT LIMIT UNLIMITED

But back in the present, Cameron is already exasperated. In a world where she needs to know HTML, JavaScript, Java, SQL, PL/SQL, and possibly even more languages and frameworks, she’s thinking: “Why do I need to learn new syntax just to load flat files into the database?”

Compounding this headache is the big data paradigm, driven by the notion that every piece of data must be captured and retained. Cameron sighs when she realizes that loading flat file data (and writing SQL*Loader scripts or external table scripts to perform those loads) is going to be an ever-increasing part of her workday. Surely there is an easier way to load flat files than toiling repeatedly over control files or external table DDL until the syntax is correct?


A Better Way

What Cameron would really like to do is something more intuitive. Namely, she wants to be able to take a quick look at the table definition:

SQL> desc DEPT
Name Null? Type
————————————————————— —————— —————————————
DEPTNO NUMBER(2)
DNAME VARCHAR2(14)
LOC VARCHAR2(14)

and then do a cursory check of the data in the flat file to make sure it looks like a decent match for the table’s columns:

12,RESEARCH,SARATOGA
10,ACCOUNTING,CLEVELAND
11,ART,SALEM
13,FINANCE,BOSTON
21,SALES,PHILADELPHIA
22,SALES,ROCHESTER
42,INTERNATIONAL,SAN FRAN

And, finally, just load the file with a minimum of fuss.

Cameron asks herself, “Why can’t I run SQL*Loader, pass the table name and the filename, and have the database just work it out?” It turns out that Cameron is in luck, because her DBA just upgraded to Oracle Database 12c.


A New Processing Mode in Oracle Database 12c

With Oracle Database 12c, when Cameron runs SQL*Loader and enters nothing but the filename and the table name

C:\temp>sqlldr userid=scott/tiger data=DEPT.dat table=DEPT

she is delighted that SQL*Loader happily accepts just those parameters. Cameron has implicitly invoked the new “express mode” processing mode. The onscreen output reveals the load details:

SQL*Loader: Release 12.1.0.2.0 - Production on Mon Aug 17 11:08:33 2015
Copyright (c) 1982, 2014, Oracle and/or its affiliates. All rights reserved.
Express Mode Load, Table: DEPT
Path used: External Table, DEGREE_OF_PARALLELISM=AUTO
Table DEPT:
7 Rows successfully loaded.
Check the log files:
DEPT.log
DEPT _%p.log_xt
for more information about the load.

You can see what is going on under the hood, because SQL*Loader carefully logs the entire process in its associated log file, as shown in Listing 3. SQL*Loader creates an external table based on the flat file by default and then uses a direct mode insert to load the data. Other command-line parameters can be used to control how the loading is performed.

Code Listing 3: The log file from Cameron’s SQL*Loader run

SQL*Loader: Release 12.1.0.2.0 - Production on Fri Aug 21 15:54:18 2015
Copyright (c) 1982, 2014, Oracle and/or its affiliates. All rights reserved.
SQL*Loader-800: PARALLEL FILE specified and ignored
Express Mode Load, Table: DEPT
Data File: DEPT.dat
Bad File: DEPT_%p.bad
Discard File: none specified
(Allow all discards)
Number to load: ALL
Number to skip: 0
Errors allowed: 50
Continuation: none specified
Path used: External Table
Table DEPT, loaded from every logical record.
Insert option in effect for this table: APPEND
Column Name Position Len Term Encl Datatype
------------------------------ ---------- ----- ---- ---- ----------
DEPTNO FIRST * , CHARACTER
DNAME NEXT * , CHARACTER
LOC NEXT * , CHARACTER
Generated control file for possible reuse:
OPTIONS(EXTERNAL_TABLE=EXECUTE, TRIM=LRTRIM)
LOAD DATA
INFILE 'DEPT'
APPEND
INTO TABLE DEPT
FIELDS TERMINATED BY ","
(
DEPTNO,
DNAME,
LOC
)
End of generated control file for possible reuse.
enable parallel DML: ALTER SESSION ENABLE PARALLEL DML
creating external table "SYS_SQLLDR_X_EXT_DEPT"
CREATE TABLE "SYS_SQLLDR_X_EXT_DEPT"
(
"DEPTNO" NUMBER(2),
"DNAME" VARCHAR2(14),
"LOC" VARCHAR2(14)
)
ORGANIZATION external
(
TYPE oracle_loader
DEFAULT DIRECTORY LOG_DIR
ACCESS PARAMETERS
(
RECORDS DELIMITED BY NEWLINE CHARACTERSET WE8MSWIN1252
BADFILE 'LOG_DIR':'DEPT_%p.bad'
LOGFILE 'DEPT_%p.log_xt'
READSIZE 1048576
FIELDS TERMINATED BY "," LRTRIM
REJECT ROWS WITH ALL NULL FIELDS
(
"DEPTNO" CHAR(255),
"DNAME" CHAR(255),
"LOC" CHAR(255)
)
)
location
(
'DEPT.dat'
)
)REJECT LIMIT UNLIMITED
executing INSERT statement to load database table DEPT
INSERT /*+ append parallel(auto) */ INTO DEPT
(
DEPTNO,
DNAME,
LOC
)
SELECT
"DEPTNO",
"DNAME",
"LOC"
FROM "SYS_SQLLDR_X_EXT_DEPT"
dropping external table "SYS_SQLLDR_X_EXT_DEPT"
Table DEPT:
7 Rows successfully loaded.
Run began on Fri Aug 21 15:54:18 2015
Run ended on Fri Aug 21 15:54:20 2015
Elapsed time was: 00:00:02.32
CPU time was: 00:00:00.09

In essence, express mode looks at the data dictionary to ascertain the table columns and their order and then makes some assumptions about the content of the data in the incoming file. It’s exactly what Cameron needs.
Conclusion

In Oracle Database 12c Release 1, SQL*Loader express mode opens up opportunities for quickly loading flat files with minimal effort. For complicated file layouts or processing requirements, all the bells and whistles of SQL*Loader and external tables are still available. But for many common flat file cases like the one Cameron encountered, you can skip the scripting and syntax headaches, jump straight in, and load your data.

Next Steps

 LEARN more about SQL*Loader express mode.

 DOWNLOAD Oracle Database 12c.



Photography by Aaron Burson, Unsplash