Uploading XLSX files is a very common requirement these days. Application Express at this time only supports uploading CSV files out of the box. The community provides quite a few blog postings and also plug-ins; most of these store the data into a collection, which makes it hard to parse an XLSX outside of an APEX context, e.g. in the background using a scheduler job.
This blog posting shows a method to parse XLSX files with only SQL and PL/SQL. The actual work will be done by a table function, thus no collections and no APEX session is required. Since a table function returns rows and columns, one can easily insert the results into a table, apply some transformation logic, push it into the background with a scheduler job or just display.
The XLSX format is based on XML: an XSLX file is actually a ZIP file (you can easily verify that by renaming to .zip and opening it) containing several XML files. To parse the data, we first have to extract the contents, look up the right XML files and parse these. For the latter task, the XMLTABLE SQL function comes in very handy.
The following example code creates an XLSX_PARSER package. It uses the APEX_ZIP package to unpack the XLSX file and the XMLTABLE SQL function in order to parse the XML. The PARSE function is implemented as a table function which returns the first 50 columns ("A" to "AX"). If you need more columns, you can easily extend the code accordingly; the upper limit is 1000 columns.
create or replace package xlsx_parser is
c_date_format constant varchar2(255) := 'YYYY-MM-DD';
-- we currently support 50 columns - but this can easily be increased. Just increase the columns in the
-- record definition and add corresponing lines into the package body
type xlsx_row_t is record(
line# number,
col01 varchar2(4000), col02 varchar2(4000), col03 varchar2(4000), col04 varchar2(4000), col05 varchar2(4000),
col06 varchar2(4000), col07 varchar2(4000), col08 varchar2(4000), col09 varchar2(4000), col10 varchar2(4000),
col11 varchar2(4000), col12 varchar2(4000), col13 varchar2(4000), col14 varchar2(4000), col15 varchar2(4000),
col16 varchar2(4000), col17 varchar2(4000), col18 varchar2(4000), col19 varchar2(4000), col20 varchar2(4000),
col21 varchar2(4000), col22 varchar2(4000), col23 varchar2(4000), col24 varchar2(4000), col25 varchar2(4000),
col26 varchar2(4000), col27 varchar2(4000), col28 varchar2(4000), col29 varchar2(4000), col30 varchar2(4000),
col31 varchar2(4000), col32 varchar2(4000), col33 varchar2(4000), col34 varchar2(4000), col35 varchar2(4000),
col36 varchar2(4000), col37 varchar2(4000), col38 varchar2(4000), col39 varchar2(4000), col40 varchar2(4000),
col41 varchar2(4000), col42 varchar2(4000), col43 varchar2(4000), col44 varchar2(4000), col45 varchar2(4000),
col46 varchar2(4000), col47 varchar2(4000), col48 varchar2(4000), col49 varchar2(4000), col50 varchar2(4000));
type xlsx_tab_t is table of xlsx_row_t;
--==================================================================================================================
-- table function parses the XLSX file and returns the first 15 columns.
-- pass either the XLSX blob directly or reference a name in the APEX_APPLICATION_TEMP_FILES table.
--
-- p_xlsx_name - NAME column of the APEX_APPLICATION_TEMP_FILES table
-- p_xlsx_content - XLSX as a BLOB
-- p_worksheet_name - Worksheet to extract
--
-- usage:
--
-- select * from table(
-- xlsx_parser.parse(
-- p_xlsx_name => :P1_XLSX_FILE,
-- p_worksheet_name => :P1_WORKSHEET_NAME ) );
--
function parse(
p_xlsx_name in varchar2 default null,
p_xlsx_content in blob default null,
p_worksheet_name in varchar2 default 'sheet1',
p_max_rows in number default 1000000 ) return xlsx_tab_t pipelined;
--==================================================================================================================
-- table function to list the available worksheets in an XLSX file
--
-- p_xlsx_name - NAME column of the APEX_APPLICATION_TEMP_FILES table
-- p_xlsx_content - XLSX as a BLOB
--
-- usage:
--
-- select * from table(
-- xlsx_parser.get_worksheets(
-- p_xlsx_name => :P1_XLSX_FILE ) );
--
function get_worksheets(
p_xlsx_content in blob default null,
p_xlsx_name in varchar2 default null ) return apex_t_varchar2 pipelined;
--==================================================================================================================
-- date and datetimes are stored as a number in XLSX; this function converts that number to an ORACLE DATE
--
-- p_xlsx_date_number numeric XLSX date value
--
-- usage:
-- select xlsx_parser.get_date( 46172 ) from dual;
--
function get_date( p_xlsx_date_number in number ) return date;
end xlsx_parser;
/
sho err
create or replace package body xlsx_parser is
g_worksheets_path_prefix constant varchar2(14) := 'xl/worksheets/';
--==================================================================================================================
function get_date( p_xlsx_date_number in number ) return date is
begin
return
case when p_xlsx_date_number > 61
then DATE'1900-01-01' - 2 + p_xlsx_date_number
else DATE'1900-01-01' - 1 + p_xlsx_date_number
end;
end get_date;
--==================================================================================================================
procedure get_blob_content(
p_xlsx_name in varchar2,
p_xlsx_content in out nocopy blob )
is
begin
if p_xlsx_name is not null then
select blob_content into p_xlsx_content
from apex_application_temp_files
where name = p_xlsx_name;
end if;
exception
when no_data_found then
null;
end get_blob_content;
--==================================================================================================================
function extract_worksheet(
p_xlsx in blob,
p_worksheet_name in varchar2 ) return blob
is
l_worksheet blob;
begin
if p_xlsx is null or p_worksheet_name is null then
return null;
end if;
l_worksheet := apex_zip.get_file_content(
p_zipped_blob => p_xlsx,
p_file_name => g_worksheets_path_prefix || p_worksheet_name || '.xml' );
if l_worksheet is null then
raise_application_error(-20000, 'WORKSHEET "' || p_worksheet_name || '" DOES NOT EXIST');
end if;
return l_worksheet;
end extract_worksheet;
--==================================================================================================================
procedure extract_shared_strings(
p_xlsx in blob,
p_strings in out nocopy wwv_flow_global.vc_arr2 )
is
l_shared_strings blob;
begin
l_shared_strings := apex_zip.get_file_content(
p_zipped_blob => p_xlsx,
p_file_name => 'xl/sharedStrings.xml' );
if l_shared_strings is null then
return;
end if;
select shared_string
bulk collect into p_strings
from xmltable(
xmlnamespaces( default 'http://schemas.openxmlformats.org/spreadsheetml/2006/main' ),
'//si'
passing xmltype.createxml( l_shared_strings, nls_charset_id('AL32UTF8'), null )
columns
shared_string varchar2(4000) path 't/text()' );
end extract_shared_strings;
--==================================================================================================================
procedure extract_date_styles(
p_xlsx in blob,
p_format_codes in out nocopy wwv_flow_global.vc_arr2 )
is
l_stylesheet blob;
begin
l_stylesheet := apex_zip.get_file_content(
p_zipped_blob => p_xlsx,
p_file_name => 'xl/styles.xml' );
if l_stylesheet is null then
return;
end if;
select lower( n.formatCode )
bulk collect into p_format_codes
from
xmltable(
xmlnamespaces( default 'http://schemas.openxmlformats.org/spreadsheetml/2006/main' ),
'//cellXfs/xf'
passing xmltype.createxml( l_stylesheet, nls_charset_id('AL32UTF8'), null )
columns
numFmtId number path '@numFmtId' ) s,
xmltable(
xmlnamespaces( default 'http://schemas.openxmlformats.org/spreadsheetml/2006/main' ),
'//numFmts/numFmt'
passing xmltype.createxml( l_stylesheet, nls_charset_id('AL32UTF8'), null )
columns
formatCode varchar2(255) path '@formatCode',
numFmtId number path '@numFmtId' ) n
where s.numFmtId = n.numFmtId ( + );
end extract_date_styles;
--==================================================================================================================
function convert_ref_to_col#( p_col_ref in varchar2 ) return pls_integer is
l_colpart varchar2(10);
l_linepart varchar2(10);
begin
l_colpart := replace(translate(p_col_ref,'1234567890','__________'), '_');
if length( l_colpart ) = 1 then
return ascii( l_colpart ) - 64;
else
return ( ascii( substr( l_colpart, 1, 1 ) ) - 64 ) * 26 + ( ascii( substr( l_colpart, 2, 1 ) ) - 64 );
end if;
end convert_ref_to_col#;
--==================================================================================================================
procedure reset_row( p_parsed_row in out nocopy xlsx_row_t ) is
begin
-- reset row
p_parsed_row.col01 := null; p_parsed_row.col02 := null; p_parsed_row.col03 := null; p_parsed_row.col04 := null; p_parsed_row.col05 := null;
p_parsed_row.col06 := null; p_parsed_row.col07 := null; p_parsed_row.col08 := null; p_parsed_row.col09 := null; p_parsed_row.col10 := null;
p_parsed_row.col11 := null; p_parsed_row.col12 := null; p_parsed_row.col13 := null; p_parsed_row.col14 := null; p_parsed_row.col15 := null;
p_parsed_row.col16 := null; p_parsed_row.col17 := null; p_parsed_row.col18 := null; p_parsed_row.col19 := null; p_parsed_row.col20 := null;
p_parsed_row.col21 := null; p_parsed_row.col22 := null; p_parsed_row.col23 := null; p_parsed_row.col24 := null; p_parsed_row.col25 := null;
p_parsed_row.col26 := null; p_parsed_row.col27 := null; p_parsed_row.col28 := null; p_parsed_row.col29 := null; p_parsed_row.col30 := null;
p_parsed_row.col31 := null; p_parsed_row.col32 := null; p_parsed_row.col33 := null; p_parsed_row.col34 := null; p_parsed_row.col35 := null;
p_parsed_row.col36 := null; p_parsed_row.col37 := null; p_parsed_row.col38 := null; p_parsed_row.col39 := null; p_parsed_row.col40 := null;
p_parsed_row.col41 := null; p_parsed_row.col42 := null; p_parsed_row.col43 := null; p_parsed_row.col44 := null; p_parsed_row.col45 := null;
p_parsed_row.col46 := null; p_parsed_row.col47 := null; p_parsed_row.col48 := null; p_parsed_row.col49 := null; p_parsed_row.col50 := null;
end reset_row;
--==================================================================================================================
function parse(
p_xlsx_name in varchar2 default null,
p_xlsx_content in blob default null,
p_worksheet_name in varchar2 default 'sheet1',
p_max_rows in number default 1000000 ) return xlsx_tab_t pipelined
is
l_worksheet blob;
l_xlsx_content blob;
l_shared_strings wwv_flow_global.vc_arr2;
l_format_codes wwv_flow_global.vc_arr2;
l_parsed_row xlsx_row_t;
l_first_row boolean := true;
l_value varchar2(32767);
l_line# pls_integer := 1;
l_real_col# pls_integer;
l_row_has_content boolean := false;
begin
if p_xlsx_content is null then
get_blob_content( p_xlsx_name, l_xlsx_content );
else
l_xlsx_content := p_xlsx_content;
end if;
if l_xlsx_content is null then
return;
end if;
l_worksheet := extract_worksheet(
p_xlsx => l_xlsx_content,
p_worksheet_name => p_worksheet_name );
extract_shared_strings(
p_xlsx => l_xlsx_content,
p_strings => l_shared_strings );
extract_date_styles(
p_xlsx => l_xlsx_content,
p_format_codes => l_format_codes );
-- the actual XML parsing starts here
for i in (
select
r.xlsx_row,
c.xlsx_col#,
c.xlsx_col,
c.xlsx_col_type,
c.xlsx_col_style,
c.xlsx_val
from xmltable(
xmlnamespaces( default 'http://schemas.openxmlformats.org/spreadsheetml/2006/main' ),
'//row'
passing xmltype.createxml( l_worksheet, nls_charset_id('AL32UTF8'), null )
columns
xlsx_row number path '@r',
xlsx_cols xmltype path '.'
) r, xmltable (
xmlnamespaces( default 'http://schemas.openxmlformats.org/spreadsheetml/2006/main' ),
'//c'
passing r.xlsx_cols
columns
xlsx_col# for ordinality,
xlsx_col varchar2(15) path '@r',
xlsx_col_type varchar2(15) path '@t',
xlsx_col_style varchar2(15) path '@s',
xlsx_val varchar2(4000) path 'v/text()'
) c
where p_max_rows is null or r.xlsx_row <= p_max_rows
) loop
if i.xlsx_col# = 1 then
l_parsed_row.line# := l_line#;
if not l_first_row then
pipe row( l_parsed_row );
l_line# := l_line# + 1;
reset_row( l_parsed_row );
l_row_has_content := false;
else
l_first_row := false;
end if;
end if;
if i.xlsx_col_type = 's' then
if l_shared_strings.exists( i.xlsx_val + 1) then
l_value := l_shared_strings( i.xlsx_val + 1);
else
l_value := '[Data Error: N/A]' ;
end if;
else
if l_format_codes.exists( i.xlsx_col_style + 1 ) and (
instr( l_format_codes( i.xlsx_col_style + 1 ), 'd' ) > 0 and
instr( l_format_codes( i.xlsx_col_style + 1 ), 'm' ) > 0 )
then
l_value := to_char( get_date( i.xlsx_val ), c_date_format );
else
l_value := i.xlsx_val;
end if;
end if;
pragma inline( convert_ref_to_col#, 'YES' );
l_real_col# := convert_ref_to_col#( i.xlsx_col );
if l_real_col# between 1 and 50 then
l_row_has_content := true;
end if;
-- we currently support 50 columns - but this can easily be increased. Just add additional lines
-- as follows:
-- when l_real_col# = {nn} then l_parsed_row.col{nn} := l_value;
case
when l_real_col# = 1 then l_parsed_row.col01 := l_value;
when l_real_col# = 2 then l_parsed_row.col02 := l_value;
when l_real_col# = 3 then l_parsed_row.col03 := l_value;
when l_real_col# = 4 then l_parsed_row.col04 := l_value;
when l_real_col# = 5 then l_parsed_row.col05 := l_value;
when l_real_col# = 6 then l_parsed_row.col06 := l_value;
when l_real_col# = 7 then l_parsed_row.col07 := l_value;
when l_real_col# = 8 then l_parsed_row.col08 := l_value;
when l_real_col# = 9 then l_parsed_row.col09 := l_value;
when l_real_col# = 10 then l_parsed_row.col10 := l_value;
when l_real_col# = 11 then l_parsed_row.col11 := l_value;
when l_real_col# = 12 then l_parsed_row.col12 := l_value;
when l_real_col# = 13 then l_parsed_row.col13 := l_value;
when l_real_col# = 14 then l_parsed_row.col14 := l_value;
when l_real_col# = 15 then l_parsed_row.col15 := l_value;
when l_real_col# = 16 then l_parsed_row.col16 := l_value;
when l_real_col# = 17 then l_parsed_row.col17 := l_value;
when l_real_col# = 18 then l_parsed_row.col18 := l_value;
when l_real_col# = 19 then l_parsed_row.col19 := l_value;
when l_real_col# = 20 then l_parsed_row.col20 := l_value;
when l_real_col# = 21 then l_parsed_row.col21 := l_value;
when l_real_col# = 22 then l_parsed_row.col22 := l_value;
when l_real_col# = 23 then l_parsed_row.col23 := l_value;
when l_real_col# = 24 then l_parsed_row.col24 := l_value;
when l_real_col# = 25 then l_parsed_row.col25 := l_value;
when l_real_col# = 26 then l_parsed_row.col26 := l_value;
when l_real_col# = 27 then l_parsed_row.col27 := l_value;
when l_real_col# = 28 then l_parsed_row.col28 := l_value;
when l_real_col# = 29 then l_parsed_row.col29 := l_value;
when l_real_col# = 30 then l_parsed_row.col30 := l_value;
when l_real_col# = 31 then l_parsed_row.col31 := l_value;
when l_real_col# = 32 then l_parsed_row.col32 := l_value;
when l_real_col# = 33 then l_parsed_row.col33 := l_value;
when l_real_col# = 34 then l_parsed_row.col34 := l_value;
when l_real_col# = 35 then l_parsed_row.col35 := l_value;
when l_real_col# = 36 then l_parsed_row.col36 := l_value;
when l_real_col# = 37 then l_parsed_row.col37 := l_value;
when l_real_col# = 38 then l_parsed_row.col38 := l_value;
when l_real_col# = 39 then l_parsed_row.col39 := l_value;
when l_real_col# = 40 then l_parsed_row.col40 := l_value;
when l_real_col# = 41 then l_parsed_row.col41 := l_value;
when l_real_col# = 42 then l_parsed_row.col42 := l_value;
when l_real_col# = 43 then l_parsed_row.col43 := l_value;
when l_real_col# = 44 then l_parsed_row.col44 := l_value;
when l_real_col# = 45 then l_parsed_row.col45 := l_value;
when l_real_col# = 46 then l_parsed_row.col46 := l_value;
when l_real_col# = 47 then l_parsed_row.col47 := l_value;
when l_real_col# = 48 then l_parsed_row.col48 := l_value;
when l_real_col# = 49 then l_parsed_row.col49 := l_value;
when l_real_col# = 50 then l_parsed_row.col50 := l_value;
else null;
end case;
end loop;
if l_row_has_content then
l_parsed_row.line# := l_line#;
pipe row( l_parsed_row );
end if;
return;
end parse;
--==================================================================================================================
function get_worksheets(
p_xlsx_content in blob default null,
p_xlsx_name in varchar2 default null ) return apex_t_varchar2 pipelined
is
l_zip_files apex_zip.t_files;
l_xlsx_content blob;
begin
if p_xlsx_content is null then
get_blob_content( p_xlsx_name, l_xlsx_content );
else
l_xlsx_content := p_xlsx_content;
end if;
l_zip_files := apex_zip.get_files(
p_zipped_blob => l_xlsx_content );
for i in 1 .. l_zip_files.count loop
if substr( l_zip_files( i ), 1, length( g_worksheets_path_prefix ) ) = g_worksheets_path_prefix then
pipe row( rtrim( substr( l_zip_files ( i ), length( g_worksheets_path_prefix ) + 1 ), '.xml' ) );
end if;
end loop;
return;
end get_worksheets;
end xlsx_parser;
/
sho err
The package provides the following functions:
FUNCTION GET_WORKSHEETS RETURNS APEX_T_VARCHAR2
Argument Name Type In/Out Default?
------------------------------ ----------------------- ------ --------
P_XLSX_CONTENT BLOB IN DEFAULT
P_XLSX_NAME VARCHAR2 IN DEFAULT
FUNCTION PARSE RETURNS XLSX_TAB_T
Argument Name Type In/Out Default?
------------------------------ ----------------------- ------ --------
P_XLSX_NAME VARCHAR2 IN DEFAULT
P_XLSX_CONTENT BLOB IN DEFAULT
P_WORKSHEET_NAME VARCHAR2 IN DEFAULT
P_MAX_ROWS NUMBER IN DEFAULT
You can pass either the XLSX content as a BLOB or the name of an uploaded file. In the latter case, the package will look up the file name in the APEX_APPLICATION_TEMP_FILES table. A typical call sequence is as follows:
- Look up the worksheets contained in the XSLX file with XLSX_PARSER.GET_WORKSHEETS
- Retrieve the Worksheet data with XLSX_PARSER.PARSE.
Let's try this out. Create an APEX application with an empty page, then add the following elements to that page.
- Add a Region of Static HTML type.
- Add a File Browse item named PX_XLSX_FILE. Choose Table APEX_APPLICATION_TEMP_FILES as the Storage Type and keep the files until the End Of Session.
- Add a Button to Upload the file. Choose Submit Page as the button action.
Your page should now look as follows:
You can already upload a file, but nothing will happen. Next, we'll want to show a select list to pick one of the worksheets contained in the XLSX file.
Now run your page again. After you have uploaded an XSLX file, your page should look as follows:

So we can now pick a worksheet. So far, so good. Time to actually do the job and extract data from the XLSX file. For now, we just want to display the data as a classic report. So create a Classic Report region and use the following SQL Query.
select * from table(
xlsx_parser.parse(
p_xlsx_name => :PX_XLSX_FILE,
p_worksheet_name => :PX_WORKSHEET ) );
Add the PX_WORKSHEET item to the Classic Reports Page Items to Submit attribute. Then add a Dynamic Action in order to refresh the report when a worksheet has been chosen in the Select List item.
- The dynamic action should fire on the Change event of the PX_WORKSHEET item
- As the TRUE action, choose Refresh of the Classic Report region you just have created.
You then should be able to do the following steps:
- Run the page and upload an XLSX file

- Pick a worksheet from the select list

- The Classic Report refreshes and shows the first 50 columns of worksheet data

And that's it ... the nice thing is that there are no limits to process the worksheet data ...
- You can simply create a table ...
create table worksheet_data as
select col02 as first_name,
col03 as last_name,
col05 as country
from table( xlsx_parser.parse(
p_xlsx_name => :PX_XLSX_FILE,
p_worksheet_name => :PX_WORKSHEET ) );
- You can apply a few transformations ...
insert into worksheet_data(
select cast( col02 as varchar2(200) ) as first_name,
cast( col03 as varchar2(200) ) as last_name,
case col04
when 'Female' then 'F'
when 'Male' then 'M' end as gender,
cast( col05 as varchar2(200) ) as country,
to_number( to_char( sysdate, 'YYYY') ) - to_number( col06 ) as birth_year
from table( xlsx_parser.parse(
p_xlsx_name => :PX_XLSX_FILE,
p_worksheet_name => :PX_WORKSHEET ) )
where line# != 1)
- You can also create a DBMS_SCHEDULER job to push the task into the background. Note that this job does not have access to APEX session state - so you need to store the BLOB into your own table, then.
Try the sample code out - once the table function returns data, there are no limits for further processing any more. You can do with the data whatever you want: from just displaying over loading into a table to executing complex processing in the background: Everything is possible.
Do you think I could modify this to support CLOBs in the columns? I have a data feed that includes a column that can be up to 8k.
TIA
yes, technically this is possible. The XMLTABLE expression within the "extract_shared_strings" procedure would need to be changed to extract CLOBs instead of VARCHAR2. However, that will have an impact on performance. VARCHAR2 is way faster than CLOB.
Best regards
-Carsten
i have a remaining question though.. when i upload an xlsx file with some empty column values, the algorithm fill them with the same column values in the previous line and so on. How can i change this ?
Thanks
Thank you very much for this.
But, it seems that there is 2 small bugs in the "parse" function :
- l_parsed_row must be reset after each pipe row (or else when a cell is empty on a row, we will get the value of the preceding "non-empty" row)
- there is a missing "pipe row" for the last row.
Best Regards
-) No longer omits the last line
-) Empty cells are now correctly handled
-) Date values are detected ( Thanks and Kudos to André Meier)
Best regards
-Carsten
Dave
firstly, thank you very much Carsten.
I have created a little test app at:
https://apex.oracle.com/pls/apex/f?p=61155
Please also see the Info area there.
Best Regards
Andre
And just in time for our project.
I have one issue:
In the Worksheet Select List appear only generic names (sheet1, sheet2, sheet3, etc.) and not the actual sheet names in the Excel File.
Did i miss something?
indeed - that current version of the package does not extract the display name of a worksheet. That would require some more lookings within the XLSX structure. I might add that later on ...
Best regards
-Carsten
I just have one issue to discuss:
The date columns are not being interpreted correctly from our spreadsheet. For example, we have a date of 2/26/2007 in the Excel spreadsheet, formatted with the *3/14/2001 template for that column. When i run this query:
select *
from table (xlsx_parser.parse(p_xlsx_content => get_blob('BLOB_FROM_DATABASE'),p_worksheet_name => 'sheet2') );
this date is returned as 39139. Another column with the same date formatting, has a date of 8/30/2017, but the query returns 42977. The same happens for all date columns in our spreadsheet. Is there some way I can correct this issue?
Thanks,
Stan
Can you send me a sample of your XLSX privately ({firstname}.{lastname}@oracle.com ...?
it seems that the "Date detection" logic in lines 305 to 310 does not detect your date cells correctly. You might simply adjust these lines to detect your date format as well ... having a sample of your XLSX I can probably tell more ...
Best regards
-Carsten