File Listing/XMLType from a stored procedure

Ever wondered how to get a list of file names in a directory from within the Oracle database? See post for more details on OWB. This is something I've done in 2 cases, one which happens to use the XML/SQL capabilities from within a map, and the other using simple indexing from within a process flow....


For a mapping...

Here you will see how java in the database, along with XML/SQL can be used to provide a useful utility for getting a list of files on the OS. With a small piece of java we can return an XML document containing the files in a directory. The document could also contain other useful information about the file such as the updated/created timestamp (you just jave to extend the java and XML returned) etc. You can then use this in SQL queries.


The following Java operation has a directory name parameter and returns a list of file names in the directory. The files are returned in an XMLType instance which has a top level node <files> and <file> child nodes, where the <file> child has an attribute name with the name of the file.


With the java loaded into the database, a JSP wrapper and some permissions granting, we can now write SQL to get the file names;



SELECT
  extractValue(value(GET_XML_FILE_NODE), '/file/@name') FILE_NAME
FROM
  TABLE ((XMLSEQUENCE(EXTRACT((GETFILELIST('/tmp')), '/files/file')))) GET_XML_FILE_NODE;


Can now get all files with a specific name using regular SQL for example;



select * from (  SELECT
  extractValue(value(GET_XML_FILE_NODE), '/file/@name') FILE_NAME
FROM
  TABLE ((XMLSEQUENCE(EXTRACT((GETFILELIST('/tmp')), '/files/file')))) GET_XML_FILE_NODE) where FILE_NAME like 'STG_FILE%';

For a process flow....

For a process flow rather than using the document, we have one procedure to get the number of files in a directory and another for getting the nth file (so we can process from within a loop):

select getfilecount('/tmp') from dual;

to get the 1st file (index 0):

select getnthfile('/tmp', 0) from dual;


The java stored procedures for both cases...

Here is the java code for both of these examples:



import java.io.File;
import java.io.IOException;
import java.sql.SQLException;
import oracle.xml.parser.v2.*;
import oracle.xdb.XMLType;
import oracle.jdbc.driver.OracleConnection;
import oracle.jdbc.pool.OracleDataSource;


public class UtilFileSystem {
     public static XMLType getDirList(String dir) throws SQLException {
       XMLType xmlt = null;
       try {
         File[] logfiles = new File(dir).listFiles();
         XMLDocument xml = (XMLDocument) new XMLDocument();
         XMLElement docse = (XMLElement) xml.createElement("files");
         XMLElement docs = (XMLElement) xml.appendChild( docse);
         OracleDataSource ods = new OracleDataSource();
         ods.setURL("jdbc:oracle:kprb:");
         OracleConnection conn = (OracleConnection) ods.getConnection();
         for (int f = logfiles.length; f > 0; f--) {
           try {
             XMLElement ele = (XMLElement) xml.createElement("file");
             ele.setAttribute("name", logfiles[f-1].getName());
             XMLElement adoc = (XMLElement) docs.appendChild( ele);
          }catch (ArrayIndexOutOfBoundsException e){
              System.err.println("Usage: java ReadFile filename\n");         
              e.printStackTrace();
              throw e;
          }
        }         
        xmlt = new XMLType(conn,xml);
      }catch (RuntimeException ex){
        ex.printStackTrace();
        throw ex;
      }catch (SQLException gex){
        gex.printStackTrace();
        throw gex;
      }
      return xmlt;
     }

     public static int getFileCount(String dir) throws SQLException {
       try {
         File[] logfiles = new File(dir).listFiles();
         return logfiles.length;
      }catch (RuntimeException ex){
        ex.printStackTrace();
        throw ex;
      }
     }
     public static String getNthFile(String dir, int n) throws SQLException {
       try {
         File[] logfiles = new File(dir).listFiles();
         return logfiles[n].getName();
       }catch (RuntimeException ex){
        ex.printStackTrace();
        throw ex;
      }
     }
}


It was loaded into the database using something like the following:



loadjava -user <TGT_SCH>/<TGT_SCH_PASSWD>@<HOST>:<PORT>:<SID> UtilFileSystem.java -resolve -force -verbose -thin


The PLSQL wrappers...

I also created a PLSQL wrapper for the Java (even this is optional in 10g):



create or replace function getfilelist (dir_path IN VARCHAR2) return XMLType is LANGUAGE JAVA NAME 'UtilFileSystem.getDirList(java.lang.String) return oracle.xdb.XMLType';

/

create or replace function getfilecount (dir_path IN VARCHAR2) return NUMBER is LANGUAGE JAVA NAME 'UtilFileSystem.getFileCount(java.lang.String) return int';
/
create or replace function getnthfile (dir_path IN VARCHAR2, n IN NUMBER) return VARCHAR2 is LANGUAGE JAVA NAME 'UtilFileSystem.getNthFile(java.lang.String, int) return java.lang.String';
/


Granting privs for java in the database...

Finally since this was accessing the file system from within the database, the schema running the java must be granted OS permissions, in the example below I have granted access to SCOTT, this should be your schema where the code executes):



exec dbms_java.grant_permission( 'SCOTT', 'SYS:java.io.FilePermission', '/tmp', 'read' )


In summary...

After this you are ready to go, you can now use the functions in OWB mappings (for example use the function itself via a transformation, or via an expression in a constant or expression operator) or process flows transformation activities/expressions.

Kind of fun and useful at the same time!


...odds and ends...

Filtering....

It is also possible to include filters such that only files with a certain pattern/suffix  etc. are retrieved. For example the snippet below could replace the initialization of the logfiles variable in the functions above to list only files with suffix '.dat' (include import java.io.FilenameFilter; also)

/* This example returns files that end with '.dat' */
FilenameFilter filter = new FilenameFilter() {
  public boolean accept(File dir, String name) {
    return name.endsWith(".dat");
  }
};
String[] logfiles = new File(dir).list(filter);
/* also change getnthfile do not need '.getName()' */

Java DB Permissions
If you use the format <directory_name>- this will grant privileges on all directories and files within a directory. With this you could have additional filters, since this would let you read the file modified times and allow you to get all files modified after a certain timestamp.

Comments:

This is a great post. Thanks for giving us this information.
I am having compiling the java code, as i am not a Java developer
and not really sure on how to put the filtering.. its not real clear
can you please email me what the filtering java code would look like.
thanks

Posted by guest on April 11, 2012 at 04:07 AM PDT #

Hi

The example with the FilenameFilter can be found below;
https://blogs.oracle.com/warehousebuilder/resource/OWB11gR2/UtilFileSystem.java

This is slightly different from the blog post, the suffix is also passed in as a parameter so you can call the functions with different suffix. The java stored procedures are also added as a comment in the java file.

declare
cnt varchar2(2000);
nthfile varchar2(2000);
files XMLType;
begin
files := getfilelist('c:\\staging','.csv');
cnt := getfilecount('c:\\staging','.csv');
nthfile := getnthfile('c:\\staging','.csv', 1);
end;

Cheers
David

Posted by David on April 11, 2012 at 10:24 AM PDT #

Post a Comment:
  • HTML Syntax: NOT allowed
About

ETL, CDC, Real-Time DI and Data Quality for the Oracle Database from the inside.

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today