
Node-oracledb v2 Query Methods and Fetch Tuning

Christopher Jones
Senior Principal Product Manager

For node-oracledb v2, our Node.js add-on for Oracle Database, we revamped the lower data access layer and made it a standalone project, ODPI-C, which is being reused by several other language APIs. ODPI-C gave us a reason and an opportunity to simplify some of the internal query handling code in node-oracledb.

To recap, node-oracledb has four ways to execute queries against an Oracle Database. These are the same in version 1 and version 2, and are shown in the compact sketch after this list:

  • Direct Fetches - these are non-ResultSet, non-queryStream() fetches. All rows are returned in one big array, limited to maxRows (v2 allows an unlimited array size).

  • ResultSet getRow() - returns one row on each call until all rows are returned.

  • ResultSet getRows(numRows) - returns batches of rows in each call until all rows are returned.

  • queryStream() - streams rows until all rows are returned.
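
As a compact sketch (assuming an open connection conn and a query string sql, with error handling omitted for brevity), the four patterns look like this:

// 1. Direct fetch: all rows returned in one big array
conn.execute(sql, function(err, result) {
  console.log(result.rows.length);
});

// 2. ResultSet getRow(): one row per call
conn.execute(sql, [], { resultSet: true }, function(err, result) {
  result.resultSet.getRow(function(err, row) {
    // use row, call getRow() again, and close the ResultSet when done
  });
});

// 3. ResultSet getRows(numRows): a batch of rows per call
conn.execute(sql, [], { resultSet: true }, function(err, result) {
  result.resultSet.getRows(100, function(err, rows) {
    // use rows, call getRows() again, and close the ResultSet when done
  });
});

// 4. queryStream(): rows emitted as 'data' events
conn.queryStream(sql)
  .on('data', function(row) { /* use row */ })
  .on('end', function() { /* all rows returned */ });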

The changes in node-oracledb v2 are:

  • Enhanced direct fetches to allow an unlimited number of rows to be fetched, and made this the default. This occurs when maxRows = 0.

  • Replaced prefetchRows (previously used for internal fetch buffering and tuning) with a new property fetchArraySize; the default size is 100. A short example of setting it follows this list.

    fetchArraySize affects direct fetches, ResultSet getRow() and queryStream().

  • getRows(numRows,...) internal fetch buffering is now only tuned by the numRows value. Previously prefetchRows could also affect internal buffering.

  • queryStream() now uses fetchArraySize for internal buffer sizing.

  • Implemented getRow() in JavaScript for better performance, and made fetchArraySize the way to tune its internal buffer sizes.
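
As a sketch of the new tuning knob (assuming a connection object connection; the values here are illustrative, not recommendations), fetchArraySize can be set globally or per execute() call:

var oracledb = require('oracledb');

// Global default used by all queries
oracledb.fetchArraySize = 250;

// Or per statement, overriding the global value
connection.execute(
  'select * from all_objects',
  [],
  { fetchArraySize: 500, maxRows: 0 },  // maxRows = 0 (the default) means no row limit
  function(err, result) {
    if (err) throw err;
    console.log(result.rows.length);
  });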

The change in v2 that I want to discuss is how a 'direct fetch' does its internal buffering.

To make fetching an unlimited number of rows feasible for direct fetches, data is now internally fetched from Oracle Database in batches of size fetchArraySize and then concatenated to form one big array of results that is returned in the callback. This lets you use fetchArraySize to tune fetch performance. In node-oracledb v1, one big array of size maxRows was allocated before the data was fetched from the database. (With node-oracledb v2 you can have the same behavior, if you really want it, by setting fetchArraySize = maxRows, where maxRows > 0).
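
For instance, a minimal sketch (assuming a connection object connection) of restoring the v1-style single allocation by making the internal batch size equal to the row limit:

// One internal fetch of up to 50000 rows, as v1 would have allocated
connection.execute(
  'select * from all_objects where rownum <= 50000',
  [],
  { maxRows: 50000, fetchArraySize: 50000 },
  function(err, result) {
    if (err) throw err;
    console.log(result.rows.length);
  });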

Let's look at two different scenarios that fetch 50000 rows with fetchArraySize of 100 (the default) and 1000. The code is at the end of the post.

Direct fetch:        rows: 50000, batch size: 100, seconds: 4.054

Direct fetch:        rows: 50000, batch size: 1000, seconds: 1.643

You can see in this case (with a local database) that increasing fetchArraySize improved performance. Various factors can contribute, but the common one is reduced network cost and reduced load on the database, because fewer 'round trips' between Node.js and the database were needed to get the batches of records: fetching 50000 rows in batches of 100 takes 500 round trips, while batches of 1000 need only 50. Each query and environment will be different, and will require its own tuning.

The benefits of using fetchArraySize for direct fetches are:

  • Performance of batching and network transfer of data can be tuned.

  • Memory can incrementally grow when the number of query rows is unknown, or varies from execution to execution. A single large chunk of memory (based on maxRows in v1) doesn't need to be pre-allocated to handle the 'worst case' of a large number of rows.

There are two drawbacks with direct fetches:

  • One big array of results is needed. This is the same in v1 and v2.

  • The concatenation of batches of records can use more memory than the final array requires, and can cause fragmentation.

Let's look at timings for all query methods. These figures are from one run; there was the expected variability each time I ran the scripts. The 'batch size' number is numRows for getRows(numRows) calls, or fetchArraySize for the other fetch methods.

Direct fetch:        rows: 50000, batch size: 100, seconds: 4.054
ResultSet getRow():  rows: 50000, batch size: 100, seconds: 1.625
ResultSet getRows(): rows: 50000, batch size: 100, seconds: 1.586
queryStream():       rows: 50000, batch size: 100, seconds: 1.691

Direct fetch:        rows: 50000, batch size: 1000, seconds: 1.643
ResultSet getRow():  rows: 50000, batch size: 1000, seconds: 1.471
ResultSet getRows(): rows: 50000, batch size: 1000, seconds: 1.327
queryStream():       rows: 50000, batch size: 1000, seconds: 1.415

The ResultSet and queryStream() methods don't have to store all rows in memory at once, so there is less memory management involved. The obvious outlier is the first result: the cost of concatenating many small chunks of memory is significant. We have some ideas on what we can do inside node-oracledb to improve this a bit, but that is a future project to be investigated; it can't eliminate all costs of concatenation, and it can't prevent all rows having to be held in memory together. [Update: improvements were made to direct fetches in node-oracledb 2.1, and this particular case shows significantly improved performance.]

The conclusion is to use ResultSets or streaming for large numbers of rows. This is the same recommendation we gave for v1.

For small numbers of rows, the various query methods perform pretty much the same. The timings are so short that you can treat the differences in the one run shown below as noise. Here each query only returned 1 row:

Direct fetch:        rows: 1, batch size: 100, seconds: 0.011
ResultSet getRow():  rows: 1, batch size: 100, seconds: 0.012
ResultSet getRows(): rows: 1, batch size: 100, seconds: 0.013
queryStream():       rows: 1, batch size: 100, seconds: 0.013

Direct fetch:        rows: 1, batch size: 1, seconds: 0.012
ResultSet getRow():  rows: 1, batch size: 1, seconds: 0.012
ResultSet getRows(): rows: 1, batch size: 1, seconds: 0.013
queryStream():       rows: 1, batch size: 1, seconds: 0.013

Although the timings are small, I suggest using a small fetchArraySize or numRows if you are querying a small handful of rows, particularly if the number of rows is known (such as 1). This reduces the amount of memory that needs to be allocated throughout node-oracledb, the Oracle client libraries, and also in the database.
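
For example, here is a sketch of a single-row lookup (the table and bind value are hypothetical) that keeps buffer allocations to a minimum:

// Exactly one row is expected, so buffer and return at most one row
connection.execute(
  'select first_name from employees where employee_id = :id',
  [ 100 ],
  { fetchArraySize: 1, maxRows: 1 },
  function(err, result) {
    if (err) throw err;
    console.log(result.rows);
  });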

References

The node-oracledb documentation is here.

If you are currently using node-oracledb v1, you may be interested in the documentation on Migrating from node-oracledb 1.13 to node-oracledb 2.0.

Code

Here are the rudimentary test scripts I used. The ResultSet code originated in the v1 examples from https://jsao.io/2015/07/an-overview-of-result-sets-in-the-nodejs-driver/

The config.js file is at the end. The dbconfig.js file is the same as in the examples.

The timings include statement execution in the DB, though this is not controlled by node-oracledb. With direct fetches there isn't a way in JavaScript to distinguish the query execute cost from the data fetch costs that fetchArraySize controls.
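
If you do want a rough split, one approach (a sketch, not one of the scripts below) is to use a ResultSet, since the execute() callback fires after statement execution but before rows are fetched, letting the two phases be timed separately:

var executeStart = Date.now();

connection.execute(
  'select * from all_objects where rownum <= :mr',
  [ config.maxRows ],
  { resultSet: true, fetchArraySize: config.batchSize },
  function(err, results) {
    if (err) throw err;
    console.log('execute seconds: ' + (Date.now() - executeStart) / 1000);

    var fetchStart = Date.now();
    results.resultSet.getRows(config.maxRows, function(err, rows) {
      if (err) throw err;
      console.log('fetch seconds: ' + (Date.now() - fetchStart) / 1000);

      results.resultSet.close(function(err) {
        if (err) console.error(err.message);
        connection.release(function(err) {
          if (err) console.error(err.message);
        });
      });
    });
  });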

Direct Fetch

// direct fetch

var oracledb = require('oracledb');
var dbConfig = require('./dbconfig.js');
var config = require('./config.js');

oracledb.getConnection(
  dbConfig,
  function(err, connection) {
    if (err) throw err;

    var rowsProcessed = 0;
    var startTime = Date.now();

    connection.execute(
      'select * from all_objects where rownum <= :mr',
      [ config.maxRows ],
      { fetchArraySize: config.batchSize },
      function(err, results) {
        if (err) throw err;

        rowsProcessed = results.rows.length;

        // do work on the rows here

        var t = ((Date.now() - startTime)/1000);
        console.log('Direct fetch:        rows: ' + rowsProcessed +
                    ', batch size: ' + config.batchSize + ', seconds: ' + t);
        
        connection.release(function(err) {
          if (err) console.error(err.message);
        });
      });
  });

ResultSet getRow()

// ResultSet getRow()

var oracledb = require('oracledb');
var dbConfig = require('./dbconfig.js');
var config = require('./config.js');

oracledb.getConnection(
  dbConfig,
  function(err, connection) {
    if (err) throw err;

    var rowsProcessed = 0;
    var startTime = Date.now();

    connection.execute(
      'select * from all_objects where rownum <= :mr',
      [ config.maxRows ],
      {
        resultSet: true,
        fetchArraySize: config.batchSize
      },
      function(err, results) {
        if (err) throw err;

        function processResultSet() {
          results.resultSet.getRow(function(err, row) {
            if (err) throw err;
            if (row) {
              rowsProcessed++;

              // do work on the row here

              processResultSet(); // try to get another row from the result set
              return; // exit recursive function prior to closing result set
            }

            var t = ((Date.now() - startTime)/1000);
            console.log('ResultSet getRow():  rows: ' + rowsProcessed +
                        ', batch size: ' + config.batchSize + ', seconds: ' + t);
            
            results.resultSet.close(function(err) {
              if (err) console.error(err.message);

              connection.release(function(err) {
                if (err) console.error(err.message);
              });
            });
          });
        }

        processResultSet();
      }
    );
  }
);

ResultSet getRows()

// ResultSet getRows()

var oracledb = require('oracledb');
var dbConfig = require('./dbconfig.js');
var config = require('./config.js');

oracledb.getConnection(
  dbConfig,
  function(err, connection) {
    if (err) throw err;

    var rowsProcessed = 0;
    var startTime = Date.now();

    // Set the global fetchArraySize to 1 to show that getRows(numRows)
    // internal buffering is tuned only by numRows, not by fetchArraySize
    oracledb.fetchArraySize = 1;
    connection.execute(
      'select * from all_objects where rownum <= :mr',
      [ config.maxRows ],
      { resultSet: true },
      function(err, results) {
        if (err) throw err;

        function processResultSet() {
          results.resultSet.getRows(config.batchSize, function(err, rows) {
            if (err) throw err;

            if (rows.length) {
              rows.forEach(function(row) {
                rowsProcessed++;

                // do work on the row here
              });

              processResultSet(); // try to get more rows from the result set

              return; // exit recursive function prior to closing result set
            }

            var t = ((Date.now() - startTime)/1000);
            console.log('ResultSet getRows(): rows: ' + rowsProcessed +
                        ', batch size: ' + config.batchSize + ', seconds: ' + t);
            
            results.resultSet.close(function(err) {
              if (err) console.error(err.message);

              connection.release(function(err) {
                if (err) console.error(err.message);
              });
            });
          });
        }
        processResultSet();
      });
  });

queryStream()

// queryStream()

var oracledb = require('oracledb');
var dbConfig = require('./dbconfig.js');
var config = require('./config.js');

oracledb.getConnection(
  dbConfig,
  function(err, connection) {
    if (err) throw err;

    var rowsProcessed = 0;
    var startTime = Date.now();

    var stream = connection.queryStream(
      'select * from all_objects where rownum <= :mr',
      [ config.maxRows ],
      { fetchArraySize: config.batchSize }
    );

    stream.on('data', function (data) {
      // do work on the row here
      rowsProcessed++;
    });

    stream.on('end', function () {
      var t = ((Date.now() - startTime)/1000);
      console.log('queryStream():       rows: ' + rowsProcessed +
                  ', batch size: ' + config.batchSize + ', seconds: ' + t);
      
      connection.close(
        function(err) {
          if (err) { console.error(err.message); }
        });
    });
  });

The Configuration File

// config.js

var maxRows;      // number of rows to query
var batchSize;    // batch size for fetching rows

maxRows = 50000;
batchSize = 1000;

exports.maxRows = maxRows;
exports.batchSize = batchSize;
