Just a few thoughts I've been meaning to blog for a while:
Very early on in the Exadata Smart Scan implementation a restriction was added to prevent offloading more than 255 columns. This was done because of performance issues observed by customers. Smart Scan works on 1 MB chunks at a time and needs all the row pieces up to the highest #segcol referenced to be present in the 1 MB chunk that is being processed. When rows with more than 255 columns are first inserted all the row pieces are contiguous and hence likely to be present in the 1 MB chunk.
However, if the DBA has not provided sufficient freespace (PCTFREE) in the block when it is first loaded subsequent DMLs may end up splitting row pieces and moving pieces to blocks in the freelist that have more space available. PCTFREE specifies how much empty space to leave for future updates when a block is first populated.
Let’s consider a row’s lifecycle. Imagine an order processing table where various elements are not known when the row is first inserted and the transaction committed: for example, we may not know the processed date, the ship date, the tracking number, or delivery confirmation. Each of these may be filled by subsequent transactions as they become known. A row which was 150 bytes when first written might finally take up 180 bytes i.e. 20% larger. If we had specified a PCTFREE of 20%, each of those updates would have found room in the original block.
Had we left the default PCTFREE 10%, half of the updates would not have fitted in the original block and Oracle would have had to split the row into two row pieces in order to move the additional data to another block. In order to find a home for that new row piece the free list would have been checked and the new row piece may end up on a block stored some distance away from the original one. But why does this matter? Let’s say we now need to retrieve that row and need a column that’s stored in the new row piece, the ‘NRID’ will be passed to a buffer cache get and getting that block may result in another single block disk I/O. But an Exadata storage cell is not able to initiate disk I/Os to retrieve missing row pieces both because the row-piece may be on a different cell and because it has no access to the undo tablespace on RDBMS to roll a block back to a read consistent state. So, if the free block that is found is not in the 1 MB contiguous region then Smart Scan will not be able to process any of the columns that have been moved out causing the block with the first row piece to be returned to RDBMS for the block to be processed there.
When Hybrid Columnar Compression was introduced as an Exadata-only feature in 188.8.131.52 it didn't have the limitation of needing to store separate row pieces for each set of 255 columns since the entire compression unit is stored in a superblock made up of sufficient consecutive standard blocks and guaranteed not to cross any 1 MB alignment boundaries. Smart Scan on HCC never needs to skip any blocks because of unavailable data and consequently offers a big performance benefit on wide tables and queries that need to reference more than 255 columns.
When Big Data SQL (BDS) Smart Scan was introduced the 255 column limit was left in place partly because of memory requirements of retrieving large numbers of columns in the early versions of BDS offload. Recent versions of BDS have greatly reduced the memory requirements of retrieving large numbers of columns and have greatly improved the performance of very wide scans.
Which leads to "BUG 30043610 - REMOVE 255-COLUMN LIMIT FOR BDSQL SMART-SCAN". The limit has now been lifted and is currently being back-ported to Big Data SQL 19.0. When this eventually arrives in a DBRU I expect wide queries to show a significant performance improvement.
Big Data SQL supports copying read-only Oracle tablespaces to HDFS and accessing them on the HDFS system. These are known inside the query engine as "External-Internal" (EXTINT) scans. For the time being, the 255 column limit has been left in place but I expect that limit to be lifted soon for Hybrid Columnar Compressed tables on HDFS tablespaces to match their behaviour on Exadata.
There is a limited amount of tracing available in the external table row-source that covers events such as granule generation and fallback from Smart Scan to regular scan. This can be accessed via various keywords:
SQL> alter session set "_xt_trace"="execution","compilation","low","time"
but in practice everyone simply uses:
SQL> alter session set "_xt_trace"=all;
to make life easier.