Data Pump includes the capability to compress the metadata associated with a Data Pump job. This capability was extended so that table data can also be compressed on export (the extended capability is a feature of Advanced Compression). Note that Data Pump compression is completely independent of Advanced Row Compression (and Hybrid Columnar Compression) – so it does not matter how (or even whether) a table is compressed within the database.
Data Pump compression only compresses the dump file. Data Pump executes a query to retrieve the table data from the database, and the database decompresses the data (if it is compressed) before delivering it to Data Pump. If you have set COMPRESSION=ALL or COMPRESSION=DATA_ONLY, then Data Pump compresses the table data in its export file, using an algorithm that is different from any of the table compression algorithms used by Oracle Database.
The COMPRESSION parameter of Data Pump Export determines which parts of a dump file set are compressed; it accepts the following values (an example follows the list):
- ALL – Enables compression for the entire export operation.
- DATA_ONLY – Results in all data being written to the dump file in compressed format.
- METADATA_ONLY – Results in all metadata being written to the dump file in compressed format. This is the default.
- NONE – Disables compression for the entire export operation.
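For example, a schema-level export that compresses both data and metadata might look like the following (the directory object dpump_dir, the dump file name and the HR schema are illustrative names, not requirements):

    expdp hr DIRECTORY=dpump_dir DUMPFILE=hr_comp.dmp SCHEMAS=hr COMPRESSION=ALL

Substituting COMPRESSION=DATA_ONLY in the same command would compress only the table data and leave the metadata uncompressed.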
Data Pump compression is an inline operation, so the reduced dump file size means significant savings in disk space. Unlike operating system or file system compression utilities, Data Pump compression is fully inline on the import side as well, so there is no need to decompress a dump file before importing it. The compressed dump file sets are automatically decompressed during import without any additional steps by the Database Administrator.
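As a sketch of the import side (again with illustrative names), a compressed dump file is imported exactly as an uncompressed one would be; no decompression-related option is needed:

    impdp hr DIRECTORY=dpump_dir DUMPFILE=hr_comp.dmp LOGFILE=hr_imp.log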
The COMPRESSION_ALGORITHM command-line option for Oracle Data Pump Export (expdp) controls the degree of compression used (BASIC, LOW, MEDIUM or HIGH) for an Oracle Data Pump dump file – the same values can be supplied through the PL/SQL DBMS_DATAPUMP package.
The higher the degree of compression, the higher the latency incurred but the better the compression ratio achieved. That is, the HIGH option will likely incur more overhead, but should compress the data better. These options enable the DBA to trade off time spent compressing data against the size of the Oracle Data Pump dump file.
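For example, assuming a release of Oracle Database that supports the COMPRESSION_ALGORITHM parameter, and the same illustrative names as above, a medium level of compression could be requested as follows:

    expdp hr DIRECTORY=dpump_dir DUMPFILE=hr_comp.dmp SCHEMAS=hr COMPRESSION=ALL COMPRESSION_ALGORITHM=MEDIUM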
Compression can significantly improve the performance of Data Pump operations whenever the export or import is IO-bound. This is often the case when transferring over a database link using the NETWORK_LINK parameter (a 10Gb Ethernet link is limited to roughly 4TB/hr maximum – although we have seen customers move 6TB/hr over such a network by applying compression).
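As a sketch, an export that pulls data from a remote database over a database link (here the hypothetical link name remote_src) can combine NETWORK_LINK with compression:

    expdp hr DIRECTORY=dpump_dir DUMPFILE=remote_hr.dmp NETWORK_LINK=remote_src SCHEMAS=hr COMPRESSION=ALL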
IO can also be the bottleneck when using high degrees of PARALLEL over a limited number of spindles or IO controllers. Data Pump is built to use all the hardware you can throw at it, and performance is all about removing bottlenecks. Compression can help performance because it applies CPU to reduce IO when IO is the bottleneck, which is often the case.
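For a parallel export, the %U wildcard in the dump file name lets each worker write to its own file; combining this with compression reduces the IO each worker must perform (names again illustrative):

    expdp system FULL=Y DIRECTORY=dpump_dir DUMPFILE=full_%U.dmp PARALLEL=8 COMPRESSION=ALL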
The reduction in dump file size will vary based on data types and other factors.
Note that when importing using Data Pump, the CREATE TABLE statements will have compression clauses that match the definitions in the export file. If a compression clause is missing, the table inherits the compression attribute of the tablespace in which it is stored. When Data Pump loads a table, the compression characteristics of the table in the import database determine whether the loaded data is compressed; it does not matter whether the data was compressed in the export database.
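If you want the imported tables to use a different compression clause than the one captured at export time, the TABLE_COMPRESSION_CLAUSE transform of Data Pump Import (impdp) can override it where supported. A sketch (the quoting may need adjustment for your shell, or the parameter can be placed in a parameter file to avoid quoting issues):

    impdp hr DIRECTORY=dpump_dir DUMPFILE=hr_comp.dmp TRANSFORM=TABLE_COMPRESSION_CLAUSE:\"ROW STORE COMPRESS ADVANCED\"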
If you have not thought about how compression can help your backup storage requirements, it is probably worth some time to look into it, but as usual, it is always best to test on your systems with your data.