In my previous blogs, which you can see here and here, I wrote about what GoldenGate for Big Data Pluggable Formatters are and how the JSON and DelimitedText formatters can be used with various event handlers.

Pluggable Formatters help us control the format of the files generated by the File Writer Handler. Their usage is not limited to the File Writer Handler: they can also be used with the Kafka Handler, Kafka Connect Handler, HDFS Handler and Kinesis Streams Handler.

In this blog, I’ll continue with Pluggable Formatters, focusing on the Avro OCF Formatter.

As discussed earlier, GoldenGate for Big Data is made of different pieces that can be used together. The File Writer Handler generates files from trail files, and various event handlers can then be used to load these files into the desired target systems. You can find more details about how the File Writer Handler works in my previous blog.

Avro OCF (Object Container File) is a simple object container file format. The schema is embedded in the file, and all objects in the file are written according to that schema using binary encoding. Because the schema travels with the data, Avro OCF handles schema evolution more efficiently than formats that do not embed it. The Avro OCF Formatter also supports compression and decompression to allow more efficient use of disk space.

GoldenGate for Big Data supports both avro_row_ocf and avro_op_ocf. avro_row_ocf is a flat structure: it contains only the after image for inserts and updates. avro_op_ocf is a more nested model that can hold both before and after images. For update operations, both the before and the after image are written to the file.

For this blog post, I’ll be using the File Writer Handler, which generates files in a local directory on the GoldenGate for Big Data node. I’ll continue with avro_op_ocf formatting.

Let’s configure the File Writer Handler for avro_op_ocf and see how the output looks before any additional formatter properties are applied.
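As a rough sketch, a File Writer Handler set up for avro_op_ocf could look like the fragment below. The handler name "filewriter", the directories, the file name template and the roll interval are illustrative assumptions; the property names follow the File Writer Handler documentation as I understand it, so please verify them against your release.

```properties
# Hypothetical handler name "filewriter"; paths and intervals are placeholders.
gg.handlerlist=filewriter
gg.handler.filewriter.type=filewriter
gg.handler.filewriter.mode=op
# Local directories on the GoldenGate for Big Data node (assumed locations).
gg.handler.filewriter.pathMappingTemplate=./dirout
gg.handler.filewriter.stateFileDirectory=./dirsta
gg.handler.filewriter.fileNameMappingTemplate=${fullyQualifiedTableName}_${currentTimestamp}.avro
gg.handler.filewriter.fileRollInterval=7m
# Select the Avro operation OCF formatter.
gg.handler.filewriter.format=avro_op_ocf
```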

Even though the message is not deserialized, you can see the before and after images in it. You can also see that the schema information is contained in the same message.
 


Adding source operation indicators

The “avro_op_ocf” format can write both before and after images in the same output message. Using the “opKey” properties, the avro_op_ocf formatter can also output the type of the source operation. Let’s add these properties to our sample and see how they change the output message.
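A minimal sketch of the “opKey” properties, assuming a handler named "filewriter" (a hypothetical name) that already uses format=avro_op_ocf. The single-letter values are the indicators that appear in the output; the property names are taken from the Avro formatter documentation as I recall it, so treat them as something to confirm for your version.

```properties
# Assumes a File Writer Handler named "filewriter" with format=avro_op_ocf.
# Each value is the indicator written for that type of source operation.
gg.handler.filewriter.format.insertOpKey=I
gg.handler.filewriter.format.updateOpKey=U
gg.handler.filewriter.format.deleteOpKey=D
gg.handler.filewriter.format.truncateOpKey=T
```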

To make it clearer, I’ve marked the same message. Now you can see that “U” is added to the beginning of the message, which indicates an update operation. In other parts of the message, you can also see “I” for insert operations.
 


All Columns as Strings

By default, the Avro OCF formatter attempts to map Oracle GoldenGate types to the corresponding Avro types. When “treatAllColumnsAsStrings” is set to true, all column data is written as strings in the generated Avro messages and schemas.
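For illustration, the setting could be added as below. The handler name "filewriter" is an assumption; the property path mirrors the gg.handler.name.format prefix used elsewhere in this post.

```properties
# Hypothetical handler name "filewriter".
# Write every column as an Avro string instead of mapping to native Avro types.
gg.handler.filewriter.format.treatAllColumnsAsStrings=true
```

With this in place, a numeric source column would be declared as a string in the embedded schema rather than as a numeric Avro type.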


Handling PK Updates

When there is an update to the source primary key (PK), the Avro OCF formatter abends the Replicat by default. This behaviour can be controlled with the “pkUpdateHandling” parameter.

When gg.handler.name.format.pkUpdateHandling is set to:

  • Abend: The process will terminate
  • Update: The process handles it as a normal update
  • Delete & Insert: The process handles this operation as a delete and an insert. The full before image is required for this feature to work properly. This can be achieved by using full supplemental logging in Oracle. Without full before and after row images the insert data will be incomplete.
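The options above could be set as in the sketch below. The handler name "filewriter" is hypothetical, and the literal values (abend, update, delete-insert) are written as I recall them from the product documentation, so confirm the exact spelling for your release.

```properties
# Hypothetical handler name "filewriter".
# Assumed accepted values: abend (default) | update | delete-insert
gg.handler.filewriter.format.pkUpdateHandling=delete-insert
```

If you choose the delete and insert behaviour, remember that full supplemental logging is needed on the source so that the full before image is available.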

For more details on Avro_OCF formatter, you can refer to GoldenGate for Big Data product documentation from this link.