Welcome, everyone!
In this blog we will walk through how to fix an error that you may hit while using the File Writer Handler when your source table has a special character ($, #) in the schema name, table name, or a column name.
ERROR 2023-09-15 12:00:33.000873 [main] - The HDFS Avro Row Formatter formatOp operation failed.
org.apache.avro.SchemaParseException: Illegal character in: WEBSITE$
For this blog we are using Oracle GoldenGate for Big Data 21.9 installed on a VM server.
How to resolve the error:
If you are using the GoldenGate for Big Data File Writer Handler to generate Avro or Parquet files, you will hit the error above whenever your source table has a special character ($, #) in the schema name, table name, or a column name. The root cause is the Avro specification: record and field names must match [A-Za-z_][A-Za-z0-9_]*, so the Avro schema parser rejects names such as WEBSITE$ and CREDIT_LIMIT#.
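To see the root cause in isolation, here is a minimal sketch that reproduces the exception outside of GoldenGate, assuming the Avro Java library (avro-1.x) is on the classpath; the class name and schema are illustrative only:
import org.apache.avro.Schema;

public class AvroNameRepro {
    public static void main(String[] args) {
        // Avro names must match [A-Za-z_][A-Za-z0-9_]*, so a column name
        // carried over from Oracle as WEBSITE$ is rejected by the parser.
        String schemaJson = "{\"type\":\"record\",\"name\":\"CUSTOMERS_SPECIALCHARTABLE\","
                + "\"fields\":[{\"name\":\"WEBSITE$\",\"type\":\"string\"}]}";
        new Schema.Parser().parse(schemaJson);
        // Throws: org.apache.avro.SchemaParseException: Illegal character in: WEBSITE$
    }
}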
In my case the source table I am using has two special characters, $ and #, and I am generating Parquet files in Google Cloud Storage.
CREATE TABLE SOURCE.CUSTOMERS_SPECIALCHARTABLE
(
CUSTOMER_ID NUMBER ,
NAME VARCHAR2 (255) NOT NULL ,
ADDRESS VARCHAR2 (255) ,
WEBSITE$ VARCHAR2 (255) ,
CREDIT_LIMIT# NUMBER (8,2)
)
TABLESPACE DATA
LOGGING
;
ALTER TABLE SOURCE.CUSTOMERS_SPECIALCHARTABLE
ADD PRIMARY KEY ( CUSTOMER_ID )
USING INDEX LOGGING ;
To fix the error, we add the gg.schemareplaceregex and gg.schemareplacestring properties to the Replicat properties file so the special characters are stripped from schema, table, and column names before the Avro schema is generated (WEBSITE$ becomes WEBSITE, CREDIT_LIMIT# becomes CREDIT_LIMIT). The complete properties file is shown below.
# Properties file for Replicat RSPCHAR
#File Writer Handler Template
gg.handlerlist=filewriter
gg.handler.filewriter.type=filewriter
gg.handler.filewriter.mode=op
gg.handler.filewriter.pathMappingTemplate=./dirout
gg.handler.filewriter.stateFileDirectory=./dirsta
gg.handler.filewriter.fileNameMappingTemplate=${fullyQualifiedTableName}_${currentTimestamp}.avro
gg.handler.filewriter.finalizeAction=delete
gg.handler.filewriter.fileRollInterval=0
gg.handler.filewriter.inactivityRollInterval=0
gg.handler.filewriter.format=avro_row_ocf
gg.handler.filewriter.format.metaColumnsTemplate=${optype},${position}
gg.handler.filewriter.fileWriteActiveSuffix=.tmp
gg.handler.filewriter.partitionByTable=true
gg.handler.filewriter.rollOnShutdown=true
gg.handler.filewriter.format.pkUpdateHandling=delete-insert
#Selecting the Parquet Event Handler
gg.handler.filewriter.eventHandler=parquet
#The Parquet Event Handler
gg.eventhandler.parquet.type=parquet
gg.eventhandler.parquet.pathMappingTemplate=./dirparquet
gg.eventhandler.parquet.writeToHDFS=false
gg.eventhandler.parquet.finalizeAction=delete
#Selecting the GCS Event Handler
gg.eventhandler.parquet.eventHandler=gcs
gg.eventhandler.parquet.fileNameMappingTemplate=${tableName}_${currentTimestamp}.parquet
#The GCS Event handler
gg.eventhandler.gcs.type=gcs
gg.eventhandler.gcs.pathMappingTemplate=${fullyQualifiedTableName}
#TODO: Edit the GCS bucket name
gg.eventhandler.gcs.bucketMappingTemplate=<bucket_name>
#TODO: Edit the GCS credentialsFile
gg.eventhandler.gcs.credentialsFile=/dir/gcs_credential_file.json
gg.eventhandler.gcs.finalizeAction=none
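#Schema replace properties: strip $ and # from schema, table, and column names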
gg.schemareplaceregex=[$]
gg.schemareplacestring=
gg.schemareplaceregex1=[#]
gg.schemareplacestring1=
#TODO: Edit to include the GCS Java SDK, Parquet, and Hadoop client libraries.
gg.classpath=/gcs_1.113.9/*:/parquet_1.13.1/*:/hadoop_3.3.6/*
jvm.bootoptions=-Xmx512m -Xms32m
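The gg.schemareplaceregex / gg.schemareplacestring pair applies a Java regular-expression replacement to schema, table, and column names before the Avro schema is built, and the numbered variants (gg.schemareplaceregex1 / gg.schemareplacestring1) chain a second replacement. With an empty replacement string the matched characters are simply removed. The sketch below shows the equivalent string operation; the class name and sample values are illustrative only:
public class SchemaReplaceDemo {
    public static void main(String[] args) {
        // Equivalent of gg.schemareplaceregex=[$] / gg.schemareplacestring=
        // chained with gg.schemareplaceregex1=[#] / gg.schemareplacestring1=
        String[] names = {"WEBSITE$", "CREDIT_LIMIT#", "CUSTOMERS_SPECIALCHARTABLE"};
        for (String name : names) {
            String cleaned = name.replaceAll("[$]", "").replaceAll("[#]", "");
            System.out.println(name + " -> " + cleaned);
        }
        // Prints:
        // WEBSITE$ -> WEBSITE
        // CREDIT_LIMIT# -> CREDIT_LIMIT
        // CUSTOMERS_SPECIALCHARTABLE -> CUSTOMERS_SPECIALCHARTABLE
    }
}
If removing the characters could make two names collide, you could instead set gg.schemareplacestring=_ so that WEBSITE$ becomes WEBSITE_ rather than WEBSITE.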
Conclusion
In this blog, we learned how to fix the Oracle GoldenGate for Big Data Replicat when the source table has a special character ($, #) in the schema name, table name, or a column name.
