The File Writer Handler is designed to stage data to the local file system and then to load completed data files to respective targets, such as HDFS and S3.

The parameters described in this article can affect performance when you work with the File Writer Handler.
gg.handler.filewriter.maxFileSize=1gb
gg.handler.filewriter.fileRollInterval=7m
- With the sample values shown here, a file is rolled (closed and a new one created) when it reaches its maximum size (1 GB) or when 7 minutes have elapsed, whichever comes first. The completed file is then loaded to the target, such as S3.
- Increasing these values generally improves performance. Reducing them makes the handler roll and load files more often, which takes a direct toll on handler performance.
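The interaction between the two roll triggers can be sketched with a little arithmetic. The Python snippet below is an illustrative model only, not GoldenGate code: the function name `time_to_roll`, the constant write rate, and the reading of `1gb` as 1024^3 bytes are all assumptions made for the example.

```python
GIB = 1024 ** 3  # assumption: "1gb" is treated as 1024^3 bytes in this sketch
MIB = 1024 ** 2


def time_to_roll(rate_bytes_per_s, max_file_size_bytes, roll_interval_s):
    """Return (seconds, trigger) for whichever roll condition fires first.

    Hypothetical helper: assumes a single file written at a constant rate.
    """
    if rate_bytes_per_s <= 0:
        raise ValueError("rate_bytes_per_s must be positive")
    seconds_to_fill = max_file_size_bytes / rate_bytes_per_s
    if seconds_to_fill <= roll_interval_s:
        return seconds_to_fill, "maxFileSize"
    return float(roll_interval_s), "fileRollInterval"


if __name__ == "__main__":
    # Sample values from the article: 1 GB maximum size, 7 minute interval.
    for rate_mib in (1, 10):
        seconds, trigger = time_to_roll(rate_mib * MIB, 1 * GIB, 7 * 60)
        print(f"{rate_mib} MiB/s: rolls after {seconds:.1f} s via {trigger}")
```

Under these assumptions, a slow stream is rolled by the interval well before it reaches the size limit, so lowering the interval mainly produces more, smaller files to load.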
gg.handler.filewriter.bufferSize
- This parameter sets the size of the BufferedOutputStream for each active write stream.
- Setting this parameter to a larger value may improve performance, especially when there are only a few active write streams but a large number of operations are written to them.
- However, if there is a large number of active write streams, DO NOT increase the value of this parameter, as it can result in an out-of-memory exception by exhausting the Java heap.
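Because each active write stream gets its own buffer, the memory taken by buffers grows with the number of streams. The following Python sketch estimates that footprint. It is a rough model, not part of GoldenGate: the helper names and the `budget_fraction` of heap reserved for buffers are hypothetical, and real heap usage includes much more than these buffers.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def buffer_memory_bytes(active_streams, buffer_size_bytes):
    """Total bytes held by per-stream buffers (one buffer per active stream)."""
    return active_streams * buffer_size_bytes


def fits_in_heap(active_streams, buffer_size_bytes, heap_bytes, budget_fraction=0.25):
    """Check the buffers against a hypothetical share of the Java heap.

    budget_fraction is an illustrative assumption, not a GoldenGate setting.
    """
    needed = buffer_memory_bytes(active_streams, buffer_size_bytes)
    return needed <= heap_bytes * budget_fraction


if __name__ == "__main__":
    heap = 1 * GIB
    for streams in (4, 2000):
        total_mib = buffer_memory_bytes(streams, 1 * MIB) / MIB
        ok = fits_in_heap(streams, 1 * MIB, heap)
        print(f"{streams} streams x 1 MiB buffer = {total_mib:.0f} MiB, fits: {ok}")
```

With a handful of streams a 1 MiB buffer is negligible, while the same buffer size across thousands of streams would exceed the whole heap in this model.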
gg.log=info
- The trace and debug log levels are expensive and slow down performance when used. Oracle recommends setting the log level to info: gg.log=info.
GROUPTRANSOPS
- One of the best ways to get better performance from the FW Handler is the GROUPTRANSOPS parameter, which is set in the Replicat parameter file.
- Setting GROUPTRANSOPS to a low value significantly degrades performance.
- The handler flushes at transaction commit to ensure write durability, which is expensive. If GROUPTRANSOPS is set to a higher value, the handler flushes less often, resulting in better performance.
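The effect of grouping on flush frequency can be illustrated with a simplified model. The Python sketch below assumes that source transactions are accumulated until the grouped operation count reaches the GROUPTRANSOPS value, and that one flush happens per grouped commit. The function `count_flushes` and the workload are hypothetical, and the sketch ignores other events that may end a grouped transaction.

```python
def count_flushes(txn_sizes, group_trans_ops):
    """Count commits (and therefore flushes) for a stream of source transactions.

    txn_sizes: operations per source transaction, in arrival order.
    group_trans_ops: minimum operations per grouped transaction (model of GROUPTRANSOPS).
    Source transactions are never split, so a group may exceed the minimum.
    """
    if group_trans_ops < 1:
        raise ValueError("group_trans_ops must be at least 1")
    flushes = 0
    pending_ops = 0
    for size in txn_sizes:
        pending_ops += size
        if pending_ops >= group_trans_ops:
            flushes += 1  # grouped transaction commits, handler flushes
            pending_ops = 0
    if pending_ops > 0:
        flushes += 1  # final partial group still has to commit
    return flushes


if __name__ == "__main__":
    workload = [10] * 100  # 100 source transactions of 10 operations each
    for value in (1, 300, 1000):
        print(f"GROUPTRANSOPS={value}: {count_flushes(workload, value)} flushes")
```

In this model the same 1,000 operations cost 100 flushes with a value of 1 and a single flush with a value of 1000, which is the reduction in flush frequency described above.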
The FW Handler stages data to local files, so a large share of the processing time is spent on file I/O. File I/O tends to be a very linear resource, so parallelizing processing may yield little or no improvement. Users might consider hardware changes that improve file I/O speed, such as solid-state disks or RAID.
Sincere thanks to Sabareesh Babu, Principal Member of Technical Staff; Sanal Vasudevan, Principal Member of Technical Staff; and Tom Campbell, Consulting Member of Technical Staff, for all the data points, inputs, and approval.
