OIC has the capability of processing huge CSV files (up to 1GB) using Stage "Read File In Segments" activity. In some use-cases, customers require to calculate the sum of a column or count of rows.
Consider the following payload. Say, we want to calculate the sum of all the values in the column "Amount1". Customers tend to declare a variable upstream to Stage action and keep on updating this variable with the computation done in each chunk of the Stage.
Updating the upstream variables inside the Stage "Read File in Segments" will cause performance impact and this pattern does not allow stage processing in parallel (you will observe a warning message "Stage File Read File in Segments includes action that will result in segments being processed sequentially" in integration canvas).
This blog describes one of the optimal approaches to use the aggregate functions Sum, Count while processing larger files.
1) Read the file in Stage "Read File in Segments".
2) Create the Stage "Write" with "Append to Existing File" inside Stage "Read File In Segments". Get the sum and count of each segment & write the result into it.
count ($StageSegments/.../Record )
3) After completion of Stage "File in Segments", create a Stage "List" to list the file which is created in the previous step. Stage "List" response contains file size. We will use the file size to decide on the next actions.
4) If the file size is less than 10 MB then read the file using Stage "Read Entire File" and create Assign action with Variables to store the grand totals.
Otherwise, repeat steps 1,2,3 with the new file.