# HDL Batch Size

Center of Excellence at Oracle

Very often I get asked what concurrency or thread count to use, or what batch size to recommend, when loading data using HCM Data Loader (HDL). I find it challenging to provide a generic answer simply because no two customers are the same. Recommendations provided to one customer loading, say, 1,000 employees won't necessarily suit another customer who is also loading 1,000 employees.

Let's say there are 4 customers who are each trying to load 1,000 employees using HDL. But each of them wants to bring a different amount of historical information per employee, and hence the number of physical rows differs.

| Customer | Employees | History | Physical rows |
|----------|-----------|----------|---------------|
| A | 1,000 | 1 year | 1,000 |
| B | 1,000 | 3 years | ~10,000 |
| C | 1,000 | 10 years | ~50,000 |
| D | 1,000 | 20 years | ~100,000 |

Naturally, if everyone follows the same rule while loading their data, it's not going to be efficient. If all four run HDL with 50 load threads and a batch size (chunk size) of 100, here is what may happen:

| Customer | Physical rows | Batches (1,000 employees / chunk 100) | Threads used (of 50) | Expected system performance |
|----------|---------------|----------------------------------------|----------------------|-----------------------------|
| A | 1,000 | 10 | 10 | Excellent |
| B | ~10,000 | 10 | 10 | Reasonable |
| C | ~50,000 | 10 | 10 | Performance issues |
| D | ~100,000 | 10 | 10 | Worst case |

## What went wrong?

If you notice, in all of the above cases no one is using all 50 threads, because there are only 10 batches to be processed. We need to revisit the chunk size parameter to be able to load this data in a reasonable amount of time.
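The arithmetic behind the table above can be sketched in a few lines. This is a hypothetical helper for illustration only (HDL derives these numbers internally from its own thread and batch-size settings); the function name and signature are assumptions, not an HDL API:

```python
import math

def thread_utilization(total_objects, chunk_size, max_threads):
    """Illustrative sketch: how many batches a load produces, and how
    many of the available threads can actually be kept busy."""
    batches = math.ceil(total_objects / chunk_size)
    threads_used = min(batches, max_threads)  # idle threads get no batch
    return batches, threads_used

# All four customers: 1,000 employees, chunk size 100, 50 threads.
batches, threads = thread_utilization(1000, 100, 50)
print(batches, threads)  # 10 10 -> only 10 of the 50 threads are busy
```

Because the batch count, not the thread setting, is the binding constraint here, raising the thread count alone cannot help; only a smaller chunk size creates more batches to hand out.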

| Customer | Physical rows | Chunk size | Batches | Threads used (of 50) | Work per thread |
|----------|---------------|------------|---------------|----------------------|-----------------|
| A | 1,000 | 100 | 10 (1,000/100) | 10 | 1 batch of 100 employees |
| B | ~10,000 | 60 | 17 (1,000/60) | 17 | 1 batch of 60 employees |
| C | ~50,000 | 40 | 25 (1,000/40) | 25 | 1 batch of 40 employees |
| D | ~100,000 | 10 | 100 (1,000/10) | 50 | 2 batches of 10 employees each |

This example should provide far better performance compared to the prior use case, where the chunk size defaulted to 100 irrespective of the amount of historical data being converted. So remember, there is no one solution that fits all, but hopefully this article gives you some insight into the batch size, a.k.a. chunk size, for HDL data loading.
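The pattern in the table above can be summarized as a rough sizing heuristic: pick a chunk small enough to give every thread at least one batch, and shrink it further when each employee carries many rows of history. The function below is an illustrative sketch of that idea, not an HDL formula; the names and the 1,000-rows-per-batch budget are assumptions chosen to echo the examples:

```python
def suggest_chunk_size(total_objects, max_threads,
                       rows_per_object=1, max_rows_per_batch=1000):
    """Hypothetical sizing rule: cap chunk size by two constraints,
    (a) enough batches to occupy every thread, and
    (b) a rows-per-batch budget so history-heavy batches stay small."""
    by_threads = total_objects // max_threads        # one batch per thread
    by_rows = max_rows_per_batch // rows_per_object  # keep batches light
    return max(1, min(by_threads, by_rows))

# Customer A: 1 row per employee -> thread count is the binding limit.
print(suggest_chunk_size(1000, 50))            # 20 -> 50 batches, all threads busy
# Customer D: ~100 rows per employee -> the rows budget dominates.
print(suggest_chunk_size(1000, 50, 100, 1000)) # 10 -> 100 batches of 10
```

The exact thresholds will differ per environment; the point is that chunk size should fall as the historical depth per object rises.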

There are also several system-level parameters that affect data-loading performance; I will discuss those in the next article.