FairScheduling Conventions in Hadoop
By dan.mcclary on Jun 27, 2012
While scheduling and resource allocation control has been present in Hadoop since 0.20, a lot of people haven't discovered or utilized it in their initial investigations of the Hadoop ecosystem. We could chalk this up to many things:
- Organizations are still determining what their dataflow and analysis workloads will comprise
- Small deployments under tests aren't likely to show the signs of strains that would send someone looking for resource allocation options
- The default scheduling options -- the FairScheduler and the CapacityScheduler -- are not placed in the most prominent position within the Hadoop documentation.
However, for production deployments, it's wise to start with at least the foundations of scheduling in place so that you can tune the cluster as workloads emerge. To do that, we have to ask ourselves something about what the off-the-rack scheduling options are. We have some choices:
- The FairScheduler, which will work to ensure resource allocations are enforced on a per-job basis.
- The CapacityScheduler, which will ensure resource allocations are enforced on a per-queue basis.
- Writing your own implementation of the abstract class org.apache.hadoop.mapred.job.TaskScheduler is an option, but usually overkill.
To enable fair scheduling, we're going to need to do a couple of things. First, we need to tell the JobTracker that we want to use scheduling and where we're going to be defining our allocations. We do this by adding the following to the mapred-site.xml file in HADOOP_HOME/conf:
What we've done here is simply tell the JobTracker that we'd like to task scheduling to use the FairScheduler class rather than a single FIFO queue. Moreover, we're going to be defining our resource pools and allocations in a file called allocations.xml For reference, the allocation file is read every 15s or so, which allows for tuning allocations without having to take down the JobTracker.
Our allocation file is now going to look a little like this
<?xml version="1.0"?><allocations><pool name="dan"><minMaps>5</minMaps><minReduces>5</minReduces><maxMaps>25</maxMaps><maxReduces>25</maxReduces><minSharePreemptionTimeout>300</minSharePreemptionTimeout></pool><mapreduce.job.user.name="dan"><maxRunningJobs>6</maxRunningJobs></user><userMaxJobsDefault>3</userMaxJobsDefault><fairSharePreemptionTimeout>600</fairSharePreemptionTimeout></allocations>
In this case, I've explicitly set my username to have upper and lower bounds on the maps and reduces, and allotted myself double the number of running jobs. Now, if I run hive or pig jobs from either the console or via the Hue web interface, I'll be treated "fairly" by the JobTracker. There's a lot more tweaking that can be done to the allocations file, so it's best to dig down into the description and start trying out allocations that might fit your workload.