By Orgad Kimchi-Oracle on Jul 20, 2015
Apache Hadoop is an open source distributed computing framework designed to process very large unstructured
data sets. It is composed of two main subsystems: data storage and data analysis. Apache Hadoop was developed
to address four system principles: reliably scaling processing across multiple physical (or virtual)
nodes, moving code to data, dealing gracefully with node failures, and abstracting the complexity of distributed
and concurrent applications.
OpenStack is free and open source cloud software typically deployed as an infrastructure as a service (IaaS) solution.
The OpenStack platform exists as a combination of several interrelated subprojects that control the provisioning of
compute, storage, and network resources in a data center.
OpenStack technology simplifies the deployment of data center resources while also providing
a unified resource management tool.
Oracle Solaris 11 includes a complete OpenStack distribution called Oracle OpenStack for Oracle Solaris.
This article starts with a brief overview of Hadoop and OpenStack, and follows with an example
of setting up a Hadoop cluster that has two NameNodes, a Resource Manager, a History Server, and three DataNodes.
As a prerequisite, you should have a basic understanding of Oracle Solaris Zones and network administration.