In this blog post, I'd like to briefly review the high availability functionality in Oracle Big Data Appliance and Big Data Cloud Service. The good news is that most of these features are available out of the box on your systems, and no extra steps are required on your end. This is one of the key value-adds of leveraging a hardened system from Oracle.
A special shout-out to Sandra and Ravi from our team for helping with this blog post.
For this post on HA, we'll subdivide the content into the following topics:
1. High Availability in Hardware Components
When we talk about an on-premises solution, it's important to understand the fault tolerance and HA built into the actual hardware you have on the floor. Based on Oracle Exadata and our experience managing mission-critical systems, a BDA is built from components designed to handle hardware faults and simply stay up and running. Networking is redundant, power supplies in the racks are redundant, ILOM software tracks the health of the system, and ASR proactively logs service requests (SRs) for hardware issues when needed. You can find a lot more information here.
2. High Availability Within a Single Node
Talking about high availability within a single node, I'd like to focus on disk failures. In large clusters disk failures do occur, but in general they should not cause any issues for BDA and BDCS customers. First, let's have a look at the disk layout (minus the data directories) on one of the nodes:
[root@bdax72bur09node02 ~]# df -h|grep -v "/u"
Filesystem Size Used Avail Use% Mounted on
devtmpfs 126G 0 126G 0% /dev
tmpfs 126G 8.0K 126G 1% /dev/shm
tmpfs 126G 67M 126G 1% /run
tmpfs 126G 0 126G 0% /sys/fs/cgroup
/dev/md2 961G 39G 874G 5% /
/dev/md6 120G 717M 113G 1% /ssddisk
/dev/md0 454M 222M 205M 53% /boot
/dev/sda1 191M 16M 176M 9% /boot/efi
/dev/sdb1 191M 0 191M 0% /boot/rescue-efi
cm_processes 126G 309M 126G 1% /run/cloudera-scm-agent/process
Next, let's take a look at where critical services store their data.
- NameNode. Arguably the most critical HDFS component. It stores the fsimage file and edit logs on the hard disks; let's check where:

[root@bdax72bur09node02 ~]# df -h /opt/hadoop/dfs/nn
Filesystem Size Used Avail Use% Mounted on
/dev/md2 961G 39G 874G 5% /
- Journal Node:
[root@bdax72bur09node02 ~]# df /opt/hadoop/dfs/jn
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/md6 124800444 733688 117704132 1% /ssddisk
[root@bdax72bur09node02 ~]# ls -l /opt/hadoop/dfs/jn
lrwxrwxrwx 1 root root 15 Jul 15 22:58 /opt/hadoop/dfs/jn -> /ssddisk/dfs/jn
- Zookeeper:
[root@bdax72bur09node02 ~]# df /var/lib/zookeeper
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/md6 124800444 733688 117704132 1% /ssddisk
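Notice the symlink trick above: the JournalNode (and ZooKeeper) directories physically live on the SSD RAID. A quick, generic way to see where any such data directory really lives is to resolve the link first and then ask df. Below is a minimal sketch; the temporary directory tree only mimics the /ssddisk layout shown above and is purely illustrative:

```shell
# Resolve a symlinked Hadoop data directory to its physical location before
# checking which filesystem holds it. The temp tree below mimics the /ssddisk
# layout; on a real node you would run:
#   df -h "$(readlink -f /opt/hadoop/dfs/jn)"
tmp=$(mktemp -d)
mkdir -p "$tmp/ssddisk/dfs/jn"
ln -s "$tmp/ssddisk/dfs/jn" "$tmp/jn"    # like /opt/hadoop/dfs/jn -> /ssddisk/dfs/jn

real=$(readlink -f "$tmp/jn")
echo "physical location: $real"
df "$real" | tail -1                     # the filesystem that actually holds the data
rm -rf "$tmp"
```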
All these services store their data on the software RAID devices /dev/md2 and /dev/md6. Let's take a look at what they consist of:
[root@bdax72bur09node02 ~]# mdadm --detail /dev/md2
/dev/md2:
...
Array Size : 1023867904 (976.44 GiB 1048.44 GB)
Used Dev Size : 1023867904 (976.44 GiB 1048.44 GB)
Raid Devices : 2
Total Devices : 2
...
Active Devices : 2
...
Number Major Minor RaidDevice State
0 8 3 0 active sync /dev/sda3
1 8 19 1 active sync /dev/sdb3
So, md2 is a one-terabyte mirrored (RAID-1) array: we are safe if one of the disks fails.
[root@bdax72bur09node02 ~]# mdadm --detail /dev/md6
/dev/md6:
...
Array Size : 126924800 (121.04 GiB 129.97 GB)
Used Dev Size : 126924800 (121.04 GiB 129.97 GB)
Raid Devices : 2
Total Devices : 2
...
Active Devices : 2
...
Number Major Minor RaidDevice State
0 8 195 0 active sync /dev/sdm3
1 8 211 1 active sync /dev/sdn3
So, md6 is a mirrored SSD array: again, we are safe if one of the disks fails. Fine, let's move on!
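Since both critical arrays are plain mirrors, a degraded array is easy to spot in /proc/mdstat: the member-status bitmap reads [UU] when both halves are healthy and [U_] when one is missing. Below is a minimal health-check sketch; the here-string is sample data shaped like /proc/mdstat output (with a simulated failure on md6), so on a real node you would read the file itself:

```shell
# Minimal sketch: flag degraded md mirrors from /proc/mdstat-style status lines.
# The sample text stands in for a live /proc/mdstat; on a node, use: cat /proc/mdstat
mdstat='md2 : active raid1 sdb3[1] sda3[0]
      1023867904 blocks super 1.2 [2/2] [UU]
md6 : active raid1 sdn3[1] sdm3[0]
      126924800 blocks super 1.2 [2/1] [U_]'

echo "$mdstat" | awk '
  /^md/ { dev = $1 }                 # remember which array the status line belongs to
  /\[[U_]+\]$/ {                     # [UU] = both members up, [U_] = one member gone
      if ($NF ~ /_/) print dev " DEGRADED"
      else           print dev " OK"
  }'
```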
3. High Availability of Hadoop Components
3.1 Default service distribution on BDA/BDCS
We briefly took a look at the hardware layout of BDA/BDCS and how data is laid out on disk. In this section, let's look at the Hadoop software details. By default, when you deploy BDCS or configure and create a BDA cluster, you get the following service distribution:
| Node01 | Node02 | Node03 | Node04 | Node05 to nn |
|---|---|---|---|---|
| Balancer | - | Cloudera Manager Server | - | - |
| Cloudera Manager Agent | Cloudera Manager Agent | Cloudera Manager Agent | Cloudera Manager Agent | Cloudera Manager Agent |
| DataNode | DataNode | DataNode | DataNode | DataNode |
| Failover Controller | Failover Controller | - | Oozie | - |
| JournalNode | JournalNode | JournalNode | - | - |
| - | MySQL Backup | MySQL Primary | - | - |
| NameNode | NameNode | Navigator Audit Server and Navigator Metadata Server | - | - |
| NodeManager (in clusters of eight nodes or less) | NodeManager (in clusters of eight nodes or less) | NodeManager | NodeManager | NodeManager |
| - | - | SparkHistoryServer | Oracle Data Integrator Agent | - |
| - | - | ResourceManager | ResourceManager | - |
| ZooKeeper | ZooKeeper | ZooKeeper | - | - |
| Big Data SQL (if enabled) | Big Data SQL (if enabled) | Big Data SQL (if enabled) | Big Data SQL (if enabled) | Big Data SQL (if enabled) |
| Kerberos KDC (if MIT Kerberos is enabled and on-BDA KDCs are being used) | Kerberos KDC (if MIT Kerberos is enabled and on-BDA KDCs are being used) | JobHistory | - | - |
| Sentry Server (if enabled) | Sentry Server (if enabled) | - | - | - |
| Hive Metastore | - | - | Hive Metastore | - |
| Active Navigator Key Trustee Server (if HDFS Transparent Encryption is enabled) | Passive Navigator Key Trustee Server (if HDFS Transparent Encryption is enabled) | - | - | - |
| - | HttpFS | - | - | - |
| Hue Server | - | - | Hue Server | - |
| Hue Load Balancer | - | - | Hue Load Balancer | - |
Let me talk about the high availability implementation of some of these services. This configuration may change in the future; you can check for updates here.
3.2 Services with High Availability configured by default
As of today (November 2018) we support high availability features for certain Hadoop components:
1) NameNode
2) YARN
3) Kerberos Key Distribution Center (KDC)
4) Sentry
5) Hive Metastore Service
6) Hue
3.2.1 NameNode High Availability
As you may know, Oracle's solutions are based on the Cloudera Hadoop distribution. Here you can find a detailed explanation of how HDFS high availability is achieved; the good news is that all those configuration steps are already done on BDA and BDCS, so you have it out of the box. Let me show a small demo of NameNode high availability. First, let's check the list of nodes that run this service:
[root@bdax72bur09node01 ~]# hdfs getconf -namenodes
bdax72bur09node01.us.oracle.com bdax72bur09node02.us.oracle.com

[root@bdax72bur09node01 ~]# for i in {1..100}; do hadoop fs -ls hdfs://gcs-lab-bdax72-ns | tail -1; done
drwxr-xr-x - root root 0 2018-09-11 17:21 hdfs://gcs-lab-bdax72-ns/user/root/benchmarks
drwxr-xr-x - root root 0 2018-09-11 17:21 hdfs://gcs-lab-bdax72-ns/user/root/benchmarks
drwxr-xr-x - root root 0 2018-09-11 17:21 hdfs://gcs-lab-bdax72-ns/user/root/benchmarks
drwxr-xr-x - root root 0 2018-09-11 17:21 hdfs://gcs-lab-bdax72-ns/user/root/benchmarks
18/11/01 19:53:53 INFO retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB over bdax72bur09node02.us.oracle.com/192.168.8.171:8020 after 1 fail over attempts. Trying to fail over immediately.
...
18/11/01 19:54:16 INFO retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB over bdax72bur09node02.us.oracle.com/192.168.8.171:8020 after 5 fail over attempts. Trying to fail over after sleeping for 11022ms.
java.net.ConnectException: Call From bdax72bur09node01.us.oracle.com/192.168.8.170 to bdax72bur09node02.us.oracle.com:8020 failed on connection exception: java.net.ConnectException: Connection timed out; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
at org.apache.hadoop.ipc.Client.call(Client.java:1508)
at org.apache.hadoop.ipc.Client.call(Client.java:1441)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:786)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:258)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2167)
at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1265)
at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1261)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1261)
at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:64)
at org.apache.hadoop.fs.Globber.doGlob(Globber.java:272)
at org.apache.hadoop.fs.Globber.glob(Globber.java:151)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1715)
at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:326)
at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:235)
at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:218)
at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:102)
at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:315)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:372)
Caused by: java.net.ConnectException: Connection timed out
...
drwxr-xr-x - root root 0 2018-09-11 17:21 hdfs://gcs-lab-bdax72-ns/user/root/benchmarks
drwxr-xr-x - root root 0 2018-09-11 17:21 hdfs://gcs-lab-bdax72-ns/user/root/benchmarks
So, as we can see, when one of the NameNodes became unavailable, the second one took over its responsibilities; clients experience only a short pause while the client library fails over. In Cloudera Manager we can see that the NameNode service on node02 is not available:

But despite this, users can continue to work with the cluster without any outages or extra actions.
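Conceptually, what the HDFS client did in the log above is simple: it keeps the list of NameNodes and retries against the next one when a call fails. Here is a toy sketch of that behavior, where a stub function stands in for the real NameNode RPC and node02 is simulated as dead:

```shell
# Toy failover loop, mimicking what the HA-aware HDFS client does internally.
# query_nn is a stand-in for a real NameNode RPC call; node02 is simulated as down.
query_nn() {
    [ "$1" = "bdax72bur09node02" ] && return 1   # pretend this NameNode is dead
    echo "answered by $1"
}

for nn in bdax72bur09node02 bdax72bur09node01; do   # order as from: hdfs getconf -namenodes
    if query_nn "$nn"; then
        break    # first healthy NameNode answered; stop failing over
    fi
done
```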
3.2.2 YARN High Availability
YARN is another key Hadoop component, and it's also highly available by default in the Oracle solution. Cloudera requires some manual configuration, but on BDA and BDCS all these steps are done as part of service deployment. Let's do the same test with the YARN ResourceManager. In Cloudera Manager we identify the nodes that run the ResourceManager service and reboot the active one (to reproduce a hardware failure):
I'll run some MapReduce code and restart the bdax72bur09node04 node (which hosts the active ResourceManager).
[root@bdax72bur09node01 hadoop-mapreduce]# hadoop jar hadoop-mapreduce-examples.jar pi 1 1
Number of Maps = 1
Samples per Map = 1
Wrote input for Map #0
Starting Job
18/11/01 20:08:03 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm16
18/11/01 20:08:03 INFO input.FileInputFormat: Total input paths to process : 1
18/11/01 20:08:04 INFO mapreduce.JobSubmitter: number of splits:1
18/11/01 20:08:04 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1541115989562_0002
18/11/01 20:08:04 INFO impl.YarnClientImpl: Submitted application application_1541115989562_0002
18/11/01 20:08:04 INFO mapreduce.Job: The url to track the job: http://bdax72bur09node04.us.oracle.com:8088/proxy/application_1541115989562_0002/
18/11/01 20:08:04 INFO mapreduce.Job: Running job: job_1541115989562_0002
18/11/01 20:08:07 INFO retry.RetryInvocationHandler: Exception while invoking getApplicationReport of class ApplicationClientProtocolPBClientImpl over rm16. Trying to fail over immediately.
java.io.EOFException: End of File Exception between local host is: "bdax72bur09node01.us.oracle.com/192.168.8.170"; destination host is: "bdax72bur09node04.us.oracle.com":8032; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
at org.apache.hadoop.ipc.Client.call(Client.java:1508)
at org.apache.hadoop.ipc.Client.call(Client.java:1441)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at com.sun.proxy.$Proxy13.getApplicationReport(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:187)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:258)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy14.getApplicationReport(Unknown Source)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:408)
at org.apache.hadoop.mapred.ResourceMgrDelegate.getApplicationReport(ResourceMgrDelegate.java:302)
at org.apache.hadoop.mapred.ClientServiceDelegate.getProxy(ClientServiceDelegate.java:154)
at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:323)
at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:423)
at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:698)
at org.apache.hadoop.mapreduce.Job$1.run(Job.java:326)
at org.apache.hadoop.mapreduce.Job$1.run(Job.java:323)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:323)
at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:621)
at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1366)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1328)
at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:306)
at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1113)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1006)
18/11/01 20:08:09 INFO mapreduce.Job: Job job_1541115989562_0002 running in uber mode : false
18/11/01 20:08:09 INFO mapreduce.Job: map 0% reduce 0%
18/11/01 20:08:23 INFO mapreduce.Job: map 100% reduce 0%
18/11/01 20:08:29 INFO mapreduce.Job: map 100% reduce 100%
18/11/01 20:08:29 INFO mapreduce.Job: Job job_1541115989562_0002 completed successfully
Well, in the logs we can clearly see that the client failed over to the second ResourceManager. In Cloudera Manager we can see that node03 took over the active role:

So, even after losing an entire node that hosts a ResourceManager, users do not lose the ability to submit their jobs.
3.2.3 Kerberos Key Distribution Center (KDC)
In fact, the majority of production Hadoop clusters run in secure mode, which means Kerberized clusters, and the Kerberos Key Distribution Center is the key component for this. The good news: when you install Kerberos with BDA or BDCS, you automatically get a standby KDC on your BDA/BDCS.
3.2.4 Sentry High Availability
If Kerberos is the authentication method (defining who you are), users quite frequently want to pair it with an authorization tool. In the Cloudera stack the de facto default tool is Sentry. Since the BDA 4.12 software release we support Sentry high availability out of the box. Cloudera has detailed documentation that explains how it works.
3.2.5 Hive Metastore Service High Availability
When we talk about Hive, it's very important to keep in mind that it consists of many components. This is easy to see in Cloudera Manager:
Whenever you work with Hive tables, you go through many logical layers:

To keep it simple, let's consider the case where we use Beeline to query some Hive tables. For this we need HiveServer2, the Hive Metastore Service, and the Metastore backend RDBMS to be available. Let's connect and make sure the data is accessible:
0: jdbc:hive2://bdax72bur09node04.us.oracle.c (closed)> !connect jdbc:hive2://bdax72bur09node04.us.oracle.com:10000/default;
Connecting to jdbc:hive2://bdax72bur09node04.us.oracle.com:10000/default;
Enter username for jdbc:hive2://bdax72bur09node04.us.oracle.com:10000/default;:
Enter password for jdbc:hive2://bdax72bur09node04.us.oracle.com:10000/default;:
Connected to: Apache Hive (version 1.1.0-cdh5.14.2)
Driver: Hive JDBC (version 1.1.0-cdh5.14.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
1: jdbc:hive2://bdax72bur09node04.us.oracle.c> show databases;
...
+----------------+--+
| database_name |
+----------------+--+
| csv |
| default |
| parq |
+----------------+--+
Now, let's shut down HiveServer2 and confirm that we can no longer connect to the database:

1: jdbc:hive2://bdax72bur09node04.us.oracle.c> !connect jdbc:hive2://bdax72bur09node04.us.oracle.com:10000/default;
Connecting to jdbc:hive2://bdax72bur09node04.us.oracle.com:10000/default;
Enter username for jdbc:hive2://bdax72bur09node04.us.oracle.com:10000/default;:
Enter password for jdbc:hive2://bdax72bur09node04.us.oracle.com:10000/default;:
Could not open connection to the HS2 server. Please check the server URI and if the URI is correct, then ask the administrator to check the server status.
Error: Could not open client transport with JDBC Uri: jdbc:hive2://bdax72bur09node04.us.oracle.com:10000/default;: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)
1: jdbc:hive2://bdax72bur09node04.us.oracle.c>
As expected, the connection fails when that HiveServer2 instance is down. Because we have two HiveServer2 instances (on node04 and node05 in this cluster), we can put a load balancer such as haproxy in front of them to remove this single point of failure. Let's install haproxy:

[root@bdax72bur09node06 ~]# yum -y install haproxy
Loaded plugins: langpacks
Resolving Dependencies
--> Running transaction check
---> Package haproxy.x86_64 0:1.5.18-7.el7 will be installed
--> Finished Dependency Resolution
Dependencies Resolved
==========================================================================================================================================================================================================
Package Arch Version Repository Size
==========================================================================================================================================================================================================
Installing:
haproxy x86_64 1.5.18-7.el7 ol7_latest 833 k
Transaction Summary
==========================================================================================================================================================================================================
Install 1 Package
Total download size: 833 k
Installed size: 2.6 M
Downloading packages:
haproxy-1.5.18-7.el7.x86_64.rpm | 833 kB 00:00:01
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Installing : haproxy-1.5.18-7.el7.x86_64 1/1
Verifying : haproxy-1.5.18-7.el7.x86_64 1/1
Installed:
haproxy.x86_64 0:1.5.18-7.el7
Complete!
[root@bdax72bur09node06 ~]# vi /etc/haproxy/haproxy.cfg
global
log 127.0.0.1 local2
chroot /var/lib/haproxy
pidfile /var/run/haproxy.pid
maxconn 4000
user haproxy
group haproxy
daemon
# turn on stats unix socket
stats socket /var/lib/haproxy/stats
#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
mode http
log global
option httplog
option dontlognull
option http-server-close
option forwardfor except 127.0.0.0/8
option redispatch
retries 3
timeout http-request 10s
timeout queue 1m
timeout connect 10s
timeout client 1m
timeout server 1m
timeout http-keep-alive 10s
timeout check 10s
maxconn 3000
#---------------------------------------------------------------------
# main frontend which proxys to the backends
#---------------------------------------------------------------------
frontend main *:5000
acl url_static path_beg -i /static /images /javascript /stylesheets
acl url_static path_end -i .jpg .gif .png .css .js
use_backend static if url_static
#---------------------------------------------------------------------
# static backend for serving up images, stylesheets and such
#---------------------------------------------------------------------
backend static
balance roundrobin
server static 127.0.0.1:4331 check
#---------------------------------------------------------------------
# round robin balancing between the various backends
#---------------------------------------------------------------------
listen hiveserver2 :10005
mode tcp
option tcplog
balance source
server hiveserver2_1 bdax72bur09node04.us.oracle.com:10000 check
server hiveserver2_2 bdax72bur09node05.us.oracle.com:10000 check

beeline> !connect jdbc:hive2://bdax72bur09node06.us.oracle.com:10005/default;
...
INFO : OK
+----------------+--+
| database_name |
+----------------+--+
| csv |
| default |
| parq |
+----------------+--+
3 rows selected (2.08 seconds)
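A quick note on the `balance source` directive used in the hiveserver2 section above: it hashes the client's address so that a given client consistently lands on the same HiveServer2 instance. Here is a toy illustration of the idea, a deliberately simplified stand-in (haproxy's real hash function differs, and the node names just echo this cluster's layout):

```shell
# Toy illustration of 'balance source': hash the client IP to pick a backend,
# so the same client always maps to the same HiveServer2 instance.
# (Simplified stand-in; haproxy's actual hashing differs.)
pick_backend() {
    # sum the IPv4 octets and take the result modulo the backend count (2)
    sum=$(echo "$1" | tr '.' ' ' | { read a b c d; echo $(( a + b + c + d )); })
    case $(( sum % 2 )) in
        0) echo "bdax72bur09node04.us.oracle.com" ;;
        1) echo "bdax72bur09node05.us.oracle.com" ;;
    esac
}

pick_backend 192.168.8.170   # a given client IP always maps to the same backend
pick_backend 192.168.8.171
```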
Great, it works! Now let's try shutting down one of the HiveServer2 instances:

0: jdbc:hive2://bdax72bur09node06.us.oracle.c> !connect jdbc:hive2://bdax72bur09node06.us.oracle.com:10005/default;
...
1: jdbc:hive2://bdax72bur09node06.us.oracle.c> show databases;
...
+----------------+--+
| database_name |
+----------------+--+
| csv |
| default |
| parq |
+----------------+--+
It still works!
Now let's move on and look at what we have for Hive Metastore Service high availability. The really great news is that it's enabled by default on BDA and BDCS.
To show this, I'll shut down the Metastore instances one at a time and check that the Beeline connection keeps working.
First, shut down the service on node01 and try to connect/query through Beeline:
1: jdbc:hive2://bdax72bur09node06.us.oracle.c> !connect jdbc:hive2://bdax72bur09node06.us.oracle.com:10005/default;
...
INFO : OK
+----------------+--+
| database_name |
+----------------+--+
| csv |
| default |
| parq |
+----------------+--+
It works. Now I'm going to start the service back up on node01 and shut it down on node04:

1: jdbc:hive2://bdax72bur09node06.us.oracle.c> !connect jdbc:hive2://bdax72bur09node06.us.oracle.com:10005/default;
...
INFO : OK
+----------------+--+
| database_name |
+----------------+--+
| csv |
| default |
| parq |
+----------------+--+
It works again! So, we are safe with the Hive Metastore Service.
BDA and BDCS use MySQL as the backing RDBMS. As of today there is no automatic high availability for the MySQL database, so we use master-slave replication (in the future we hope to have full HA for MySQL), which lets us switch to the slave if the master fails. Today, you need to perform a node migration if the master node (node03 by default) fails; I'll explain this later in this post.
To find out where the MySQL master is, run:
[root@bdax72bur09node01 tmp]# json-select --jpx=MYSQL_NODE /opt/oracle/bda/install/state/config.json
bdax72bur09node03
[root@bdax72bur09node01 tmp]# json-select --jpx=MYSQL_BACKUP_NODE /opt/oracle/bda/install/state/config.json
bdax72bur09node02
3.2.6 Hue High Availability
Hue is quite a popular tool for working with Hadoop data. It's also possible to run Hue in HA mode; Cloudera explains it here, but on BDA and BDCS you have it out of the box since the 4.12 software version. By default, you have a Hue server and a Hue load balancer on node01 and node04:

If node01 or node04 becomes unavailable, users can keep using Hue without any extra actions, simply by switching to the other load balancer URL.
3.3 Migrate Critical Nodes
One of the greatest features of Big Data Appliance is the capability to migrate all roles of critical services. Some nodes host many critical services; node03, for example, runs Cloudera Manager, the ResourceManager, the MySQL store, and more. Fortunately, BDA has a simple way to migrate all roles from a critical node to a non-critical one. You can find all the details in MOS (Node Migration on Oracle Big Data Appliance V4.0 OL6 Hadoop Cluster to Manage a Hardware Failure (Doc ID 1946210.1)).
Let's consider a case where we lose one of the critical servers (because of a hardware failure, for example): node03, which hosts the active MySQL RDBMS and Cloudera Manager. To fix this, we need to migrate all roles of this node to another server. To migrate all roles from node03, just run:
[root@bdax72bur09node01 ~]# bdacli admin_cluster migrate bdax72bur09node03
You can find all the details in the MOS note, but briefly:
1) There are two major types of migration:
- Migration of critical nodes
- Reprovisioning of non-critical nodes
2) When you migrate a critical node, you can't choose the non-critical node to which the services will be migrated (Mammoth does this for you; generally it's the first available non-critical node)
3) After the failed server is back in the cluster (or a new one is added), you should reprovision it as a non-critical node
4) You don't need to switch the services back; just leave them as they are
After the migration is done, the new node takes over all roles from the failed one. In my example, I migrated one of the critical nodes, which hosted the active MySQL RDBMS and Cloudera Manager. To check where the active RDBMS is now, you can run:
[root@bdax72bur09node01 tmp]# json-select --jpx=MYSQL_NODE /opt/oracle/bda/install/state/config.json
bdax72bur09node05
Note: to find the slave RDBMS, run:
[root@bdax72bur09node01 tmp]# json-select --jpx=MYSQL_BACKUP_NODE /opt/oracle/bda/install/state/config.json
bdax72bur09node02
and Cloudera Manager runs on node05:

The ResourceManager was also migrated to node05:

The migration process decommissions the failed node. After it comes back to the cluster, we need to reprovision it (deploy the non-critical services on it); in other words, we need to recommission the node.
3.4 Redundant Services
There are certain Hadoop services that are configured on BDA in a redundant way, so you shouldn't worry about their high availability:
- DataNode. By default, HDFS stores every block with a replication factor of 3. If you lose one node, you still have two more copies.
- JournalNode. By default, you have 3 JournalNode instances configured; losing one is not a big deal.
- ZooKeeper. By default, you have 3 ZooKeeper instances configured; losing one is not a big deal.
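The reason losing a single JournalNode or ZooKeeper instance is harmless is quorum arithmetic: an ensemble of N members stays available as long as a strict majority, floor(N/2) + 1, is up. A one-liner to illustrate:

```shell
# Quorum arithmetic for a replicated ensemble (JournalNode, ZooKeeper).
n=3
quorum=$(( n / 2 + 1 ))          # strict majority needed to stay available
tolerated=$(( n - quorum ))      # failures the ensemble can absorb
echo "ensemble of $n: needs $quorum members up, tolerates $tolerated failure(s)"
```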
4. Services without High Availability configured by default
There are certain services on BDA that don't have a high availability configuration by default:
- Oozie. If you need high availability for Oozie, check Cloudera's documentation
- Cloudera Manager. It's also possible to configure Cloudera Manager for high availability, as explained here, but I'd recommend using node migration instead, as shown above
- Impala. Neither BDA nor BDCS has Impala configured by default (yet), but it's quite important. You can find all the details here, but briefly, to configure HA for Impala you need to:
a. Configure haproxy (I've extended the existing haproxy config created for HiveServer2) by adding:
listen impala :25003
mode tcp
option tcplog
balance leastconn
server symbolic_name_1 bdax72bur09node01.us.oracle.com:21000 check
server symbolic_name_2 bdax72bur09node02.us.oracle.com:21000 check
server symbolic_name_3 bdax72bur09node03.us.oracle.com:21000 check
server symbolic_name_4 bdax72bur09node04.us.oracle.com:21000 check
server symbolic_name_5 bdax72bur09node05.us.oracle.com:21000 check
server symbolic_name_6 bdax72bur09node06.us.oracle.com:21000 check
b. Go to Cloudera Manager -> Impala -> Configuration, search for "Impala Daemons Load Balancer", and add the haproxy host:port there:
c. Log in to Impala using the haproxy host:port:
[root@bdax72bur09node01 bin]# impala-shell -i bdax72bur09node06:25003
...
Connected to bdax72bur09node06:25003
...
[bdax72bur09node06:25003] >
Speaking of Impala, it's worth mentioning that there are two more services: the Impala Catalog Server and the Impala StateStore. These are not mission-critical services. From Cloudera's documentation:
The Impala component known as the statestore checks on the health of Impala daemons on all the DataNodes in a cluster, and continuously relays its findings to each of those daemons.
and
The Impala component known as the catalog service relays the metadata changes from Impala SQL statements to all the Impala daemons in a cluster. ...
Because the requests are passed through the statestore daemon, it makes sense to run the statestored and catalogd services on the same host.
I'll make a quick test:
- Disable the Impala daemon on node01, and also disable the StateStore and the Catalog Server
- Connect to the load balancer and run a query:
[root@bdax72bur09node01 ~]# impala-shell -i bdax72bur09node06:25003
....
[bdax72bur09node06:25003] > select count(1) from test_table;
...
+------------+
| count(1) |
+------------+
| 6659433869 |
+------------+
Fetched 1 row(s) in 1.76s
[bdax72bur09node06:25003] >
So, as we can see, Impala can keep working even without the StateStore and the Catalog Server.
Appendix A.
Despite BDA having multiple high availability features, it's always useful to make a backup before any significant operation, like an upgrade. For detailed information, please follow the My Oracle Support (MOS) note:
How to Backup Critical Metadata on Oracle Big Data Appliance Prior to Upgrade V2.3.1 and Higher Releases (Doc ID 1623304.1)