
Information, tips, tricks and sample code for Big Data Warehousing in an autonomous, cloud-driven world

Use Big Data Appliance and Big Data Cloud Service High Availability, or You'll Blame Yourself Later

Alexey Filanovskiy
Product Manager

In this blog post, I'd like to briefly review the high availability (HA) functionality in Oracle Big Data Appliance (BDA) and Big Data Cloud Service (BDCS). The good news is that most of these features are available out of the box on your systems, with no extra steps required on your end. That is one of the key value-adds of a hardened system from Oracle.

A special shout-out to Sandra and Ravi from our team, for helping with this blog post.

For this post on HA, we'll subdivide the content into the following topics:

  1. High Availability in the Hardware Components of the system
  2. High Availability within a single node
  3. Hadoop Components High Availability

1. High Availability in Hardware Components

When we talk about an on-premise solution, it is important to understand the fault tolerance and HA built into the actual hardware you have on the floor. Drawing on Oracle Exadata and our experience managing mission-critical systems, a BDA is built from components that handle hardware faults and simply stay up and running. Networking is redundant, power supplies in the racks are redundant, ILOM software tracks the health of the system, and ASR proactively logs SRs for hardware issues if needed. You can find a lot more information here.

2. High availability within a single node

Talking about high availability within a single node, I'd like to focus on disk failures. In large clusters, disk failures do occur, but in general they should not cause any issues for BDA and BDCS customers. First, let's look at the disk layout (minus data directories) on the Oracle system:

[root@bdax72bur09node02 ~]# df -h|grep -v "/u"

Filesystem      Size  Used Avail Use% Mounted on

devtmpfs        126G     0  126G   0% /dev

tmpfs           126G  8.0K  126G   1% /dev/shm

tmpfs           126G   67M  126G   1% /run

tmpfs           126G     0  126G   0% /sys/fs/cgroup

/dev/md2        961G   39G  874G   5% /

/dev/md6        120G  717M  113G   1% /ssddisk

/dev/md0        454M  222M  205M  53% /boot

/dev/sda1       191M   16M  176M   9% /boot/efi

/dev/sdb1       191M     0  191M   0% /boot/rescue-efi

cm_processes    126G  309M  126G   1% /run/cloudera-scm-agent/process

Next, let's take a look where critical services store their data.

- Name Node. Arguably the most critical HDFS component. It stores the FSImage file and edit logs on the local disks; let's check where:

[root@bdax72bur09node02 ~]# df -h /opt/hadoop/dfs/nn

Filesystem      Size  Used Avail Use% Mounted on

/dev/md2        961G   39G  874G   5% /

- Journal Node:

[root@bdax72bur09node02 ~]# df /opt/hadoop/dfs/jn

Filesystem     1K-blocks   Used Available Use% Mounted on

/dev/md6       124800444 733688 117704132   1% /ssddisk

[root@bdax72bur09node02 ~]# ls -l /opt/hadoop/dfs/jn

lrwxrwxrwx 1 root root 15 Jul 15 22:58 /opt/hadoop/dfs/jn -> /ssddisk/dfs/jn

- Zookeeper:

[root@bdax72bur09node02 ~]# df /var/lib/zookeeper

Filesystem     1K-blocks   Used Available Use% Mounted on

/dev/md6       124800444 733688 117704132   1% /ssddisk

All these services store their data on the RAID devices /dev/md2 and /dev/md6. Let's take a look at what these arrays consist of:

[root@bdax72bur09node02 ~]# mdadm --detail /dev/md2

/dev/md2:

...

     Array Size : 1023867904 (976.44 GiB 1048.44 GB)

     Used Dev Size : 1023867904 (976.44 GiB 1048.44 GB)

      Raid Devices : 2

     Total Devices : 2

   ...

    Active Devices : 2

...

    Number   Major   Minor   RaidDevice State

       0       8        3        0      active sync   /dev/sda3

       1       8       19        1      active sync   /dev/sdb3

So, md2 is a one-terabyte mirrored (RAID-1) array: we are safe if one of the disks fails.

[root@bdax72bur09node02 ~]# mdadm --detail /dev/md6

/dev/md6:

...

     Array Size : 126924800 (121.04 GiB 129.97 GB)

     Used Dev Size : 126924800 (121.04 GiB 129.97 GB)

      Raid Devices : 2

     Total Devices : 2

...

   Active Devices : 2

...

   Number   Major   Minor   RaidDevice State

       0       8      195        0      active sync   /dev/sdm3

       1       8      211        1      active sync   /dev/sdn3

So, md6 is a mirrored SSD array: again, we are safe if one of the disks fails. Fine, let's move on!
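
To keep an eye on these arrays, you can check their state from the shell. Below is a small sketch: the `check_md_state` helper is hypothetical (not part of BDA tooling) and simply parses `mdadm --detail` output fed on stdin.

```shell
# check_md_state: read `mdadm --detail /dev/mdX` output on stdin and
# report whether the array looks healthy. Hypothetical helper.
check_md_state() {
  awk -F':' '
    /State :/        { state = $2 }        # e.g. "State : clean" or "State : clean, degraded"
    /Failed Devices/ { failed = $2 + 0 }   # e.g. "Failed Devices : 0"
    END {
      if (failed > 0 || state ~ /degraded/) print "DEGRADED"
      else                                  print "OK"
    }'
}

# Usage on a live node (as root):
#   mdadm --detail /dev/md2 | check_md_state
```

Running it against both md2 and md6 in a cron job is an easy way to catch a failed mirror member before the second disk goes.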

3. High Availability of Hadoop Components

3.1 Default service distribution on BDA/BDCS

We briefly looked at the hardware layout of BDA/BDCS and how data is laid out on disk. In this section, let's look at the Hadoop software details. When you deploy BDCS or configure and create a BDA cluster, you get the following service distribution by default:

| Node01 | Node02 | Node03 | Node04 | Node05 to NodeNN |
|---|---|---|---|---|
| Balancer | - | Cloudera Manager Server | - | - |
| Cloudera Manager Agent | Cloudera Manager Agent | Cloudera Manager Agent | Cloudera Manager Agent | Cloudera Manager Agent |
| DataNode | DataNode | DataNode | DataNode | DataNode |
| Failover Controller | Failover Controller | - | Oozie | - |
| JournalNode | JournalNode | JournalNode | - | - |
| - | MySQL Backup | MySQL Primary | - | - |
| NameNode | NameNode | Navigator Audit Server and Navigator Metadata Server | - | - |
| NodeManager (in clusters of eight nodes or less) | NodeManager (in clusters of eight nodes or less) | NodeManager | NodeManager | NodeManager |
| - | - | SparkHistoryServer | Oracle Data Integrator Agent | - |
| - | - | ResourceManager | ResourceManager | - |
| ZooKeeper | ZooKeeper | ZooKeeper | - | - |
| Big Data SQL (if enabled) | Big Data SQL (if enabled) | Big Data SQL (if enabled) | Big Data SQL (if enabled) | Big Data SQL (if enabled) |
| Kerberos KDC (if MIT Kerberos is enabled and on-BDA KDCs are being used) | Kerberos KDC (if MIT Kerberos is enabled and on-BDA KDCs are being used) | JobHistory | - | - |
| Sentry Server (if enabled) | Sentry Server (if enabled) | - | - | - |
| Hive Metastore | - | - | Hive Metastore | - |
| Active Navigator Key Trustee Server (if HDFS Transparent Encryption is enabled) | Passive Navigator Key Trustee Server (if HDFS Transparent Encryption is enabled) | - | - | - |
| - | HttpFS | - | - | - |
| Hue Server | - | - | Hue Server | - |
| Hue Load Balancer | - | - | Hue Load Balancer | - |

Let me talk about the High Availability implementation of some of these services. This layout may change in the future; you can check for updates here.

3.2 Service with configured High Availability by default

As of today (November 2018) we support high availability features for certain Hadoop components:

1) Name Node

2) YARN

3) Kerberos Key Distribution Center

4) Sentry

5) Hive Metastore Service

6) HUE

3.2.1 Name Node High Availability

As you may know, Oracle's solutions are based on the Cloudera Hadoop distribution. Here you can find a detailed explanation of how HDFS high availability is achieved; the good news is that all those configuration steps are done for you on BDA and BDCS, so you simply have it by default. Let me show a small demo of NameNode high availability. First, let's check the list of nodes running this service:

[root@bdax72bur09node01 ~]# hdfs getconf -namenodes

bdax72bur09node01.us.oracle.com bdax72bur09node02.us.oracle.com
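
Besides Cloudera Manager, you can also query the HA state from the command line with `hdfs haadmin`. A small sketch: the `active_namenode` function is my own helper, not a BDA tool; `HDFS_CMD` is an indirection so the command can be substituted, and the NameNode IDs come from the `dfs.ha.namenodes.<nameservice>` property.

```shell
# Print the ID of the currently active NameNode for a given nameservice.
# Sketch; assumes an HA-enabled HDFS client configuration on this node.
HDFS_CMD=${HDFS_CMD:-hdfs}

active_namenode() {
  local ns=$1
  # dfs.ha.namenodes.<ns> holds a comma-separated list of NameNode IDs
  for id in $($HDFS_CMD getconf -confKey "dfs.ha.namenodes.$ns" | tr ',' ' '); do
    if [ "$($HDFS_CMD haadmin -getServiceState "$id" 2>/dev/null)" = "active" ]; then
      echo "$id"
      return 0
    fi
  done
  return 1
}

# e.g. active_namenode gcs-lab-bdax72-ns
```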

The easiest way to determine which node is active is to go to Cloudera Manager -> HDFS -> Instances:
In my case, node bdax72bur09node02 is active. I'll run an HDFS list command in a loop, reboot the active NameNode, and watch how the system behaves:

[root@bdax72bur09node01 ~]# for i in {1..100}; do hadoop fs -ls hdfs://gcs-lab-bdax72-ns | tail -1; done

drwxr-xr-x   - root root          0 2018-09-11 17:21 hdfs://gcs-lab-bdax72-ns/user/root/benchmarks

drwxr-xr-x   - root root          0 2018-09-11 17:21 hdfs://gcs-lab-bdax72-ns/user/root/benchmarks

drwxr-xr-x   - root root          0 2018-09-11 17:21 hdfs://gcs-lab-bdax72-ns/user/root/benchmarks

drwxr-xr-x   - root root          0 2018-09-11 17:21 hdfs://gcs-lab-bdax72-ns/user/root/benchmarks

18/11/01 19:53:53 INFO retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB over bdax72bur09node02.us.oracle.com/192.168.8.171:8020 after 1 fail over attempts. Trying to fail over immediately.

...

18/11/01 19:54:16 INFO retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB over bdax72bur09node02.us.oracle.com/192.168.8.171:8020 after 5 fail over attempts. Trying to fail over after sleeping for 11022ms.

java.net.ConnectException: Call From bdax72bur09node01.us.oracle.com/192.168.8.170 to bdax72bur09node02.us.oracle.com:8020 failed on connection exception: java.net.ConnectException: Connection timed out; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)

    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)

    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)

    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)

    at org.apache.hadoop.ipc.Client.call(Client.java:1508)

    at org.apache.hadoop.ipc.Client.call(Client.java:1441)

    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)

    at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)

    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:786)

    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

    at java.lang.reflect.Method.invoke(Method.java:498)

    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:258)

    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)

    at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)

    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2167)

    at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1265)

    at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1261)

    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)

    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1261)

    at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:64)

    at org.apache.hadoop.fs.Globber.doGlob(Globber.java:272)

    at org.apache.hadoop.fs.Globber.glob(Globber.java:151)

    at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1715)

    at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:326)

    at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:235)

    at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:218)

    at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:102)

    at org.apache.hadoop.fs.shell.Command.run(Command.java:165)

    at org.apache.hadoop.fs.FsShell.run(FsShell.java:315)

    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)

    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)

    at org.apache.hadoop.fs.FsShell.main(FsShell.java:372)

Caused by: java.net.ConnectException: Connection timed out

...

 

drwxr-xr-x   - root root          0 2018-09-11 17:21 hdfs://gcs-lab-bdax72-ns/user/root/benchmarks

drwxr-xr-x   - root root          0 2018-09-11 17:21 hdfs://gcs-lab-bdax72-ns/user/root/benchmarks

So, as we can see, when one of the NameNodes became unavailable, the second one took over its responsibilities; the client saw only a brief pause while requests failed over. In Cloudera Manager we can see that the NameNode service on node02 is not available:

But despite this, users can keep working with the cluster without an outage or any extra actions.

3.2.2 YARN High Availability

YARN is another key Hadoop component, and it is also highly available by default in the Oracle solution. Cloudera requires some configuration for this, but on BDA and BDCS all these steps are done as part of the service deployment. Let's do the same test with the YARN ResourceManager. In Cloudera Manager we find the nodes that run the ResourceManager service and reboot the active one (to reproduce a hardware failure):
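
As with HDFS, the active ResourceManager can be identified from the command line via `yarn rmadmin`. A sketch (my own helper, with `YARN_CMD` as a substitutable command; RM IDs such as rm15/rm16 come from the `yarn.resourcemanager.ha.rm-ids` property):

```shell
# Print the ID of the currently active ResourceManager.
# Sketch; pass the RM IDs configured in yarn.resourcemanager.ha.rm-ids.
YARN_CMD=${YARN_CMD:-yarn}

active_rm() {
  for id in "$@"; do
    if [ "$($YARN_CMD rmadmin -getServiceState "$id" 2>/dev/null)" = "active" ]; then
      echo "$id"
      return 0
    fi
  done
  return 1
}

# e.g. active_rm rm15 rm16
```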

I'll run some MapReduce code and restart the bdax72bur09node04 node (which hosts the active ResourceManager).

[root@bdax72bur09node01 hadoop-mapreduce]# hadoop jar hadoop-mapreduce-examples.jar pi 1 1

Number of Maps  = 1

Samples per Map = 1

Wrote input for Map #0

Starting Job

18/11/01 20:08:03 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm16

18/11/01 20:08:03 INFO input.FileInputFormat: Total input paths to process : 1

18/11/01 20:08:04 INFO mapreduce.JobSubmitter: number of splits:1

18/11/01 20:08:04 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1541115989562_0002

18/11/01 20:08:04 INFO impl.YarnClientImpl: Submitted application application_1541115989562_0002

18/11/01 20:08:04 INFO mapreduce.Job: The url to track the job: http://bdax72bur09node04.us.oracle.com:8088/proxy/application_1541115989562_0002/

18/11/01 20:08:04 INFO mapreduce.Job: Running job: job_1541115989562_0002

18/11/01 20:08:07 INFO retry.RetryInvocationHandler: Exception while invoking getApplicationReport of class ApplicationClientProtocolPBClientImpl over rm16. Trying to fail over immediately.

java.io.EOFException: End of File Exception between local host is: "bdax72bur09node01.us.oracle.com/192.168.8.170"; destination host is: "bdax72bur09node04.us.oracle.com":8032; : java.io.EOFException; For more details see:  http://wiki.apache.org/hadoop/EOFException

    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)

    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)

    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)

    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)

    at org.apache.hadoop.ipc.Client.call(Client.java:1508)

    at org.apache.hadoop.ipc.Client.call(Client.java:1441)

    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)

    at com.sun.proxy.$Proxy13.getApplicationReport(Unknown Source)

    at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:187)

    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

    at java.lang.reflect.Method.invoke(Method.java:498)

    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:258)

    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)

    at com.sun.proxy.$Proxy14.getApplicationReport(Unknown Source)

    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:408)

    at org.apache.hadoop.mapred.ResourceMgrDelegate.getApplicationReport(ResourceMgrDelegate.java:302)

    at org.apache.hadoop.mapred.ClientServiceDelegate.getProxy(ClientServiceDelegate.java:154)

    at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:323)

    at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:423)

    at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:698)

    at org.apache.hadoop.mapreduce.Job$1.run(Job.java:326)

    at org.apache.hadoop.mapreduce.Job$1.run(Job.java:323)

    at java.security.AccessController.doPrivileged(Native Method)

    at javax.security.auth.Subject.doAs(Subject.java:422)

    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)

    at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:323)

    at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:621)

    at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1366)

    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1328)

    at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:306)

    at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)

    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)

    at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)

    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

    at java.lang.reflect.Method.invoke(Method.java:498)

    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)

    at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)

    at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)

    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

    at java.lang.reflect.Method.invoke(Method.java:498)

    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)

    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Caused by: java.io.EOFException

    at java.io.DataInputStream.readInt(DataInputStream.java:392)

    at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1113)

    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1006)

18/11/01 20:08:07 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm15

18/11/01 20:08:09 INFO mapreduce.Job: Job job_1541115989562_0002 running in uber mode : false

18/11/01 20:08:09 INFO mapreduce.Job:  map 0% reduce 0%

18/11/01 20:08:23 INFO mapreduce.Job:  map 100% reduce 0%

18/11/01 20:08:29 INFO mapreduce.Job:  map 100% reduce 100%

18/11/01 20:08:29 INFO mapreduce.Job: Job job_1541115989562_0002 completed successfully

Well, in the logs we can clearly see the client failing over to the second ResourceManager. In Cloudera Manager we can see that node03 took over the active role:

So, even losing an entire node that hosts a ResourceManager does not stop users from submitting their jobs.

3.2.3 Kerberos Key Distribution Center (KDC)

In fact, the majority of production Hadoop clusters run in secure mode, which means Kerberized clusters, and the Kerberos Key Distribution Center is the key component for that. The good news: when you install Kerberos on BDA or BDCS, you automatically get a standby KDC on your BDA/BDCS.

3.2.4 Sentry High Availability

If Kerberos is the authentication method (defining who you are), users quite frequently want an authorization tool to pair with it. For Cloudera, the de facto default tool is Sentry. Since the BDA 4.12 software release, we support Sentry High Availability out of the box. Cloudera has detailed documentation explaining how it works.

3.2.5 Hive Metastore Service High Availability

When we talk about Hive, it's very important to keep in mind that it consists of many components, which is easy to see in Cloudera Manager:

And whenever you work with Hive tables, the request passes through many logical layers:

To keep it simple, let's consider one case: using beeline to query Hive tables. For that, we need HiveServer2, the Hive Metastore Service, and the Metastore backend RDBMS to all be available. Let's connect and make sure that data is available:

0: jdbc:hive2://bdax72bur09node04.us.oracle.c (closed)> !connect jdbc:hive2://bdax72bur09node04.us.oracle.com:10000/default;

Connecting to jdbc:hive2://bdax72bur09node04.us.oracle.com:10000/default;

Enter username for jdbc:hive2://bdax72bur09node04.us.oracle.com:10000/default;: 

Enter password for jdbc:hive2://bdax72bur09node04.us.oracle.com:10000/default;: 

Connected to: Apache Hive (version 1.1.0-cdh5.14.2)

Driver: Hive JDBC (version 1.1.0-cdh5.14.2)

Transaction isolation: TRANSACTION_REPEATABLE_READ

1: jdbc:hive2://bdax72bur09node04.us.oracle.c> show databases;

...

+----------------+--+

| database_name  |

+----------------+--+

| csv            |

| default        |

| parq           |

+----------------+--+

 

Now, let's shut down HiveServer2 and confirm that we can no longer connect to the database:

1: jdbc:hive2://bdax72bur09node04.us.oracle.c> !connect jdbc:hive2://bdax72bur09node04.us.oracle.com:10000/default;

Connecting to jdbc:hive2://bdax72bur09node04.us.oracle.com:10000/default;

Enter username for jdbc:hive2://bdax72bur09node04.us.oracle.com:10000/default;: 

Enter password for jdbc:hive2://bdax72bur09node04.us.oracle.com:10000/default;: 

Could not open connection to the HS2 server. Please check the server URI and if the URI is correct, then ask the administrator to check the server status.

Error: Could not open client transport with JDBC Uri: jdbc:hive2://bdax72bur09node04.us.oracle.com:10000/default;: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)

1: jdbc:hive2://bdax72bur09node04.us.oracle.c> 


As expected, the connection failed. To fix this, go to Cloudera Manager -> Hive -> Instances -> Add Role and add an extra HiveServer2 instance (here, on node05).
After this, we need to install a load balancer:

[root@bdax72bur09node06 ~]# yum -y install haproxy

Loaded plugins: langpacks

Resolving Dependencies

--> Running transaction check

---> Package haproxy.x86_64 0:1.5.18-7.el7 will be installed

--> Finished Dependency Resolution

 

Dependencies Resolved

 

==========================================================================================================================================================================================================

 Package                                        Arch                                          Version                                             Repository                                         Size

==========================================================================================================================================================================================================

Installing:

 haproxy                                        x86_64                                        1.5.18-7.el7                                        ol7_latest                                        833 k

 

Transaction Summary

==========================================================================================================================================================================================================

Install  1 Package

 

Total download size: 833 k

Installed size: 2.6 M

Downloading packages:

haproxy-1.5.18-7.el7.x86_64.rpm                                                                                                                                                    | 833 kB  00:00:01     

Running transaction check

Running transaction test

Transaction test succeeded

Running transaction

  Installing : haproxy-1.5.18-7.el7.x86_64                                                                                                                                                            1/1 

  Verifying  : haproxy-1.5.18-7.el7.x86_64                                                                                                                                                            1/1 

 

Installed:

  haproxy.x86_64 0:1.5.18-7.el7                                                                                                                                                                           

 

Complete!

Now we need to configure haproxy. Open the configuration file:

[root@bdax72bur09node06 ~]# vi /etc/haproxy/haproxy.cfg

Here is an example of my haproxy.cfg:

global

    log         127.0.0.1 local2

 

    chroot      /var/lib/haproxy

    pidfile     /var/run/haproxy.pid

    maxconn     4000

    user        haproxy

    group       haproxy

    daemon

 

    # turn on stats unix socket

    stats socket /var/lib/haproxy/stats

 

#---------------------------------------------------------------------

# common defaults that all the 'listen' and 'backend' sections will

# use if not designated in their block

#---------------------------------------------------------------------

defaults

    mode                    http

    log                     global

    option                  httplog

    option                  dontlognull

    option http-server-close

    option forwardfor       except 127.0.0.0/8

    option                  redispatch

    retries                 3

    timeout http-request    10s

    timeout queue           1m

    timeout connect         10s

    timeout client          1m

    timeout server          1m

    timeout http-keep-alive 10s

    timeout check           10s

    maxconn                 3000

 

#---------------------------------------------------------------------

# main frontend which proxys to the backends

#---------------------------------------------------------------------

frontend  main *:5000

    acl url_static       path_beg       -i /static /images /javascript /stylesheets

    acl url_static       path_end       -i .jpg .gif .png .css .js

    use_backend static          if url_static

#---------------------------------------------------------------------

# static backend for serving up images, stylesheets and such

#---------------------------------------------------------------------

backend static

    balance     roundrobin

    server      static 127.0.0.1:4331 check

 

#---------------------------------------------------------------------

# round robin balancing between the various backends

#---------------------------------------------------------------------

listen hiveserver2 :10005

    mode tcp

    option tcplog

    balance source

    server hiveserver2_1 bdax72bur09node04.us.oracle.com:10000 check

    server hiveserver2_2 bdax72bur09node05.us.oracle.com:10000 check
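
Before pointing clients at the balancer, it's worth verifying that each HiveServer2 backend is actually reachable on port 10000. A quick probe using bash's built-in /dev/tcp (my own sketch, not part of the BDA tooling):

```shell
# probe_hs2 HOST [PORT]: report whether a TCP connection to the given
# HiveServer2 endpoint can be opened (sketch; default port 10000).
probe_hs2() {
  local host=$1 port=${2:-10000}
  if timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "$host:$port up"
  else
    echo "$host:$port down"
  fi
}

# e.g. probe_hs2 bdax72bur09node04.us.oracle.com
#      probe_hs2 bdax72bur09node05.us.oracle.com
```

The same probe works for any TCP backend you put behind haproxy, including the Impala setup shown later.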

 
Then go to Cloudera Manager and set the balancer hostname/port (matching the haproxy configuration from the previous step).
After all these changes are done, try to connect again:

beeline> !connect jdbc:hive2://bdax72bur09node06.us.oracle.com:10005/default;

...

INFO  : OK

+----------------+--+

| database_name  |

+----------------+--+

| csv            |

| default        |

| parq           |

+----------------+--+

3 rows selected (2.08 seconds)

Great, it works! Now try shutting down one of the HiveServer2 instances:

0: jdbc:hive2://bdax72bur09node06.us.oracle.c> !connect jdbc:hive2://bdax72bur09node06.us.oracle.com:10005/default;

...

1: jdbc:hive2://bdax72bur09node06.us.oracle.c> show databases;

...

+----------------+--+

| database_name  |

+----------------+--+

| csv            |

| default        |

| parq           |

+----------------+--+

It still works!

Now let's move on and look at what we have for Hive Metastore Service high availability. The really great news: it is enabled by default on BDA and BDCS:

To show this, I'll shut down the Metastore instances one at a time and check whether a beeline connection still works.

Shut down the service on node01 and try to connect/query through beeline:

1: jdbc:hive2://bdax72bur09node06.us.oracle.c> !connect jdbc:hive2://bdax72bur09node06.us.oracle.com:10005/default;

...

INFO  : OK

+----------------+--+

| database_name  |

+----------------+--+

| csv            |

| default        |

| parq           |

+----------------+--+

Works. Now I'll start the service on node01 back up and shut it down on node04:

1: jdbc:hive2://bdax72bur09node06.us.oracle.c> !connect jdbc:hive2://bdax72bur09node06.us.oracle.com:10005/default;

...

INFO  : OK

+----------------+--+

| database_name  |

+----------------+--+

| csv            |

| default        |

| parq           |

+----------------+--+

It works again! So, we are safe with the Hive Metastore Service.

BDA and BDCS use MySQL as the database layer. As of today there is no true High Availability for the MySQL database, so we use Master-Slave replication (in the future we hope to have full HA for MySQL), which allows us to switch to the Slave if the Master fails. Today, you need to perform node migration if the node hosting the Master (node03 by default) fails; I'll explain this later in this blog.

To find out where the MySQL Master is, run:

[root@bdax72bur09node01 tmp]#  json-select --jpx=MYSQL_NODE /opt/oracle/bda/install/state/config.json

bdax72bur09node03

To find the Slave, run:

[root@bdax72bur09node01 tmp]# json-select --jpx=MYSQL_BACKUP_NODE /opt/oracle/bda/install/state/config.json

bdax72bur09node02
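
Since the Slave is your safety net, it's worth checking that replication is actually running. The helper below is a hypothetical sketch that parses the output of MySQL's `SHOW SLAVE STATUS\G` fed on stdin:

```shell
# check_replication: read `SHOW SLAVE STATUS\G` output on stdin and report
# whether both replication threads are running. Hypothetical helper.
check_replication() {
  awk -F': ' '
    /Slave_IO_Running:/  { io  = $2 }   # "Slave_IO_Running: Yes"
    /Slave_SQL_Running:/ { sql = $2 }   # "Slave_SQL_Running: Yes"
    END {
      if (io == "Yes" && sql == "Yes") print "replication healthy"
      else                             print "replication broken"
    }'
}

# Usage on the slave node (credentials are an assumption):
#   mysql -u root -p -e 'SHOW SLAVE STATUS\G' | check_replication
```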

 
3.2.6 HUE High Availability

Hue is quite a popular tool for working with Hadoop data. It's also possible to run Hue in HA mode; Cloudera explains it here, but on BDA and BDCS you have it out of the box since the 4.12 software version. By default you have a Hue Server and Hue Load Balancer on node01 and node04:

If Node01 or Node04 becomes unavailable, users can easily keep using Hue without any extra actions, simply by switching to the other balancer's URL.

3.3 Migrate Critical Nodes

One of the greatest features of Big Data Appliance is the capability to migrate all roles of critical services. Some nodes host many critical services, like node03 (Cloudera Manager, ResourceManager, MySQL store...). Fortunately, BDA has a simple way to migrate all roles from a critical node to a non-critical one. You can find all the details in MOS (Node Migration on Oracle Big Data Appliance V4.0 OL6 Hadoop Cluster to Manage a Hardware Failure (Doc ID 1946210.1)).

Let's consider a case where we lose (because of a hardware failure, for example) one of the critical servers: node03, which hosts the active MySQL RDBMS and Cloudera Manager. To fix this, we need to migrate all of this node's roles to another server. To migrate all roles from node03, just run:

[root@bdax72bur09node01 ~]# bdacli admin_cluster migrate bdax72bur09node03

All the details are in the MOS note, but briefly:

1) There are two major types of migration:

- Migration of critical nodes

- Reprovisioning of non-critical nodes

2) When you migrate a critical node, you cannot choose which non-critical node the services move to (Mammoth does this for you; generally it will be the first available non-critical node)

3) After the failed server comes back to the cluster (or a new one is added), you should reprovision it as a non-critical node

4) You don't need to switch services back; just leave them as they are

After the migration is done, the new node takes over all roles from the failing one. In my example, I migrated a critical node that hosted the active MySQL RDBMS and Cloudera Manager. To check where the active RDBMS now lives, run:

[root@bdax72bur09node01 tmp]#  json-select --jpx=MYSQL_NODE /opt/oracle/bda/install/state/config.json

bdax72bur09node05

Note: to find the slave RDBMS, run:

[root@bdax72bur09node01 tmp]# json-select --jpx=MYSQL_BACKUP_NODE /opt/oracle/bda/install/state/config.json

bdax72bur09node02

and Cloudera Manager runs on node05:

The ResourceManager was also migrated to node05:

The migration process decommissions the failed node.

After the failed node comes back to the cluster, we need to reprovision it (deploy the non-critical services on it). In other words, we need to recommission the node.
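The reprovisioning step is again a single bdacli call. A sketch, using the example cluster's node name; consult the MOS note above for the exact syntax on your release:

```shell
# Reprovision the repaired server as a non-critical node
# (subcommand per Doc ID 1946210.1; node name is from this example cluster)
bdacli admin_cluster reprovision bdax72bur09node03
```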

3.4 Redundant Services

There are certain Hadoop services that are configured redundantly on BDA. You don't have to worry about high availability for these services:

- Data Node. By default, HDFS is configured with 3-times redundancy. If you lose one node, you still have two more copies of each block.

- Journal Node. By default, you have 3 instances of the JournalNode configured. Missing one is not a big deal.

- Zookeeper. By default, you have 3 instances of ZooKeeper configured. Missing one is not a big deal.
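A quick way to confirm this redundancy from a cluster node. This is only a sketch: it assumes the Hadoop client and nc are available, and the host names are the example cluster's:

```shell
# HDFS replication factor - on BDA this should report 3
hdfs getconf -confKey dfs.replication

# ZooKeeper health - a healthy server answers "imok" to the "ruok" probe
for zk in bdax72bur09node01 bdax72bur09node02 bdax72bur09node03; do
  echo ruok | nc -w 2 "$zk" 2181
done
```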

4. Services with no High Availability configured by default

There are certain services on BDA that don't have a High Availability configuration by default:

- Oozie. If you need High Availability for Oozie, check Cloudera's documentation

- Cloudera Manager. It's also possible to configure Cloudera Manager for High Availability, as explained here, but I'd recommend using node migration, as I showed above

- Impala. By default, neither BDA nor BDCS configures Impala for High Availability (yet), but it's quite important. You can find all the details here, but briefly, to configure HA for Impala you need to:

a. Configure haproxy (I extended the existing haproxy config, done for HiveServer2) by adding:

listen impala :25003
    mode tcp
    option tcplog
    balance leastconn

    server symbolic_name_1 bdax72bur09node01.us.oracle.com:21000 check
    server symbolic_name_2 bdax72bur09node02.us.oracle.com:21000 check
    server symbolic_name_3 bdax72bur09node03.us.oracle.com:21000 check
    server symbolic_name_4 bdax72bur09node04.us.oracle.com:21000 check
    server symbolic_name_5 bdax72bur09node05.us.oracle.com:21000 check
    server symbolic_name_6 bdax72bur09node06.us.oracle.com:21000 check
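Before reloading haproxy, it's worth validating the edited configuration. A sketch, assuming haproxy's typical config path and init-script service name on this host:

```shell
# Validate the haproxy configuration file, then restart the service
# (path and service name are the typical defaults; adjust for your host)
haproxy -c -f /etc/haproxy/haproxy.cfg && service haproxy restart
```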

b. Go to Cloudera Manager -> Impala -> Configuration, search for "Impala Daemons Load Balancer", and add the haproxy host:port there:

c. Log in to Impala, using the haproxy host:port:

[root@bdax72bur09node01 bin]# impala-shell -i bdax72bur09node06:25003

...

Connected to bdax72bur09node06:25003

...

[bdax72bur09node06:25003] >

Speaking of Impala, it's worth mentioning two more services: the Impala Catalog Service and the Impala StateStore. These are not mission-critical services. From Cloudera's documentation:

The Impala component known as the statestore checks on the health of Impala daemons on all the DataNodes in a cluster, and continuously relays its findings to each of those daemons. 

and

The Impala component known as the catalog service relays the metadata changes from Impala SQL statements to all the Impala daemons in a cluster. ...

Because the requests are passed through the statestore daemon, it makes sense to run the statestored and catalogd services on the same host.

I'll run a quick test:

- I disabled the Impala daemon on node01, along with the StateStore and the Catalog Service

- Then I connected to the load balancer and ran a query:

[root@bdax72bur09node01 ~]# impala-shell -i bdax72bur09node06:25003

....

[bdax72bur09node06:25003] > select count(1) from test_table;

...

+------------+

| count(1)   |

+------------+

| 6659433869 |

+------------+

Fetched 1 row(s) in 1.76s

[bdax72bur09node06:25003] >

So, as we can see, Impala can keep working even without the StateStore and the Catalog Service (the surviving daemons continue serving queries from their cached metadata).
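To see at a glance which of these components are up, you can probe their built-in debug web UIs. This is a sketch assuming Cloudera's default web UI ports (impalad 25000, statestored 25010, catalogd 25020) and that all three run on the example host; adjust hosts and ports for your cluster:

```shell
# Probe the Impala debug web UI ports; an HTTP 200 means that
# component's web server is answering (ports are Cloudera defaults)
for port in 25000 25010 25020; do
  curl -s -o /dev/null -w "port ${port}: %{http_code}\n" \
    "http://bdax72bur09node01.us.oracle.com:${port}/"
done
```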

Appendix A.

Although BDA has multiple High Availability features, it's always useful to make a backup before any significant operation, such as an upgrade. For detailed information, please follow the My Oracle Support (MOS) note:

How to Backup Critical Metadata on Oracle Big Data Appliance Prior to Upgrade V2.3.1 and Higher Releases (Doc ID 1623304.1)    

 
