
Information, tips, tricks and sample code for Big Data Warehousing in an autonomous, cloud-driven world

Big Data SQL Quick Start. Kerberos - Part 26

Alexey Filanovskiy
Product Manager

In the Hadoop world, Kerberos is the de facto standard for securing a cluster, so it goes without saying that Big Data SQL supports Kerberos. Oracle has good documentation on how to install Big Data SQL over a Kerberized cluster, but today I'd like to show a couple of typical steps for testing and debugging a Kerberized installation. First of all, a few words about the test environment: it has 4 nodes, 3 for the Hadoop cluster (vm0[1-3]) and one for the database (vm04). Kerberos tickets should be initialized from a keytab file, which must be present on the database side (on each database node in the case of RAC) and on each Hadoop node.

Let's check that we have a valid Kerberos ticket on the database node:

[oracle@vm04 ~]$ id

uid=500(oracle) gid=500(oinstall) groups=500(oinstall),501(dba)

[oracle@scaj0602bda09vm04 ~]$ klist 

Ticket cache: FILE:/tmp/krb5cc_500

Default principal: oracle/martybda@MARTYBDA.ORACLE.COM

 

Valid starting     Expires            Service principal

07/23/18 01:15:58  07/24/18 01:15:58  krbtgt/MARTYBDA.ORACLE.COM@MARTYBDA.ORACLE.COM

    renew until 07/30/18 01:15:01
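
Rather than eyeballing the klist output, you can pull the expiry out programmatically. A minimal sketch (the helper is mine, not part of Big Data SQL; the awk field positions assume the standard "Valid starting / Expires / Service principal" layout shown above):

```shell
# Small helper: extract the TGT expiry timestamp from klist output.
# Matches the krbtgt line and prints its "Expires" date and time fields.
tgt_expiry() {
  awk '/krbtgt\// { print $3, $4 }'
}

# Typical use on the database node:
#   klist | tgt_expiry
```

This is handy in monitoring scripts that warn before the ticket expires.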

Let's check that we have access to HDFS from the database host:

[oracle@vm04 ~]$ cd $ORACLE_HOME/bigdatasql

[oracle@vm04 bigdatasql]$ ls -l|grep hadoop*env

-rw-r--r-- 1 oracle oinstall 2249 Jul 12 15:41 hadoop_martybda.env

[oracle@vm04 bigdatasql]$ source hadoop_martybda.env 

[oracle@vm04 bigdatasql]$ hadoop fs -ls

...

Found 4 items

drwx------   - oracle hadoop          0 2018-07-13 06:00 .Trash

drwxr-xr-x   - oracle hadoop          0 2018-07-12 05:10 .sparkStaging

drwx------   - oracle hadoop          0 2018-07-12 05:17 .staging

drwxr-xr-x   - oracle hadoop          0 2018-07-12 05:14 oozie-oozi

[oracle@vm04 bigdatasql]$ 

Everything seems OK. Let's do the same from a Hadoop node:

[root@vm01 ~]# su - oracle

[oracle@scaj0602bda09vm01 ~]$ id

uid=1000(oracle) gid=1001(oinstall) groups=1001(oinstall),127(hive),1002(dba)

[oracle@vm01 ~]$ klist 

Ticket cache: FILE:/tmp/krb5cc_1000

Default principal: oracle/martybda@MARTYBDA.ORACLE.COM

 

Valid starting     Expires            Service principal

07/23/18 01:15:02  07/24/18 01:15:02  krbtgt/MARTYBDA.ORACLE.COM@MARTYBDA.ORACLE.COM

    renew until 07/30/18 01:15:02

Let's check that we have access to the environment, and also create a test file on HDFS:

[oracle@vm01 ~]$ echo "line1" >> test.txt

[oracle@vm01 ~]$ echo "line2" >> test.txt

[oracle@vm01 ~]$ hadoop fs -mkdir /tmp/test_bds

[oracle@vm01 ~]$ hadoop fs -put test.txt /tmp/test_bds

 
Now let's jump to the database node and create an external table over this file:

[root@vm04 bin]# su - oracle

[oracle@vm04 ~]$ . oraenv <<< orcl

ORACLE_SID = [oracle] ? The Oracle base has been set to /u03/app/oracle

[oracle@vm04 ~]$ sqlplus / as sysdba

 

SQL*Plus: Release 12.1.0.2.0 Production on Mon Jul 23 06:39:06 2018

 

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

 

 

Connected to:

Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

 

SQL> alter session set container=PDBORCL;

 

Session altered.

 

SQL> CREATE TABLE bds_test (line VARCHAR2(4000))

  ORGANIZATION EXTERNAL

  (  TYPE ORACLE_HDFS

     DEFAULT DIRECTORY DEFAULT_DIR

     LOCATION ('/tmp/test_bds')

  )

  REJECT LIMIT UNLIMITED;

 

Table created.

 

SQL> 

And sure enough, here is the two-row file we created in the previous step:

SQL> select * from bds_test;

 

LINE

------------------------------------

line1

line2

Now let's go through some typical Kerberos failure cases and talk about how to diagnose them.
 
Kerberos ticket missed on the database side
Let's simulate the case when the Kerberos ticket is missing on the database side. That's easy to do with the kdestroy command:

[oracle@vm04 ~]$ kdestroy 

[oracle@vm04 ~]$ klist 

klist: No credentials cache found (ticket cache FILE:/tmp/krb5cc_500)

The extproc caches the Kerberos ticket, so to make our change take effect you will need to restart the extproc. First, we need to find out its name:

[oracle@vm04 admin]$ cd $ORACLE_HOME/hs/admin

[oracle@vm04 admin]$ ls -l

total 24

-rw-r--r-- 1 oracle oinstall 1170 Mar 27 01:04 extproc.ora

-rw-r----- 1 oracle oinstall 3112 Jul 12 15:56 initagt.dat

-rw-r--r-- 1 oracle oinstall  190 Jul 12 15:41 initbds_orcl_martybda.ora

-rw-r--r-- 1 oracle oinstall  489 Mar 27 01:04 initdg4odbc.ora

-rw-r--r-- 1 oracle oinstall  406 Jul 12 15:11 listener.ora.sample

-rw-r--r-- 1 oracle oinstall  244 Jul 12 15:11 tnsnames.ora.sample

The name consists of the database SID and the Hadoop cluster name, so our extproc name is bds_orcl_martybda. Let's stop and start it:

[oracle@vm04 admin]$ mtactl stop bds_orcl_martybda

 

ORACLE_HOME = "/u03/app/oracle/12.1.0/dbhome_orcl"

MTA init file = "/u03/app/oracle/12.1.0/dbhome_orcl/hs/admin/initbds_orcl_martybda.ora"

 

oracle 16776 1 0 Jul12 ? 00:49:25 extprocbds_orcl_martybda -mt

Stopping MTA process "extprocbds_orcl_martybda -mt"...

 

MTA process "extprocbds_orcl_martybda -mt" stopped!

 

[oracle@vm04 admin]$ mtactl start bds_orcl_martybda

 

ORACLE_HOME = "/u03/app/oracle/12.1.0/dbhome_orcl"

MTA init file = "/u03/app/oracle/12.1.0/dbhome_orcl/hs/admin/initbds_orcl_martybda.ora"

 

MTA process "extprocbds_orcl_martybda -mt" is not running!

 

Checking MTA init parameters...

 

[O]  INIT_LIBRARY=$ORACLE_HOME/lib/libkubsagt12.so

[O]  INIT_FUNCTION=kubsagtMTAInit

[O]  BDSQL_CLUSTER=martybda

[O]  BDSQL_CONFIGDIR=/u03/app/oracle/12.1.0/dbhome_orcl/bigdatasql/databases/orcl/bigdata_config

 

MTA process "extprocbds_orcl_martybda -mt" started!

oracle 19498 1 4 06:58 ? 00:00:00 extprocbds_orcl_martybda -mt

Now that the Kerberos ticket cache has been cleared, let's try to query the HDFS data:

[oracle@vm04 admin]$ sqlplus / as sysdba

 

SQL*Plus: Release 12.1.0.2.0 Production on Mon Jul 23 07:00:26 2018

 

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

 

 

Connected to:

Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

 

SQL> alter session set container=PDBORCL;

 

Session altered.

 

SQL> select * from bds_test;

select * from bds_test

*

ERROR at line 1:

ORA-29913: error in executing ODCIEXTTABLEOPEN callout

ORA-29400: data cartridge error

KUP-11504: error from external driver: java.lang.Exception: Error initializing JXADProvider: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by

GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "m04.vm.oracle.com/192.168.254.5"; destination host is:

"vm02.vm.oracle.com":8020;

Remember this error: if you see it, it means that you don't have a valid Kerberos ticket on the database side. Let's bring everything back and make sure that our environment works properly again.

[oracle@vm04 admin]$ crontab -l

15 1,7,13,19 * * * /bin/su - oracle -c "/usr/bin/kinit oracle/martybda@MARTYBDA.ORACLE.COM -k -t /u03/app/oracle/12.1.0/dbhome_orcl/bigdatasql/clusters/martybda/oracle.keytab"

[oracle@vm04 admin]$ /usr/bin/kinit oracle/martybda@MARTYBDA.ORACLE.COM -k -t /u03/app/oracle/12.1.0/dbhome_orcl/bigdatasql/clusters/martybda/oracle.keytab

[oracle@vm04 admin]$ klist 

Ticket cache: FILE:/tmp/krb5cc_500

Default principal: oracle/martybda@MARTYBDA.ORACLE.COM

 

Valid starting     Expires            Service principal

07/23/18 07:03:46  07/24/18 07:03:46  krbtgt/MARTYBDA.ORACLE.COM@MARTYBDA.ORACLE.COM

    renew until 07/30/18 07:03:46

[oracle@vm04 admin]$ mtactl stop bds_orcl_martybda

 

...

 

[oracle@vm04 admin]$ mtactl start bds_orcl_martybda

 

...

 

[oracle@scaj0602bda09vm04 admin]$ sqlplus / as sysdba

 

...

 

SQL> alter session set container=PDBORCL;

 

Session altered.

 

SQL> select * from bds_test;

 

LINE

----------------------------------------

line1

line2

 

SQL> 

 
Kerberos ticket missed on the Hadoop side
Another case is when the Kerberos ticket is missing on the Hadoop side (for the oracle user). Let's take a look at what happens in that case. Again, I'll use the kdestroy command, this time on each Hadoop node:

[oracle@vm01 ~]$ id

uid=1000(oracle) gid=1001(oinstall) groups=1001(oinstall),127(hive),1002(dba)

[oracle@vm01 ~]$ kdestroy

After performing these steps on all Hadoop nodes, let's go back to the database side and run the query again:

[oracle@vm04 bigdata_config]$ sqlplus / as sysdba

 

...

 

 

SQL> alter session set container=PDBORCL;

 

Session altered.

 

SQL> select * from bds_test;

 

LINE

----------------------------------------

line1

line2

 

SQL> 

At first glance everything looks OK, but let's take a look at the execution statistics:

 

SQL> select n.name, s.value /* , s.inst_id, s.sid */ from v$statname n, gv$mystat s where n.name like '%XT%' and s.statistic# = n.statistic#;

 

 

NAME                                      VALUE

---------------------------------------------------------------- ----------

cell XT granules requested for predicate offload                1

cell XT granule bytes requested for predicate offload           12

cell interconnect bytes returned by XT smart scan               8192

cell XT granule predicate offload retries                       3

cell XT granule IO bytes saved by storage index                 0

cell XT granule IO bytes saved by HDFS tbs extent map scan      0

We see that "cell XT granule predicate offload retries" is not equal to 0, which means that all the real processing happened on the database side. If you query a 10TB table on HDFS, you will bring all 10TB back over the wire and process it on the database side. Not good. So if the Kerberos ticket is missing on the Hadoop side, the query will finish, but Smart Scan will not work.
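
To guard against this silent fallback, you can script a check over the output of the statistics query. A minimal sketch (the helper is mine, not part of Big Data SQL; the statistic name is real, and the parsing assumes the SQL*Plus layout shown above):

```shell
# Sketch: scan SQL*Plus output of the XT statistics query and warn when
# "cell XT granule predicate offload retries" is nonzero, which indicates
# Smart Scan fell back to database-side processing.
check_offload_retries() {
  awk '/predicate offload retries/ { if ($NF + 0 > 0) print "WARN: SmartScan fallback, retries=" $NF }'
}
```

You could run this after a test query in a health-check script and alert on any output.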
 
Renewal of Kerberos tickets
One of the key pillars of Kerberos is that tickets have an expiration time and users have to renew them. During installation, Big Data SQL creates a crontab job that does this on the database side as well as on the Hadoop side. If it's missing for some reason, you can use this one as an example:

[oracle@vm04 ~]$ crontab -l

15 1,7,13,19 * * * /bin/su - oracle -c "/usr/bin/kinit oracle/martybda@MARTYBDA.ORACLE.COM -k -t /u03/app/oracle/12.1.0/dbhome_orcl/bigdatasql/clusters/martybda/oracle.keytab"

One note: you will always use the oracle principal for Big Data SQL, but if you want fine-grained control over access to HDFS, you have to use the Multi-User Authorization feature.
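
If you need to recreate the renewal logic by hand, a small wrapper like the one below only calls kinit when no valid ticket is cached (klist -s exits nonzero if the cache is missing or expired). This is a sketch, not the installer's script; the KLIST/KINIT overrides are mine, so the logic can be exercised without a KDC:

```shell
# Sketch of a ticket-renewal wrapper: kinit from the keytab only when no
# valid ticket is currently cached. KLIST/KINIT default to the real
# binaries but can be overridden for testing.
renew_ticket() {
  principal=$1
  keytab=$2
  if ${KLIST:-klist} -s 2>/dev/null; then
    echo "ticket still valid"
  else
    ${KINIT:-/usr/bin/kinit} "$principal" -k -t "$keytab"
  fi
}

# e.g. renew_ticket oracle/martybda@MARTYBDA.ORACLE.COM \
#        $ORACLE_HOME/bigdatasql/clusters/martybda/oracle.keytab
```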
 
Conclusion
1) Big Data SQL works over Kerberized clusters.
2) You must have valid Kerberos tickets on the database side as well as on the Hadoop side.
3) If the Kerberos ticket is missing on the database side, queries will fail.
4) If the Kerberos ticket is missing on the Hadoop side, queries will not fail, but they will run in fallback mode, moving all blocks over the wire to the database node and processing them there. You don't want that :)
