X

Information, tips, tricks and sample code for Big Data Warehousing in an autonomous, cloud-driven world

Big Data SQL Quick Start. Security - Part4.

Alexey Filanovskiy
Product Manager
Big Data SQL is the way to access data that stored in HDFS through Oracle RDBMS, using Oracle external table mechanism. In a context of security "table" is the key word, which means that you may apply standard security approaches to those tables. Today I want to give you couple examples with:

- Oracle Virtual Private Database (VPD)

and

- Oracle Data Redaction features. 

Oracle Data Redaction

I don't want to replace documentation of the Oracle Data Redaction within this blog, all available information you could find there, but just let me remind you the main idea of this feature. Very often databases contain sensitive data, like credit card number, SSN or other personal information. It could be useful to have this data in unchanged format for resolving different issue with billing department, but at the same time another department (like call center) may need only part of this information (like 4 last digits of the credit cards) and for security compliance you are not able to show them original data.

For  meet this requirement you may:

- Create tokenized clone of the data

- Use Data Redaction feature, which will transform data on the fly, whenever your unprivileged user query the data.

let me give you an example.

1) on HDFS I store sensitive data, which contain personal data (email)

$ hadoop fs -cat /user/root/benchmarks/bigbench/data/customer/customer_39.dat|tail -2
5320266|AAAAAAAAAAALQSFQ|336572|4638|2730942|87062|53717|Dr.|Denice|Gist|N|23|10|1948|ESTONIA|aaNfxigjuuod4|Denice.Gist@mailmaniacs.com|R
5320267|AAAAAAAAAAALQSFR|22724|6447|3284921|62068|25487|Sir|Robert|Roy|N|28|9|1926|ANGOLA|BJ89W9eug0z|Robert.Roy@moose-mail.com|q

2) In Oracle RDBMS I created the external table, that link on this HDFS directory (through hive table):

SQL> create table customer_csv (
c_customer_sk number(10,0),
c_customer_id char(16),
c_current_cdemo_sk number(10,0),
c_current_hdemo_sk number(10,0),
c_current_addr_sk number(10,0),
c_first_shipto_date_sk number(10,0),
c_first_sales_date_sk number(10,0),
c_salutation char(10),
c_first_name char(20),
c_last_name char(30),
c_preferred_cust_flag char(1),
c_birth_day number(10,0),
c_birth_month number(10,0),
c_birth_year number(10,0),
c_birth_country varchar(20),
c_login char(13),
c_email_address char(50),
c_last_review_date char(10)
)
organization external (TYPE ORACLE_HIVE
DEFAULT DIRECTORY DEFAULT_DIR
ACCESS PARAMETERS
(com.oracle.bigdata.cluster=cluster1
 com.oracle.bigdata.tablename=tpcds.customer
)
)
PARALLEL REJECT LIMIT UNLIMITED;

 3) Now let's try to query data from this table:

so, we are able to see all data about a user, including emails. Let's mask it. 

4) For this only one what I need is create policy and add the rule (fortunately, email masking is the pre-created function).

5) and after this try to query data again:

Emails are hidden. It's only one example how to use data redaction feature. All details you could find in the documentation, but it's important to understand that all Oracle Data Redaction features could be used over HDFS data and you could use all mature and proven feature of Oracle database for protecting your Hadoop data.

Virtual Private Database.

Again I'm not pretending to replace VPD documentation, all necessary details you could find there, I just repeat that this feature also available for an external table (which means over HDFS) as well as over permanent table. The main idea of VPD is to show different part of the table and different data set depend on user privileges. For example, if I have few departments and few roles (like manager/not manager) I can show different data from a table, that contains salary of the employees, depending on their role. If user Bob and Anne I'll show salaries of their directs. For Tim, who are not a manager I'll show only his salary.

let's have another example with our previous table customer. Let's imagine, that we have two guys - Mark, who could see only the data of Sweden customers and Jose, who are able to see only Mexican customers. 

Both of this guys want to work over the single table without any extra movements. Only what they want to do is log in as Mark or Jose and see only Sweden or Mexican customers. Virtual Private Database could do this. let me give you an example how it possible to implement this

1)  Create support table, that will store a list of the users and filter conditions for each one:

SQL> create table employee (
 ID NUMBER,
 country_name VARCHAR2(256),
 name    varchar2(30)
);

 2) Add data there (Mark and Jose information)

SQL> insert into employee values (1,'SWEDEN', 'mark');
SQL> insert into employee values (2,'MEXICO','jose'); 

3) Create Oracle users and give them all necessary permissions:

SQL> create user mark identified by welcome1 default tablespace users temporary tablespace temp;
SQL> create user jose identified by welcome1 default tablespace users temporary tablespace temp;
SQL> grant all on bds.customer_csv to mark;
SQL> grant all on bds.customer_csv to jose;
SQL> grant all on bds.employee to mark;
SQL> grant all on bds.employee to jose;
SQL> grant create session to mark;
SQL> grant create session to jose;

4)  Create PL/SQL package that will generate predicate depend on the user login. if the user missing in the table (employee) will be generated condition "1=1", that means show all data. Depends on application logics, you could generate "1=0" predicate to hide all data.

SQL> CREATE OR REPLACE PACKAGE pck_vpd AS
   p_country_name employee.country_name%TYPE;
   PROCEDURE set_country(v_country_name employee.country_name%TYPE);
   FUNCTION predicate(obj_schema VARCHAR2,
                      obj_name   VARCHAR2) RETURN VARCHAR2;
END pck_vpd;

SQL> create or replace package body pck_vpd as
  procedure set_country(v_country_name employee.country_name%TYPE) is
  begin
    p_country_name := v_country_name;
  end set_country;
   function predicate (obj_schema varchar2, obj_name VARCHAR2) return varchar2 is
  BEGIN
    IF p_country_name IS NOT NULL THEN 
    return 'c_birth_country = ' ||''''|| p_country_name||'''';
    ELSE
      RETURN '1=1';
    END IF;
  end predicate;
end pck_vpd;

5) Create VPD policy over protected table, where we specify previous PL/SQL package as predicate generator:

BEGIN    
dbms_rls.add_policy(object_schema => 'bds',
  object_name     => 'CUSTOMER_CSV',
  policy_name     => 'CHOOSE_CONTRY',
  function_schema => 'BDS',
  policy_function => 'pck_vpd.predicate',
  statement_types => 'select,update,delete');
END;

6) Create login trigger, that  allows us to catch the user name and define proper context:

create or replace trigger trg_vpd
  after logon on database
declare  
v_country_name bds.employee.country_name%type;
BEGIN    
select country_name into v_country_name  
from bds.employee
where upper(NAME) = sys_context('userenv','session_user');  
bds.pck_vpd.set_country(v_country_name);  
EXCEPTION    
WHEN NO_DATA_FOUND
THEN      
v_country_name:=NULL;
END;

7) And now time to test! Let's run the same query as different users and compare results:

And we got exactly what we expected. Mark could see only Sweden customers (and total our company has 33168 customers from Sweden), Jose could see only Mexican customers (total 32939) and other users could see all customers. VPD generate context predicate depend on who are login into a database, for example when Mark runs query:

SQL> select * from customer_csv

VPD exchange it to:

SQL> select * from customer_csv
where
c_birth_country='SWEDEN'

Hopefully today's introduction was valuable and let you understand that you have pretty wide security capabilities with Big Data SQL

Be the first to comment

Comments ( 0 )
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.Captcha

Recent Content

Oracle

Integrated Cloud Applications & Platform Services