
Information, tips, tricks and sample code for Big Data Warehousing in an autonomous, cloud-driven world

September 7, 2018

See How Easily You Can Query Object Store with Big Data Cloud Service (BDCS)

Alexey Filanovskiy
Product Manager

What is Object Store?

Object Store has become a more and more popular storage type, especially in the cloud. It provides several benefits, such as:

- Elasticity. Customers don't have to plan ahead how much space they need. Need some extra space? Simply load more data into Object Store.

- Scalability. It scales infinitely. At least theoretically :)

- Durability and Availability. Object Store is a first-class citizen in every cloud story, so all vendors do their best to maintain availability and durability. If some disk goes down, it shouldn't worry you. If some node running the Object Store software goes down, it shouldn't worry you. As a user, you just put data into Object Store and read it back.

- Cost. In the cloud, Object Store is the most cost-efficient storage option.

Nothing comes for free, and on the downside I would highlight:

- Performance, compared with HDFS or local block devices. Whenever you read data from Object Store, you read it over the network.

- Inconsistency of performance. You are not alone on Object Store, and under the hood it uses physical disks with their own throughput limits. If many users start reading and writing data to/from Object Store, you may see performance that differs from what you got a day, a week, or a month ago.

- Security. Unlike file systems, Object Store has no file-grained permission policies, and customers will need to reorganize and rebuild their security standards and policies.

Based on the points above, we can conclude that Object Store is well suited as a way to share data across many systems, as well as a historical layer for certain information management systems. If we compare Object Store with HDFS (both are schema-on-read systems, which simply store data and define the schema at runtime, when the user runs a query), I would personally differentiate them like this: HDFS is "write once, read many", Object Store is "write once, read few". So it's more of a historical tier (cheaper and slower) than HDFS.

In the context of information data management, Object Store takes its place at the bottom of the pyramid.

How to copy data to Object Store

Well, let's imagine that we have Big Data Cloud Service (BDCS) and want to archive some data from HDFS to Object Store (for example, because we are running out of capacity on HDFS). There are multiple ways to do this (I've written about them earlier here), but I'll pick ODCP, the Oracle-built tool for copying data between multiple sources, including HDFS and Object Store. You can find the full documentation here; below is just a brief example of how I did it on my test cluster.

First, we need to define the Object Store credentials on the client node (in my case it's one of the BDCS nodes), where we will run the client:

[opc@node01 ~]$ export CM_ADMIN=admin

[opc@node01 ~]$ export CM_PASSWORD=Welcome1!

[opc@node01 ~]$ export CM_URL=https://cmhost.us2.oraclecloud.com:7183

[opc@node01 ~]$ bda-oss-admin add_swift_cred --swift-username "storage-a424392:alexey@oracle.com" --swift-password "MyPassword-" --swift-storageurl "https://storage-a422222.storage.oraclecloud.com/auth/v2.0/tokens" --swift-provider bdcstorage

After this, we can check that it appears:

[opc@node01 ~]$ bda-oss-admin list_swift_creds -t

PROVIDER  USERNAME                                                    STORAGE URL                             

bdcstorage storage-a424392:alexey.filanovskiy@oracle.com               https://storage-a422222.storage.oraclecloud.com/auth/v2.0/tokens

Next, we copy the data from HDFS to Object Store:

[opc@node01 ~]$ odcp hdfs:///user/hive/warehouse/parq.db/ swift://tpcds-parq.bdcstorage/parq.db

...

[opc@node01 ~]$ odcp hdfs:///user/hive/warehouse/csv.db/ swift://tpcds-parq.bdcstorage/csv.db

Now we have data in Object Store:

[opc@node01 ~]$  hadoop fs -du -h  swift://tpcds-parq.bdcstorage/parq.db

...

74.2 K   74.2 K   swift://tpcds-parq.bdcstorage/parq.db/store

14.4 G   14.4 G   swift://tpcds-parq.bdcstorage/parq.db/store_returns

272.8 G  272.8 G  swift://tpcds-parq.bdcstorage/parq.db/store_sales

466.1 K  466.1 K  swift://tpcds-parq.bdcstorage/parq.db/time_dim

...

 
Now is a good time to define the tables in the Hive Metastore. I'll show the example for only one table; I created the rest with a script:

0: jdbc:hive2://node03:10000/default> CREATE EXTERNAL TABLE store_sales

( ss_sold_date_sk           bigint

, ss_sold_time_sk           bigint

, ss_item_sk                bigint

, ss_customer_sk            bigint

, ss_cdemo_sk               bigint

, ss_hdemo_sk               bigint

, ss_addr_sk                bigint

, ss_store_sk               bigint

, ss_promo_sk               bigint

, ss_ticket_number          bigint

, ss_quantity               int

, ss_wholesale_cost         double

, ss_list_price             double

, ss_sales_price            double

, ss_ext_discount_amt       double

, ss_ext_sales_price        double

, ss_ext_wholesale_cost     double

, ss_ext_list_price         double

, ss_ext_tax                double

, ss_coupon_amt             double

, ss_net_paid               double

, ss_net_paid_inc_tax       double

, ss_net_profit             double

)

STORED AS PARQUET

LOCATION 'swift://tpcds-parq.bdcstorage/parq.db/store_sales';
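The CSV copy of the data can be exposed in the same way. Here is a minimal sketch, assuming the text files are pipe-delimited TPC-DS dumps and that the tables live in a csv_swift database (adjust the delimiter, names and location to match your data):

CREATE DATABASE IF NOT EXISTS csv_swift;

CREATE EXTERNAL TABLE csv_swift.store_sales
( ss_sold_date_sk bigint, ss_sold_time_sk bigint, ss_item_sk bigint
, ss_customer_sk bigint, ss_cdemo_sk bigint, ss_hdemo_sk bigint
, ss_addr_sk bigint, ss_store_sk bigint, ss_promo_sk bigint
, ss_ticket_number bigint, ss_quantity int, ss_wholesale_cost double
, ss_list_price double, ss_sales_price double, ss_ext_discount_amt double
, ss_ext_sales_price double, ss_ext_wholesale_cost double, ss_ext_list_price double
, ss_ext_tax double, ss_coupon_amt double, ss_net_paid double
, ss_net_paid_inc_tax double, ss_net_profit double
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'   -- assumption: pipe-delimited text
STORED AS TEXTFILE
LOCATION 'swift://tpcds-parq.bdcstorage/csv.db/store_sales';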

 
Make sure that you have the required libraries in place for Hive and for Spark:

[opc@node01 ~]$ dcli -C cp /opt/oracle/bda/bdcs/bdcs-rest-api-app/current/lib-hadoop/hadoop-openstack-spoc-2.7.2.jar /opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/bin/../lib/hadoop-mapreduce/

[opc@node01 ~]$ dcli -C cp /opt/oracle/bda/bdcs/bdcs-rest-api-app/current/lib-hadoop/hadoop-openstack-spoc-2.7.2.jar /opt/cloudera/parcels/SPARK2-2.2.0.cloudera2-1.cdh5.12.0.p0.232957/lib/spark2/jars/

 
Now we are ready to test!
 
Why should you use smart data formats? Predicate Push Down
In the Big Data world there is a class of file formats called smart formats (for example, ORC and Parquet files). They carry metadata inside the file, which can dramatically speed up some queries. The most powerful feature is Predicate Push Down. It allows data to be filtered in place, where it actually resides, without moving it over the network. Each Parquet page stores minimum and maximum values, which lets us skip entire pages. The following SQL predicates can be used for filtering data:
  • <
  • <=
  • =
  • !=
  • >=
  • >
As the saying goes, it's better to see once than to hear many times.
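Before running the demo queries: one way to confirm that a filter actually reaches the Parquet reader is to inspect the query plan, together with the Hive settings that control push down. A quick sketch (the property names below are standard Hive options, but their defaults vary by version):

-- enable predicate push down, including push down to the storage format reader
SET hive.optimize.ppd=true;
SET hive.optimize.index.filter=true;

-- the predicate should show up in the TableScan/Filter operators of the plan
EXPLAIN
SELECT count(1)
FROM parq_swift.store_sales
WHERE ss_ticket_number = 50940847;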

0: jdbc:hive2://node03:10000/default> select count(1) from parq_swift.store_sales;

...

+-------------+--+

|     _c0     |

+-------------+--+

| 6385178703  |

+-------------+--+

1 row selected (339.221 seconds)

 
Looking at the resource utilization, we can note that the network is quite heavily utilized.
Now, let's try to do the same with CSV files:

0: jdbc:hive2://node03:10000/default> select count(1) from csv_swift.store_sales;

+-------------+--+

|     _c0     |

+-------------+--+

| 6385178703  |

+-------------+--+

1 row selected (762.38 seconds)

As we can see, the picture is much the same: high network utilization, but the query takes even longer. That's because CSV is a row format and we cannot do column pruning.
 
So, let's feel the power of Predicate Push Down and use an equality predicate in the query:

0: jdbc:hive2://node03:10000/default> select count(1) from parq_swift.store_sales where ss_ticket_number=50940847;

...

+------+--+

| _c0  |

+------+--+

| 6    |

+------+--+

1 row selected (74.689 seconds)

 
Now we can see that with Parquet files we barely utilize the network. Let's see how it goes with CSV files.

0: jdbc:hive2://node03:10000/default> select count(1) from csv_swift.store_sales where ss_ticket_number=50940847;

...

+------+--+

| _c0  |

+------+--+

| 6    |

+------+--+

1 row selected (760.682 seconds)

Well, as expected, CSV files don't get any benefit from the WHERE predicate.
But not all functions can be pushed down. To illustrate this, I run a query with a cast function over the Parquet files:

0: jdbc:hive2://node03:10000/default> select count(1) from parq_swift.store_sales where cast(ss_promo_sk as string) like '%3303%';

...

+---------+--+

|   _c0   |

+---------+--+

| 959269  |

+---------+--+

1 row selected (133.829 seconds)

 
As we can see, we move part of the data set to the BDCS instance and process it there.
 
Column projection
Another feature of Parquet files is the columnar format, which means that the fewer columns we use, the less data we bring back to BDCS. Let me illustrate this by running the same query with one column and then with all 23 columns (I'll use a cast function, which is not pushed down).

0: jdbc:hive2://node03:10000/default> select ss_ticket_number from parq_swift.store_sales

. . . . . . . . . . . . . . . . . . . . > where

. . . . . . . . . . . . . . . . . . . . > cast(ss_ticket_number as string) like '%50940847%';

...

127 rows selected (128.887 seconds)

Now I run the query over the same data, but request all 23 columns:

0: jdbc:hive2://node03:10000/default> select 

. . . . . . . . . . . . . . . . . . . . > ss_sold_date_sk           

. . . . . . . . . . . . . . . . . . . . > ,ss_sold_time_sk           

. . . . . . . . . . . . . . . . . . . . > ,ss_item_sk                

. . . . . . . . . . . . . . . . . . . . > ,ss_customer_sk            

. . . . . . . . . . . . . . . . . . . . > ,ss_cdemo_sk               

. . . . . . . . . . . . . . . . . . . . > ,ss_hdemo_sk               

. . . . . . . . . . . . . . . . . . . . > ,ss_addr_sk                

. . . . . . . . . . . . . . . . . . . . > ,ss_store_sk               

. . . . . . . . . . . . . . . . . . . . > ,ss_promo_sk               

. . . . . . . . . . . . . . . . . . . . > ,ss_ticket_number          

. . . . . . . . . . . . . . . . . . . . > ,ss_quantity               

. . . . . . . . . . . . . . . . . . . . > ,ss_wholesale_cost         

. . . . . . . . . . . . . . . . . . . . > ,ss_list_price             

. . . . . . . . . . . . . . . . . . . . > ,ss_sales_price            

. . . . . . . . . . . . . . . . . . . . > ,ss_ext_discount_amt       

. . . . . . . . . . . . . . . . . . . . > ,ss_ext_sales_price        

. . . . . . . . . . . . . . . . . . . . > ,ss_ext_wholesale_cost     

. . . . . . . . . . . . . . . . . . . . > ,ss_ext_list_price         

. . . . . . . . . . . . . . . . . . . . > ,ss_ext_tax                

. . . . . . . . . . . . . . . . . . . . > ,ss_coupon_amt             

. . . . . . . . . . . . . . . . . . . . > ,ss_net_paid               

. . . . . . . . . . . . . . . . . . . . > ,ss_net_paid_inc_tax       

. . . . . . . . . . . . . . . . . . . . > ,ss_net_profit             

. . . . . . . . . . . . . . . . . . . . > from parq_swift.store_sales

. . . . . . . . . . . . . . . . . . . . > where

. . . . . . . . . . . . . . . . . . . . > cast(ss_ticket_number as string) like '%50940847%';

...

127 rows selected (333.641 seconds)

 
I think after seeing these numbers you will always select only the columns that you need.
 
Object store vs HDFS performance
Now I'm going to show some example performance numbers for Object Store and for HDFS. This is not an official benchmark, just numbers that give you an idea of how performance over Object Store compares with HDFS.
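The comparison itself is straightforward: the same table exists in both an HDFS-backed database (parq) and a Swift-backed one (parq_swift), so you can run an identical query against both copies and compare timings. A sketch, assuming the HDFS copy is exposed as parq.store_sales:

-- HDFS-backed copy of the table
SELECT count(1) FROM parq.store_sales WHERE ss_ticket_number = 50940847;

-- Object Store (Swift)-backed copy of the same table
SELECT count(1) FROM parq_swift.store_sales WHERE ss_ticket_number = 50940847;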

Querying Object Store with Spark SQL

As a bonus, I'd like to show how to query Object Store with Spark SQL.

 

[opc@node01 ~]$ spark2-shell 

....

scala> import org.apache.spark.sql.SparkSession

import org.apache.spark.sql.SparkSession

 

scala> val warehouseLocation = "file:${system:user.dir}/spark-warehouse"

warehouseLocation: String = file:${system:user.dir}/spark-warehouse

 

scala> val spark = SparkSession.builder().appName("SparkSessionZipsExample").config("spark.sql.warehouse.dir", warehouseLocation).enableHiveSupport().getOrCreate()

18/07/09 05:36:32 WARN sql.SparkSession$Builder: Using an existing SparkSession; some configuration may not take effect.

spark: org.apache.spark.sql.SparkSession = org.apache.spark.sql.SparkSession@631c244c

 

scala> spark.catalog.listDatabases.show(false)

+----------+---------------------+----------------------------------------------------+

|name      |description          |locationUri                                         |

+----------+---------------------+----------------------------------------------------+

|csv       |null                 |hdfs://bdcstest-ns/user/hive/warehouse/csv.db       |

|csv_swift |null                 |hdfs://bdcstest-ns/user/hive/warehouse/csv_swift.db |

|default   |Default Hive database|hdfs://bdcstest-ns/user/hive/warehouse              |

|parq      |null                 |hdfs://bdcstest-ns/user/hive/warehouse/parq.db      |

|parq_swift|null                 |hdfs://bdcstest-ns/user/hive/warehouse/parq_swift.db|

+----------+---------------------+----------------------------------------------------+

 

 

scala> spark.catalog.listTables.show(false)

+--------------------+--------+-----------+---------+-----------+

|name                |database|description|tableType|isTemporary|

+--------------------+--------+-----------+---------+-----------+

|customer_demographic|default |null       |EXTERNAL |false      |

|iris_hive           |default |null       |MANAGED  |false      |

+--------------------+--------+-----------+---------+-----------+

 

 

scala> val resultsDF = spark.sql("select count(1) from parq_swift.store_sales where cast(ss_promo_sk as string) like '%3303%' ")

resultsDF: org.apache.spark.sql.DataFrame = [count(1): bigint]

 

scala> resultsDF.show()

[Stage 1:==>                                                  (104 + 58) / 2255]

 
In fact, for Spark SQL there is no difference between Swift and HDFS; all the performance considerations I've mentioned above apply here as well.
 
Parquet files. Warning!
After looking at these results, you may want to convert everything to Parquet files, but don't rush to do so. Parquet is schema-on-write, which means you do ETL when you convert data to it. ETL brings optimization, but also the chance of making a mistake during the transformation.
Here is an example. I have a table with timestamps, which obviously cannot be less than 0:

hive> create table tweets_parq

 ( username  string,

   tweet     string,

   TIMESTAMP smallint

   ) 

STORED AS PARQUET;

 

hive> INSERT OVERWRITE TABLE tweets_parq select * from  tweets_flex; 

We defined the timestamp as smallint, which is not big enough for some of the data:

hive> select TIMESTAMP from tweets_parq

...

------------ 

1472648470

-6744

As a consequence, we got an overflow and ended up with negative timestamps. Converting to smart formats such as Parquet is a transformation, and during that transformation you can make mistakes. That's why it's better to also preserve the data in its original format.
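A minimal sketch of the fix, assuming the same source table tweets_flex: declare the column with a wide enough type before rewriting the data, for example bigint for epoch values:

-- drop the mis-typed table and recreate it with a wide enough column type
DROP TABLE IF EXISTS tweets_parq;

CREATE TABLE tweets_parq
( username    string
, tweet       string
, `timestamp` bigint   -- bigint comfortably holds epoch-second values
)
STORED AS PARQUET;

INSERT OVERWRITE TABLE tweets_parq SELECT * FROM tweets_flex;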

Conclusion

1) Object Store is not a competitor to HDFS. HDFS is a schema-on-read system, which can give you good performance (though definitely lower than a schema-on-write system, such as a database). Object Store gives you elasticity. It's a good option for historical data that you plan to use infrequently.

2) Object Store adds significant startup overhead, so it's not well suited for interactive queries.

3) If you put data on Object Store, consider using smart file formats such as Parquet. They give you the benefits of Predicate Push Down as well as column projection.

4) Converting to smart formats such as Parquet is a transformation, and during that transformation you can make mistakes. That's why it's better to also preserve the data in its original format.
