Friday Jun 24, 2016

Updates to Big Data Lite 4.5

There are some updates to Big Data Lite 4.5 that you will want to take advantage of:
  • Updated Big Data SQL configuration that allows predicate pushdown to engage - which can deliver dramatic performance benefits
  • Update to the Big Data Spatial framework (fixes a configuration issue)
  • New samples for the Oracle Shell for Hadoop Loaders

To get the latest updates, simply double click on the Refresh Samples icon on the desktop.

As an example of the Big Data SQL performance improvement (keep in mind this is the Big Data Lite VM, so the actual numbers aren't necessarily interesting in themselves, but they get the performance concepts across), I ran a few tests with the following query:

select count(*)  from table
where custid = 1185972
and activity=6
and genreid=7;

I ran this query 3 times against different tables:

  • Avro - not partitioned - 17s
  • Avro - genre partitions - illustrates partition pruning - 1.1s
  • Parquet - not partitioned - illustrates predicate pushdown into parquet - 0.4s

The VM doesn't have Big Data SQL Cells - so you don't see Smart Scan and storage indexes - which would show even more performance benefits.
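If you want to see where the filtering is happening on your own system, one generic way to check (sketched here with MOVIE_FACT_PARQUET as an assumed name standing in for the Parquet-backed table in the test above) is to look at the execution plan for the query:

-- Sketch only: movie_fact_parquet is an assumed name for the test table
EXPLAIN PLAN FOR
  SELECT COUNT(*)
  FROM   movie_fact_parquet
  WHERE  custid   = 1185972
  AND    activity = 6
  AND    genreid  = 7;

-- Display the plan; the predicate section shows which filters are applied at the
-- external table scan and are therefore candidates for push-down into Hadoop
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY());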

BDS 3.0.x is a great release.  Take it for a spin!

Monday Apr 25, 2016

SQL Pattern Matching Deep Dive - Part 3, greedy vs. reluctant quantifiers

Welcome to the third post in this deep-dive series on SQL pattern matching using the MATCH_RECOGNIZE feature that is part of Database 12c. 

In the first part of this series we looked at a wide range of topics including ensuring query consistency, how to correctly use predicates and how to manage sorting. In the second part we looked at using the built-in measures to understand how a data set is matched to a pattern. 

In this post I am going to review the concepts of greedy and reluctant quantifiers. I will break this down into a number of areas: 1) overview of regular expressions, 2) understanding quantifiers, and 3) greedy vs. reluctant quantifiers. The examples in this post use the built-in measures to help show the difference between greedy and reluctant matching. If you are not familiar with the MATCH_NUMBER() function or the CLASSIFIER() function then please take some time to read the second post in this series.
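As a quick taster, here is a minimal sketch of where a reluctant quantifier is declared (it assumes the TICKER table of symbol, tstamp and price used in the Oracle documentation examples); the difference in how rows get mapped is exactly what the built-in measures will expose later in this post:

-- Minimal sketch, assuming a TICKER table (symbol, tstamp, price)
SELECT symbol, tstamp, price, match_no, var_name
FROM   ticker
MATCH_RECOGNIZE (
  PARTITION BY symbol
  ORDER BY tstamp
  MEASURES MATCH_NUMBER() AS match_no,   -- which match each row belongs to
           CLASSIFIER()   AS var_name    -- which pattern variable each row mapped to
  ALL ROWS PER MATCH
  PATTERN ( strt down*? up+ )            -- reluctant: down*? maps as few rows as possible
  -- PATTERN ( strt down* up+ )          -- greedy: down* maps as many rows as possible
  DEFINE
    down AS price < PREV(price),
    up   AS price > PREV(price)
);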

Overview of regular expressions

[Read More]

Tuesday Apr 12, 2016

SQL Pattern Matching Deep Dive - Part 2, using MATCH_NUMBER() and CLASSIFIER()

Welcome to the second post in this deep dive series on SQL pattern matching using the new MATCH_RECOGNIZE feature that is part of Database 12c. In the first part of this series we looked at the areas of ensuring query consistency, how to correctly use predicates and how to manage sorting.

In this post I am going to review the two built-in measures that we have provided to help you understand how your data set is mapped to the pattern that you have defined. This post breaks down into three areas: 1) a review of the built-in measures, 2) understanding how to control the output (number of rows returned), and 3) bringing these two topics together with some examples...
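To set the scene, here is a minimal sketch with both measures in place (again assuming the TICKER table of symbol, tstamp and price from the Oracle documentation); switching between ALL ROWS PER MATCH and ONE ROW PER MATCH is the main lever for controlling how many rows come back:

-- Minimal sketch, assuming a TICKER table (symbol, tstamp, price)
SELECT symbol, tstamp, price, match_no, var_name
FROM   ticker
MATCH_RECOGNIZE (
  PARTITION BY symbol
  ORDER BY tstamp
  MEASURES MATCH_NUMBER() AS match_no,   -- identifies each match within a partition
           CLASSIFIER()   AS var_name    -- identifies the variable a row mapped to
  ALL ROWS PER MATCH                     -- every mapped row is returned...
  -- ONE ROW PER MATCH                   -- ...or one summary row (partition columns and measures only)
  PATTERN ( strt down+ up+ )
  DEFINE
    down AS price < PREV(price),
    up   AS price > PREV(price)
);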

[Read More]

Monday Mar 21, 2016

SQL Pattern Matching deep dive - Part 1

There has been quite a lot of interest in the new 12c MATCH_RECOGNIZE feature for SQL pattern matching. Therefore, I thought now would be the perfect time to start a series of quick deep dive posts that explain how SQL pattern matching works. Over the coming weeks I will cover the following topics in a series of posts.

This is the start of a series of posts based on a presentation that I put together for the recent annual BIWA conference at Oracle HQ. The Oracle BI, DW and Analytics user community always puts on a great conference and this year was the best yet. You can download any or all of the presentations from this year’s conference by following this link. My pattern matching deep dive presentation started life about a year ago as a post covering some of the new keywords in the explain plan that are linked to pattern matching, see here. It has now expanded to cover a much wider range of topics.
The aim of this group of posts is to help you understand the underlying mechanics of the MATCH_RECOGNIZE clause. During these posts we will explore key concepts such as: how to get consistent results, using built-in debugging functions, deterministic vs. non-deterministic state machines, back-tracking (what it is and how to identify when it is occurring), and finally greedy vs. reluctant quantifiers. If you need a quick refresher on how MATCH_RECOGNIZE works then I would recommend that you take a look at the following links. [Read More]

Thursday Mar 17, 2016

Big Data SQL 3.0 is now available!

Oracle Big Data SQL 3.0 is now available!  This is an exciting milestone for Oracle.  With support for Cloudera CDH (both on Big Data Appliance and non-Big Data Appliance), Hortonworks HDP and Oracle Database 12c (both Exadata and non-Exadata) - the benefits derived from unified queries across relational, Hadoop and NoSQL stores can now be achieved across a wide breadth of big data deployments.

Hadoop and NoSQL are rapidly becoming key components of today's data management platform, and many Oracle Database customers use Hadoop or NoSQL in their organization. Using multiple data management solutions typically leads to data silos, where different people and applications can only access a subset of the data they need. Big Data SQL offers an industry-leading solution to deliver one fast, secure SQL query on all data: in Hadoop, Oracle Database, and NoSQL. Big Data SQL leverages both innovations from Oracle Exadata and specific Hadoop features to push processing down to Hadoop, resulting in minimized data movement and extreme performance for SQL on Hadoop. 

In summary, Big Data SQL 3.0: 

  • Expands support for Hadoop platforms - covering Hortonworks HDP, Cloudera CDH on commodity hardware as well as on Oracle Big Data Appliance
  • Expands support for database platforms - covering Oracle Database 12c on commodity hardware as well as on Oracle Exadata
  • Improves performance through new features like Predicate Push-Down on top of Smart Scan and Storage Indexes 

 

To learn more:

 

Tuesday Mar 15, 2016

Big Data SQL Quick Start. Parallel Query - Part 3.

[Read More]

Tuesday Jan 19, 2016

Big Data SQL Quick Start. Introduction - Part 1.

Today I am going to explain the steps required to start working with Big Data SQL. It's really easy! I hope that after this article you will all agree with me. First, if you want to get caught up on what Big Data SQL is, I recommend that you read these blogs: Oracle Big Data SQL: One Fast Query, Big Data SQL 2.0 - Now Available.

The above blogs cover the design goals of Big Data SQL. One of those goals is transparency: you simply define a table that links to a directory in HDFS or to a table in HCatalog, and then work with it like any other Oracle Database table. It's also useful to read the product documentation.

Your first query with Big Data SQL

Let’s start with the simplest example and query data that is actually stored in HDFS, through Oracle Database, using Big Data SQL. I’m going to begin this example by checking the data that actually lives in HDFS. To accomplish this, I run the hive console and check the hive table DDL:
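The actual DDL is covered behind the link below, but the overall flow looks something like this sketch (all object names are hypothetical, and the access parameter syntax should be checked against the Big Data SQL documentation for your release):

-- 1) In the hive console, inspect the definition of the source table:
--      SHOW CREATE TABLE web_sales;
--
-- 2) In Oracle Database, expose that Hive table as an external table using the
--    ORACLE_HIVE access driver, then query it like any other table:
CREATE TABLE web_sales_ext (
  ship_date DATE,
  item_sk   NUMBER,
  quantity  NUMBER
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_HIVE
  DEFAULT DIRECTORY DEFAULT_DIR
  ACCESS PARAMETERS (
    com.oracle.bigdata.tablename: default.web_sales
  )
)
REJECT LIMIT UNLIMITED;

SELECT COUNT(*) FROM web_sales_ext WHERE quantity > 10;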

[Read More]

Wednesday Jan 13, 2016

BIWA 2016 - here's my list of must-attend sessions and labs

It’s almost here - the 2016 BIWA conference at the Oracle Conference Center. The conference starts on January 26 with a welcome by the conference leaders at 8:30am. The BIWA summit is dedicated to providing all the very latest information and best practices for data warehousing, big data, spatial analytics and BI. This year the conference has expanded to include the most important query language on the planet: SQL. There will be a whole track dedicated to YesSQL! The full agenda is available here.

Unfortunately I won’t be able to attend this year’s conference but if I was going to be there, then this would be my list of must-attend sessions and hands-on labs.

[Read More]

Thursday Dec 24, 2015

Oracle Big Data Lite 4.3.0 is Now Available on OTN

Big Data Lite 4.3.0 is now available on OTN.



This latest release is packed with new features - here's the inventory of what's included:

  • Oracle Enterprise Linux 6.7
  • Oracle Database 12c Release 1 Enterprise Edition (12.1.0.2) - including Oracle Big Data SQL-enabled external tables, Oracle Multitenant, Oracle Advanced Analytics, Oracle OLAP, Oracle Partitioning, Oracle Spatial and Graph, and more.
  • Cloudera Distribution including Apache Hadoop (CDH5.4.7)
  • Cloudera Manager (5.4.7)
  • Oracle Big Data Spatial and Graph 1.1
  • Oracle Big Data Discovery 1.1.1
  • Oracle Big Data Connectors 4.3
    • Oracle SQL Connector for HDFS 3.4.0
    • Oracle Loader for Hadoop 3.5.0
    • Oracle Data Integrator 12c
    • Oracle R Advanced Analytics for Hadoop 2.5.1
    • Oracle XQuery for Hadoop 4.2.1
  • Oracle NoSQL Database Enterprise Edition 12cR1 (3.4.7)
  • Oracle Table Access for Hadoop and Spark 1.0
  • Oracle JDeveloper 12c (12.1.3)
  • Oracle SQL Developer and Data Modeler 4.1.2 with Oracle REST Data Services 3.0
  • Oracle Data Integrator 12cR1 (12.2.1)
  • Oracle GoldenGate 12c
  • Oracle R Distribution 3.2.0
  • Oracle Perfect Balance 2.5.0
Also, this release uses GitHub as the repository for all of our sample code (https://github.com/oracle/BigDataLite).  This gives us a great mechanism for updating the samples/demos between releases.  Users simply double click the "Refresh Samples" icon on the desktop to download the latest collateral.

Tuesday Dec 01, 2015

My highlights from DOAG 2015...

Last week I was in Nuremberg at the excellent annual German Oracle user group conference - DOAG15. This was my first visit to DOAG and I was amazed at the size of the event. It is huge, but most importantly it is very well organized and definitely one of the best user conferences, possibly the best, that I have attended. 

...and then there was the chance to drink some gluhwein with the database PM team on the way to the speaker dinner.

[Photo: Nuremberg]

[Read More]

Friday Nov 20, 2015

Review of Data Warehousing and Big Data at #oow15


This year OpenWorld was bigger, more exciting and packed with sessions about the very latest technology and product features. Most importantly, both data warehousing and Big Data were at the heart of this year’s conference across a number of keynotes and a huge number of general sessions. Our hands-on labs were all completely full as people got valuable hands-on time with our most important new features. The key focus areas at this year’s conference were:
  • Database 12c for Data Warehousing
  • Big Data and the Internet of Things 
  • Cloud - from on-premises, to the Cloud, to running hybrid Cloud systems
  • Analytics and SQL continue to evolve to enable more and more sophisticated analysis. 
All these topics appeared across the main keynote sessions, including live on-stage demonstrations of how many of our new features can be used to increase the performance and analytical capability of your data warehouse and big data management system - check out the on-demand videos for the keynotes and executive interviews....
[Read More]

Tuesday Oct 13, 2015

Big Data SQL 2.0 - Now Available

With the release of Big Data SQL 2.0 it is probably time to do a quick recap and introduce the marquee features in 2.0. The key goal of Big Data SQL is to expose data in its original format, stored within Hadoop and NoSQL databases, through high-performance Oracle SQL, with processing offloaded to storage-resident cells or agents. The architecture of Big Data SQL closely follows the architecture of Oracle Exadata Storage Server Software and is built on the same proven technology.

Retrieving Data

With data in HDFS stored in an undetermined format (schema on read), SQL queries require some constructs to parse and interpret the data so it can be processed as rows and columns. For this, Big Data SQL leverages the standard Hadoop constructs, notably the InputFormat and SerDe Java classes, optionally through Hive metadata definitions. Big Data SQL then layers the Oracle Big Data SQL Agent on top of this generic Hadoop infrastructure, as can be seen below.

[Figure: Accessing HDFS data through Big Data SQL]

Because Big Data SQL is based on Exadata Storage Server Software, a number of benefits are instantly available. Big Data SQL can not only retrieve data, but can also score Data Mining models at the individual agent, mapping model scoring to an individual HDFS node. Likewise, querying JSON documents stored in HDFS can be done directly with SQL and is executed on the agent itself.

Smart Scan

Within the Big Data SQL Agent, similar functionality exists to that available in Exadata Storage Server Software. Smart Scans apply the filter predicates and column projections from a given SQL query to the data streaming from the HDFS DataNodes, reducing the data that flows to the database to fulfill the data request of that query. The benefits of Smart Scan for Hadoop data are even more pronounced than for Oracle Database because tables are often very wide and very large. Because data is eliminated at the individual HDFS node, queries across large tables are now possible within reasonable time limits, enabling data warehouse-style queries to be spread across data stored in both HDFS and Oracle Database.

Storage Indexes

Storage Indexes - new in Big Data SQL 2.0 - provide the same benefits of IO elimination to Big Data SQL as they provide to SQL on Exadata. The big difference is that in Big Data SQL the Storage Index works on an HDFS block (on the BDA, 256MB of data) and spans 32 columns instead of the usual 8. The Storage Index is fully transparent to both Oracle Database and the underlying HDFS environment. As with Exadata, the Storage Index is a memory construct managed by the Big Data SQL software and invalidated automatically when the underlying files change.

[Figure: Concepts for Storage Indexes]

Storage Indexes work on data exposed via Oracle external tables using both the ORACLE_HIVE and ORACLE_HDFS types. Fields are mapped to these external tables, and the Storage Index is attached to the Oracle (not the Hive) columns, so that when a query references those columns, the Storage Index - when appropriate - kicks in. In the current version, Storage Indexes do not support tables defined with Storage Handlers (e.g. HBase or Oracle NoSQL Database).
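As a rough sketch (hypothetical names and an illustrative HDFS path), an ORACLE_HDFS external table might look like the one below; the Storage Index is maintained against the Oracle columns, so repeated queries that filter on CUSTID can skip HDFS blocks that cannot contain the requested value:

-- Sketch only: names and location are illustrative
CREATE TABLE movie_fact_hdfs (
  custid   NUMBER,
  movieid  NUMBER,
  genreid  NUMBER,
  activity NUMBER
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_HDFS
  DEFAULT DIRECTORY DEFAULT_DIR
  LOCATION ('hdfs:/user/oracle/moviework/*')
)
REJECT LIMIT UNLIMITED;

-- After a few executions the Storage Index can eliminate blocks for queries like:
SELECT COUNT(*) FROM movie_fact_hdfs WHERE custid = 1185972;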

Compound Benefits

The Smart Scan and Storage Index features deliver compound benefits. Where Storage Indexes reduce the IO done, Smart Scan then applies row filtering and column projection to the remaining data. This latter step remains important as it reduces the data transferred between systems.

To learn more about Big Data SQL, join us at Open World in San Francisco at the end of the month.

Friday Apr 24, 2015

Managing overflows in LISTAGG

This is an interesting problem that has come up a few times in discussions (and I think it has been mentioned on the SQL forums as well).  When using LISTAGG on very large data sets you can sometimes create a list that is too long and consequently get an ORA-01489: result of string concatenation is too long  error.

Obviously, it is possible to determine in advance if the error is going to occur using some bespoke PL/SQL code and then take appropriate action within your application to manage the rows that contain strings that exceed the VARCHAR2 limit.

Many customers have implemented workarounds to overcome the ORA-01489 error; however, this has mostly involved the use of complex code which has impacted performance. Wouldn’t it be great if there was a simple yet elegant way to resolve this issue? Actually there is, and we can use a few of the most recent analytical SQL functions. If you are using Database 12c you can make use of the MATCH_RECOGNIZE clause to effectively create chunks of strings that do not exceed the VARCHAR2 limit...
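To make the idea concrete, here is a minimal sketch; it assumes a simple table T with a grouping column GRP and a string column VAL whose concatenated values can exceed the 4,000 byte limit (and it glosses over bytes vs. characters). Each match collects rows only while the running length stays under the limit, MATCH_NUMBER() labels the chunk, and LISTAGG is then applied per chunk so it never overflows:

SELECT grp,
       chunk,
       LISTAGG(val, ',') WITHIN GROUP (ORDER BY val) AS val_list
FROM   t
MATCH_RECOGNIZE (
  PARTITION BY grp
  ORDER BY val
  MEASURES MATCH_NUMBER() AS chunk
  ALL ROWS PER MATCH
  PATTERN ( a+ )
  DEFINE a AS SUM(LENGTH(val) + 1) <= 4000   -- running length, +1 for each separator
)
GROUP BY grp, chunk;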

[Read More]

Tuesday Apr 14, 2015

Statement of Direction -- Big Data Management System

Click here to start reading the Full Statement of Direction. 

Introduction: Oracle Big Data Management System Today 

As today's enterprises embrace big data, their information architectures must evolve. Every enterprise has data warehouses today, but the best-practices information architecture embraces emerging technologies such as Hadoop and NoSQL. Today’s information architecture recognizes that data is stored not only in increasingly disparate data platforms, but also in increasingly disparate locations: on-premises and potentially multiple cloud platforms. The ideal of a single monolithic ‘enterprise data warehouse’ has faded as a new, more flexible architecture has emerged. Oracle calls this new architecture the Oracle Big Data Management System, and today it consists of three key components:

  • The data warehouse, running on Oracle Database and Oracle Exadata Database Machine, is the primary analytic database for storing much of a company’s core transactional data: financial records, customer data, point-of-sale data and so forth. Despite now being part of a broader architecture, the requirements on the RDBMS for performance, scalability, concurrency and workload management are greater than ever; Oracle Database 12c introduced Oracle Database In-Memory (with columnar tables, SIMD processing, and advanced compression schemes) as the latest in a long succession of warehouse-focused innovations. The market-leading Oracle Database is the ideal starting point for customers to extend their architecture to the Big Data Management System.
  • The ‘data reservoir’, hosted on Oracle Big Data Appliance, will augment the data warehouse as a repository for the new sources of large volumes of data: machine-generated log files, social-media data, and videos and images -- as well as a repository for more granular transactional data or older transactional data which is not stored in the data warehouse. Oracle’s Big Data Management System embraces complementary technologies and platforms, including open-source technologies: Oracle Big Data Appliance includes Cloudera’s Distribution of Hadoop and Oracle NoSQL Database for data management.
  • A ‘franchised query engine,’ Oracle Big Data SQL, enables scalable, integrated access in situ to the entire Big Data Management System. SQL is the accepted language for day-to-day data access and analytic queries, and thus SQL is the primary language of the Big Data Management System.  Big Data SQL enables users to combine data from Oracle Database, Hadoop and NoSQL sources within a single SQL statement.  Leveraging the architecture of Exadata Storage Software and the SQL engine of the Oracle Database, Big Data SQL delivers high-performance access to all data in the Big Data Management System.

Using this architecture, the Oracle Big Data Management System combines the performance of Oracle’s market-leading relational database, the power of Oracle’s SQL engine, and the cost-effective, flexible storage of Hadoop and NoSQL. The result is an integrated architecture for managing Big Data, providing all of the benefits of Oracle Database, Exadata, and Hadoop, without the drawbacks of independently-accessed data repositories.  
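For illustration only: in the sketch below CUSTOMERS is an ordinary Oracle table, while WEB_LOGS_EXT is assumed to be a Big Data SQL external table over data held in Hadoop (for example, one defined with the ORACLE_HIVE access driver), and a single SQL statement spans both stores:

SELECT c.cust_segment,
       COUNT(*) AS page_views
FROM   customers    c
JOIN   web_logs_ext w
ON     w.cust_id = c.cust_id
GROUP  BY c.cust_segment;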

Note that the scope of this statement of direction is the data platform for Big Data. An enterprise Big Data solution would also include big data tools and big data applications built upon this data platform. 

Read the full Statement of Direction -- Big Data Management System here.


Friday Mar 27, 2015

Why SQL Part 4 - Intelligent and continuous evolution

In the third part of this series (Why SQL Part 3 - Simple optimization makes life simpler) I explained that by letting the database perform the optimizations at source, where the data is located, it is possible to share tuning improvements across a wide spectrum of front-end systems (BI reports, dashboards, applications etc.). The ability of the database to “own” the optimization and processing characteristics of how a SQL statement is executed is an important element in the success of SQL as a data analysis language. In the whitepaper that forms the basis for this series of blog posts this section is quite brief. This was done for reasons that will become obvious as you work through this blog post.

Now that we have looked at the powerful framework that supports SQL and the simplified optimization layer we can move on to look at how SQL has evolved over time to support ever more sophisticated features and functions and compare this with some of the newer open source frameworks.

[Read More]
About

The Data Warehouse Insider is written by the Oracle product management team and sheds light on all things data warehousing and big data.
