Thursday May 29, 2014

Oracle Data Warehouse and Big Data Magazine MAY Edition for Customers + Partners

Follow us on Facebook Twitter Blogger

The latest edition of our monthly data warehouse and big data magazine for Oracle customers and partners is now available. The content for this magazine is taken from the various data warehouse and big data Oracle product management blogs, Oracle press releases, videos posted on Oracle Media Network and Oracle Facebook pages. Click here to view the May Edition

Please share this link to our magazine with your customers and partners

This magazine is optimized for display on tablets and smartphones using the Flipboard App which is available from the Apple App store and Google Play store

Thursday May 15, 2014

DBAs Guide to Sandboxes vs. Data Marts

I had an interesting response to my first post on the topic of sandboxing (DBA's Guide to Deploying Sandboxes in the Cloud). The following question was asked: what is the difference between a data mart and sandbox?

This is actually a great question so I thought it would be useful to convert my answer into a short blog post. I am sure there will be lots of different opinions on this topic just as there are alternative names for "sandbox environment" (from analytical sandbox, to analytical appliance to discovery zone etc etc) but here is my attempt at an answer:

OLAP1 IndustryDataModels

In my experience data marts tend to be a single subject area data repository and/or linked to a specific corporate application (such as finance, HR, CRM, ERP, logistics, sales tracking etc). The source data is pushed to a specific line of business for analysis. The push and loading processes implements all the necessary data cleansing and transformation routines so the data arrives into its destination schema ready for use. Most importantly, the data push happens on a regular basis and is driven by the needs of the business.  Many customers implement a data lifecycle management workflow to ensure that sufficient historical data is available to support the required analysis. In many cases the life of the data mart is largely open-ended and the IT team will ensure that regular backups and all the usual patching and maintenance operations are performed on a regular basis. Where the mart is seen as a mission critical system then high-availability features can be implemented and in extreme cases a disaster recovery site is setup and data synchronised between the production and DR systems.

The schema itself is typically organized to support reporting requirements and will be based around the standard relational models such as star and/or snowflake schemas although this is not a mandatory requirements. Sometimes a 3NF approach is required to support the particular needs of the business. The majority of queries within the mart are "well-defined" and "well-known" and subject to tuning and monitoring by the DBA team. 

A "sandbox" is generally meant as a non-operational environment where business analysts and data scientists can test ideas, manipulate data and model "what if" scenarios without placing an excessive computational load on the core operational processes. It has a finite life expectancy so that when timer runs out the sandbox is deleted and the associated discoveries are either incorporated into the enterprise warehouse, or data mart, or simply abandoned. The primary driver from an organisational perspective is to use a 'fail-fast" approach. At any one point in time an organization might be running any number of analytical experiments spread across hundreds of sandboxes. However, at some point in time those experiments will be halted and evaluated and the "hardware" resources being consumed will be returned to a general pool for reuse by existing projects or used to create environments for new projects. 

In general terms a sandbox environment is never patched or upgraded, except in exceptional circumstances. There is never a great urgency to apply software or operating system patches so the ITM team will  just incorporated these tasks into the normal cycles.

A sandbox should never be considered mission critical so there is no need to implement high availability features or build and manage a DR environment. If a sandbox becomes unavailable due to a fault (hardware or software) there is no pressing urgency to resolve the issue - in Oracle parlance a "sandbox going down" is not a P1 issue for support.

Below is a summary of how I view the differences between marts and sandboxes:

Business Centric Attributes

Data Mart


Business scope

Single subject area

Potentially a mixing pot of data sourced from multiple systems

Core objective

Managing the business

Discovery of new products, markets and/or customer segments

Query scope

Batch reporting for dashboards and pre-configured reports along with limited ad-hoc analysis

Ad –hoc data discovery

Frequency of update

Regular, scheduled data loads

One-off and ad-hoc loads as required

Data Volumes

Driven by external factors such as GRC requirements

Driven by needs of project

Data Quality

Very important – data is fully cleansed and transformed during load process

Raw data is loaded, transformations, cleansing and enhancements are incorporated into discovery process

Typical Output

Historical reports, KPIs, scorecards, multi-dimensional analysis

Data mining models: forecasts, predictions, scoring

Sophisticated analytics (aggregations, spatial, graph etc)

Typical query patterns

Pre-defined patterns returning small data sets, easily tuned

Complex ad hoc queries over massive data volumes

IT Centric Attributes

Data Mart


Mission critical



Performance SLAs



Individual/LOB chargeback for resource usage



HA features/DR Site




Full + incremental



As required to resolve specific issues along with scheduled maintenance programs

Only when necessary

Use of beta software


Possibly if project needs specific features

Life expectancy


Limited – typically 90 days or less

Number of instances

Most companies choose to
implent a very small number of subject-specific marts

Large companies likely to have many hundreds of sandoxes running at any given point in time

Table: Key differences between data mart and sandbox

As you can see from this table, in some ways sandboxes are similar to data marts and in other ways they are not. For me, the key difference is in the life expectancy - a sandbox should never outstay its welcome. The best sandbox environments that I have come across are those where strict time limits are enforced on their duration. If you let a sandbox live on too long then you run the danger of it morphing into a shadow data mart and that is a very dangerous situation if you look at the attributes and descriptions listed in the first table (Business Centric Attributes). 

So why is there so much confusion about the differences between marts and sandboxes? Much of this is down to niche vendors trying to jump on specific marketing bandwagons. At the moment the latest marketing bandwagon is the concept of "analytical databases" which in reality is nothing more than a data mart (and in many cases these vendors are simple peddling highly specialised data silos). These niche vendor platforms are simply not designed to run hundreds of environments with resources being continually returned to a centralised pool for redistribution to existing or new projects - which is a core requirement for effective sandboxing.

Over the next 2-3 months I am will share why and how the unique features of Oracle Database 12c provide the perfect platform for supporting environments running hundreds of thousands of sandbox-driven projects.

Tuesday May 13, 2014

Announcing: Big Data Lite VM 3.0

Last week we released Big Data Lite VM 3.0. It contains the latest update on the VM for the entire stack.

Oracle Enterprise Linux 6.4
Oracle Database 12c Release 1 Enterprise Edition ( with Oracle Advanced Analytics, Spatial & Graph, and more
Cloudera’s Distribution including Apache Hadoop (CDH 5.0)
Oracle Big Data Connectors 3.0
        Oracle SQL Connector for HDFS 3.0.0
        Oracle Loader for Hadoop 3.0.0
        Oracle Data Integrator 12c
        Oracle R Advanced Analytics for Hadoop 2.4.0
        Oracle XQuery for Hadoop 3.0.0
Oracle NoSQL Database Enterprise Edition 12cR1 (3.0.5)
Oracle JDeveloper 11g
Oracle SQL Developer 4.0
Oracle Data Integrator 12cR1
Oracle R Distribution 3.0.1

The download page is on OTN in its usual place.

Friday May 09, 2014

New 12c sessionization analytics workshop now available on OLL

I have just uploaded a new workshop on sessionization analytics using the 12c pattern matching feature, MATCH_RECOGNIZE, to the Oracle Learning Library. The workshop is based on analysis of the log files generated by our the Big Data Lite Movieplex application, which is part of our Big Data Lite virtual machine. Oracle Movieplex is a fictitious on-line movie streaming company. Customers log into Oracle MoviePlex where they are presented with a targeted list of movies based on their past viewing behavior. Because of this personalised experience and reliable and fast performance, customers spend a lot of money with the company and it has become extremely profitable. 

All the activity from our application is captured in a log file and we are going to analyze the data captured in that file by using SQL pattern matching to create a sessionization result set for our business users and data scientists to explore and analyze. The sections in the workshop (I have recorded a video of this workshop, see links below) will step you through the process of creating our sessionization result set using the Database 12c pattern matching features.

BD VM sessionization

The workshop and video are available on the Oracle Learning Library using the following links:

For more information (whitepapers, multi-media Apple iBooks, tutorials etc) about SQL pattern matching and analytical SQL then checkout our home page on OTN:



The data warehouse insider is written by the Oracle product management team and sheds lights on all thing data warehousing and big data.


« May 2014 »