Friday Dec 20, 2013

BAE Systems Chooses Big Data Appliance for Critical Projects

Here's another great story about how Oracle data warehousing and big data technologies can be used to solve real-world problems with diverse sets of data. BAE Systems is taking unstructured, semi-structured, operational and social media data and using it to tackle complex problems such as financial crime, cyber security and digital transformation. The volumes of data that BAE deals with are very large, which creates its own set of challenges in terms of optimising hardware and software to work efficiently and effectively together. Although BAE had its own in-house Hadoop experts, it chose Oracle Big Data Appliance for its Hadoop cluster because the appliance is easier, cheaper and faster to operate.

BAE is working with many telco customers to explore the new areas that are being opened up by the use of big data to manage browsing data and call record data. These data sources are being transformed to provide additional insight for network operations teams, to analyse customer quality and to drive marketing campaigns.

 

BAE

 

Click on the image to watch the video, or click here: http://medianetwork.oracle.com/video/player/2940549413001

Thursday Dec 19, 2013

SQL Analytics Part 2 - Key Concepts

This post continues from my first post on analytical SQL, "Introduction to SQL for Reporting and Analysis", which looked at the reasons why it makes sense to use analytical SQL in your data warehouse and operational projects. In this post we are going to examine the key processing concepts behind analytical SQL.

One of the main advantages of Oracle's SQL analytics is that the key concepts are shared across all functions - in effect we have created a unified SQL framework for delivering analytics. These concepts build on existing SQL features to provide developers and business users with a framework that is both flexible and powerful in terms of its ability to support sophisticated calculations. There are four key concepts that you need to understand when implementing features and functions relating to SQL analytics:

  1. Processing order
  2. Result-set Partitions
  3. Windows
  4. Current Row

Let's look at each of these topics in more detail.

1) Processing order

The execution workflow for SQL statements containing analytical SQL is relatively simple: first, all the joins and all the WHERE, GROUP BY and HAVING clauses are processed. The output from this step is then passed to the analytical functions so all the calculations can be applied. This typically involves the use of window functions, which are applied to each row in each of the partitions that have been defined. Finally, the ORDER BY clause is processed to provide control over the final output. It is useful to keep this workflow in mind when you are building your analytical SQL because it will help you understand the inputs flowing into your analytical functions and the resulting output.
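To see this workflow in action, consider the following sketch (the sales table and its prod_category and amount_sold columns are hypothetical): the GROUP BY is processed first, the RANK function is then applied to the aggregated rows, and finally the ORDER BY sorts the output:

SELECT prod_category
, SUM(amount_sold) AS cat_sales
, RANK() OVER (ORDER BY SUM(amount_sold) DESC) AS sales_rank
FROM sales
GROUP BY prod_category
ORDER BY sales_rank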

2) Result-set partitions

Oracle's analytic functions allow the input data set to be divided into groups of rows which are referred to as "partitions". It is important to note that in this context the term "partition" is completely unrelated to the table partition feature.

These analytical partitions are created after the groups defined with GROUP BY clauses and can be used by any analytical aggregate functions, such as sums and averages. The partitions can be based on any column that is part of the input data set and individual partitions can be any size. It is quite possible to create a single partition containing all the rows from the initial query result set, a small number of very large partitions, or a large number of very small partitions where each partition contains just a few rows.
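To make this concrete, here is a minimal sketch (again using a hypothetical sales table): the GROUP BY creates one row per channel per year, and the analytic AVG is then computed over partitions of those grouped rows, one partition per channel:

SELECT channel_desc
, calendar_year
, SUM(amount_sold) AS year_sales
, AVG(SUM(amount_sold)) OVER (PARTITION BY channel_desc) AS avg_year_sales
FROM sales
GROUP BY channel_desc, calendar_year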

3) Windows

For each row in a partition it is possible to define a window over the data which determines the range of rows used to perform the calculations for the current row (the next section will explain the concept of the "current row"). The size of a window can be based on either a physical number of rows or a logical interval, which is typically time-based. The window has a starting row and an ending row and, depending on how the window is defined, it may move at only one end or, in some cases, both ends.

Physical windows

For example, a cumulative sum function would have its starting row fixed at the first row in the partition, and the ending row would then slide from the starting row all the way to the last row of the partition to create a running total over the rows in the partition.

SELECT Qtrs
, Months
, Channels
, Revenue
, SUM(Revenue) OVER (PARTITION BY Qtrs) AS Qtr_Sales
, SUM(Revenue) OVER () AS Total_Sales
FROM sales_table


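The query above uses windows that are fixed in size: Qtr_Sales spans an entire partition and Total_Sales spans the whole result set. The cumulative sum described earlier needs an ORDER BY inside each partition so that the end of the window slides with the current row; here is a sketch, reusing the same hypothetical sales_table:

SELECT Qtrs
, Months
, Revenue
, SUM(Revenue) OVER (PARTITION BY Qtrs ORDER BY Months ROWS UNBOUNDED PRECEDING) AS Cum_Revenue
FROM sales_table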

Logical windows

If the data set contains a date column then it is possible to use logical windows by taking advantage of Oracle’s built-in time awareness. A good example of a window where the starting row changes is the calculation of a moving average. In this case both the starting and ending points slide so that a constant physical or logical range is maintained during the processing. The example below creates a four-period moving average and the images show the current row, which is identified by the arrow, and the moving window, which is marked as the pink area:

[Six images: the four-period moving window stepping through the partition]
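As a sketch of how the four-period moving average might be written (reusing the hypothetical sales_table), the physical version uses ROWS 3 PRECEDING - the current row plus the three preceding rows gives four periods - while the logical version assumes a time_id date column and uses a time-based RANGE:

SELECT Months
, Revenue
, AVG(Revenue) OVER (ORDER BY Months ROWS 3 PRECEDING) AS Moving_Avg
FROM sales_table

SELECT time_id
, Revenue
, AVG(Revenue) OVER (ORDER BY time_id RANGE INTERVAL '3' MONTH PRECEDING) AS Moving_Avg
FROM sales_table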

The concept of a "window" is very powerful and provides a lot of flexibility in terms of being able to interact with the data. A window can be set as large as all the rows in a partition. At the other extreme it could be just a single row. Users may specify a window containing a constant number of rows, or a window containing all rows where a column value is in a specified numeric range. Windows may also be defined to hold all rows where a date value falls within a certain time period, such as the prior month.

When using window functions the current row is included during calculations, so when you want a window of n items you should only specify the (n-1) preceding or following rows - see the next section for more information.

4) Current Row

Each calculation performed with an analytic function is based on a current row within a partition. The current row serves as the reference point and during processing it begins at the starting row and moves through the following rows until the end row of the window is reached. For instance, a centered moving average calculation could be defined with a window that holds the current row, the six preceding rows, and the following six rows. In the example below the calculation of a running total is the result of the current row plus the values from all the preceding rows in the same partition; at the start of each new partition the running total is reset. The example shown below creates running totals within a result set showing the total sales for each channel within a product category within year:

SELECT calendar_year
, prod_category_desc
, channel_desc
, country_name
, sales
, units
, SUM(sales) OVER (PARTITION BY calendar_year, prod_category_desc, channel_desc ORDER BY country_name) AS sales_tot_cat_by_channel
FROM . . .

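The centered moving average mentioned earlier extends the window in both directions from the current row. A sketch, assuming a hypothetical daily_sales table with a time_id date column:

SELECT time_id
, sales
, AVG(sales) OVER (ORDER BY time_id ROWS BETWEEN 6 PRECEDING AND 6 FOLLOWING) AS centered_avg
FROM daily_sales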

Summary

This post has outlined the four main processing concepts behind analytical SQL. The next series of posts will provide an overview of the key analytical features and functions that use these concepts. In the next blog post we will review the analytical SQL features and techniques linked to enhanced reporting, which include windowing, lag-lead, reporting aggregate functions, pivoting operations, and data densification for reporting and time series calculations. Although these topics will be presented in terms of data warehousing, they are actually applicable to any activity needing analysis and reporting.

If you have any questions or comments about analytical SQL then feel free to contact me via this blog.


Friday Dec 13, 2013

New! Oracle DW-Big Data Monthly Roundup Magazine

Are you interested in learning how Oracle customers are taking advantage of Oracle's Data Warehousing and Big Data Platform?  Want to keep up on the latest product releases and how they might impact your organization?  Looking for best practices that describe how to most effectively apply Oracle DW-Big Data technology?

Check out Oracle's new monthly magazine - Oracle DW-Big Data Monthly Roundup. Powered by Flipboard, you can now view all this content on your favorite device.

Let us know what you think!


Thursday Dec 12, 2013

Oracle releases Exadata X4 with optimizations for data warehousing


Support Quote

“Oracle Exadata Database Machine is the best platform on which to run the Oracle Database and the X4 release extends that value proposition,” said Oracle President Mark Hurd. “As private database clouds grow in popularity, the strengths of Oracle Exadata around performance, availability and quality of service set it apart from all alternatives.”

We have just announced the release of the fifth generation of our flagship database machine: Oracle Exadata Database Machine X4. This latest release introduces new hardware and software to accelerate performance, increase capacity, and improve efficiency and quality-of-service for enterprise data warehouse deployments.

Performance of all data warehousing workloads is accelerated by new flash caching algorithms that focus on the table and partition scans that are common in data warehouses. Tables that are larger than flash are now automatically partially cached in flash and read concurrently from both flash and disk to boost throughput.

Other key highlights are:

1) Improved workload management
Exadata X4-2 includes new workload management features that will improve how data warehouse workloads are handled. Exadata now has the unique ability to transparently prioritize requests as they flow from database servers, through network adapters and network switches, to storage, and back.

We are using a new generation of InfiniBand network protocols to ensure that network-intensive workloads such as reporting, batch and backups do not delay response-time sensitive interactive workloads, which is great news for IT teams that have to define and manage service level agreements.

2) Bigger flash cache for even faster performance
We have increased the amount of physical flash to 44 TB per full rack, and thanks to Flash Cache Compression (see below) the capacity of the logical flash cache has increased by 100% to 88 TB per full rack.

3) Hardware driven compression/decompression

A feature that is unique to Exadata is Flash Cache Compression, which transparently compresses database data as it is written into flash, using hardware acceleration to compress and decompress data with zero performance overhead.

4) In-memory processing
For in-memory workloads we increased the maximum memory capacity by 100% to 4 TB in a full rack (using memory expansion kits), which means more workloads will be able to run in-memory with extremely fast response times.

5) Increased support for big data
To support big data projects we have increased the capacity of the high performance disks to over 200 TB per full rack, while for high capacity disks the storage capacity is now 672 TB per full rack. Once you factor in Oracle Exadata's compression technologies, a full rack is capable of storing petabytes of user data.

The full press release is here: http://www.oracle.com/us/corporate/press/2079925

Wednesday Dec 11, 2013

dunnhumby increases customer loyalty with Oracle Big Data

dunnhumby presented at this year's OpenWorld, where they outlined the how and why of data warehousing on Exadata. Our engineered system delivered a performance improvement of more than 24x. dunnhumby pushes its data warehouse platform really hard: with more than 280 billion fact rows and 250 million dimension rows for one large retailer client alone, dunnhumby's massive data requires the best performance the industry has to offer.

In Oracle Exadata, dunnhumby has found that solution. Using Oracle Exadata's advanced Smart Scan technology and robust Oracle Database features, this new environment has empowered its analysts to perform complex ad hoc queries across billions of fact rows and hundreds of millions of dimension rows in minutes or seconds, compared to hours or even days on other platforms.

You can download the presentation by Philip Moore - Exadata Datawarehouse Architect, Dunnhumby USA LLC -  from the OpenWorld site, see here: https://oracleus.activeevents.com/2013/connect/sessionDetail.ww?SESSION_ID=3412.

If you missed Philip's session at OpenWorld then we have just released a new video interview with Chris Wones, Director of Data Solutions at dunnhumby. During the interview Chris outlines some of the challenges his team faced when trying to do joined-up analytics across disparate and disconnected data sets, and how Exadata allowed them to bring everything together so that they could run advanced analytical queries that were just not possible before - which meant being able to bid on completely new types of contracts. The combination of Exadata and Oracle Advanced Analytics is delivering real business benefit to dunnhumby and its customers.

For more information about Oracle's Advanced Analytics option check out Charlie Berger's advanced analytics blog: http://blogs.oracle.com/datamining and Charlie's twitter feed: https://twitter.com/CharlieDataMine

To watch the video click on the image: 

Dunnhumby

If the video does not start follow this link: http://medianetwork.oracle.com/video/player/2889835899001

Tuesday Dec 03, 2013

Big Data - Real and Practical Use Cases: Expand the Data Warehouse

As a follow-on to the previous post (here) on use cases, follow the link below to a recording that explains how to go about expanding the data warehouse into a big data platform.

The idea behind it all is to cover best practices for adding to the existing data warehouse and expanding the system (shown in the figure below) to deal with:

  • Real-time ingest and reporting on large volumes of data
  • A cost-effective strategy to analyze all data, across types and at large volumes
  • Delivering SQL access and analytics to all data
  • Delivering the best query performance to match the business requirements

Roles of the Components

To access the webcast:

  • Register on this page
  • Scroll down and find the session labeled "Best practices for expanding the data warehouse with new big data streams"
  • Have fun
If you want the slides, go to slideshare.