Tuesday Jan 31, 2017

How to intelligently aggregate approximations

Simple abacus with coloured beads

The growth of low-cost storage platforms has allowed many companies to actively seek out new external data sets and combine them with internal historical data that stretches back over a very long time frame. As both the type and the volume of data continue to grow, the challenge for many businesses is how to process this ever-expanding pool of data while still making timely decisions based on all the available data.
(Image above courtesy of http://taneszkozok.hu/)
In previous posts I have discussed whether an approximate answer is just plain wrong and whether approximate answers really are the best way to analyze big data. As with the vast majority of data analysis, at some point there is going to be a need to aggregate a data set to get a higher-level view across various dimensions. When working with results from approximate queries, dealing with aggregations can get a little complicated because it is not possible to “reuse” an aggregated approximate result as a basis for aggregating the results to an even higher level across the various dimensions of the original query. Obtaining a valid approximate result set requires a query to rescan the source data and compute the required analysis for the given combination of levels. Just because I have a result set that contains a count of the number of unique products sold this week at the county level does not mean that I can simply reuse that result set to determine the number of distinct products sold this week at the state level. In many cases you cannot just roll up aggregations of aggregations. Or maybe you can...
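To see why distinct counts do not roll up, consider a tiny illustration (the data here is entirely made up for the example): summing per-county distinct product counts overstates the state-level distinct count whenever the same product sells in more than one county.

```python
# Hypothetical sales data: products sold this week, per county.
county_sales = {
    "county_a": {"p1", "p2", "p3"},
    "county_b": {"p2", "p3", "p4"},
}

# Per-county distinct counts, as a county-level result set would hold them.
per_county = {county: len(products) for county, products in county_sales.items()}

# Naive "rollup": just add the county-level counts together.
naive_state_count = sum(per_county.values())

# Correct state-level count: go back to the detail data and dedupe
# across counties before counting.
state_products = set().union(*county_sales.values())
true_state_count = len(state_products)

print(naive_state_count)  # 6 -- p2 and p3 are double-counted
print(true_state_count)   # 4 -- {p1, p2, p3, p4}
```

The same arithmetic trap applies whether the counts are exact or approximate, which is why a valid higher-level result needs another pass over the source data rather than a rollup of the lower-level result set.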

[Read More]

Friday Apr 24, 2015

Managing overflows in LISTAGG

This is an interesting problem that has come up a few times in discussions (and I think it has been mentioned on the SQL forums as well). When using LISTAGG on very large data sets you can sometimes create a list that is too long and consequently get an "ORA-01489: result of string concatenation is too long" error.

Obviously, it is possible to determine in advance whether the error is going to occur using some bespoke PL/SQL code, and then take appropriate action within your application to manage the rows that contain strings exceeding the VARCHAR2 limit.

Many customers have implemented workarounds to overcome the ORA-01489 error; however, these have mostly involved complex code that impacts performance. Wouldn’t it be great if there were a simple yet elegant way to resolve this issue? Actually there is, and we can use a few of the most recent analytical SQL features. If you are using Database 12c you can make use of the MATCH_RECOGNIZE clause to effectively create chunks of strings that do not exceed the VARCHAR2 limit...
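The post's actual solution uses MATCH_RECOGNIZE in SQL, but the underlying idea is easy to sketch outside the database. The following Python function (a hypothetical illustration, not the post's code) shows the chunking principle: instead of producing one aggregated string that can overflow the limit, emit several strings that each stay within it.

```python
def chunked_listagg(values, sep=",", limit=4000):
    """Concatenate strings into a list of chunks, each at most `limit`
    characters long, rather than one string that could overflow
    (the situation behind ORA-01489). Assumes no single value
    exceeds the limit on its own."""
    chunks, current = [], ""
    for value in values:
        candidate = value if not current else current + sep + value
        if len(candidate) > limit:
            # Adding this value would overflow: close the current chunk
            # and start a new one with the value.
            chunks.append(current)
            current = value
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

# 26 ten-character values, chunked so each piece stays within 50 characters.
vals = [ch * 10 for ch in "abcdefghijklmnopqrstuvwxyz"]
chunks = chunked_listagg(vals, limit=50)
print(all(len(c) <= 50 for c in chunks))  # True: no chunk overflows
```

Each chunk can then be stored or returned as a separate row, which is essentially what the MATCH_RECOGNIZE pattern achieves inside the query itself.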

[Read More]

Friday Mar 27, 2015

Why SQL Part 4 - Intelligent and continuous evolution

In the third part of this series of posts (Why SQL Part 3 - Simple optimization makes life simpler) I explained that by letting the database perform the optimizations at source, where the data is located, it is possible to share tuning improvements across a wide spectrum of front-end systems (BI reports, dashboards, applications etc.). The ability of the database to “own” the optimization and processing characteristics of how a SQL statement is executed is an important element in the success of SQL as a data analysis language. In the whitepaper that forms the basis for this series of blog posts this section is quite brief. This was done for reasons that will become obvious as you work through this blog post.

Now that we have looked at the powerful framework that supports SQL and the simplified optimization layer we can move on to look at how SQL has evolved over time to support ever more sophisticated features and functions and compare this with some of the newer open source frameworks.

[Read More]

The Data Warehouse Insider is written by the Oracle product management team and sheds light on all things data warehousing and big data.
