Monday Apr 25, 2016

SQL Pattern Matching Deep Dive - Part 3, greedy vs. reluctant quantifiers

Welcome to the third post in this deep-dive series on SQL pattern matching using the MATCH_RECOGNIZE feature that is part of Database 12c. 

In the first part of this series we looked at a wide range of topics including ensuring query consistency, how to correctly use predicates and how to manage sorting. In the second part we looked at using the built-in measures to understand how a data set is matched to a pattern. 

In this post I am going to review the concepts of greedy and reluctant quantifiers. I will break this down into three areas: 1) an overview of regular expressions, 2) understanding quantifiers, and 3) greedy vs. reluctant quantifiers. The examples in this post use the built-in measures to help show the difference between greedy and reluctant matching. If you are not familiar with the MATCH_NUMBER() function or the CLASSIFIER() function then please take some time to read the second post in this series.
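
To make the difference concrete, here is a minimal sketch of the two quantifiers side by side, assuming the classic ticker(symbol, tstamp, price) example table (which is not part of this excerpt):

```sql
SELECT symbol, tstamp, price, mno, var
FROM ticker
MATCH_RECOGNIZE (
  PARTITION BY symbol
  ORDER BY tstamp
  MEASURES MATCH_NUMBER() AS mno,   -- which match each row belongs to
           CLASSIFIER()   AS var    -- which variable mapped each row
  ALL ROWS PER MATCH
  -- Reluctant: DOWN*? maps as few rows as possible.
  -- Swap in DOWN* to see the greedy behaviour.
  PATTERN (STRT DOWN*? UP)
  DEFINE
    DOWN AS price < PREV(price),
    UP   AS price > PREV(price)
);
```

With the reluctant DOWN*? the matching gives up rows as soon as UP can match; the greedy DOWN* keeps consuming falling prices for as long as possible, and the CLASSIFIER() column makes the difference visible row by row.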

Overview of regular expressions

[Read More]

Tuesday Apr 12, 2016

SQL Pattern Matching Deep Dive - Part 2, using MATCH_NUMBER() and CLASSIFIER()

Welcome to the second post in this deep dive series on SQL pattern matching using the new MATCH_RECOGNIZE feature that is part of Database 12c. In the first part of this series we looked at the areas of ensuring query consistency, how to correctly use predicates and how to manage sorting.

In this post I am going to review the two built-in measures that we have provided to help you understand how your data set is mapped to the pattern that you have defined. This post breaks down into three areas: 1) a review of the built-in measures, 2) understanding how to control the output (the number of rows returned), and 3) some examples that bring these two topics together...
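
As a quick sketch of the two measures in action (again assuming the classic ticker(symbol, tstamp, price) example table, which is not part of this excerpt):

```sql
SELECT symbol, tstamp, price, mno, var
FROM ticker
MATCH_RECOGNIZE (
  PARTITION BY symbol
  ORDER BY tstamp
  MEASURES MATCH_NUMBER() AS mno,   -- sequential number of the match
           CLASSIFIER()   AS var    -- pattern variable mapped to this row
  ALL ROWS PER MATCH                -- return every row of every match
  PATTERN (STRT DOWN+ UP+)
  DEFINE
    DOWN AS price < PREV(price),
    UP   AS price > PREV(price)
);
```

Switching ALL ROWS PER MATCH to ONE ROW PER MATCH is the output control mentioned above: it collapses each match to a single summary row, at which point CLASSIFIER() reports only the last variable mapped in the match.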

[Read More]

Monday Mar 21, 2016

SQL Pattern Matching deep dive - Part 1

There has been quite a lot of interest in the new 12c MATCH_RECOGNIZE feature for SQL pattern matching. Therefore, I thought now would be the perfect time to start a series of quick deep dive posts that explain how SQL pattern matching works. Over the coming weeks I will cover the following topics in a series of posts.

This is the start of a series of posts based on a presentation that I put together for the recent annual BIWA conference at Oracle HQ. The Oracle BI, DW and Analytics user community always puts on a great conference and this year was the best yet. You can download any or all of the presentations from this year’s conference by following this link. My pattern matching deep dive presentation started life about a year ago as a post covering some of the new keywords in the explain plan that are linked to pattern matching, see here. It has now expanded to cover a much wider range of topics.
The aim of this group of posts is to help you understand the underlying mechanics of the MATCH_RECOGNIZE clause. During these posts we will explore key concepts such as: how to get consistent results, using the built-in debugging functions, deterministic vs. non-deterministic state machines, back-tracking (what it is and how to identify when it is occurring), and finally greedy vs. reluctant quantifiers. If you need a quick refresher on how MATCH_RECOGNIZE works then I would recommend that you take a look at the following links.
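
As a first taste of the consistency topic, here is a minimal sketch (using a hypothetical ticker table, here with an extra unique id column) of why the ORDER BY inside MATCH_RECOGNIZE matters: if the sort allows ties, rows can be presented to the pattern in a different order from one run to the next, so the safest approach is to add a unique tie-breaking column.

```sql
SELECT *
FROM ticker
MATCH_RECOGNIZE (
  PARTITION BY symbol
  ORDER BY tstamp, id   -- id assumed unique: breaks ties on tstamp so that
                        -- row order, and therefore matching, is repeatable
  MEASURES MATCH_NUMBER() AS mno
  ONE ROW PER MATCH
  PATTERN (DOWN+ UP+)
  DEFINE
    DOWN AS price < PREV(price),
    UP   AS price > PREV(price)
);
```

[Read More]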

Friday Feb 19, 2016

Big Data Lite 4.4.0 is now available on OTN

It's now available for download on OTN. Check out this VM to help you learn about Oracle's big data platform.
[Read More]

Friday Nov 20, 2015

Review of Data Warehousing and Big Data at #oow15

This year OpenWorld was bigger, more exciting and packed with sessions about the very latest technology and product features. Most importantly, both data warehousing and Big Data were at the heart of this year’s conference across a number of keynotes and a huge number of general sessions. Our hands-on labs were all completely full as people got valuable hands-on time with our most important new features. The key focus areas at this year’s conference were:
  • Database 12c for Data Warehousing
  • Big Data and the Internet of Things 
  • Cloud, from on-premise deployments to the Cloud to running hybrid Cloud systems
  • Analytics and SQL, which continue to evolve to enable more and more sophisticated analysis
All these topics appeared across the main keynote sessions, including live on-stage demonstrations of how many of our new features can be used to increase the performance and analytical capability of your data warehouse and big data management system. Check out the on-demand videos for the keynotes and executive interviews....
[Read More]

Tuesday Apr 14, 2015

Statement of Direction -- Big Data Management System

Click here to start reading the Full Statement of Direction. 

Introduction: Oracle Big Data Management System Today 

As today's enterprises embrace big data, their information architectures must evolve. Every enterprise has data warehouses today, but the best-practices information architecture embraces emerging technologies such as Hadoop and NoSQL. Today’s information architecture recognizes that data not only is stored in increasingly disparate data platforms, but also in increasingly disparate locations: on-premises and potentially multiple cloud platforms. The ideal of a single monolithic ‘enterprise data warehouse’ has faded as a new, more flexible architecture has emerged. Oracle calls this new architecture the Oracle Big Data Management System, and today it consists of three key components:

  • The data warehouse, running on Oracle Database and Oracle Exadata Database Machine, is the primary analytic database for storing much of a company’s core transactional data: financial records, customer data, point-of-sale data and so forth. Despite now being part of a broader architecture, the demands on the RDBMS for performance, scalability, concurrency and workload management are greater than ever; Oracle Database 12c introduced Oracle Database In-Memory (with columnar tables, SIMD processing, and advanced compression schemes) as the latest in a long succession of warehouse-focused innovations. The market-leading Oracle Database is the ideal starting point for customers to extend their architecture to the Big Data Management System.
  • The ‘data reservoir’, hosted on Oracle Big Data Appliance, will augment the data warehouse as a repository for the new sources of large volumes of data: machine-generated log files, social-media data, and videos and images -- as well as a repository for more granular transactional data or older transactional data which is not stored in the data warehouse. Oracle’s Big Data Management System embraces complementary technologies and platforms, including open-source technologies: Oracle Big Data Appliance includes Cloudera’s Distribution of Hadoop and Oracle NoSQL Database for data management.
  • A ‘franchised query engine,’ Oracle Big Data SQL, enables scalable, integrated access in situ to the entire Big Data Management System. SQL is the accepted language for day-to-day data access and analytic queries, and thus SQL is the primary language of the Big Data Management System. Big Data SQL enables users to combine data from Oracle Database, Hadoop and NoSQL sources within a single SQL statement, as shown in the sketch after this list. Leveraging the architecture of Exadata Storage Software and the SQL engine of the Oracle Database, Big Data SQL delivers high-performance access to all data in the Big Data Management System.
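
As an illustration of what a single statement spanning these platforms can look like, here is a minimal sketch. The table and directory names are hypothetical; it assumes a Hive table logs.web_logs exposed through Big Data SQL’s ORACLE_HIVE external table driver alongside a conventional customers table in the warehouse:

```sql
-- External table over a Hive source (all names are illustrative).
CREATE TABLE web_logs (
  cust_id  NUMBER,
  page     VARCHAR2(200),
  log_time TIMESTAMP
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_HIVE
  DEFAULT DIRECTORY default_dir   -- assumes this directory object exists
  ACCESS PARAMETERS (com.oracle.bigdata.tablename: logs.web_logs)
);

-- One SQL statement joining warehouse data with Hadoop-resident data.
SELECT c.cust_name, COUNT(*) AS page_views
FROM   customers c
JOIN   web_logs  w ON (w.cust_id = c.cust_id)
GROUP  BY c.cust_name
ORDER  BY page_views DESC;
```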

Using this architecture, the Oracle Big Data Management System combines the performance of Oracle’s market-leading relational database, the power of Oracle’s SQL engine, and the cost-effective, flexible storage of Hadoop and NoSQL. The result is an integrated architecture for managing Big Data, providing all of the benefits of Oracle Database, Exadata, and Hadoop, without the drawbacks of independently-accessed data repositories.  

Note that the scope of this statement of direction is the data platform for Big Data. An enterprise Big Data solution would also comprise big data tools and big data applications built upon this data platform.

Read the full Statement of Direction -- Big Data Management System here.


Wednesday Mar 18, 2015

Production workloads blend Cloud and On-Premise Capabilities

Prediction #7 - blending production workloads across cloud and on-premise - in Oracle's Enterprise Big Data Predictions 2015 is a tough nut to crack. Yet we at Oracle think this is really the direction we all will go. Sure, we can debate the timing, and whether or not this happens in 2015, but it is something that will come to all of us who are looking towards that big data future. So let’s discuss what we think is really going to happen over the coming years in the big data and cloud world.

Reality #1 – Data will live both in the cloud and on-premise

We see this today. Organizations run Human Capital Management systems in the cloud and integrate these with data from outside cloud-based systems (think, for example, LinkedIn, staffing agencies, etc.) while their general data warehouses and new big data systems are all deployed as on-premise systems. We also see the example in the prediction where various auto dealer systems uplink into the cloud to enable the manufacturer to consolidate all of their disparate systems. This data may be translated into data warehouse entries and possibly live in two worlds – both in the cloud and on-premise – for deep-dive analytics or in aggregated form.

Reality #2 – Hybrid deployments are difficult to query and optimize

We also see this today, and it is one of the major issues of living in the hybrid world of cloud and on-premise. A lot of the issues are driven by low-level technical limitations, specifically in network bandwidth and upload/download capacity into and out of the cloud environment. The other challenges are really (big) data management challenges, in that they go to the art of running queries across two ecosystems with very different characteristics. We see a trend to use engineered systems on-premise, which delivers optimized performance for the applications, but in the cloud we often see virtualization pushing the trade-off towards ease of deployment and ease of management. These completely different ecosystems make optimization of queries across them very difficult.

Solution – Equality brings optimizations to mixed environments

As larger systems like big data and data warehouse systems move to the cloud, better performance becomes a key success criterion. Oracle is uniquely positioned to drive both standardization and performance optimizations into the cloud by deploying on engineered systems like Oracle Exadata and Oracle Big Data Appliance. Deploying engineered systems enables customers to run large systems in the cloud, delivering the performance they see today in on-premise deployments. This then means that we do not live in a world divided into slow and fast, but in a world of fast and fast.
This equivalence also means that we have the same functionality in both worlds, and here we can sprinkle in some – future – Oracle magic, where we start optimizing queries to take into account where the data lives, how fast we can move it around (the dreaded network bandwidth issue) and where we need to execute code. Now, how are we going to do this? That is a piece of magic, and you will just need to wait a bit… suffice it to say we are hard at work on solving this challenging topic.

Monday Mar 09, 2015

Why SQL Part 2 - It has a powerful framework

In the first part (Why SQL is the natural language for data analysis) of this series of blog posts I explained why analytics is important to the business, how the volume of data along with the types of data is continuing to expand and why that makes it vital that you select the right language for data analysis. Many of us work with SQL every day and take for granted many of its unique features, its power, flexibility and the richness of the analytics.

This familiarity with SQL means that sometimes we are a bit slow to stop some of our projects from investing in other big data languages such as MapReduce, Impala, Spark, Java and many of the new generation of SQL-like open source projects. While many of these new languages are considered “cool”, it is very easy to end up building new proprietary data silos, investing in a language that is eventually replaced by another open source project, or spending time and effort on code that ultimately fails to deliver the required analytics.

One of the aims of this series of blog posts is to refresh your memory about why SQL has been so successful in the area of analytics. SQL has four unique features that make it perfect for data analysis...
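
Analytic (window) functions are a good example of that richness. Here is a minimal sketch, assuming a hypothetical sales(region, sale_date, amount) table, of how much analysis a few declarative lines of SQL can express:

```sql
-- Running total and share-of-region for every sale, in a single pass.
SELECT region,
       sale_date,
       amount,
       SUM(amount) OVER (PARTITION BY region
                         ORDER BY sale_date)  AS running_total,
       RATIO_TO_REPORT(amount)
         OVER (PARTITION BY region)           AS share_of_region
FROM   sales
ORDER  BY region, sale_date;
```

Expressing the same logic in a procedural language would mean hand-coding the partitioning, sorting and accumulation that the database handles for us here.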

[Read More]

Tuesday Mar 03, 2015

Why SQL is the natural language for data analysis

Analytics is a must-have component of every corporate data warehousing and big data project. It is the core driver for the business: the development of new products, better targeting of customers with promotions, hiring of new talent and retention of existing key talent. Yet the analysis of “big data environments” in particular - data stored and processed outside of classical relational systems - continues to be a significant challenge for the majority of companies. According to Gartner, 72% of companies are planning to increase their expenditure on big data yet 55% state they don’t have the necessary skills to make use of it.

The objective of this series of articles, which will appear over the coming weeks, is to explain why SQL is the natural language for any kind of data analysis including big data and the benefits that this brings for application developers, DBAs and business users.

[Read More]

Friday Jan 16, 2015

Deploying SAS High Performance Analytics on Big Data Appliance

Oracle and SAS have an ongoing commitment to our joint customers to deliver value-added technology integrations through engineered systems such as Exadata, Big Data Appliance, SuperCluster, Exalogic and ZFS Storage Appliance. Dedicated resources manage and execute on joint SAS/Oracle Database, Fusion Middleware, and Oracle Solaris integration projects, providing customer support, including sizing and IT infrastructure optimization and consolidation. Oracle support teams are onsite at SAS Headquarters in Cary, NC (USA), and in the field on a global basis.

The latest in this effort is to enable our joint customers to deploy SAS High Performance Analytics on Big Data Appliance. This enables SAS users to leverage the lower-cost infrastructure Hadoop offers in a production-ready deployment on Oracle Big Data Appliance. Hear from Paul Kent (VP Big Data, SAS) on some of the details.

Read more on deploying SAS High Performance Analytics on www.oracle.com/SAS. Don't miss the deployment guide and best practices here.

Friday Oct 10, 2014

Review of Data Warehousing and Big Data at #OOW14

Data Warehousing and Big Data were at the heart of this year’s OpenWorld conference, appearing across a number of keynotes and a huge number of general sessions. Our hands-on labs were all completely full as people got valuable hands-on time with our most important new features. The key areas at this year’s conference were:

• Big Data SQL - One Fast SQL Query for All Your Data
• Database In-Memory - Powering the Real-Time Enterprise
• Multitenant - Plug Your Data Warehouse Into the Cloud

All these topics appeared in the main keynote sessions including live on-stage demonstrations of how each feature can be used to increase the performance and analytical capability of your data warehouse.

If you want to revisit the most important sessions, or if you simply missed this year’s conference and want to catch up on all the most important topics, then I have put together a book of the highlights from this year’s conference. The booklet is divided into the following sections:

• Key Messages
• Overview of Oracle Database 12c
• Session Catalogue
• Your Oracle Presenters
• Links
• OpenWorld 2015

You can download my review in PDF format by clicking here. Hope this proves useful and if I missed anything then let me know.

Thursday Oct 09, 2014

One of the ways Oracle is using Big Data

Today, Oracle is using big data technology and concepts to significantly improve the effectiveness of its support operations, starting with its hardware support group. While the company is just beginning this journey, the initiative is already delivering valuable benefits.

In 2013, Oracle’s hardware support group began to look at how it could use automation to improve support quality and accelerate service request (SR) resolution. Its goal is to use predictive analytics to automate SR resolution with 80% to 95% accuracy.

Oracle’s support group gathers a tremendous amount of data. Each month, for example, it logs 35,000 new SRs and receives nearly 6 TB of telemetry data via automated service requests (ASRs)—which represent approximately 18% of all SRs. Like many organizations, Oracle had a siloed view of this data, which hindered analysis. For example, it could look at SRs but could not analyze the associated text, and it could review SRs and ASRs separately, but not together.

Oracle was conducting manual root-cause analysis to identify which types of SRs were the best candidates for automation. This was a time-consuming, difficult, and costly process, and the company looked to introduce big data and predictive analytics to automate insight.

The team knew that it had to walk before it could run. It started by taking information from approximately 10 silos, such as feeds from SRs and ASRs, parts of databases, and customer experience systems, and migrating the information to an Oracle Endeca Information Discovery environment. Using the powerful Oracle Endeca solution, Oracle could look at SRs, ASRs, and associated notes in a single environment, which immediately yielded several additional opportunities for automation. On the first day of going live with the solution, Oracle identified 4% more automation opportunities.

Next, Oracle focused its efforts on gaining insight in near real time, leveraging the parallel processing of Hadoop to automatically feed Oracle Endeca Information Discovery—dramatically improving data velocity. Oracle’s first initiative with this new environment looked at Oracle Solaris SRs. In the first few weeks of that project, Oracle identified automation opportunities that will increase automated SR resolution from less than 1% to approximately 5%—simply by aggregating all of the data in near real time.

Once Oracle proved via these early proofs of concept that it could process data more efficiently and effectively to feed analytical projects, it began to deploy Oracle Big Data Appliance and Oracle Exalytics In-Memory Machine.

Read the entire profile here.

Monday Sep 29, 2014

The End of the Data Scientist Bubble...

Looking around northern California and inside many technology kitchens makes me believe that we are about to see the Data Scientist bubble burst. And then I read the Fortune Magazine article on Peter Thiel - and the excerpt from Zero to One (his new book) in that article - and it dawned on me that this is one of the interesting ways to look at the Data Scientist bubble.

Thiel's Classification of Innovation

Without trying to simplify and/or bastardize Mr. Thiel's theory, the example in the Fortune Mag article will make this visible to most people (I hope). In the article the analogy is: going from one typewriter to 100 typewriters is 1 to N; inventing a word processor moves us from 0 to 1. In other words, true innovation dramatically changes things by giving previously unknown power to the masses. It is that innovation that moves us from 0 to 1. Expansion of existing ideas - not true innovation - moves us from 1 to N. Of course, don't take my word on this but read the article or the book...

The Demise of the Human Data Scientist

The above paradigm explains the Data Scientist bubble quite nicely. Once upon a time companies hired a few PhD students who by chance had a degree in statistics, had learned how to program and had figured out how to deal with (large) data sets. These newly minted data scientists proved that there is potential value in mashing data together and running analytics on these newly created data sets, and thus caused a storm of publicity. Companies large and small are now frantically trying to hire these elusive data scientists or, somewhat more down to earth, are creating data scientists (luckily not in the lab) by forming teams that each bring a part of the skillset to the table.

This approach starts to smell pretty much like a whole busload of typewriters being thrown at a well-known data analysis and data wrangling problem. Neither the problem nor the solution is new, nor innovative. Data Scientists are therefore not moving us from 0 to 1...

One could argue that while the data scientist quest is not innovative, at least it solves the problem of doing analytics. Fair, and by some measure correct, but there is one bigger issue with the paradigm of "data scientists will solve our analytics problem" and that is scale. Giving the keys to all that big data to only a few data scientists is not going to work, because these smart and amazing people are now becoming, often unbeknownst to them, an organizational bottleneck to gaining knowledge from big data.

The only real solution, our 0 to 1, is to expose a large number of consumers to all that big data, while enabling these consumers to apply a lot of the cool data science to all that data. In other words, we need to provide tools which include data science smarts. Those tools will enable us to apply the 80% of common data science rules to the 80% of common business problems. This approach drives real business value at scale. With large chunks of issues resolved, we can then focus our few star data scientists on the 20% of problems or innovations that drive competitive advantage and change markets.

My Conclusion

The bubble is bursting because what I am seeing is more and more tools coming to market (soon) that will drive data science into the day-to-day job of all business people. Innovation is not the building of a better tool for data scientists or hiring more of them; instead, the real 0 to 1 innovation is tools that make all of us data scientists and let us solve our own data science problems. The future of Data Science is smarter tools, not smarter humans.

Wednesday Sep 03, 2014

OpenWorld on your iPad and iPhone - UPDATED!

In my last blog post I provided links to our OpenWorld data warehouse web app for smartphones and tablets. Now that the OOW team has released the session schedule (it is now live on the OpenWorld site) I have updated my iPhone and iPad apps to include the list of sessions on a day-by-day basis (Sunday, Monday, Tuesday, Wednesday, Thursday). The list of sessions can still be viewed in subject area order (data warehousing, performance and scalability, analytics, unstructured data, industry models and big data) within the app via the “Switch to subject view” link in the top left part of the screen.

I have also added a location map which can be viewed by clicking on the linked text, “View location map”, which is in the top right part of the screen on each application. The location map that is available within both the iPad and iPhone apps is shown below:

[Image: OOW location map]

If you want to run these updated web apps on your smartphone and/or tablet then you can reuse the existing links that I published in my last blog post. If you missed that post then follow these links:

Android users: I have tested the app on Android and there appears to be a bug in the way the Chrome browser displays frames, since scrolling within frames does not work. The app does work correctly if you use either the Android version of the Opera browser or the standard Samsung browser on Samsung devices.

If you have any comments about the app (content you would like to see) then please let me know. Enjoy OpenWorld and, if you have time, it would be great to see you if you want to stop by at the Parallel Execution and Analytical SQL demo booth.

Tuesday Jul 22, 2014

StubHub Taps into Big Data for Insight into Millions of Customers’ Ticket-Buying Patterns, Fraud Detection, and Optimized Ticket Prices

The benefits of Big Data at StubHub:

• StubHub enabled data scientists to work directly with customer-related data—such as ticket-purchasing history—inside the database, and to use database options to explore the data graphically, build and evaluate multiple data-mining models, and deploy predictions and insights throughout the enterprise—drastically improving StubHub’s agility and responsiveness
• Developed highly targeted ticket promotional campaigns and offers by having the ability to calculate 180 million customers’ lifetime value (or propensity) instead of just 20,000 values at a time
• Used the Oracle R Enterprise component of Oracle Advanced Analytics—an Oracle Database option—to reduce a fraud issue by up to 90%

Read more or watch the video:

About

The Data Warehouse Insider is written by the Oracle product management team and sheds light on all things data warehousing and big data.
