Friday Apr 29, 2016
Monday Apr 25, 2016
By Klaker-Oracle on Apr 25, 2016
Welcome to the third post in this deep-dive series on SQL pattern matching using the MATCH_RECOGNIZE feature that is part of Database 12c.
In the first part of this series we looked at a wide range of topics including ensuring query consistency, how to correctly use predicates and how to manage sorting. In the second part we looked at using the built-in measures to understand how a data set is matched to a pattern.
In this post I am going to review the concepts of greedy and reluctant quantifiers. I will break this down into a number of areas: 1) overview of regular expressions, 2) understanding quantifiers, and 3) greedy vs. reluctant quantifiers. The examples in this post use the built-in measures to help show the difference between greedy and reluctant matching. If you are not familiar with the MATCH_NUMBER() function or the CLASSIFIER() function then please take some time to read the second post in this series.
Overview of regular expressions[Read More]
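The difference between greedy and reluctant quantifiers can be sketched outside the database with an ordinary regular expression. The snippet below uses Python's re module purely as a stand-in: MATCH_RECOGNIZE applies quantifiers to row pattern variables rather than to characters, so treat this as an analogy and not as the feature itself. The sample string is an assumption made up for illustration.

```python
import re

# Hypothetical input: four consecutive 'a' characters.
text = "aaaa"

# Greedy quantifier: a+ consumes as many 'a' characters as it can.
greedy = re.match(r"a+", text).group()

# Reluctant (lazy) quantifier: a+? stops as soon as the pattern is satisfied.
reluctant = re.match(r"a+?", text).group()

print(greedy)     # aaaa
print(reluctant)  # a
```

The only change between the two patterns is the trailing question mark, which is also how a quantifier is marked as reluctant in a MATCH_RECOGNIZE pattern.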
Monday Apr 18, 2016
By Alexey Filanovskiy-Oracle on Apr 18, 2016
Using Bloom filters for join operations:
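As a minimal sketch of the general idea, and not of how Big Data SQL implements it, a Bloom filter can be built from the join keys of the smaller table and used to discard most non-matching rows of the larger table before the join itself. The class, table contents, filter size and hash count below are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Tiny illustrative Bloom filter (hypothetical sizes, not tuned)."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive several bit positions per key by salting a SHA-256 hash.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


# Made-up sample data: (key, country) and (key, amount).
small_table = [(1, "UK"), (2, "US")]
big_table = [(1, 100), (3, 250), (2, 75), (4, 10), (1, 30)]

# Build the filter from the small side of the join.
bf = BloomFilter()
for key, _ in small_table:
    bf.add(key)

# Pre-filter the big side; false positives are possible, false negatives are not.
candidates = [row for row in big_table if bf.might_contain(row[0])]

# The real join still checks the key, which removes any false positives.
lookup = dict(small_table)
joined = [(k, lookup[k], amount) for k, amount in candidates if k in lookup]
print(joined)
```

The filter never drops a row that has a match, so the join result is unchanged; the benefit is that fewer rows from the large table need to reach the join.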
Friday Apr 15, 2016
By Klaker-Oracle on Apr 15, 2016
We are starting to see a significant change in the way we analyze data as a result of the growth of interest in big data and the newer concept of the Internet of Things. Ever since databases were first created everyone has been obsessed, quite rightly so, with ensuring queries returned the correct answer - i.e. precise, accurate answers. This key requirement is derived from the need to run operational, transactional applications. If we check our bank balance online we want the figure we see to be accurate right down to the last cent, and for a good reason. Yet increasingly, as part of both our online and offline experiences, we deal with numbers that are not 100% accurate and somehow we manage to make good use of these approximate answers. Here are a couple of examples of where we are already using approximations: route planning on our smartphones and crowd counting information in newspapers...[Read More]
Thursday Apr 14, 2016
By Alexey Filanovskiy-Oracle on Apr 14, 2016
Big Data SQL is a way to access data stored in HDFS through Oracle RDBMS, using the Oracle external table mechanism. In the context of security, "table" is the key word: it means that you can apply standard security approaches to those tables. Today I want to give you a couple of examples with:
- Oracle Data Redaction features.
Oracle Data Redaction
I don't want to replace the Oracle Data Redaction documentation with this blog; all the available information can be found there. Let me just remind you of the main idea of this feature. Databases very often contain sensitive data, such as credit card numbers, SSNs or other personal information. It can be useful to keep this data in its unchanged format to resolve various issues with the billing department, but at the same time other departments (like the call center) may need only part of this information (like the last 4 digits of a credit card), and for security compliance you are not allowed to show them the original data.
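The effect described above, where a call center user sees only the last 4 digits of a card number, can be sketched in a few lines. This is a plain Python illustration of partial masking, not the Oracle Data Redaction API; the function name, its parameters and the sample card number are all hypothetical.

```python
def redact_card_number(card_number, visible=4, mask_char="*"):
    """Mask every digit except the last `visible` ones, keeping separators."""
    total_digits = sum(1 for c in card_number if c.isdigit())
    keep_from = total_digits - visible  # index of the first digit left visible

    masked = []
    seen = 0  # number of digits processed so far
    for c in card_number:
        if c.isdigit():
            masked.append(c if seen >= keep_from else mask_char)
            seen += 1
        else:
            masked.append(c)  # dashes and spaces pass through unchanged
    return "".join(masked)


# Made-up card number, for illustration only.
print(redact_card_number("4111-1111-1111-1234"))  # ****-****-****-1234
```

In the database the stored value stays unchanged and the masking is applied when the data is queried, which is why the billing department can still see the original number.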
Wednesday Apr 13, 2016
By Klaker-Oracle on Apr 13, 2016
Yes it's that time of year again! If you have a story to tell about data warehousing, big data and SQL analytics then we want to hear from you because the OpenWorld 2016 call for presentations is now open. Mark your calendars: this year's Oracle OpenWorld conference will be held on September 18 - 22, 2016 at the Moscone Center in San Francisco.
We are looking for proposals that describe insights and improvements that attendees can put to use in their own jobs: exciting innovations, strategies to modernize their business, different or easier ways to implement key features, unique use cases, lessons learned, the best of best practices...[Read More]
Tuesday Apr 12, 2016
By Klaker-Oracle on Apr 12, 2016
Welcome to the second post in this deep dive series on SQL pattern matching using the new MATCH_RECOGNIZE feature that is part of Database 12c. In the first part of this series we looked at the areas of ensuring query consistency, how to correctly use predicates and how to manage sorting.
In this post I am going to review the two built-in measures that we have provided to help you understand how your data set is mapped to the pattern that you have defined. This post will break down into three areas: 1) a review of the built-in measures, 2) understanding how to control the output (number of rows returned) and lastly I will bring these two topics together with some examples...[Read More]
Tuesday Mar 22, 2016
By Alexey Filanovskiy-Oracle on Mar 22, 2016
Monday Mar 21, 2016
By Klaker-Oracle on Mar 21, 2016
This is the start of a series of posts based on a presentation that I put together for the recent annual BIWA conference at Oracle HQ. The Oracle BI, DW and Analytics user community always puts on a great conference and this year was the best yet. You can download any or all of the presentations from this year’s conference by following this link. My pattern matching deep dive presentation started life about a year ago as a post covering some of the new keywords in the explain plan that are linked to pattern matching, see here. It has now expanded to cover a much wider range of topics.
The aim of this group of posts is to help you understand the underlying mechanics of the MATCH_RECOGNIZE clause. During these posts we will explore key concepts such as: how to get consistent results, using built-in debugging functions, deterministic vs. non-deterministic state machines, back-tracking (what is it and how to identify when it is occurring), and finally greedy vs. reluctant quantifiers. If you need a quick refresher on how MATCH_RECOGNIZE works then I would recommend that you take a look at the following links[Read More]
Thursday Mar 17, 2016
By Mgubar-Oracle on Mar 17, 2016
In summary, Big Data SQL 3.0:
- Expands support for Hadoop platforms - covering Hortonworks HDP, Cloudera CDH on commodity hardware as well as on Oracle Big Data Appliance
- Expands support for database platforms - covering Oracle Database 12c on commodity hardware as well as on Oracle Exadata
- Improves performance through new features like Predicate Push-Down on top of Smart Scan and Storage Indexes
To learn more:
- Big Data SQL Landing Page on oracle.com
- Data Sheet for Big Data SQL 3.0
- White paper on Big Data SQL
- Big Data SQL Documentation
Wednesday Mar 16, 2016
By Jean-Pierre Dijcks-Oracle on Mar 16, 2016
Oracle Maximum Availability Architecture (MAA) is Oracle's best practices blueprint based on proven Oracle high availability technologies, along with expert recommendations and customer experiences. MAA best practices have been highly integrated into the design and operational capability of Oracle Big Data Appliance, and together they provide the most comprehensive highly available solution for Big Data.
Oracle MAA papers are published at the MAA home page of the Oracle Technology Network (OTN) website. Oracle Big Data Appliance (BDA) Maximum Availability Architecture is a best-practices blueprint for achieving an optimal high-availability deployment using Oracle high-availability technologies and recommendations.
The Oracle BDA MAA exercise for this paper was executed on Oracle Big Data Appliance and Oracle Exadata Database Machine to validate high availability and to measure downtime in various outage scenarios. The current release of this technical paper covers the first phase of the overall Oracle BDA MAA project. The project comprises the following two phases:
Phase 1: High Availability and Outage scenarios at a single site
Phase 2: Disaster Recovery Scenarios across multiple sites
Tuesday Mar 15, 2016
Sunday Mar 06, 2016
By Alexey Filanovskiy-Oracle on Mar 06, 2016
Many customers keep asking me about a "default" (single) compression codec for Hadoop. The answer to this question is actually not so easy, so let me explain why.
Bzip2 or not Bzip2?
In my previous blog post I published compression-rate results for several compression codecs in Hadoop. Based on those results you may think that it’s a good idea to compress everything with bzip2. But be careful with this. Within the same research, I noted that bzip2 is on average 3 times slower than gzip for both querying (decompressing) and archiving (compressing) data (not surprising given the complexity of the algorithm). Are you ready to sacrifice performance? I think it will depend on the compression benefits derived from bzip2 and the frequency of querying this data (compression speed is not so important once data is stored in Hadoop systems, since you usually compress data once and read it many times). On average, bzip2 compresses 1.6 times better than gzip. But, again, my research showed that sometimes you can achieve 2.3 times better compression, while other times you may save only 9% of the disk space (and performance is still much worse compared to gzip and other codecs). The second factor to keep in mind is the frequency of data querying and your performance SLAs. If you don’t care about query performance (don’t have any SLAs) and you select this data very rarely, then bzip2 could be a good candidate. Otherwise consider other options. I encourage you to benchmark your own data and decide for yourself “Bzip2 or not Bzip2”.[Read More]
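A very small starting point for benchmarking your own data might look like the sketch below. It uses Python's standard gzip and bz2 modules on a single machine, so it only approximates what the Hadoop codecs do on a cluster; the benchmark helper and the synthetic sample data are assumptions, and you would replace the sample with a representative slice of your real files.

```python
import bz2
import gzip
import time


def benchmark(name, compress, decompress, data):
    """Time one compress/decompress round trip and report the ratio."""
    start = time.perf_counter()
    packed = compress(data)
    compress_s = time.perf_counter() - start

    start = time.perf_counter()
    restored = decompress(packed)
    decompress_s = time.perf_counter() - start

    if restored != data:
        raise ValueError(f"{name} round trip did not restore the input")

    return {
        "codec": name,
        "ratio": len(data) / len(packed),
        "compress_s": compress_s,
        "decompress_s": decompress_s,
    }


# Synthetic, highly repetitive sample; real data will behave differently.
sample = b"2016-04-25,order,12345,San Francisco,shipped\n" * 20000

results = [
    benchmark("gzip", gzip.compress, gzip.decompress, sample),
    benchmark("bzip2", bz2.compress, bz2.decompress, sample),
]

for r in results:
    print(
        f"{r['codec']}: ratio {r['ratio']:.1f}, "
        f"compress {r['compress_s']:.3f}s, decompress {r['decompress_s']:.3f}s"
    )
```

Comparing the decompression times against the compression ratios on your own data gives a first indication of whether the extra space saved by bzip2 is worth the slower reads.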
Friday Feb 19, 2016
By Mgubar-Oracle on Feb 19, 2016
Several videos describing the value of graph analyses are now available from our Oracle Big Data Spatial and Graph + Oracle Labs teams.
Check out this blog post for details :).
The data warehouse insider is written by the Oracle product management team and sheds light on all things data warehousing and big data.
- Big Data SQL Quick Start. Predicate Push Down - Part6.
- SQL Pattern Matching Deep Dive - Part 3, greedy vs. reluctant quantifiers
- Common Distribution Methods in Parallel Execution
- Big Data SQL Quick Start. Joins. Bloom Filter and other features - Part5.
- Is an approximate answer just plain wrong?
- Big Data SQL Quick Start. Security - Part4.
- Oracle OpenWorld 2016 call for papers is OPEN!
- SQL Pattern Matching Deep Dive - Part 2, using MATCH_NUMBER() and CLASSIFIER()
- In-Memory Parallel Query
- Data loading into HDFS - Part2. Data movement from the Oracle Database to the HDFS