Wednesday Mar 26, 2014

Oracle Big Data Lite Virtual Machine - Version 2.5 Now Available

Oracle Big Data Appliance Version 2.5 was released last week. There are some great new features in this release, including a continued security focus (on-disk encryption and automated configuration of Sentry for data authorization) and updates to Cloudera's Distribution including Apache Hadoop and Cloudera Manager.

With each BDA release comes a new release of the Oracle Big Data Lite Virtual Machine. Oracle Big Data Lite provides an integrated environment to help you get started with the Oracle Big Data platform. Many Oracle Big Data platform components have been installed and configured, allowing you to begin using the system right away. The following components are included in Oracle Big Data Lite Virtual Machine v2.5:

  • Oracle Enterprise Linux 6.4
  • Oracle Database 12c Release 1 Enterprise Edition (12.1.0.1)
  • Cloudera’s Distribution including Apache Hadoop (CDH4.6)
  • Cloudera Manager 4.8.2
  • Cloudera Enterprise Technology, including:
    • Cloudera RTQ (Impala 1.2.3)
    • Cloudera RTS (Search 1.2)
  • Oracle Big Data Connectors 2.5
    • Oracle SQL Connector for HDFS 2.3.0
    • Oracle Loader for Hadoop 2.3.1
    • Oracle Data Integrator 11g
    • Oracle R Advanced Analytics for Hadoop 2.3.1
    • Oracle XQuery for Hadoop 2.4.0
  • Oracle NoSQL Database Enterprise Edition 12cR1 (2.1.54)
  • Oracle JDeveloper 11g
  • Oracle SQL Developer 4.0
  • Oracle Data Integrator 12cR1
  • Oracle R Distribution 3.0.1

Go to the Oracle Big Data Lite Virtual Machine landing page on OTN to download the latest release.

Monday Jan 27, 2014

Announcing: Oracle Big Data Lite Virtual Machine

You've been hearing a lot about Oracle's big data platform. Today, we're pleased to announce Oracle Big Data Lite Virtual Machine - an environment to help you get started with the platform. And, we have a great OTN Virtual Developer Day event scheduled where you can start using our big data products as part of a series of workshops.



Oracle Big Data Lite Virtual Machine is an Oracle VM VirtualBox virtual machine that contains many key components of Oracle's big data platform, including: Oracle Database 12c Enterprise Edition, Oracle Advanced Analytics, Oracle NoSQL Database, Cloudera Distribution including Apache Hadoop, Oracle Data Integrator 12c, Oracle Big Data Connectors, and more. It's been configured to run on a "developer class" computer; all Big Data Lite needs is a couple of cores and about 5GB of memory to run (this means that your computer should have at least 8GB of total memory). With Big Data Lite, you can develop your big data applications and then deploy them to the Oracle Big Data Appliance. Or, you can use Big Data Lite as a client to the BDA during application development.

How do you get started? Why not start by registering for the Virtual Developer Day scheduled for Tuesday, February 4, 2014 - 9am to 1pm PT / 12pm to 4pm ET / 3pm to 7pm BRT.


There will be 45-minute sessions delivered by product experts (from both Oracle and Oracle ACEs), highlighted by Tom Kyte and Jonathan Lewis' keynote "Landscape of Oracle Database Technology Evolution". Some of the big data technical sessions include:

  • Oracle NoSQL Database Installation and Cluster Topology Deployment
  • Application Development & Schema Design with Oracle NoSQL Database
  • Processing Twitter Data with Hadoop
  • Use Data from a Hadoop Cluster with Oracle Database
  • Make the Right Offers to Customers Using Oracle Advanced Analytics
  • In-DB Map Reduce with SQL/Hadoop
  • Pattern Matching in SQL

Keep an eye on this space - we'll be publishing how-to's that leverage the new Oracle Big Data Lite VM. And, of course, we'd love to hear about the clever applications you build as well!

Wednesday Nov 27, 2013

Big Data - Real and Practical Use Cases

The goal of this post is to explain in a few succinct patterns how organizations can start to work with big data and identify credible and doable big data projects. This goal is achieved by describing a set of general patterns that can be seen in the market today.
Big Data Usage Patterns
The following usage patterns are derived from actual customer projects across a large number of industries and cross the boundary between commercial enterprises and the public sector. These patterns are geographically applicable and technically feasible with today's technologies.
This post will address the following four usage patterns:
  • Data Factory – a pattern that enables an organization to integrate and transform – in a batch method – large, diverse data sets before moving the data into an upstream system like an RDBMS or a NoSQL system. Data in the data factory is possibly transient and the focus is on data processing.
  • Data Warehouse Expansion with a Data Reservoir – a pattern that expands the data warehouse with a large-scale Hadoop system to capture data at a lower grain and higher diversity, which is then fed into upstream systems. Data in the data reservoir is persistent and the focus is on data processing, data storage and the reuse of data.
  • Information Discovery with a Data Reservoir – a pattern that creates a data reservoir for discovery data marts or discovery systems like Oracle Endeca to tap into a wide range of data elements. The goal is to simplify data acquisition into discovery tools and to initiate discovery on raw data.
  • Closed Loop Recommendation and Analytics System – a pattern that is often considered the holy grail of data systems. This pattern combines analytics on historical data with event processing (real-time actions on current events) and closes the loop between the two to continuously improve real-time actions based on the correlation of current and historical events.

Pattern 1: Data Factory

The core business reason to build a Data Factory as presented here is to implement a cost-savings strategy by placing long-running batch jobs on a cheaper system. The project is often funded by not spending money on the more expensive system – for example by switching Mainframe MIPS off – and instead leveraging that cost savings to fund the Data Factory. The first figure shows a simplified implementation of the Data Factory.
As the image below shows, the data factory must be scalable, flexible and (more) cost effective for processing the data. The typical system used to build a data factory is Apache Hadoop or, in the case of Oracle's Big Data Appliance, Cloudera's Distribution including Apache Hadoop (CDH).

[Figure: the Data Factory pattern]

Hadoop (and therefore Big Data Appliance and CDH) offers an extremely scalable environment for processing large data volumes (or a large number of small data sets) and jobs. Most typical is the offload of large batch updates, matching and de-duplication jobs, and so on. Hadoop also offers a very flexible model, where data is interpreted on read rather than on write. This enables a data factory to quickly accommodate all types of data, which can then be processed in programs written in Hive, Pig or MapReduce.
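To make the schema-on-read idea concrete, here is a minimal, hypothetical Java MapReduce sketch (not taken from any Oracle sample): raw tab-delimited log lines are interpreted only when the mapper reads them, and an assumed "status" field in the third column is counted. Field positions are an assumption for illustration.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class StatusCountMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {

  private static final IntWritable ONE = new IntWritable(1);
  private final Text status = new Text();

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Interpret the record on read: split the raw line and pick the
    // (hypothetical) third column; lines that do not parse are skipped.
    String[] fields = value.toString().split("\\t");
    if (fields.length > 2) {
      status.set(fields[2]);
      context.write(status, ONE);
    }
  }
}

Pair this mapper with a standard summing reducer and the job keeps running even as new columns are appended to the raw files, which is exactly the flexibility the data factory relies on.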
As shown above, the data factory is an integration platform, much like an ETL tool. Data sets land in the data factory, batch jobs process the data, and the processed data moves into the upstream systems. These upstream systems include RDBMSs, which are then used for various information needs. In the case of a Data Warehouse this is very close to pattern 2 described below, with the difference that in the data factory data is often transient and removed after the processing is done.
This transient nature of data is not a required feature, but it is often implemented to keep the Hadoop cluster relatively small. The aim is generally just to transform data in a more cost-effective manner.
When the upstream system is a NoSQL database, data is often prepared in a specific key-value format to be served up to end applications like a website. NoSQL databases work really well for that purpose, but the batch processing is better left to the Hadoop cluster.
It is very common for data to flow in the reverse order, or for data from RDBMS or NoSQL databases to flow into the data factory. In most cases this is reference data, like customer master data. In order to process new customer data, this master data is required in the Data Factory.
Because of its low risk profile – the logic of these batch processes is well known and understood – and funding from savings in other systems, the Data Factory is typically an IT department's first attempt at a big data project. The downside of a Data Factory project is that business users see very little benefit, in that they do not get new insights out of big data.

Pattern 2: Data Warehouse Expansion

The common way to drive new insights out of big data is pattern two. Expanding the data warehouse with a data reservoir enables an organization to capture raw data in a system that adds agility to the organization. The pattern is shown below.


[Figure: Data Warehouse expansion with a Data Reservoir]

A Data Reservoir – like the Data Factory from Pattern 1 – is based on Hadoop and Oracle Big Data Appliance, but rather than holding transient data, processing it and handing it off, a Data Reservoir aims to store data at a lower grain, and for a much longer period, than was previously possible.
The Data Reservoir is initially used to capture data, aggregate new metrics and augment (not replace) the data warehouse with new and expansive KPIs or context information. A very typical addition is the sentiment of a customer towards a product or brand, which is added to a customer table in the data warehouse.
The addition of new KPIs or new context information is a continuous process. That is, new analytics on raw and correlated data should find their way into the upstream Data Warehouse on a very regular basis.
As the Data Reservoir grows and becomes known within the organization because of the new KPIs or context, users should start to look at the Data Reservoir as an environment to "experiment" and "play" with data. With some rudimentary programming skills, power users can start to combine various data elements in the Data Reservoir, using for example Hive. This enables the users to verify a hypothesis without the need to build a new data mart. Hadoop and the Data Reservoir thus become an economically viable sandbox for power users, driving innovation, agility and possibly revenue from hitherto unused data.
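As a hedged illustration of that sandbox-style exploration, the sketch below shows a power user testing a hypothesis against the Data Reservoir through the Hive JDBC driver from a small Java client; the HiveServer2 endpoint, table names and columns are all assumptions made up for this example.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ReservoirHypothesis {
  public static void main(String[] args) throws Exception {
    // Load the HiveServer2 JDBC driver; host, port and database are assumptions.
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    Connection conn = DriverManager.getConnection(
        "jdbc:hive2://bda-node01:10000/default", "hive", "");
    Statement stmt = conn.createStatement();
    // Hypothetical check: does average social sentiment differ by customer segment?
    ResultSet rs = stmt.executeQuery(
        "SELECT c.segment, AVG(s.sentiment_score) AS avg_sentiment "
      + "FROM customer_master c JOIN social_feed s ON (c.cust_id = s.cust_id) "
      + "GROUP BY c.segment");
    while (rs.next()) {
      System.out.println(rs.getString(1) + "\t" + rs.getDouble(2));
    }
    rs.close();
    stmt.close();
    conn.close();
  }
}

If the hypothesis holds, the same query can be promoted into a regular feed to the data warehouse; if not, nothing was built that needs to be torn down.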

Pattern 3: Information Discovery

Agility for power users and expert programmers is one thing, but eventually the goal is to enable business users to discover new and exciting things in the data. Pattern 3 combines the data reservoir with a special information discovery system to provide a Graphical User Interface specifically for data discovery. This GUI emulates in many ways how an end user today searches for information on the internet.
To empower a set of business users to truly discover information, they first and foremost require a Discovery tool. A project should therefore always start with that asset.

[Figure: Information Discovery with a Data Reservoir]

Once the Discovery tool (like Oracle Endeca) is in place, it pays to start leveraging the Data Reservoir to feed it. As shown above, the Data Reservoir is continuously fed with new data, and the Discovery tool lets business users create ad-hoc data marts within the discovery environment. Having the Data Reservoir simplifies data acquisition for end users because they only need to look in one place for data.
In essence, the Data Reservoir now drives two different systems: the Data Warehouse and the Information Discovery environment. In practice users will very quickly gravitate to the appropriate system, but no matter which system they use, they now have the ability to drive value from data into the organization.

Pattern 4: Closed Loop Recommendation and Analytics System

So far, most of what was discussed was analytics- and batch-based. But a lot of organizations want to move to a real-time interaction model with their end customers (or, in the world of the Internet of Things, with other machines and sensors).

[Figure: Closed Loop Recommendation and Analytics System]

Hadoop is very good at providing the Data Factory and the Data Reservoir, at providing a sandbox, and at providing massive storage and processing capabilities, but it is less good at doing things in real time. Therefore, to build a closed loop recommendation system – which should react in real time – Hadoop is only one of the components.
Typically the bottom half of the last figure is akin to pattern 2 and is used to catch all data, analyze the correlations between recorded events (detected fraud, for example) and generate a set of predictive models describing something like "if a, b and c occur during a transaction – mark it as suspect and hand it off to an agent". Such a model would, for example, block a credit card transaction.
To make such a system work it is important to use the right technology at both levels. Real-time technologies like Oracle NoSQL Database, Oracle Real Time Decisions and Oracle Event Processing work on the data stream in flight. Oracle Big Data Appliance, Oracle Exadata/Database and Oracle Advanced Analytics provide the infrastructure to create, refine and expose the models.
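As a simplified, hypothetical sketch (the fields and thresholds below are invented, and in a real deployment this logic would live in Oracle Event Processing or Oracle Real Time Decisions rather than hand-written code), the batch-derived rule "if a, b and c then mark as suspect" applied to a transaction in flight might look like this:

public class FraudRuleSketch {

  // Conditions a, b and c stand in for thresholds produced by the batch-side models.
  static boolean isSuspect(double amount, boolean foreignCard, int txPerMinute) {
    boolean a = amount > 5000.0;     // unusually large amount
    boolean b = foreignCard;         // card issued in another country
    boolean c = txPerMinute > 3;     // burst of transactions in the last minute
    return a && b && c;
  }

  public static void main(String[] args) {
    if (isSuspect(7500.0, true, 5)) {
      System.out.println("Transaction marked suspect: block and hand off to an agent");
    } else {
      System.out.println("Transaction approved");
    }
  }
}

The closed loop comes from feeding the outcome of every such decision back into the reservoir, so the next batch run can sharpen the thresholds.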

Summary

Today's big data technologies offer a wide variety of capabilities. Leveraging these capabilities alongside the existing environment and skills already in place, according to the four patterns described, enables an organization to benefit from big data today. It is a matter of identifying the applicable pattern for your organization and then starting on the implementation.

The technology is ready. Are you?

Friday May 10, 2013

Streaming data to and from Hadoop and NoSQL Database

A quick update on some of the integration components needed to build things like M2M (machine-to-machine communication) and to integrate fast-moving data (events) with Hadoop and Oracle NoSQL Database. As of release 11.1.1.7 of the Oracle Event Processing product you now have:

OEP Data Cartridge for Hadoop (the real doc is here)

OEP Data Cartridge for NoSQL Database (the real doc is here)

The fun with these products is that you can now model (in a UI!) how to interact with these systems. For example, you can sink data into Hadoop without impacting the stream logic or stream performance, and you can do a quick CQL (the OEP language) lookup to our NoSQL DB to resolve, for example, a customer profile or status.

More to come, but this is very interesting and a really cool example of making products work together out of the box.

Thursday Apr 11, 2013

Big Data Appliance - more flexibility, same rapid time to value


This week Oracle announced the availability (yes, you can buy and use these systems right away) of the Big Data Appliance X3-2 Starter Rack and the Big Data Appliance X3-2 In-Rack Expansion. You can read the press release here. For those who are interested in the operating specs, it is best to look at the data sheet on OTN.

So what does this mean? In effect it means that you can now start any big data project with an appliance. Whether you are looking to try your hand at your first Hadoop project, or whether you are building your enterprise Hadoop solution with a large number of nodes, you can now get the benefits of Oracle Big Data Appliance. By leveraging Big Data Appliance for all your big data needs (be it Hadoop or Oracle NoSQL Database) you always get:

  • Reduced risk by having the best of Oracle and Cloudera engineering available in an easy to consume appliance
  • Faster time to value by not spending weeks or months building and tuning your own Hadoop system
  • No cost creep for the cluster as your system is set up and configured for a known cost

Assume you want to start your first Hadoop implementation: you can now begin with the BDA Starter Rack – six servers which you can fully deploy for HDFS and MapReduce capabilities (of course we also support HBase, for example). All the services are pre-configured, so you have highly available NameNodes, automatic failover and a balanced approach to leveraging the six servers as Hadoop nodes. As your project grows (and you need more compute power and space to store data) you simply add nodes in chunks of six using the In-Rack Expansion, filling up the rack.

Once the rack is full, you can either add another Starter Rack or add Full Racks to the system. As you do that, Mammoth – the install, configure and patch utility for BDA – ensures that your service nodes are in the appropriate place. For example, once you have two cabinets assigned to a single cluster, Mammoth will move the second NameNode to the second rack for higher fault tolerance.

This new release of Big Data Appliance (the software part of it) also includes Cloudera CDH 4.2 and Cloudera Manager 4.5. On top of that, you can now create multiple clusters on a single BDA Full Rack using just Mammoth, which means you can patch and update individual clusters on that Full Rack. As you add nodes to a cluster, Mammoth lets you choose where to add nodes, how to grow a set of clusters, and so on.

Last but not least, there is more flexibility in how to acquire a Big Data Appliance Full Rack: it is now part of Oracle's Infrastructure as a Service offering, allowing for a smooth capital outflow for your Big Data Appliance.

Friday Jan 25, 2013

Announcing: OTN Big Data Developer Day

Announcing the first Big Data Developer Day: a full day with two tracks and hands-on labs covering all things Big Data at Oracle!

An influx of new data types combined with new approaches for analyzing data is creating untapped growth opportunities that have the potential to transform your business. Oracle is the first vendor to provide a complete and integrated set of enterprise-ready products to address the full spectrum of big data business requirements. Jumpstart your understanding of big data in the enterprise by attending this complimentary one-day hands-on workshop. You will learn from technical experts how to:

  • Write MapReduce on Oracle’s Big Data Platform
  • Manage a Big Data environment
  • Access Oracle NoSQL Database
  • Manage Oracle NoSQL DB Cluster
  • Use data from a Hadoop Cluster with Oracle
  • Develop analytics on big data

Register today to learn these skills, which you can immediately put to use within your organization.

For more information and to sign up (space is limited!), click here.

So if you are in the Bay Area, do come and learn about the coolest new technologies.

Friday Jan 18, 2013

Big Data Appliance X3-2 Updates


Hello world. Wow, time went by too fast. Happy New Year, and here is the long-overdue update on the new Big Data Appliance and its software.

Big Data Appliance X3-2

Both the software and the hardware of the Big Data Appliance got a refresh.

Hardware Update

A good place to start is a quick review of the hardware differences (no price changes!). On a per-node basis, the following is a comparison between the old and new (X3-2) hardware:

              Big Data Appliance v1                       Big Data Appliance X3-2
CPU           2 x 6-Core Intel® Xeon® 5675 (3.06 GHz)     2 x 8-Core Intel® Xeon® E5-2660 (2.2 GHz)
Memory        48GB                                        64GB expandable to 512GB
Disk          12 x 3TB High Capacity SAS                  12 x 3TB High Capacity SAS
InfiniBand    40Gb/sec                                    40Gb/sec
Ethernet      10Gb/sec                                    10Gb/sec
KVM           1 KVM Switch                                N/A (removed)

For all the details on the environmental specifications and other useful information, review the data sheet for Big Data Appliance X3-2. For those wondering what we did with the 2RU now freed up by removing the KVM: that is open space at the top of the rack.

The higher core count gives BDA X3-2 more parallel compute power while saving some 30% in energy and heat.

Software Update

As with the hardware, a good place to start is a quick overview of the software changes, shown in the table below:

Component                     Big Data Appliance v1.1.x Software Stack    Big Data Appliance V2.0.1 Software Stack
Linux                         Oracle Linux 5.6                            Oracle Linux 5.8 with UEK
JDK                           1.6                                         1.6u35
Cloudera CDH                  CDH 3u4                                     CDH 4.1.x
Cloudera Manager              CM 3                                        CM 4.1
Oracle Enterprise Manager     N/A                                         Big Data Appliance Plug-In for Enterprise Manager
R                             Open Source R                               Oracle R Distribution 2.x
Big Data Connectors *         Big Data Connectors 1.1.x                   Big Data Connectors 2.0.x
Oracle NoSQL Database CE **   NoSQL DB 1.x                                NoSQL DB 2.x

* Oracle Big Data Connectors is a separately licensed product which can be pre-installed and pre-configured on BDA
** Oracle NoSQL DB 2.x will be pre-installed in a future update to Mammoth but can be applied manually today

Apart from the version updates, bug fixes and a great number of performance improvements across the entire system, the biggest updates are the inclusion of CDH 4.1.2 and the default setup of highly available NameNodes for Hadoop, Enterprise Manager management of the BDA, the uptake of the Oracle R Distribution and the updates to Oracle NoSQL Database. In a nutshell, these updates deliver the following improvements:

Cloudera CDH 4.1.x

The latest version of CDH and CM deliver:

  • Higher overall performance
  • Highly available NameNodes on the BDA, using quorum-based failover processes instead of an external HA filer solution
  • Vastly expanded management capabilities via CM 4

On top of this, BDA now has both ZooKeeper and Oozie configured out of the box.

Oracle Enterprise Manager

The new Big Data Appliance Plug-In for Enterprise Manager delivers the first end-to-end management of the Hadoop cluster from hardware metrics to software and Hadoop metrics. To achieve the end-to-end management of the system Enterprise Manager delivers all the system metrics users are used to from the Exadata Plug-In for Enterprise Manager. Enterprise Manager enables a seamless transition between the Hardware and high level software monitoring and the expanded Hadoop monitoring and diagnostics from Cloudera Manager. This combination of functionality makes operations for a BDA simpler and allows operations staff to seamlessly switch between their Exadata, Big Data Appliance and other Oracle Engineered systems.

Oracle R Distribution

The big difference between Oracle R Distribution and the open source R distribution is that Oracle R Distribution can dynamically load the math kernel libraries for CPUs from both Intel and AMD. This increases the performance of basic calculations, which in turn increases the performance of the overall R calculations because more math is off-loaded onto the CPUs.

Oracle NoSQL Database 2.x

A great number of new features have been added in NoSQL DB 2.x. Most of these are in both the Community Edition and the Enterprise Edition. Charles Lamb has a nice, concise post describing what is new here.

Big Data Connectors

To close out, Big Data Connectors got a refresh focused on performance, so download the new products and give them a go via this download page. For more information, read the data sheet here.

Sunday Nov 04, 2012

Blueprints for Oracle NoSQL Database

I think that some of the most interesting analytic problems are graph problems. I'm always interested in new ways to store and access graphs. As such, I really like the work being done by TinkerPop to create open source software that makes property graphs more accessible over a wide variety of datastores. Since key-value stores like Oracle NoSQL Database are well suited to storing property graphs, I decided to extend the Blueprints API to work with it. Below I'll discuss some of the implementation details, but you can check out the finished product here: http://github.com/dwmclary/blueprints-oracle-nosqldb.

 What's in a Property Graph? 

In the most general sense, a graph is just a collection of vertices and edges.  Vertices and edges can have properties: weights, names, or any number of other traits.  In an undirected graph, edges connect vertices without direction.  A directed graph specifies that all edges have a head and a tail --- a direction.  A multi-graph allows multiple edges to connect two vertices.  A "property graph" encompasses all of these traits.

Key-Value Stores for Property Graphs

Key-Value stores like Oracle NoSQL Database tend to be ideal for implementing property graphs.  First, if any vertex or edge can have any number of traits, we can treat it as a hash map.  For example:

Vertex["name"] = "Mary"

Vertex["age"] = 28

Vertex["ID"] = 12345

and so on. This is a natural key-value relationship: the key "name" maps to the value "Mary." Moreover, if we maintain two hash maps, one for vertex objects and one for edge objects, we've essentially captured the graph. As such, any scalable key-value store is fertile ground for planting graphs.
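As a short illustrative sketch (in Java, with made-up identifiers rather than the actual Blueprints classes), the two hash maps might look like this:

import java.util.HashMap;
import java.util.Map;

public class PropertyGraphSketch {
  public static void main(String[] args) {
    // One map for vertices, one for edges; each element is itself a
    // map of property names to values.
    Map<String, Map<String, Object>> vertices = new HashMap<String, Map<String, Object>>();
    Map<String, Map<String, Object>> edges = new HashMap<String, Map<String, Object>>();

    Map<String, Object> mary = new HashMap<String, Object>();
    mary.put("name", "Mary");
    mary.put("age", 28);
    mary.put("ID", 12345);
    vertices.put("12345", mary);

    Map<String, Object> knows = new HashMap<String, Object>();
    knows.put("label", "knows");
    knows.put("out", "12345");   // tail vertex
    knows.put("in", "67890");    // head vertex (hypothetical second vertex)
    edges.put("e1", knows);

    System.out.println(vertices.get("12345").get("name")); // prints Mary
  }
}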

Oracle NoSQL Database as a Scalable Graph Database

While Oracle NoSQL Database offers useful features like tunable consistency, what lends it to storing property graphs are the storage guarantees around its key structure. Keys in Oracle NoSQL Database are divided into two parts: a major key and a minor key. The storage guarantee is simple: major keys will be distributed across storage nodes, which could encompass a large number of servers. However, all minor keys which are children of a given major key are guaranteed to be stored on the same storage node. For example, the vertices:

/Personnel/Vertex/1 

and

/Personnel/Vertex/2

may be stored on different servers, but

/Personnel/Vertex/1/-/name

and 

/Personnel/Vertex/1/-/age

will always be on the same server. This means that we can structure our graph database such that retrieving all the properties for a vertex or edge requires I/O from only a single storage node. Moreover, Oracle NoSQL Database provides a storeIterator which allows us to iterate over a huge number of vertices and edges in a scalable fashion. By storing the vertices and edges as major keys, we guarantee that they are distributed evenly across all storage nodes. At the same time we can use a partial major key to iterate over all the vertices or edges (e.g. we search over /Personnel/Vertex to iterate over all vertices).
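Here is a minimal sketch of that key layout using the Oracle NoSQL Database KV API. The store name and helper host:port are assumptions for illustration; the point is that all properties of vertex 1 share the major key /Personnel/Vertex/1 (and so live on one storage node), while the vertices themselves spread across the cluster and can be scanned with a partial major key.

import java.util.Arrays;
import java.util.Iterator;
import oracle.kv.Direction;
import oracle.kv.KVStore;
import oracle.kv.KVStoreConfig;
import oracle.kv.KVStoreFactory;
import oracle.kv.Key;
import oracle.kv.KeyValueVersion;
import oracle.kv.Value;

public class VertexKeySketch {
  public static void main(String[] args) {
    // Store name and helper host are assumptions; adjust for your deployment.
    KVStore store = KVStoreFactory.getStore(
        new KVStoreConfig("kvstore", "localhost:5000"));

    // Major path /Personnel/Vertex/1 with minor paths /name and /age:
    // both records are guaranteed to land on the same storage node.
    Key name = Key.createKey(Arrays.asList("Personnel", "Vertex", "1"),
                             Arrays.asList("name"));
    Key age  = Key.createKey(Arrays.asList("Personnel", "Vertex", "1"),
                             Arrays.asList("age"));
    store.put(name, Value.createValue("Mary".getBytes()));
    store.put(age,  Value.createValue("28".getBytes()));

    // Scan every vertex via the partial major key /Personnel/Vertex.
    Key allVertices = Key.createKey(Arrays.asList("Personnel", "Vertex"));
    Iterator<KeyValueVersion> it =
        store.storeIterator(Direction.UNORDERED, 0, allVertices, null, null);
    while (it.hasNext()) {
      System.out.println(it.next().getKey());
    }
    store.close();
  }
}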

Fork It!

The Blueprints API and Oracle NoSQL Database present a great way to get started using a scalable key-value database to store and access graph data.  However, a graph store isn't useful without a good graph to work on.  I encourage you to fork or pull the repository, store some data, and try using Gremlin or any other language to explore.


Friday Oct 12, 2012

Read how a customer uses Oracle NoSQL Database

For those who have had the pleasure of being in SF for Oracle OpenWorld, you might have seen or heard about this story already. If you did not, here is a great story on how to use Oracle NoSQL Database.

Apart from all the cool technology, I'm just excited that this is a company founded by a football international and dealing with sports data, games and other cool things – an all-things-cool combo in one place.


Wednesday Sep 05, 2012

E-Book on big data (featuring Analysts, Customers and more)

As we are gearing up for OpenWorld, here is a nice e-book on big data to start paging through. It contains Gartner's take on big data, customer and partner interviews and a lot more good info. Enjoy the read so you come prepared for OpenWorld!

Read the E-Book here.

For those coming to Oracle OpenWorld (or the America's Cup races around the same time), you can find big data sessions via this URL.

Enjoy!!

Thursday Nov 17, 2011

Article on Oracle NoSQL Database Testing

A quick post: it looks like InfoWorld did a test with Oracle NoSQL Database and wrote about it. Read more here:

http://www.infoworld.com/d/data-explosion/first-look-oracle-nosql-database-179107

Enjoy, and maybe test this one when you start your investigations into NoSQL databases!

Monday Oct 31, 2011

Join us for Webcast on Big Data and ask your questions!

Join us for a webcast on big data and Oracle's offerings in this space:

When: November 3rd, 10am PT / 1pm ET
Where: Register here to attend

What:
As the world becomes increasingly digital, aggregating and analyzing new and diverse digital data streams can unlock new sources of economic value, provide fresh insights into customer behavior, and help you identify market trends early on. But this influx of new data can also create problems for IT departments. Attend this Webcast to learn how to capture, organize, and analyze your big data to deliver new insights with Oracle.


Tuesday Oct 25, 2011

Read Up on the Overall Big Data Solution

On top of the NoSQL Database release, I wanted to share the new paper on big data with you all. It gives you an overview of the end-to-end solution as presented at OpenWorld and places it in the context of the importance of big data for our customers.

Here is a quick look at the Executive Summary and the Introduction (or click here for the full paper):

Executive Summary

Today the term big data draws a lot of attention, but behind the hype there's a simple story. For decades, companies have been making business decisions based on transactional data stored in relational databases. Beyond that critical data, however, is a potential treasure trove of non-traditional, less structured data: weblogs, social media, email, sensors, and photographs that can be mined for useful information. Decreases in the cost of both storage and compute power have made it feasible to collect this data - which would have been thrown away only a few years ago.  As a result, more and more companies are looking to include non-traditional yet potentially very valuable data with their traditional enterprise data in their business intelligence analysis.

To derive real business value from big data, you need the right tools to capture and organize a wide variety of data types from different sources, and to be able to easily analyze it within the context of all your enterprise data. Oracle offers the broadest and most integrated portfolio of products to help you acquire and organize these diverse data types and analyze them alongside your existing data to find new insights and capitalize on hidden relationships.

Introduction

With the recent introduction of Oracle Big Data Appliance, Oracle is the first vendor to offer a complete and integrated solution to address the full spectrum of enterprise big data requirements. Oracle's big data strategy is centered on the idea that you can evolve your current enterprise data architecture to incorporate big data and deliver business value. By evolving your current enterprise architecture, you can leverage the proven reliability, flexibility and performance of your Oracle systems to address your big data requirements.

Defining Big Data

Big data typically refers to the following types of data:

  • Traditional enterprise data - includes customer information from CRM systems, transactional ERP data, web store transactions and general ledger data.
  • Machine-generated/sensor data - includes Call Detail Records ("CDRs"), weblogs, smart meters, manufacturing sensors, equipment logs (often referred to as digital exhaust) and trading systems data.
  • Social data - includes customer feedback streams, micro-blogging sites like Twitter and social media platforms like Facebook.

The McKinsey Global Institute estimates that data volume is growing 40% per year, and will grow 44x between 2009 and 2020. But while it's often the most visible parameter, volume of data is not the only characteristic that matters. In fact, there are four key characteristics that define big data:

  • Volume. Machine-generated data is produced in much larger quantities than non-traditional data. For instance, a single jet engine can generate 10TB of data in 30 minutes. With more than 25,000 airline flights per day, the daily volume of just this single data source runs into the petabytes. Smart meters and heavy industrial equipment like oil refineries and drilling rigs generate similar data volumes, compounding the problem.
  • Velocity. Social media data streams - while not as massive as machine-generated data - produce a large influx of opinions and relationships valuable to customer relationship management. Even at 140 characters per tweet, the high velocity (or frequency) of Twitter data ensures large volumes (over 8TB per day).
  • Variety. Traditional data formats tend to be relatively well described and change slowly. In contrast, non-traditional data formats exhibit a dizzying rate of change. As new services are added, new sensors deployed, or new marketing campaigns executed, new data types are needed to capture the resultant information.
  • Value. The economic value of different data varies significantly. Typically there is good information hidden amongst a larger body of non-traditional data; the challenge is identifying what is valuable and then transforming and extracting that data for analysis.

To make the most of big data, enterprises must evolve their IT infrastructures to handle the rapid rate of delivery of extreme volumes of data, with varying data types, which can then be integrated with an organization's other enterprise data to be analyzed. 

The Importance of Big Data

When big data is distilled and analyzed in combination with traditional enterprise data, enterprises can develop a more thorough and insightful understanding of their business, which can lead to enhanced productivity, a stronger competitive position and greater innovation - all of which can have a significant impact on the bottom line.
For example, in the delivery of healthcare services, management of chronic or long-term conditions is expensive. Use of in-home monitoring devices to measure vital signs and monitor progress is just one way that sensor data can be used to improve patient health and reduce both office visits and hospital admissions.
Manufacturing companies deploy sensors in their products to return a stream of telemetry. Sometimes this is used to deliver services like OnStar, which provides communications, security and navigation services. Perhaps more importantly, this telemetry also reveals usage patterns, failure rates and other opportunities for product improvement that can reduce development and assembly costs.

The proliferation of smart phones and other GPS devices offers advertisers an opportunity to target consumers when they are in close proximity to a store, a coffee shop or a restaurant. This opens up new revenue for service providers and offers many businesses a chance to target new customers.
Retailers usually know who buys their products. Use of social media and web log files from their e-commerce sites can help them understand who didn't buy and why they chose not to - information not available to them today. This can enable much more effective micro customer segmentation and targeted marketing campaigns, as well as improve supply chain efficiencies.

Finally, social media sites like Facebook and LinkedIn simply wouldn't exist without big data. Their business model requires a personalized experience on the web, which can only be delivered by capturing and using all the available data about a user or member.
-------

The full paper is linked here. Happy reading...

Monday Oct 24, 2011

Oracle NoSQL Database EE is now Available

Oracle NoSQL Database was announced at OpenWorld, just under a month ago. It is now available (Enterprise Edition) and can be downloaded from OTN.

Read more in the press release here. Obviously, this is great news and we are happy to see the first of a set of big data products becoming generally available. More products coming!

 

Monday Oct 17, 2011

More information on Oracle NoSQL Database

Since everyone asks me about more information all the time, here is some more information on Oracle NoSQL Database:

  • The OTN page discussing NoSQL Database - make sure to browse all the way down (link)
  • Data Sheet on Oracle NoSQL Database (link)
  • Technical white paper (link)
  • Cisco's solutions brief (link) on the Cisco site
  • Technical blog  from the development team (link)
As we progress toward the release dates for the other components, expect details like this to show up across the board...
About

The Data Warehouse Insider is written by the Oracle product management team and sheds light on all things data warehousing and big data.
