Wednesday May 09, 2012
Tuesday Apr 17, 2012
By Jean-Pierre Dijcks-Oracle on Apr 17, 2012
Decisive Power with Oracle Big Data Appliance
Raw information—like raw power—must be channeled and transformed to be useful. Every day, S.H.I.E.L.D. captures terabytes of information from a variety of sources—surveillance videos, satellites, sensors, field reports, network traffic—and all of this high-volume, high-velocity, and high-variety data is processed, filtered, transformed, and sorted in Oracle Big Data Appliance.
Tuesday Apr 10, 2012
By Jean-Pierre Dijcks-Oracle on Apr 10, 2012
For those of you who want to work with Oracle Data Integrator and its Hadoop capabilities, a good way to start is the newly released self-study course from Oracle University. You can find the course here.
Enjoy, and if you have any feedback, do send this into Oracle University by logging in (so we can unleash our big data analytics on it ;-) ).
Monday Apr 09, 2012
By Jean-Pierre Dijcks-Oracle on Apr 09, 2012
As everyone reads more about big data, I get more and more questions on use cases. “How should we use big data?” is the most common question. “Are there applications available for my vertical?” and “what do others in my vertical do with big data?” are a close second on the list. There are various reports out there by many authors which describe possible or real use cases. A search on the beacon of truth will no doubt get you links to most, but are they really relevant?
Then when I was reading Fortune Magazine (I do read the paper copy – guess I am old fashioned) the graph of disconnected points in my brain finally connected while reading the edition with the Facebook cover article (volume 165, number 4). Interestingly however, the Facebook article was not the article that got my dots connected and persuaded me to write this post.
The article that connected the dots was about the re-invention of JCPenney. More precisely about Ron Johnson, JCPenney’s new CEO and his thoughts on how to re-invent a department store business. For those who do not know who Ron Johnson is, it is the person who – together with Steve Jobs – created the Apple Store for you. That would be the physical, brick-and-mortar one, the hugely successful one (more $$$ per square foot than any other store)…
His quote in the article (following quote is of course copyright Fortune Magazine!) says it all:
“Improvement merely lets you hit the numbers. Creativity is what transforms.” Ron Johnson, CEO, JCPenney
Again, I think that quote says it all. It tells you that you will not find the transformational use case for your business in all those articles and papers on the internet. It says that you will have to dig deeper into the actual business you are running, and try to find that one thing, that one idea that will transform your business. I’m going to guess that this idea – if you are reading this blog post – is somehow related to having more data, or doing much more analytics…
With that in mind look at the following picture. It gives you an idea of the implementations in production of big data solutions (Gartner’s Mark Beyer often uses the 5-20-75 scale to explain where technology is):
What this really means is that big data is not yet in the mainstream of technology deployments. It also means that you will not be able to buy an entire solution off the shelf; instead, you will have to be the first one to implement that crazy successful idea. That is a good thing! It means that there is so much competitive advantage to be had by investing in big data now…
I then caught a glimpse of Oracle Profit (Volume 17, Number 1), which quotes Mark Hurd, Oracle’s President:
“Technology presents the opportunity to Transform Business”
Creativity and technology, now that is something that will really transform a business. And that is why I am writing this post! We – Oracle – can give you the technology platform, we can even give you analytical building blocks (LEGO for your analytics), but you, your business people, are in charge of that big idea.
How do you start with Big Data projects?
I hope the above at least tickled your brain, because the following will make you want to believe! But before we change every business, let’s get grounded in what a real start to a big data project is. For many of us, our first big data project is going to be something that needs to prove “it” can be done.
Step 1) Gather the required technology and people to get a real project done. You will need at least (access to) the following:
- Advanced analytics (statistics, data mining, graph analytics, semantics, spatial etc.)
- A technology foundation that can handle a lot of data and allows you to analyze that data (it may not be fully at scale). Make sure you understand what technologies can do what for you, and leverage the internet of ideas to jump start your knowledge on these technologies
- A data set you know, understand, that at least supports the basic idea you are pursuing in your improvement. For example, if you are going to work on social graph driven churn analysis, you will need both your customer data and relevant social data
- A small number of people to work with and drive the above – you might not have them all ready to go, so find someone who is willing to learn and excited about technology…
Step 2) Find a problem you have today (risk, fraud, CRM etc.), that actually costs you money and improve on that problem using the stuff you gathered in step 1. This is your playground, this is your chance to learn (!) and in the process improve something in your organization. It also will prove this analytics and big data stuff really works and drives business value. Your known problem is something that gives you a quantifiable ROI. I’ve shortened the time to do X, adding $Y in our coffers by spending only $Z (where Z < Y)… Oh, and make sure you do this in the shortest possible time frame or you are going to be too late to the party.
For getting things done in #2 (and #4 below), you should actually read the Fortune article mentioned above on how Facebook works, and try to understand the mentality and method applied to new products and features. Build, show and improve or rethink but never take “no” for an answer if you believe in something.
Step 3) Build a production environment and do it all on real hardware and software. Potentially re-evaluate some of the technology you had used before (this stuff better work in production). And then do it all again in step 4 when it really matters.
Step 4) Go catch the big fish! Go after the big idea; what is the new business this opens? That fantastic new component in your services. Rally the troops and go do it! Be creative, use the technology and forget the boundaries. Do make sure you find a corporate sponsor to see this through.
Once you arrive in Step 4, think big, build the first prototype and never, ever hedge your bets. Go for the big idea, focus and don’t try to do many other things or try to spread the risk by not really doing anything new to avoid the risk of failure.
IMHO, the way to real innovation by leveraging a big data solution is to first follow the money driven by improvement. Use that pilot to learn the technologies (analytics is key!) while solving a real tangible problem, then go all in and do the cool stuff.
PS. This is my motivational speech. I genuinely believe that most of the technology hurdles are gone, we just need to harness the creativity that naturally lives in a business, apply the technology and stick to the big ideas to transform business.
Tuesday Feb 28, 2012
By Jean-Pierre Dijcks-Oracle on Feb 28, 2012
I thought I would share this link. It is a very interesting report written by IDC on Oracle's Big Data strategy.
Thursday Feb 23, 2012
By Jean-Pierre Dijcks-Oracle on Feb 23, 2012
The new OTN forums for big data are now live.
We have two forums: one covering big data as a topic, which includes Oracle Big Data Appliance and the Hadoop ecosystem, and a second covering all the components in Oracle Big Data Connectors. The forums are monitored by Oracle and by folks in development, so we hope to provide you with excellent value on all your questions and ideas.
Monday Feb 20, 2012
By Jean-Pierre Dijcks-Oracle on Feb 20, 2012
Oracle’s Big Data development team is Hiring a Product Manager
About Us and the Product
Data is here today. Leading-edge enterprises are implementing Big Data
projects. Oracle has extended its product line to embrace the new
technologies and the new opportunities in Big Data. The Big Data
development team has recently introduced the Big Data Appliance, an
engineered system that combines hardware and software optimized for
Hadoop and Oracle NoSQL Database (More information on Oracle’s Big Data
offerings can be found at oracle.com/bigdata). The Big Data development
team is part of the data warehouse organization in Oracle’s Database
Server Technologies division, a vibrant engineering organization with
deep experience in scalable, parallel data processing, complex query
optimization, and advanced analytics.
About the Role
This position is located at our headquarters location in Redwood Shores, CA
(San Francisco Bay Area). We are seeking a product manager for the Big
Data Appliance. As product manager, you would leverage your strong
technical background in order to help define the roadmap for the Oracle
Big Data Appliance, and become one of the faces of Big Data at Oracle.
You will be actively writing collateral, delivering presentations, and
visiting customers to ensure the success of Oracle Big Data Appliance
and other products in Oracle’s Big Data portfolio, while also working
internally within the development organization to ensure that the Big
Data Appliance meets all of the current and future requirements of our
Learn more and apply: http://www.linkedin.com/jobs?viewJob=&jobId=2555091
Thursday Feb 09, 2012
By Jean-Pierre Dijcks-Oracle on Feb 09, 2012
Before you start reading this post please understand the following:
- General Hadoop capacity planning guidelines and principles are here. I’m not doubting, replicating or replacing them.
- I’m going to use our (UPDATED as of Dec 2012) Big Data Appliance as the baseline set of components to discuss capacity. That is because my brain does not hold theoretical hardware configurations.
- This is (a bit of) a theoretical post to make the point that you worry about both (just gave away the conclusion!) but I’m not writing the definitive guide to sizing your cluster. I just want to get everyone to think of Hadoop as processing AND storage. Not either or!
Now that you are warned, read on…
Imagine you want to store ~50TB (sounded like a nice round number) of data on a Hadoop cluster without worrying about any data growth for now (I told you this was theoretical). I’ll leave the default replication for Hadoop at 3 and simple math now dictates that I need 3 * 50TB = 150TB of Hadoop storage.
I also need space to do my MapReduce work (and this is the same for Pig and Hive, which generate MapReduce), so the system will be even bigger to ensure Hadoop can write temporary files, shuffle data etc.
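The storage arithmetic so far can be written down as a minimal Python sketch. The function name and the `temp_overhead` parameter are made up for illustration, and the 25% allowance for temporary and shuffle files in the second call is purely an assumed placeholder, not a figure from this post or from any Hadoop guideline.

```python
def raw_storage_tb(user_data_tb, replication=3, temp_overhead=0.0):
    """Raw Hadoop storage (in TB) needed for a given amount of user data.

    replication   -- HDFS replication factor (3 is the Hadoop default)
    temp_overhead -- assumed fraction of extra space for temporary and
                     shuffle files (hypothetical knob, pick your own value)
    """
    replicated = user_data_tb * replication
    return replicated * (1 + temp_overhead)


# 50TB of user data with triple replication, no working space.
print(raw_storage_tb(50))

# Same data with an assumed 25% allowance for MapReduce working space.
print(raw_storage_tb(50, temp_overhead=0.25))
```

The first call reproduces the 3 * 50TB = 150TB from above; the second simply shows how quickly the number grows once you reserve working space.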
Now, that is fabulous, so I need to talk to my lovely Oracle Rep and buy a Big Data Appliance which would easily hold the above mentioned 50TB with triple replication. It actually holds 150TB (to make the math easy for me) or so of user data, and you will instantly say that the BDA is way too big!
Ah, but how fast do you want to process data? A Big Data Appliance has 18 nodes, each node has 12 cores to do the work for you. MapReduce is using processes called mappers and reducers (really!) to do the actual work.
Let’s assume that we are allowing Hadoop to spin up 15 mappers per node and 10 reducers per node. Let’s further assume we are going full bore and have every slot allocated to the current and only job’s mappers and reducers (they do not run together I know, theoretical exercise – remember?).
Because you decided the Big Data Appliance was way too big, you have bought 8 equivalent nodes to fit your data. Two of these run your Name Node, your Jobtracker and secondary Name Node (and you should actually have three nodes for all this, but I’m generous and say we run Jobtracker on the Secondary Name Node). You have, however, 6 nodes for the data nodes based on your capacity based sizing.
That system you just bought based on storage will give us 6 * 15 = 90 mappers and 6 * 10 = 60 reducers working on my workload (the 2 other nodes do not run data nodes and do not run mappers and reducers).
Now let’s assume that I finish my job in N minutes on my lovely 8 node cluster by leveraging the full set of workers, and assume that my business users want to refresh the state of the world every N/2 minutes (it always has to go faster), then the assumption would be to simply get 2 * the number of nodes in my original cluster assuming linear scalability… The assumption is reasonable by the way for a lot of workloads, certainly for the ones in social, search and other data patterns that show little data skew because of their overall data size.
A Big Data Appliance gives us 15 * 15 = 225 mappers and 15 * 10 = 150 reducers working on my 50TB of user data… providing a 2.5x speed up on my data set.
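To make the slot arithmetic explicit, here is a minimal Python sketch. It assumes linear scalability, as the post does, and it reads the 15 in "15 * 15" as the number of Big Data Appliance nodes that run data nodes. The function and variable names are invented for illustration.

```python
MAPPERS_PER_NODE = 15   # mapper slots per data node, as assumed in the post
REDUCERS_PER_NODE = 10  # reducer slots per data node, as assumed in the post


def slots(data_nodes):
    """Return (mapper slots, reducer slots) for a number of data nodes."""
    return data_nodes * MAPPERS_PER_NODE, data_nodes * REDUCERS_PER_NODE


small_mappers, small_reducers = slots(6)   # 8-node cluster, 6 data nodes
bda_mappers, bda_reducers = slots(15)      # Big Data Appliance figure

# With linear scalability, the speed-up is the ratio of available slots.
speedup = bda_mappers / small_mappers

print(small_mappers, small_reducers)  # 90 60
print(bda_mappers, bda_reducers)      # 225 150
print(speedup)                        # 2.5
```

Real workloads rarely scale perfectly linearly, so treat the ratio as an upper bound for planning rather than a promise.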
Just another reference point on this, a Terasort of 100GB is run on a 20 node cluster with a total disk capacity of 80TB. Now that is of course a little too much, but you will see the point of not worrying too much about “that big system” and think processing power rather than storage.
You will need to worry about the processing requirements and you will need to understand the characteristics of the machine and the data. You should not size a system, or discard something as too big right away by just thinking about your raw data size. You should really, really consider Hadoop to be a system that scales processing and data storage together and use the benefits of the scale-out to balance data size with runtimes.
PS. Yes, I completely ignored those fabulous compression algorithms… Compression can certainly play a role here but I’ve left it out for now. Mostly because it is extremely hard (at least for my brain) to figure out an average compression rate and because you may decide to only compress older data, and compression costs CPU, but allows faster scan speeds and more of this fun stuff…
Monday Jan 09, 2012
By Jean-Pierre Dijcks-Oracle on Jan 09, 2012
Today - January 10th, we announced the general availability of Oracle Big Data Appliance and Oracle Big Data Connectors as well as a partnership with Cloudera. Now that should be fun to start the new year in big data land!!
Big Data Appliance
Oracle Big Data Appliance brings Big Data solutions to mainstream enterprises. Built using industry-standard hardware from Sun and Cloudera's Distribution including Apache Hadoop, the Big Data Appliance is designed and optimized for big data workloads. By integrating the key components of a big data platform into a single product, Oracle Big Data Appliance delivers an affordable, scalable and fully supported big data infrastructure without the risks of a custom built solution. The Big Data Appliance integrates tightly with Oracle Exadata and Oracle Database using Oracle Big Data Connectors, and enables analysis of all data in the enterprise – structured and unstructured.
Big Data Connectors
Built from the ground up by Oracle, Oracle Big Data Connectors delivers a high-performance Hadoop to Oracle Database integration solution and enables optimized analysis using Oracle’s distribution of open source R analysis directly on Hadoop data. By providing efficient connectivity, Big Data Connectors enables analysis of all data in the enterprise – both structured and unstructured.
Cloudera CDH and Cloudera Manager
Oracle Big Data Appliance contains Cloudera’s Distribution including Apache Hadoop (CDH) and Cloudera Manager. CDH is the #1 Apache Hadoop-based distribution in commercial and non-commercial environments. CDH consists of 100% open source Apache Hadoop plus the comprehensive set of open source software components needed to use Hadoop. Cloudera Manager is an end-to-end management application for CDH. Cloudera Manager gives a cluster-wide, real-time view of nodes and services running; provides a single, central place to enact configuration changes across the cluster; and incorporates a full range of reporting and diagnostic tools to help optimize cluster performance and utilization.
Data sheets, white papers and other interesting information can be found here:
* Big Data Appliance OTN page
* Big Data Connectors OTN page
Happy new year and I hope life just got a bit more interesting!!
Thursday Dec 15, 2011
By Jean-Pierre Dijcks-Oracle on Dec 15, 2011
I often get asked about big data, and more often than not we seem to be talking at different levels of abstraction and understanding. Words like real time show up, words like advanced analytics show up and we are instantly talking about products. The latter is typically not a good idea.
So let’s try to step back and go look at what big data means from a use case perspective and how we then map this use case into a usable, high-level infrastructure picture. As we walk through this all you will – hopefully – start to see a pattern and start to understand how words like real time and analytics fit…
The Use Case in Business Terms
Rather than inventing something from scratch I’ve looked at the keynote use case describing Smart Mall (you can see a nice animation and explanation of smart mall in this video).
The idea behind this is often referred to as “multi-channel customer interaction”, which boils down to “how can I interact with customers that are in my brick and mortar store via their phone”. Rather than having each customer pull out their smartphone to go browse prices on the internet, I would like to drive their behavior pro-actively.
The goals of smart mall are straightforward of course:
- Increase store traffic within the mall
- Increase revenue per visit and per transaction
- Reduce the non-buy percentage
What do I need?
In terms of technologies you would be looking at:
- Smart Devices with location information tied to an individual
- Data collection / decision points for real-time interactions and analytics
- Storage and Processing facilities for batch oriented analytics
In terms of data sets you would want to have at least:
- Customer profiles tied to an individual linked to their identifying device (phone, loyalty card etc.)
- A very fine grained customer segmentation
- Tied to detailed buying behavior
- Tied to elements like coupon usage, preferred products and other product recommendation like data sets
A picture speaks a thousand words, so the below is showing both the real-time decision making infrastructure and the batch data processing and model generation (analytics) infrastructure.
The first – and arguably most important step and the most important piece of data – is the identification of a customer. Step 1 is in this case the fact that a user with cell phone walks into a mall. By doing so we trigger the lookups in step 2a and 2b in a user profile database. We will discuss this a little more later, but in general this is a database leveraging an indexed structure to do fast and efficient lookups. Once we have found the actual customer, we feed the profile of this customer into our real time expert engine – step 3. The models in the expert system (customer built or COTS software) evaluate the offers and the profile and determine what action to take (send a coupon for something). All of this happens in real time… keeping in mind that websites do this in milliseconds and our smart mall would probably be ok doing it in a second or so.
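The real-time path described above (detect a device, look up the profile, let a model pick an action) can be sketched in a few lines of Python. Everything here is a hypothetical stand-in: the in-memory dictionary plays the role of the indexed user profile database, and the hard-coded rule plays the role of the model in the expert engine. None of the names come from an Oracle product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Profile:
    customer_id: str
    segment: str
    preferred_category: str


# Stand-in for the indexed user profile database (steps 2a and 2b).
PROFILES_BY_DEVICE = {
    "device-123": Profile("cust-1", "bargain-hunter", "shoes"),
}


def lookup_profile(device_id: str) -> Optional[Profile]:
    """Key-based lookup of the customer behind a device."""
    return PROFILES_BY_DEVICE.get(device_id)


def choose_action(profile: Profile) -> str:
    """Stand-in for the model in the real-time expert engine (step 3)."""
    if profile.segment == "bargain-hunter":
        return f"send coupon: 10% off {profile.preferred_category}"
    return "no action"


def on_device_detected(device_id: str) -> str:
    """Step 1: a device walks into the mall and triggers the flow."""
    profile = lookup_profile(device_id)
    if profile is None:
        return "no action"  # unknown customer, nothing to personalize
    return choose_action(profile)


print(on_device_detected("device-123"))
print(on_device_detected("device-999"))
```

In a real deployment the rule inside `choose_action` would be replaced by the model produced in the batch part of the architecture.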
To build accurate models – and this is where a lot of the typical big data buzz words come around – we add a batch oriented massive processing farm into the picture. The lower half in the picture above shows how we leverage a set of components to create a model of buying behavior. Traditionally we would leverage the database (DW) for this. We still do, but we now leverage an infrastructure before that to go after much more data and to continuously re-evaluate all that data with new additions.
A word on the sources. One key element is POS data (in the relational database) which I want to link to customer information (either from my web store or from cell phones or from loyalty cards). The NoSQL DB – Customer Profiles in the picture shows the web store element. It is very important to make sure this multi-channel data is integrated (and de-duplicated but that is a different topic) with my web browsing, purchasing, searching and social media data.
Once that is done, I can puzzle together the behavior of an individual. In essence big data allows micro segmentation at the person level. In effect for every one of my millions of customers!
The final goal of all of this is to build a highly accurate model to place within the real time decision engine. The goal of that model is directly linked to our business goals mentioned earlier. In other words, how can I send you a coupon while you are in the mall that gets you to the store and gets you to spend money…
Detailed Data Flows and Product Ideas
Now, how do I implement this with real products and how does my data flow within this ecosystem? That is something shown in the following sections…
Step 1 – Collect Data
To look up data, collect it and make decisions on it you will need to implement a system that is distributed. As these devices essentially keep on sending data, you need to be able to load the data (collect or acquire) without much delay. That is done like below in the collection points. That is also the place to evaluate for real time decisions. We will come back to the Collection points later…
The data from the collection points flows into the Hadoop cluster – in our case of course a big data appliance. You would also feed other data into this. The social feeds shown above would come from a data aggregator (typically a company) that sorts out relevant hash tags for example. Then you use Flume or Scribe to load the data into the Hadoop cluster.
The next step is to add data and start collating, interpreting and understanding the data in relation to each other.
For instance, add user profiles to the social feeds and the location data to build up a comprehensive understanding of an individual user and the patterns associated with this user. Typically this is done using MapReduce on Hadoop. The NoSQL user profiles are batch loaded from NoSQL DB via a Hadoop Input Format and thus added to the MapReduce data sets.
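The combination step above is essentially a reduce-side join keyed on the user. The sketch below imitates the map, shuffle and reduce phases in plain Python on tiny made-up data sets; it is not Hadoop code and does not use the NoSQL DB input format, it only illustrates the pattern. All names and sample values are invented.

```python
from collections import defaultdict

# Tiny, made-up inputs standing in for the three data sources.
profiles = [("u1", {"name": "Ann"}), ("u2", {"name": "Bob"})]
social = [("u1", "#shoes"), ("u1", "#sale"), ("u2", "#coffee")]
locations = [("u1", "mall-entrance"), ("u2", "food-court")]


def map_phase():
    """Emit (user_id, (source, payload)) pairs, tagging each record."""
    for user_id, profile in profiles:
        yield user_id, ("profile", profile)
    for user_id, tag in social:
        yield user_id, ("social", tag)
    for user_id, place in locations:
        yield user_id, ("location", place)


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(user_id, values):
    """Merge everything known about one user into a single record."""
    record = {"user_id": user_id, "profile": None, "social": [], "locations": []}
    for source, payload in values:
        if source == "profile":
            record["profile"] = payload
        elif source == "social":
            record["social"].append(payload)
        else:
            record["locations"].append(payload)
    return record


joined = {uid: reduce_phase(uid, vals) for uid, vals in shuffle(map_phase()).items()}
print(joined["u1"])
```

On a real cluster the shuffle is done by the framework and each reducer only sees the users assigned to it, which is what lets this scale to millions of customers.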
To combine it all with Point of Sales (POS) data, with our Siebel CRM data and all sorts of other transactional data you would use Oracle Loader for Hadoop to efficiently move reduced data into Oracle. Now you have a comprehensive view of the data that your users can go after. Either via Exalytics or BI tools or, and this is the interesting piece for this post – via things like data mining.
That latter phase – here called analyze will create data mining models and statistical models that are going to be used to produce the right coupons. These models are the real crown jewels as they allow an organization to make decisions in real time based on very accurate models. The models are going into the Collection and Decision points to now act on real time data.
In the picture above you see the gray model being utilized in the Expert Engine. That model describes / predicts behavior of an individual customer and based on that prediction we determine what action to undertake.
The above is an end-to-end look at Big Data and real time decisions. Big Data allows us to leverage tremendous data and processing resources to come to accurate models. It also allows us to find out all sorts of things that we were not expecting, creating more accurate models, but also creating new ideas, new business etc.
Once the Big Data Appliance is available you can implement the entire solution as shown here on Oracle technology… now you just need to find a few people who understand the programming models and create those crown jewels.
Friday Dec 09, 2011
By Jean-Pierre Dijcks-Oracle on Dec 09, 2011
I've been a bit quiet, been a bit busy working towards releasing our Big Data Appliance. But I thought I'd share the Youtube versions of the Openworld Videos on big data:
- Big Data -- The Challenge: http://www.youtube.com/watch?v=DeQIdp6vYHg
- Big Data -- Gold Mine, or just Stuff: http://www.youtube.com/watch?v=oiWlOeGG26U
- Big Data -- Big Data Speaks: http://www.youtube.com/watch?v=Qz8bRyf1374
- Big Data -- Everything You Always Wanted to Know: http://www.youtube.com/watch?v=pwQ9ztbSEpI
- Big Data -- Little Data: http://www.youtube.com/watch?v=J2H6StHNJ18
Should be fun to watch over the weekend!
Thursday Nov 17, 2011
By Jean-Pierre Dijcks-Oracle on Nov 17, 2011
A quick post: it looks like Infoworld did a test with Oracle NoSQL Database and wrote about it. Read more here:
Enjoy and maybe test this one when you start your investigations into a NoSQL Database!
Friday Nov 11, 2011
By Jean-Pierre Dijcks-Oracle on Nov 11, 2011
I’m sure some of you have read pieces about Hadoop World and I did see some headlines which were somewhat, shall we say, interesting?
I thought the keynote by Larry Feinsmith of JP Morgan Chase & Co was one of the highlights of the conference for me. The reason was very simple, he addressed some real use cases outside of internet and ad platforms.
The following are my notes, since the keynote was recorded I presume you can go and look at Hadoopworld.com at some point…
On the use cases that were mentioned:
- ETL – how can I do complex data transformation at scale
- Doing Basel III liquidity analysis
- Private banking – transaction filtering to feed [relational] data marts
- Common Data Platform – a place to keep data that is (or will be) valuable some day, to someone, somewhere
- 360 Degree view of customers – become pro-active and look at events across lines of business. For example make sure the mortgage folks know about direct deposits being stopped into an account and ensure the bank is pro-active to service the customer
- Treasury and Security – Global Payment Hub [I think this is really consolidation of data to cross reference activity across business and geographies]
- Data Mining
- Bypass data engineering [I interpret this as running on a large data set rather than on samples]
- Fraud prevention – work on event triggers, say a number of failed log-ins to the website. When they occur grab web logs, firewall logs and rules and start to figure out who is trying to log in. Is this me, who forgot his password, or is it someone in some other country trying to guess passwords?
- Trade quality analysis – do a batch analysis of all trades done and run them through an analysis or comparison pipeline
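The event trigger mentioned under fraud prevention can be illustrated with a small sliding-window counter in Python. This is only a sketch of the idea, not anything shown in the keynote: the class name, the threshold of 5 failures and the 300-second window are all assumptions chosen for the example.

```python
from collections import defaultdict, deque


class FailedLoginTrigger:
    """Fire when an account sees too many failed log-ins in a time window."""

    def __init__(self, threshold=5, window_seconds=300):
        self.threshold = threshold            # assumed value, tune per site
        self.window_seconds = window_seconds  # assumed value, tune per site
        self._failures = defaultdict(deque)   # account -> failure timestamps

    def record_failure(self, account, timestamp):
        """Record one failed log-in; return True when the trigger fires."""
        events = self._failures[account]
        events.append(timestamp)
        # Drop failures that have fallen out of the time window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        return len(events) >= self.threshold


trigger = FailedLoginTrigger(threshold=3, window_seconds=60)
for second in (0, 10, 20):
    fired = trigger.record_failure("alice", second)
print(fired)  # True: three failures within 60 seconds
```

When the trigger fires, that is the moment to go and collect the web logs, firewall logs and rules for the account in question.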
One of the key requests – if you can say it like that – was for vendors and entrepreneurs to make sure that new tools work with existing tools. JPMC has a large footprint of BI Tools and Big Data reporting and tools should work with those tools, rather than be separate.
Security and Entitlement – how to protect data within a large cluster from unwanted snooping was another topic that came up.
I thought his Elephant ears graph was interesting (couldn’t actually read the points on it, but the concept certainly made some sense) and it was interesting – when asked to show hands – how the audience did not (!) think that RDBMS and Hadoop technology would overlap completely within a few years.
Another interesting session was the session from Disney discussing how Disney is building a DaaS (Data as a Service) platform and how Hadoop processing capabilities are mixed with Database technologies. I thought this was one of the best sessions I have seen in a long time. It discussed real use cases, where problems existed, how they were solved and how Disney planned some of it.
The planning focused on three things/phases:
- Determine the Strategy – Design a platform and evangelize this within the organization
- Focus on the people – Hire key people, grow and train the staff (and do not overload what you have with new things on top of their day-to-day job), leverage a partner with experience
- Work on Execution of the strategy – Implement the platform Hadoop next to the other technologies and work toward the DaaS platform
This kind of fitted with some of the Linked-In comments, best summarized in “Think Platform – Think Hadoop”. In other words [my interpretation], step back and engineer a platform (like DaaS in the Disney example), then layer the rest of the solutions on top of this platform.
One general observation, I got the impression that we have knowledge gaps left and right. On the one hand are people looking for more information and details on the Hadoop tools and languages. On the other I got the impression that the capabilities of today’s relational databases are underestimated. Mostly in terms of data volumes and parallel processing capabilities or things like commodity hardware scale-out models.
All in all I liked this conference, it was great to chat with a wide range of people on Oracle big data, on big data, on use cases and all sorts of other stuff. Just hope they get a set of bigger rooms next time… and yes, I hope I’m going to be back next year!
Monday Oct 31, 2011
By Jean-Pierre Dijcks-Oracle on Oct 31, 2011
Join us for a webcast on big data and Oracle's offerings in this space:
When: November 3rd, 10am PT / 1pm ET
Where: Register here to attend
As the world becomes increasingly digital, aggregating and analyzing new and diverse digital data streams can unlock new sources of economic value, provide fresh insights into customer behavior, and help you identify market trends early on. But this influx of new data can also create problems for IT departments. Attend this Webcast to learn how to capture, organize, and analyze your big data to deliver new insights with Oracle.
Tuesday Oct 25, 2011
By Jean-Pierre Dijcks-Oracle on Oct 25, 2011
On top of the NoSQL Database release I wanted to share the new paper on big data with all. It gives you an overview of the end-to-end solution as presented at Openworld and places it in context of the importance of big data for our customers.
This is a quick look at the Executive Summary and the Introduction (or click here for the paper):
Today the term big data draws a lot of attention, but behind the hype there's a simple story. For decades, companies have been making business decisions based on transactional data stored in relational databases. Beyond that critical data, however, is a potential treasure trove of non-traditional, less structured data: weblogs, social media, email, sensors, and photographs that can be mined for useful information. Decreases in the cost of both storage and compute power have made it feasible to collect this data - which would have been thrown away only a few years ago. As a result, more and more companies are looking to include non-traditional yet potentially very valuable data with their traditional enterprise data in their business intelligence analysis.
To derive real business value from big data, you need the right tools to capture and organize a wide variety of data types from different sources, and to be able to easily analyze it within the context of all your enterprise data. Oracle offers the broadest and most integrated portfolio of products to help you acquire and organize these diverse data types and analyze them alongside your existing data to find new insights and capitalize on hidden relationships.
With the recent introduction of Oracle Big Data Appliance, Oracle is the first vendor to offer a complete and integrated solution to address the full spectrum of enterprise big data requirements. Oracle's big data strategy is centered on the idea that you can evolve your current enterprise data architecture to incorporate big data and deliver business value. By evolving your current enterprise architecture, you can leverage the proven reliability, flexibility and performance of your Oracle systems to address your big data requirements.
Defining Big Data
Big data typically refers to the following types of data:
- Traditional enterprise data - includes customer information from CRM systems, transactional ERP data, web store transactions, general ledger data.
- Machine-generated/sensor data - includes Call Detail Records ("CDR"), weblogs, smart meters, manufacturing sensors, equipment logs (often referred to as digital exhaust), trading systems data.
- Social data - includes customer feedback streams, micro-blogging sites like Twitter, and social media platforms like Facebook.
The McKinsey Global Institute estimates that data volume is growing 40% per year, and will grow 44x between 2009 and 2020. But while it's often the most visible parameter, volume of data is not the only characteristic that matters. In fact, there are four key characteristics that define big data:
- Volume. Machine-generated data is produced in much larger quantities than traditional data. For instance, a single jet engine can generate 10 TB of data in 30 minutes. With more than 25,000 airline flights per day, the daily volume of just this single data source runs into the petabytes. Smart meters and heavy industrial equipment like oil refineries and drilling rigs generate similar data volumes, compounding the problem.
- Velocity. Social media data streams - while not as massive as machine-generated data - produce a large influx of opinions and relationships valuable to customer relationship management. Even at 140 characters per tweet, the high velocity (or frequency) of Twitter data ensures large volumes (over 8 TB per day).
- Variety. Traditional data formats tend to be relatively well described and change slowly. In contrast, non-traditional data formats exhibit a dizzying rate of change. As new services are added, new sensors deployed, or new marketing campaigns executed, new data types are needed to capture the resultant information.
- Value. The economic value of different data varies significantly. Typically there is good information hidden amongst a larger body of non-traditional data; the challenge is identifying what is valuable and then transforming and extracting that data for analysis.
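The growth and volume figures quoted above can be sanity-checked with a little arithmetic. The sketch below is only a back-of-envelope illustration: the function names are made up for this example, and the jet engine estimate assumes (conservatively, and not stated in the paper) one engine per flight running for just 30 minutes, with 1 PB taken as 1,000 TB.

```python
# Back-of-envelope checks for the figures quoted in the text.
# Helper names are illustrative; inputs come from the text unless noted.

def compound_growth(annual_rate: float, years: int) -> float:
    """Total growth multiple after compounding annual_rate for `years` years."""
    return (1.0 + annual_rate) ** years


def implied_annual_rate(total_multiple: float, years: int) -> float:
    """Constant annual growth rate that yields total_multiple over `years` years."""
    return total_multiple ** (1.0 / years) - 1.0


def daily_engine_data_pb(
    tb_per_half_hour: float,
    flights_per_day: int,
    hours_per_flight: float = 0.5,   # ASSUMPTION: only 30 minutes per flight
    engines_per_flight: int = 1,     # ASSUMPTION: a single engine per flight
) -> float:
    """Daily jet engine data volume in petabytes (1 PB = 1,000 TB)."""
    half_hours = hours_per_flight / 0.5
    total_tb = tb_per_half_hour * half_hours * engines_per_flight * flights_per_day
    return total_tb / 1000.0


YEARS = 2020 - 2009  # 11 years of compounding

growth_at_40_percent = compound_growth(0.40, YEARS)
rate_for_44x = implied_annual_rate(44.0, YEARS)
engine_pb_per_day = daily_engine_data_pb(10.0, 25_000)

print(f"40% per year over {YEARS} years: {growth_at_40_percent:.1f}x")
print(f"Annual rate implied by 44x: {rate_for_44x:.1%}")
print(f"Jet engine data, lower bound: {engine_pb_per_day:.0f} PB per day")
```

Under these assumptions, 40% annual growth compounds to roughly 40x over the 11 years, so the two McKinsey figures are broadly consistent with each other, and even the deliberately low jet engine estimate lands in the hundreds of petabytes per day.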
To make the most of big data, enterprises must evolve their IT infrastructures to handle the rapid rate of delivery of extreme volumes of data, with varying data types, which can then be integrated with an organization's other enterprise data to be analyzed.
The Importance of Big Data
When big data is distilled and analyzed in combination with traditional enterprise data, enterprises can develop a more thorough and insightful understanding of their business, which can lead to enhanced productivity, a stronger competitive position and greater innovation - all of which can have a significant impact on the bottom line.
For example, in the delivery of healthcare services, management of chronic or long-term conditions is expensive. Use of in-home monitoring devices to measure vital signs and monitor progress is just one way that sensor data can be used to improve patient health and reduce both office visits and hospital admissions.
Manufacturing companies deploy sensors in their products to return a stream of telemetry. Sometimes this is used to deliver services like OnStar, which provides communications, security and navigation services. Perhaps more importantly, this telemetry also reveals usage patterns, failure rates and other opportunities for product improvement that can reduce development and assembly costs.
The proliferation of smart phones and other GPS devices offers advertisers an opportunity to target consumers when they are in close proximity to a store, a coffee shop or a restaurant. This opens up new revenue for service providers and offers many businesses a chance to target new customers.
Retailers usually know who buys their products. Use of social media and web log files from their ecommerce sites can help them understand who didn't buy and why they chose not to, information not available to them today. This can enable much more effective micro customer segmentation and targeted marketing campaigns, as well as improve supply chain efficiencies.
Finally, social media sites like Facebook and LinkedIn simply wouldn't exist without big data. Their business model requires a personalized experience on the web, which can only be delivered by capturing and using all the available data about a user or member.
The full paper is linked here. Happy reading...
The Data Warehouse Insider is written by the Oracle product management team and sheds light on all things data warehousing and big data.