
Learn about data lakes, machine learning & more innovations

Recent Posts

Innovation

Data in Action: IoT and the Smart Bearing

The Internet of Things (IoT) represents a big wave of technological change, and organizations in virtually every industry will benefit from this technology. Some 4.9 billion connected objects will be in use this year, up 30 percent from just last year, predicts research firm Gartner. By 2020, it adds, that number will increase to some 25 billion connected objects worldwide. Businesses in many industries are evaluating the use of IoT technology for remote monitoring and to improve maintenance for mission-critical operations.

In a recent article, researchers with McKinsey & Company stated that "High uncertainty and low growth rates have forced companies in transportation, energy, manufacturing and other industries to squeeze every asset for maximum value," and that "cheaper computational power, data streaming, autonomous data management and advanced analytics with embedded machine learning and visualization are enabling more efficient and effective asset utilization." The problem is that reactive maintenance exposes companies to significant risks and is not the transformational solution businesses need to remain or become competitive. What is needed is a predictive maintenance system that takes an informed approach to each production asset, gathering data from multiple connected sensors, such as temperature and acceleration, so that failure prediction becomes more reliable.

A key functional component of assets and equipment in many industries, and the topic of extensive analysis and Industry 4.0 (I4) focus, is the mechanical bearing. Bearings are critical components of rotating equipment including engines, fans, pumps and machines of all types. They are responsible for the continuous operation of planes, vehicles, production machinery, wind turbines, air conditioning systems and elevator hoists. The purpose of bearings is to reduce friction between moving parts and to enable continuous operation; the name comes from the notion of 'bearing' the load of a rotating shaft or sliding surface. Bearings are designed for continuous, long-term use, but each one has a finite lifespan and eventually will fail. Bearing failure causes the equipment it supports to cease operation, with impact that ranges from inconvenient (a household fan stops running) to disruptive (a production line goes down) to catastrophic (a vehicle engine fails).

Maintenance managers want to minimize risk and avoid unexpected service disruptions, particularly when the economic or human costs are high, but replacing bearings requires equipment downtime and carries significant cost. They are constantly trying to strike a balance between equipment uptime and maintenance cost. Three approaches to bearing maintenance are: (1) run to failure and replace, (2) perform maintenance at scheduled intervals based on observed aggregate historical norms, and (3) use condition monitoring.

Bearing condition monitoring is based on wireless sensors embedded in bearings or located in host assemblies. It involves analyzing huge volumes of vibration data, isolating frequencies associated with the bearing geometry, calculating the spectrum view of the data, analyzing the spectrum and then comparing it to historical data. Before they fail, bearings emit telltale signs of weakness resulting from excessive wear, including increased vibration and higher operating temperature. The trick is to use data streams to anticipate time to failure and to lower the risk of downtime while maximizing useful life.

Handling that stream, storing all the historical data, and running the machine learning models is all part of the big data story. "If you wait too long, you can destroy the shaft and the bearing. But do it too early and you lose money by replacing a bearing that can run longer," says author Alan S. Brown.

Advances in technology have made it possible to establish normal operating conditions by continuously monitoring the performance of each individual bearing, including vibration, temperature, torque and rotational speed, and to then use machine learning to process vast amounts of data. The result is the capability to find hidden patterns that represent potential failure scenarios and to predict the remaining life of the bearing. The combination of IoT, machine learning and analytics provides a solution for maintenance managers, enabling them to optimize machine life, manage costs and reduce the risk of damaging failure. "Machine learning… comes without the prejudices of engineers who look for problems only when they expect to see them," Brown notes. Combining this capability with powerful, visual analytics provides real-time insight into bearing condition and empowers engineers to reduce cost, raise uptime and lower risk.

According to Krishna Raman of Frost & Sullivan, "The adoption of the Industrial Internet of Things (IIoT)-based smart bearings, which can self-diagnose impending faults and failures, is expected to significantly increase in aerospace and defense, wind turbines, railway and automotive" segments. Bearing manufacturers are now looking at ways to leverage data and analytics to provide predictability rather than just metal components, and to "…catapult one of the world's oldest mechanical devices into the digital future."

Visit our website to learn more about how to apply Oracle Big Data to your IoT strategy.

Guest author Jake Krakauer (@JakeKrakauer) is the Head of Product Marketing, Oracle Analytics.
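To make the condition-monitoring workflow described above more concrete, here is a minimal Python sketch of the kind of processing involved: compute the vibration spectrum with an FFT, measure the energy near a bearing defect frequency, and flag a reading that drifts well above a historical baseline. The signals, the 160 Hz defect frequency, and the alert threshold are all illustrative assumptions for the example, not part of any Oracle product.

```python
import numpy as np

def defect_band_energy(signal, sample_rate_hz, defect_freq_hz, band_hz=5.0):
    """Spectral energy in a narrow band around a suspected bearing defect frequency."""
    spectrum = np.abs(np.fft.rfft(signal))                        # magnitude spectrum of the vibration signal
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate_hz)
    band = (freqs >= defect_freq_hz - band_hz) & (freqs <= defect_freq_hz + band_hz)
    return float(np.sum(spectrum[band] ** 2))

# Illustrative data: one second of vibration sampled at 10 kHz.
sample_rate = 10_000
t = np.arange(sample_rate) / sample_rate
healthy = 0.2 * np.sin(2 * np.pi * 160 * t) + np.random.normal(0, 0.1, t.size)  # historical baseline
current = 1.5 * np.sin(2 * np.pi * 160 * t) + np.random.normal(0, 0.1, t.size)  # elevated defect tone

baseline_energy = defect_band_energy(healthy, sample_rate, defect_freq_hz=160)
current_energy = defect_band_energy(current, sample_rate, defect_freq_hz=160)

# Flag the bearing when defect-band energy drifts far above the historical norm.
if current_energy > 3 * baseline_energy:
    ratio = current_energy / baseline_energy
    print(f"ALERT: defect-band energy {ratio:.1f}x baseline - schedule a bearing inspection")
```

A production system would stream readings continuously, keep per-asset baselines, and feed features like these into a trained model rather than a fixed threshold, but the basic spectrum-plus-history idea is the same.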


Data Lakes

Design Your Data Lake for Maximum Impact

Data lakes are fast becoming valuable tools for businesses that need to organize large volumes of highly diverse data from multiple sources. However, if you are not a data scientist, a data lake may seem more like an ocean that you are bound to drown in. Making a data lake manageable for everyone requires mindful design that empowers users with the appropriate tools.

A recent webcast conducted by TDWI and Oracle, entitled "How to Design a Data Lake with Business Impact in Mind," identified the best use cases for a data lake and then defined how to design one for an enterprise-level business. The presentation recommended keeping data-driven use cases at the forefront, making the data lake a central IT-managed function, blending old and new data, empowering self-service, and establishing a sponsor group to manage the company's data lake plan with enough staffing and skills to keep it relevant.

"Businesses want to make more fact-based decisions, but they also want to go deeper into the data they have with analytics," says Philip Russom, a Senior Research Director for Data Management at TDWI. "We see data lakes as a good advantage for companies that want to do this, as the data can be repurposed repeatedly for new analytics and use cases."

Data lake usage is on the rise, according to TDWI surveys. A 2017 survey revealed that nearly a quarter of the businesses questioned (23 percent) already have a data lake in production, with another quarter (24 percent) expecting to launch one within 12 months. A significant number (21 percent) said they would establish a data lake within three years, and only 7 percent said they would not jump into a data lake at all. In the same survey, respondents were asked about the business benefit of deploying a Hadoop-based data lake. Half (49 percent) rated advanced analytics, including data mining, statistics, and machine learning, as the primary use case, followed by data exploration and discovery, with use as a big data source for analytics the third most common response.

Use cases for data lakes include investigating new data coming from sensors and machines, streaming, and human language text. More complex uses include multiplatform data warehouse environments, omnichannel marketing, and digital supply chain. The best argument for deploying and using a data lake is the ability to blend old and new data together. This is especially helpful for departments like marketing, finance, and governance, which require insight from multiple sources old and new. Russom noted that multi-module enterprise resource planning, the Internet of Things (IoT), insurance claim workflow, and digital healthcare would all be areas that could benefit from data lake deployments.

When it comes to design, Russom suggests the following:
- Create a plan, prioritize use cases, and update the plan as the business evolves
- Choose data platform(s) that support business requirements
- Get tools that work with the platform and satisfy user requirements
- Augment your staff with consultants experienced with data lakes
- Train staff for Hadoop, analytics, lakes, and clouds
- Start with a business use case that a lake can address with clear ROI

Bruce Edwards, a Cloud Luminary and Information Management Specialist with Oracle, added that the convergence of cloud, big data, and data science has enabled the explosion of data lake deployments. Having a central vendor that not only understands large-scale data management but can also integrate existing infrastructures into core data lake components is essential.

"What data lake users need is an open, integrated, self-healing, high performance tool," Edwards said. "These elements are all needed to allow businesses to begin their data lake journey."

To experience the entire webcast, download the presentation from our website. If you're ready to start playing around with a data lake, we can offer you a free trial right here.


Analytics

Big Data Preparation: The Key to Unlocking Value from Your Data

Making a success of big data analytics is a bit like constructing a skyscraper. Foundations need to be laid and the land prepared for construction, or else the building will rest on shaky ground. Download your free book, "Driving Growth & Innovation with Big Data."

The success of any analytics project depends on the quality and relevance of the data it's built upon. The issue today is that companies collect an exponentially growing volume and variety of information in many different formats and are struggling to convert it all into usable insight. In short, they're having trouble preparing their big data and unlocking the value.

Difficulties with Big Data Preparation

For instance, before analysis, a business may need to aggregate data from diverse sources, remove or complete empty data fields, de-duplicate data, or transform data into a consistent format. These tasks have traditionally relied on the expertise of the IT department, even as ownership of analytics projects has shifted towards line-of-business leaders. But as volumes of data grow, preparing data in these ways becomes more laborious. With this mounting demand, IT teams can take weeks to fulfill requests.

Businesses have recognized this and are investing in data preparation technologies. Two thirds say they have implemented a data preparation or wrangling solution to manage a growing volume of data, and 56% have done so to help them work with multiple data sources, according to research from Forrester. Today's data preparation tools aren't restricted to those with IT expertise, and they allow companies to spread their analytics processes to individual lines of business. Not only does this dislodge the data bottleneck, but analyses are managed by subject matter experts with a keen eye for the most valuable insights.

How Companies Use Big Data for Business Benefits

As organizations are overwhelmed by the flood of data, it's also important to unify data from the various sources and ensure it is accessible and consistent across the business. For example, CaixaBank is storing vast pools of data on one consolidated platform, commonly referred to as a data lake, so each of its business units can access, analyze, and digest relevant data as needed. From here, businesses can start experimenting with the data to explore new ideas. For instance, Telefonica worked with a single view of its data to test a new algorithm designed to create personalized, TV-content-optimized pricing models for customers. After successful testing, Telefonica made the algorithm live and has since seen higher TV viewing rates and improved customer satisfaction, while also reducing customer churn by 20%.

In addition to unlocking the commercial value of data, there is a strong regulatory driver for companies to gain more control and oversight of their data. When the EU's GDPR comes into effect this month, companies will face harsh penalties if they are not transparent about the way they collect, use, and share customer information.

Conclusion

To reach skyscraper heights and build the businesses of tomorrow, data preparation must rise up the corporate agenda and be a priority for all companies looking to unlock the value of their ever-increasing volumes of data. From data scientists and analysts, who work closely with company data each day, to business leaders exploring new ways to improve the way they work, Oracle has a set of rich integrated solutions for everybody in your organization.
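To picture the preparation steps listed earlier in this post (aggregating sources, completing or removing empty fields, de-duplicating, and standardizing formats), here is a minimal, hypothetical pandas sketch; the tables and column names are invented for illustration and are not tied to any specific Oracle tool.

```python
import pandas as pd

# Hypothetical extracts from two source systems.
crm = pd.DataFrame({
    "customer_id": [101, 102, 102, 103],
    "email": ["a@example.com", "b@example.com", "b@example.com", None],
    "signup_date": ["2018-01-05", "2018-01-12", "2018-01-12", "2018-02-17"],
})
orders = pd.DataFrame({
    "customer_id": [101, 103],
    "total_spend": [250.0, 90.0],
})

# De-duplicate records that appear more than once in the CRM extract.
crm = crm.drop_duplicates(subset=["customer_id", "email"])

# Complete empty fields: flag missing emails instead of silently dropping the row.
crm["email"] = crm["email"].fillna("unknown")

# Transform date strings into a consistent datetime representation.
crm["signup_date"] = pd.to_datetime(crm["signup_date"], format="%Y-%m-%d")

# Aggregate data from diverse sources into one analysis-ready table.
prepared = crm.merge(orders, on="customer_id", how="left")
print(prepared)
```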
Read our ebook, "Driving Growth & Innovation With Big Data" to understand how Oracle’s Cloud Platform for Big Data helps companies uncover new benefits across their business.


CALL FOR SPEAKERS is Now Open for Oracle BIWA Summit '18 User Community Meeting in March 2018

BIWA Summit 2018: The Big Data + Cloud + Machine Learning + Spatial + Graph + Analytics + IoT Oracle User Conference, featuring Oracle Spatial and Graph Summit. March 20 - 22, 2018, Oracle Conference Center at Oracle Headquarters Campus, Redwood Shores, CA.

Share your successes… We want to hear your story. Submit your proposal today for Oracle BIWA Summit 2018, featuring Oracle Spatial and Graph Summit, March 20 - 22, 2018, and share your successes with Oracle technology. The call for speakers is now open through December 3, 2017. Submit now for possible early acceptance and publication in Oracle BIWA Summit 2018 promotion materials. Click HERE to submit your abstract(s) for Oracle BIWA Summit 2018.

Oracle Spatial and Graph Summit will be held in partnership with BIWA Summit. BIWA Summits are organized and managed by the Oracle Business Intelligence, Data Warehousing and Analytics (BIWA) User Community and the Oracle Spatial and Graph SIG, a Special Interest Group in the Independent Oracle User Group (IOUG). BIWA Summits attract presentations and talks from the top Business Intelligence, Data Warehousing, Advanced Analytics, Spatial and Graph, and Big Data experts. The 3-day BIWA Summit 2017 event involved keynotes by industry experts, educational sessions, hands-on labs and networking events. Click HERE to see presentations and content from BIWA Summit 2017.

Call for Speakers DEADLINE is December 3, 2017 at midnight Pacific Time. Presentations and Hands-on Labs must be non-commercial. Sales promotions for products or services disguised as proposals will be eliminated. Speakers whose abstracts are accepted will be expected to submit their presentation as a PDF slide deck for posting on the BIWA Summit conference website. Accompanying technical and use case papers are encouraged, but not required. Complimentary registration to Oracle BIWA Summit 2018 is provided to the primary speaker of each accepted presentation. Note: any additional co-presenters need to register for the event separately and pay the appropriate registration fees.

Please submit session proposals in one of the following areas:
- Machine Learning
- Analytics
- Big Data
- Data Warehousing and ETL
- Cloud
- Internet of Things
- Spatial and Graph (Oracle Spatial and Graph Summit)
- …Anything else "Cool" using Oracle technologies in "novel and interesting" ways

Proposals that cover multiple areas are acceptable and highly encouraged. On your submission, please indicate a primary track and any secondary tracks for consideration. The content committee strongly encourages technical/how-to sessions, strategic guidance sessions, and real-world customer end-user case studies, all using Oracle technologies. If you submitted a session last year, your login should carry over for 2018. We will be accepting abstracts on a rolling basis, so please submit your abstracts as soon as possible.
Learn from Industry Experts from Oracle, Partners, and Customers Come join hundreds of professionals with shared interests in the successful deployment of Oracle technology on premises, on Cloud, hybrid Cloud, and infrastructure: Cloud & Infrastructure Spatial & Graph Analytics Big Data & Machine Learning Internet of Things Database  Cloud Service Big Data Cloud Service Data Visualization Cloud Service Hadoop Spark Big Data Connectors (Hadoop & R) IaaS, PaaS, SaaS Spatial and Graph for Big Data and Database GIS and smart cities features Location intelligence Geocoding & routing Property graph DB Social network, fraud detection, deep learning graph analytics RDF graph Oracle Data Visualization Big Data Discovery OBIEE OBIA Applications Exalytics Real-Time Decisions Machine Learning Advanced Analytics Data Mining R Enterprise Fraud detection Text Mining SQL Patterns Clustering Market Basket Analysis Big Data Preparation Big Data from sensors Edge Analytics Industrial Internet IoT Cloud Monetizing IoT Security Standards   What To Expect 400+ Attendees | 90+ Speakers | Hands on Labs | Technical Content| Networking New at this year’s BIWA Summit: Strategy track – targeted at the C-level audience, how to assess and plan for new Oracle Technology in meeting enterprise objectives Oracle Global Leaders track – sessions by Oracle’s Global Leader customers on their use of Oracle Technology, and targeted product managers on latest Oracle products and features Grad-student track – sessions on cutting edge university work using Oracle Technology, continuing Oracle Academy’s sponsorship of graduate student participation  Exciting Topics Include:  Database, Data Warehouse, and Cloud, Big Data Architecture Deep Dives on existing Oracle BI, DW and Analytics products and Hands on Labs Updates on the latest Oracle products and technologies e.g. Oracle Big Data Discovery, Oracle Visual Analyzer, Oracle Big Data SQL Novel and Interesting Use Cases of Spatial and Graph, Text, Data Mining, ETL, Security, Cloud Working with Big Data:  Hadoop, "Internet of Things", SQL, R, Sentiment Analysis Oracle Business Intelligence (OBIEE), Oracle Spatial and Graph, Oracle Advanced Analytics —All Better Together Example Talks from BIWA Summit 2017:  [Visit www.biwasummit.org to see the  Full Agenda from BIWA’17 and to download copies of BIWA’17 presentations and HOLs.] Machine Learning Taking R to new heights for scalability and performance Introducing Oracle Machine Learning Zeppelin Notebooks Oracle's Advanced Analytics 12.2c New Features & Road Map: Bigger, Better, Faster, More! 
An Post -- Big Data Analytics platform and use of Oracle Advanced Analytics Customer Analytics POC for a global retailer, using Oracle Advanced Analytics Oracle Marketing Advanced Analytics Use of OAA in Propensity to Buy Models Clustering Data with Oracle Data Mining and Oracle Business Intelligence How Option Traders leverage Oracle R Enterprise to maximize trading strategies From Beginning to End - Oracle's Cloud Services and New Customer Acquisition Marketing K12 Student Early Warning System Business Process Optimization Using Reinforcement Learning Advanced Analytics & Graph: Transparently taking advantage of HW innovations in the Cloud Dynamic Traffic Prediction in Road Networks Context Aware GeoSocial Graph Mining Analytics Uncovering Complex Spatial and Graph Relationships: On Database, Big Data, and Cloud Make the most of Oracle DV (DVD / DVCS / BICS) Data Visualization at SoundExchange – A Case Study Custom Maps in Oracle Big Data Discovery with Oracle Spatial and Graph 12c Does Your Data Have a Story? Find out with Oracle Data Visualization Desktop Social Services Reporting, Visualization, and Analytics Using OBIEE Leadership Essentials in Successful Business Intelligence (BI) Programs Big Data Uncovering Complex Spatial and Graph Relationships: On Database, Big Data, and Cloud Why Apache Spark has become the darling in Big Data space? Custom Maps in Oracle Big Data Discovery with Oracle Spatial and Graph 12c A Shortest Path to Using Graph Technologies– Best Practices in Graph Construction, Indexing, Analytics and Visualization Cloud Computing Oracle Big Data Management in the Cloud Oracle Cloud Cookbook for Professionals Uncovering Complex Spatial and Graph Relationships: On Database, Big Data, and Cloud Deploying Oracle Database in the Cloud with Exadata: Technical Deep Dive Employee Onboarding: Onboard – Faster, Smarter & Greener Deploying Spatial Applications in Oracle Public Cloud Analytics in the Oracle Cloud: A Case Study Deploying SAS Retail Analytics in the Oracle Cloud BICS - For Departmental Data Mart or Enterprise Data Warehouse? 
Cloud Transition and Lift and Shift of Oracle BI Applications Data Warehousing and ETL Business Analytics in the Oracle 12.2 Database: Analytic Views Maximizing Join and Sort Performance in Oracle Data Warehouses Turbocharging Data Visualization and Analyses with Oracle In-Memory 12.2 Oracle Data Integrator 12c: Getting Started Analytic Functions in SQL My Favorite Scripts 2017 Internet of Things Introduction to IoT and IoT Platforms The State of Industrial IoT Complex Data Mashups: an Example Use Case from the Transportation Industry Monetizable Value Creation from Industrial-IoT Analytics Spatial and Graph Summit Uncovering Complex Spatial and Graph Relationships: On Database, Big Data, and Cloud A Shortest Path to Using Graph Technologies– Best Practices in Graph Construction, Indexing, Analytics and Visualization Build Recommender Systems, Detect Fraud, and Integrate Deep Learning with Graph Technologies Building a Tax Fraud Detection Platform with Big Data Spatial and Graph technologies Maps, 3-D, Tracking, JSON, and Location Analysis: What’s New with Oracle’s Spatial Technologies Deploying Spatial Applications in Oracle Public Cloud RESTful Spatial services with Oracle Database as a Service and ORDS Custom Maps in Oracle Big Data Discovery with Oracle Spatial and Graph 12c Smart Parking for a Smart City Using Oracle Spatial and Graph at Los Angeles and Munich Airports Analysing the Panama Papers with Oracle Big Data Spatial and Graph Apply Location Intelligence and Spatial Analysis to Big Data with Java  Example Hands-on Labs from BIWA Summit 2017: Using R for Big Data Advanced Analytics and Machine Learning Learn Predictive Analytics in 2 hours!  Oracle Data Miner Hands on Lab Deploy Custom Maps in OBIEE for Free Apply Location Intelligence and Spatial Analysis to Big Data with Java Use Oracle Big Data SQL to Analyze Data Across Oracle Database, Hadoop, and NoSQL Make the most of Oracle DV (DVD / DVCS / BICS) Analyzing a social network using Big Data Spatial and Graph Property Graph Submit your abstract(s) today, good luck and hope to see you there! See last year’s Full Agenda from BIWA’17.   Dan Vlamis and Shyam Nath , Oracle BIWA Summit '18 Conference Co-Chairs


Analytics

What's new in the latest Big Data Cloud Service-Compute Edition Release

Big Data Cloud Service - Compute Edition release 17.2.5 is now generally available.

What's New:

Big Data File System: Big Data Cloud Service - Compute Edition includes the Oracle Big Data File System (BDFS), an in-memory file system with support for tiered storage that accelerates access to data stored in Cloud Storage and enables big data workloads to run much faster. Customers no longer have to choose between the performance of an HDFS-based data lake and the agility and lower cost of a Cloud Storage-based data lake.

Bootstrap Scripts: The bootstrap script feature is available with release 17.2.5. Bootstrap scripts help customers spin up customized big data clusters. This capability lets customers install binaries, load data and libraries, customize configurations, and perform any other scriptable action after the default cluster provisioning. A sample bootstrap illustrating the installation of R is also included.

MapReduce Jobs: The previous release supported creating and running MapReduce jobs as an experimental feature. In the current release, the MapReduce feature is no longer experimental and is production ready. Customers can submit MapReduce jobs using the big data cluster console, the REST API, or the CLI (command line interface).

Highlights since GA:

Deployment Profiles: Deployment profiles are pre-defined sets of services optimized for a specific use case or workload. They help users avoid the complexity of choosing among the various Hadoop components for their big data workloads.

High Performance Block Storage: Customers can now utilize high-performance SSDs in conjunction with their Big Data - Compute Edition clusters. (Part # B87608)

Cloud@Customer: Oracle Cloud Machine X6 supports Big Data Cloud Service - Compute Edition deployment. Starting in FY18, BDCS-CE runs on the Oracle Cloud Machine (OCM). Customers that can't move to the public cloud for various reasons can now leverage BDCS-CE running on OCM in their own data centers. BDCS-CE can be consumed both as a subscription and as metered capacity on the OCM.

Additional Hadoop Components: Big Data Cloud Service - Compute Edition continues to add support for additional Hadoop components. Since GA, we have added support for Apache Hive, Apache Spark-R and Apache Mahout in BDCS-CE. The Apache Zeppelin version has also been updated to Zeppelin 0.7.x.

To learn more about Big Data Cloud Service - Compute Edition, check out these resources:
- BDCS-CE Public Website
- BDCS-CE Introduction Video
- BDCS-CE Getting Started Video
- BDCS-CE Demos & Videos
- New Data Lake Workshop
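The release notes above describe bootstrap scripts only in general terms. As a purely hypothetical illustration of the kind of post-provisioning work such a script might do (install extra libraries, stage shared data), here is a short Python sketch; the package names, URL, and paths are invented, and the actual BDCS-CE bootstrap mechanism should be taken from the product documentation.

```python
#!/usr/bin/env python
"""Hypothetical post-provisioning bootstrap: add libraries and stage a shared dataset."""
import subprocess
import urllib.request
from pathlib import Path

EXTRA_PACKAGES = ["numpy", "pandas"]                                     # illustrative libraries
DATASET_URL = "https://objectstore.example.com/datasets/reference.csv"   # made-up object store URL
STAGING_DIR = Path("/tmp/bootstrap_data")                                # illustrative staging path

def install_packages(packages):
    """Install extra Python libraries after the default cluster provisioning."""
    for pkg in packages:
        subprocess.run(["pip", "install", "--user", pkg], check=True)

def stage_dataset(url, target_dir):
    """Download a shared reference dataset so jobs can read it locally."""
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, destination)
    return destination

if __name__ == "__main__":
    install_packages(EXTRA_PACKAGES)
    staged = stage_dataset(DATASET_URL, STAGING_DIR)
    print("Bootstrap complete; staged", staged)
```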


Big Data

Harness the Power of Big Data

Companies today have built a voracious appetite for data and insights. They demand information at their fingertips, but face the growing complexities of data ingestion, data processing, data management, and data security. Here's where most companies find their dreams for agility come to a screeching halt.

Time to Crack the Code

Business and IT leaders can together untangle this challenging situation by finding the answer to the fundamental question: How can my company best harness the power of data in a way that easily delivers real-time, streamlined insights without compromising security? It's time to crack the code and see how companies like CaixaBank have reaped the fruits of success.

Removing the Complexity: You Need Data that Listens to You

For businesses to succeed, they need the speed and simplicity of the public cloud. Specifically, users need to be able to grow storage capacity or increase compute on demand without needing to configure for peak workloads. The challenge is that most companies face cloud compliance or corporate policy regulations that inhibit the effective use of public cloud services. So teams end up building their own private cloud in an attempt to run large and diverse data workloads at speed and scale. But that often results in the purchase of complex infrastructure that is difficult to maintain, patch, and upgrade. This runs counter to what organizations actually want to do, which is to extract value from their data as quickly as possible. What enterprises end up with is a complex data environment that is hard to manage and control. Data-driven leaders find themselves with an abyss of uncooperative data that doesn't listen to their analytical needs.

The Good News

The winners are those who can combine the data sources they want into the shape they need for the task at hand; they achieve data liquidity. The public cloud has been great for data liquidity because it is an elastic, easy-to-provision service that scales up and down as needed and is paid for on a per-usage basis. But today's policies often demand that data stay behind the firewall, and this eliminates the long-term option of leveraging the public cloud. The good news? Enterprises can finally manage large data workloads securely, with ease and speed, and deliver actionable insights by tapping into the Oracle Big Data Cloud Machine. Customers like CaixaBank rely on the Oracle Big Data Cloud Machine to enjoy the same on-demand, subscription benefits of the public cloud in their own data center, behind their firewall. Now companies can rapidly unleash the value of data with ease by leveraging the latest addition to Oracle's cloud offering, the Oracle Big Data Cloud Machine.


Analytics

Oracle's SQL-Based Statistical Functions - FREE in Every Oracle Database, On-Premises or in the Cloud

Included in every Oracle Database is a collection of basic statistical functions accessible via SQL. These include descriptive statistics, hypothesis testing, correlation analysis, tests for distribution fit, cross tabulations with chi-square statistics, and analysis of variance (ANOVA). The basic statistical functions are implemented as SQL functions and leverage all the strengths of the Oracle Database. The SQL statistical functions work on Oracle tables and views and exploit all database parallelism, scalability, user privileges and security schemes. Hence the SQL statistical functions can be included and exposed within SQL queries and BI dashboards, and embedded in real-time applications.

The SQL statistical functions can be used in a variety of ways. For example, users can call Oracle's SQL statistical functions to obtain mean, max, min, median, mode and standard deviation information for their data, or measure the correlations between attributes and the strength of relationships using hypothesis testing statistics such as a t-test, F-test or ANOVA. The SQL aggregate functions return a single result row for each group of rows, while the SQL analytic functions compute an aggregate value based on a group of rows but return a result for every row in the group.

SQL statistical functions include:
- Descriptive statistics (e.g. median, standard deviation, mode, sum, etc.)
- Hypothesis testing (t-test, F-test, Kolmogorov-Smirnov test, Mann-Whitney test, Wilcoxon signed ranks test)
- Correlation analysis (parametric and nonparametric, e.g. Pearson's test for correlation, Spearman's rho coefficient, Kendall's tau-b correlation coefficient)
- Ranking functions
- Cross tabulations with chi-square statistics
- Linear regression
- ANOVA (analysis of variance)
- Tests for distribution fit (e.g. normal, binomial, Weibull, uniform, exponential, Poisson, etc.)
- Aggregate functions
- Statistical aggregates (min, max, mean, median, stdev, mode, quantiles, plus x sigma, minus x sigma, top n outliers, bottom n outliers)
- LAG/LEAD functions
- Reporting aggregate functions

STATS_T_TEST_INDEPU example: the following query determines the significance of the difference between the average sales to men and women where the distributions are known to have significantly different (unpooled) variances:

SELECT SUBSTR(cust_income_level, 1, 22) income_level,
       AVG(DECODE(cust_gender, 'M', amount_sold, null)) sold_to_men,
       AVG(DECODE(cust_gender, 'F', amount_sold, null)) sold_to_women,
       STATS_T_TEST_INDEPU(cust_gender, amount_sold, 'STATISTIC', 'F') t_observed,
       STATS_T_TEST_INDEPU(cust_gender, amount_sold) two_sided_p_value
  FROM sh.customers c, sh.sales s
 WHERE c.cust_id = s.cust_id
 GROUP BY ROLLUP(cust_income_level)
 ORDER BY income_level, sold_to_men, sold_to_women, t_observed;

INCOME_LEVEL           SOLD_TO_MEN SOLD_TO_WOMEN T_OBSERVED TWO_SIDED_P_VALUE
---------------------- ----------- ------------- ---------- -----------------
A: Below 30,000          105.28349    99.4281447 -2.0542592        .039964704
B: 30,000 - 49,999       102.59651    109.829642 2.96922332        .002987742
C: 50,000 - 69,999      105.627588    110.127931  2.3496854        .018792277
D: 70,000 - 89,999      106.630299    110.47287  2.26839281        .023307831
E: 90,000 - 109,999     103.396741    101.610416 -1.2603509        .207545662
F: 110,000 - 129,999     106.76476    105.981312 -.60580011        .544648553
G: 130,000 - 149,999    108.877532    107.31377  -.85219781        .394107755
H: 150,000 - 169,999    110.987258    107.152191 -1.9451486        .051762624
I: 170,000 - 189,999    102.808238    107.43556  2.14966921        .031587875
J: 190,000 - 249,999    108.040564    115.343356 2.54749867        .010854966
K: 250,000 - 299,999    112.377993    108.196097 -1.4115514        .158091676
L: 300,000 and above    120.970235    112.216342 -2.0726194        .038225611
                        107.121845    113.80441  .689462437        .490595765
                        106.663769    107.276386 1.07853782        .280794207

14 rows selected.

(See the link below to the SQL Language Reference for STATS_T_TEST_*.)

Most statistical software vendors charge license fees for these statistical capabilities. Oracle includes them in every Oracle Database. Users can reduce annual license fees and perform the equivalent basic statistical functionality while keeping big data and analytics simple in a single, unified, consistent, scalable and secure Oracle Database platform. Because the statistical functions are native SQL functions, statistical results can be immediately used across the Oracle stack, unleashing many more opportunities to leverage your results in spontaneous and unexpected ways. Additionally, Oracle Advanced Analytics' Oracle R Enterprise component exposes the SQL statistical functions through the R statistical programming language, allowing R users to call familiar R functions (e.g. summary) while the work is pushed down to the equivalent SQL statistical functions to avoid data movement and gain significant in-database performance. The SQL Developer Oracle Data Miner workflow GUI extension also leverages the SQL statistical functions in the Explore, Graph, SQL Query and Transform nodes.


Big Data

The New Data Lake - You Need More Than HDFS

A data lake is a key element of any big data strategy, and conventional wisdom has it that Hadoop/HDFS is the core of your lake. But conventional wisdom changes with new information (which is why we're no longer living on an earth presumed to be both flat and at the center of the universe), and in this case that new information is all about object storage. Guest blogger Paul Miller, Big Data and Analytics Manager at Oracle, has this post on object storage as the foundation of the new data lake. And if you'd like to try building one yourself, head over to our New Data Lake Workshop (it's free!), which will guide you through the process. After a short time, you'll have a functioning, modern data lake, ready to go.

Object Store is the New Data Lake

There are many ways to persist data in cloud platforms today, such as object, block, file, SMB, database, queue, and archive storage. As an overview, here are Oracle's, AWS' and Azure's primary storage solutions.

Object Based Distributed Storage: key/content-driven interface
- Oracle Object Store
- AWS S3
- Azure Blob Storage

File Based Distributed Storage: nested file/folder interface
- Oracle BDCS-CE Storage (HDFS)
- AWS EMR HDFS/EMRFS
- Azure Data Lake Store (HDFS)

Block Based Storage: raw, disk-like 1s-and-0s interface
- Oracle Cloud Block Volume Storage
- AWS Elastic Block Storage (EBS)
- Azure Disk Storage

Of the three persistence strategies outlined above, object based distributed storage is the centerpiece of public cloud platforms. Amazon established a mindset in which cloud native application developers use object store (AWS S3) as their persistent store. Object store is now the integration point where cloud and on-premise applications can easily persist and distribute data globally in a canonical way. Oracle, recognizing this fact, made a massive investment in developing an object store that is fast and easy to use within the Oracle Public Cloud. When it comes to analytics, cloud native persistence and backup targets, Oracle Object Store is critical.

How Object Storage Works

Object storage is a scalable, redundant, foundational storage service. Objects and files are written to multiple disk drives spread throughout servers in the Oracle Public Cloud, with Oracle's software responsible for ensuring data replication and integrity across the cluster. Because Oracle uses provisioning logic to maintain availability locally and across different data centers, it is able to provide eleven 9s of data durability. Should anything fail, Oracle handles the replication of the container's content from other active nodes to new locations in the Oracle Public Cloud ecosystem.

When it comes to using the latest and greatest tools for data science and fast data processing, object store enables agility, cost savings and deployment time savings by:
1. Detaching compute from storage, allowing the environments to grow independently - check out what we are doing with Big Data Cloud Service CE or IoT Cloud Service
2. Persisting all the data in a low-cost, globally distributed store that speeds processes up while making the data more durable
3. Maintaining a core, distribution-based environment (Cloudera) while being able to use the latest and greatest Hadoop projects on demand (Apache)

The Benefits of Object Store

Hadoop HDFS' strategy of intrinsically tying storage to compute is increasingly becoming an inefficient use of resources when it comes to enterprise data lakes. Think of object store as the lowest tier in your storage hierarchy.

Object store allows you to decouple storage from compute, giving organizations more flexibility, durability and cost savings. Store everything in object store and read only the data you need into the application or processing tier (Java CS, Node.js, Coherence Data Grid, DBaaS, Spark RDD, Essbase, etc.) on demand. At the end of the day, the cost of copying this data as needed is small compared with the cost savings and the increased flexibility. These key factors placed object store at the center of our Oracle Analytics and Big Data Reference Architecture.

Don't forget to visit our other blog article on data lake best practices. Or if you're ready to get started, try building a data lake for free with an Oracle trial.
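As a small illustration of the "store everything, read only what you need" pattern described above, the sketch below pulls a single object out of an S3-compatible object store into the processing tier on demand, using boto3 and pandas. The endpoint, credentials, bucket, and key are placeholders; check your own object storage service's documentation for the actual connection details.

```python
import io

import boto3
import pandas as pd

# Placeholder connection details for an S3-compatible object store.
s3 = boto3.client(
    "s3",
    endpoint_url="https://objectstore.example.com",  # hypothetical endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Storage is decoupled from compute: nothing is read until a job actually needs it.
response = s3.get_object(Bucket="data-lake", Key="sensors/2017/06/readings.csv")
readings = pd.read_csv(io.BytesIO(response["Body"].read()))

# The processing tier works on just this slice of the lake, then discards it.
print(readings.describe())
```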


Analytics

"It's tough to make predictions...

... especially about the future," as a wise man once said (though check #36). But we've been doing this for a few years now, and 2017's list finally made it to oracle.com/bigdata, or here's a direct link to the PDF. With some additional help from Yogi, we did a webcast with O'Reilly back in December, which is still up for you to view if you'd like some more background.

"You can observe a lot just by watching" aptly describes machine learning, which was the subject of our first prediction. Simplifying hugely, ML is just the process of using an algorithm to examine data and come up with new insights. Initially the preserve of data scientists, ML is becoming more widely used and embedded in other tools and applications: everything from music recommendations to IT. And speaking of IT tools, Oracle Management Cloud already embeds ML to do things like flag unusual resource usage, identify configuration changes, and forecast outages before they happen. Systems management is a classic big data problem, with lots of different data sources and formats, real-time data streams, and now the opportunity to apply sophisticated analytics to deliver benefits that weren't possible before. Expect new capabilities like that in many more products this year.

We'll do some more background posts about these predictions throughout the year. When exactly will that happen? Don't know. After all, it's tough to make predictions...
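The "flagging unusual resource usage" example is easy to picture in a few lines of code. Below is a deliberately simple sketch (not how Oracle Management Cloud actually works): it learns a baseline from historical CPU readings and flags new samples that fall far outside it; the readings and threshold are invented.

```python
import statistics

# Historical CPU utilization samples (percent) that establish the baseline.
history = [31, 28, 35, 30, 29, 33, 32, 27, 34, 30]
mean = statistics.mean(history)
stdev = statistics.stdev(history)

def is_unusual(sample, threshold=3.0):
    """Flag a reading more than `threshold` standard deviations from the baseline."""
    return abs(sample - mean) > threshold * stdev

for reading in [31, 36, 78]:
    status = "UNUSUAL" if is_unusual(reading) else "normal"
    print(f"cpu={reading}% -> {status}")
```

Real systems learn richer baselines (seasonality, per-host profiles) and use proper models, but the idea of comparing new observations against learned normal behavior is the same.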


Analytics

CALL FOR ABSTRACTS: Oracle BIWA Summit '17 - THE Big Data + Analytics + Spatial + Cloud + IoT + Everything “Cool” Oracle User Conference 2017

THE Big Data + Analytics + Spatial + Cloud + IoT + Everything "Cool" Oracle User Conference 2017. January 31 - February 2, 2017, Oracle Conference Center at Oracle Headquarters Campus, Redwood Shores, CA.

What Oracle Big Data + Analytics + Spatial + Cloud + IoT + Everything "Cool" Successes Can You Share? We want to hear your story. Submit your proposal today for Oracle BIWA Summit 2017, January 31 - February 2, 2017, and share your successes with Oracle technology. Speaker proposals are now being accepted through October 1, 2016. Submit now for possible early acceptance and publication in Oracle BIWA Summit 2017 promotion materials. Presentations must be non-commercial. Sales promotions for products or services disguised as proposals will be eliminated. Speakers whose abstracts are accepted will be expected to submit a presentation outline and a PDF slide deck at a later date. Accompanying technical and use case papers are encouraged, but not required. Click HERE to submit your abstract(s) for Oracle BIWA Summit 2017.

BIWA Summits are organized and managed by the Oracle Business Intelligence, Data Warehousing and Analytics (BIWA) SIG and the Oracle Spatial and Graph SIG, both Special Interest Groups in the Independent Oracle User Group (IOUG), together with the Oracle Northern California User Group. BIWA Summits attract presentations and talks from the top BI, DW, Advanced Analytics, Spatial, and Big Data experts. The 3-day BIWA Summit 2016 event involved keynotes by industry experts, educational sessions, hands-on labs and networking events. Click HERE to see presentations and content from BIWA Summit 2016.

Call for Speakers DEADLINE is October 1, 2016 at midnight Pacific Time. Complimentary registration to Oracle BIWA Summit 2017 is provided to the primary speaker of each accepted abstract. Note: one complimentary registration per accepted session will be provided. Any additional co-presenters need to register for the event separately and pay the appropriate registration fees. It is up to the co-presenters' discretion which presenter to designate for the complimentary registration.

Please submit speaker proposals in one of the following tracks:
- Advanced Analytics
- Business Intelligence
- Big Data + Data Discovery
- Data Warehousing and ETL
- Cloud
- Internet of Things
- Spatial and Graph
- …Anything else "Cool" using Oracle technologies in "novel and interesting" ways

Learn from Industry Experts from Oracle, Partners, and Customers

Come join hundreds of professionals with shared interests in the successful deployment of Oracle Business Intelligence, Data Warehousing, IoT and Analytical products: Cloud & Big Data, DW & Data Integration, BI & Data Discovery & Visualization, Advanced Analytics & Spatial, Internet of Things, Oracle Database Cloud Service, Big Data Appliance, Oracle Data Visualization Cloud Service, Hadoop and Spark, Big Data Connectors (Hadoop & R), Oracle Data as a Service, Engineered Systems, Exadata, Oracle Partitioning, Oracle Data Integrator (ETL), In-Memory, Oracle Big Data Preparation Cloud Service, Big Data Discovery, Data Visualization, OBIEE, OBI Applications, Exalytics, Cloud Real-Time Decisions, Oracle Advanced Analytics, Oracle Spatial and Graph, Oracle Data Mining & Oracle Data Miner, Oracle R Enterprise, SQL Patterns, Oracle Text, Oracle R Advanced Analytics for Hadoop, Big Data from sensors, Edge Analytics, Industrial Internet, IoT Cloud, Monetizing IoT, Security, Standards.

What To Expect

500+ Attendees | 90+ Speakers | Hands on Labs | Technical Content | Networking

Exciting Topics Include:
- Database, Data Warehouse, and Cloud, Big Data Architecture
- Deep Dives on existing Oracle BI, DW and Analytics products and Hands on Labs
- Updates on the latest Oracle products and technologies, e.g. Oracle Big Data Discovery, Oracle Visual Analyzer, Oracle Big Data SQL
- Novel and Interesting Use Cases of Everything! Spatial, Text, Data Mining, ETL, Security, Cloud
- Working with Big Data: Hadoop, "Internet of Things", SQL, R, Sentiment Analysis
- Oracle Big Data Discovery, Oracle Business Intelligence (OBIEE), Oracle Spatial and Graph, Oracle Advanced Analytics: All Better Together

Example Talks from BIWA Summit 2016: [Visit www.biwasummit.org to see last year's Full Agenda from BIWA'16 and to download copies of BIWA'16 presentations and HOLs.]

Advanced Analytics
- Dogfooding – How Oracle Uses Oracle Advanced Analytics To Boost Sales Efficiency, Frank Heilland, Oracle Sales and Support
- Fiserv Case Study: Using Oracle Advanced Analytics for Fraud Detection in Online Payments, Julia Minkowski, Fiserv
- Enabling Clorox as Data Driven Enterprise, Yigal Gur, Clorox
- Big Data Analytics with Oracle Advanced Analytics 12c and Big Data SQL and the Cloud, Charlie Berger, Oracle
- Stubhub and Oracle Advanced Analytics, Brian Motzer, Stubhub
- Fault Detection using Advanced Analytics at CERN's Large Hadron Collider: Too Hot or Too Cold, Mark Hornick, Oracle
- Large Scale Machine Learning with Big Data SQL, Hadoop and Spark, Marcos Arancibia, Oracle
- Oracle R Enterprise 1.5 - Hot new features!, Mark Hornick, Oracle

BI and Visualization
- Electoral fraud location in Brazilian General Elections 2014, Alex Cordon, Henrique Gomes, CDS
- See What's There and What's Coming with BICS & Data Visualization, Philippe Lions, Oracle
- Optimize Oracle Business Intelligence Analytics with Oracle 12c In-Memory Database option, Kai Yu, Dell
- BI Movie Magic: Maps, Graphs, and BI Dashboards at AMC Theatres, Tim Vlamis, Vlamis
- Defining a Roadmap for Migrating to Oracle BI Applications on ODI, Patrick Callahan, AST Corp.
- Free form Data Visualization, Mashup BI and Advanced Analytics with BI 12c, Philippe Lions, Oracle

Big Data
- How to choose between Hadoop, NoSQL or Oracle Database, Jean-Pierre Djicks, Oracle
- Enrich, Transform and Analyse Big Data using Big Data Discovery and Visual Analyzer, Mark Rittman, Rittman Mead
- Oracle Big Data: Strategy and Roadmap, Neil Mendelson, Oracle
- High Speed Video Processing for Big Data Applications, Melliyal Annamalai, Oracle
- How to choose between Hadoop, NoSQL or Oracle Database, Shyam Nath, General Electric
- What's New With Oracle Business Intelligence 12c, Stewart Bryson, Red Pill
- Leveraging Oracle Big Data Discovery to Master CERN's Control Data, Antonio Romero Marin, CERN

Cloud Computing
- Hybrid Cloud Using Oracle DBaaS: How the Italian Workers Comp Authority Uses Graph Technology, Giovanni Corcione, Oracle
- Oracle DBaaS Migration Road Map, Daniel Morgan, Forsythe Meta7
- Safe Passage to the CLOUD – Analytics, Rich Solari, Privthi Krishnappa, Deloitte
- Oracle BI Tools on the Cloud--On Premise vs. Hosted vs. Oracle Cloud, Jeffrey Schauer, JS Business Intelligence

Data Warehousing and ETL
- Making SQL Great Again (SQL is Huuuuuuuuuuuuuuuge!), Panel Discussion, Andy Mendelsohn, Oracle, Steve Feuerstein, Oracle, George Lumpkin, Oracle
- The Place of SQL in the Hybrid World, Kerry Osborne and Tanel Poder, Accenture Enkitec Group
- Is Oracle SQL the best language for Statistics, Brendan Tierney, Oralytics
- Taking Full Advantage of the PL/SQL Compiler, Iggy Ferenandez, Oracle

Internet of Things
- Industrial IoT and Machine Learning - Making Wind Energy Cost Competitive, Robert Liekar, M&S Consulting

Spatial Summit
- Utilizing Oracle Spatial and Graph with Esri for Pipeline GIS and Linear Asset Management, Dave Ellerbeck, Global Information Systems
- Oracle Spatial and Graph: New Features for 12.2, Siva Ravada, Oracle
- High Performance Raster Database Manipulation and Data Processing with Oracle Spatial and Graph, Qingyun (Jeffrey) Xie, Oracle

Example Hands-on Labs from BIWA Summit 2016:
- Scaling R to New Heights with Oracle Database, Mark Hornick, Oracle, Tim Vlamis, Vlamis Software
- Learn Predictive Analytics in 2 hours!! Oracle Data Miner 4.1, Charlie Berger, Oracle, Brendan Tierney, Oralytics, Karl Rexer, Rexer Analytics
- Predictive Analytics using SQL and PL/SQL, Brendan Tierney, Oralytics, Charlie Berger, Oracle
- Oracle Data Visualization Cloud Service Hands-On Lab with Customer Use Cases, Pravin Patil, Kapstone

Lunch & Partner Lightning Rounds: Fast and Fun 5 Minute Presentations from Each Partner--Must See!

Submit your abstract(s) today, good luck and hope to see you there! See last year's Full Agenda from BIWA'16.

Dan Vlamis and Shyam Nath, Oracle BIWA Summit '17 Conference Co-Chairs


Three Successful Customers Using IoT and Big Data

When I wrote about the convergence of IoT and big data I mentioned that we have successful customers. Here I want to pick three that highlight different aspects of the complete story. There are a lot of different components to a complete big data solution. These customers are using different pieces of the Oracle solution, integrating them with existing software and processes.

Gemü manufactures precision valves used to make things like pharmaceuticals. As you can imagine, it's critical that valves operate correctly to avoid adding too much or too little of an active ingredient. So Gemü turned to the Oracle IoT Cloud Service to help them monitor those valves in use on their customers' production lines. This data helps Gemü and their partners ensure the quality of their product. And over time, this data will enable them to predict failures or even the onset of out-of-tolerance performance. Predictive maintenance is a potentially powerful new capability and enables Gemü to maintain the highest levels of quality and safety.

From small valves to the largest machine on the planet: the Large Hadron Collider at CERN. There are many superlatives about this system. Its cryogenics system is the largest in the world, and has to keep 36,000 tons of superconducting magnets at 1.9K (-271.3 Celsius) using 120 tons of liquid helium. Failures in that system can be costly. They've had problems with a weasel and a baguette, both of which are hard to predict, but other failures could potentially be stopped. Which is why CERN is using Big Data Discovery to help them understand what's going on with their cryogenics system. They are also using predictive analytics with the ultimate goal of predicting failures before they happen, and avoiding the two months it can take to warm up systems long enough to make even a basic repair, before cooling them down again.

And finally this one. IoT and big data working together can help a plane to fly, a valve to make pharmaceuticals, and the world's largest machine to stay cool. What can we do for you?


Focus On Big Data at Oracle OpenWorld!

Oracle OpenWorld is fast approaching and you won't want to miss the big data highlights. Participate in our live demos, attend a theater session, or take part in one of our many hands-on labs, user forums, and conference sessions all dedicated to big data. Whether you're interested in machine learning, predictive maintenance, real-time analytics, the Internet of Things (IoT), data-driven marketing, or learning how Oracle supports open source technologies such as Kafka, Apache Spark, and Hadoop as part of our core strategy, we have the information for you. For more details on how to center your attention on big data at OpenWorld, you can access the "Focus On" Big Data program guide link; here are a few things you won't want to miss:

General Session: Oracle Cloud Platform for Big Data [GEN7471]
Tuesday, Sep 20, 11:00 a.m. | Moscone South—103
Oracle Cloud Platform for big data enables complete, secure solutions that maximize value to your business, lower costs, increase agility, and embrace open source technologies. Learn about Oracle's strategy for big data in the cloud.

Oracle Big Data Management in the Cloud [CON7473]
Wednesday, Sep 21, 11:00 a.m. | Moscone South—302
Successful analytical environments require seamless integration of Hadoop, Spark, NoSQL, and relational databases. Data virtualization can eliminate data silos and make this information available to your entire business. Learn to tame the complexity of data management.

Oracle Big Data Lab in the Cloud [CON7474]
Wednesday, Sep 21, 12:15 p.m. | Moscone South—302
Business analysts and data scientists can experiment and explore diverse data sets and uncover what new questions can be answered in a data lab environment. Learn about the future of the data lab in the cloud and also how lab insights can unlock the value of big data for the business.

Oracle Big Data Integration in the Cloud [CON7472]
Tuesday, Sep 20, 4:00 p.m. | Moscone South—302
Oracle Data Integration's cloud services and solutions can help manage your data movement and integration challenges across on-premises, cloud, and other data platforms. Get started quickly in the cloud with data integration for Hadoop, Spark, NoSQL, and Kafka. You'll also see the latest data preparation self-service tools for nontechnical users.

Drive Business Value and Outcomes Using Big Data Platform [THT7828]
Monday, Sep 19, 2:30 p.m. | Big Data Theater, Moscone South Exhibition Hall
Driving business value with big data requires more than big data technology. Learn how to maximize the value of big data by bringing together big data management, big data analytics, and enterprise applications. The session explores several different use cases and shows what it takes to construct integrated solutions that address important business problems.

Oracle Streaming Big Data and Internet of Things Driving Innovation [CON7477]
Wednesday, Sep 21, 3:00 p.m. | Moscone South—302
In the Internet of Things (IoT), a wealth of data is generated, and can be monitored and acted on in real time. Applying big data techniques to store and analyze this data can drive predictive, intelligent learning applications. Learn about how the convergence of IoT and big data can reduce costs, generate competitive advantage, and open new business opportunities.

Oracle Big Data Showcase | Moscone South
Visit the Big Data Showcase throughout the show and participate in a live demo or attend one of our many dedicated 20-minute theater sessions with big data experts.
We are looking forward to Oracle OpenWorld 2016 and we can’t wait to see you there! In the meantime, check out oracle.com/bigdata for more information.    


Internet of Things and Big Data - Better Together

What's the difference between the Internet of Things and big data? That's not really the best question to ask, because these two are much more alike than they are different. And they complement each other very strongly, which is one reason we've written a white paper on the convergence. Big data is all about enabling organizations to use more of the data around them: things customers write in social media, log files from applications and processes, sensor and device data. And there's IoT! One way to think of it is as one of the sources for big data. But IoT is more than that. It's about collecting all that data, analyzing it in real time for events or patterns of interest, and making sure to integrate any new insight into the rest of your business. When you add the rest of big data to IoT, there's much more data to work with and powerful big data analytics to come up with additional insights.

Best to look at an example. Using IoT you can track and monitor assets like trucks, engines, HVAC systems, and pumps. You can correct problems as you detect them. With big data, you can analyze all the information you have about failures and start to uncover the root causes. Combine the two and you can not only react to problems as they occur, you can predict them and fix them before they occur. Go from being reactive to being proactive.

Check out this infographic. The last data point, down at the bottom right hand side, may be the most important one. Only 8% of businesses are fully capturing and analyzing IoT data in a timely fashion. Nobody likes to arrive last to a party and find the food and drink all gone. This party's just getting started. You should be asking every vendor you deal with how they can help you take advantage of IoT and big data - they really are better together, and there's lots of opportunity. The next post will highlight three customers who are taking advantage of that opportunity.
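To make the reactive-versus-proactive distinction above concrete, here is a tiny Python sketch: the reactive check only fires once a temperature limit has already been breached, while the proactive check fits a trend to recent history (the big data side) and estimates how long remains before the limit is reached. The readings and the 90-degree limit are invented for the example.

```python
import numpy as np

LIMIT = 90.0  # illustrative temperature limit (degrees C)
hours = np.arange(12)
temps = np.array([70, 71, 71.5, 73, 74, 74.5, 76, 77, 78.5, 79, 81, 82], dtype=float)

# Reactive: act only once the limit has already been crossed.
if temps[-1] >= LIMIT:
    print("Reactive: over limit - the equipment is already at risk")

# Proactive: fit a linear trend to the history and estimate hours until the limit is hit.
slope, _intercept = np.polyfit(hours, temps, 1)
if slope > 0:
    hours_to_limit = (LIMIT - temps[-1]) / slope
    print(f"Proactive: trending up {slope:.2f} deg/hour, "
          f"limit reached in roughly {hours_to_limit:.0f} hours - schedule maintenance")
```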

Cloud

DIY Hadoop: Proceed At Your Own Risk

Could your security and performance be in jeopardy? Nearly half (3.2 billion, or 45%) of the seven billion people in the world used the Internet in 2015, according to a BBC news report. If you think all those people generate a huge amount of data (in the form of website visits, clicks, likes, tweets, photos, online transactions, and blog posts), wait for the data explosion that will happen when the Internet of Things (IoT) meets the Internet of People. Gartner, Inc. forecasts that twice as many--6.4 billion--Internet-connected gadgets (everything from light bulbs to baby diapers to connected cars) will be in use worldwide in 2016, up 30 percent from 2015, and that the number will reach over 20 billion by 2020. Companies of all sizes and in virtually every industry are struggling to manage the exploding amounts of data. To cope with the problem, many organizations are turning to solutions based on Apache Hadoop, the popular open-source software framework for storing and processing massive datasets. But purchasing, deploying, configuring, and fine-tuning a do-it-yourself (DIY) Hadoop cluster to work with your existing infrastructure can be much more challenging than many organizations expect, even if your company has the specialized skills needed to tackle the job. And as both business and IT executives know all too well, managing big data involves far more than just dealing with storage and retrieval challenges; it requires addressing a variety of privacy and security issues as well. Beyond the brand damage that companies like Sony and Target have experienced in the last few years from data breaches, there's also the likelihood that companies that fail to secure the life cycle of their big data environments will face regulatory consequences. Early last year, the Federal Trade Commission released a report on the Internet of Things that contains guidelines to promote consumer privacy and security. The FTC's document, Careful Connections: Building Security in the Internet of Things, encourages companies to implement a risk-based approach and take advantage of best practices developed by security experts, such as using strong encryption and proper authentication. While not calling for new legislation (due to the speed of innovation in the IoT space), the FTC report states that businesses and law enforcers have a shared interest in ensuring that consumers' expectations about the security of IoT products are met. The report recommends several "time-tested" security best practices for companies processing IoT data, such as: Implementing "security by design" by building security into your products and services at the outset of your planning process, rather than grafting it on as an afterthought. Implementing a defense-in-depth approach that incorporates security measures at several levels. Business and IT executives who try to follow the FTC's big data security recommendations are likely to run into roadblocks, especially if they're trying to integrate Hadoop with their existing IT infrastructure. The main problem with Hadoop is that it wasn't originally built with security in mind; it was developed solely to address massive distributed data storage and fast processing, which leads to the following threats: DIY Hadoop. A do-it-yourself Hadoop cluster presents inherent risks, especially since it is often developed without adequate security by a small group of people in a laboratory-type setting, closed off from a production environment.
As a cluster grows from small project to advanced enterprise Hadoop, every period of growth—patching, tuning, verifying versions between Hadoop modules, OS libraries, utilities, user management, and so forth—becomes more difficult and time-consuming. Unauthorized access. Built under the principle of “data democratization”—so that all data is accessible by all users of the cluster— Hadoop has had challenges complying with certain compliance standards, such as the Health Insurance Portability and Accountability Act (HIPAA) and the Payment Card Industry Data Security Standard (PCI DSS). That’s due to the lack of access controls on data, including password controls, file and database authorization, and auditing. Data provenance. With open source Hadoop, it has been difficult to determine where a particular dataset originated and what data sources it was derived from.  Which means you can end up basing critical business decisions on analytics taken from suspect or compromised data. 2X Faster Performance than DIY Hadoop In his keynote at last year's Oracle OpenWorld 2015, Intel CEO Brian Krzanich described work Intel has been doing with Oracle to build high performing datacenters using the pre-built Oracle Big Data Appliance, an integrated, optimized solution powered by the Intel Xeon processor family. Specifically, he referred to recent benchmark testing by Intel engineers that showed an Oracle Big Data Appliance solution with some basic tuning achieved nearly two times better performance than a comparable DIY cluster built on comparable hardware. Not only is it faster, but it was designed to meet the security needs of the enterprise.  Oracle Big Data Appliance automates the steps required to deploy a secure cluster – including complex tasks like setting up authentication, data authorization, encryption, and auditing.  This dramatically reduces the amount of time required to both set up and maintain a secure infrastructure. Do-it-yourself (DIY) Apache Hadoop clusters are appealing to many business and IT executives because of the apparent cost savings from using commodity hardware and free software distributions. As I've shown, despite the initial savings, DIY Hadoop clusters are not always a good option for organizations looking to get up to speed on an enterprise big data solution, both from a security and performance standpoint. Find out how your company can move to an enterprise Big Data architecture with Oracle’s Big Data Platform at https://www.oracle.com/big-data. Securing the Big Data Life Cycle Deploying an Apache Hadoop Cluster? Spend Your Time on BI, Not DIY

Innovation

The Surprising Economics of Engineered Systems

The title's not mine. It comes from a video done for us by ESG, based on their white paper, which looks at the TCO of building your own Hadoop cluster vs buying one ready-built (Oracle Big Data Appliance). You should watch or read, depending on your preference, or even just check out the infographic. The conclusion could be summed up as "better, faster, cheaper, pick all three". Which is not what you'd expect. But they found that it's better (quicker to deploy, lower risk, easier to support), faster (from 2X to 3X faster than a comparable DIY cluster) and cheaper (45% cheaper if you go with list pricing). So while you may not think that an engineered system like the Big Data Appliance is the right system for you, it should always be on your shortlist. Compare it with building your own - you'll probably be pleasantly surprised. There's a lot more background in the paper in particular, but let me highlight a few things:  - We have seen some instances where other vendors offer huge discounts and actually beat the BDA price.  If you see this, check two things. First, will that discount be available for all future purchases or is this just a one-off discount. And second, remember to include the cost that you incur to setup, manage, maintain and patch the system. -  Consider performance. We worked with Intel to tune Hadoop for this specific configuration. There are something like 500 different parameters on Hadoop that can impact performance one way or the other. That tuning project was a multi-week exercise with several different experts. The end result was performance of nearly 2X, sometimes up to 3X faster than a comparable, untuned DIY cluster. Do you have the resources and expertise to replicate this effort? Would a doubling of performance be useful to you? - Finally, consider support. A Hadoop cluster is a complex system. Sometimes problems arise that result from the interaction of multiple components. It can be really hard to figure those out, particularly when multiple vendors are involved for different pieces. When no single component is "at fault" it's hard to find somebody to fix the overall system. You'd never buy a computer with 4 separate support contracts for operating system, CPU, disk and network card - you'd want one contract for the entire system. The same can be true for your Hadoop clusters as well.

Predictions for Big Data Security in 2016

Leading into 2016, Oracle made ten big data predictions, and one in particular around security. We are nearly four months into the year and we've already seen these predictions coming to light.

Increase in regulatory protections of personal information
Early February saw the creation of the Federal Privacy Council, "which will bring together the privacy officials from across the Government to help ensure the implementation of more strategic and comprehensive Federal privacy guidelines. Like cyber security, privacy must be effectively and continuously addressed as our nation embraces new technologies, promotes innovation, reaps the benefits of big data and defends against evolving threats." The European Union General Data Protection Regulation is a reform of the EU's 1995 data protection rules (Directive 95/46/EC). Their Big Data fact sheet was put forth to help promote the new regulations. "A plethora of market surveys and studies show that the success of providers to develop new services and products using big data is linked to their capacity to build and maintain consumer trust." As a timeline, the EU expects adoption in Spring 2016, with enforcement beginning two years later in Spring 2018. Earlier this month, the Federal Communications Commission announced a proposal to restrict Internet providers' ability to share the information they collect about what their customers do online with advertisers and other third parties.

Increased use of classification systems that categorize data into groups with pre-defined policies for access, redaction and masking
An Infosecurity Magazine article highlights the challenge of data growth and the requirement for classification: "As storage costs dropped, the attention previously shown towards deleting old or unnecessary data has faded. However, unstructured data now makes up 80% of non-tangible assets, and data growth is exploding. IT security teams are now tasked with protecting everything forever, but there is simply too much to protect effectively – especially when some of it is not worth protecting at all." The three benefits of classification highlighted include the ability to raise security awareness, prevent data loss, and address records management regulations. All of these are legitimate benefits of data classification that organizations should consider. Case in point, Oracle customer Union Investment increased agility and security by automatically processing investment fund data within their proprietary application, including complex asset classification with up to 500 data fields, which were previously distributed to IT staff using spreadsheets. (A minimal sketch of what such a policy can look like appears at the end of this post.)

Continuous cyber-threats will prompt companies to both tighten security and audit access and use of data
This is sort of a no-brainer. We know more breaches are coming. And we know companies increase security spending after they experience a data breach or witness one close to home. Most organizations now know that completely eliminating the possibility of a data breach is impossible, and therefore, appropriate detective capabilities are more important than ever. We must act as if the bad guys are on our network and then detect their presence and respond accordingly. See the rest of the Enterprise Big Data Predictions, 2016. Image Source: http://www.informationisbeautiful.net/visualizations/worlds-biggest-data-breaches-hacks/
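One concrete way the classification prediction plays out in Oracle Database is a redaction or masking policy attached to a sensitive column. The following is only a minimal, hedged sketch: the SALES.CUSTOMERS table, its SSN column and the policy name are hypothetical placeholders, not anything referenced above, and a real deployment would choose the redaction type and exemption rules to match its own classification scheme.

-- Hypothetical example: fully redact a classified column for all queries
-- (the expression '1=1' applies the policy unconditionally).
BEGIN
  DBMS_REDACT.ADD_POLICY(
    object_schema => 'SALES',
    object_name   => 'CUSTOMERS',
    policy_name   => 'redact_customer_ssn',
    column_name   => 'SSN',
    function_type => DBMS_REDACT.FULL,
    expression    => '1=1');
END;
/

Auditing access to the same data, per the third prediction, then amounts to enabling audit policies on whichever objects the classification flags as sensitive.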

Accelerating SQL Queries that Span Hadoop and Oracle Database

It's hard to deliver "one fast, secure SQL query on all your data". If you look around you'll find lots of "SQL on Hadoop" implementations which are unaware of data that's not on Hadoop. And then you'll see other solutions that combine the results of two different SQL queries, written in two different dialects, and run mostly independently on two different platforms. That means that while they may work, the person writing the SQL is effectively responsible for optimizing that joint query and implementing the different parts in those two different dialects. Even if you get the different parts right, the end result is more I/O, more data movement and lower performance. Big Data SQL is different in several ways. (Start with this blog to get the details). From the viewpoint of the user you get one single query, in a modern, fully functional dialect of SQL. The data can be located in multiple places (Hadoop, NoSQL databases and Oracle Database) and software, not a human, does all the planning and optimization to accelerate performance. Under the covers, one of the key things it tries to do is minimize I/O and minimize data movement so that queries run faster. It does that by trying to push down as much processing as possible to where the data is located. Big Data SQL 3.0 completes that task: now all the processing that can be pushed down, is pushed down. I'll give an example in the next post. What this means is cross-platform queries that are as easy to write, and as highly performant, as a query written just for one platform. Big Data SQL 3.0 further improves the "fast" part of "one fast, secure SQL query on all your data". We'd encourage you to test it against anything else out there, whether it's a true cross-platform solution or even something that just runs on one platform.

Delegation and (Data) Management

Every business book you read talks about delegation. It's a core requirement for successful managers: surround yourself with good people, delegate authority and responsibility to them, and get out of their way. It turns out that this is a guiding principle for Big Data SQL as well. I'll show you how. And without resorting to code. (If you want code examples, start here). Imagine a not uncommon situation where you have customer data about payments and billing in your data warehouse, while data derived from log files about customer access to your online platform is stored in Hadoop. Perhaps you'd like to see if customers who access their accounts online are any better at paying up when their bills come due. To do this, you might want to start by determining who is behind on payments, but has accessed their account online in the last month. This means you need to query both your data warehouse and Hadoop together. Big Data SQL uses enhanced Oracle external tables for accessing data in other platforms like Hadoop. So your cross-platform query looks like a query on two tables in Oracle Database. This is important, because it means from the viewpoint of the user (or application) generating the SQL, there's no practical difference between data in Oracle Database, and data in Hadoop. But under the covers there are differences, because some of the data is on a remote platform. How you process that data to minimize both data movement and I/O is key to maximizing performance. Big Data SQL delegates work to Smart Scan software that runs on Hadoop (derived from Exadata's Smart Scan software). Smart Scan on Hadoop does its own local scan, returning only the rows and columns that are required to complete that query, thus reducing data movement, potentially quite dramatically. And using storage indexing, we can avoid some unnecessary I/O as well. For example, if we've indexed a data block and know that the minimum value of "days since accessed accounts online" is 34, then we know that none of the customers in that block has actually accessed their accounts in the last month (30 days). So this kind of optimization reduces I/O. Together, these two techniques increase performance. Big Data SQL 3.0 goes one step further, because there's another opportunity for delegation. Projects like ORC or Parquet, for example, are efficient columnar data stores on Hadoop. So if your data is there, Big Data SQL's Smart Scan can delegate work to them, further increasing performance. This is the kind of optimization that the fastest SQL on Hadoop implementations do. Which is why we think that with Big Data SQL you can get performance that's comparable to anything else that's out there. But remember, with Big Data SQL you can also use the SQL skills you already have (no need to learn a new dialect), your applications can access data in Hadoop and NoSQL using the same SQL they already use (don't have to rewrite applications), and the security policies in Oracle Database can be applied to data in Hadoop and NoSQL (don't have to write code to implement a different security policy). Hence the tagline: One Fast, Secure SQL Query on All Your Data.
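To make the scenario concrete, here is a minimal sketch of what such a cross-platform query might look like with Big Data SQL. All table and column names are hypothetical, and the external table definition is deliberately minimal (it assumes a Hive table that the ORACLE_HIVE driver can resolve, and a directory object named default_dir); the post's own code examples live at the link above.

-- Hypothetical external table exposing a Hadoop-resident access log to Oracle Database.
CREATE TABLE customer_access_log (
  cust_id                 NUMBER,
  days_since_last_access  NUMBER
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_HIVE
  DEFAULT DIRECTORY default_dir
)
REJECT LIMIT UNLIMITED;

-- One SQL statement spanning the warehouse and Hadoop: customers who are behind
-- on payments but accessed their account online in the last 30 days.
SELECT b.cust_id, b.amount_overdue
FROM   billing b
JOIN   customer_access_log a ON a.cust_id = b.cust_id
WHERE  b.amount_overdue > 0
AND    a.days_since_last_access <= 30;

Smart Scan on the Hadoop side would filter rows and columns locally (for example, skipping blocks whose storage index shows a minimum days_since_last_access above 30), so only the small matching subset travels to the database.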

Oracle Big Data SQL 3.0 adds support for Hortonworks Data Platform and commodity clusters

Big Data SQL has been out for nearly two years and version 3.0 is a major update. In addition to increasing performance (next post) we've added support for clusters that aren't built on engineered systems. And alongside several other Oracle big data software products, Big Data SQL now also supports Hortonworks Data Platform. Before 3.0, the requirements were simple. You could use Big Data SQL to deliver "one fast, secure SQL query on all your data" as long as you were using our Big Data Appliance to run Hadoop and Exadata to run Oracle Database. While those configurations continue, they are not required. If you are running Cloudera Enterprise or Hortonworks Data Platform on any commodity cluster, you can now connect that with your existing Oracle data warehouse. And you don't need Exadata to run Oracle Database (you will need version 12.1.0.2 or later), either. Any system running Linux should do the trick. Many of our other big data products also run on Hortonworks HDP. With version 3.0, Big Data SQL joins Big Data Discovery, Big Data Spatial and Graph, Big Data Connectors, GoldenGate for Big Data and Data Integrator for Big Data. Most Oracle data warehouse customers are running Hadoop somewhere in the organization. If that's you, and you're using Cloudera Enterprise or Hortonworks HDP, then it's much easier now to link those two data management components together. So instead of silos, you can have all your data simply and securely accessible via the language you and your applications already know: SQL.

Learn Predictive Analytics in 2 Days - New Oracle University Course!

What you will learn: This Predictive Analytics using Oracle Data Mining Ed 1 training will review the basic concepts of data mining. Expert Oracle University instructors will teach you how to leverage the predictive analytical power of Oracle Data Mining, a component of the Oracle Advanced Analytics option. Learn To: Explain basic data mining concepts and describe the benefits of predictive analysis. Understand primary data mining tasks, and describe the key steps of a data mining process. Use the Oracle Data Miner to build, evaluate, apply, and deploy multiple data mining models. Use Oracle Data Mining's predictions and insights to address many kinds of business problems. Deploy data mining models for end-user access, in batch or real-time, and within applications. Benefits to You When you've completed this course, you'll be able to use the Oracle Data Miner 4.1, the Oracle Data Mining "workflow" GUI, which enables data analysts to work directly with data inside the database. The Data Miner GUI provides intuitive tools that help you explore the data graphically, build and evaluate multiple data mining models, apply Oracle Data Mining models to new data, and deploy Oracle Data Mining's predictions and insights throughout the enterprise. Oracle Data Miner's SQL APIs - Get Results in Real-Time Oracle Data Miner's SQL APIs automatically mine Oracle data and deploy results in real-time. Because the data, models, and results remain in the Oracle Database, data movement is eliminated, security is maximized and information latency is minimized. Introduction Course Objectives Suggested Course Prerequisites Suggested Course Schedule Class Sample Schemas Practice and Solutions Structure Review location of additional resources Predictive Analytics and Data Mining Concepts What is the Predictive Analytics? Introducting the Oracle Advanced Analytics (OAA) Option? What is Data Mining? Why use Data Mining? Examples of Data Mining Applications Supervised Versus Unsupervised Learning Supported Data Mining Algorithms and Uses Understanding the Data Mining Process Common Tasks in the Data Mining Process Introducing the SQL Developer interface Introducing Oracle Data Miner 4.1 Data mining with Oracle Database Setting up Oracle Data Miner Accessing the Data Miner GUI Identifying Data Miner interface components Examining Data Miner Nodes Previewing Data Miner Workflows Using Classification Models Reviewing Classification Models Adding a Data Source to the Workflow Using the Data Source Wizard Using Explore and Graph Nodes Using the Column Filter Node Creating Classification Models Building the Models Examining Class Build Tabs Using Regression Models Reviewing Regression Models Adding a Data Source to the Workflow Using the Data Source Wizard Performing Data Transformations Creating Regression Models Building the Models Comparing the Models Selecting a Model Using Clustering Models Describing Algorithms used for Clustering Models Adding Data Sources to the Workflow Exploring Data for Patterns Defining and Building Clustering Models Comparing Model Results Selecting and Applying a Model Defining Output Format Examining Cluster Results Performing Market Basket Analysis What is Market Basket Analysis? 
Reviewing Association Rules Creating a New Workflow Adding a Data Source to the Workflow Creating an Association Rules Model Defining Association Rules Building the Model Examining Test Results Performing Anomaly Detection Reviewing the Model and Algorithm used for Anomaly Detection Adding Data Sources to the Workflow Creating the Model Building the Model Examining Test Results Applying the Model Evaluating Results Mining Structured and Unstructured Data Dealing with Transactional Data Handling Aggregated (Nested) Data Joining and Filtering data Enabling mining of Text Examining Predictive Results Using Predictive Queries What are Predictive Queries? Creating Predictive Queries Examining Predictive Results Deploying Predictive models Requirements for deployment Deployment Options Examining Deployment Options
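As a hedged illustration of the in-database scoring that the course's SQL API material covers, the query below applies a previously built classification model to new rows entirely in SQL, so the data never leaves the database. The model name (churn_model) and the customers table are hypothetical placeholders, not part of the course outline.

-- Score customers in place with Oracle Data Mining's SQL functions.
SELECT cust_id,
       PREDICTION(churn_model USING *)             AS predicted_churn,
       PREDICTION_PROBABILITY(churn_model USING *) AS churn_probability
FROM   customers
WHERE  PREDICTION_PROBABILITY(churn_model USING *) > 0.75
ORDER  BY churn_probability DESC;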

Links to Presentations: BIWA Summit'16 - Big Data + Analytics User Conference Jan 26-28, @ Oracle HQ Conference Center

Note: Cross-posting this BIWA Summit Links to Presentations blog entry. Can also be found at https://blogs.oracle.com/datamining/entry/links_to_presentations_biwa_summit We had a great www.biwasummit.org event with ~425 attendees, in depth technical presentations delivered by experts and even had several 2 hour Hands on Labs training classes that used the Oracle Database Cloud! Watch for more coverage of event in various Oracle marketing and partner content venues. Many thanks to all the BIWA board of directors and many volunteers who have put in so much work to make this BIWA Summit the best BIWA user event ever. Mark your calendars for BIWA Summit’17, January 31, Feb. 1 & Feb. 2, 2017. We’ll be announcing Call for Abstracts in the future, so please direct your best customers and speakers to submit. We’re aiming to continue to make BIWA + Spatial + YesSQL Summit the best focused user gathering for sharing best practices for novel and interesting use cases of Oracle technologies. BIWA is an IOUG SIG run by entirely by customers, partners and Oracle employee volunteers. We’re always looking for people who would like to be involved. Let me know if you’d like to contribute to the planning and organization of future BIWA events and activities. See everyone at BIWA’17! Charlie, on behalf of the entire BIWA board of directors (charlie.berger@oracle.com) (see www.biwasummit.org for more information) See List of BIWA Summit'16 Presentations below. Click on Details to access the speaker’s abstract and download the files (assuming the speaker has posted them for sharing). We now have a schedule at a glance to show you all the sessions in a tabular agenda. See bottom of page for the Session Search capability Below is a list of the sessions and links to download most of the materials for the various sessions. Click on the DETAILS button next to the session you want to download, then the page should refresh with the session description and (assuming the presenter uploaded files, but be aware that files may be limited to 5MB) you should see a list of files for that session. See the full list below: Advanced Analytics Presentations (Click on Details to access file if submitted by presenter) Dogfooding – How Oracle Uses Oracle Advanced Analytics To Boost Sales Efficiency Details Oracle Modern Manufacturing - Bridging IoT, Big Data Analytics and ERP for Better Results Details Predictive Modelling and Forecasting using OER Details Enabling Clorox as Data Driven Enterprise Details Fault Detection using Advanced Analytics at CERN's Large Hadron Collider: Too Hot or Too Cold Details Large Scale Machine Learning with Big Data SQL, Hadoop and Spark Details Stubhub and Oracle Advanced Analytics Details Fiserv Case Study: Using Oracle Advanced Analytics for Fraud Detection in Online Payments Details Advanced Analytics for Call Center Operations Details Machine Learning on Streaming Data via Integration of Oracle R Enterprise and Oracle Stream Explorer Details Learn Predictive Analytics in 2 hours!! Oracle Data Miner 4.0 Hands on Lab Details Scaling R to New Heights with Oracle Database Details Predictive Analytics using SQL and PL/SQL Details Big Data Analytics with Oracle Advanced Analytics 12c and Big Data SQL and the Cloud Details Improving Predictive Model Development Time with R and Oracle Big Data Discovery Details Oracle R Enterprise 1.5 - Hot new features! 
Details Is Oracle SQL the best language for Statistics Details BI and Visualization Presentations (Click on Details to access file if submitted by presenter) Electoral fraud location in Brazilian General Elections 2014 Details The State of BI Details Case Study of Improving BI Apps and OBIEE Performance Details Preparing for BI 12c Upgrade Details Data Visualization at Sound Exchange – a Case Study Details Integrating OBIEE and Essbase, Why it Makes Sense Details The Dash that changed a culture Details Optimize Oracle Business Intelligence Analytics with Oracle 12c In-Memory Database option Details Oracle Data Visualization vs. Answers: The Cage Match Details What's New With Oracle Business Intelligence 12c Details Workforce Analytics Leveraging Oracle Business Intelligence Cloud Serivces (BICS) Details Defining a Roadmap for Migrating to Oracle BI Applications on ODI Details See What’s There and What’s Coming with BICS & Data Visualization Details Free form Data Visualization, Mashup BI and Advanced Analytics with BI 12c Details Oracle Data Visualization Cloud Service Hands-On Lab with Customer Use Cases Details On Metadata, Mashups and the Future of Enterprise BI Details OBIEE 12c and the Leap Forward in Lifecycle Management Details Supercharge BI Delivery with Continuous Integration Details Visual Analyzer and Best Practices for Data Discovery Details BI Movie Magic: Maps, Graphs, and BI Dashboards at AMC Theatres Details Oracle Business Intelligence (OBIEE) the Smart View Way Details Big Data Presentations (Click on Details to access file if submitted by presenter) Oracle Big Data: Strategy and Roadmap Details Oracle Modern Manufacturing - Bridging IoT, Big Data Analytics and ERP for Better Results Details Leveraging Oracle Big Data Discovery to Master CERN’s Control Data Details Enrich, Transform and Analyse Big Data using Big Data Discovery and Visual Analyzer Details Oracle Big Data SQL: Unified SQL Analysis Across the Big Data Platform Details High Speed Video Processing for Big Data Applications Details Enterprise Data Hub with Oracle Exadata and Oracle Big Data Appliance Details How to choose between Hadoop, NoSQL or Oracle Database Details Analytical SQL in the Era of Big Data Details Cloud Computing Presentations (Click on Details to access file if submitted by presenter) Oracle DBaaS Migration Road Map Details Centralizing Spatial Data Management with Oracle Cloud Databases Details End Users data in BI - Data Mashup and Data Blending with BICS , DVCS and BI 12c Details Oracle BI Tools on the Cloud--On Premise vs. Hosted vs. Oracle Cloud Details Hybrid Cloud Using Oracle DBaaS: How the Italian Workers Comp Authority Uses Graph Technology Details Build Your Cloud with Oracle Engineered Systems Details Safe Passage to the CLOUD – Analytics Details Your Journey to the Cloud : From Dedicated Physical Infrastructure to Cloud Bursting Details Data Warehousing and ETL Presentations (Click on Details to access file if submitted by presenter) Getting to grips with SQL Pattern Matching Details Making SQL Great Again (SQL is Huuuuuuuuuuuuuuuge!) 
Details Controlling Execution Plans (without Touching the Code) Details Taking Full Advantage of the PL/SQL Result Cache Details Taking Full Advantage of the PL/SQL Compiler Details Advanced SQL: Working with JSON Data Details Oracle Database In-Memory Option Boot Camp: Everything You Need to Know Details Best Practices for Getting Started With Oracle Database In-Memory Details Extreme Data Warehouse Performance with Oracle Exadata Details Real-Time SQL Monitoring in Oracle Database 12c Details A Walk Through the Kimball ETL Subsystems with Oracle Data Integration Details MySQL 5.7 Performance: More Than 1.6M SQL Queries per Second Details Implement storage tiering in Data warehouse with Oracle Automatic Data Optimization Details Edition-Based Redefinition Case Study Details 12-Step SQL Tuning Method Details Where's Waldo? Using a brute-force approach to find an Execution Plan the CBO hides Details Delivering an Enterprise-Wide Standard Chart of Accounts at GE with Oracle DRM Details Agile Data Engineering: Introduction to Data Vault Data Modeling Details Worst Practice in Data Warehouse Design Details Same SQL Plan, Different Performance Details Why Use PL/SQL? Details Transforming one table to another: SQL or PL/SQL? Details Understanding the 10053 Trace Details Analytic Views - Bringing Star Queries into the Twenty-First Century Details The Place of SQL in the Hybrid World Details The Next Generation of the Oracle Optimizer Details Internet of Things Presentations (Click on Details to access file if submitted by presenter) Oracle Modern Manufacturing - Bridging IoT, Big Data Analytics and ERP for Better Results Details Meet Your Digital Twin Details Industrial IoT and Machine Learning - Making Wind Energy Cost Competitive Details Fault Detection using Advanced Analytics at CERN's Large Hadron Collider: Too Hot or Too Cold Details Big Data and the Internet of Things in 2016: Beyond the Hype Details IoT for Big Machines Details The State of Internet of Things (IoT) Details Oracle Spatial Summit Presentations (Click on Details to access file if submitted by presenter) Build Your Own Maps with the Big Data Discovery Custom Visualization Component Details Massively Parallel Calculation of Catchment Areas in Retail Details Dismantling Criminal Networks with Graph and Spatial Visualization and Analysis Details Best Practices for Developing Geospatial Apps for the Cloud Details Map Visualization in Analytic Apps in the Cloud, On-Premise, and Mobile Details Best Practices, Tips and Tricks with Oracle Spatial and Graph Details Delivering Smarter Spatial Data Management within Ordnance Survey, UK Details Deploying a Linked Data Service at the Italian National Institute of Statistics Details ATLAS - Utilizing Oracle Spatial and Graph with Esri for Pipeline GIS and Linear Asset Management Details Oracle Spatial 12c as an Applied Science for Solving Today's Real-World Engineering Problems Details Assembling a Large Scale Map for the Netherlands Using Oracle 12c Spatial and Graph Details Using Open Data Models to Rapidly Develop and Prototype a 3D National SDI in Bahrain Details Implementation of LBS services with Oracle Spatial and Graph and MapViewer in Zain Jordan Details Interactive map visualization of large datasets in analytic applications Details Gain Insight into Your Graph Data -- A hands on lab for Oracle Big Data Spatial and Graph Details Applying Spatial Analysis To Big Data Details Big Data Spatial: Location Intelligence, Geo-enrichment and Spatial Analytics Details What’s New with 
Spatial and Graph? Technologies to Better Understand Complex Relationships Details Graph Databases: A Social Network Analysis Use Case Details High Performance Raster Database Manipulation and Data Processing with Oracle Spatial and Graph Details 3D Data Management - From Point Cloud to City Model Details The Power of Geospatial Visualization for Linear Assets Using Oracle Enterprise Asset Management Details Oracle Spatial and Graph: New Features for 12.2 Details Fast, High Volume, Dynamic Vehicle Routing Framework for E-Commerce and Fleet Management Details Managing National Broadband Infrastructure at Turk Telekom with Oracle Spatial and Graph Details Other Presentations (Click on Details to access file if submitted by presenter) Taking Full Advantage of the PL/SQL Compiler Details Taking Full Advantage of the PL/SQL Result Cache Details Meet Your Digital Twin Details Making SQL Great Again (SQL is Huuuuuuuuuuuuuuuge!) Details Lightning Round for Vendors Details

Experimental data labs take off 2016

Oracle's #2 big data prediction out of the 10 predictions for 2016 is that experimental data labs will take off. With more hypotheses to investigate, professional data scientists will see increasing demand for their skills from established companies. For example, watch how banks, insurers, and credit-rating firms turn to algorithms to price risk and guard against fraud more effectively. But many such decisions are hard to migrate from clever judgments to clear rules. Expect a proliferation of experiments in default risk, policy underwriting, and fraud detection as firms try to identify hotspots for algorithmic advantage faster than the competition. Watch how the CERN European Lab Project's data scientist has delivered self-service analytics flexible enough for engineers to better research the physics of particle collisions, by providing a full picture of the overall status of the accelerator complex. Oracle Big Data Discovery facilitates this data exploration and leverages the power of Hadoop to transform and analyze a large amount and variety of data. Another sign that experimental data labs will take off into production data factories comes from the big data survey Oracle conducted in August 2015, which polled 633 global IT decision makers to gauge capabilities. 64% of all global respondents stated that they are able to use big data in real time as a competitive advantage. However, only 45% of respondents in the Asia-Pacific region agreed they can respond with big data in real time. Watch StubHub's principal architect discuss how StubHub uses Oracle Advanced Analytics to understand customers in its online marketplace. Analysis times are much shorter, setup was fast and easy, and data scientists like the integration of R with the data warehouse. This gives all fans the choice to buy or sell their tickets in a safe, convenient, and highly reliable environment. Read why StubHub's senior manager of data science declares, “Big data is having a tremendous impact on how we run our business. Oracle Database and its various options—including Oracle Advanced Analytics—combine high-performance data-mining functions with the open source R language to enable predictive analytics, data mining, text mining, statistical analysis, advanced numerical computations, and interactive graphics—all inside the database.” Learn how other customers are capitalizing on big data and what analysts are saying now at oracle.com/big-data.

IoT + Cloud = Drives Bigger, Big Data

The Internet of Things (IoT) represents the next big wave of real technological change, and it’s a wave that’s rolling in fast. Organizations in virtually every industry will benefit from this technology—if, that is, they’re able to take advantage of the opportunity and avoid the potential pitfalls of both implementation and production. In an interesting recent survey (1) on improved IoT data collection and analysis, 70 percent of businesses surveyed say they would make better, more meaningful decisions with improved data. However, today only 8% of businesses are fully capturing and analyzing IoT data in a timely fashion. Experience tells us that this is typically due to complexities like cost, technology and limited resources. Therefore, as businesses start to manage and analyze big device data, they will gain benefit and value from the insights. During 2016, the percentage of businesses gaining benefit and value from the new insights brought by IoT data will increase, driving the need for ever closer integration between IoT and big data services. http://www.oracle.com/us/dm/oracle-iot-cloud-service-2625351.pdf (1) ParStream, New Research Highlights the Untapped Potential of IoT Data, 2015

Big Data For All? Oracle's 2016 Top 10 Predictions

It's time for Oracle's annual predictions in big data for the year to identify the key areas of change. This is the year the big data adoption trend will begin to make the leap from the 3,000 organizations that Hadoop and Spark vendors count as paying customers (most of them still in development) to tens of thousands of organizations in production. The industry will finally begin to shift gears into more mainstream applications, affecting thousands more businesses. We are making 10 predictions for 2016 that fall into three trend categories: a big expansion of the big data user base, major technology advances, and growing effects on society, politics, and business process. The #1 out of the Top 10 Predictions: Data civilians will operate more and more like data scientists. While complex statistics may still be limited to data scientists, data-driven decision-making shouldn’t be. In the coming year, simpler big data discovery tools will let business analysts shop for datasets in enterprise Hadoop clusters, reshape them into new mashup combinations, and even analyze them with exploratory machine learning techniques. Extending this kind of exploration to a broader audience will both improve self-service access to big data and provide richer hypotheses and experiments that drive the next level of innovation. Oracle conducted a big data survey in August 2015 that polled 633 global IT decision makers to gauge top benefits and impediments. 55% of the respondents reported that the biggest benefit of big data projects is simplified access to all data, ahead of 53% for faster and better decision making and 48% for increased business and customer insight. Empowering business users to analyze their own data ranked as the top purpose of big data, with 83% of respondents in agreement, closely followed by combining structured and unstructured data in a single query at 82%. Watch Ovum's Tom Pringle and Oracle's Nick Whitehead discuss Thriving in the Age of Big Data Analytics and Self-Service. The demand for organizations to empower business managers and civilians to be as productive as data scientists for decision making is real, and the self-service capabilities to find, transform, discover, explore, and share insight from big data, together with any other data, are available. Learn how other customers are capitalizing on big data and what analysts are saying now at oracle.com/big-data.

DIY for IT is not like going to the hardware store

I enjoy a trip to the hardware store as much as the next person. I like the feeling of achievement after I've built, painted or repaired something. And I know I've saved money when I compare the cost of the parts I bought with the cost of paying a pro to do it all for me. Of course, I don't account for my own time or how much extra time it takes to complete the job (or even the risk that I'll have to pay a pro more to fix a mistake). But if you're working in IT, those costs are real. And that's what our third big data prediction for 2016 is all about. For a few years now we've done a white paper comparing the Big Data Appliance with a DIY cluster. Somewhat paradoxically for many people, it really is cheaper to buy an appliance than to build your own from scratch (unless you can get massive discounts from your hardware and software vendors). This shouldn't be a surprise, of course: go and try to build a fridge, toaster or car from parts, and you'll find the same. But it really is about more than the cost of acquisition. Unlike my trip to the hardware store, every organization needs to account for the cost of the time spent building, and the opportunity cost of the delays in completing the project. Average build time for a DIY Hadoop cluster is around 6 months. And while I know some people can do it faster, that's an average time and some people are slower. Meanwhile, we have a customer who installed, configured and tested their first Hadoop cluster, based on the Big Data Appliance, in just one week. In that recent white paper showing that the BDA is cheaper than DIY, Enterprise Strategy Group touched on some of this: "Beyond this, however, ESG's research shows that most enterprise organizations feel that stakeholders from server, storage, network, security, and other IT infrastructure and operations teams are important to the success of a big data initiative. Thus, despite the hope and hype, Hadoop on a commodity stack of hardware does not always offer a lower cost ride to big data analytics, or at least not as quickly as hoped. Many organizations implementing Hadoop infrastructures based on human expertise working with commodity hardware and open source software may experience unexpected costs, slower speed to market, and unplanned complexity throughout the lifecycle. Accordingly, ESG's research further shows that more than three-quarters of respondents expect it will take longer than six months to realize significant business value from a new big data initiative." Cost of acquisition is important, but those hidden costs are more significant in the long run. So take a look at that white paper and see how Oracle can help you accelerate time to value in the cloud or in your own datacenter - no trip to the hardware store needed.

Data Swamps Try Provenance to Clear Things Up

It’s a Data Lake, Not the Data Loch Ness
Loch Ness (Loch Nis) is a large, deep, freshwater loch in the Scottish Highlands extending for approximately 23 miles (37 km) southwest of Inverness. It is one of a series of interconnected, murky bodies of water in Scotland; its water visibility is exceptionally low due to a high peat content in the surrounding soil. It would be a wonder if such mysterious conditions did not give rise to Nessie, the Loch Ness monster. It is not a stretch of analogy to say that corporate data swamps and reservoirs can easily evolve into a similar series of murky, interconnected pools, where ungoverned data could lurk that is not only forgotten but could also be dangerous if not secured properly. Line-of-business managers are rightly afraid to dig too deep for fear of what they might not find, or more importantly what they might find. Business decision makers will always need a brave and enterprising data scientist to organize exploratory tours to scour for mythical data treasures from the depths of the data lake.
Data Provenance to the Rescue
Data lineage used to be a nice-to-have capability because so much of the data feeding corporate dashboards came from trusted data warehouses. But in the big data era, data lineage is a must-have because customers are mashing up company data with third-party data sets. Some of these new combinations incorporate high-quality, vendor-verified data. But others will use data that's not officially perfect, but good for prototyping. When surprisingly valuable findings come from these opportunistic explorations, managers will look to the lineage to know how much work is required to raise it to production-quality levels. Data provenance, or data lineage, in whatever form, should be a process that is well thought out, organic and flexible (and by flexible, picture Elastigirl from The Incredibles). The data is going to expand, take various forms and come at varying speeds. Data provenance should be able to contort along with all these data, stitching and stringing them together wherever they are used, and provide clear answers to questions about the data, because that is what is going to make data transparent and trustworthy. And the more we can trust the data, the greater the propensity to use data in decision making in the right context.
Data Provenance: Technology, Process or Wizardry?
Data provenance is an all-encompassing area. It aims to capture all the ways data is interconnected within an organization. The most common uses of data provenance, of course, are impact analysis (figuring out the downstream effects of a change in data) and data lineage (the traceability and history of a data element). But there are other questions that it can answer. Data visibility is a critical criterion within an organization handling sensitive and customer data. So is a data topography map, which can help analyze data usage, performance and bottlenecks within the organization. Historic data is crucial to "travel back in time" to scenarios that need to be recreated to get answers. The trouble is, data is not always digitized. Many decisions are made whose only imprint is in the head of the person who made that decision. Oracle, and whoever is grappling with data provenance and governance issues, has to acknowledge and account for that gap in data. This is where Oracle's suite of technologies offers a distinct advantage.
The Oracle Advantage
Oracle, through its transparent set of suites, helps capture, stream, store, compute and visualize data and information. From its big data solutions through its data integration products to the big data appliance and discovery solutions, Oracle keeps track of data diligently. Oracle Metadata Management is built with data provenance in mind. It "harvests" metadata (data about data) from various systems and provides answers to data provenance questions, enhancing transparency, governance and trust about data within organizations. Big data projects generally require a variety of technologies strung together to meet their business mandate. For example, if you are using an Oracle stack (recommended, but you could of course use any other trusted technology), data elements that pass through Oracle GoldenGate for data ingestion and Oracle Data Integrator for data transformation and load into Oracle Big Data Appliance are fully captured, stored and surfaced by Oracle Metadata Management to ensure there are no data black holes. Loch Ness makes for a great tourist attraction, but what we need are crystal-clear Maldivian data lakes.
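Lineage and impact-analysis questions ultimately reduce to walking a graph of "derived from" relationships. As a minimal sketch only: the DATA_LINEAGE table below is a hypothetical stand-in with one row per "target is derived from source" edge, not how Oracle Metadata Management actually stores its harvested metadata, and it assumes the lineage graph is acyclic.

-- Trace every upstream source that feeds a hypothetical SALES_DASHBOARD object,
-- together with its distance in derivation hops.
WITH upstream (object_name, depth) AS (
  SELECT CAST('SALES_DASHBOARD' AS VARCHAR2(128)), 0 FROM dual
  UNION ALL
  SELECT l.source_object, u.depth + 1
  FROM   data_lineage l
  JOIN   upstream u ON l.target_object = u.object_name
)
SELECT object_name, MIN(depth) AS hops_upstream
FROM   upstream
GROUP  BY object_name
ORDER  BY hops_upstream;

Reversing the join direction answers the impact-analysis question instead: everything downstream that would be affected if a given source changed.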

Analytics

BIWA Summit'16 - Big Data + Analytics User Conference Jan 26-28, @ Oracle HQ Conference Center

Oracle Big Data + Analytics + Spatial + YesSQL User Community, ---PLEASE Share with OTHERS!!!--- BIWA Summit 2016 – Big Data + Analytics 3-Day User Conference at Oracle HQ Conference Center has a great lineup!   See Schedule at a Glance to show all the BIWA Summit'16 sessions in a tabular agenda. Download BIWA Summit'16 Full Agenda & Sessions See BIWA Summit’16 Talks by Tracks with Abstracts and Speaker Bios   See some representative talks: Advanced Analytics · Enabling Clorox as Data Driven Enterprise · Fiserv Case Study: Using Oracle Advanced Analytics for Fraud Detection in Online Payments · Improving Predictive Model Development Time with R and Oracle Big Data Discovery · Learn Predictive Analytics in 2 hours!! Oracle Data Miner 4.0 Hands on Lab · Stubhub and Oracle Advanced Analytics · Is Oracle SQL the Best Language for Statistics?—Brendan Tierney, Oralytics BI & Visualization · BI Movie Magic: Maps, Graphs, and BI Dashboards at AMC Theatres · Data Visualization at Sound Exchange – a Case Study · Business Intelligence Visual Analyzer Cloud Service: View and Analyze Your Data with customer use case · Electoral fraud location in Brazilian General Elections 2014 · See What’s There and What’s Coming with BICS & Data Visualization Cloud Services · Visual Analyzer and Best Practices for Data Discovery Big Data · The Place of SQL in the Hybrid World—Kerry Osborne, Accenture and Tanel Põder, Gluent · Oracle Big Data SQL: Unified SQL Analysis Across the Big Data Platform · Analytical SQL in the Era of Big Data · How to choose between Hadoop, NoSQL or Oracle Database Cloud · Oracle BI Tools on the Cloud--On Premise vs. Hosted vs. Oracle Cloud Data Warehousing & SQL · Panel discussion: Making SQL Great Again (SQL is Huuuuuuuuuuuge!)—Andy Mendelsohn, Executive Vice President for Database Server Technologies, Oracle · Taking Full Advantage of the PL/SQL Result Cache—Steven Feuerstein, Oracle · Why Use PL/SQL?—Bryn Llewellyn, Oracle Journal - Editor’s Pick Spatial & Graph · Deploying a Linked Data Service at the Italian National Institute of Statistics · Gain Insight into Your Graph Data -- A hands on lab for Oracle Big Data Spatial and Graph Internet of Things · Industrial IoT and Machine Learning - Making Wind Energy Cost Competitive · Leveraging Oracle Big Data Discovery to Master CERN’s Control Data   BIWA Inc. is an independent user group SIG of the Oracle Independent User Group (IOUG) See everyone at BIWA Summit’16! Charlie  

On Graph Databases and Cats

We did a series of 10 predictions for 2016 around big data. The fourth one was around data virtualization, a key component of which is Oracle Big Data SQL which we’ve blogged about before. But the prediction was a little broader: "Look for a shifting focus from using any single technology, such as NoSQL, Hadoop, relational, spatial or graph…”. You know about NoSQL, relational and Hadoop. You’ve used a map, so can figure out spatial (though there’s much more to it than just a map). But graphs? Graph databases and graph analytics are growing in use. To some they’re a bit like a cat: difficult to understand and not clear what use they are (though people keep them around). But there are some real uses and I want to give a high level overview, so you can see where they might apply in your organization. Hopefully graph will then be a little easier to understand. I bet you’ve already used graph analytics. You just didn’t realize it at the time. But that game Six Degrees of Kevin Bacon is just graph analytics at work. The most important thing is not the actors, or the movies they were in; it’s the relationships between the actors that matter to generating the answer. Running with this example, one way to store data in a graph database is to capture the actors as nodes along with all their relationships to other actors. Finding someone’s Bacon number is just a matter of following those relationships to find the shortest route between two nodes in the database.** And this is a much quicker and easier task using a graph database than other kinds of data management. Relationships between people play a role in many different applications: - Intelligence agencies want phone call metadata to identify potential terrorism suspects based on who talks to who (independent of what’s actually said). - Insurance companies can identify gangs that make multiple fraudulent claims when they spot relationships between people in multiple, apparently unconnected claims, that couldn’t happen by chance - Communications companies identify influencers and customers likely to churn by seeing who is connected to who (phone call metadata again, amongst other things). - Financial companies can identify and track money laundering by “following the money” (if person A sends $5,000 to business B, who sends it to person C, who sends it back to person A, this is much easier to spot with graph analytics) or by uncovering unexpected relationships between people and organizations. - Any organization can spot productive teams (and find the optimal size for a project team) by studying who calls and emails who (again, independent of the content; metadata is very useful). And it’s not just relationships between people. Master data management (this part is a component of this sub-assembly is a component of this product, to give a manufacturing example) is about relationships between things. Finding the shortest route between two points is in part about the relationships between places. Identifying which distributor can deliver all the needed components in the shortest time potentially mixes both things and places. Big data is not just more data; it’s more types of data. As the relationships within a given data set, or between multiple data sets, grow in complexity, graph analytics should be a tool in your toolbox. Oracle offers Spatial and Graph analytics in Oracle Database and on Hadoop. It’s good to have them around. Just like my cat. Graph databases can be hard to understand. 
Mine likes the dog. ** Which brings up a lovely attribute of graph databases. The shortest route between two nodes is often the key thing you care about. Some node pairs have a shortest route of just 1. Others require 2 hops or even longer routes. And the longest shortest route is called the graph diameter. How often can you say "longest shortest" and have it make sense?
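For readers who want to see the idea in a query rather than prose, here is a minimal sketch of computing degrees of separation with nothing more than recursive SQL. The CO_APPEARANCES edge table (actor_a, actor_b, one row per direction) is a hypothetical illustration, and a purpose-built graph engine such as the Spatial and Graph offerings mentioned above would handle large graphs far more efficiently; that is rather the point of the post.

-- Walk outward from one actor, counting hops; the CYCLE clause keeps the
-- recursion from looping forever around co-appearance cycles, and the hop
-- limit bounds the search to six degrees.
WITH reachable (actor, hops) AS (
  SELECT CAST('Kevin Bacon' AS VARCHAR2(100)), 0 FROM dual
  UNION ALL
  SELECT e.actor_b, r.hops + 1
  FROM   co_appearances e
  JOIN   reachable r ON e.actor_a = r.actor
  WHERE  r.hops < 6
)
CYCLE actor SET is_cycle TO 'Y' DEFAULT 'N'
SELECT actor, MIN(hops) AS bacon_number
FROM   reachable
GROUP  BY actor
ORDER  BY bacon_number;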


Enterprise Class Hadoop, the Best Tool for Mining Data

This is a special contributed post by Charles Zedlewski (@zedlewski), VP of Products at Cloudera, about Oracle's and Cloudera's joint work in bringing enterprise-grade Hadoop to the corporate computing environment.

Strata + Hadoop World in Singapore is just around the corner, and we are reminded of the importance of big data and how it is playing a more critical role in everything we do. Data isn’t just something that corporate executives consume to make business decisions; rather, it is something that we all consume and it is shaping the customer experience. There are new emerging data sources like IoT data, social media, and machine logs that are increasing the demand to capture and analyze data. Ultimately, this all boils down to what kind of valuable insights can be attained from data that allow people to take action. In a way, we could look at data as the new gold. Gold’s value is determined by market forces primarily based on supply and demand. Data also has market forces - demand to create insight and supply of time to collect and process the data. The better organizations can manage the supply and demand functions of data, or process more valuable data in less time, the better the return on investment. Furthermore, data is similar to gold because it has to be mined; quality data insights only come from a quality mine that uses the right tools. These tools need to have good performance, redundancy, security, and availability. In other words, these tools need to be enterprise class. Over the past seven years, Cloudera has been driving the Hadoop ecosystem by creating a more enterprise class solution. Many of the improvements to Hadoop can be seen in how end customers are moving beyond the standard “data lake” model of Hadoop, where customers were simply aggregating data for eventual consumption. With technology like Oracle Big Data Appliance (BDA), analysts and data scientists are actively drawing insights from the combination of data from traditional sources and newer sources. For example, Wargaming.net, a worldwide massively multiplayer online (MMO) gaming company, recently described how they use Oracle BDA in conjunction with the Oracle Database Appliance and Oracle Advanced Analytics to draw insights about game play. With Oracle BDA, Wargaming.net was able to stand up a Cloudera Enterprise cluster quickly, reducing time to value. They could then quickly mine the data in their organization, resulting in increased revenues of 62% in one of their key sales regions. Like Wargaming.net, many companies are looking for agility, not just in the speed of setting up a Hadoop cluster but also in how quickly they can get real-time insights from their data. They are looking for real-time analytics, and Cloudera is leading the industry by adopting Apache Spark as a part of Hadoop. This September, Cloudera announced the One Platform Initiative, which highlights Spark as a key part of the future of Hadoop. The Hadoop ecosystem is a mix of services that go beyond what Spark can offer alone. By becoming more tightly integrated with Hadoop, Cloudera expects Spark to increase the ROI for customers looking for better security, scale, management and streaming. Spark also runs up to 100 times faster than MapReduce on certain workloads, and its adoption demonstrates that Hadoop is not purely defined by HDFS and MapReduce.
As described by Mike Olson, “Cloudera’s customers use Spark for portfolio valuation, tracking fraud and handling compliance in finance, delivering expert recommendations to doctors in healthcare, analyzing risk more precisely in insurance and more.” Of course, the quality of the tools you use to mine gold has a tremendous effect on how much the mine produces. In the case of big data, Oracle BDA is the leading solution to most effectively mine and manage data. In many cases, Oracle customers are already using one of Oracle’s database management systems to support critical enterprise systems along with a data warehouse to analyze the data for business insights. Yet, now there are more streams of data flowing in from social media feeds, log files, and new internet-enabled devices. Data analysts now need to draw valuable insights from a variety of separate, combined, and streaming data sources to help their organizations gain a competitive advantage. In support of this demand, recent improvements by Cloudera and Oracle take data mining to the next level of enterprise class data management and analytics. Going forward, these improvements will allow data analysts to have even greater confidence in insights gained from their data mine. As you can see, the Hadoop stack has increased its focus on data quality and integrity. We fully expect this innovation to continue to evolve into more enterprise class software offerings that provide additional manageability, agility, security, and governance features. As Hadoop continues to mature, so will the enterprise feature set. Enterprise grade big data solutions are here to stay and are only getting stronger by the day. Oracle BDA and Cloudera Enterprise are leading the way in turning data gold into the real thing.


Heading to Strata + Hadoop World Singapore?

Hot on the heels of Oracle OpenWorld San Francisco and Strata + Hadoop NY, Oracle will be present at Strata + Hadoop World in Singapore. Come along and meet the team and find out more about how Oracle - the biggest data company - is able to make big data effective in the enterprise. You can also learn more about the new Oracle Cloud Platform for Big Data, launched less than a month ago at Oracle OpenWorld. These new capabilities deliver on our promise to enable customers to take advantage of the same technologies on-premises and in the cloud. No wonder our big data business is growing faster than the market as a whole! In fact, VP Product Management, Big Data, Neil Mendelson will be discussing Real-World Success with Big Data in his session at 4pm on Wednesday December 2nd, during which he will be sharing the best practices and lessons learned from successful big data projects in Asia and around the world. (Location: 333)

Or visit us and experience our innovative demos at booth #102:
- Unlocking Value with Oracle Big Data Discovery
- Oracle Big Data Preparation Cloud Service—Ready Your Big Data for Use
- Best SQL Across Hadoop, NoSQL, and Oracle Database
- Graph, Spatial, Predictive, and Statistical Analytics

Alternatively, you can hear Oracle’s Hong Eng Koh and Vladimir Videnovic in their session entitled ‘Don't believe everything you see on CSI: Beyond predictive policing’, which takes place at 4:50pm on December 2. (Location: 331)

Sessions you won’t want to miss:

Real-World Success with Big Data
4:00pm–4:40pm Wednesday, 2nd December 2015, Location: 333
Neil Mendelson (Oracle)
Companies that are successful with big data need to be analytics-driven. During this session, Neil will look at new analytics capabilities that are essential for big data to deliver results, and discuss how to maximize the time you spend providing differentiation for your organization. This session will also cover some common big data use cases in both industry and government.

Don't believe everything you see on CSI: Beyond predictive policing
4:50pm–5:30pm Wednesday, 2nd December 2015, Location: 331
Hong Eng Koh (Oracle), Vladimir Videnovic (Oracle)
Public safety and national security are increasingly being challenged by technology; the need to use data to detect and investigate criminal activities has increased dramatically. But with the sheer volume of data and noise, law enforcement organisations are struggling to keep up. This session will examine trends and use cases on how big data can be utilised to make the world a safer place.


Big Made Great – Big Data from the Biggest Data Company…Oracle

At Oracle OpenWorld last year, big data was big news for over 60,000 attendees. From executive keynotes to the Industry Showcases, from product deep dives to customer panels, there was a wealth of information where the discussions centered on how the phenomenon of big data and the datafication of everything is transforming businesses. This year’s big data discussions at Oracle OpenWorld will center on how the biggest data company is able to make big data effective in the enterprise. Perspectives from Oracle product executives and customers will shed light on big data strategies and best practices for approaching your enterprise infrastructure, data management, security and governance and analytics. Here are some sessions you won’t want to miss: Sunday, October 25 5:00 p.m.–7:00 p.m., Moscone North, Hall D Integrated Cloud Applications and Platform Services Keynote featuring Larry Ellison - Executive Chairman of the Board and Chief Technology Officer, Oracle Oracle has more cloud application, platform, and infrastructure services than any other cloud provider—it has the only truly integrated cloud stack. Larry Ellison will announce a broad set of new products and highlight why integrated cloud will deliver the most innovative and cost-effective benefits to customers. Transformation and Innovation in the Data Center: Driving Competitive Advantage for the Enterprise Keynote featuring Intel CEO Brian Krzanich The last few years have witnessed an incredible transformation in the data center—from the build-out of the cloud, to the power of big data, to the proliferation of connected devices. The pace of this transformation continues to accelerate. This transformation provides both incredible new opportunities as well as new challenges to solve. Intel CEO Brian Krzanich, along with some special guests, will explore these opportunities and challenges, share the innovative solutions Intel and our partners are creating to address them, and show how best-in-class organizations are using this transformation to drive a competitive advantage. Monday, October 26 – Thursday, October 29 Big Data Showcase – Moscone South Exhibition Hall Monday, October 26 2:45 pm – Moscone South 103 Exploiting All Your Data for Business Value: Oracle’s Big Data Strategy [GEN7350] General Session featuring: Andy Mendelson - EVP, Database Server Technologies, Oracle Inderjeet Singh – EVP, Fusion Middleware Development, Oracle Luc Ducrocq, SVP - BI&A NA Leader, HCCg Big data in all its variety is now becoming critical to the development of new products, services and business processes. Organizations are looking to exploit all available data to generate tremendous business value. Generating this new value requires the right approach to discover new insights, predict new outcomes and integrate everything securely with existing data, infrastructure, applications and processes. In this session we’ll explain Oracle’s strategy and architecture for big data, and both present and demonstrate the complete solution including analytics, data management, data integration and fast data. Hitachi Consulting will close the session covering three specific use cases where both companies align to deliver high value, high impact big data solutions. 
4:00 pm – Moscone West 2020 The Rise of Data Capital [CON10053] Session featuring Paul Sonderegger, Oracle Big Data Strategist Data is now a kind of capital—as vital as financial capital to the development of new products, services, and business processes. This creates a land-grab competition to digitize and “datafy” key activities before rivals do, intensified pressure to bring down the overall cost of managing and using data capital, and a new thirst for algorithms and analytics to increase the return on invested data capital. In this session, learn about competitive strategies to exploit data capital and hear examples of companies already putting these ideas into action. 5:15 pm – Moscone South 102 Big Data and the Next Generation of Oracle Database [CON8738] Session featuring George Lumpkin, Vice President, Product Management, Oracle Oracle’s data platform for big data is Oracle Big Data Management System, which combines the performance of Oracle’s market-leading relational database, the power of Oracle’s SQL engine, and the cost-effective, flexible storage of Hadoop and NoSQL. The result is an integrated architecture for managing big data, providing all of the benefits of Oracle Database, Oracle Exadata, and Hadoop, without the drawbacks of independently accessed data repositories. In this session, learn how today’s data warehouses are evolving into tomorrow’s big data platforms, and how Oracle is continuing to enhance Oracle Big Data Management System through new database features and new capabilities on top of Hadoop. Tuesday, October 27 4:00pm – Moscone South 104 Oracle Big Data SQL: Deep Dive—SQL over Relational, NoSQL, and Hadoop [CON6768] Session featuring Dan Mcclary, Senior Principal Product Manager, Big Data, Oracle Big data promises big benefits to your organization, but real barriers exist in achieving these benefits. Today’s big data projects face serious challenges in terms of skills, application integration, and security. Oracle Big Data SQL radically reduces these barriers to entry by providing unified query and management of data in Oracle Database, Hadoop, and beyond. In this session, learn how Oracle Big Data SQL uses its Smart Scan feature, SQL, and storage indexing technology to make big data feel small. Additionally, learn how to use your existing skills and tools to get the biggest bang out of big data with Oracle Big Data SQL. 6:15 pm – Moscone South 303 Meet the Experts: Oracle’s Big Data Management System [MTE9564] Jean-Pierre Dijcks, Sr. Principal Product Manager, Oracle Martin Gubar, Big Data Product Management, Oracle Dan Mcclary, Senior Principal Product Manager, Big Data, Oracle New transformative capabilities delivered with Oracle Database 12c and Oracle Big Data Appliance will have a dramatic impact on how you design and implement your data warehouse and information systems. You now have the opportunity to analyze all your data across the big data platform—including Oracle Database 12c, Hadoop, and NoSQL sources—using Oracle’s rich SQL dialect and data governance policies. Attend this session to ask the experts about Oracle’s big data management system, including Oracle Big Data SQL, Oracle In-Memory Database, Oracle NoSQL Database, and Oracle Advanced Analytics. 
6:15 – Moscone South 304 Meet the Expert: Oracle Data Integration and Governance for Big Data [MTE10023] Alex Kotopoulis, Product Manager, Oracle Oracle Data Integration and governance provide solutions to future-proof your big data technology and use a tool- and metadata-driven approach to realizing your data reservoir. You now have the opportunity to logically design your big data integration as part of your enterprise architecture and execute it using native Hadoop technologies such as Spark, Hive, or Pig. Attend this session to ask the experts about Oracle’s big data integration and governance, including Oracle Data Integrator, Oracle GoldenGate for Big Data, Oracle Big Data Preparation Cloud Service, and Oracle Enterprise Metadata Management. 6:15 pm – Moscone South 306 Meet the Experts: Oracle Spatial and Graph [MTE9565] Jean Ihm, Principal Product Manager, Oracle Spatial and Graph, Oracle Xavier Lopez, Senior Director, Oracle Jayant Sharma, Director, Product Mgmt, Oracle James Steiner, Vice President, Product Management, Oracle This session is for those interested in learning about customer innovations and best practices with Oracle Spatial and Graph and the Oracle Fusion Middleware MapViewer feature. Meet the experts and discuss benefits derived from the extreme performance and advanced analytics capabilities of the platform. Topics include use cases from application areas such as business analytics, mobile tracking, location-based services, interactive web mapping, city modeling, and asset management. 7:15 pm – Moscone South 303 Oracle NoSQL Database: Meet the Experts [MTE9622] Rick George, Senior Principal Product Manager, Oracle Ashok Joshi, Senior Director, Oracle David Rubin, Director of NoSQL Database Development, Oracle NoSQL is a hot, rapidly evolving landscape. Every Oracle NoSQL Database implementation is different. Whether you’re new to NoSQL or an experienced NoSQL application developer, come to this open Q&A session to hear from experienced NoSQL practitioners. The session features senior members of the Oracle NoSQL Database engineering team, field consultants, and customers and partners who have been using the product in production applications. From the theoretical to the practical, this is your opportunity to get your questions answered. Wednesday, October 28 11:00 pm – Park Central – Metropolitan III Introducing Oracle Internet of Things Cloud Service [CON9472] Henrik Stahl, Vice President, Product Management, Oracle Jai Suri, Director, Product Management, Oracle This session introduces Oracle Internet of Things Cloud Service, which is at the heart of Oracle’s Internet of Things (IoT) strategy. In this session, learn how the feature set helps you quickly build IoT applications, connect and manage devices, configure and monitor security policies, manage and analyze massive amounts of data, and integrate with your business processes and applications. See examples of out-of-the-box integration with other Oracle Cloud services and applications, showing the unique value through the end-to-end integration that Oracle provides. 1:15 pm – Moscone South 309 Big Data Security: Implementation Strategies [CON8747] Martin Gubar, Big Data Product Management, Oracle Bruce Nelson, Principal Sale Consultant, Big Data Lead, Oracle Hadoop has long been regarded as an insecure system. However, that is so 2014! A lot has changed in the past year. As Hadoop enters the mainstream, both its security capabilities and Oracle’s ability to secure Hadoop are evolving. 
Oracle Big Data Appliance facilitates the deployment of secure systems—greatly simplifying the configuration of authentication, authorization, encryption, and auditing—enabling organizations to confidently store sensitive information. This session shares the lessons learned from implementing secure Oracle Big Data Appliance projects for customers and identifies four security levels of increasing sophistication. As a bonus, we describe the roadmap of big data security as we see it playing out. 3:00 pm - Moscone West 2020 Introduction to Oracle Big Data Discovery [CON9101] Chris Lynskey, Vice President, Product Management, Oracle Ryan Stark, Director Product Management, Oracle Oracle Big Data Discovery and Oracle Big Data Discovery Cloud Service enable anyone to turn raw data in Hadoop into actionable insight in minutes. For organizations eager to get more value out of Hadoop, Oracle Big Data Discovery allows business analysts and data scientists to find, explore, transform, and analyze data, then easily share results with the big data community. This intuitive data discovery solution means analysts don’t need to learn complex tools or rely only on scarce resources, and data scientists can spend time on high-value analysis instead of being mired in data preparation. Join us in this session to hear how Oracle Big Data Discovery can help your organization take a huge step forward with big data analytics. 4:15 – Moscone South 301 Customer Panel: Big Data and Data Warehousing [CON8741] Manuel Martin Marquez, Senior Research Fellow and Data Scientist, Cern Organisation Européenne Pour La Recherche Nucléaire Jake Ruttenberg, Senior Manager, Digital Analytics, Starbucks Coffee Company Chris Wones, Chief Enterprise Architect, 8451 Reiner Zimmermann, Senior Director, DW & Big Data Global Leaders Program, Oracle Serdar Özkan, AVEA In this session, hear how customers around the world are solving cutting-edge analytical business problems using Oracle Data Warehouse and big data technology. Understand the benefits of using these technologies together, and how software and hardware combined can save money and increase productivity. Learn how these customers are using Oracle Big Data Appliance, Oracle Exadata, Oracle Exalytics, Oracle Database In-Memory 12c, or Oracle Analytics to drive their business, make the right decisions, and find hidden information. The conversation is wide-ranging, with customer panelists from a variety of industries discussing business benefits, technical architectures, implementation of best practices, and future directions. Thursday, October 29 10:15 am – Marriott Marquis' Converting Big Data into Economic Value Jim Gardner Senior Director, WSJ. Insights, The Wall Street Journal Rich Clayton Vice President, Business Analytics Product Group, Oracle  


Biggest Data Company @Strata+Hadoop NY

Strata+Hadoop World September 29–October 1 at the Javits Center in New York is fast approaching and you will not want to miss learning about Big Data Cloud Services from the Biggest Data Company! Hear how to "Simplify Big Data with Platform, Discovery and Data Preparation from the Cloud" from Oracle VPs Product Management, Jeff Pollock and Chris Lynskey on Thursday, October 1st at 1:15 p.m., in Room 3D 06/07.

Experience innovative demos at booth #123:
- Unlocking Value with Oracle Big Data Discovery
- Oracle Big Data Preparation Cloud Service—Ready Your Big Data for Use
- Best SQL Across Hadoop, NoSQL, and Oracle Database
- Graph, Spatial, Predictive, and Statistical Analytics

Hear from the experts and partners in a Mini-Theater at the booth.

Wednesday, September 30
- 11:00 a.m. Big Data Graph Analytics for the Enterprise (Melli Annamalai, Senior Principal Product Manager, Oracle)
- 11:30 a.m. Is IT Operations a Big Data Problem? (Tom Yates, Product Marketing, Rocana)
- 1:30 p.m. Transparently Bursting into Cloud with Hadoop, Workload (Brett Redenstein, Director, Product Management, WANdisco)
- 2:00 p.m. Unlocking the Insights in Big Data (Prabha Ganapathy, Big Data Strategist, Intel)
- 2:30 p.m. Achieving True Impact with Actionable Customer Insights on Oracle BDA and BDD and Lily Enterprise (Steven Noels, CTO, NGDATA)
- 3:00 p.m. Scaling R to Big Data (Mark Hornick, Director, Oracle Advanced Analytics, Oracle)
- 3:30 p.m. Big Data Preparation – Avoiding the 80% “Hidden Cost” of Big Data (Luis Rivas, Director, Product Management, Oracle)

Thursday, October 1
- 3:00 p.m. SQL Performance Innovations for Big Data (JP Dijcks, Senior Principal Product Manager, Big Data, Oracle)

Finally, you can also visit us in the Cloudera Partner Pavilion - K8 and at oracle.com/big-data.


Rocana partners with Oracle, supports Oracle Big Data Appliance

Guest Blog Author: Eric Sammer, CTO and Founder at Rocana

I’m very excited to announce our partnership with Oracle. We’ve been spending months optimizing and certifying Rocana Ops for the Oracle Big Data Appliance (BDA) and will be releasing certified support for Oracle’s Big Data software. Ever since we worked with the Oracle Big Data team in the early days of the Big Data Appliance Engineered System while still at Cloudera, we’ve had a vision of the power of a pre-packaged application backed by the BDA. Today, our customers have access to an optimized all-in-one Big Data system with which they can run Rocana Ops to control their global infrastructure. The Oracle BDA is a platform that helps our customers realize their vision of monitoring everything in one place. During our certification we saw the incredible power of the BDA. On a half rack (9 node) system we were clocking 40,000 events per second with subsecond latency to query time. One of our large enterprise customers can monitor an entire application infrastructure with 500 servers using a half-rack Oracle BDA. Fully loaded, these half racks can retain 500 billion events, which amounts to 4.5 months of data retention in just half a rack of floor space. This means that each full rack of a BDA can monitor one thousand nodes of a high traffic website as well as all of the network status, database events and metrics for the entire system. A fully loaded BDA system of 20 racks could monitor an entire data center with 10,000 machines and keep full detailed historical data for 6 months. This kind of power was unheard of just a few years ago. The raw power and simplicity of the BDA is a boon to our customers, and even more so is the extensibility of Oracle’s Big Data offerings. With Oracle Big Data SQL, Oracle Exadata customers can access Rocana data transparently through their existing Exadata connected applications. Analysts can use the Big Data Discovery software to explore data within Rocana and perform in depth analysis on data center behavior. Certification for these integrations is coming soon and we look forward to enabling these capabilities for our customers. As an Oracle ExaStack Optimized partner, we’re thrilled to be able to offer certified support for Rocana Ops on the Oracle BDA. For both Oracle and Rocana customers, this combination sets a new bar for what’s possible in global data center monitoring. If you’re interested in learning more about Rocana Ops on the Oracle BDA, contact sales@rocana.com. Rocana will also be talking about IT Operations being a Big Data Problem at the Oracle mini theater at Strata Hadoop World in New York (September 29 - October 1) as well as at Oracle OpenWorld (October 25 - 29). Eric (@esammer) is a co-founder of Rocana and serves as its CTO. As CTO, Eric is responsible for Rocana’s engineering and product. Eric is the author of Hadoop Operations published by O’Reilly Media, and speaks frequently on technology and techniques for large scale data processing, integration, and system management.
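The retention figures quoted above hang together on a simple back-of-the-envelope calculation. The sketch below is only a rough sanity check under the stated assumptions (a sustained 40,000 events per second against 500 billion retained events); real capacity would also depend on event size, compression, and peak versus sustained ingest rates.

```python
# Rough check of the half-rack retention math quoted in the post.
events_per_second = 40_000
events_per_day = events_per_second * 60 * 60 * 24   # ~3.46 billion events/day
retained_events = 500_000_000_000                   # 500 billion events

days_of_retention = retained_events / events_per_day
print(round(days_of_retention))           # ~145 days
print(round(days_of_retention / 30, 1))   # ~4.8 months, in the same ballpark as the quoted 4.5 months
```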


Oracle Big Data Discovery v1.1 is Ready

Oracle Big Data Discovery version 1.1 includes major new functionality as well as hundreds of enhancements, providing new value for customers and addressing key feedback from the 1.0 release. Highlights and benefits include:

More Hadoop: Big Data Discovery now runs on Hortonworks HDP 2.2.4+, in addition to Cloudera CDH 5.3+. That makes BDD the first Spark-based big data analytics product to run on the top two Hadoop distributions, significantly expanding the user community. In addition, the changes that enable BDD to run on Hortonworks also make it easier to port to other distributions, paving the way for an even broader community in the future. For Cloudera CDH, customers have the option to run BDD 1.1 on the Oracle Big Data Appliance as part of an Engineered System or on commodity hardware; for Hortonworks HDP, customers can run BDD on commodity hardware.

More data: Customers can now access enterprise data sources via JDBC, making it easy to mash up trusted corporate data with big data in Hadoop. BDD 1.1 elegantly handles changes across all this data, enabling full refreshes, incremental updates, and easy sample expansions. All data is live, which means changes are reflected automatically in visualizations, transformations, enrichments, and joins. BDD 1.1 includes all Oracle Endeca Information Discovery functionality and more.

More visual: Dynamic visualizations fuel discovery – but no product can include every visualization out-of-the-box. This release includes a custom visualization framework that allows customers and partners to create and publish any visual and have it behave like it’s native to BDD. Combined with new visualizations and simpler configuration, this streamlines the creation of discovery dashboards and rich, reusable discovery applications.

More wrangling: Big Data Discovery is unique in allowing customers to find, explore, transform, and analyze big data all within a single product. This release significantly extends BDD Transform, making it both easier and more powerful. New UIs make it easy to derive structure from messy Hadoop sources, guiding users through common functions, like extracting entities and locations, without writing code. Transformation scripts can be shared and published, driving collaboration, and scripts can be scheduled, automating enrichment. Transform also includes a redesigned custom transformation experience and the ability to call external functions (such as R scripts), providing increased support for sophisticated users. Together with an enhanced architecture that makes committing transformations much faster, these capabilities greatly accelerate data wrangling.

More security: Secure data and analytics are a hot topic in the big data community. BDD 1.1 addresses this need by supporting Kerberos for authentication (both MIT and Microsoft versions); enabling authorization via Studio (including integration with LDAP) to support Single Sign-on (SSO); and providing security at both project and dataset levels. These options allow customers to leverage their existing security and extend fine-grained control to big data analytics, ensuring people see exactly what they should.

More virtual: BDD 1.1 joins many of the key big data technologies that are part of Oracle's big data platform in the Oracle Big Data Lite Virtual Machine for testing and educational purposes.

Learn more at oracle.com/big-data.


Evolution of Your Information Architecture

A Little Background

Information quality is the single most important benefit of an information architecture. If information cannot be trusted, then it is useless. If untrusted information is part of an operational process, then the process is flawed and must be mitigated. If untrusted information is part of an analytical process, then the decisions will be wrong. Architects work hard to create a trustworthy architecture. Furthermore, most architects would agree that regardless of data source, data type, and the data itself, data quality is enhanced by having standardized, auditable processes and a supporting architecture. In the strictest enterprise sense, it is more accurate to say that an information architecture needs to manage ALL data – not just one subset of data. Big Data is not an exception to this core principle. The processing challenges for large, real-time, and differing data sets (aka Volume, Velocity, and Variety) do not diminish the need to ensure trustworthiness. The key task in Big Data is to discover ‘the value in there somewhere.’ But we cannot expect to find value before the data can be trusted. The risk is that, treated separately, Big Data can easily add to the complexity of a corporate IT environment as it continues to evolve through frequent open source contributions, expanding cloud services, and true innovation in analytic strategies. Oracle’s perspective is that Big Data is not an island. Nearly every use case ultimately blends new data and data pipelines with old data and tools, and you end up with an integration, orchestration, and transformation project. Therefore, the more streamlined approach is to think of Big Data as merely the latest aspect of an integrated enterprise-class information management capability. It is also important to adopt an enterprise architecture approach to navigate your way to the safest and most successful future state. By taking an enterprise architecture approach, both technology and non-technology decisions can be made ensuring business alignment, a value-centric roadmap, and ongoing governance. Learn more about Oracle’s EA approach here.

A New White Paper

So, in thinking about coordinated, pragmatic enterprise approaches to Big Data, Oracle commissioned IDC to do a study that illustrates how Oracle customers are approaching Big Data in the context of their existing and planned larger enterprise information architectures. The study was led by Dan Vesset, head of business analytics and big data research at IDC, who authored the paper, titled Six Patterns of Big Data and Analytics Adoption: The Importance of the Information Architecture, and you can get it here.

Highlights - Three Excerpts from the Paper

Patterns of Adoption

The paper explores six Big Data use cases across industries that illustrate various architectural approaches for modernizing their information management platforms. The use cases differ in terms of goals, approaches, and outcomes, but they are united in that each company highlighted has a Big Data strategy based on clear business objectives and an information technology architecture that allows it to stay focused on moving from that strategy to execution. The six cases (industry, project motivation, and scope) are:

- Case 1: Banking. Motivation: Transformational modernization. Scope: Transform core business processes to improve decision-making agility and transform and modernize the supporting information architecture and technology.
- Case 2: Retail. Motivation: Agility and resiliency. Scope: Develop a two-layer architecture that includes a business process–neutral canonical data model and a separate layer that allows agile addition of any type of business interpretation or optimization.
- Case 3: Investment Banking. Motivation: Complementary expansion. Scope: Complement the existing relational data warehouse with a Hadoop-based data store to address near-real-time financial consolidation and risk assessment.
- Case 4: Travel. Motivation: Targeted enablement. Scope: Improve a personalized sales process by deploying a specific, targeted solution based on real-time decision management while ensuring minimal impact on the rest of the information architecture.
- Case 5: Consumer Packaged Goods. Motivation: Optimized exploration. Scope: Enable the ingestion, integration, exploration, and discovery of structured, semi-structured, and unstructured data coupled with advanced analytic techniques to better understand the buying patterns and profiles of customers.
- Case 6: Higher Education. Motivation: Vision development. Scope: Guarantee architectural readiness for new requirements that would ensure a much higher satisfaction level from end users as they seek to leverage new data and new analytics to improve decision making.

Copyright IDC, 2015

Oracle in the Big Data Market

Oracle offers a range of Big Data technology components and solutions that its customers are using to address their Big Data needs. In addition, the company offers Big Data architecture design and other professional services that can assist organizations on their path to addressing evolving Big Data needs.

[Figure: Oracle’s Big Data Platform aligned with IDC’s conceptual architecture model. Copyright IDC, 2015]

Lessons Learned

Henry David Thoreau said, "If you have built castles in the air, your work need not be lost; that's where they should be. Now put the foundations under them." The information foundation and the architecture on which it is based is a key building block of these capabilities. In conducting IDC's research through interviews and surveys with customers highlighted in this white paper and others, we have found the following best practices related to the information architecture for successful Big Data initiatives:

- Secure executive sponsorship that emphasizes the strategic importance of the information architecture and ensure that the information architecture is driven by business goals.
- Develop the information architecture in the context of the business architecture, application architecture, and technology architecture — they are all related.
- Create an architecture board with representation from the IT, analytics, and business groups, with authority to govern and monitor progress and to participate in change management efforts.
- Design a logical architecture distinct from the physical architecture to protect the organization from frequent changes in many of the emerging technologies. This enables the organization to maintain a stable logical architecture in the face of a changing physical architecture.
- Consider the full range of big data use cases and end-user requirements. Big Data is not only about exploration of large volumes of log data by data scientists.
- Even at the early stages of a project when evaluating technologies, always consider the full range of functional and nonfunctional requirements that will most likely be required in any eventual deployment. Bolting them on later will drive costs and delays and may require a technology reevaluation. This is yet another reason why an architecture-led approach is important.
Oracle also has a variety of business and technical approaches to discussing Big Data and Information Architecture. Here are a few:
- Big Data Reference Architecture
- Information Architecture Reference Architecture
- 12 Industry-specific Guides for Big Data Business Opportunities
- Oracle Big Data Products


Utilities Are Getting Smarter Using Big Data, from WSJ. Custom Studios

The final industry-specific research from WSJ. Custom Studios on how senior executives plan to empower their organizations with big data is about utilities. Utilities gather huge amounts of operational and customer data but struggle to apply data to solve critical business problems. Utilities grapple with translating the vast amount of performance data from power plants, transmission lines, and thermostats that are always on, sensing fluctuations in power, temperature, and usage. Many utilities rely on industry-specific applications more than others yet rank these apps as less business-critical; 83 percent plan to grow their business analyst team and 64 percent contract with vendors to host and manage their business-critical data, second only to financial services. Utilities that are on the cutting edge of developing such cultures are applying analytics to new uses and new kinds of data, especially with smart metering. Take the issue of unpredictable energy demand: Shifts in customer behavior, such as the growing use of plug-in electric vehicles, have made it increasingly difficult for utilities to predict how much energy will be needed by using traditional forecasting techniques. A sudden, unexpected bump in demand can cause blackouts or require a utility to acquire energy at a high cost. Kansas City Power & Light (KCP&L) is an example of a regulated electric utility successfully partnering with Oracle to implement its advanced metering initiative spanning network management, outage analytics, smart grid gateway, and meter data management. KCP&L is now able to bring its reliable, smart-grid operations and business processes together to enable storm-proven outage management and key distribution management functions, which in turn help to improve operational excellence and pave a streamlined pathway for critical customer communications. Additionally, KCP&L gains a wealth of useful, granular insight from Oracle on various aspects of energy production and distribution—such as field-crew performance related to short- and long-cycle projects, that establishes a decision-support system to improve business performance and reduce operational costs. To learn more about how to convert your big data as a utility, you can download the enterprise architect’s guide here and visit oracle.com/big-data for more information.      


Telecommunications: Timing Is Everything, from WSJ. Custom Studios

Following up on the global summary of the research conducted by WSJ. Custom Studios on how senior executives are investing in the economic potential they view in harnessing big data, we will now call on telecommunications companies using big data to make customers happier—and save lives. Do you receive advertising offers that know where you live and shop? Wonder how they know who and where you are? The answer may well be your mobile, Internet, cable, and landline service providers. Telecommunications companies collect more information about customers than almost any other industry: where people relocate, who they chat with, what they look at online. And telecom executives are eager to draw more intelligence from their great reservoirs of data—theirs is one of the top industries in making this a priority. Telecom companies are relatively more data-driven than the other five industries in the research, distinguishing them as highly effective in data management. Drawing intelligence from data and expanding analytics skills are top priorities. Much of telecom’s big-data focus today is on location-based services. For instance, telecom operators can capture a customer’s location as he or she enters a certain area (called “geo-fencing”) and create targeted promotions. Telecoms are investing in optimizing subscriber and network information processing to better understand subscriber behavior, improve subscriber retention, and increase cross-selling of mobile communications products and services. Optimizing that information requires strengthening analytics and reporting capabilities for better insight into customer preferences—to improve marketing campaigns and new service offerings from leisure entertainment to vital telemedicine. Customer-care agents must be enabled to respond to customer queries regarding network faults, connection errors, and charging and service options in as close to real time as possible. One successful organization partnering with Oracle to tackle big data is Globacom. It is saving millions of call-processing minutes a year to improve decision-making and customer service, as well as providing vital telemedicine in remote Middle Eastern and African areas. Globacom’s COO Jameel Mohammed states, “Oracle Big Data Appliance enables us to capture, store, and analyze data that we were unable to access before. We now analyze network events forty times faster; create real-time, targeted marketing messages for data-plan users; and save 13 million call-center minutes per year by providing first-call resolution to more and more customers. There is no other platform that offers such performance.” To learn more about how to convert your data into value for telecom service providers, you can download the enterprise architect’s guide here and visit oracle.com/big-data for more information.
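Since geo-fencing comes up above, the core check is worth seeing in code: is a subscriber's reported location within a given radius of a point of interest? Below is a minimal sketch in plain Python (the coordinates, radius, and function names are invented for illustration; a production system would evaluate this over streaming location events rather than single points).

```python
# Minimal sketch of a geo-fence check: great-circle distance vs. a fence radius.
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def inside_geofence(subscriber_pos, fence_center, fence_radius_km):
    """True if the subscriber's position falls inside the circular fence."""
    return haversine_km(*subscriber_pos, *fence_center) <= fence_radius_km

# A shopping-mall fence with a 0.5 km radius around an invented location.
mall = (1.3000, 103.8500)
print(inside_geofence((1.3021, 103.8512), mall, 0.5))   # True  -> trigger a targeted offer
print(inside_geofence((1.3500, 103.9000), mall, 0.5))   # False -> no action
```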


Retailers Get Personal: Improving the Customer Experience, from WSJ. Custom Studios

Following up on the global summary research of WSJ. Custom Studios on how senior executives are investing in the economic potential they view in harnessing big data, we will now journey into how retailers are using big data to personalize the customer experience across omnichannel shopping. Omnichannel refers to customers using more than one channel to buy goods, such as purchasing an item online then picking it up in a brick-and-mortar store, browsing or sharing sentiment on social media, emailing and texting from mobile phones, and calling customer service from home. These complex, omnichannel shopping models are driving retailers to embrace more scalable and robust analytics and IT platforms that can capture web activity logs and transactional records in stores. Responding to customers who are both mobile and social in real time is no small challenge. Retailers have long gathered customer data tied to loyalty cards, the majority of which show what items customers previously purchased, as well as demographic data. The customer data illustrates past buying patterns, but might not be indicative of future demand. Utilizing additional Hadoop data such as Internet search, clickstream, mobile location-based services, weather, and social media sources can help retailers gain a better understanding of future customer demand, as well as a better view of the customer, and his or her family and network buying patterns. The savvy use of predictive analytics and next-best offers also has the potential to please customers and provide them with a better experience: a winning strategy in an increasingly competitive and demanding marketplace. Consumer science company dunnhumby provides an example of delivering a big data solution to retailers. Watch as dunnhumby’s Director Denise Day tells how an Oracle big data platform frees their analysts from being confined to sample-sized data sets and from the inefficiencies of searching for collected data. Now analysts can view 100 percent of the data, drill down for anomalies, and understand individual behavior at the detail level. “Our analysts don’t have to learn new coding languages, they use the same single SQL language and it will bring back results. It doesn’t matter where the data is stored for analysts as long as they can get the answers that they need,” she says. To learn more about how to convert your data into value for retail, you can download the enterprise architect’s guide here and visit oracle.com/big-data for more information.
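The "one SQL regardless of where the data lives" point in that quote is easier to picture with an example. The sketch below is a generic illustration of the pattern, not dunnhumby's actual schema or queries: every table and column name is invented, loyalty and transaction tables are assumed to be relational while the clickstream table is assumed to be exposed to the database (for instance as an external table over Hadoop), and get_connection() stands in for whatever DB-API connection a site already uses.

```python
# Hypothetical next-best-offer candidate query: customers who browsed products
# recently but have not bought anything in the same window. The analyst writes
# one SQL statement; where each table physically lives is the platform's problem.
RECENT_BROWSERS_WHO_DIDNT_BUY = """
    SELECT l.customer_id,
           l.loyalty_tier,
           COUNT(*) AS product_views
    FROM   loyalty_customers l
    JOIN   web_clickstream  c ON c.customer_id = l.customer_id
    WHERE  c.event_type = 'product_view'
    AND    c.event_day >= :since_day
    AND    NOT EXISTS (SELECT 1
                       FROM   store_transactions t
                       WHERE  t.customer_id = l.customer_id
                       AND    t.txn_day >= :since_day)
    GROUP  BY l.customer_id, l.loyalty_tier
    ORDER  BY product_views DESC
"""

def recent_browsers(conn, since_day):
    """Return (customer_id, loyalty_tier, product_views) rows: candidates for a next-best offer."""
    with conn.cursor() as cur:
        cur.execute(RECENT_BROWSERS_WHO_DIDNT_BUY, {"since_day": since_day})
        return cur.fetchall()
```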


Manufacturing and the Search for More Intelligence, from WSJ. Custom Studios

As a follow-up to the global summary of the research conducted by WSJ. Custom Studios on how senior executives identify the economic potential they view in harnessing big data, we will now explore how the manufacturing industry manages and analyzes big data to improve processes and supply chains. More than any other industry, executives in manufacturing put a high priority on drawing intelligence from their big-data stores. Factories are becoming more automated and smarter, allowing machines to “talk” to one another and quickly exchange the data necessary to improve the manufacturing process, reduce labor cost, and speed production. Operations managers use advanced analytics to mine historical process data, identifying patterns and relationships among discrete process steps and inputs. This allows manufacturers to optimize the factors that prove to have the greatest effect on yield. Making better use of data allows manufacturers to understand and address the cross-functional drivers of cost, such as warehouses being incentivized to keep stockouts down while production lines are incentivized to reduce costs. For example, Riverbed Technology improved low test yields by empowering engineers to proactively perform root cause analysis with a greater variety and range of detailed data, faster and with less time spent on inefficient data techniques. Watch Riverbed Director Keith Lavery, who says, “The business value that’s been received is around getting the product out the door faster and being able to reduce the manpower in testing the applications prior to leaving the factory floor.” Using data more effectively can identify instances where companies are working at cross-purposes, such as at consumer goods manufacturer Procter & Gamble (P&G). To better understand performance and market conditions across its many brands, P&G needed to clearly and easily understand its rapidly growing and vast amounts of data across regions and business units. The company integrated structured and unstructured data across research and development, supply chain, customer-facing operations, and customer interactions, both from traditional data sources and new sources of online data. P&G Associate Director Terry McFadden explains, “With Oracle Big Data Appliance, we can use the powerful Hadoop ecosystem to analyze new data sources with existing data to drive profound insight that has real value for the business.” To learn more about how to convert your data into value for manufacturing, you can download the enterprise architect’s guide here and visit oracle.com/big-data for more information.


Health Care Looks to Unlock the Value of Data, from WSJ. Custom Studios

As a follow-up to the global summary of the research conducted by WSJ. Custom Studios on how senior executives are investing in the economic potential they view in harnessing big data, we will now examine how personalized medicine in the health care industry hinges on analytics. As an industry, health care faces numerous challenges, and health care companies see data and analytics as a path to resolving them—from improving clinical practices to increasing business efficiencies. Health care providers are facing a growing need to manage costs and understand patients more holistically to improve the quality of care and patient outcomes. In general, the industry’s desire is to move towards evidence-based medicine as opposed to trial-and-error approaches. In order to meet these goals, organizations are analyzing and managing vast volumes of clinical, genomic, financial, and operational data while rapidly evolving the information architecture. Health information organizations have long gathered information about patient encounters, but only in the last few years has much of this information entered the digitized world in the form of EMRs (electronic medical records), wearable devices, smartphones, and social media adoption, which allow quick access to data on a near real-time basis. The data can also help expose different treatments and their associated outcomes. Even though clinical research data is fueling an initial set of analytics platforms, provider organizations are looking beyond clinical information alone to provide superior care while reducing cost. An example of one Oracle customer innovating with big data to improve patient outcomes is the University of Pittsburgh Medical Center. Watch UPMC’s Vice President Lisa Khorey, who states, “With Oracle’s Exadata, Advanced Analytics, and purpose-built applications, we have a high performance platform that can personalize treatment and improve health care outcomes.” Vice President John Houston also speaks to how health care is transforming around mobile, privacy, and regulations, and how Oracle’s cloud platform supports their health system. To learn more about how to convert your data into value for health care payers and life sciences manufacturers, you can download the enterprise architect’s guide here and here, respectively, and visit oracle.com/big-data for more information.


Converting Big Data into Economic Value from WSJ. Custom Studios

Data is now a kind of capital, on par with financial capital for creating new products, services, and business models. The implications for how companies capture, keep, and use data are far greater than the spread of fact-based decision-making through better analytics. In some cases, data capital substitutes for traditional capital and explains most of the market valuation premium enjoyed by digitized companies. But most companies don’t think they’re ready for big data. Oracle recently commissioned Wall Street Journal Custom Studios and the private research think tank Ipsos to conduct an online survey of over 700 senior executives along with interviews with subject matter experts to understand their biggest opportunities, challenges, and areas of investment with big data. You can read the global summary of the research, “Data Mastery: The Global Driver of Revenue,” and snapshot here. The key findings are that garnering insights in the new world of big data is a top three priority for 86 percent of all respondents—and the number one priority for a third of the participants. In addition, 81 percent plan to expand their business analyst staff. The bottom line is 98 percent of executives who responded believe they are losing an average of 16 percent of annual revenue as a result of not effectively managing and leveraging business information—information that is available on an unprecedented scale and rate from a variety of cloud, mobile, social, and sensor technology devices and platforms. The newer information generated on searches, clickstreams, sentiment, and performance by people and things—in combination with customer and operations data that businesses have traditionally managed—has enormous potential for business value and customer experiences. Stay tuned for more of the industry findings from the research in this blog series that will also feature Oracle customer-success stories and architecture guides to help you convert big data into economic value. Learn more now at oracle.com/big-data.  


Big Data and the Future of Privacy - paper review (Part 3 of 3)

This is part 3 of a review of a paper titled Big Data and the Future of Privacy from the Washington University in St. Louis School of Law, where the authors assert the importance of privacy in a data-driven future and suggest some of the legal and ethical principles that need to be built into that future. Authors Richards and King suggest a three-pronged approach to protecting privacy as information rules: regulation, soft regulation, and big data ethics. New regulation will require new laws, and practitioners of big data can seek to influence those laws but ultimately can only maintain awareness of and adherence to them. Soft regulation occurs when governmental regulatory agencies apply existing laws in new ways, as the Federal Trade Commission is doing, as described in a previous post. It also occurs when entities in one country must comply with the regulatory authority of another country to do business there. Again, this is still a matter of law and compliance. The authors argue that the third prong, big data ethics, will be the future of data privacy to a large extent because ethics do not require legislation or complex legal doctrine. "By being embraced into the professional ethos of the data science profession, they [ethical rules] can exert a regulatory effect at a granular level that the law can only dream of." As those that best understand the capabilities of the technology, we must play a key role in establishing a culture of ethics around its use. The consequences of not doing so are public backlash and ultimately stricter regulation. Links to Part 1 and Part 2


Big Data and the Future of Privacy - paper review (Part 2 of 3)

This is part 2 of a review of a paper titled Big Data and the Future of Privacy from the Washington University in St. Louis School of Law, where the authors assert the importance of privacy in a data-driven future and suggest some of the legal and ethical principles that need to be built into that future. Authors Richards and King identify four values that privacy rules should protect, which I will summarize here from my own perspective.

Identity
Identities are more than government ID numbers and credit card accounts. Social Security numbers and credit cards can be stolen, and while inconvenient and even financially damaging, loss of those doesn't change who we are. However, when companies use data to learn more and more about us without limit, they can cross the boundaries we erect to control our own identity. When Amazon and Netflix give you recommendations for books, music, and movies, are they adapting to your identity or are they also influencing it? If targeted marketing becomes so pervasive that we live in bubbles where we only hear messages driven by our big data profiles, then is our self-determination being manipulated? As the authors state, "Privacy of varying sorts - protection from surveillance or interference - is what enables us to define our identities." This raises the question of whether there is an ethical limit to personal data collection and, if so, where that limit is.

Equality
Knowledge is power and data provides knowledge. Knowledge resulting from data collection can be used to influence and even control. Personal data allows sorting of people, and sorting is on the same spectrum as profiling and discrimination. One possible usage of data-driven sorting is price discrimination. Micro-segmented customer profiles potentially allow companies to charge more to those that are willing to pay more because they can identify that market segment. Another ominous usage of big data is to get around discrimination laws. A lender might never ask your race on a loan application, but it might be able to figure out your race from other data points that it has access to. We must be careful that usage of big data does not undermine our progress towards equality.

Security
As pointed out earlier, the sharing of personal data does not necessarily remove an expectation of privacy. Personal privacy requires security by those that hold data in confidence. We provide personal information to our medical providers, banks, and insurance companies, but we also expect them to protect that data from disclosures that we don't authorize. Data collectors are obligated to secure the data they possess with multiple layers of protection that guard data from both internal and external attack.

Trust
Privacy promotes trust. When individuals are confident their information is protected and will not be misused, they are more apt to share. Consider doctor/patient confidentiality and attorney/client privilege. These protections promote trust that enables an effective relationship between the two parties. Conversely, when companies obtain information under one set of rules and then use it in another way by combining it with other data in ways the consumer did not expect, it diminishes trust. Trust is earned through transparency and integrity.

See Part 3 for a three-pronged approach to protecting privacy. Link to Part 1

Announcing Oracle Big Data Spatial and Graph

We recently shipped a new big data product: Oracle Big Data Spatial and Graph. We’ve had spatial and graph analytics as an option for Oracle Database for over a decade. Now we’ve taken that expertise and used it to bring spatial and graph analytics to Hadoop and NoSQL. But first, what are spatial and graph analytics? I'll just give a quick summary here. Spatial analytics involves analysis that uses location. For example, Big Data Spatial and Graph can look at datasets that include, say, zip code or postcode information and add or update city, state and country information. It can filter or group customer data from logfiles based on how near customers are to one another. Graph analytics is more about how things relate to each other. It’s about relative, rather than absolute, relationships. So you could use graph analytics to analyze friends of friends in social networks, or build a recommendation engine that recommends products to shoppers who are related in the network. The next question is: why move this capability to Hadoop and NoSQL? First, we wanted to support the different kinds of data sets and the different workloads, which included being able to process this data natively on Hadoop, in parallel, using MapReduce or in-memory structures. Secondly, our overall big data strategy has always been to minimize data movement, which means doing analysis and processing where the data lies. Oracle Big Data Spatial and Graph is not just for existing Oracle Database customers - if you need spatial or graph analytics on Hadoop, it will meet your needs even if you don’t have any other Oracle software. But of course, we’re hoping that existing customers will be as interested in it as Ball Aerospace: "Oracle Spatial and Graph is already a very capable technology. With the explosion of Hadoop environments, the need to spatially-enable workloads has never been greater and Oracle could not have introduced "Oracle Big Data Spatial and Graph" at a better time. This exciting new technology will provide value add to spatial processing and handle very large raster workloads in a Hadoop environment. We look forward to exploring how it helps address the most challenging data processing requirements." - Keith Bingham, Chief Architect and Technologist, Ball Aerospace
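To make the two styles of analysis concrete, here is a minimal, hypothetical sketch of the kinds of questions each one answers: a spatial proximity filter and a friends-of-friends lookup. This is my own illustration, not Oracle Big Data Spatial and Graph code; the customer coordinates, the 10 km threshold and the friendship edges are invented for the example.

```python
from math import radians, sin, cos, asin, sqrt
from collections import defaultdict

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

# Spatial question: which customers are within 10 km of a store? (made-up coordinates)
customers = {"c1": (37.53, -122.25), "c2": (37.80, -122.27), "c3": (40.71, -74.00)}
store = (37.55, -122.30)
nearby = [c for c, (lat, lon) in customers.items()
          if haversine_km(lat, lon, *store) <= 10]

# Graph question: who are the friends of my friends? (made-up edges)
edges = [("c1", "c2"), ("c2", "c3"), ("c3", "c4")]
friends = defaultdict(set)
for a, b in edges:
    friends[a].add(b)
    friends[b].add(a)

def friends_of_friends(person):
    """Second-degree connections, excluding the person and their direct friends."""
    direct = friends[person]
    return {f2 for f in direct for f2 in friends[f]} - direct - {person}

print(nearby)                    # e.g. ['c1']
print(friends_of_friends("c1"))  # e.g. {'c3'}
```

The point of the product, of course, is to run this style of question at Hadoop scale rather than over a handful of in-memory records.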

Big Data and the Future of Privacy - paper review (Part 1 of 3)

What is privacy and what does it really mean for big data? Some say that privacy and big data are incompatible. Recall Mark Zuckerberg's comments in 2010 that the rise in social media means that people no longer have an expectation of privacy. I recently read a paper titled Big Data and the Future of Privacy from the Washington University in St. Louis School of Law where the authors argue the opposite. Their ideas provide more food for thought in the quest for guiding principles on privacy for big data solutions. What do we mean when we talk about data privacy? Can data be private if it is collected and stored by another party? The paper's authors, Neil M. Richards and Jonathan H. King, take on these questions but point out that privacy is difficult to define. We often think of private data as being secret or unobserved, yet we share information with others with an expectation of privacy and protection. Rather than getting wrapped up in the nuances of a legal definition, they suggest that personal information in digital systems exists in intermediate states between public and private, and that it should not lose legal protection in those intermediate states. The authors suggest that a practical approach to dealing with data privacy is to focus on the rules that govern that data. See Parts 2 and 3 for an overview of their suggested values and rules for data privacy.

Oracle Named World Leader in the Decision Management Platform

Decision management platforms are increasingly essential to competitive advantage in the era of big data analytics, providing a more comprehensive organizational view from strategy to execution and generating fast, high returns on investment. The decision management market has only recently emerged, with platforms that integrate business rules with advanced analytic modeling for more accurate, high-volume decision making. Decision management has evolved into a collaborative process in which business analysts and modelers experiment scientifically in digital channels with dynamic content, imagery, products, and services, most commonly to deliver next-best marketing offers. These content- and context-aware decisions prompt customer-facing staff with recommendations, while websites, emails, mobile apps, and supply chain systems are continuously tailored to take more precise, local, and personalized actions. IDC has released its MarketScape for the Worldwide Decision Management Software Platform 2014 Vendor Assessment and named Oracle Real-Time Decisions (RTD) as among the first to market and the world leader. Highlighted capabilities include:
· Appeal to the business user or CMO while maintaining flexibility and control for advanced analytic modelers
· High-performance analytics tuned for a single engineered system
· Self-learning, or machine-learning, optimization capabilities
· Integration and portability across deployment options through Java
“The returns organizations cited during interviews for this study were impressive — so much so that no single organization would permit publication of the outcomes because the specific decision management solutions were viewed as key to creating a competitive advantage," said Brian McDonough, research manager, Business Analytics Solutions, IDC. "Organizations can feel confident that they can see real and impressive business benefits from any of these solutions today."
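If the phrase "business rules plus analytic models" feels abstract, here is a toy sketch of the pattern a decision management platform automates. This is not RTD code; the offer names, eligibility rules and learned click-through rates are all invented for illustration, and a real system would update the model continuously from live responses.

```python
# Hypothetical next-best-offer sketch: a hard business rule filters out ineligible
# offers, then a learned propensity score ranks the remainder.
offers = {
    "premium_card": {"min_credit": 700},
    "balance_transfer": {"min_credit": 600},
    "savings_account": {"min_credit": 0},
}

# Pretend these click-through rates were learned from past interactions (the "self-learning" part).
learned_ctr = {"premium_card": 0.12, "balance_transfer": 0.07, "savings_account": 0.03}

def next_best_offer(customer):
    # Business rule: only offers the customer is eligible for.
    eligible = [o for o, rule in offers.items()
                if customer["credit_score"] >= rule["min_credit"]]
    # Analytic model: rank eligible offers by predicted response.
    return max(eligible, key=lambda o: learned_ctr[o]) if eligible else None

print(next_best_offer({"credit_score": 650}))  # -> 'balance_transfer'
```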

Big Data Privacy and the Law

In a previous post I discussed a presentation given at Strata+Hadoop. Another one of the Law, Ethics, and Open Data sessions at Strata+Hadoop that I had a chance to attend was by two attorneys, Alysa Z. Hutnik and Lauri Mazzuchetti, from a private law practice, talking about Strategies for Avoiding Big Privacy “Don’ts” with Personal Data. I found it very interesting and you can see their slides here. They provided the regulatory perspective on personal data, and I must add that lawyers are really good at making you aware of all the ways you can end up in court. Technology moves so quickly, and governmental legislative bodies so glacially, that regulation will likely always lag behind. That doesn't mean that companies are off the hook when it comes to personal data and privacy regulation. I learned that in the absence of specific legislation, governments will find ways to regulate using existing law. In the US, the Federal Trade Commission has taken up the cause of consumer data privacy, consistent with its mission to "protect consumers in the commercial sphere," and, according to the speakers, identified three areas of focus for 2014: big data, mobile technology, and protecting sensitive information. The FTC is adding the Internet of Things to that list for 2015 with a report released in January titled Internet of Things: Privacy & Security in a Connected World, based on a workshop it held in November 2013. In terms of regulating security and privacy, the FTC states in the report that it will "continue to use our existing tools to ensure that IoT companies continue to consider security and privacy issues as they develop new devices." When the FTC refers to its "existing tools", it means enforcement of "...the FTC Act, the FCRA, the health breach notification provisions of the HI-TECH Act, the Children’s Online Privacy Protection Act, and other laws that might apply to the IoT." The report also said that "...staff will recommend that the Commission use its authority to take action against any actors it has reason to believe are in violation of these laws." It's clear that the industry cannot put its head in the sand by overlooking or ignoring privacy concerns. The speakers made a good case for considering the legal implications when working with personal data, and they made some recommendations. Think privacy from the start by designing in privacy and security: suggested methods include limiting data, de-identifying data, securely storing retained data, restricting access to data, and safely disposing of data that is no longer needed. Empower consumer choice: in apps, give users tools that enable choice, make it easy to find and use those tools, and honor the user's choices. Regularly reassess your data collection practices: consider your purpose in collecting the data, the retention period, third-party access, and the ability to make a personally identifiable profile of users. Be transparent: do not hide or misrepresent what data you are collecting and what you are doing with it, and be open about third-party access to your data, including what happens after termination and/or deletion of user accounts. Platform providers should provide frequent and prominent disclosures using just-in-time principles and also give a holistic view of data collection. Consumers should also be able to easily contact providers, and there should be a process for responding to consumer concerns. Providers also need to find ways to effectively educate users about privacy settings.
I won't cover all of their recommendations, but there are lessons here that we can apply as we build out big data applications.
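Several of those recommendations (limit the data you keep, de-identify it, restrict access) are straightforward to bake into a pipeline. Here is a minimal, hypothetical sketch of de-identification over a list of event records; the field names, the salted-hash pseudonym and the field whitelist are illustrative choices of mine, not a legal standard, and salted hashing is pseudonymization rather than full anonymization.

```python
import hashlib

# Hypothetical raw events; field names are invented for illustration.
raw_events = [
    {"email": "alice@example.com", "page": "/pricing", "ms_on_page": 5400, "ssn": "123-45-6789"},
    {"email": "bob@example.com",   "page": "/docs",    "ms_on_page": 1200, "ssn": "987-65-4321"},
]

SALT = "rotate-me-regularly"          # keep out of source control in practice
KEEP_FIELDS = {"page", "ms_on_page"}  # data limitation: only the fields the analysis needs

def pseudonym(value: str) -> str:
    """Replace a direct identifier with a salted one-way hash."""
    return hashlib.sha256((SALT + value).encode()).hexdigest()[:16]

def deidentify(event: dict) -> dict:
    out = {k: v for k, v in event.items() if k in KEEP_FIELDS}  # drop everything else (e.g. ssn)
    out["user_key"] = pseudonym(event["email"])                 # stable key for joins, not an email
    return out

clean = [deidentify(e) for e in raw_events]
print(clean[0])  # {'page': '/pricing', 'ms_on_page': 5400, 'user_key': '...'}
```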

Unmask Your Data. The Visual Face of Hadoop (part 5 of 5)

Following on from the previous posts, we have looked at the fundamental capabilities for an end-to-end big data discovery product. We discussed 1) the ability to 'find' relevant data in Hadoop, 2) the ability to 'explore' a new data set to understand its potential and, in our most recent post, 3) the capability to transform big data to make it better. We now turn our focus to our final concepts: 4. Discover (blend, analyze and visualize). Only once the data is ready can we really start to analyze it. The BDD Discover page is an area that allows users to build interactive dashboards that expose new patterns in the data by dragging and dropping from a library of visual components onto the page - anything from a beautiful chart, heat map or word cloud to a pivot table, a raw list of individual records or search results. Users can join and combine with other data sets they find or upload to the data lake to widen perspectives and deepen the analytic value. The dashboards created are fully interactive. Users can filter through the data by selecting any combination of attribute values in any order, and they can further refine using powerful keyword search to look for specific terms or combinations of terms in unstructured text. A discovery tool needs to encourage free-form interaction with the data to enable users to ask unanticipated questions, and BDD is founded upon this idea. 5. Share (insights and data for enterprise leverage). Big data analytics is a team sport, so sharing and collaboration are fundamental to the discovery process. To honor this, all the Big Data Discovery projects and dashboard pages built by end users are shareable with other users (if granted permission) so they can work together. Users can even share specific analyses within project pages via bookmarks, or take visual snapshots of the pages, put these into galleries and publish them to tell stories about the data. But it's not just analyses that are shareable in BDD; one of the most frequently requested capabilities we heard from customers is that the product 'plays nice' with other tools in the big data ecosystem. Perhaps our data scientist wants to use the data sets we prepared in BDD to build a new predictive model in R, or perhaps we want to lock down, secure and share a discovery with thousands of users via the enterprise BI tool. BDD enables this throughout the product by constantly providing the ability to write results back to Hadoop and even automatically register the improved data in Hive so it can instantly be consumed by any other tool that connects to the data lake. These 5 concepts (see previous posts for the first 3) are fundamental to Big Data Discovery, and this is what we mean by ‘end-to-end’: the ability to quickly find relevant data to start working with and then explore it to evaluate and understand its potential; the ability to transform and enrich the data, without moving it out of Hadoop, to make it better; and only then discovering new insights by blending and interacting with the data and finally sharing our results for leverage across the enterprise, in terms of both people and tools, connecting to the big data ecosystem. If you have implemented a data lake or have plans to, we hope these ideas resonate with you and compel you to take a deeper look at Oracle Big Data Discovery, the “visual” face of Hadoop.
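The write-back-and-register pattern described above is worth seeing in its general form. Here is a minimal PySpark sketch (my own illustration, not BDD internals) that writes a prepared data set back to the cluster as Parquet and registers it in the Hive metastore so SQL-speaking tools can pick it up; the table and column names are hypothetical.

```python
from pyspark.sql import SparkSession

# Hive support so the table registration is visible to other tools on the cluster.
spark = (SparkSession.builder
         .appName("register-prepared-dataset")
         .enableHiveSupport()
         .getOrCreate())

# Pretend this is the cleaned/enriched data set produced during discovery.
prepared = spark.createDataFrame(
    [("c1", "gold", 412.50), ("c2", "silver", 87.10)],
    ["customer_id", "segment", "lifetime_value"],
)

# Write back to the data lake as Parquet and register it in the Hive metastore,
# so BI tools, R, or anything else with a Hive/SQL connector can query it.
(prepared.write
 .mode("overwrite")
 .format("parquet")
 .saveAsTable("customers_prepared"))

spark.sql("SELECT segment, count(*) FROM customers_prepared GROUP BY segment").show()
```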

Unmask Your Data. The Visual Face of Hadoop (part 4 of 5)

Following on from the previous post, we started to discuss the fundamental capabilities that an end-to-end big data discovery product needs to contain in order to allow anyone (not just highly skilled 'techies') to turn raw data in Hadoop into actionable insight. We discussed 1) the ability to 'find' relevant data in Hadoop and 2) the ability to 'explore' a new data set to understand its potential. We now turn our focus to: 3. Transform (to make big data better). We already discussed that data in Hadoop typically isn't ready for analytics because it needs changing in some way first. Perhaps we need to tease out product names or IDs buried in some text, replace some missing values, concatenate fields together, or turn integers into strings or strings into dates. Maybe we want to infer a geographic hierarchy from a customer address or IP address, or a date hierarchy from a single timestamp. The BDD Transform page allows any user to directly change the data in Hadoop without moving it, and without picking up the phone, calling IT and waiting for ETL tools to get the data ready for them. Via an Excel-like view of the data, Transform allows users to quickly change data with a simple right-click and preview the results of the transformation before applying the change. More sophisticated data wranglers can draw on a library of hundreds of typical transforms to apply to the data to get it ready. They can even make the data better and richer by adding new data elements extracted from large text fields using term and named entity extraction algorithms. Any transform or enrichment can be previewed before applying, but when it is applied, BDD leverages a massively scalable open source data processing framework called Apache Spark behind the scenes, so the transforms can be applied at scale to data sets in Hadoop that contain billions of records. All of this complexity is masked away from the user, so they can just sit back and wait for the magic to happen. In the next and final post we will discuss the final 2 critical capabilities for an effective big data discovery tool. Until then... please let us know your thoughts!
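For readers who want to see what transforms like these look like outside a GUI, here is a small PySpark sketch (my own illustration, not the code BDD generates) applying a few of the changes mentioned above: filling a missing value, concatenating fields, and turning a string into a date with the start of a date hierarchy. The column names and values are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("typical-transforms").getOrCreate()

# Hypothetical raw records as they might land in the data lake.
raw = spark.createDataFrame(
    [("Ava", "Smith", "2015-03-14", None),
     ("Raj", "Patel", "2015-04-02", "UK")],
    ["first_name", "last_name", "order_date_str", "country"],
)

cleaned = (raw
    .fillna({"country": "unknown"})                                         # replace missing values
    .withColumn("full_name", F.concat_ws(" ", "first_name", "last_name"))   # concatenate fields
    .withColumn("order_date", F.to_date("order_date_str", "yyyy-MM-dd"))    # string -> date
    .withColumn("order_month", F.month("order_date")))                      # start of a date hierarchy

cleaned.show()
```

Because Spark plans and distributes the work, the same handful of transform calls behaves the same whether the data set has two rows or billions.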

Unmask Your Data. The Visual Face of Hadoop (part 3 of 5)

Based on the challenges outlined in part 2 of this series, what Oracle wanted to provide first and foremost is a single product to address them - meaning one product that allows anyone to turn raw data in Hadoop into actionable insight, and fast. We don't want to force customers to learn multiple tools and techniques, then constantly switch between those tools to solve a problem. We want to provide a set of end-to-end capabilities that offers users everything they need to get the job done. Next, the user interface for the product needed to be intuitive and easy to use, extremely visual and highly compelling. We want our customers to want to use the product - to be excited to use it when they get to work in the morning. Based on the initial reactions we've received, we think we have achieved this. During a recent meeting with one of the world's largest insurance companies I was particularly pleased (and slightly amused) to hear one customer remark, "you guys are seriously in danger of making Oracle look sexy". Ok, it seems we are headed in the right direction then. So what is the fundamental set of capabilities that lies beneath our compelling user interface? Just how do we allow people to turn "raw data into actionable insight"? What does this entail? To explain this I typically talk about 5 concepts that relate directly to areas within the product: 1. Find (relevant data). When a user logs into the product, the first page they are presented with is called Catalog. Right after BDD is installed, it automatically indexes and catalogs all of the data sets in Hadoop, then continually watches for new data to add to the catalog. In the interface, users are then able to quickly find relevant data to start working with, using familiar techniques such as keyword search and guided navigation. Just type a word like "log" or "weather" and BDD returns all the data sets that match the keyword, as well as summarizing all the associated metadata, so the user can further refine and filter through the results using attributes like who added the data, when it was added, how large the data is and what it contains (such as geographic or time-based attributes). They can even refine using tags that have been added by other users over time. For any individual data set the user can see how it's been used in other projects and what other data sets it was combined with. We wanted the catalog experience to make finding data to start working with in Hadoop as "easy as shopping online". 2. Explore (to understand potential). Once an interesting data set is found, the logical question is "does it have analytical potential?". Even before we start working with a data set, we need to help users understand whether it's worth investing in. The Explore page in BDD does just that. It allows a user to walk up to a data set they've never seen before and understand what is in it in terms of its attributes. Without any work at all, they are instantly able to see all the attributes at once, displayed as different visual tiles depending on the data type (number, string, date, geo, etc.), quickly see raw statistics about the individual attributes (distribution, max, min, middle values, number of distinct values, missing values and so on), and even start to uncover patterns and relationships between combinations of attributes by placing them together in a scratchpad to quickly visualize initial discoveries.
Before investing in a new analytic project, organizations need to be able to understand the potential of the data quickly to avoid months of wasted time and millions of wasted dollars. Stay tuned for part 4 of the series, when we will address the remaining 3 concepts for effective big data discovery on raw data in Hadoop.
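Those per-attribute statistics (min, max, distinct counts, missing values) are easy to picture with a small example. The PySpark sketch below is my own illustration of that kind of profiling over a hypothetical data set, not how BDD computes its Explore tiles.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("attribute-profile").getOrCreate()

# Hypothetical data set a user might stumble on in the catalog.
df = spark.createDataFrame(
    [(101, "NY", 25.0), (102, "CA", None), (103, "NY", 40.5), (104, None, 12.0)],
    ["order_id", "state", "amount"],
)

# One summary row per attribute: min, max, distinct values, and missing values.
summaries = [
    df.select(
        F.lit(c).alias("attribute"),
        F.min(c).cast("string").alias("min"),
        F.max(c).cast("string").alias("max"),
        F.countDistinct(c).alias("distinct"),
        F.count(F.when(F.col(c).isNull(), 1)).alias("missing"),
    )
    for c in df.columns
]

# Union the per-column summaries into a single profile table.
profile = summaries[0]
for s in summaries[1:]:
    profile = profile.union(s)
profile.show()
```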

Unmask Your Data. The Visual Face of Hadoop (part 2 of 5)

Over the last 18 months or so, while Oracle Big Data Discovery was in the early stages of design and development, I got to ask lots of customers and prospects some fundamental questions about why they struggle to get analytic value from Hadoop. After a while, common patterns started to emerge that we ultimately used as the basis of the design for Big Data Discovery. So here’s what we learned. 1. Data in Hadoop is not typically ‘ready’ for analytics. The beauty of Hadoop is that you just put raw files into it and worry about how to unpack them later on. This is what people mean when they say Hadoop is "schema on read". This is both good and bad. On the one hand it’s easy to capture data; on the other, it requires more effort to evaluate and understand it later on. There is usually a ton of manual intervention required before the data is ready to be analyzed. Data in Hadoop is typically flowing from new and emerging sources like social media, web logs, mobile devices and more. It is unstructured and raw - not clean, nicely organized and well governed like it is in the data warehouse. 2. Existing BI and data discovery tools fall short. We can't blame them, because they were never designed for Hadoop. How can tools that speak ‘structured’ query language (SQL) be expected to talk to unstructured data in Hadoop? For example, how do they extract value from the text in a blog post or the notes a physician makes after evaluating a patient? BI tools don't help us find interesting data sets to start working with in the first place. They don't provide profiling capabilities to help us understand the shape, quality and overall potential of data before we start working with it. And what about when we need to change and enrich the data? We need to bring in IT resources and ETL tools for that. Sure, BI tools are great at helping us visualize and interact with data, but only when the data is ready… and (as we outlined above) data in Hadoop isn’t usually ready. 3. Emerging tools are point solutions. As a result of the above challenges we have seen a ton of excitement and investment in new Hadoop-native tooling offered by various startups, too numerous to mention here. We are tracking tools for cataloging and governing the data lake, profiling tools to help users understand new data sets in Hadoop, data wrangling tools that enable end users to change the data directly in Hadoop, and a ton of analytic and data visualization products to help expose new insights and patterns. An exciting space for sure, but the problem is that, in addition to the fact that these tools are new (and may or may not exist next month), they only cover one or two aspects of the big data discovery lifecycle. No single product allows us to find data in Hadoop and turn it into actionable insight with any kind of agility. Organizations can't be expected to buy a whole collection of immature and non-integrated tools, then ask their analysts to learn them all and ask IT to figure out how to integrate them. Clearly a fundamentally new approach is required to address these challenges, and that's exactly what we've done with Oracle Big Data Discovery. Over the next 3 posts I will outline the specific capabilities we designed into the product to address the challenges outlined above. As always, I invite your comments and questions related to this exciting topic!
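"Schema on read" is easier to see with a tiny example. The sketch below is a hypothetical illustration with an invented log format: the raw lines go into the lake exactly as they arrived, and structure is only imposed when they are read for analysis.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("schema-on-read").getOrCreate()

# Raw lines exactly as they might have been dumped into HDFS - no schema decided up front.
raw_lines = spark.createDataFrame(
    [("2015-05-01 10:02:11 GET /pricing 200",),
     ("2015-05-01 10:02:15 POST /signup 500",)],
    ["value"],
)

# The schema is imposed only now, at read/analysis time, by parsing the text.
parsed = raw_lines.select(
    F.to_timestamp(F.substring("value", 1, 19), "yyyy-MM-dd HH:mm:ss").alias("ts"),
    F.split("value", " ").getItem(2).alias("method"),
    F.split("value", " ").getItem(3).alias("path"),
    F.split("value", " ").getItem(4).cast("int").alias("status"),
)

parsed.filter("status >= 500").show()  # now it can be queried like any table
```

The flexibility is real, but so is the cost: someone has to write (and maintain) that parsing step before anyone can analyze the data, which is exactly the manual intervention described above.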

Unmask Your Data. The Visual Face of Hadoop (part 1 of 5)

By now you’ve probably heard a lot of buzz about our exciting big data vision and specifically our new product, Oracle Big Data Discovery (or BDD, as it is affectionately abbreviated). It’s a new ‘Hadoop native' visual analytics tool that allows anyone to turn raw data in Hadoop into actionable insight in minutes, without the need to learn complex tools or rely only on highly specialized resources. For underserved business analysts, eager to get value out of the Hadoop data lake (or reservoir/swamp, depending on your mood), there is finally a product providing a full set of capabilities easy enough to use to be truly effective. For data scientists, already proficient with complex tools and languages but bottlenecked by messy and generally unprepared data, life just got a little easier. But most important of all, business analysts, data scientists and anyone else who is analytically minded can now work together on problems as a team, on a common platform, and this means productivity around big data analytics just took a huge step forward. Before we dig into BDD in more detail, let's pause to understand some of the fundamental challenges we are trying to address with the product. To do this we need to ask some basic questions: Why has it been so difficult to get analytic value out of data in Hadoop? Why do data scientists seem to be bottlenecked by activities related to evaluating and preparing data rather than actually discovering new insights? Why are business analysts, effective with BI tools for years, struggling to point those tools at Hadoop? I invite your comments on these important questions related to unmasking data in Hadoop, as I’ll be sharing more of the vision and investment Oracle has been making to deliver a fundamentally new approach to big data discovery.

Announcing Oracle Data Integrator for Big Data

Proudly announcing the availability of Oracle Data Integrator for Big Data. This release is the latest in the series of advanced Big Data updates and features that Oracle Data Integration is rolling out for customers to help take their Hadoop projects to the next level.

Increasing Big Data Heterogeneity and Transparency

This release sees significant additions in heterogeneity and governance for customers. Significant highlights of this release include support for Apache Spark, support for Apache Pig, and orchestration using Oozie. Click here for a detailed list of what is new in Oracle Data Integrator (ODI). Oracle Data Integrator for Big Data helps transform and enrich data within the big data reservoir/data lake without users having to learn the languages necessary to manipulate it. ODI for Big Data generates native code that is then run on the underlying Hadoop platform without requiring any additional agents. ODI separates the design interface used to build logic from the physical implementation layer that runs the code. This allows ODI users to build business and data mappings without having to learn HiveQL, Pig Latin or MapReduce.

Oracle Data Integrator for Big Data Webcast

We invite you to join us on the 30th of April for our webcast to learn more about Oracle Data Integrator for Big Data and to get your questions answered about big data integration. We will discuss how the newly announced Oracle Data Integrator for Big Data:
Provides advanced scale and expanded heterogeneity for big data projects,
Uniquely complements Hadoop’s strengths to accelerate decision making, and
Ensures sub-second latency with Oracle GoldenGate for Big Data.
Click here to register.
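The idea of separating the logical mapping from the engine that executes it is a general ELT pattern, and a toy sketch may help make it concrete. The example below is a hypothetical, simplified illustration of that separation only; it is not ODI's actual API, knowledge modules, or generated output, and the table and column names are invented.

```python
# Hypothetical illustration of "design once, execute natively": a logical mapping is
# described declaratively, then rendered for whichever target engine will run it.
mapping = {
    "source": "web_logs",
    "target": "error_events",
    "columns": {"ts": "event_time", "url": "page", "status": "http_status"},
    "filter": "status >= 500",
}

def to_hiveql(m: dict) -> str:
    """Render the logical mapping as a HiveQL INSERT...SELECT statement."""
    select = ", ".join(f"{src} AS {dst}" for src, dst in m["columns"].items())
    return (f"INSERT INTO TABLE {m['target']} "
            f"SELECT {select} FROM {m['source']} WHERE {m['filter']}")

def to_spark_sql(m: dict) -> str:
    """The same mapping rendered for Spark SQL - only the generator changes, not the design."""
    select = ", ".join(f"{src} AS {dst}" for src, dst in m["columns"].items())
    return f"SELECT {select} FROM {m['source']} WHERE {m['filter']}"

print(to_hiveql(mapping))
print(to_spark_sql(mapping))
```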
