Tuesday, May 31, 2016

DataLab in a Box: accelerate your Data Value to stay ahead

Since McKinsey's 2011 analysis launched the "Big Data" phenomenon, and since my first blog entry on this trend (here), many things have evolved. It is no longer hype: depending on the angle you look at it from, the "Big Data" evolution brings new jobs like Chief Data Officer, new enrichment, and additional operational complexity, with the continuous evolution of the tools revolving around it. Either way, there is no way to stop it if you don't want to fall behind, or even become irrelevant. The digitalization of the world, and data, are removing the previously known barriers to entry. If you are not convinced yet, have a look at the valuation of AirBnB compared to AccorHotels. This is now called "Uberization", and the word even made the French dictionary this year. All thanks to mastering digitalization technologies and data to offer the right services at the right time at the right price: "Data is eating the world", and you had better be a master at it. Coming back to my previous example of AirBnB and AccorHotels: AccorHotels is not standing still, and is investing heavily to master its data.

Getting the right tools for the Job

Following McKinsey's 2011 prediction, mastering data starts with Data Scientists... but even there, the Data Scientist job is facing an evolution (or revolution). In Jennifer Lewis Priestley's article "Data Science: The Evolution or the Extinction of Statistics?", not only will you get an interesting view of this evolution, but I also invite you to look at the comment from Andrew Ekstrom, which I would summarize in one sentence: "get the right tools for the job, tools that can scale to crunch more and more data".

And tools have also evolved pretty fast, bringing better enrichment and capabilities, but also more complexity to keep up with. At the end of the day, what you would like is a DataLab in a box, ready to use, with the right capabilities. Spending time building it, maintaining it, moving from Hadoop MapReduce to YARN to Spark (to name a few), and combining it all with NoSQL and SQL sources is complex (check Mark Rittman's Enkitec E4 Barcelona presentation, "SQL and Data Integration Futures on Hadoop", for more details). Wouldn't it be nice if you could get all of this ready to use, and focus on the analysis to get the value of ALL your data? To take another analogy, just imagine if you had to build your car before using it: not really convenient. Unfortunately, that is often what most IT departments tend to do.

DataLab in a Box

Applying Converged Infrastructure to Big Data, in what we call a Big Data Appliance, results in the ability to give you a car that is ready to drive: a scalable Hadoop platform combined with NoSQL, plus SQL connectors, and a ready-to-use visual exploration tool (Big Data Discovery) that puts the value of your data in the hands of Data Scientists. All in all, a DataLab in a Box, available both in the Cloud and on-premises. Many of our customers are leveraging it with success (I invite you to have a look here for references and use cases). That's why Oracle was named a leader in The Forrester Wave™ for Big Data Hadoop Optimized Systems, Q2 2016.

Now that you know how to get an operational DataLab (in a Box), where do you go from here?

The first steps to get to the Value

Here are the 2 tips to keep in mind, from the many customers who have already found value in Big Data:

  1. Start with the data that you already have, by bringing it into your DataLab
  2. Ask the right "SMART" question, the top burning issue for your business line to solve

With that, you should be in the right place to accelerate your Data Value to stay ahead.

To go further:

Friday, Oct. 23, 2015

Real-Time Financial Risk Management: big data in-memory at scale

If you work in the financial markets and are going to attend Oracle OpenWorld in a few days, you should take a valuable hour of your time to join the head of R&D of Quartet FS, Antoine Chambille, for one of his sessions at JavaOne. He will be joined by our Java and ISV Engineering experts to explain how you can compute real-time correlations with Java, in order to take the right decision.

As a matter of fact, there comes a point where the scale-out, web-search-engine-like design pattern doesn't work anymore. And when you need to correlate very large amounts of data in real time, this becomes a very interesting challenge. This is what Quartet FS has achieved with Oracle technologies, running Java in-memory at very large scale in real time. To learn more, and to ask the burning questions you should have about how you can leverage these capabilities in your own context (even outside of the financial markets, like retail), join Antoine Chambille and Oracle Engineering for the following sessions:

As a first preview, and for those not going to JavaOne, you can have a look at the first 22 minutes of the last Quartet FS User Group's Technology Keynote, about performance.

Thursday, Oct. 02, 2014

#OOW14 Digital & SMAC: Social, Mobile, Analytics, Cloud

Tuesday at #OOW14 was again a day of major new announcements, made by Thomas Kurian and demonstrated on stage by Larry Ellison, around SMAC which, as Intel's CIO put it, will make everything smart and connected, and be a massive shift. And this will drive the Digital transformation.

Digital Transformation

Dr. Didier Bonnet from Capgemini started the day with a very interesting introduction on Digital transformation and what needs to be done to really turn this investment into a true advantage. An advantage that can amount to +26% profitability for digitally savvy leaders, according to a Capgemini study.

The 4 points to keep in mind being:

  1. Have a vision: identify strategic assets, create a transformative vision, define a clear intent and outcome, and keep evolving
  2. Engage: connect with your customers/employees, drive adoption, and scale
  3. Provide governance: avoid duplication, bring coherence to the program, prioritize, enable
  4. Have technology leadership: to enable the transforming experience, leading to transformed operations and business models

Supporting this Digital Transformation guideline, you can see how SMAC (Social, Mobile, Analytics, Cloud) resonates, and drives the alignment of the technology-leadership enablers.

SMAC and Digital Transformation

Analytics plays a key role in providing insight at every stage. To get there, everybody is looking into creating a Data Lake / Data Reservoir, but the question is: "how do you scan your data sources (structured/unstructured), whether in Hadoop, NoSQL, or relational databases, and efficiently make them available to business analysts as information for knowledgeable insights?" That's why we have developed and announced Big Data SQL, Big Data Discovery and Big Data Analytics, running on Oracle Engineered Systems (Big Data Appliance, Exadata, and Exalytics), and available on-premises or in the Oracle Cloud as a Service.

In a few words, Big Data SQL is a way to query all your data from multiple sources in one single SQL statement, and very efficiently: we do for Hadoop what we did for Exadata, decomposing the query so it is processed at the Hadoop node level and only the result returns to the database for aggregation. This makes it possible to scan your Oracle Database, your NoSQL store and your Hadoop cluster very rapidly, and creates the illusion that all the data is in one place. But this is only one piece of the puzzle; the other piece is to provide simple tools for non-data-scientists to do discovery, prediction and correlation: in Hadoop with Big Data Discovery, the visual face of Hadoop, and at the Business Intelligence level with Big Data Analytics. Big Data Analytics is also a set of tools that helps you explore data in your databases and in Hadoop, very fast thanks to in-memory processing, and easily thanks to a simple user interface.
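To make the pushdown idea concrete, here is a minimal, self-contained Python sketch of the scatter/gather pattern described above: each storage node filters and pre-aggregates its own shard locally, and only the small partial results travel back to a coordinator for final aggregation. All names and data here are hypothetical illustrations, not Oracle Big Data SQL's actual interfaces.

```python
# Sketch of query pushdown: filter + partial aggregation run where the
# data lives; the coordinator only combines tiny partial results.
# (Hypothetical data and helper names; illustrative only.)

def local_scan(rows, predicate):
    """Runs on each node: filter locally, return a partial (sum, count)."""
    total, count = 0, 0
    for row in rows:
        if predicate(row):
            total += row["amount"]
            count += 1
    return total, count

def federated_avg(partitions, predicate):
    """Coordinator: merge the partial results from every node."""
    partials = [local_scan(p, predicate) for p in partitions]
    grand_total = sum(t for t, _ in partials)
    grand_count = sum(c for _, c in partials)
    return grand_total / grand_count if grand_count else None

# Three "nodes", each holding a shard of the data
partitions = [
    [{"region": "EU", "amount": 10}, {"region": "US", "amount": 30}],
    [{"region": "EU", "amount": 20}],
    [{"region": "US", "amount": 40}, {"region": "EU", "amount": 30}],
]
print(federated_avg(partitions, lambda r: r["region"] == "EU"))  # 20.0
```

The point of the pattern is that the rows themselves never leave the nodes; only two numbers per node cross the network, which is what makes the scan fast at scale.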

The other aspect of Digital is, of course, mobility. Here too, major announcements were made: we will provide a mobile application development framework that helps you create applications that can run anywhere, on iOS, Android, mobiles and tablets, as well as your usual desktop. Write once, run anywhere. All supported by a mobile cloud service providing APIs, shaping, persistence and analytics around your devices... adding mobile security, identity management and a secure (corporate) application container. All in all, a true enabler not only for your mobile & BYOD strategy but also for IoT (Internet of Things).

Last but not least, we also announced yesterday document sharing & social network services in the Cloud... As you can see, all the technology leadership to shape your Digital Transformation.

Sunday, Sept. 29, 2013

#OOW2013: Internet of Things... and Big Data

As promised in my first entry a few weeks ago while preparing for Oracle OpenWorld, I am coming back to IoT: the Internet of Things... and Big Data, as this was the closing topic, developed by Edward Screven, Chris Baker and Deutsche Telekom's Dr. Thomas Kiessling. Of course, Big Data and the Internet of Things (or M2M, Machine2Machine) were covered not only on the last day but all along the conference, including at JavaOne, with 2 interesting sessions from Gemalto. Gemalto even developed a kit to test your own M2M use cases. The Internet of Things opens new opportunities but also challenges to overcome to get it right, which at Oracle we classify in 3 categories: Acquire & Transmit, Integrate & Secure, and Analyze & Act.

Acquire & Transmit

Just think of the potentially billions of devices that you need to remotely deploy, maintain and update, while ensuring proper transmission of data (the right data at the right time, as your power budget is constrained) and even extending decision making closer to the source. With a standards-based Java platform optimized for devices, we already cover all those requirements today, and are already involved in major Internet of Things projects, like Smart Grids or Connected Cars.

Integrate & Secure

Of course, integrating all the pieces together, securely, is key, as you want it 1) to work reliably with a potentially very large number of devices and 2) not to be compromised by any means. Here again, at the device level, Java provides the intrinsic security functions that you need: secure code loading, verification and execution; confidentiality of data handling, storage and communication; up to authentication of the entities involved in secure operations. And we drive this secure integration all the way to the datacenter, thanks to our comprehensive Identity and Access Management system, up to data masking, fraud detection, and built-in network security and encryption.

Analyze & Act

Last but not least is analyzing and correlating this data and taking appropriate actions. This is where M2M and the Internet of Things link to Big Data. Several things characterize "Big Data": Volume, Velocity (time & speed), Variety (data format), Value (what is really interesting in this data for my business), Visualization (how do I find something of value in it?), and Veracity (ensuring that what I add into my trusted data (DWH...) from these new sources has been validated). In M2M, we don't always have Volume, but we still have the other "Vs" to take care of. To handle all this IoT-generated information inside the datacenter, and correlate it with existing data relevant to your customers' business (ERP, supply chain, quality tracking of suppliers, improving the purchasing process, etc.), you may need tools. That's why Oracle developed the Oracle Big Data Appliance, to build an "HPC for Data" grid including Hadoop & NoSQL to capture this IoT data, and Oracle Exalytics/Oracle Endeca Information Discovery to enable the visualization/discovery phase. Once you pass the discovery phase, we can act automatically, in real time, on the specific triggers that you have identified, thanks to the Oracle Event Processing solution.
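To illustrate the "act automatically on identified triggers" step, here is a toy Python sketch of a rule-driven event loop. The sensor names, thresholds and actions are hypothetical, and this is in no way the Oracle Event Processing API; it only shows the shape of the pattern: pre-identified rules evaluated against every incoming event.

```python
# Toy "Analyze & Act" loop: each incoming reading is checked against
# pre-identified triggers, and a matching trigger fires its action.
# (Hypothetical rules and data; illustrative only.)

def make_threshold_rule(sensor, limit, action):
    """Build a rule that fires `action` when `sensor` exceeds `limit`."""
    def rule(event):
        if event["sensor"] == sensor and event["value"] > limit:
            action(event)
    return rule

alerts = []
rules = [
    make_threshold_rule("temperature", 80.0,
                        lambda e: alerts.append(f"overheat: {e['value']}")),
]

# Simulated real-time stream of device readings
stream = [
    {"sensor": "temperature", "value": 72.5},
    {"sensor": "temperature", "value": 85.1},
]
for event in stream:
    for rule in rules:
        rule(event)

print(alerts)  # ['overheat: 85.1']
```

A real event-processing engine adds windowing, correlation across streams and guaranteed delivery on top of this basic match-and-act loop.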


As you see, Oracle Internet of Things platform enables you to quickly develop and deliver, securely, an end-to-end solution.

The end result is a quick time-to-market for an M2M project like the one presented on stage and used live during the conference. This project was developed in 4 weeks, by 6 people! The goal was to control room capacity and the in/out doors live, depending on the flow of participants in the room. And as you can see in the architecture diagram, we effectively cover everything from Java on the device up to Exalytics in the datacenter.

Thursday, Oct. 04, 2012

#OOW 2012: Big Data and The Social Revolution

As Cognizant CSO Malcolm Frank said about the "Future of Work", and how businesses should prepare in the face of the new generation, not only of devices and the "internet of things" but also of their users ("The Millennials"), moving from "consumers" to "prosumers": we are at a turning point today which is bringing us to the next IT architecture wave. This is no longer just about putting Big Data, Social Networks and Customer Experience (CxM) on top of old existing processes; it is about embracing the next curve by identifying which processes need to be improved, but also, and more importantly, which processes are obsolete and need to be gotten rid of, and which new processes need to be put in place. It is about managing both the hierarchical, structured Enterprise and its social connections and influencers inside and outside of the Enterprise. And this applies everywhere, up to Utilities and Smart Grids, where it is no longer just about delivering (faster) the same old 300 reports that have grown over time with these new technologies, but about understanding what needs to be looked at, in real time, down to a handful of relevant reports with the KPIs relevant to the business. It is about how IT can anticipate the next wave and answer business questions, putting those capabilities, in real time, right in the hands of the decision makers... This is the turning curve where IT really moves from the past decade's "Cost Center" to "Value for the Business", as corporate stakeholders will be able to touch the value directly at the tips of their fingers.

It is all about making data-driven strategic decisions, encompassed and enriched by ALL the data, and connected to customers/prosumers and influencers. This brings stakeholders the ability to make informed decisions on questions like: "Which Olympic gold winner would be the best to represent my automotive brand?"... in a few clicks and in real time, based on social media analysis (Twitter, Facebook, Google+...) and connections linked to my Enterprise data.

A true example was demonstrated by Larry Ellison in real time during yesterday's keynote, where "Hardware and Software Engineered to Work Together" is not only about extreme performance but also about solutions that the Business can touch, thanks to well-integrated Customer eXperience Management and Social Networking: bringing IT the capabilities to move to the next IT architecture wave.

This was also illustrated today in 2 other sessions that I had the opportunity to attend. The first session brought the "Internet of Things" in Oil & Gas into actionable decisions thanks to Complex Event Processing capturing sensor data, with a ready-to-run IT infrastructure leveraging Exalogic for the CEP side, Exadata for the enriched datasets and Exalytics to provide the informed-decision interface up to the end user. The second session showed a Real Time Decision engine in action for ACCOR hotels, with Eric Wyttynck, VP eCommerce, and his Technical Director Pascal Massenet.

I have to close my post here, as I have to go run our practical hands-on lab, cooked up with Olivier Canonge, Christophe Pauliat and Simon Coter, illustrating in practice the Oracle Infrastructure Private Cloud announced last Sunday by Larry and developed through many examples this morning by John Fowler. John also announced Solaris 11.1 today, with a range of network innovations and virtualization at the OS level, as well as many optimizations for applications, like Oracle RAC, with the introduction of the lock manager inside the Solaris kernel. Last but not least, he introduced the Xsigo Datacenter Fabric for highly simplified network and storage virtualization for your Cloud Infrastructure.

Hoping you will get ready to jump on the next wave, we are here to help...

Wednesday, Jan. 11, 2012

Big Data: a Business opportunity and a (new) challenge for IT?


Having taken part in a few conferences on this theme, here are some thoughts to start 2012 on the topic of the moment...

Big Data: Business opportunities

As a McKinsey study ("Big Data: The next frontier for innovation, competition, and productivity") points out, mastering data (in all its diversity) and the ability to analyze it have a strong impact on what IT can bring to the business lines to find new axes of competitiveness. To give just 2 examples, McKinsey estimates that exploiting Big Data could save more than €250 billion across the entire European public sector (fraud detection, management and measurement of the effectiveness of subsidy allocations and investment plans, ...). As for the commercial sector, simply using geolocation data could generate a global surplus of $600 billion, an opportunity illustrated by Jean-Pierre Dijcks in his blog: "Understanding a Big Data Implementation and its Components".

Volume, Velocity, Variety...

"Big Data" is often characterized by these 3 Vs:

  • Volume: for some, Big Data starts at the threshold where the volume of data becomes difficult to manage in a relational database solution. However, technological advances keep pushing that threshold further and further without calling IT standards into question (cf. Exadata), which is why volume by itself is not enough to characterize a "Big Data" approach.
  • Velocity: Big Data therefore also requires a strong temporal dimension associated with large volumes. That is, being able to capture a moving mass of data in order either to react almost in real time to an event, or to revisit it later from another angle.
  • Variety: Big Data addresses structured data, but not only. The essential goal is precisely to be able to find added value in all the data accessible to a company. And in the age of digital, dematerialization, social networks, data-feed providers, Machine2Machine and geolocation, the variety of accessible data is large, in perpetual evolution (who will be the next Twitter, Facebook or Google+?) and rarely structured.


...Visualization and Value

To these 3 Vs that characterize "Big Data" in general, I would add 2: Visualization and Value!

Visualization, because faced with this volume of data, its variety and its velocity, it is essential to have the means to navigate this mass, to extract (quickly and simply) information and value from it, in order to find what you are looking for, but also to benefit from an interesting asset offered by the diversity of unstructured data coupled with the company's structured data: serendipity, or finding what you were not looking for (the hallmark of many innovations)!

The opportunities for the Business obviously lie in these last 2 Vs: knowing how to visualize the useful information in order to derive Business value from it...

A (new) challenge for IT

The challenge for IT lies in the overall value chain: knowing how to acquire and store a large volume of varied, moving data, and being able to provide business lines with the tools to derive meaning and value from it. Processing this (unstructured) data requires technologies complementary to the solutions already in place for managing companies' structured data. These new technologies originally came from the R&D centers of the internet giants, which were the first to face these masses of unstructured information. The challenge today is to bring these solutions into the enterprise in an industrialized way, with both mastery of the integration of all the components (hardware and software) and their support, across the 3 fundamental steps that make up a Big Data value chain: Acquire, Organize and Distribute.

  1. Acquire: once the data sources have been identified (with the business lines), they must be stored at low cost and with strong scalability (given the volumes involved and the speed of growth) for information-extraction purposes. A scalable storage grid must be deployed, along the lines of the Exadata model. The reference in this field for grid storage of unstructured data for processing purposes is HDFS (Hadoop Distributed File System), this file system being directly tied to the extraction algorithms that perform the operation right where the data is stored.

  2. Organize: associate a first level of {key, value} index with this unstructured data using NoSQL (for Not Only SQL). The advantage here, compared to a classic SQL model, is being able to handle variety (no model predefined in advance), velocity and volume. Indeed, the particularity of NoSQL is to process data on a CRUD model (Create, Read, Update, Delete) rather than ACID (Atomicity, Consistency, Isolation, Durability), with its speed advantages (no need to fit the data into a structured model) and its drawbacks (accepting, among other things, that for ingestion-capacity reasons you may end up reading "stale" data). And then also extract information through the MapReduce operation, performed directly on the unstructured-data grid (to avoid moving the data to processing nodes).

    The information thus extracted from this unstructured-data grid becomes part of the company's assets and has its full place among the structured, reliable, "high-density" information. That is why extracting information from unstructured data also requires a gateway to the company's data warehouse to enrich the repository. This gateway must be able to absorb large volumes of information in very short times.

    These first 2 steps have been industrialized both on the hardware side (storage grid/cluster) and the software side (HDFS, Hadoop MapReduce, NoSQL, Oracle Loader for Hadoop) within Oracle's Engineered System: the Oracle Big Data Appliance, while the structured-data repository can be implemented within Exadata.

  3. Distribute: the last step consists of making the information available to the business lines and letting them extract its essence: Analyze and Visualize. The challenge is to provide dynamic analysis on a large volume of data (BI cubes) with the ability to visualize it simply across several facets.

    A first level of analysis can be done directly on the unstructured data through the R language, right on the Big Data Appliance.

    The value also lies in the aggregated view within the repository enriched by the extraction, directly through Exadata for example... or via a true dynamic business dashboard that interfaces with the repository and makes it possible to analyze very large volumes directly in memory, with multi-faceted visualization mechanisms, to not only find what you are looking for but also discover what you were not looking for (back to serendipity...). This is done thanks to the (visual) identification of search axes that users had not necessarily anticipated at the outset.

    This last step is industrialized through the Exalytics solution, illustrated in the video below in the automotive world, where you will see a demonstration dynamically manipulating worldwide car-sales data over a 10-year period: about 1 billion records and 2 TB of data manipulated in memory (thanks to embedded compression technologies).
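The MapReduce operation mentioned in step 2 can be sketched in miniature in Python: a map phase emits (key, value) pairs for each shard where the data lives, and a reduce phase combines the pairs per key. This single-process sketch is only illustrative; a real Hadoop job distributes both phases across the HDFS grid.

```python
# Miniature, single-process sketch of the MapReduce pattern:
# map emits (key, value) pairs per shard, reduce sums them per key.
# (Illustrative only; Hadoop runs these phases across the cluster.)
from collections import defaultdict

def map_phase(document):
    # Emit (word, 1) for every word in one shard of unstructured text
    return [(word.lower(), 1) for word in document.split()]

def reduce_phase(pairs):
    # Sum the counts per key, as a reducer would after the shuffle
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

shards = ["Big Data value", "data driven value"]
pairs = [pair for shard in shards for pair in map_phase(shard)]
print(reduce_phase(pairs))  # {'big': 1, 'data': 2, 'value': 2, 'driven': 1}
```

The key property, as noted above, is that the map phase runs where the data is stored, so only the compact intermediate pairs, not the raw data, move across the grid.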

HSM (Hierarchical Storage Management) and Big Data

To complete the setup of the "Big Data" ecosystem within IT, one fundamental point is often overlooked: securing and archiving unstructured data. The goal is to be able to archive/back up unstructured data for possible replay, and to cope with growing volumes by storing it on an appropriate medium according to its "freshness". Indeed, a Hadoop-type grid bases its safety on data duplication, but if a piece of data is corrupted, so are its copies. Moreover, this grid exists to allow processing at a given moment (velocity); once that processing is done, the data on the grid is often replaced by more recent data (see the example "Understanding a Big Data Implementation and its Components", which covers the use case of data tied to a temporal context). In some use cases, it can be worthwhile to revisit captured data later from another analytical angle, or for verification needs, and in all cases to be able to restore after a corruption incident. This is where coupling with a Hierarchical Storage Management (HSM) solution is indispensable, for the initial capture of unstructured data and its low-cost archiving given the volumes to be processed. This is what we cover with our Storage Archive Manager (SAM) solution, which is in fact used in a French "Big Data" project to archive 1 PB of unstructured data.

To go further:


Chief Technologist @Oracle, writing on architecture and innovation encompassing IT transformation. You can follow me on Twitter: @ericbezille

