
Learn about data lakes, machine learning & more innovations

Recent Posts

Innovation

How Algorithms Can Shape Our Data Future

In 2002, I saw an amazing movie, Minority Report with Tom Cruise. It really made an impression on me—not for the science fiction nature of it, but for the possibilities, the reality of it. I work in the machine learning, data mining, and applied math area. I work with a lot of very smart people here at Oracle, and we do amazing things. And when I got out of that movie, I thought, "Hmm, that's pretty possible. I don't see that as being so farfetched." The reason that the movie rings so true is Oracle's strategy, because at Oracle, our strategy is to move the algorithm, not the data. And when we do that, things like Minority Report, the kinds of new scenarios and possibilities described in Patrick Tucker's book, The Naked Future, all become so much more possible. What happens in a world that anticipates your every move?

At Oracle, we've taken a different approach altogether. We've said that data gets bigger and bigger each year, and at some point it becomes so large that it is almost immovable. It makes no sense to move all the data to some other location just to calculate a median, or to do a t-test, or to run a decision tree, a logistic regression, a neural network, you name it. What makes much more sense is to bring the algorithms to the data, and that's what we've done.

So, imagine today, you're on your iPhone, and you wake up in the morning. I'm in the Boston area, and I get the local news and some latest update about something that's going on in Boston. I might get the local news, the local weather, maybe my stated interests, my favorite sports teams, the Boston Celtics, the New England Patriots, the Red Sox, that kind of stuff, and I might get some national news updates. Traditional stuff, right? Now, imagine a slightly revised future based on an example in The Naked Future. I wake up and my device tells me, "When you meet your old girlfriend at the coffee shop this morning, act surprised to learn that she's getting married." Huh, that's interesting. So, I meet my old girlfriend at the coffee shop and I say, "Oh, by the way, congratulations on getting married," and she sort of recoils and says, "What do you mean I'm getting married? Who told you that? How did you know?" And I go scrambling and say, "Well, I don't know, I think I saw it on your Facebook post." And she says, "I didn't post anything to anybody anywhere. How did you know?" And it becomes kind of confrontational.

If you look at the very near future and the possibilities there, you can see how this could have easily happened. You're collecting a lot of different data from different locations and different places. Maybe the girlfriend changed her address recently, maybe she's moved in with a boyfriend or moved out of the house into an apartment. Maybe they've recently adopted a dog, maybe there are a lot of Facebook pictures of the two of them together, maybe there are tweets like, "I'm so in love" and "Looking forward to spending our life together forever." And maybe there's an online purchase of a large ring from Amazon or a jewelry store. All these things are quite possible, so it is very real.

So where is this going today? Well, it's still the basics, right, and in the basics, you must have good data, and you must have a place to store your data.
This is where I think Oracle can play a role. But it's not just the data, it's the data plus the domain knowledge, and that's the most important thing. You need to know the data. It's not just your bonus amount this year, it's your bonus amount this year versus last year compared to your peers. It's the rate of change in the number of opioids you're taking compared to what you used to be taking. It's all these temporal, comparative, and derived variables that are very specific. They have nothing to do with machine learning algorithms, but they are the most important thing to get you started.

Then, of course, there are the machine learning algorithms, and Oracle fortunately has great libraries of these. We have about 30 machine learning algorithms that run in the database. We have about 30 machine learning algorithms that run in Spark and Hadoop. You've got to have the data, and you've got to have the domain knowledge that goes with the data; those go together in my mind. And then there are the algorithms. What does that generate? Models, predictions, and insights, and it starts to feel a little like that movie, although that's a bit science fiction-y. From a more practical point of view, it gives you the ability to hit your customer with the right product at the right time, anticipate things, know what a healthy outcome looks like, and really have much greater insight into, I guess, the future of your customers.

That's all important, but the most important thing is to operationalize this, because if you don't deploy and operationalize your analytical methodology, you just have a list of customers on a piece of paper. You have an interesting report, you have an interesting pie chart, but you need to deploy that, you need to operationalize that. And if you remember what I said in the beginning about how Oracle brings the algorithms to the data, that changes everything. I have recorded my talk about how algorithms fuel these changes, real-world examples, and a whole lot more. Click on the video below to view it. If you are interested in how to apply machine learning, algorithms, and big data strategies to your own business, visit Oracle Big Data.
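To make "move the algorithm, not the data" a little more concrete, here is a minimal sketch of what in-database scoring can look like from Python. It assumes the python-oracledb driver plus a hypothetical CUSTOMERS table and CHURN_MODEL in-database model; none of these come from the talk itself, and Oracle's PREDICTION SQL functions are just one way of keeping the math next to the data.

```python
# Minimal sketch: score rows where they already live instead of exporting
# them first. The connection details, the CUSTOMERS table, and the
# CHURN_MODEL in-database model are hypothetical placeholders.
import oracledb

conn = oracledb.connect(user="analyst", password="secret",
                        dsn="dbhost.example.com/orclpdb1")

# The model is applied inside the database; only the small result set
# (IDs plus scores) travels over the network.
sql = """
    SELECT customer_id,
           PREDICTION(churn_model USING *)             AS predicted_label,
           PREDICTION_PROBABILITY(churn_model USING *) AS probability
    FROM   customers
"""

with conn.cursor() as cur:
    cur.execute(sql)
    for customer_id, label, probability in cur.fetchmany(10):
        print(customer_id, label, round(probability, 3))

conn.close()
```

The point of the pattern is that the table itself never leaves the database; only the scored results move.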


Data Lakes

Design Your Data Lake for Maximum Impact

Data lakes are fast becoming valuable tools for businesses that need to organize large volumes of highly diverse data from multiple sources. However, if you are not a data scientist, a data lake may seem more like an ocean that you are bound to drown in. Making a data lake manageable for everyone requires mindful design that empowers users with the appropriate tools. A recent webcast conducted by TDWI and Oracle, entitled "How to Design a Data Lake with Business Impact in Mind," identified the best use cases for a data lake and then defined how to design one for an enterprise-level business. The presentation recommended keeping data-driven use cases at the forefront, making the data lake a central IT-managed function, blending old and new data, empowering self-service, and establishing a sponsor group to manage the company's data lake plan with enough staffing and skills to keep it relevant.

"Businesses want to make more fact-based decisions, but they also want to go deeper into the data they have with analytics," says Philip Russom, a Senior Research Director for Data Management at TDWI. "We see data lakes as a good advantage for companies that want to do this, as the data can be repurposed repeatedly for new analytics and use cases."

Data lake usage is on the rise, according to TDWI surveys. A 2017 survey revealed that nearly a quarter of the businesses questioned (23 percent) already have a data lake in production, another quarter (24 percent) expected to launch one within 12 months, and only 7 percent said they would not jump into a data lake. A significant number (21 percent) said they would establish a data lake within three years. In the same survey, respondents were asked about the business benefits of deploying a Hadoop-based data lake. Half (49 percent) rated advanced analytics, including data mining, statistics, and machine learning, as the primary use case, followed by data exploration and discovery. Serving as a big data source for analytics was the third most likely use case.

Use cases for data lakes include investigating new data coming from sensors and machines, streaming, and human language text. More complex uses include multiplatform data warehouse environments, omnichannel marketing, and digital supply chain. The best argument for deploying and using a data lake is the ability to blend old and new data together. This is especially helpful for departments like marketing, finance, and governance, which require insight from multiple sources, old and new. Russom noted that multi-module enterprise resource planning, Internet of Things (IoT), insurance claim workflow, and digital healthcare would all be areas that could benefit from data lake deployments.

When it comes to design, Russom suggests the following:

- Create a plan, prioritize use cases, and update them as the business evolves
- Choose data platform(s) that support business requirements
- Get tools that work with the platform and satisfy user requirements
- Augment your staff with consultants experienced with data lakes
- Train staff for Hadoop, analytics, lakes, and clouds
- Start with a business use case that a lake can address with clear ROI

Bruce Edwards, a Cloud Luminary and Information Management Specialist with Oracle, added that the convergence of cloud, big data, and data science has enabled the explosion of data lake deployments. Having a central vendor that not only understands large-scale data management but can also integrate existing infrastructures into core data lake components is essential.
"What data lake users need is an open, integrated, self-healing, high-performance tool," Edwards said. "These elements are all needed to allow businesses to begin their data lake journey."

To experience the entire webcast, download the presentation from our website. And if you're ready to start playing around with a data lake, we can offer you a free trial right here.
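To picture what "blending old and new data" can look like in practice, here is a minimal PySpark sketch that joins a warehouse extract with newer clickstream events landed in the lake. The storage paths and column names are hypothetical placeholders, not anything taken from the webcast.

```python
# Minimal sketch: blend an "old" warehouse extract with "new" raw events
# in a data lake. Paths and column names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("blend-old-and-new").getOrCreate()

# "Old" data: a curated extract from the data warehouse.
customers = spark.read.parquet("s3a://lake/warehouse/customers/")

# "New" data: raw clickstream events landed in the lake.
events = spark.read.json("s3a://lake/raw/clickstream/")

# Summarize the new data per customer, then join it onto the old data.
activity = (events.groupBy("customer_id")
                  .agg(F.count("*").alias("event_count"),
                       F.countDistinct("product_id").alias("products_viewed")))

blended = customers.join(activity, "customer_id", "left")
blended.write.mode("overwrite").parquet("s3a://lake/curated/customer_view/")
```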


Big Data

What Does Data Science Need to Be Successful?

There are certain advances that have revolutionized the tech world – personal computing, cell technology, and cloud computing are just some of them. Now that we have the ability to store massive amounts of data in the cloud and then use it with advanced analytics, we can finally start working towards a machine learning future.

Download your free ebook, "Demystifying Machine Learning"

It's time for data science to shine. Here are some stats:

- 61% of organizations identify machine learning as the most significant data initiative for the next year
- There is $3.5 trillion to $5.8 trillion in potential annual AI-derived business value across 19 industries

Businesses are seeing the potential too. Data science can have great impact in:

- Building and enhancing products and services
- Enabling new and more efficient operations and processes
- Creating new channels and business models

But unfortunately, for many businesses much of that is still in the future. Despite making big investments in data science teams, many are still not seeing the value they expected. Why?

Data scientists often face difficulty in working efficiently. There are lengthy waits for resources and data. There's difficulty collaborating with teammates. And there can be long delays of days or weeks to deploy work. IT admins face issues too; they often feel a lot of pain because they're responsible for supporting data science teams. Developers have difficulty getting access to usable machine learning. Business execs don't see the full ROI. And there's more. A big part of the problem is that data science often happens in silos and isn't well integrated with the rest of the enterprise.

There's a movement to bring technologies, data scientists, and the business together to make enterprise data science truly successful. But to do that, you need a full platform. Here are some questions to think about: What does this platform need? What defines success? What do business execs need to be successful?

To tackle enterprise data science successfully, companies need a data science platform that addresses all of these issues. And that's why Oracle is excited about our recent acquisition of DataScience.com. DataScience.com creates one place for your data science tools, projects, and infrastructure. It organizes work, allowing users to easily access data and computing resources. It also enables users to execute end-to-end model development workflows. Quite simply, it addresses the need to manage data science teams and projects while providing the flexibility to innovate.

What does this mean, exactly? It means you can now:

- Make data science more self-service: launch sessions instantly with self-service access to the compute, data, and packages you need to get to work quickly on any size of analysis.
- Collaborate more efficiently: organize your work via a project-based UI and work together on end-to-end modeling workflows, with all of your work backed up by Git.
- Get more work done faster: leverage the best of open source machine learning frameworks on a platform tightly integrated with high-performance Oracle Cloud Infrastructure.

Now Oracle can integrate big data and data science tools all in one place, with a single self-service interface that makes enterprise data science possible—there are more possibilities than ever now. Companies are scrambling to make machine learning solutions work so they can realize their full potential—and with DataScience.com, we're many steps closer to that machine learning future we all keep hearing about.
If you have any other questions or you'd like to see our machine learning software, feel free to contact us. You can also refer back to some of the articles we've created on machine learning best practices and the challenges involved. Or, download your free ebook, "Demystifying Machine Learning."


Data Lab

7 Data Lab Best Practices for Data Science

Perhaps you're looking for a better way to perform data experimentation and facilitate data discovery. Or to start using machine learning to uncover more innovation opportunities through data. The answer to that, of course, is having a data lab. Data labs make it much easier to do data science and experiment with new data. Complex analytics like machine learning can put a strain on the service levels of production systems. But having a data lab ensures your data scientists can experiment and run analytics as they need to without straining those systems or facing complaints from other teams. For many, setting up and implementing a data lab is a new project. In fact, you might even be setting up the first data lab ever at your company.

Download your free TDWI report, "Seven Best Practices for Machine Learning on a Data Lake"

So how can you ensure that your data lab has the best chance of success? In this article, we lay out seven data lab best practices. Keep in mind, these best practices are designed to get you thinking beyond the nitty-gritty details of architecture and implementation, and more along the lines of widespread support and adoption.

Data Lab Best Practice #1: Deliver a Quick Win

Better one quick win in two months than three wins after four months. Your data lab is likely a high-visibility, expensive project. People want proof that it's working, and they want it now. So don't be tempted to just play in a computer science sandbox. Instead, keep a business goal in mind that aligns with that of a key business stakeholder. You'll want to show the value of your data lab from both IT and business perspectives to gain as much support as you can. For IT, demonstrate that you're minimizing the strain your lab places on production systems. For the business, demonstrate easy ways the company can start saving money or maximizing revenue, now. If you don't have any ideas, meet with the business leader of the unit and brainstorm. Here are some ideas as a jumping-off point: How do I design a service to maximize ad revenue? What is the best combination of data that gives me the segment most likely to accept mobile offers? What data do I need for this? How do I combine it? Where do I get it? Concentrate on the quick wins, but keep the future improvements and more complicated projects in mind.

Data Lab Best Practice #2: Consider Starting with Existing Data

Remember there's value in your existing data, even if you've been collecting or cleaning new data at the same time. If you already have clean, labeled data available, consider creating a use case around that so you can get started faster. Sometimes this might mean re-scoping your project. Let's say that, ideally, your business unit would like a 360-degree view of the customer for more effective customer promotions. That's a complicated project that requires a great deal of data. Britain's National Health Service, by contrast, used existing data to help speed its quick wins: it examined payments, other transactions, and customer complaints to find examples of fraud to investigate. Stopping fraud or recovering fraudulent claims is often a good quick win. Once you have a few of those quick wins under your belt, you can start tackling more complicated projects that require more resources or more kinds of data. But especially in the early stages, it's important to remember that most businesses won't care about how complex or innovative your machine learning algorithm is. They want results. And the faster they can get those results, the better.
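To make that fraud quick win a little more concrete, here is a minimal sketch that flags unusual claims in data you already have. The file and column names are hypothetical, and the isolation forest is simply one reasonable technique for producing a review shortlist, not something prescribed above.

```python
# Minimal sketch of a quick win on existing data: flag unusual claims for
# a human investigator. The file and column names are hypothetical.
import pandas as pd
from sklearn.ensemble import IsolationForest

claims = pd.read_csv("claims_2017.csv")  # existing, already-cleaned data
features = claims[["claim_amount", "num_line_items", "days_to_submit"]]

# An isolation forest scores how unusual each record is; here we assume
# roughly 1% of claims are worth a second look.
model = IsolationForest(contamination=0.01, random_state=42)
claims["suspicious"] = model.fit_predict(features) == -1

# The deliverable is an actionable shortlist, not the algorithm itself.
shortlist = claims.loc[claims["suspicious"],
                       ["claim_id", "claim_amount", "days_to_submit"]]
shortlist.to_csv("claims_to_review.csv", index=False)
print(f"{len(shortlist)} claims flagged for review")
```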
Data Lab Best Practice #3: Try to Have Many (But Not Too Many) Projects in the Pipeline

We've said you should have a few quick wins. We've also said you should start with existing data. And now we're also saying that you should try to have many projects in the pipeline (but not too many; stay balanced). Remember that not every data exploration project is going to have viable results that will mean change at the company. And even if an idea does demonstrate change, it might not show enough cost savings, or be easy enough to implement, for that idea to gain traction. It's not going to be enough for you to only demonstrate the many ways your data lab has identified value. The executive team will want to know how many of those ideas were implemented. That's why you should have many data projects in the pipeline: to decrease your chances of failure. But try to have some focus too. At the opposite end of the spectrum is chasing after too many ideas and ending up with nothing because you didn't focus your resources.

Data Lab Best Practice #4: Keep Your Executive Support Engaged

We assume you had some sort of executive support to make it this far. But you will need to keep them engaged. That's related to the previous point: deliver a few quick wins. But don't stop there. See which other executives you can get on board. Can you deliver quick wins in another area too? You don't want to stretch yourself and your resources too thinly. But at the same time, the ideal vision is a company full of executives clamoring for more machine learning projects, with plenty of support for your data lab because it's seen as such a valued part of the company. To do this, you can deliver sessions on what machine learning can do. Pitch ideas for how you can help other parts of the business expand. Yes, this does entail extra work, but if you're determined to make your data lab a cornerstone of the business, it's well worth it.

Data Lab Best Practice #5: Operationalize Your Data

You might be tempted to think that your job ends with finding insights. But that's not the case. You need to push your executives and other business leaders to put your findings into place. Take a look at the business units or business leaders that you're doing work for. Do they praise your findings but never implement them? If so, it's time to have a serious conversation, or it's time to find a new team to collaborate with. Think about the actionable reports you can create, or the changes you can make to existing apps and processes. Your findings could drive the creation of a brand-new service, app, or product. Remember, at the end of the day, it's not about how many insights get uncovered. What your business cares about is how much money is being saved and how much revenue is being created. It's best if you can point to actual revenue being generated by your skilled team.

Data Lab Best Practice #6: Be Sure You Have a Platform That Scales

Keep in mind, the cloud is the perfect place for an initiative like the data lab. You can provision your lab there, store massive amounts of data, and spin up and spin down flexible analytic workloads as needed. The best part of all? You'll pay for only what you use, which minimizes your cost and risk. In addition to having a platform that scales, you'll also need the resources and talent to execute.
If you don't, you could potentially have a backlog of big data projects from day one. That brings us to Best Practice #7.

Data Lab Best Practice #7: Support Your Data Scientists

A good data scientist is worth his or her weight in gold. Make sure you support your data scientists and set them up for success. Assemble them in talented, diverse teams. Provide them with the right tools. And make sure that your management tolerates risk. It might take time for your data scientists to find the deep wins that everyone is looking for. So set expectations accordingly, while also ensuring that you can find quick, easy wins to keep everyone happy.

Conclusion

There you have it: our seven best practices for implementing a successful data lab. Data science may not be easy, but having a data lab makes it easier—and we hope this article will help you gain success more easily. If you'd like to ask us any further questions, feel free to contact us. Or if you're ready to experiment with working with your data in the cloud, we offer a free guided trial to build and implement a successful data lake.


Big Data

What's the Difference Between AI, Machine Learning, and Deep Learning?

AI, machine learning, and deep learning - these terms overlap and are easily confused, so let's start with some short definitions. AI means getting a computer to mimic human behavior in some way. Machine learning is a subset of AI, and it consists of the techniques that enable computers to figure things out from the data and deliver AI applications. Deep learning, meanwhile, is a subset of machine learning that enables computers to solve more complex problems.

Download your free ebook, "Demystifying Machine Learning."

Those descriptions are correct, but they are a little concise. So I want to explore each of these areas and provide a little more background.

What Is AI?

Artificial intelligence as an academic discipline was founded in 1956. The goal then, as now, was to get computers to perform tasks regarded as uniquely human: things that required intelligence. Initially, researchers worked on problems like playing checkers and solving logic problems. If you looked at the output of one of those checkers-playing programs, you could see some form of "artificial intelligence" behind those moves, particularly when the computer beat you. Early successes caused the first researchers to exhibit almost boundless enthusiasm for the possibilities of AI, matched only by the extent to which they misjudged just how hard some problems were. Artificial intelligence, then, refers to the output of a computer. The computer is doing something intelligent, so it's exhibiting intelligence that is artificial. The term AI doesn't say anything about how those problems are solved. There are many different techniques, including rule-based or expert systems. And one category of techniques started becoming more widely used in the 1980s: machine learning.

What Is Machine Learning?

The reason that those early researchers found some problems to be much harder is that those problems simply weren't amenable to the early techniques used for AI. Hard-coded algorithms or fixed, rule-based systems just didn't work very well for things like image recognition or extracting meaning from text. The solution turned out to be not just mimicking human behavior (AI) but mimicking how humans learn. Think about how you learned to read. You didn't sit down and learn spelling and grammar before picking up your first book. You read simple books, graduating to more complex ones over time. You actually learned the rules (and exceptions) of spelling and grammar from your reading. Put another way, you processed a lot of data and learned from it. That's exactly the idea with machine learning. Feed an algorithm (as opposed to your brain) a lot of data and let it figure things out. Feed an algorithm a lot of data on financial transactions, tell it which ones are fraudulent, and let it work out what indicates fraud so it can predict fraud in the future. Or feed it information about your customer base and let it figure out how best to segment them. Find out more about machine learning techniques here. As these algorithms developed, they could tackle many problems. But some things that humans found easy (like speech or handwriting recognition) were still hard for machines. However, if machine learning is about mimicking how humans learn, why not go all the way and try to mimic the human brain? That's the idea behind neural networks. The idea of using artificial neurons (neurons, connected by synapses, are the major elements in your brain) had been around for a while. And neural networks simulated in software started being used for certain problems.
They showed a lot of promise and could solve some complex problems that other algorithms couldn't tackle. But machine learning still got stuck on many things that elementary school children tackle with ease: How many dogs are in this picture, or are they really wolves? Walk over there and bring me the ripe banana. What made this character in the book cry so much? It turned out that the problem was not with the concept of machine learning, or even with the idea of mimicking the human brain. It was just that simple neural networks with hundreds or even thousands of neurons, connected in a relatively simple manner, couldn't duplicate what the human brain can do. It shouldn't be a surprise if you think about it; human brains have around 86 billion neurons and very complex interconnectivity.

What Is Deep Learning?

Put simply, deep learning is all about using neural networks with more neurons, layers, and interconnectivity. We're still a long way off from mimicking the human brain in all its complexity, but we're moving in that direction. And when you read about advances in computing, from autonomous cars to Go-playing supercomputers to speech recognition, that's deep learning under the covers. You experience some form of artificial intelligence; behind the scenes, that AI is powered by some form of deep learning. Let's look at a couple of problems to see how deep learning is different from simpler neural networks or other forms of machine learning.

How Deep Learning Works

If I give you images of horses, you recognize them as horses, even if you've never seen that image before. And it doesn't matter if the horse is lying on a sofa, or dressed up for Halloween as a hippo. You can recognize a horse because you know about the various elements that define a horse: the shape of its muzzle, the number and placement of its legs, and so on. Deep learning can do this. And it's important for many things, including autonomous vehicles. Before a car can determine its next action, it needs to know what's around it. It must be able to recognize people, bikes, other vehicles, road signs, and more, and do so in challenging visual circumstances. Standard machine learning techniques can't do that. Now take natural language processing, which is used today in chatbots and smartphone voice assistants, to name two examples. Consider this sentence and work out what the last part should be: I was born in Italy and, although I lived in Portugal and Brazil most of my life, I still speak fluent ________. Hopefully you can see that the most likely answer is Italian (though you would also get points for French, Greek, German, Sardinian, Albanian, Occitan, Croatian, Slovene, Ladin, Latin, Friulian, Catalan, Sicilian, Romani, and Franco-Provencal, and probably several more). But think about what it takes to draw that conclusion. First you need to know that the missing word is a language. You can do that if you understand "I speak fluent…". To get Italian, you have to go back through that sentence and ignore the red herrings about Portugal and Brazil. "I was born in Italy" implies learning Italian as I grew up (with 93% probability according to Wikipedia), assuming that you understand the implications of "born," which go far beyond the day you were delivered. The combination of "although" and "still" makes it clear that I am not talking about Portuguese and brings you back to Italy. So Italian is the likely answer. Imagine what's happening in the neural network in your brain. Facts like "born in Italy" and "although…still" are inputs to other parts of your brain as you work things out. And this concept is carried over to deep neural networks via complex feedback loops.
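To ground those definitions, here is a minimal scikit-learn sketch that trains a classic algorithm and a small neural network on the same labeled data. The synthetic dataset stands in for the labeled transactions mentioned earlier, and a toy two-layer network like this is only a hint of what real deep learning systems do with far more layers, data, and compute.

```python
# Minimal sketch: the same "learn from labeled data" idea, first with a
# classic algorithm, then with a small neural network. The synthetic data
# stands in for labeled transactions (fraud vs. not fraud).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.97, 0.03], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Machine learning: a linear model learns which features indicate "fraud".
linear = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# A small neural network: more neurons and layers, same learning idea.
# Deep learning pushes this much further, with many more layers and data.
net = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500,
                    random_state=0).fit(X_train, y_train)

print("linear model accuracy:  ", linear.score(X_test, y_test))
print("neural network accuracy:", net.score(X_test, y_test))
```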
Conclusion

So hopefully those definitions at the beginning of the article make more sense now. AI refers to devices exhibiting human-like intelligence in some way. There are many techniques for AI, but one subset of that bigger list is machine learning – let the algorithms learn from the data. Finally, deep learning is a subset of machine learning, using many-layered neural networks to solve the hardest (for computers) problems. If you're ready to get started with machine learning, try Oracle Cloud for free and build your own data lake to test out some of these techniques. Or if you're still learning about machine learning, download our free ebook, "Demystifying Machine Learning."


Analytics

The Business Benefits of Data Exchange

"It's difficult to imagine the power that you're going to have when so many different sorts of data are available," predicted Tim Berners-Lee, inventor of the World Wide Web, in 2007. Ten years on, businesses have never had as much data–and power–as they do now. What's more, data exchange (sharing and compiling data) between industries is on the rise, meaning this trend is set to continue. The sharing of this intelligence represents an opportunity for some companies to better understand their audiences and improve customer experience, and for others to unlock new revenue streams.

Download your free ebook, "Driving Growth and Innovation Through Big Data."

A Real Data Exchange Use Case

Operators like Telefonica are using big data to understand television audiences and their usage patterns, allowing the operator to build personalized recommendations for them based on context, time of day, or device. Telefonica's deep understanding of its customers is also valuable to content providers and media producers who want to tailor content to their audiences. This means that Telefonica is able to take anonymized television intelligence and share it with advertising agencies and media producers, which helps them better understand the market and the impact of their content on the audience. By taking a proactive approach to data monetization, Telefonica has been able to capture 30% of Spain's lucrative digital media and advertising market, compared to the 2% telecoms operators contribute, on average, to the advertising value chain[1]. For more information on use cases, read our free guide, "Big Data Use Cases."

Monetizing Location Insights from Data Exchange

Retailers are also partnering with communications operators, using anonymized and aggregated location insights to improve store location planning and layout as well as to assess staffing requirements. Europe's third-largest mobile operator, Turkcell, is using data analytics to deliver location-based services and promotions via SMS, ensuring they're received when and where they're most relevant. Data exchange is also enabling advertisers to optimize the location and content of billboards, based on the demographics present in each area. Looking further afield, insurance companies also see the benefits of this approach. For example, by analyzing a range of data sets, such as GPS data from customers' cars, Generali can track telematics and accident data to identify driving behaviors and patterns that may have contributed to an accident, improving customer profiling and helping actuaries with fraud detection. Yet despite data monetization representing a market opportunity worth 10% of total worldwide revenue for telecom service providers, only 0.2% of the industry's revenue today is generated this way[2]. Businesses are currently sitting on a goldmine of untapped data.

More Benefits of Data Exchange

The exchange of data between industries has a larger potential too: to better serve the way we live. Cities are beginning to analyze geo-data from telecom networks to inform the planning of new developments, transport links, parking sites, and traffic flow. As smart meters become more prevalent, utility companies will be using in-car telematics data to better manage the demand electric vehicles will place on the grid. Emergency services are also looking at telematics data so they can dispatch teams more quickly in the event of an accident.
Watch the model race demo below to see how emergency response systems can be linked to a vehicle so that if a crash does occur, the emergency systems know how serious the crash is and how many people are involved. This ensures those teams arrive with all the information available on the incident. Success in an ever-growing, data-led market will depend on the willingness of businesses to explore new ways of using data. As the evidence grows, it's clear that data exchange is a valuable tool for some businesses to grow revenues through new streams and for others to glean new consumer insights. But, as author Richard Bach once said, "Any powerful idea is absolutely fascinating and absolutely useless until we choose to use it." From data scientists and analysts, who work closely with company data each day, to business leaders exploring new ways to improve the way they work, Oracle has a set of rich integrated solutions for everybody in your organization.

Read our free ebook, "Driving Growth and Innovation Through Big Data," to learn more about companies that uncover new benefits across their business. This post is by Amine Mouadden (Director, Big Data Communications & Media, Oracle EMEA). Follow him at linkedin.com/in/mouadden

[1][2] External Data Monetization: CSPs Should Cautiously Invest in New Service Offerings to Increase Revenue, Analysys Mason, June 2016


Big Data

What Is A Data Lab?

The data lab is a more recent term in the big data and data science world. But it's an important one, because it can be a fast route to uncovering value in new big data as well as business data in an existing data warehouse. Let's take a look at why you should consider a data lab, and some of the key requirements for success. There are lots of ways that big data differs from (or maybe "expands upon" is better) the data that sits in a data warehouse and is used to run your business. But perhaps the key difference is that you just don't know what questions it is capable of answering. Think about monthly sales figures. You can query those to find out who sold what, who bought what, and so on. Put more simply, you know what questions that data is capable of answering, and you can ask those questions using typical visualization or reporting tools. But with new data sources, things are different. You have location data, web log files, data from sensors, weather data, traffic flow, and more. There's value hidden in all that data, but you're not sure what it is. That's where the data lab comes in.

What Is A Data Lab?

The data lab is a separate environment built to allow your analysts and data scientists to figure out the value hidden in your data. The data lab helps you find the right questions to ask and, of course, put those answers to work for your business.

Try building a fully functioning data lake - free

But why a separate environment for the data lab? It's all about resources. Consider the following scenario. It's late at night on the last day of the quarter. In one part of the building, finance is busy closing the books, initiating the scripts and applications that will generate the reports for executives the next morning. It's a critical time. Elsewhere in the building, somebody has lost track of the date as they've been working on a particularly vexing problem for days. But perhaps the end is in sight, because a particularly resource-intensive machine learning algorithm has been showing some promise and it's time to try it out on the whole data set. If there's one thing you need in a production environment, it's predictability. You want workloads to run and finish on time. But when you're experimenting and trying to figure things out, predictability is not on your list. In the example above, somebody could unintentionally do significant damage to the business with their experimentation. That's just one reason why you need to move experimentation away from your production environment.

Who Is Involved with A Data Lab?

I'll identify four key roles that you need to consider.

Data Scientist

The data scientist has the key role, with responsibility for creating and training models that can be used to make predictions or identify data for further investigation.

Data Engineer

The data engineer needs to bring in, transform, and format data so that it's usable for machine learning. They also have a role in ensuring the accuracy and relevancy of the data (basically, can you rely upon it?) and may also have responsibility for ensuring regulatory or other forms of compliance.

Business Analyst

If the data scientist understands the data and the algorithms, very often it's the business analyst who understands the business and its customers. The analyst helps guide which problems get tackled, as well as interprets actual results and their importance to the business. A successful data lab project will have these three roles, and others, working together as a team.
Demand for those functions will vary over time and with different projects. And very often you'll find people who can combine two, or very occasionally three, different functions. For example, surveys have shown that many data scientists spend as much as 80% of their time doing data engineering work (look up the term "data janitor," which is sometimes used pejoratively), readying data for use. I left off one role, that of the developer: somebody who is perhaps more involved with putting the results of the lab to work than with the core work of the lab itself.

Monetizing the Data Lab

The lab is not the end result. Rather, it's a way to generate new insights that can be put to productive use. It's important to figure out upfront how you're going to turn insight into value. And if you're starting a data lab project for the first time, you want that value to be visible quickly to maintain or gain organizational support for the work. In broad terms, here are three ways to go about monetizing your data lab.

Build Actionable Reports

Sometimes what you find is best communicated by some kind of report, anything from a simple email to a full written document. For example, a simple write-up with the details would be enough for the investigation team to look into what looks like a fraudulent billing practice.

Modify Existing Applications and Processes

Perhaps what you find enables you to modify something you are already doing. If, for example, you found evidence of more widespread fraud, this could be addressed by modifying an existing application process, or flagging suspicious applications for further review. Or imagine that you found a pattern that pointed towards a higher likelihood of a sale, which would enable you to modify the recommendation process in a web application to point customers to things that were more likely to appeal to them.

Create New Custom Applications

Finally, you might spot something in the data that is quite new. Perhaps you can now predict high-value customers before their spending ramps up. Maybe you'd like to create a new service or application that will engage them in a different way.

The Cloud Is the Best Place for a Data Lab

You can build a data lab anywhere, but the cloud best enables you to meet some of the unique issues you uncover in a data lab environment.

Experiments Don't Always Produce Results

By their very nature, you can't guarantee results from experiments before you have done them. This risk can cause businesses to be reluctant to incur the time and cost of establishing an on-premises lab. Building a lab in the cloud is quicker, enabling that first win in less time. And the perceived risk is lower because of the smaller upfront investment.

Workloads Can Vary Significantly Over Time

The nature of experiments is that you can't predict what's going to happen next. You have some ideas, but ultimately what you do tomorrow is at least partly driven by what you find today. This translates into computing demands that vary over time, something easily managed in the cloud, but hard to do on premises.

Different Kinds of Workloads Can Perform Optimally in Different Environments

Workloads don't just vary by time. Some may be computationally intensive, others might require massive storage, while many machine learning algorithms can benefit from GPUs. It can be hard to plan for all of this on premises, but in the cloud you can spin things up as you need them.
Experimental Work Often Reaches an End Point

While some lab environments can keep running indefinitely, many experiments come to an end. If you're truly finished with your lab, it's easy to decommission. If you're done with the computationally intensive work but want to keep the data, then cloud-based object storage can keep your data in a low-cost warm archive until you need it again.

Build A Data Lake In The Cloud Today

If you're thinking about a data lab in the cloud, a good first step would be to build a data lake to experiment with. Try our free guided trial in the cloud and get started with your data lab.


Big Data

The Key to Innovative Companies? A Love of Data

By John Abel

If the business world seems to move much faster these days, that's because it does. For a long time, change happened slowly. Take economic cycles—in the 1900s it took about 19 years to go from the bottom of a recession to the top of a peak. Life moved at a steady, controllable pace. Businesses knew who their competitors were, and had plenty of time to adapt to the newcomers. It was all so simple.

What the Cloud Meant for Data

But then new technological innovations came along, swiftly followed by the cloud. And with it came gargantuan amounts of data to collect and crunch, at a fraction of the cost it used to be. No longer did innovation mean swallowing up budgets and time. Barriers to entering new markets came crashing down, accelerating ambitions and innovation even more. Now, adopting emerging technologies is a necessity, not a choice. Wait for something to hit the mainstream, and your business won't be looking at the menu; it'll be on it, and about to be devoured.

Join Oracle at London Tech Week 2018 to learn about trends and technology

If that sounds too dramatic, then just ask the people who once worked for Kodak, Blockbuster, or Woolworths ... iconic brands that didn't innovate. MySpace and Friends Reunited, anyone? Such is the speed of change that even the disruptors get disrupted. We used to be stunned when Amazon, eBay, and Groupon took just a decade to get to $10 billion in annual revenues. Now there are about 30 companies that have gone from zero to $1 billion in four months. They all have one thing in common: they are data crazy.

The Fastest-Growing Companies Innovate With Data

These companies have realized that the way to take on their competitors is to spot patterns and anomalies in data faster and more regularly than the next guy. One of the fastest-growing and most innovative companies right now (though in the time it takes you to read this article, it might all have changed!) is Jet2.com—and all they do is use AI and machine learning to predict the price changes of flights, and beat the market. When I started my career—when I'd buy my clothes at Woolworths and my camera film from Kodak—doing what Jet2.com does would have meant some serious spend. But it cost them barely anything to get started, just some basic data tools and algorithm know-how. And now they're taking market share at supersonic speed. Let's look at China—even as recently as 2011, its established big banks dominated the consumer payment space. Now, they don't. And which payment upstart is causing them to worry? WeChat—a messaging app whose capabilities have grown way beyond its original purpose. Out-mastering your enemy is hard enough when you know who they are. But when it could be anyone...

How to Make Use of Your Data

Opportunity or threat, defend or attack; either way, action has to be the answer. But hold on ... I'm a food retailer, I'm a car manufacturer, my business is making medicines, or building houses, or giving financial advice. What do I know about all this data stuff, and how do I innovate with it? You don't need to. But you do need a partner who does. A partner who has the technology, the expertise, and the time you don't. Who has the keys to unlock that Pandora's box of secrets lurking in your data. Who has the appetite and know-how to innovate at breakneck speed, and take you with them. Who? How about the people behind the world's most popular database. Learn more about Oracle's presence at London Technology Week.


Innovation

7 Machine Learning Best Practices

Netflix's famous algorithm challenge awarded a million dollars to the best algorithm for predicting user ratings for films. But did you know that the winning algorithm was never implemented into a functional model? Netflix reported that the results of the algorithm just didn't seem to justify the engineering effort needed to bring it to a production environment. That's one of the big problems with machine learning. At your company, you can create the most elegant machine learning model anyone has ever seen. It just won't matter if you never deploy and operationalize it. That's no easy feat, which is why we're presenting you with seven machine learning best practices.

Download your free ebook, "Demystifying Machine Learning"

At the most recent Data and Analytics Summit, we caught up with Charlie Berger, Senior Director of Product Management for Data Mining and Advanced Analytics, to find out more. This article is based on what he had to say.

Putting your model into practice might take longer than you think. A TDWI report found that 28% of respondents took three to five months to put their model into operational use. And almost 15% needed longer than nine months. So what can you do to start deploying your machine learning faster? We've laid out our tips here:

1. Don't Forget to Actually Get Started

In the following points, we're going to give you a list of different ways to ensure your machine learning models are used in the best way. But we're starting out with the most important point of all. The truth is that at this point in machine learning, many people never get started at all. This happens for many reasons. The technology is complicated, the buy-in perhaps isn't there, or people are just trying too hard to get everything e-x-a-c-t-l-y right. So here's Charlie's recommendation: Get started, even if you know that you'll have to rebuild the model once a month. The learning you gain from this will be invaluable.

2. Start with a Business Problem Statement and Establish the Right Success Metrics

Starting with a business problem is a common machine learning best practice. But it's common precisely because it's so essential, and yet many people de-prioritize it. Think about this quote: "If I had an hour to solve a problem, I'd spend 55 minutes thinking about the problem and 5 minutes thinking about solutions." Now be sure that you're applying it to your machine learning scenarios. Below, we have a list of poorly defined problem statements and examples of ways to define them in a more specific way. Think about what your definition of profitability is. For example, we recently talked to a nationwide chain of fast-casual restaurants that wanted to look at increasing their soft drink sales. In that case, we had to consider carefully the implications of defining the basket. Is the transaction a single meal, or six meals for a family? This matters because it affects how you will display the results. You'll have to think about how to approach the problem and ultimately operationalize it. Beyond establishing success metrics, you need to establish the right ones. Metrics will help you establish progress, but does improving the metric actually improve the end user experience? For example, your traditional accuracy measures might encompass precision and squared error. But if you're trying to create a model that optimizes prices for airlines, those measures don't matter if your cost per purchase and overall purchases aren't improving.
3. Don't Move Your Data – Move the Algorithms

The Achilles heel in predictive modeling is that it's a two-step process. First you build the model, generally on sample data that can run in numbers ranging from the hundreds to the millions. Then, once the predictive model is built, data scientists have to apply it. However, much of that data resides in a database somewhere. Let's say you want data on all of the people in the US. There are more than 300 million people in the US—where does that data reside? Probably in a database somewhere. Where does your predictive model reside? What usually happens is that people will take all of their data out of the database so they can run their equations with their model. Then they'll have to import the results back into the database to make those predictions. And that process takes hours and hours and days and days, thus reducing the efficacy of the models you've built. However, running your equations inside the database has significant advantages. Running the equations through the kernel of the database takes a few seconds, versus the hours it would take to export your data, and the database can do all of your math and build the model inside the database too. This means one world for the data scientist and the database administrator. By keeping your data within your database, Hadoop, or object storage, you can build and score models where the data lives and use R packages with data-parallel invocations. This allows you to eliminate data duplication and separate analytical servers (by not moving data) and lets you prepare data, build models, and score them in hours rather than days.

4. Assemble the Right Data

As James Taylor and Neil Raden wrote in Smart Enough Systems, cataloging everything you have and deciding what data is important is the wrong way to go about things. The right way is to work backward from the solution, define the problem explicitly, and map out the data needed to populate the investigation and models. And then, it's time for some collaboration with other teams. Here's where you can potentially start to get bogged down. So we will refer to point number 1, which says, "Don't forget to actually get started." At the same time, assembling the right data is very important to your success. To figure out the right data to use to populate your investigation and models, you will want to talk to people in three major areas: the business domain, information technology, and data analysis.

Business domain—the people who know the business:
- Marketing and sales
- Customer service
- Operations

Information technology—the people who have access to the data:
- Database administrators

Data analysts—the people who know the data:
- Statisticians
- Data miners
- Data scientists

You need their active participation. Without it, you'll get comments like:
- These leads are no good
- That data is old
- This model isn't accurate enough
- Why didn't you use this data?

You've heard it all before.

5. Create New Derived Variables

You may think, I have all this data already at my fingertips. What more do I need? But creating new derived variables can help you gain much more insightful information. For example, you might be trying to predict the number of newspapers and magazines sold the next day. Here's the information you already have:
- Brick-and-mortar store or kiosk
- Sells lottery tickets?
- Amount of the current lottery prize

Sure, you can make a guess based off that information. But if you're able to first compare the amount of the current lottery prize against typical prize amounts, and then compare that derived variable against the variables you already have, you'll have a much more accurate answer.
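As a concrete version of that derived-variable idea, here is a minimal pandas sketch that turns the raw prize amount into comparative features. The column names and the typical prize figure are hypothetical placeholders, not anything from Charlie's talk.

```python
# Minimal sketch of point 5: derive a comparative variable instead of
# relying on the raw value. Column names and the "typical" prize amount
# are hypothetical placeholders.
import pandas as pd

sales = pd.DataFrame({
    "outlet_type":   ["kiosk", "store", "kiosk", "store"],
    "sells_lottery": [True, True, False, True],
    "current_prize": [310_000_000, 310_000_000, 0, 40_000_000],
})

typical_prize = 45_000_000  # gives the raw number a point of comparison

# Derived variables: how unusual is today's prize, and is a frenzy likely?
sales["prize_vs_typical"] = sales["current_prize"] / typical_prize
sales["jackpot_frenzy"] = sales["sells_lottery"] & (sales["prize_vs_typical"] > 3)

print(sales)
```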
6. Consider the Issues and Test Before Launch

Ideally, you should be able to A/B test two or more models when you start out. Not only will you see which model performs better, but you'll also be able to feel more confident that you're doing it right. Going further than thorough testing, you should also have a plan in place for when things go wrong, for example when your metrics start dropping. Several things go into this. You'll need an alert of some sort to ensure that the drop can be looked into ASAP. And when a VP comes into your office wanting to know what happened, you're going to have to explain it to someone who likely doesn't have an engineering background. Then, of course, there are the issues you need to plan for before launch. Complying with regulations is one of them. For example, let's say you're applying for an auto loan and are denied credit. Under the new regulations of GDPR, you have the right to know why. Of course, one of the problems with machine learning is that it can seem like a black box, and even the engineers and data scientists can't always say why certain decisions have been made. However, certain companies will help you by ensuring your algorithms provide prediction details.

7. Deploy and Automate Enterprise-Wide

Once you deploy, it's best to go beyond the data analyst or data scientist. What we mean by that is: always, always think about how you can distribute predictions and actionable insights throughout the enterprise. It's where the data is and when it's available that makes it valuable, not the fact that it exists. You don't want to be the one sitting in the ivory tower, occasionally sprinkling insights. You want to be everywhere, with everyone asking for more insights—in short, you want to make sure you're indispensable and extremely valuable. Given that we all only have so much time, it's easiest if you can automate this. Create dashboards. Incorporate these insights into enterprise applications. See if you can become a part of customer touch points, like an ATM recognizing that a customer regularly withdraws $100 every Friday night and likes $500 after every payday.

Conclusion

Here are the core ingredients of good machine learning. You need good data, or you're nowhere. You need to put it somewhere, like a database or object storage. You need deep knowledge of the data and what to do with it, whether that's creating new derived variables or choosing the right algorithms to make use of them. Then you need to actually put the models to work, get great insights, and spread them across the organization. The hardest part of this is launching your machine learning project. We hope that by creating this article, we've helped you out with the steps to success. If you have any other questions or you'd like to see our machine learning software, feel free to contact us. You can also refer back to some of the articles we've created on machine learning best practices and the challenges involved. Or, download your free ebook, "Demystifying Machine Learning."


Analytics

Big Data Preparation: The Key to Unlocking Value from Your Data

Making a success of big data analytics is a bit like constructing a skyscraper. Foundations need to be laid and the land prepared for construction, or else the building will rest on shaky ground.

Download your free book, "Driving Growth & Innovation with Big Data"

The success of any analytics project depends on the quality and relevance of the data it’s built upon. The issue today is that companies collect an exponentially growing volume and variety of information in many different formats and are struggling to convert it all into usable insight. In short, they're having trouble preparing their big data and unlocking the value.

Difficulties with Big Data Preparation

Before analysis, for instance, a business may need to aggregate data from diverse sources, remove or complete empty data fields, de-duplicate data, or transform data into a consistent format. These tasks have traditionally relied on the expertise of the IT department – even as ownership of analytics projects has shifted towards line-of-business leaders. But as volumes of data grow, preparing data in these ways becomes more laborious. With this mounting demand, IT teams can take weeks to fulfill requests.

Businesses have recognized this and are investing in data preparation technologies. Two thirds say they have implemented a data preparation or wrangling solution to manage a growing volume of data, and 56% have done so to help them work with multiple data sources, according to research from Forrester. Today’s data preparation tools aren’t restricted to those with IT expertise, and they allow companies to spread their analytics processes to individual lines of business. Not only does this remove the data bottleneck, it also puts analyses in the hands of subject matter experts with a keen eye for the most valuable insights.

How Companies Use Big Data for Business Benefits

As organizations are overwhelmed by the flood of data, it’s also important to unify data from the various sources and ensure it is accessible and consistent across the business. For example, CaixaBank is storing vast pools of data on one consolidated platform – commonly referred to as a data lake – so each of its business units can access, analyze, and digest relevant data as needed. From there, businesses can start experimenting with the data to explore new ideas. For instance, Telefonica worked with a single view of its data to test a new algorithm designed to create personalized, TV-content-optimized pricing models for customers. After successful testing, Telefonica made the algorithm live and has since seen higher TV viewing rates and improved customer satisfaction, while also reducing customer churn by 20%.

In addition to unlocking the commercial value of data, there is a strong regulatory driver for companies to gain more control and oversight of their data. When the EU’s GDPR comes into effect this month, companies will face harsh penalties if they are not transparent about the way they collect, use, and share customer information.

Conclusion

To reach skyscraper heights and build the businesses of tomorrow, data preparation must rise up the corporate agenda and be a priority for all companies looking to unlock the value of their ever-increasing volumes of data. From data scientists and analysts, who work closely with company data each day, to business leaders exploring new ways to improve the way they work, Oracle has a set of rich integrated solutions for everybody in your organization.
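For readers who want to see what the preparation steps described above look like in practice, here is a minimal, hypothetical pandas sketch—de-duplicating records, completing empty fields, and standardizing formats on made-up customer data.

```python
import pandas as pd

# Hypothetical customer records pulled from two source systems.
raw = pd.DataFrame({
    "customer_id": [101, 101, 102, 103],
    "country":     ["UK", "UK", None, "United Kingdom"],
    "signup_date": ["2018-01-03", "2018-01-03", "2018-02-03", "2018-02-10"],
})

prepared = (
    raw.drop_duplicates()                           # de-duplicate identical records
       .assign(country=lambda df: df["country"]
               .fillna("unknown")                   # complete empty fields
               .str.lower()
               .replace({"united kingdom": "uk"}))  # transform to a consistent format
)
prepared["signup_date"] = pd.to_datetime(prepared["signup_date"])  # one date type
print(prepared)
```

Real pipelines run the same kinds of steps at far larger scale, which is exactly why dedicated preparation tools have emerged.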
Read our ebook, "Driving Growth & Innovation With Big Data" to understand how Oracle’s Cloud Platform for Big Data helps companies uncover new benefits across their business.


Analytics

5 Innovative Ways to Use Graph Analytics

According to Ernst & Young, $8.2 billion a year is lost by the marketing, advertising, and media industries to fraudulent impressions, infringed content, and malvertising. The combination of fake news, trolls, bots, and money laundering is skewing the value of information and could be hurting your business. It’s avoidable. By using graph technology and the data you already have on hand, you can discover fraud through detectable patterns and stop the perpetrators. We collaborated with Sungpack Hong, Director of Research and Advanced Development at Oracle Labs, to demonstrate five examples of real problems and how graph technology and data are being used to combat them.

Get started with data—register for a guided trial to build a data lake

But first, a refresher on graph technology.

What Is Graph Technology?

With graph technology, the basic premise is that you store, manage, and query data in the form of a graph. Your entities become vertices and your relationships become edges. By analyzing these fine-grained relationships, you can use graph analysis to detect anomalies with queries and algorithms. We’ll talk about these anomalies later in the article. The major benefit of graph databases is that they’re naturally indexed by relationships, which provides faster access to data (as compared with a relational database). You can also add data without doing a lot of modeling in advance. These features make graph technology particularly useful for anomaly detection—which is mainly what we’ll be covering in this article for our fraud detection use cases.

How to Find Anomalies with Graph Technology

If you take a look at Gartner’s 5 Layers of Fraud Protection, you can see that they break the analysis to discover fraud into two categories:
- Discrete data analysis, where you evaluate individual users, actions, and accounts
- Connected analysis, where you examine the relationships and integrated behaviors that facilitate the fraud

It’s this second category, based on connections, patterns, and behaviors, that can really benefit from graph modeling and analysis. Through connected analysis and graph technology, you would:
- Combine and correlate enterprise information
- Model the results as a connected graph
- Apply link and social network analysis for discovery

Now we’ll discuss examples of ways companies can apply this to solve real business problems.

Fraud Detection Use Case #1: Finding Bot Accounts in Social Networks

In the world of social media, marketers want to see what they can discover from trends. For example:
- If I’m selling this specific brand of shoes, how popular will they be? What are the trends in shoes?
- If I compare this brand with a competing brand, how do the results mirror actual public opinion?
- On social media, are people saying positive or negative things about me? About my competitors?

Of course, all of this information can be incredibly valuable. At the same time, it can mean nothing if it’s all inaccurate and skewed by how much other companies are willing to pay for bots. In this case, we worked with Oracle Marketing Cloud to ensure the information they’re delivering to advertisers is as accurate as possible. We sought to find the fake bot accounts that are distorting popularity. As an example, there are bots that retweet certain target accounts to make them look more popular.
To determine which accounts are “real,” we created a graph between accounts, with retweet counts as the edge weights, to see how many times these accounts are retweeting their neighboring accounts. We found that unnaturally popularized accounts exhibit different characteristics from naturally popular accounts: when the retweet patterns are plotted and all the accounts are analyzed, certain accounts show an obviously unnatural deviation. And by using graphs and relationships, we can find even more bots by:
- Finding accounts with a high retweet count
- Inspecting how other accounts are retweeting them
- Finding the accounts that also get retweets from only these bots

Fraud Detection Use Case #2: Identifying Sock Puppets in Social Media

In this case, we used graph technology to identify sockpuppet accounts (online identities used for deception—here, different accounts posting the same set of messages) that were working to make certain topics or keywords look more important by making it seem as though they were trending. To discover the bots, we had to augment the graph from Use Case #1. Here we:
- Added edges between authors posting the same messages
- Counted the number of repeated messages and filtered out coincidental matches
- Applied heuristics to avoid n² edge generation per repeated message

Because the messages were always the same, we were able to create subgraphs using those edges and apply a connected components algorithm. As a result of the analysis we ran on a small sample, we discovered that what we thought were the most popular brands actually weren’t—our original list had been distorted by bots. The “new” most popular brands barely even appeared on the “old” most popular brands list, but they are a much truer reflection of what’s actually popular. This is the information you need.

After one month, we revisited the identified bot accounts just to see what had happened to them. We discovered:
- 89% were suspended
- 2.2% were deleted
- 8.8% were still serving as bots

Fraud Detection Use Case #3: Circular Payment

A common pattern in financial crime, a circular money transfer essentially involves a criminal sending money to himself or herself—but hiding it as a valid transfer between “normal” accounts. These “normal” accounts are actually fake accounts. They typically share certain information because they are generated from stolen identities (email addresses, addresses, etc.), and it’s this related information that makes graph analysis such a good fit for discovering them. For this use case, you can create a graph from transactions between entities as well as from entities that share some information, such as email addresses, passwords, and physical addresses. Once we create the graph, all we have to do is write and run a simple query to find all customers with accounts that have similar information—and, of course, who is sending money to whom in a circle.
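As a rough illustration of the circular-payment pattern (not the exact query used in this project), here is a small networkx sketch that builds a directed transfer graph from made-up accounts and flags rings of money that come back to where they started.

```python
import networkx as nx

# Hypothetical transfers: (sender, receiver, amount). All accounts are made up.
transfers = [
    ("acct_A", "acct_B", 9500),
    ("acct_B", "acct_C", 9400),
    ("acct_C", "acct_A", 9300),   # the money returns to its origin
    ("acct_D", "acct_E", 120),    # an ordinary, non-circular payment
]

g = nx.DiGraph()
for sender, receiver, amount in transfers:
    g.add_edge(sender, receiver, amount=amount)

# Flag rings of three or more accounts that route money back to the start.
for cycle in nx.simple_cycles(g):
    if len(cycle) >= 3:
        print("possible circular payment:", " -> ".join(cycle))
```

In a production system, the same idea would also join in shared emails, passwords, and addresses as extra edges, as described above.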
Fraud Detection Use Case #4: VAT Fraud Detection

Because Europe has so many borders, with different rules about who pays tax to which country when products cross them, VAT (Value Added Tax) fraud detection can get very complicated. In most cases, the importer should pay the VAT, and if the products are exported to other countries, the exporter should receive a refund. But when there are other companies in between, deliberately obfuscating the process, it gets even more complicated. The importing company delays paying the tax for weeks and months. The companies in the middle are paper companies. Eventually, the importing company vanishes; it never pays the VAT but is still able to collect payment from the exporting company.

This can be very difficult to decipher—but not with graph analysis. You can easily create a graph from the transactions: who are the resellers, and who created the companies? In one real-life analysis, Oracle Practice Manager Wojciech Wcislo looked at the flow of money and how it moves to identify suspicious companies. He then used an algorithm in Oracle Spatial and Graph to identify the middle man. In that case, you would:
- Identify importers and exporters via a simple query
- Aggregate VAT invoice items as edge weights
- Run a fattest-path algorithm

You will then discover common “middle man” nodes where the flows are aggregated.

Fraud Detection Use Case #5: Money Laundering and Financial Fraud

Conceptually, money laundering is pretty simple. Dirty money is passed around to blend it with legitimate funds and then turned into hard assets. This was the kind of process discovered in the Panama Papers analysis. These tax evasion schemes often rely on false resellers and brokers who are able to apply for tax refunds to avoid payment. But graphs and graph databases provide relationship models. They let you apply pattern recognition, classification, statistical analysis, and machine learning to these models, which enables more efficient analysis at scale against massive amounts of data.

In this use case, we’ll look more specifically at case correlation. Whenever there are transactions that regulations flag as suspicious, those transactions get a closer look from human investigators. The goal is to avoid inspecting each individual activity separately and instead group suspicious activities together through known connections. To find these correlations with a graph-based approach, we implemented this flow on general-purpose graph engines, using pattern-matching queries (path finding) and a connected components algorithm (with filters). With this method, the company didn’t have to create its own custom case correlation engine; it could use graph technology instead, which offers greater flexibility. That flexibility is important because different countries have different rules.

Conclusion

In today’s world, the scammers are getting ever more inventive. But so is the technology. Graph technology is an excellent way to discover the truth in data, and it is a tool that’s rapidly becoming more popular. If you’d like to learn more, you can find white papers, software downloads, documentation, and more on Oracle’s Big Data Spatial and Graph pages. And if you're ready to get started with exploring your data now, we offer a free guided trial that enables you to build and experiment with your own data lake.


Cloud

Association Rules in Machine Learning, Simplified

You’ve probably been to a supermarket that printed coupons for you at checkout. Or listened to a playlist that your streaming service generated for you. Or gone shopping online and seen a list of products labeled “you might be interested in….” that did indeed contain some stuff that you were interested in. Recommendation engines take data about you, similar consumers, and available products, and use that to figure out what you might be interested in and therefore deliver those coupons, playlists, and suggestions. Download your free ebook, "Demystifying Machine Learning." Recommendation engines can be extremely complex. For example, Netflix ran a $1M competition from 2006 to 2009 to improve their movie recommendation engine performance. Over 5,000 teams participated. The winning team combined results from 107 different algorithms or techniques to deliver the 10 percent improvement and claim the prize. So, there are many different ways to build a recommendation engine and most will combine multiple techniques or approaches. In this article, I want to cover just one approach, association rules, which are fairly easy to understand and require minimal skills in mathematics. If you can work with simple percentages, there’s nothing more complex than that below. Association Rules in the Real World Conceptually association rules is a very simple technique. The end result is one or more statements of the form “if this happened, then the following is likely to happen." In a rule, the "if" portion is called the antecedent, and the "then" portion is called the consequent. Remember those two terms because they are going to come up in the descriptions below. Let’s start with food shopping because association rules are very often used to analyze the contents of your shopping cart. As you make your shopping list, you probably buy a mix of pantry staples as well as ingredients for a specific meal or dish that you plan to prepare. Imagine you plan to make tomato sauce for pizza or a pasta dish. You’re probably going to buy tomatoes, onions, garlic, maybe olive oil or fresh basil. You’re far from the only person making tomato sauce and many others will have similar sets of ingredients. If we looked at all the various shopping baskets that people purchased, we could start to see some things in common. “If somebody buys canned tomatoes, then they are more likely to buy dried pasta (or onions or garlic or pizza dough or …)”. Armed with this knowledge, a supermarket could print you a coupon at checkout for something you didn’t purchase, hoping that you would come back. Or a manufacturer might offer you a coupon for their pre-made tomato sauce for those nights when you don’t want to make it from scratch. Although tomatoes might imply garlic and/or basil, the reverse may not be true. For example, somebody buying garlic and basil could be looking to make pesto, in which case they’d be more likely to buy pine nuts than tomatoes. But with the right analysis, it would be possible to find the rules governing which products were more likely to be associated with each other. Hence the name “association rules”. Let’s illustrate this process with some real numbers. And to do so we’ll move from buying groceries to watching movies. How Do Association Rules Work in Machine Learning, Exactly? The starting point for this algorithm is a collection of transactions. 
They could be traditional purchase transactions, but they could also be events like “put a product in an online shopping cart,” “clicked on a web ad,” or, in this case, “watched a movie.” I’ll use this very abbreviated data set of the movie-watching habits of five people. I’ve anonymized them to hide their identities (not that this approach always works). Here you see each person and the list of movies they have watched, represented by the numbers 1-5.

User | Movies Watched
A | 1, 2, 4
B | 1, 3
C | 1, 4
D | 2, 3, 4
E | 3, 4

As you work your way down the table, the first thing to stand out is that the first and third users both watched movies 1 and 4. From this data, the rule would be “if somebody watches movie number 1, then they are likely to watch movie number 4.” You’ll need the two terms I snuck in above: movie 1 is the antecedent and movie 4 is the consequent.

Let’s look at this rule in more detail. How useful is it? There are 2 users out of 5 who watched both movies 1 and 4, so we can say that this rule has support of 40% (2 out of 5 users). How confident are we that it’s a reliable indicator? Three users watched movie number 1, but only 2 of them also watched number 4, so the confidence in this rule is 67 percent. Note that if you reverse the rule (or swap the antecedent and consequent, if you prefer), we can also say that “if somebody watched movie number 4, then they are likely to watch movie number 1.” However, while the support is also 40 percent, the confidence changes and is now only 50 percent (check the table above to see how that came about). This is the same process as in the example with tomato sauce and pesto above.

What do these metrics mean? With just 5 users and 5 movies it might be hard to see, but imagine this is a subset of many millions of users and thousands of movies. If the support is very low, it basically means that the rule will not apply to many customers. For example, it might mean that people who watch some obscure 70s documentary will also watch an equally obscure 80s film. In the movie recommendation space, this would translate to a niche rule that might not get used very often, but could be quite valuable to that very small subset of customers. However, if you were using rules to find the optimal placement of products on the shelves in a supermarket, lots of low-support rules would lead to a very fragmented set of displays. In this kind of application, you might set a threshold for support and discard rules that didn’t meet that minimum.

How to Understand Confidence in Association Rules

Confidence is a little easier to understand. If there’s a rule linking two movies but with very low confidence, then it simply means that most of the time people watch the first movie, they won’t actually watch the second one. For the purpose of making recommendations or predictions, you’d much rather have a rule that you were confident about. You could also use a minimum threshold for confidence and ignore or discard rules below it.

Take another look at the first rule from above: if somebody watches movie 1, they will also watch movie 4. The confidence here is 67 percent, which is pretty good. But take a look at the rest of the table. Four out of 5 users watched movie number 4 anyway. If we know nothing else about their other movie preferences, we know that there’s an 80 percent chance of them watching movie 4.
So despite that confidence of 67 percent, the first rule we found is actually not useful: somebody who has watched movie 1 is actually less likely to watch movie 4 than somebody picked at random. Fortunately, there’s a way to take this into account. It’s called “lift.”

Lift in Association Rules

Lift is used to measure the performance of the rule when compared against the entire data set. In the example above, we would want to compare the probability of “watching movie 1 and movie 4” with the probability of “watching movie 4” occurring in the dataset as a whole. As you might expect, there’s a formula for lift: lift is equal to the probability of the consequent given the antecedent (that’s just the confidence for that rule) divided by the probability of that consequent occurring in the entire data set (which is the support for the consequent), or more concisely:

Lift = confidence / support(consequent)

In this example, the probability of movie 4, given that movie 1 was watched, is just the confidence of that first rule: 67 percent, or 0.67. The probability of some random person in the entire dataset (of just 5 users in this simple example) watching movie 4 is 80 percent, or 0.8. Dividing 0.67 by 0.8 gives a lift of approximately 0.84.

In general, a lift of less than 1 indicates a rule that is less predictive than just picking a user at random, which is the case with this rule, as I explained in the first paragraph of this section. A lift of around 1 indicates two independent events, e.g., watching one movie does not influence the likelihood of watching another. Values of lift greater than 1 show that the antecedent does influence finding the consequent. In other words, here is a rule that is useful.
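Here is a small, self-contained Python sketch that computes support, confidence, and lift for the five viewing histories in the table above (the exact fractions give a lift of about 0.83, which matches the 0.84 obtained from the rounded figures).

```python
# The five viewing histories from the table above.
watched = {
    "A": {1, 2, 4},
    "B": {1, 3},
    "C": {1, 4},
    "D": {2, 3, 4},
    "E": {3, 4},
}
n_users = len(watched)

def rule_metrics(antecedent, consequent):
    both = sum(1 for m in watched.values() if antecedent in m and consequent in m)
    ante = sum(1 for m in watched.values() if antecedent in m)
    cons = sum(1 for m in watched.values() if consequent in m)
    support = both / n_users                # how often both appear together
    confidence = both / ante                # how often the rule holds
    lift = confidence / (cons / n_users)    # confidence vs. baseline popularity
    return support, confidence, lift

print(rule_metrics(1, 4))  # -> (0.4, 0.666..., 0.833...)
```

Algorithms such as apriori do essentially this at scale, while pruning the large space of candidate rules efficiently.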
Testing and Binning Association Rules

I’ll finish with a few more tips and extensions to the simple example above. First, how do you test the accuracy of your rules? You can use the same approach that I previously outlined for classification: build your rules with a subset of the available transactions, and then test the performance of those rules against the remainder. And of course, you should monitor their performance if they are used to make recommendations to actual users.

We worked with a simple rule above: “if a user watched movie 1, then they are likely to watch movie 4.” This is referred to as a rule of length 2, because it incorporates two elements. Of course, we could have more complex rules: “if a user watched movies 1, 2, and 3, then they are likely to watch movie 4” is a rule of length 4. Or, to go back to grocery shopping for a similar rule, somebody who buys tomatoes, garlic, and pasta is likely to want some parmesan cheese to go with their spaghetti dinner.

Some streaming sites ask users to rate the things they watch on a scale of 1-5 or 1-10. If we had that information, we couldn’t use the numeric value directly; we’d want to “bin” the answers. For example, we might say that a score of 7-10 was considered “high,” and so on. A rule might then incorporate “watched movie A and rated it high.”

The concept of binning applies to one last example, which takes us well away from movies and shopping to machines, because these rules potentially have wider uses. Imagine you’re responsible for maintaining some machine that breaks from time to time. You have lots of sensor data and other information about its operation, and you’ve captured several failures. You could in principle treat failure as a consequent and search for the antecedents. You’d have to bin the sensor data in some way (a flow rate of 7.3 to 11.4 goes into this bin, etc.). In principle, you could use association rules to find the conditions that are associated with failure and take corrective action—an approach also referred to as root cause analysis. Replace mechanical failure with a diagnosis and you could even use some form of association rules with medical data.

Learn More

Visit some of our previous articles for high-level overviews of machine learning, a look at decision trees, or k-means clustering. If you’d like to find out more about Oracle’s ML technologies, you can look at our support for R as well as Advanced Analytics inside Oracle Database. If you're ready to get started with machine learning, try Oracle Cloud for free and build your own data lake to test out some of these techniques.


Analytics

Interactive Data Lake Queries At Scale

Data lakes have been a key aspect of the big data landscape for years now. They provide somewhere to capture and manage all kinds of new data that potentially enable new and exciting use cases. You can read more about what a data lake is or listen to a webcast about building a data lake in the cloud. But maybe the key word in that first paragraph is “potentially.” Because to realize that value you need to understand the new data you have, explore it, and query it interactively so you can form and test hypotheses.

About Interactive Data Lake Query

Interactive data lake query at scale is not easy. In this article, we’re going to take a look at some of the problems you need to overcome to make full productive use of all your data. That’s why Oracle acquired SparklineData to help address interactive data lake query at scale. More on that at the end of this article.

Hadoop has been the default platform for data lakes for a while, but it was originally designed for batch rather than interactive work. The development of Apache Spark™ offered a new approach to interactive queries because Spark is a modern distributed compute platform that is one or two orders of magnitude faster than Hadoop with MapReduce. Replace HDFS with Oracle’s object storage (the equivalent service from Amazon is S3, while Microsoft calls it Blob Storage) and you’ve got the foundation for a modern data lake that can potentially deliver interactive query at scale.

Try building a fully functioning data lake – free

Interactive Query At Scale Is Hard

OK, I said “potentially” again. Because even though you’ve now got a modern data lake, there are some other issues that make interactive query of multi-dimensional data at scale very hard:
- Performance
- Pre-aggregation
- Scale-out
- Elasticity
- Tool choice

Let’s look at each one of these in turn.

Interactive queries need fast response times. Users need “think speed” analysis as they navigate worksheets and dashboards. However, performance gets worse when many users try to access datasets in the data lake at the same time. Further, joins between fact and dimension tables can cause additional performance bottlenecks. Many tools have resorted to building an in-memory layer, but this approach alone is insufficient. Which leads to the second problem.

Another way to address performance is to extract data from the lake and pre-aggregate it. OLAP cubes, extracts, and materialized pre-aggregated tables have been used for a while to facilitate the analysis of multi-dimensional data. But there’s a tradeoff here. This kind of pre-aggregation supports dashboards or reporting, but it is not what you want for more ad-hoc querying. Key information behind the higher-level summaries is not available. It’s like zooming into a digital photograph and getting a pixelated view that obscures the details. What you want is access to all the original data so you can zoom in and look around at whatever you need. Take a look at this more detailed explanation about pre-aggregating data for deep analysis.

Data lakes can grow quite large. And sooner or later you’re going to need to do analysis on terabytes, rather than gigabytes, of data at a time. Scaling out to this kind of magnitude is the kind of stress test that plenty of tools fail, because they don’t have the distributed compute engine architecture that a framework like Spark brings natively to operate at this scale. Scaling out successfully is part of the problem. But you also need to scale back down again.
In other words, you need an elastic environment, because your workload is going to vary over time in response to anything from the sudden availability of a new data set to the need to analyze a recently-completed campaign or the requirement to support a monthly dashboard update. Elasticity is partly a function of having a modern data lake where compute and storage can scale independently. But elasticity also requires that tools using the data lake have the kind of distributed architecture needed to address scale out. Finally, getting the most out of your data is not a job for one person or even one role. You need input from data scientists as well as business analysts, and they will each bring their requirements for different tools. You want all the tools to be able to operate on the same data and not have to do unique preparations for each different tool. Addressing Data Lake Query Problems Oracle acquired SparklineData last week, and we’re excited because Sparkline SNAP has some innovative solutions to these problems: It runs scale-out in-memory using Apache Spark for performance, scalability and more. It can deliver sub-second queries on terabyte data sets. It doesn’t need to pre-aggregate data or create extracts because it uses in-memory indexes with a fully distributed compute engine architecture. It’s fully elastic when running in a modern data lake based on object storage and Spark clusters. Different users can access data with their tools of choice including Zeppelin or Jupyter notebooks running Python or R, and BI tools like Tableau. This means that people can use their tool of choice but connect to the SparklineSNAP in-memory layer. We’re looking forward to integrating Sparkline SNAP into Oracle’s own data lake and analytics solutions and making it available to our customers as soon as possible.  Interactive Query Use Cases So when would you want to use this technology? There are lots of use cases, but here are three to think about: 1. Data from machines and click streams falls into event/time-series data that can quickly grow in size and complexity. Providing ad-hoc interactive query performance on multi-terabyte data to BI tools connecting live to such data is impossible with current data lake infrastructures. SparklineSNAP is designed to operate and analyze such large data sets in-place on the data lake without the need to move and summarize it for performance.  2. Perhaps all the data you want to work with isn’t currently in a data lake at all. If you have ERP data in multiple different applications and data stores, doing an integrated analysis is a nigh-on impossible task. But if you move it all into object storage and make it accessible to Sparkline SNAP, you can do ad hoc queries as you need, whether the original data came from a single source or from 60 different ones. 3. Finally, maybe you’re already struggling with all the extracts and pre-aggregation needed to support your current in-memory BI tool. With Sparkline SNAP you can dispense with all that and work on live data at any level of granularity. So not only can you save the time and effort of preparing the data, you can do a better analysis anyway. There’s more information in this article on pre-aggregating data for deep analysis. If you’d like to get started with a data lake, then check out this guided trial. In just a few hours you’ll have a functioning data lake, populated with data and incorporating visualizations and machine learning.


Cloud

Integrating Autonomous Data Warehouse and Big Data Using Object Storage

While you can run your business on the data stored in Oracle Autonomous Data Warehouse, there’s lots of other data out there which is potentially valuable. Using Oracle Big Data Cloud, it’s possible to store and process that data, making it ready to be loaded into or queried by the Autonomous Data Warehouse. The point of integration for these two services is object storage, which I will explore below. Of course, you need more than this for a complete big data solution. If that's what you're looking for, you should read about data lake solution patterns.

Sign up for a free trial to build and populate a data lake in the cloud

Use Cases for the Data Lake and Data Warehouse

Almost all big data use cases involve data that resides in both a data lake and a data warehouse. With predictive maintenance, for example, we would want to combine sensor data (stored in the data lake) with official maintenance and purchase records (stored in the data warehouse). When trying to determine the next best action for a given customer, we would want to work with both customer purchase records (in the data warehouse) and customer web browsing or social media usage (details of which would most likely be stored in the data lake). In use cases from manufacturing to healthcare, having a complete view of all available data means working with data in both the data warehouse and the data lake.

The Data Lake and Data Warehouse for Predictive Maintenance

Take predictive maintenance as an example. Official maintenance records and purchase or warranty information are all important to the business. They may be needed by regulators to check that proper processes are being followed, or by purchasing departments to manage budgets or order new components. On the other hand, sensor information from machines, weather stations, thermometers, seismometers, and similar devices all produces data that is potentially useful to help understand and predict the behavior of some piece of equipment. If you asked your data warehouse administrator to store many terabytes of this raw, less well-understood, multi-structured data, they would not be very enthusiastic. This kind of data is much better suited to a data lake, where it can be transformed or used as the input for machine learning algorithms. But ultimately, you want to combine both data sets to predict failures or a component moving out of tolerance.

Examples: How Object Storage Works with the Data Warehouse

We talked previously about how object storage is the foundation for a modern data lake. But it’s much more than that. Object storage is used, amongst other things, for backup and archive, to stage data for a data warehouse, or to offload data that is no longer stored there. And these use cases require that the data warehouse can also work easily with object storage, including data in the data lake.

Let’s go back to that predictive maintenance use case. After being loaded into the data lake (in object storage), the sensor data can be processed in a Spark cluster spun up by Oracle Big Data Cloud. “Processing” in this context could be anything from a simple filter or aggregation of results to running a complex machine learning algorithm to uncover hidden patterns. Once that work is done, a table of results will be written back to object storage. At that point, it could be loaded into the Autonomous Data Warehouse or queried in place.
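To make that flow concrete, here is a minimal PySpark sketch of the “simple filter or aggregation” case. The object storage URI, column names, and one-hour window are all hypothetical; the point is the shape of the pipeline: read raw data from object storage, process it in Spark, and write a results table back to object storage.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("sensor-prep").getOrCreate()

# Hypothetical object storage location; adjust the URI for your connector.
BASE = "oci://my-bucket@my-namespace"

# Raw sensor readings staged in the data lake.
raw = spark.read.json(BASE + "/raw/sensor-readings/")

# A simple filter plus an hourly aggregation per device.
hourly = (raw
          .where(F.col("temperature").isNotNull())
          .groupBy("device_id",
                   F.window(F.col("reading_time").cast("timestamp"), "1 hour"))
          .agg(F.avg("temperature").alias("avg_temp"),
               F.max("vibration").alias("max_vibration")))

# The results table goes back to object storage, ready to be loaded into or
# queried in place by the Autonomous Data Warehouse.
hourly.write.mode("overwrite").parquet(BASE + "/curated/sensor-hourly/")
```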
Which approach is best? Depends on the use case. In general, if that data is accessed more frequently, or performance of the query is more important, then loading into the Autonomous Data Warehouse is probably optimal. Here you can think of object storage as another tier in your storage hierarchy (note that Autonomous Data Warehouse already has RAM, flash, and disk as storage tiers).

We can also see a similar approach in an ETL offload use case. Raw data is staged into object storage. Transformation processes then run in one or more Big Data Cloud Spark clusters, with the results written back to object storage. This transformed data is then available to load into Autonomous Data Warehouse.

Autonomous Data Warehouse and Big Data Cloud: Working Together

Don’t think of Oracle Autonomous Data Warehouse and Oracle Big Data Cloud as two totally separate services. They have complementary strengths and can interoperate via object storage. And when they do, it will make it easier to take advantage of all your data, to the benefit of your business as a whole. If you're interested in learning more, sign up for an Oracle free trial to build and populate your own data lake. We have tutorials and guides to help you along.


Cloud

Machine Learning Challenges: What to Know Before Getting Started

The rewards of machine learning can be compelling, and it may make you want to get started, now. At the same time, however, you'll want to consider machine learning challenges before you start your own project. This article isn’t meant to scare you away; rather, it’s meant to ensure you’re prepared and that you’re carefully thinking about what you’ll need to consider before you get started. Register for a free trial to build a data lake and try machine learning techniques  We spoke with Brian MacDonald, Data Scientist on Oracle’s Information Management Platform Team, about the pitfalls he’s seen and what companies can do to avoid them. These machine learning challenges include: Addressing the skills gap Knowing how to manage your data Operationalizing the data 1. Address the Machine Learning Skills Gap The biggest difficulty, of course, is the skills gap that lies with using machine learning in a big data environment. There’s a certain community of people who think that big data makes life beautiful and it will be easy to get started. The biggest challenge you’re going to find is discovering the right people. There is a big demand for people who are skilled in machine learning and a small pool to choose from. But as we described in our article about machine learning success, having executive support is key to this. If you have executive support, you’re also going to have the funding to find and recruit those valuable people. Here’s something to think about. If you’re in a situation where you’re very sensitive to cost because skilled data scientists are expensive, then you probably don’t have a big enough business problem to make machine learning worth doing. Let’s say a skilled data scientist costs your company $300-400,000 (including all benefits and incentives). If that person can’t help you solve a problem that’s worth at least a million a year, then you probably don’t need that person. Right? On the other hand, if you truly believe this person (or team of people) can help you solve a problem in the tens of millions, then what are you waiting for? It is difficult to find people. But if it’s truly important to your company, you can find them. Here’s another issue to think about: the tools and software. While there are of course tools that will help, you’ll rarely be able to find the exact, perfect machine learning tools you need that are ready to go for you, right out of the box. You’ll have to think about the tooling you’re going to use. Python, R, SQL, TensorFlow? And if you use those, how will they work with your data lake? And how will you handle the setup and configuration that can create challenges? Think through the details before you get started and ensure you have enough funding. 2. Know How to Manage Your Big Data Machine learning is a messy process. And just having a big data platform doesn’t automatically mean it will be easier. In fact, it might make it messier, because you’ll have more data. That data enables you to do more, but it also means more data prep that has to be done. You’ll have to think holistically about how you’re going to approach the problem. Here are some questions to think through: Where is your data coming from? How are you going to approach the problem? How are you hoping to handle your data preparation? And once that’s done, how will you build your models and operationalize everything? 
If you don’t already have a good BI practice or an analytics practice and if you’re not using data in all the ways you can think of already—well, jumping over to machine learning is really going to be a challenge. Already having data-driven decision making is absolutely critical. If you don’t have that, we recommend having that in place before you get started with machine learning. If you do decide to start, here are some other considerations. Think about them carefully before you get started: Rapid Change In the machine learning world, innovation is coming quickly which means rapid change. What’s good today may not be so good tomorrow, and you can’t always rely on the software because it’s a more volatile space. You might get more issues with different versions and conflicts. The Sheer Volume of Data With machine learning, you’ll have to deal with data—lots and lots of different kinds of data. Understanding whether you use all of it, the processes, whether to sample, etc.—all of that can be a challenge, especially when you’re getting deeper into your data and dealing with data movement. Ensure you’re up to facing that challenge, and have your plan in place. 3. Operationalizing Your Big Data What’s the biggest issue most data scientists face? It’s operationalizing the data. Let’s say you’ve built a model and it can predict factors that lead to churn. How do you get that model out to the people who can affect those numbers? How can you get it to the CRM or mobile app? If you have a model that predicts equipment failure, how can you get it to the operator in time to prevent that failure? There are many challenges with taking a model and making it actionable. And it’s probably the biggest technical challenge that exists for data scientists these days. You can build the most beautiful models in the world. But will your c-suite truly care if it’s not actually making an impact on the company’s bottom line? You might think your part of the bargain is just to make the data available. But it’s not. You have to make sure your data is actually going to be used. Gaining executive support is hugely helpful for this. So machine learning isn’t really easy. But it can accomplish big things. To inspire you and remind you of what’s possible, we’re sharing a real-life customer example and their machine learning project. Real-Life Machine Learning and Big Data Example This company is one of the largest providers of wireless voice and data communications services in the United States. Business Challenges: Credit Risk: Their equipment leasing and loan program through their financing arm has to write off large amounts of bad debt every year. They wanted to reduce bad loans and defaults, which will significantly add to their bottom line in millions every year. In addition, ability to impact pending collections will dramatically help with cash flow. Customer Experience and Personalization: Customer churn costs this company millions a year. Early identification and targeting of both potential churn and new high value customers through personalization and segmentation can dramatically increase the number of net new subscribers, and reduce churn. Operational Effectiveness: This company sought enhanced targeted marketing and campaign effectiveness through network optimization and data monetization. Technology Challenges: This telecom company wanted to detect fraudulent activity much earlier and integrate data from multiple structured and unstructured sources to improve customer scoring. 
This would enable the company to provide customized offers and reduce risk. They also wanted the ability to store and analyze large volumes of customer data to help the business develop a better ability to segment customers and predict their behavior for personalized offers. They sought to optimize pricing through new advanced what-if analysis. To accomplish this, the company purchased a wide variety of Oracle big data products, including Oracle GoldenGate for Big Data, which is part of Oracle Data Integration Platform Cloud.

Addressing the skills gap, managing the data, and operationalizing it are challenges that need to be dealt with – but they can be. And the results can be incredible. Read our tips for machine learning success for more information. And if you'd like to try building a data lake and use machine learning on the data, Oracle offers a free trial. Register today to see what you can do.


Analytics

K-Means Clustering in Machine Learning, Simplified

Following on from our previous article about the decision tree algorithm, today we're exploring clustering within the realm of machine learning and big data. Specifically, let’s look at the commonly used k-means algorithm. If you want a refresh on clustering (and other techniques), take a look at some of our other articles about machine learning. Download your free ebook, "Demystifying Machine Learning." K-Means Clustering in the Real World Clustering as a general technique is something that humans do. But unlike decision trees, I don’t think anybody really uses k-means as a technique outside of the realm of data science. But let’s pretend for a second, that you really wanted to do just that. What would it look like? Imagine it’s the end of the long summer vacation and your child is heading back off to college. The day before leaving, said child delivers a request: “Hey, I forgot to do my laundry; can you wash it so it’s clean for college”? You head down to the bedroom and are greeted (that’s not really the right word here, but you get the idea) with a pile of dirty clothes in the middle of the floor that’s nearly as tall as you. What do you do (apart from burning it all)? Of course, you know that some of these clothes are washed in hot water, some in cold, that some can be put in the dryer and some need air drying, and so on.  To make this a clustering scenario, we have to assume that your child, who is going to be doing all the work here, has no clue about how to group things for laundry. Not a stretch in this scenario. Let’s treat this as a learning opportunity and put them to work. Tell them that you want them to group all the clothes into three piles using a very specific approach. Ask them to pick three different items of clothing at random that will be the starting points for those three separate piles (clusters in fact). Then, go through that massive initial pile and look at each item of clothing in turn. Ask them to compare the attributes of “water temperature” and “drying temperature” and “color” with each of the three starting items and then place the new item into the best pile. The definition of “best pile” is based not on the whole pile, but purely on that starting item (technically that starting item is known as a centroid, but we’ll get to that later). The items won’t be identical, so pick the best match. If you want to be really detailed about this, ask them to place each item closer to or farther away from the starting item, based on how similar they are (the more similar, the closer together). Now you’ve got three piles, which means your child is ready to use the washing machine. Not so fast, we’re just getting started. This is a very iterative process, and the next step would be to determine the new centroids of each of those piles and repeat the process. Those new centers would be calculated and may not correspond to an actual item of clothing. And we don’t really know if three piles is the optimal number, so once your child has completed iteration with three piles, they should go and try four, five, and more. Clustering in general, and perhaps this algorithm in particular, is not a good technique for sorting laundry. Let’s look at a more realistic example using K means clustering and start working with data, not dirty socks and laundry labels. How Does K-Means Clustering Work in Machine Learning, Exactly? I’ll reuse the same data table that we had for the decision trees example. 
But this time, instead of trying to predict customer churn, we’re going to use clustering to see what different customer segments we can find.

Customer ID | Age | Income | Gender | Churned
1008764 | 34 | 47,200 | F | Yes

I’m initially going to work with just two columns, or attributes: age and income. This will allow us to focus on the method without the complexity of the data. Having two attributes enables us to work with a two-dimensional plane and easily plot data points.

Is the data normalized? Let’s say that ages range up to 100 and incomes up to 200,000. We’ll scale the age range 0..100 to 0..1, and similarly salary 0..200,000 to 0..1. (Note that if your data has outliers, like one person with an income of 800,000, there are techniques to deal with that; I just won’t cover them here.)

The first thing we do is pick two centroids, which means that we’re going to start with K=2 (now you know what the K in k-means represents). These represent the proposed centers of the two clusters we are going to uncover. This is quite a simple choice: pick two rows at random and use those values. If you’re building a mental representation of this process in your head, then you’ve got a piece of paper with two axes, one labeled “Age” and the other labeled “Income.” That chart has two color-coded points marked on it, representing those two centroids.

Now we can get to work. We start with the first row and determine the Euclidean distance (which is a fancy way of saying we’ll use Pythagoras’s theorem) between the point in question and both centroids. It will be closer to one of them. If you’re visualizing this in your head, you can just plot the new point on the chart and color code it to match the closer centroid. Repeat this process for every row that you want to work with. You’ll end up with a chart with all points plotted and color coded, and two clusters will be apparent.

Once all the points have been allocated to a cluster, we have to go back and calculate the centers of both clusters (a centroid doesn’t have to be in the center of the cluster, and in the early iterations could be some distance away from the eventual center). Then go back and repeat the calculations for each and every point (including those initial two rows that were your starting centroids). Some of those points will actually be nearer to the other cluster and will effectively swap clusters. This, of course, will change the centers of your clusters, so when you’ve completed all the rows for a second time, just recalculate the new centroids and start again. If you plotted each iteration, you would see the centroids “move” from their previous locations, and those new centroids become the basis for the next iteration.

As with all iterative processes, you need to figure out when to stop. Two obvious stopping points would be:
- Stop after some number of iterations (10, for example)
- Stop when the clusters are stable (when there is little or no movement of the centroids after each iteration, which would also mean few or no points “swapping clusters”)

Is K=2, or two clusters, optimal for this data set? Good question. At this stage you don’t know. But you can start finding out by repeating the whole process above with K=3 (i.e., start with 3 random centroids). And then go to four, and five, and so on.
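Here is a minimal scikit-learn sketch of the procedure just described, using made-up (age, income) pairs: scale both attributes to 0..1, then fit k-means for several values of K. The within-cluster totals it prints (scikit-learn’s inertia_, a sum of squared distances rather than the average distance used below) are the raw material for the elbow method discussed next.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

# Hypothetical customers: (age, income). All values are made up.
customers = np.array([
    [34, 47_200], [25, 31_000], [58, 95_000], [41, 62_500],
    [23, 28_000], [63, 110_000], [37, 54_000], [29, 33_500],
])

# Scale both attributes into the 0..1 range, as described above.
scaled = MinMaxScaler().fit_transform(customers)

# Fit k-means for several values of K and record how tight the clusters are.
for k in range(2, 6):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(scaled)
    print(f"K={k}  within-cluster sum of squares={model.inertia_:.3f}")
```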
How to Optimize Your K-Means Clustering

Fortunately, this process doesn’t go on forever. There’s one more calculation to do: for each value of K, measure the average distance between all the data points in a cluster and the centroid for that cluster. As you add more clusters, you will tend to get smaller clusters, more tightly grouped together. So that “average within-cluster distance to centroid” will decrease as K (the number of clusters) increases. Basically, this metric gives you a concrete measure of how “good” a cluster you’ve got, with lower numbers meaning a more tightly grouped cluster whose members are more similar to each other. If you increase K to equal the total number of data points you have (giving you K clusters, each with one member), then that distance will be zero, but that’s not going to be useful.

The best way to find the optimal value of K is to look at how that “average within-cluster distance to centroid” decreases as K increases and find the “elbow.” Plotting that metric for this example, I’d say that the elbow is at K=3, though you might prefer to use K=4. There doesn’t look to be much point going with five or more clusters because the incremental improvement is minimal.

In this example I used just two attributes to build the clusters. This has the advantage of being easy to visualize on a chart. What would those clusters represent? In this simple example with just ages and incomes, it might not be very illuminating, but maybe I’d find that I have a segment of younger customers with relatively higher incomes (and presumably higher disposable incomes) that I could target with promotions for suitable luxury goods. In general, you’d probably get more value from segments defined with more attributes.

A Real-Life Example for K-Means Clustering

With three attributes, things are a little harder to visualize in 2D. One example is clustering cars by fuel economy (in miles per gallon), peak power (horsepower), and engine displacement (in cubic inches) for a selection of cars from the early 1980s. Real use cases for k-means clustering might employ hundreds or thousands of attributes, and while those are hard to visualize on a single chart, they are easily computed. They are also likely to be much more useful. In the customer segmentation example starting about half way down this machine learning article, somebody obviously used many more attributes than just age and income.

If you’d like to find out more about Oracle’s machine learning technologies, you can look at our support for R as well as Advanced Analytics inside Oracle Database. If you're ready to get started with machine learning, try Oracle Cloud for free and build your own data lake to test out some of these techniques.


Analytics

3 Tips for Machine Learning Success

Machine learning is a big buzzword right now. If you’d like to learn more about what it is, we have a blog series you can read about machine learning techniques. But if you’re interested in using your big data for machine learning and starting a real project, this is the blog post for you. We talked to Brian MacDonald, Data Scientist on Oracle’s Information Management Platform Team, about tips for success for your machine learning project.

Three Tips for Machine Learning Success

Our tips for machine learning success include:
- Have a specific business problem
- Gain executive support
- Identify short-term, measurable business benefits

Let’s get into it.

Register for a free trial to build a data lake and try machine learning techniques

Success Tip #1: Have a Specific Business Problem

We most often see success with machine learning projects when the company has a very specific business problem and is willing to do whatever is needed to solve it. Brian explained, “Now one of the companies I worked with wanted to create a better churn model because they were losing customers rapidly, which was affecting profits. They had a lot of data, and they believed they could use machine learning to help them identify the customers who were about to leave. If you have a $100 million problem, spending $30 million isn’t a big deal.”

That’s the kind of company that tends to see success. Having that laser focus and need to find a solution helps ensure success because there's no other option. On the other hand, some organizations have problems in general, but they’re not sure how to go about solving them. They might hear the buzz about machine learning and decide they can use it. But although machine learning can do wonders, it can also be very complex and is rarely an easy fix. Once they discover that, these organizations sometimes decide against going forward. Or if they start, they sometimes founder midway. When you’re going into your project, you need to keep the end goal in mind at all times. Take a look at our big data solutions if you want inspiration.

Success Tip #2: Gain Executive Support

This is related to the first tip above. But the number one factor we see for success is having high-level execs who support your machine-learning initiative. You want a C-suite that says machine learning, analytics, and data are important to your business. If you have that support and vision, your program is much more likely to be successful. On the other side, when your program is driven by IT, what you tend to hear is something like, “We’ve never done this internally and I don’t know how to sell it.” This type of approach is less likely to be successful, not because the technology isn’t working, but because of internal issues.

For example, because machine learning is in essence automated decision making, people can sometimes view it as a means of replacing their own jobs. If employees at your company are worried about it replacing jobs or lowering the head count, you’re not going to get as strong a reception. Keeping this in mind is important, because then you can decide how to counter that kind of attitude. And that’s another reason why having executive support is so important: it becomes a way to get around that attitude more easily. In essence, you often need backing and money to make a machine learning initiative a true success.

Success Tip #3: Identify Short-Term, Measurable Business Benefits

When you’re starting out, you’ll want to start with a concrete business benefit like increasing sales.
That’s an example of a business benefit that’s tangible, that everyone can see, and that won’t take too long to identify. The length of time it will take really depends on your goal, but it should be less than a year. Some say it should take no longer than four to eight weeks for the project to prove success. If there’s no success, then it’s time to move on. If you don’t have a business value that’s measurable, ask yourself why you’re doing this, because at some point you’re going to have to justify your project. Some people might say things like, “We think machine learning is the future” or “We need to develop those skills.” Well, that’s investing in building skills and R&D for the future, and that’s a business benefit. However, whether you have the resources to spend on that kind of research really depends on your company size and corporate strategy, and you should try to align with that before you start.

Real-Life Machine Learning and Big Data Example

Here at Oracle, we’ve been fortunate enough to see many success stories with machine learning. But here’s one example from the energy industry that stands out. This company is a leading supplier of systems for power generation and transmission, and is one of the world's largest producers of energy-efficient, resource-saving technologies.

Business challenge: Use data to predict future failures in power generation units. These predictions can then be used to sell services to the company’s customers, who are the owners of those units.

Technology challenge: The company wanted to differentiate itself from competitors by making the power generation equipment better serve its customers. One approach it is taking is to use the data from the power generation units to predict future failures and help customers improve maintenance schedules to eliminate outages and avoid costly, unplanned expenses. The company purchased Oracle Advanced Analytics, which is also available in the cloud as part of the Autonomous Data Warehouse, to help it add predictive modeling capabilities to the services it offers to its customers.

They were successful in large part because they were so focused. They had a very specific business problem, they got their executives behind the goal, and they identified short-term, measurable business benefits. There’s another item you might want to add to that list: purchasing the right machine learning technology, which can often contribute greatly to the success of your project. So choose carefully and wisely, and contact us if you’re interested in our machine learning capabilities. We're here to help you make your machine learning project successful.

And if you'd like to try building a data lake and use machine learning on the data, Oracle offers a free trial. Register today to see what you can do.
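To make the churn example above a little more concrete, here is a minimal sketch of how such a model might be trained in Python with scikit-learn. The file name, the feature columns, and the churned label are hypothetical placeholders, not details from the project Brian described.

```python
# Minimal churn-model sketch. The CSV, its columns, and the chosen features
# are illustrative assumptions, not details from any real customer project.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Historical customer records with a known churn outcome.
df = pd.read_csv("customers.csv")  # hypothetical extract from the data lake
features = ["tenure_months", "monthly_spend", "support_tickets"]  # illustrative
X, y = df[features], df["churned"]

# Hold out a test set so the benefit can be measured, per tip #3.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Test AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```

In practice the modeling is rarely the hard part; agreeing on what counts as churn and proving measurable lift against the business benefit usually is.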


Big Data

Big Data In The Cloud: Why And How?

Lowered Total Cost of Ownership, Total Flexibility, Hyper-Scale, and the Oracle Advantage

Algorithms have a huge influence on our daily lives and our future. Everything from our social interactions, news, entertainment, finance, and health to other aspects of our lives is shaped by mathematical computations and nifty algorithms, and big data is a significant part of what makes this possible. We’re now in the era of machine learning and artificial intelligence once again. But unlike our previous attempts in the 1960s and 1980s, things are different this time. Of course they are. Thanks to Moore's Law, transistor density continues to increase while storage costs continue to drop. But that, by itself, is not enough to ensure success.

Sign up for a free trial to build and populate a data lake in the cloud

This time around, distributed computing has also come a long way, with new computing frameworks such as Hadoop, Spark, and TensorFlow being developed all the time, both in academia and at large corporations. These advancements have enabled us to build systems that are far more powerful than anything that can be achieved with a single machine containing the most powerful processor available, which is what was attempted in the 60s. This lets us do more with our big data than we’ve ever been able to before. But there are two other advances that are playing a huge role in this revolution:
Open-source software
Cloud computing

Data is Fueling the Future

Explosive digitization of the physical world is creating an unprecedented amount of data. There’s a deluge of data that needs to get stored and processed. This data comes from:
Online social networks
User-generated content
Mobile computing
Embedded sensors in everyday objects
Automation of routine tasks
And much more
Analyzing and processing this data is what makes the algorithms smart and efficient, and those algorithms then get applied to other areas of our lives. The result is a virtuous, self-reinforcing cycle. According to some estimates, two-thirds of all digital content is user generated and has been created in the last few years. According to Intel, autonomous cars will generate 4,000 GB of data per car per day. Soon machines will produce and consume more data than humans ever could. This is the foundation of a smarter, better, and more efficient planet. If algorithms are the engine that is going to drive us to a better future, then data is the gas in the tank. But how are companies using their gas?

Challenges to Utilizing Big Data

Traditionally, companies have preferred to build out their own server farms and deploy, run, and manage systems themselves. But as data volumes grow, and as extracting value from these massive data sets comes to involve complex and sophisticated machine learning and AI algorithms, maintaining such deployments becomes more challenging in terms of both operations and total cost of ownership (TCO). In addition to the TCO, there are challenges with agility and flexibility. From a hardware perspective, there is the sunk cost of buying machines and provisioning for peak load, which hurts utilization. Long procurement cycles mean that growth predictions have to be accurate, with no room for error. This limits the elasticity of the infrastructure and curbs experimentation and ad hoc applications.
In short, here are some major challenges with the traditional model: Low hardware utilization Lack of multi-tenancy support No self-serve model Slow onboarding new applications/users Low bandwidth network High OPEX Lack of big data skills and expertise The Answer: Oracle Big Data Cloud For organizations managing this growing volume of data and trying to gain insights/value, the right answer is turning to public cloud computing using open source software (OSS). However, in certain cases, due to reasons such as organizational concerns, security issues, regulations in industries, or sovereignty rules such as the EU’s GDPR, not all big data deployments can move to the public cloud as is. Hence, the power, value, and flexibility of Oracle’s Big Data Cloud, which is a modern platform for big data management with support for modern as well as traditional frameworks for: Analytics Business intelligence Machine learning Internet of Things (IoT) Artificial intelligence (AI) This is the only PaaS service of its kind that addresses the scenarios mentioned previously through two very special offerings: Big Data Cloud: The most comprehensive, secure, performant, scalable, and feature-rich public cloud service for big data in the market today. And we have only gotten started building out this platform so expect more goodness down the road. Cloud At Customer: For customers who cannot move to the public cloud, Oracle Cloud Machine can bring the public cloud to their own data center and provide the same benefits including having Oracle manage the cloud machine. This is a unique service that no other cloud provider offers. Oracle Big Data Cloud brings the best of open source software to an easy-to-use and secure environment that is seamlessly integrated with other Oracle PaaS and IaaS services. Customers can get started in no time and do not require in-house expertise to put together a solution stack by assembling open source components and other third-party solutions. 7 Key Features of Oracle Big Data Cloud: 1. Advanced Storage: Build a data lake with all your data in one centralized store with advanced storage options. Smart caching allows for extreme performance. Provide your entire organization with access to all of the data sets in a secure and centralized environment. There is built-in data lineage and governance support. It is the easiest way to scale out storage independent of compute clusters. 2. Advanced Compute: Spin up or down compute clusters (Apache Hadoop, Apache Spark, or any analytic stack) within minutes. Auto-scale your clusters based on triggers or metrics. Use GPUs for deep learning. 3. Built-in ML and AI Tools: Data science tools such as Zeppelin come with the service to enable scientists to experiment and explore data sets. As mentioned earlier, there are compute shapes available with full GPU support for advanced algorithms and training in deep learning. A diverse catalog of machine learning libraries such as OpenCV, Scikit, Pandas, etc. makes it easy to build your next intelligent product. 4. Strong Security: Oracle Identity Cloud Service provides a way to allow granular access on a per-user basis and there are audit facilities built in as well. There is full encryption support for data-in-motion and data-at-rest. Sophisticated SDN allows customers to define their own network segments with advanced capability such as custom VPN, whitelisted IP, etc. 5. 
Integrated IaaS and PaaS Experience: Easy access to other Oracle Cloud Application Development services such as Oracle Event Hub Cloud Service, Oracle Analytics Cloud, Oracle MySQL Cloud, etc. Customers also have the option of using Oracle Cloud Infrastructure to back up Oracle Storage Cloud or create private VPNs to connect on-premise applications with services running in Oracle Public Cloud. 6. Fully Automated: The entire lifecycle of your infrastructure is automated. Our goal is to help you focus on the real differentiator, your data and your application. The platform will take care of all the undifferentiated work of provisioning, managing, patching, etc., so you can focus on your business. 7. World-Class Support: With an integrated approach, Oracle provides a one-stop shop for all things big data including support for Hadoop. Customers will not have to deal with multiple vendors to manage their stack. For more information on Oracle’s Cloud Platform – Big Data offerings, please visit the Oracle Big Data Cloud webpages. And for the most advanced public cloud service, you can visit the Oracle Cloud Platform pages. Or, sign up for an Oracle free trial to build and populate your own data lake. We have tutorials and guides to help you along.  Please leave a comment to let us know how we are doing.  
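To give a flavor of the notebook-style exploration described under the built-in ML and AI tools above, here is a small sketch in plain Python with Pandas. The file path and column names are hypothetical, and nothing here is specific to the Big Data Cloud APIs; it simply shows the kind of first-pass profiling a data scientist might run against a sample pulled from the data lake.

```python
# Notebook-style exploration sketch. The file path and column names are
# hypothetical; this is plain Pandas, not a Big Data Cloud-specific API.
import pandas as pd

# Load a sample of sensor readings that has already landed in the data lake.
readings = pd.read_parquet("/data/sensors/sample.parquet")  # hypothetical location

# Quick profiling before any modeling: size, missing values, basic statistics.
print(readings.shape)
print(readings.isna().sum())
print(readings.describe())

# A first pass at "which units run hottest?" using a simple groupby.
hottest = readings.groupby("unit_id")["temperature"].mean().sort_values(ascending=False)
print(hottest.head(10))
```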


Big Data

What Is Object Storage?

Hadoop was once the dominant choice for data lakes. But in today’s fast-moving world of technology, there’s already a new approach in town: the data lake based on Apache Spark clusters and object storage. In this article, we’ll take a deep dive into why that has happened, the history behind it, and why exactly the combination of Apache Spark and object storage is truly the better option now.

The Backdrop to the Rise of Object Storage With Apache Spark

A key big data and data-lake technology, Hadoop arose in the early 2000s and became massively popular in the last five years or so. In fact, because Oracle has always been committed to open source, our first big data project five or six years ago was based on Hadoop.

Try building a fully functioning data lake - free

Simply put, you can think of Hadoop as having two main capabilities: a distributed file system, HDFS, to persist data, and a processing framework, MapReduce, that enables you to process all that data in parallel. Increasingly, organizations started wanting to work with all of their data and not just some of it. As a result, Hadoop became popular because of its ability to store and process new data sources, including system logs, click streams, and sensor- and machine-generated data.

Around 2006 or 2007, this was a game changer. At that time, Hadoop made perfect sense for its primary design goal: enabling you to build an on-premises cluster with commodity hardware to store and process this new data cheaply. It was the right choice for the time - but it isn't the right choice today.

The Emergence of Spark

The good thing about open source is that it’s always evolving. The bad thing about open source is that it’s always evolving too. What I mean is that there’s a bit of a game of catch-up you have to play as the newest, biggest, best projects come rolling out. So let’s take a look at what’s happening now. Over the last few years, a newer framework than MapReduce emerged: Apache Spark. Conceptually, it’s similar to MapReduce, but the key difference is that it’s optimized to work with data in memory rather than on disk. And this, of course, means that algorithms run on Spark will be faster, often dramatically so. In fact, if you’re starting a new big data project today and don’t have a compelling requirement to interoperate with legacy Hadoop or MapReduce applications, then you should be using Spark. You’ll still need to persist the data, and since Spark has been bundled with many Hadoop distributions, most on-premises clusters have used HDFS. That works, but with the rise of the cloud, there’s a better approach to persisting your data: object storage.

What Is Object Storage?

Object storage differs from file storage and block storage in that it keeps each piece of data as a self-contained “object” rather than as blocks assembled into a file. Metadata is associated with each object, which eliminates the need for the hierarchical structure used in file storage—there is no limit to the amount of metadata that can be attached. Everything is placed into a flat address space, which is easily scalable.

Object storage offers multiple advantages. Essentially, object storage performs very well for big content and high stream throughput. It allows data to be stored across multiple regions, it scales to petabytes and beyond, and it offers customizable metadata to aid in retrieving objects.
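To make the definition a bit more concrete, here is a small sketch of writing an object together with custom metadata through an S3-compatible API using the boto3 library. The endpoint, credentials, bucket, and metadata keys are all hypothetical; many object stores expose a similar S3-compatible interface.

```python
# Writing an object plus custom metadata via an S3-compatible API (boto3).
# Endpoint, credentials, bucket, key, and metadata values are hypothetical.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://objectstorage.example.com",  # hypothetical endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

with open("clickstream-2018-01-15.json", "rb") as f:
    s3.put_object(
        Bucket="datalake",
        Key="raw/clickstream/2018/01/15/part-0000.json",
        Body=f,
        # Arbitrary key/value metadata travels with the object itself,
        # so there is no directory hierarchy to maintain.
        Metadata={"source": "web-frontend", "schema-version": "3"},
    )
```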
Many companies, especially those running a private cloud environment, look at object stores as a long-term repository for massive amounts of unstructured data that needs to be kept for compliance reasons. But it’s not just compliance data. Companies use object storage to store photos on Facebook, songs on Spotify, and files in Dropbox. The factor that likely makes most people’s eyes light up is the cost. Bulk storage in an object store costs much less than the block storage you would need for HDFS. Depending on where you shop around, you can find that object storage costs about 1/3 to 1/5 as much as block storage (remember, HDFS requires block storage). This means that storing the same amount of data in HDFS could be three to five times as expensive as putting it in object storage. So, Spark is a faster framework than MapReduce, and object storage is cheaper than HDFS with its block storage requirement. But let’s stop looking at those two components in isolation and look at the new architecture as a whole.

The Benefits of Combining Object Storage and Spark

What we especially recommend is building a data lake in the cloud based on object storage and Spark. This combination is faster, more flexible, and lower cost than a Hadoop-based data lake. Let's explain this more. Combining object storage in the cloud with Spark is more elastic than your typical Hadoop/MapReduce configuration. If you’ve ever tried to add or remove nodes in a Hadoop cluster, you’ll know what I mean. It can be done, but it’s not easy, while that same task is trivial in the cloud. But there’s another aspect of elasticity. With Hadoop, if you want more storage, you add more nodes (which come with compute). So if you need more storage, you’re going to get more compute whether you need it or not. With the object storage architecture, it’s different. If you need more compute, you can spin up a new Spark cluster and leave your storage alone. If you’ve just acquired many terabytes of new data, you can simply expand your object storage. In the cloud, compute and storage aren’t just elastic; they’re independently elastic. And that’s good, because your needs for compute and storage vary independently too.

What Can You Gain With Object Storage and Spark?

1. Object Storage + Spark = Business Agility

All of this means that your performance can improve. You can spin up many different compute clusters according to your needs: a cluster with lots of RAM, heavy-duty general-purpose compute, or GPUs for machine learning. You can do all of this as needed and all at the same time. By tailoring your cluster to your compute needs, you can get results more quickly. When you’re not using the cluster, you can turn it off so you’re not paying for it. Object storage becomes the persistent repository for the data in your data lake. In the cloud, you only pay for the amount of data you have stored, and you can add or remove data whenever you want. The practical effect of this newfound flexibility in allocating and using resources is greater agility for the business. When a new requirement arises, you can spin up an independent cluster to meet that need. If another department wants to make use of your data, that’s also possible, because all of those clusters are independent.

2. Object Storage + Spark = Stability and Reliability

There’s a joke doing the rounds that while some people are successful with Hadoop, nobody is happy with it.
In part that’s because operating a stable and reliable Hadoop cluster over an extended period of time delivers more than its share of frustration. If you have an on-premise solution, upgrading your cluster typically means taking the whole cluster down and upgrading everything before bringing it up again. But doing so means you’re without access to that cluster while that’s happening, which could be a very long time if you run into difficulties. And when you bring it back up again, you might find new issues. Rolling upgrades (node by node) are possible, but they’re still a very painful and difficult process. It's not widely recommended. And it's certainly not for the faint of heart. And it’s not just upgrades and patches. Just running and tuning a Hadoop cluster potentially involves adjusting as many as 500 different parameters. One way to address this kind of problem is through automation. Indeed, Oracle has taken this path with the Oracle Big Data Appliance. But the cloud gives you another option. Fully managed Spark and object storage services can do all that work for you. Backup, replication, patching, upgrades, tuning, all outsourced. In the cloud, the responsibility for stability and reliability is shifted from your IT department to the cloud vendor. 3. Object Storage + Spark = Lowered Total Cost of Ownership Shifting the work of managing your object storage/Spark configuration to the cloud has another advantage too. You're essentially outsourcing the annoying part of the work that no one else wants to do anyway. It's a way to keep your employees engaged and working on exciting projects while saving on costs and contributing to a lowered TCO.  The lower TCO for this new architecture is about more than reducing labor costs, important though that is. Remember that object storage is cheaper than the block storage required by HDFS. Independent elastic scaling really does mean paying for what you use. Conclusion We boil down the advantages of this new data lake architecture built on object storage and Spark to three:  Increased business agility More stability and reliability Lowered total cost of ownership All of this is incredibly significant. So if you'd like to try building a fully functioning data lake with this new data lake architecture on Oracle Big Data Cloud, give the free trial a spin. We provide a step-by-step guide to walk you through it. Or, contact us if you have any questions, and we'll be happy to help.
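To close, here is a rough sketch of what the Spark-plus-object-storage pattern looks like in code. The bucket name, path, and the assumption of an s3a-style connector are hypothetical; in a managed cloud service the storage configuration would typically be wired up for you.

```python
# Spark over object storage: a minimal sketch. The bucket, path, and the
# assumption of an s3a-compatible endpoint are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("datalake-query").getOrCreate()

# The data lives in the object store; this cluster is just borrowed compute.
events = spark.read.parquet("s3a://datalake/raw/clickstream/2018/")

# A typical exploratory aggregation, executed in memory across the cluster.
daily = events.groupBy("event_date").count().orderBy("event_date")
daily.show()

# When the job is done, the cluster can be torn down; the data stays put.
spark.stop()
```

Because the storage layer is untouched, a second team could point its own cluster at the same bucket without coordinating with the first, which is independent elasticity in practice.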


Analytics

Big Data: The Future Is Cloud

Big data technologies have it rough. MapReduce may have been the favorite child for a few years—but Apache Spark has been rising rapidly. This is how it is with big data. The technology changes rapidly, new projects usurp old ones—and that’s what makes it so exciting. Today, we’re going to talk about the trends that drive the big data and cloud convergence, and what’s significant about it. Download your free ebook on creating an effective cloud strategy But first, we’re going to look at some of the industry trends that are making big data truly viable, important and central to many organizations. Some History for the Big Data/Cloud Convergence Some history: SQL relational data has been fundamental to computing within the business context for the last 30 years. These transactional systems have powered enterprise data management and e-commerce, and really were the initial engines for the internet era. This remains a key part of the infrastructure for data management today. But compute technology has expanded. And as more devices and more Internet services have become available, new types of data have been generated – the multi-source, multi-structured data from machine sensors, connected devices, clickstreams, system logs, etc. You know, it’s the data we all love that doesn’t fit well into the relational paradigm and scales out in incredible ways, going from terabytes of data to petabytes. It’s the data CEOs point to and ask, “Why aren’t you doing anything with this?” while you’re thinking, “Easier said than done.” It’s this new type of data, and these new sources of data that drove the popularity of NoSQL databases and Hadoop over the past 10 years. To get value for the business, these traditional data sources and these new sources of data need to be brought together for more kinds of insights, so shiny technologies like machine learning and artificial intelligence can use them together. The Future of Big Data Lies in the Cloud So here’s the next generation of big data technology: it should be possible to manage both traditional and new data sets together on a single cloud platform. This allows you to use the data storage, the object store that’s native to the cloud infrastructure, and the compute capabilities to the cloud infrastructure, out of the box. No more setting up and managing Hadoop clusters, no more provisioning hardware. This is big. It’s a paradigm shift in how you think about data management because now, the cloud is the data platform. It also enables you to allow any user to work with any kind of data quickly, securely and efficiently in a way that fits your immediate business needs. So, what do you need to make this happen? Integrate, Manage, and Analyze Your Big Data You have all these data sets that are being generated in data sources across the business landscape, across the Internet landscape. The first thing you need to do is integrate them and bring them into your system. The second thing you need to do, at a high level, is manage them. You need to have a place to store them. And third, you absolutely need analytics. You need high-powered analytics that allow you to understand the data, visualize the data, make sense of the data, and then build proactive models based on machine learning that allow you to get ahead of the business requirements and interact with data sets as events are happening in real time. Next I’m going to drill down into each of these areas to give you a view of what’s needed in each of these areas. 
First, I’m going to talk about big data integration.

Big Data Integration

Data integration has always been important, whether with traditional databases or with data warehouses. It’s just as important with big data today, but it’s more complicated than ever, with more data sources, types, problems, and frameworks. You’ve always had data integration; now you have to make it work with big data. You need to be able to:
Touch the data as it’s being generated
Bring the data into the system through event streams
Process the data as it arrives
Make sure the data is formatted and available in a form that can be consumed immediately, so you can get analytic value from the data sets
One of the problems you don't want as you're working with large data sets is poor data quality. When you're bringing data in, you want assurance that what you're working on is meaningful, so that as you start to apply machine learning algorithms, for example, you have confidence in the answers you're getting because you have confidence in the data. As a baseline requirement, you need to be able to bring the data in and transform it. You also need to be able to work with streaming data sets and non-relational data sets, and to work with both in a way that guarantees the overall data quality in the system. And that’s why you need powerful data integration.

Big Data Management

After the hard work of data integration is done, you need to be able to manage the data. You need to be able to put it somewhere and keep it secure, while making it available to those authorized to use it. The new data lake paradigm is really built on the cloud object store. You can store any kind of data in the object store, in any form you want, and you can bring whatever processing engines you need to those data sets, on demand. This is a key evolution in big data architecture as we know it today. I’ll explain why it’s significant. If you’re familiar with big data platforms that have been deployed in the past three to five years, often people had to go out, provision hardware, fix capacity, and deploy a Hadoop platform, and all along they were constrained by the capabilities of the Hadoop platform vendor they were using. Cloud infrastructure, by contrast, allows you to deal with your compute requirements by spinning resources up and down automatically. You don’t have to handle upgrades. You don’t have to worry about capacity planning. If your central data lake technology is based in the object store, you can push out to alternative storage systems, like relational databases or NoSQL stores, as needed. After the data is stored and available in the data lake, you can process it with various open source technologies. But that’s not the most exciting part. Hadoop became popular because of its storage capability and its compute capability with MapReduce. But for Hadoop, storage and compute are inextricably tied together when it comes to scaling up and down. If you need more compute capability, you have to pay for more bulk storage too, and vice versa. Today’s modern data lake architecture, which is only possible in the cloud, has Apache Spark as its processing framework and object storage as its bulk storage. This is big, because both can scale elastically and, most importantly, they can scale independently of each other. This means freedom from the necessity of scaling both whether it was truly needed or not.
As another benefit, object storage is cheaper and more flexible than HDFS, which relies on block storage. In fact, block storage can often be two to five times more expensive than object storage. With object storage in the cloud, you can bring the compute to the data when you need it. And when you’re done with the workloads and the processing required on that particular compute cluster, you can spin it down, which helps you keep costs under control. Elasticity is a native feature of the cloud, and it shifts the way you think about provisioning and the need to plan for capacity. It removes many of the constraints and shackles that have been in place for existing big data systems. Essentially, a data lake built in the cloud is more cost effective, faster, and more flexible.

Big Data Analytics

Having more data gives you the potential to understand your customers better and tackle the problems you’re trying to solve. But you still have to discover which questions can be answered. Existing analytics tools are being enhanced to help you understand the new kinds of data sets you’re collecting. Visualization falls into that category—it enables you to explore the format of your data, transform it, tweak it, and better prepare it. And machine learning is a buzzword right now, sure, but it’s such a big buzzword because of what it can accomplish. You can take your big data and train models on it, and get better results because you have so much data to feed them. But machine learning can also be used to improve the analytic tools themselves, so you can uncover things about your data that you haven’t been able to uncover before, which is truly exciting. You can use machine learning to examine your data and automatically suggest useful visualizations and ways to think about and explore it. And, much like the recommendation engines on e-commerce sites that suggest other items you might be interested in, machine learning can discover patterns in how the data itself is used, so business users can get real-time recommendations about issues they might want to know about. For example, the system might automatically send a sales executive the probability of achieving a sales target based on a deal that just closed, because it has learned that this is the kind of information he or she is interested in.

Three Key Takeaways on Big Data

If you’ve made it this far into the article, congratulations! But even if you don’t remember anything else, I would like you to remember three things:
There really is a generational shift in how we’re thinking about big data processing. The big data platform of the future is highly performant, scalable, elastic—and in the cloud. You really don’t need to stand up and maintain your own big data infrastructure anymore, since all the capabilities you need are available in the cloud today.
You need a complete big data platform to help you with this, all the way from ingest and integration capabilities through to analytics. All of these should work together end to end in an integrated fashion on cloud infrastructure, using a next-generation data lake architecture.
AI and machine learning may be hyped, but they’re not just a flash in the pan. These capabilities are available today. They’re performant, and if you choose the right platform, they’re easy to start taking advantage of today.
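Picking up the data quality point from the integration section above, here is a minimal sketch of the kind of gate you might place between raw ingest and the curated zone of a data lake. The paths, the expected fields, and the s3a-style addressing are hypothetical; the checks themselves are ordinary PySpark.

```python
# A basic data-quality gate on newly ingested events. Paths and the expected
# schema are hypothetical; the checks are generic PySpark.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq-gate").getOrCreate()

raw = spark.read.json("s3a://datalake/landing/events/2018-01-15/")

# Reject records that are unusable for downstream analytics.
clean = raw.filter(
    F.col("customer_id").isNotNull()
    & F.col("event_time").isNotNull()
    & (F.col("amount") >= 0)
)

kept, total = clean.count(), raw.count()
print(f"kept {kept} of {total} records ({total - kept} rejected)")

# Persist the curated slice back to the object store for the analytics layer.
clean.write.mode("overwrite").parquet("s3a://datalake/curated/events/2018-01-15/")
```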
If you're interested in making your cloud strategy for big data more effective, download your free Forrester white paper today, "Going Big Data? You Need a Cloud Strategy." Or, try building a data lake for free with an Oracle trial.


Machine Learning

A Simple Guide to Oracle’s Machine Learning and Advanced Analytics

Many times I'm asked for more information on how to get started with Oracle’s Machine Learning and Advanced Analytics. I put together this simple guide of the most popular and useful, in my opinion, links to product Information and getting started links and resources including: Oracle Machine Learning Zeppelin based SQL notebooks, included in the Oracle Autonomous Data Warehouse Cloud (ADWC) Oracle Advanced Analytics Database Option (OAA), included in Oracle Database Cloud High and Extreme Editions Oracle Data Mining (SQL API Machine Learning functions) Oracle Data Miner "workflow" UI (for Citizen Data Scientists) SQL Developer extension Oracle R Enterprise (R API to ODM SQL ML functions, R to SQL "push down" and R integration) Oracle R Advanced Analytics for Hadoop (ORAAH) (part of the Big Data Connectors)  OOW'17 Oracle's Machine Learning & Advanced Analytics Presentations      Oracle Advanced Analytics Overview Information  Oracle's Machine Learning and Advanced Analytics 12.2c and Oracle Data Miner 4.2 New Features presentation Oracle Advanced Analytics Public Customer References Oracle’s Machine Learning and Advanced Analytics Data Management Platforms:  Move the Algorithms;  Not the Data white paper on OTN   YouTube recorded Oracle Advanced Analytics Presentations and Demos, White Papers  Oracle's Machine Learning & Advanced Analytics 12.2 & Oracle Data Miner 17.2 New Features YouTube video Library of YouTube Movies on Oracle Advanced Analytics, Data Mining, Machine Learning (7+ “live” Demos e.g.  Oracle Data Miner 4.0 New Features, Retail, Fraud, Loyalty, Overview, etc.) Overview YouTube video of Oracle’s Advanced Analytics and Machine Learning   Getting Started/Training/Tutorials Link to OAA/Oracle Data Miner Workflow GUI Online (free) Tutorial Series on OTN Link to OAA/Oracle R Enterprise (free) Tutorial Series on OTN Link to Try the Oracle Cloud Now!   Link to Getting Started w/ ODM blog entry Link to New OAA/Oracle Data Mining 2-Day Instructor Led Oracle University course.  Oracle Data Mining Sample Code Examples ORAAH Online Training      Additional Resources, Documentation & OTN Discussion Forums Oracle Advanced Analytics Option on OTN page OAA/Oracle Data Mining on OTN page, ODM Documentation & ODM Blog OAA/Oracle R Enterprise page on OTN page, ORE Documentation & ORE Blog Oracle SQL based Basic Statistical functions on OTN Oracle R Advanced Analytics for Hadoop (ORAAH) on OTN   Analytics and Data Summit 2018, March 20-22, 2018, at Oracle HQ in Redwood Shores, CA. All Analytics. All Data. No Nonsense.  User Conference March 20 - 22, 2018, Redwood Shores, CA Hope this helps! Charlie Charlie Berger | Sr. Director Product Management, Machine Learning, AI and Cognitive Analytics, Oracle Corporation Phone: +7817440324 | Mobile: +6033204560 | 10 Van de Graaff Drive | Burlington, MA 01803 LinkedIn:  www.linkedin.com/in/CharlieDataMine Oracle Machine Learning and Advanced Analytics on OTN Oracle Big Data Blog, Oracle Data Mining Blog, Twitter:  CharlieDataMine Oracle Advanced Analytics internal PM Beehive Workspace Analytics and Data Summit 2018  All Analytics.  All Data.  No Nonsense.  User Conference, Mar 20-22, 2018 - Join us!


Cloud

Demystifying Machine Learning: An Overview

Have you ever had a credit card transaction declined when it shouldn’t have been? Or been on the receiving end of a personalized email or web ad? Have you ever noticed a site giving you recommendations for things you might be interested in when you're shopping online? And my last example: have you ever had an offer from a company designed to stop you from leaving them as a customer? If any of these things have happened to you, then you’ve probably been on the receiving end of a machine learning algorithm, employed by a company you do business with (or in some cases, have merely considered doing business with).

An Overview of Machine Learning

We’re going to take you behind the scenes and give you a layman’s view of machine learning so you can see what kinds of problems it can solve. If you’re a data scientist, then you might be more interested in this big data journey about accelerating data science, which is more detailed. But this article is designed for technical people who hear the buzzword, who know that it's something important, but don't really know what it is or what it can do. You'll get just enough information to make you dangerous.

Download your free ebook, "Demystifying Machine Learning."

What Is Machine Learning?

A McKinsey article describes machine learning as "...based on algorithms that can learn from data without relying on rules-based programming". Put another way: with big data, you've got a lot of data, and determining what to do with it and figuring out what it’s telling you isn’t easy. So you can understand the appeal of machine learning, which basically lets you take processing power and the right algorithm and tell them to figure things out for you. The analogy is how we learn as human beings, experiencing the world around us and working things out for ourselves. When I taught my kids how to ride a bike, I didn't give them "The Rules of Bike Riding". I put them on a bike, held onto them, and let them work it out. They took data inputs from their eyes, their ears, and, on one occasion, a large bush, and started to discover what would keep the bike upright. So it is with machine learning. Take the data, work with it, and see what comes out.

Uses of Machine Learning

Suppose you've been tasked with finding out more about your customer base. Picture a diverse and pretty happy lot. But what else do you know? Well, a simple query in your database might reveal things like age, gender, or how they like to be contacted (mail, email, phone, text). You could run a query with some analytics to calculate, say, RFM, a measure of customer value based on how recently, how often, and how much customers spend. You can see who is more valuable to you, but you wouldn't really know what to do with that data. Machine learning algorithms could do much more. For example, you could group your customers into segments that show similar behavior, or figure out how likely they are to purchase a given new product of yours. Imagine your customers fall into five behavioral segments: "retired cosmopolitan", "affluent executive", "new home mom", "young, successful startup", and "executive product collector". And with your data, you might even know their likelihood of purchasing a product that you're promoting. Now you have something potentially more powerful.
Armed with this information, you can:
Tailor your marketing campaigns
Use different language for those different groups
Prioritize campaigns
Market that product only to the subset of customers likely to buy it
Machine learning gives you much more insight into your customers and, perhaps most importantly, it can predict what they might do or respond to. These days, we’re starting to take deep learning to another level and use it to solve real business problems—which is very new and exciting. But how does machine learning actually do that? Read the next two parts of this mini-series to learn more: I cover regression, classification, clustering, and anomaly detection in the first, and market basket analysis, time series data, and neural networks in the second. If you'd like to experiment with machine learning techniques in a data lake, you can get started right now with an Oracle free trial.
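For readers who want to see what that segmentation step can look like in code, here is a minimal sketch that clusters customers on RFM-style features with scikit-learn. The input file, its columns, and the choice of five clusters are illustrative assumptions; the friendly segment names above still come from a human interpreting the resulting groups.

```python
# Behavioral segmentation sketch: cluster customers on RFM-style features.
# The CSV, its columns, and k=5 are illustrative assumptions.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

customers = pd.read_csv("customer_rfm.csv")  # hypothetical extract
rfm = customers[["recency_days", "frequency", "monetary"]]

# Put the three features on a comparable scale before clustering.
scaled = StandardScaler().fit_transform(rfm)

# Five clusters to mirror the five behavioral segments described above.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
customers["segment"] = kmeans.fit_predict(scaled)

# Inspect each segment's average behavior to give it a business-friendly name.
print(customers.groupby("segment")[["recency_days", "frequency", "monetary"]].mean())
```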


5 Things at OpenWorld That Made Me Rethink Oracle

If I had to describe my first Oracle OpenWorld in two words, they would be disruptive innovation. I’m not going to lie. When I first joined Oracle, I thought of it as a 40-year-old database company with an interesting past, but I didn’t have a clear view of its future. But that view has firmly cleared after attending Oracle OpenWorld 2017, my first OpenWorld event. I saw first-hand that Oracle is building cutting-edge, transformative technology into every layer of its product portfolio stack: IaaS (Infrastructure-as-a-Service), PaaS (Platform-as-a-Service), SaaS (Software-as-a-Service), and now DaaS (Data-as-a-Service). What’s more, Oracle has designed its technology to help customers personalize their paths to the cloud through one seamlessly integrated platform that helps users operate critical business applications, all from a single vendor. The result is an IT environment that efficiently and cost-effectively drives business transformation—one that is both secure and scalable. Most importantly, Oracle is taking the lead to help customers unleash their innovation. One way that Oracle is doing this is through automation, which was a pervasive topic at OpenWorld. Attendees wanted to know how automation is going to help them, and they got lots of input! On a related note, Mark Hurd’s keynote at Oracle OpenWorld helped validate how Oracle was unleashing innovation in key ways. “Eighty percent is generous — 85% of IT budgets is spent on just basically keeping the existing systems running, so [there is] very little innovation [budget left for customers],” explained CEO Mark Hurd in his keynote address. “Our objective has been to have the most complete suite of SaaS applications, the most complete suite of PaaS services, and the next generation of Infrastructure-as-a-Service that all work together to complement each other. That’s what we’ve built out, and that’s what we now have.” In other words, Oracle is helping their customers say bye to complexity and hi to simplicity. All of the keynotes energized me, but I also attended various sessions and had the privilege of speaking to many customers and partners around the show floor. They shared specifically why and how Oracle is revolutionizing their businesses. Here are my top five highlights from the event: 1. Keynote by Cloud Business EVP Dave Donatelli During his keynote, “Oracle's Integrated Cloud Platform, Intelligent Cloud Applications, and Emerging Technologies for Business,” Dave shared how Oracle offers a number of options to help organizations evolve their cloud strategies on their own terms. Many of the customers I chatted with shared their journeys, referencing Dave’s six journeys-to-the-cloud model: Optimize your data center: Oracle Engineered Systems is powered by Exadata X-7, which is available as the traditional on-premises machine, as an Oracle public cloud machine placed behind your firewall, or as an Oracle cloud service deployed in our cloud. Cloud at Customer: In this path, the same hardware and software is used in the Oracle Public Cloud but located in your data center and offered as a cloud-like subscription model with no maintenance requirements. Oracle Cloud Infrastructure (IaaS): This is a complete layer of compute, storage, and networking. An important part of what happens here is data management and processing with tools such Oracle’s Big Data Appliance. 
Create a new cloud with Oracle PaaS and IaaS: Oracle makes it possible for developers to create applications with new technologies such as blockchain and machine learning, and deploy those applications wherever they like. Transform operations with SaaS: Oracle has written all of our SaaS products to have a common data model, which makes it easier to add business processes to your cloud environment on your schedule and without being locked into inflexible packages. It’s also easy to start with Oracle’s “vanilla” SaaS solution and easily customize it to add automation like bot-powered voice commands or augmented-reality for training. Build a “born-in-the-cloud” business: This isn’t just for start-ups. Oracle customers are launching new divisions and units as cloud-first businesses to find the speed, agility, and growth trajectory they need to be successful. Dave also covered the critical importance of the DaaS layer in enterprise technology for targeting and personalizing messaging and offerings to customers, and then measuring results to adjust for even more effective marketing. This is something many companies are doing to grow their revenue and market share, he said. 2. Customer Validation of Oracle Innovation I was struck by the stories of our users. They made it clear that Oracle innovations are making them more successful—for example, Oracle Exadata Database Machine X7’s ability to automatically transform table data into in-memory DB columnar data in Flash cache, enables enterprise research and other big data processing needs at scale. Some of my tweets from the event summarize their stories: Oracle’s range of #PaaS and #IaaS services are enabling researchers to do enterprise research at scale” -@CERN #OOW17 #CloudReady RECVUE is powered by @OracleBigData to help process over 50 million transactions/day for revenue & billing management. #OOW17 #CloudReady I had a chance to hear from Securex, Cloud Architect Winner of the Year at Oracle’s Global Leaders program, as they shared the following: “Today IT must deliver the capabilities for the business to drive agile data analysis and BI in a self-service manner. So we turned to Oracle and the cloud.” I also talked to a lot of customers who were concerned about scaling security. The big announcement of Oracle’s Autonomous Database and Highly Automated Cyber Security, and how they work together to secure data faster and better than any alternatives, caused a lot of excitement. “Oracle has helped make big data a driver of business success,” said Luis Esteban from CaixaBank, winner of the Oracle Innovation and Cloud Ready Infrastructure award. “We now drive better quality, customer knowledge, sales, and mitigate fraud threats.” 3. Big Data, Machine Learning, and Cloud Strategy, Oh My! Big data continues to be top of mind for so many businesses. I attended some key sessions intended to highlight the opportunities of big data and provide practical solutions to common challenges. General Session on Big Data Strategy Of course, many people were looking for ways to glean more value from the volumes of data their organizations collect. Today, successful big data projects are enabling more than 50% of organizations to see increases in revenue or reductions in cost. The big takeaway from this session was that Oracle’s Big Data cloud offerings can scale on-demand so they shift how an enterprise plans for capacity and analysis. 
Big Data and Machine Learning and the Cloud The lasting impression I have from this session is that it drew back the curtain on the technology that enables key business use cases for big data: innovation, customer insight, operational efficiency, fraud, risk, and compliance. It examined real-world examples of each use case and the technical architecture that supports them. For example, Oracle Big Data Manager is a new feature of the Oracle Big Data offering that uses machine learning to help users identify their most profitable sales opportunities, customers, markets, etc. 4. Feet on the Street: The Show Floor What would a live event be without selfies? I was privileged to tour the show floor and capture some pictures with folks interested in sharing why they came to Oracle OpenWorld: Vital from Charter Communications shares why he’s at #OOW17: “To see how I can leverage #ML, #AI capabilities in @Oracle’s cloud offerings so I can drive more efficiency and worker productivity for the team that I manage.” Why do you love @Oracle? “We have 12.5 million employees so PeopleSoft has been incredibly helpful!” #OOW17 “I’m most impressed with how @Oracle is integrating analytics tools into a single Oracle analytics cloud.” - Fors partner #oow17 #CloudReady   5. I Had Fun! Not only did I see just how powerful Oracle technology is for helping organizations modernize, innovate, and compete in a digital world, but I had fun! I will always remember these words from Larry Ellison during one of the keynote sessions. They represent for me the disruptive innovation I saw in action, and how Oracle is making that disruptive innovation work for customers: "We unify the data, we analyze the data, and we automatically detect and protect your data—all in one unified system." Find out more about these big data innovations, such as the new release of Big Data Appliance X-7 discussed in this blog "Announcing: Big Data Appliance X7-2 - More Power, More Capacity." If you’d like to catch up on the Oracle OpenWorld 2017 keynotes you may have missed, visit https://www.oracle.com/openworld/on-demand.html.


Analytics

So Much Data, So Little Hassle: Building an Infrastructure to Tame Your Data

The promise of big data is essentially unlimited. Organizations across the globe are just scratching the surface of vast data mines to reveal new insights and opportunities. At the same time, distributed data storage and processing power in the cloud means lower cost and more linear scalability to meet needs. But with great promise comes great complexity. Enterprises planning a data management infrastructure to access and analyze big data face challenges that include disparate data access, security and data governance issues, and an IT skills gap. What’s needed is a single view into your data so that data scientists can spend more time analyzing it and less time merging it all together. The solution lies in integrating a unified query system into your streamlined infrastructure that automatically handles processing and joining data behind the scenes to present a clear picture of the data—without hunting across silos or working with multiple APIs and query languages. For example, Oracle Big Data SQL lets data scientists use familiar SQL queries to mine data across Hadoop, NoSQL, and Oracle Database quickly and seamlessly—essentially making big data as manageable as small data. Here’s how three Oracle customers are using innovative infrastructure solutions to tame and draw value from their data: CERN: Visualizing Scientific Discovery CERN’s Large Hadron Collider (LHC) is the world’s largest and most powerful particle accelerator, with 50,000 sensors and other metering devices generating more than 30 petabytes of data annually. This information tsunami is taxing the 250 petabytes of disk storage space and 200,000 computing cores in CERN’s data centers—a problem exacerbated by an essentially flat IT budget. At the same time, research scientists must extract and interpret data from the Hadoop platform, typically without the specialized technical skills such queries require. CERN is using the visualization tools in Oracle Big Data Discovery to transform raw data into insight—without the need to learn complex tools or rely only on highly specialized resources. They use this data to ensure that CERN’s accelerators are operating at their full potential and, if not, to identify what’s required to return them to capacity. Institut Català de la Salut: Dashboards to Drive Better Healthcare With almost 40,000 employees, Institut Català de la Salut is the largest public healthcare provider in the Catalonia region of Spain. In addition to providing care to more than 6 million citizens at hospitals and walk-in clinics across the region, Institut Català conducts research and trains specialists and students. As part of its digital transformation, the organization implemented a high-performance database solution to house and manage vast amounts of strategic, tactical, and operating data on the healthcare services delivered at its network of facilities. Institut Català incorporated Oracle Exadata into its infrastructure to gain the processing power users needed to access real-time data for business intelligence dashboards and reports. Since then, Institut Català has been able to generate more complex data models and reporting than its previous architecture was able to support. The result? Deeper insights across its entire healthcare network, enabling more informed business decisions systemwide based on patient data, staff performance, and real-time inventory information. 
Procter & Gamble: End-to-End Visibility into Product Performance The consumer packaged goods giant Procter & Gamble may be 178 years old, but it has no intention of letting an outdated infrastructure hinder its data processing and analysis capabilities. P&G’s business teams needed access to a wide variety of big sources about its 66 brands in order to answer high-level questions (“Why is this happening?”) in real time. The company quickly realized that growing volumes of data from structured and unstructured sources would not fit neatly into canonical data models, nor was it willing to spend the vast sums needed to store it all. P&G concluded that it could benefit from a hybrid public-private cloud topology to exploit the flexibility, scale, and cost savings of the public cloud while managing the governance of certain data types with private cloud. P&G chose Oracle Big Data Appliance with Hadoop for its scalability, cost-effectiveness, and ability to handle both conventional and unconventional data sources, including market signals, item sales, market share, surveys, social, demographics, and weather, not to mention new sources that aren’t yet on its radar. In fact, the new solution exposed 150 terabytes of never-before-seen data that has given the company fresh insight into the marketplace. See the Value of an Unobstructed View Data can realize its full value only when it drives insight, and it can only do that when it converges into a single, clear view. If your data remains locked away on disparate platforms with no easy way to access it, you need an integrated infrastructure that can set it free. Learn more about how Oracle Engineered Systems can help you get a single view into all your data. Join Us at Oracle OpenWorld 2017, October 1-5, in San Francisco Don’t miss the excitement of Oracle OpenWorld 2017! Explore the many informative and practical sessions we have scheduled, and take advantage of some of these opportunities to learn more about Oracle’s Big Data offerings and its Engineered Systems: General Session: Oracle Big Data Strategy [GEN5453]  Big data is going mainstream. Today, successful big data projects are enabling more than 50 percent of organizations to see increases in revenue or reductions in cost. In this session explore big data opportunities, discuss what it takes to be successful, and learn about Oracle’s big data strategy and product family. Enterprise Research at Scale: CERN’s Experience with Oracle's Big Data Platform [CON1298] Oracle has been deeply involved with the research community for more than 25 years and continues to lead the industry. It also works to make sure it maintains focus on solving the real problems of customers that rely on Oracle technology, such as CERN. Recent advancements made in the deployment of high-performance computing infrastructure and advanced analytics solutions are focused on accelerating enterprise research at scale. Advancements in key technologies including big data, machine learning/AI, and IoT, coupled with a far more cost-effective and elastic cloud delivery model have radically changed what is possible in data-driven research. Attend this session to learn from CERN's experience with Oracle’s cloud and big data solutions. Oracle Data Visualization: Fast, Fluid, Visual Insights with Any Data [HOL7782] More and more organizations recognize the need to empower users with the ability to ask any analytics question with any data in a truly agile, self-service manner. 
In this session learn how to use Oracle Data Visualization to quickly discover analytics insights through visualizations that can be built against a variety of data sources. See how easy it is to compose visual stories to communicate findings, without the need for complex IT tools. Also check out these sessions: Extending Garanti Bank’s Data Management Platform with Oracle Big Data SQL [CON1962] Big Data and Machine Learning and the Cloud, Oh My! [CON5462]


Analytics

3 Companies Harness Big Data to Drive Big Returns

Executives must confront an unsettling reality in the era of digital business: despite growing volumes of data and increasingly sophisticated tools to analyze activities and events, unlocking answers to key questions remains daunting. The irony isn't lost on anyone. Yet becoming a real-time, data-driven company can prove elusive. Better decision-making and fully optimized processes don't just happen because an organization focuses on data or puts sophisticated tools in place. It's no longer acceptable to spend weeks or months poring over data to glean its significance. Customers, partners, and employees expect answers immediately—and they increasingly demand that the actions stemming from data are personalized and relevant. All this requires big data analytics capabilities that are only possible with a strategic, integrated IT infrastructure. Agility and flexibility are paramount. The organizations that do successfully mine their data see tremendous benefits—and competitive advantage. According to a study conducted by the Business Application Research Center (BARC), more than 40% of businesses globally are using data analytics to mine intelligence from the enormous amounts of data they can now collect. Overall, these businesses reported an 8% increase in revenue combined with a 10% reduction in costs. They cited four key areas where business sees the biggest benefits:
69%: better strategic decisions
54%: improved control of operational processes
52%: better understanding of customers
47%: cost reductions

Unlocking Results

One company that achieved best-practice results is Starwood Hotels and Resorts Worldwide. Data growth and demands grew too great for the company's legacy system, which couldn't deliver the information hotel managers and administrators required at the moment it was needed. In addition, obtaining a report from the central reservation system could take upwards of 18 hours, according to Richard Chung, director of data integration for Starwood. The company, which operates 1,200 properties in nearly 100 countries and has 200,000 employees, transitioned to Oracle Exadata Database Machine running on Oracle Linux. The result? The company can now complete extract, transform, and load (ETL) operations for business reports in 4 to 6 hours. Moreover, real-time data feeds, which were previously impossible, have led to process improvements as great as 288x. Starwood isn't alone. Icelandic IT services firm Advania figured out how to generate business and financial reports 70 percent faster while improving employee productivity. The company, with more than 10,000 clients scattered across 17 offices in Scandinavia, was previously limited by an old ERP system and legacy IT. However, by migrating to Oracle E-Business Suite and consolidating hardware platforms on Oracle SuperCluster running Oracle Solaris, with an Oracle database, the firm not only unleashed faster and better decision-making, it improved performance for customers. Retail giant 7-Eleven also transformed its data capabilities by revamping technology and processes. The company selected engineered platforms from Oracle, including Oracle Exadata Database Machine, Oracle Exalogic, and Oracle Fusion Middleware, to create a private cloud. This allows the company to connect with 8,500 locations and 8 million daily customers in real time. In fact, 7-Eleven can now personalize and contextualize interactions with customers.
"We can understand what people are buying and how their behaviors are being changed when they are given offers," said chief technology and digital officer Steve Holland.  Dialing Into Disruption  Here's what you should keep in mind: data—or more specifically how it's used—is increasingly a point of competitive differentiation. It separates the digital innovators and disruptors from everyone and everything else. Yet reporting tools, analytics software and even leading-edge artificial intelligence systems can't transform coal into diamonds. There's a need to construct a framework that can deliver real-time business insights along with a single view into the enterprise. Although much of the task involves rethinking and reinventing processes, something McKinsey & Company explores in an excellent article about how to develop a data-driven strategy, there's also a need for a fast, flexible and responsive IT platform—one that taps clouds, virtualization, and advanced technical capabilities—to slide the dial to real-time. For organizations that succeed in constructing a robust data-driven framework, one in which business leaders can view the right data in real time, the answers to pressing business questions become visible. More importantly, innovation and disruption become possible.   There's More to Learn About Driving Innovation Through Data. Join Us at Oracle OpenWorld 2017.   There’s a wealth of exciting and enlightening sessions at Oracle OpenWorld 2017. If you want to learn more about gaining a data-driven advantage, we think you’ll find these sessions especially worthwhile:   Accelerate Innovation with Oracle Cloud Platform Data Management [CON4804] Developing an Analytical Approach to Support Enterprise Transformation [CON7169] Data-Driven Insights: Exelon Utilities Data Analytics Platform’s Hybrid Cloud [CON6287]


CALL FOR SPEAKERS is Now Open for Oracle BIWA Summit '18 User Community Meeting in March 2018

BIWA Summit 2018
The Big Data + Cloud + Machine Learning + Spatial + Graph + Analytics + IoT Oracle User Conference, featuring Oracle Spatial and Graph Summit
March 20 - 22, 2018
Oracle Conference Center at Oracle Headquarters Campus, Redwood Shores, CA

Share your successes… We want to hear your story. Submit your proposal today for Oracle BIWA Summit 2018, featuring Oracle Spatial and Graph Summit, March 20 - 22, 2018, and share your successes with Oracle technology. The call for speakers is now open through December 3, 2017. Submit now for possible early acceptance and publication in Oracle BIWA Summit 2018 promotion materials. Click HERE to submit your abstract(s) for Oracle BIWA Summit 2018.

Oracle Spatial and Graph Summit will be held in partnership with BIWA Summit. BIWA Summits are organized and managed by the Oracle Business Intelligence, Data Warehousing and Analytics (BIWA) User Community and the Oracle Spatial and Graph SIG – a Special Interest Group in the Independent Oracle User Group (IOUG). BIWA Summits attract presentations and talks from the top Business Intelligence, Data Warehousing, Advanced Analytics, Spatial and Graph, and Big Data experts. The three-day BIWA Summit 2017 event featured keynotes by industry experts, educational sessions, hands-on labs, and networking events. Click HERE to see presentations and content from BIWA Summit 2017.

Call for Speakers DEADLINE is December 3, 2017 at midnight Pacific Time. Presentations and Hands-on Labs must be non-commercial. Sales promotions for products or services disguised as proposals will be eliminated. Speakers whose abstracts are accepted will be expected to submit their presentation as a PDF slide deck for posting on the BIWA Summit conference website. Accompanying technical and use case papers are encouraged, but not required. Complimentary registration to Oracle BIWA Summit 2018 is provided to the primary speaker of each accepted presentation. Note: Any additional co-presenters need to register for the event separately and pay the appropriate registration fees.

Please submit session proposals in one of the following areas:

Machine Learning
Analytics
Big Data
Data Warehousing and ETL
Cloud
Internet of Things
Spatial and Graph (Oracle Spatial and Graph Summit)
…Anything else "Cool" using Oracle technologies in "novel and interesting" ways

Proposals that cover multiple areas are acceptable and highly encouraged. On your submission, please indicate a primary track and any secondary tracks for consideration. The content committee strongly encourages technical/how-to sessions, strategic guidance sessions, and real-world customer end-user case studies, all using Oracle technologies. If you submitted a session last year, your login should carry over for 2018. We will be accepting abstracts on a rolling basis, so please submit your abstracts as soon as possible.
Learn from Industry Experts from Oracle, Partners, and Customers Come join hundreds of professionals with shared interests in the successful deployment of Oracle technology on premises, on Cloud, hybrid Cloud, and infrastructure: Cloud & Infrastructure Spatial & Graph Analytics Big Data & Machine Learning Internet of Things Database  Cloud Service Big Data Cloud Service Data Visualization Cloud Service Hadoop Spark Big Data Connectors (Hadoop & R) IaaS, PaaS, SaaS Spatial and Graph for Big Data and Database GIS and smart cities features Location intelligence Geocoding & routing Property graph DB Social network, fraud detection, deep learning graph analytics RDF graph Oracle Data Visualization Big Data Discovery OBIEE OBIA Applications Exalytics Real-Time Decisions Machine Learning Advanced Analytics Data Mining R Enterprise Fraud detection Text Mining SQL Patterns Clustering Market Basket Analysis Big Data Preparation Big Data from sensors Edge Analytics Industrial Internet IoT Cloud Monetizing IoT Security Standards   What To Expect 400+ Attendees | 90+ Speakers | Hands on Labs | Technical Content| Networking New at this year’s BIWA Summit: Strategy track – targeted at the C-level audience, how to assess and plan for new Oracle Technology in meeting enterprise objectives Oracle Global Leaders track – sessions by Oracle’s Global Leader customers on their use of Oracle Technology, and targeted product managers on latest Oracle products and features Grad-student track – sessions on cutting edge university work using Oracle Technology, continuing Oracle Academy’s sponsorship of graduate student participation  Exciting Topics Include:  Database, Data Warehouse, and Cloud, Big Data Architecture Deep Dives on existing Oracle BI, DW and Analytics products and Hands on Labs Updates on the latest Oracle products and technologies e.g. Oracle Big Data Discovery, Oracle Visual Analyzer, Oracle Big Data SQL Novel and Interesting Use Cases of Spatial and Graph, Text, Data Mining, ETL, Security, Cloud Working with Big Data:  Hadoop, "Internet of Things", SQL, R, Sentiment Analysis Oracle Business Intelligence (OBIEE), Oracle Spatial and Graph, Oracle Advanced Analytics —All Better Together Example Talks from BIWA Summit 2017:  [Visit www.biwasummit.org to see the  Full Agenda from BIWA’17 and to download copies of BIWA’17 presentations and HOLs.] Machine Learning Taking R to new heights for scalability and performance Introducing Oracle Machine Learning Zeppelin Notebooks Oracle's Advanced Analytics 12.2c New Features & Road Map: Bigger, Better, Faster, More! 
An Post -- Big Data Analytics platform and use of Oracle Advanced Analytics Customer Analytics POC for a global retailer, using Oracle Advanced Analytics Oracle Marketing Advanced Analytics Use of OAA in Propensity to Buy Models Clustering Data with Oracle Data Mining and Oracle Business Intelligence How Option Traders leverage Oracle R Enterprise to maximize trading strategies From Beginning to End - Oracle's Cloud Services and New Customer Acquisition Marketing K12 Student Early Warning System Business Process Optimization Using Reinforcement Learning Advanced Analytics & Graph: Transparently taking advantage of HW innovations in the Cloud Dynamic Traffic Prediction in Road Networks Context Aware GeoSocial Graph Mining Analytics Uncovering Complex Spatial and Graph Relationships: On Database, Big Data, and Cloud Make the most of Oracle DV (DVD / DVCS / BICS) Data Visualization at SoundExchange – A Case Study Custom Maps in Oracle Big Data Discovery with Oracle Spatial and Graph 12c Does Your Data Have a Story? Find out with Oracle Data Visualization Desktop Social Services Reporting, Visualization, and Analytics Using OBIEE Leadership Essentials in Successful Business Intelligence (BI) Programs Big Data Uncovering Complex Spatial and Graph Relationships: On Database, Big Data, and Cloud Why Apache Spark has become the darling in Big Data space? Custom Maps in Oracle Big Data Discovery with Oracle Spatial and Graph 12c A Shortest Path to Using Graph Technologies– Best Practices in Graph Construction, Indexing, Analytics and Visualization Cloud Computing Oracle Big Data Management in the Cloud Oracle Cloud Cookbook for Professionals Uncovering Complex Spatial and Graph Relationships: On Database, Big Data, and Cloud Deploying Oracle Database in the Cloud with Exadata: Technical Deep Dive Employee Onboarding: Onboard – Faster, Smarter & Greener Deploying Spatial Applications in Oracle Public Cloud Analytics in the Oracle Cloud: A Case Study Deploying SAS Retail Analytics in the Oracle Cloud BICS - For Departmental Data Mart or Enterprise Data Warehouse? 
Cloud Transition and Lift and Shift of Oracle BI Applications Data Warehousing and ETL Business Analytics in the Oracle 12.2 Database: Analytic Views Maximizing Join and Sort Performance in Oracle Data Warehouses Turbocharging Data Visualization and Analyses with Oracle In-Memory 12.2 Oracle Data Integrator 12c: Getting Started Analytic Functions in SQL My Favorite Scripts 2017 Internet of Things Introduction to IoT and IoT Platforms The State of Industrial IoT Complex Data Mashups: an Example Use Case from the Transportation Industry Monetizable Value Creation from Industrial-IoT Analytics Spatial and Graph Summit Uncovering Complex Spatial and Graph Relationships: On Database, Big Data, and Cloud A Shortest Path to Using Graph Technologies– Best Practices in Graph Construction, Indexing, Analytics and Visualization Build Recommender Systems, Detect Fraud, and Integrate Deep Learning with Graph Technologies Building a Tax Fraud Detection Platform with Big Data Spatial and Graph technologies Maps, 3-D, Tracking, JSON, and Location Analysis: What’s New with Oracle’s Spatial Technologies Deploying Spatial Applications in Oracle Public Cloud RESTful Spatial services with Oracle Database as a Service and ORDS Custom Maps in Oracle Big Data Discovery with Oracle Spatial and Graph 12c Smart Parking for a Smart City Using Oracle Spatial and Graph at Los Angeles and Munich Airports Analysing the Panama Papers with Oracle Big Data Spatial and Graph Apply Location Intelligence and Spatial Analysis to Big Data with Java  Example Hands-on Labs from BIWA Summit 2017: Using R for Big Data Advanced Analytics and Machine Learning Learn Predictive Analytics in 2 hours!  Oracle Data Miner Hands on Lab Deploy Custom Maps in OBIEE for Free Apply Location Intelligence and Spatial Analysis to Big Data with Java Use Oracle Big Data SQL to Analyze Data Across Oracle Database, Hadoop, and NoSQL Make the most of Oracle DV (DVD / DVCS / BICS) Analyzing a social network using Big Data Spatial and Graph Property Graph Submit your abstract(s) today, good luck and hope to see you there! See last year’s Full Agenda from BIWA’17.   Dan Vlamis and Shyam Nath , Oracle BIWA Summit '18 Conference Co-Chairs


Analytics

What's new in the latest Big Data Cloud Service-Compute Edition Release

Big Data Cloud Service – Compute Edition release update version 17.2.5 is now generally available.

What's New:

Big Data File System: Big Data Cloud Service - Compute Edition includes the Oracle Big Data File System (BDFS), an in-memory file system with support for tiered storage that accelerates access to data stored in Cloud Storage and enables big data workloads to run much faster. Customers no longer have to choose between the performance of an HDFS-based data lake and the agility and lower cost of a Cloud Storage-based data lake.

Bootstrap Scripts: The bootstrap script feature is available with release 17.2.5. Bootstrap scripts help customers spin up customized big data clusters. This capability lets customers install binaries, load data and libraries, customize configurations, and perform any other scriptable action after the default cluster provisioning. A sample bootstrap script illustrating the installation of R is also included.

MapReduce Jobs: The previous release supported creating and running MapReduce jobs as an experimental feature. In the current release, the MapReduce feature is no longer experimental and is in production. Customers can submit MapReduce jobs using the big data cluster console, the REST API, or the CLI (command-line interface).

Highlights since GA

Deployment Profiles: Deployment profiles are pre-defined sets of services optimized for a specific use case or workload. They help users avoid the complexity of choosing among the various Hadoop components for their big data workloads.

High Performance Block Storage: Customers can now use high-performance SSDs in conjunction with their Big Data - Compute Edition clusters. (Part # B87608)

Cloud@Customer: Oracle Cloud Machine X6 supports Big Data Cloud Service – Compute Edition deployment. Starting in FY18, BDCS-CE runs on Oracle Cloud Machine (OCM). Customers that can't move to the public cloud for various reasons can now leverage BDCS-CE running on OCM in their own data centers. BDCS-CE can be consumed both as a subscription and as metered capacity on the OCM.

Additional Hadoop Components: Big Data Cloud Service - Compute Edition continues to add support for additional Hadoop components. Since GA, we have added support for Apache Hive, Apache Spark-R, and Apache Mahout in BDCS-CE. The Apache Zeppelin version has also been updated to Zeppelin 0.7.x.

To learn more about Big Data Cloud Service - Compute Edition, check out these resources:

BDCS-CE Public Website
BDCS-CE Introduction Video
BDCS-CE Getting Started Video
BDCS-CE Demos & Videos
New Data Lake Workshop


Big Data

Harness the Power of Big Data

Companies today have built a voracious appetite for data and insights. They demand information at their fingertips, but face the growing complexities of data ingestion, data processing, data management, and data security. Here's where most companies find their dreams for agility come to a screeching halt.

Time to Crack the Code

Business and IT leaders can together untangle this challenging situation by finding the answer to a fundamental question: How can my company best harness the power of data in a way that easily delivers real-time, streamlined insights without compromising security? It's time to crack the code and see how companies like CaixaBank have reaped the fruits of success.

Removing the Complexity: You Need Data that Listens to You

For businesses to succeed, they need the speed and simplicity of the public cloud. Specifically, users need to be able to grow storage capacity or increase compute on demand without needing to configure for peak workloads. The challenge is that most companies face cloud compliance or corporate policy regulations that inhibit the effective use of public cloud services. So teams end up building their own private cloud in an attempt to run large and diverse data workloads at speed and scale. But that often results in the purchase of complex infrastructure that is difficult to maintain, patch, and upgrade. This runs counter to what organizations actually want to do—which is to extract value from their data as quickly as possible. What enterprises end up with is a complex data environment that is hard to manage and control. Data-driven leaders find themselves with an abyss of uncooperative data that doesn't listen to their analytical needs.

The Good News

The winners are those who can combine the data sources they want into the shape they need for the task at hand; they achieve data liquidity. In the past, the public cloud has been great for data liquidity because it is an elastic, easy-to-provision service that scales up and down as needed, and users pay only for what they use. But today's policies often demand that data stay behind the firewall, and this eliminates the long-term option of leveraging the public cloud. The good news? Enterprises can finally manage large data workloads securely, easily, and quickly, and deliver actionable insights, by tapping into the Oracle Big Data Cloud Machine. Customers like CaixaBank rely on the Oracle Big Data Cloud Machine to enjoy the same on-demand, subscription benefits of the public cloud in their own data center, behind their firewall. Now, companies can rapidly unleash the value of data with ease by leveraging the latest addition to Oracle's cloud offering—the Oracle Big Data Cloud Machine.


Analytics

Oracle's SQL-Based Statistical Functions - FREE in Every Oracle Database, On-Premises or in the Cloud

Included in every Oracle Database is a collection of basic statistical functions accessible via SQL. These include descriptive statistics, hypothesis testing, correlation analysis, tests for distribution fit, cross tabs with chi-square statistics, and analysis of variance (ANOVA). The basic statistical functions are implemented as SQL functions and leverage all the strengths of the Oracle Database. The SQL statistical functions work on Oracle tables and views and exploit all database parallelism, scalability, user privileges, and security schemes. Hence the SQL statistical functions can be included in SQL queries, exposed in BI dashboards, and embedded in real-time applications.

The SQL statistical functions can be used in a variety of ways. For example, users can call Oracle's SQL statistical functions to obtain mean, max, min, median, mode, and standard deviation information for their data; or users can measure the correlations between attributes and measure the strength of relationships using hypothesis-testing statistics such as a t-test, F-test, or ANOVA. SQL aggregate functions return a single result row for each group of rows, while SQL analytic functions also compute an aggregate value over a group of rows but return a result for every row in the group.

SQL statistical functions include:

Descriptive statistics (e.g. median, stdev, mode, sum, etc.)
Hypothesis testing (t-test, F-test, Kolmogorov-Smirnov test, Mann-Whitney test, Wilcoxon signed ranks test)
Correlation analysis (parametric and nonparametric, e.g. Pearson's test for correlation, Spearman's rho coefficient, Kendall's tau-b correlation coefficient)
Ranking functions
Cross tabulations with chi-square statistics
Linear regression
ANOVA (analysis of variance)
Tests for distribution fit (e.g. normal distribution test, binomial test, Weibull test, uniform test, exponential test, Poisson test, etc.)
Aggregate functions
Statistical aggregates (min, max, mean, median, stdev, mode, quantiles, plus x sigma, minus x sigma, top n outliers, bottom n outliers)
LAG/LEAD functions
Reporting aggregate functions

STATS_T_TEST_INDEPU Example: The following example determines the significance of the difference between the average sales to men and women where the distributions are known to have significantly different (unpooled) variances:

SELECT SUBSTR(cust_income_level, 1, 22) income_level,
       AVG(DECODE(cust_gender, 'M', amount_sold, null)) sold_to_men,
       AVG(DECODE(cust_gender, 'F', amount_sold, null)) sold_to_women,
       STATS_T_TEST_INDEPU(cust_gender, amount_sold, 'STATISTIC', 'F') t_observed,
       STATS_T_TEST_INDEPU(cust_gender, amount_sold) two_sided_p_value
  FROM sh.customers c, sh.sales s
 WHERE c.cust_id = s.cust_id
 GROUP BY ROLLUP(cust_income_level)
 ORDER BY income_level, sold_to_men, sold_to_women, t_observed;

INCOME_LEVEL           SOLD_TO_MEN SOLD_TO_WOMEN T_OBSERVED TWO_SIDED_P_VALUE
---------------------- ----------- ------------- ---------- -----------------
A: Below 30,000          105.28349    99.4281447 -2.0542592        .039964704
B: 30,000 - 49,999       102.59651    109.829642 2.96922332        .002987742
C: 50,000 - 69,999      105.627588    110.127931  2.3496854        .018792277
D: 70,000 - 89,999      106.630299    110.47287  2.26839281        .023307831
E: 90,000 - 109,999     103.396741    101.610416 -1.2603509        .207545662
F: 110,000 - 129,999     106.76476    105.981312 -.60580011        .544648553
G: 130,000 - 149,999    108.877532    107.31377  -.85219781        .394107755
H: 150,000 - 169,999    110.987258    107.152191 -1.9451486        .051762624
I: 170,000 - 189,999    102.808238    107.43556  2.14966921        .031587875
J: 190,000 - 249,999    108.040564    115.343356 2.54749867        .010854966
K: 250,000 - 299,999    112.377993    108.196097 -1.4115514        .158091676
L: 300,000 and above    120.970235    112.216342 -2.0726194        .038225611
                        107.121845    113.80441  .689462437        .490595765
                        106.663769    107.276386 1.07853782        .280794207

14 rows selected.

(See the link below to the SQL Language Reference for STATS_T_TEST_*.)

Most statistical software vendors charge license fees for these statistical capabilities. Oracle includes them in every Oracle Database. Users can reduce annual license fees and perform the equivalent basic statistical functionality while keeping big data and analytics simple in a single, unified, consistent, scalable, and secure Oracle Database platform. Because the statistical functions are native SQL functions, statistical results can be used immediately across the Oracle stack - unleashing many more opportunities to leverage your results in spontaneous and unexpected ways.

Additionally, Oracle Advanced Analytics' Oracle R Enterprise component exposes the SQL statistical functions through the R statistical programming language: R users can call familiar R statistical functions (e.g. summary), and those functions are pushed down to the equivalent SQL statistical functions, avoiding data movement and delivering significant in-database performance gains. The SQL Developer Oracle Data Miner workflow GUI extension also leverages the SQL statistical functions in the Explore, Graph, SQL Query, and Transform nodes.
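To give a feel for how these functions compose in ordinary SQL, here is a small illustrative query against the same sh sample schema used in the example above (the quantity_sold column is assumed to exist alongside amount_sold, as in Oracle's sample schema). The first statement uses aggregate and statistical functions to summarize sales per gender; the second shows the analytic (windowed) form of the same aggregate, which keeps every detail row:

-- Descriptive statistics plus a Pearson correlation, one result row per gender
SELECT cust_gender,
       COUNT(*)                          AS num_sales,
       AVG(amount_sold)                  AS mean_amount,
       MEDIAN(amount_sold)               AS median_amount,
       STDDEV(amount_sold)               AS stdev_amount,
       STATS_MODE(amount_sold)           AS modal_amount,
       CORR(amount_sold, quantity_sold)  AS corr_amount_qty
  FROM sh.customers c, sh.sales s
 WHERE c.cust_id = s.cust_id
 GROUP BY cust_gender;

-- Analytic form: the group-level average is attached to every detail row,
-- which is convenient for feeding BI dashboards and applications
SELECT c.cust_id,
       s.amount_sold,
       AVG(s.amount_sold) OVER (PARTITION BY c.cust_gender) AS avg_for_gender
  FROM sh.customers c, sh.sales s
 WHERE c.cust_id = s.cust_id;

This is only a sketch of the kinds of statements possible; the same pattern applies to the hypothesis-testing, ranking, and distribution-fit functions listed above.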


Big Data

The New Data Lake - You Need More Than HDFS

A data lake is a key element of any big data strategy, and conventional wisdom has it that Hadoop/HDFS is the core of your lake. But conventional wisdom changes with new information (which is why we're no longer living on an earth presumed to be both flat and at the center of the universe), and in this case that new information is all about object storage. Guest blogger Paul Miller, Big Data and Analytics Manager at Oracle, has this post on object storage as the foundation of the new data lake. And if you'd like to try building one yourself, head over to our New Data Lake Workshop (it's free!), which will guide you through the process. After a short time, you'll have a functioning, modern data lake, ready to go.

Object Store is the New Data Lake

There are many ways to persist data in cloud platforms today, such as Object, Block, File, SMB, DB, Queue, Archive, etc. As an overview, here are Oracle's, AWS', and Azure's primary storage solutions.

Object Based Distributed Storage: key/content-driven interface
Oracle Object Store
AWS S3
Azure Blob Storage

File Based Distributed Storage: nested file/folder interface
Oracle BDCS-CE Storage (HDFS)
AWS EMR HDFS/EMRFS
Azure Data Lake Store (HDFS)

Block Based Storage: raw, disk-like 1s-and-0s interface
Oracle Cloud Block Volume Storage
AWS Elastic Block Storage (EBS)
Azure Disk Storage

Of the three persistence strategies outlined above, Object Based Distributed Storage is the centerpiece for public cloud platforms. Amazon paved the way for a mindset in which cloud-native application developers use object store (AWS S3) as their persistent store. Object store is now the integration point where cloud and on-premises applications can easily persist and distribute data globally in a canonical way. Oracle, recognizing this fact, made a massive investment in developing an object store that is fast and easy to use within the Oracle Public Cloud. When it comes to analytics, cloud-native persistence, and backup targets, Oracle Object Store is critical.

How Object Storage Works

Object storage is a scalable, redundant, foundational storage service. Objects and files are written to multiple disk drives spread throughout servers in the Oracle Public Cloud, with Oracle's software responsible for ensuring data replication and integrity across the cluster. Because Oracle uses provisioning logic to maintain availability locally and across different data centers, it is able to provide 11 9s of data durability. Should anything fail, Oracle handles the replication of the container's content from other active nodes to new locations in the Oracle Public Cloud ecosystem.

When it comes to using the latest and greatest tools for data science and fast data processing, object store enables agility, cost savings, and faster deployment by:

1. Detaching compute from storage, allowing the environments to grow independently - check out what we are doing with Big Data Cloud Service CE or IoT Cloud Service
2. Persisting all the data in a low-cost, globally distributed store that speeds processes up while making the data more durable
3. Maintaining a core, distribution-based environment (Cloudera) while being able to use the latest and greatest Hadoop projects on demand (Apache)

The Benefits of Object Store

Hadoop HDFS' strategy of intrinsically tying storage and compute is increasingly becoming an inefficient use of resources when it comes to enterprise data lakes. Think of object store as the lowest tier in your storage hierarchy.
Object store allows you to decouple storage from compute, giving organizations more flexibility, durability, and cost savings. Store everything in object store and read only the data you need into the application or processing tier (Java CS, Node.js, Coherence Data Grid, DBaaS, Spark RDD, Essbase, etc.) on demand. At the end of the day, the cost of copying this data as needed is small compared with the cost savings and the increased flexibility. These key factors placed object store at the center of our Oracle Analytics and Big Data Reference Architecture. Don't forget to visit our other blog article on data lake best practices. Or if you're ready to get started, try building a data lake for free with an Oracle trial.


Analytics

"It's tough to make predictions...

... especially about the future," as a wise man once said (though check #36). But we've been doing this for a few years now, and 2017's list finally made it to oracle.com/bigdata, or here's a direct link to the PDF. With some additional help from Yogi, we did a webcast with O'Reilly back in December, which is still up for you to view if you'd like some more background. "You can observe a lot just by watching" aptly describes machine learning, which was the subject of our first prediction. Simplifying hugely, ML is just the process of using an algorithm to examine data and come up with new insights. Initially the preserve of data scientists, ML is becoming more widely used and embedded in other tools and applications: everything from music recommendations to IT. And speaking of IT tools, Oracle Management Cloud already embeds ML to do things like flag unusual resource usage, identify configuration changes, and forecast outages before they happen. Systems management is a classic big data problem, with lots of different data sources and formats, real-time data streams, and now the opportunity to apply sophisticated analytics to deliver benefits that weren't possible before. Expect new capabilities like that in many more products this year. We'll do some more background posts about these predictions throughout the year. When exactly will that happen? Don't know. After all, it's tough to make predictions...


Analytics

CALL FOR ABSTRACTS: Oracle BIWA Summit '17 - THE Big Data + Analytics + Spatial + Cloud + IoT + Everything “Cool” Oracle User Conference 2017

THE Big Data + Analytics + Spatial + Cloud + IoT + Everything "Cool" Oracle User Conference 2017
January 31 – February 2, 2017
Oracle Conference Center at Oracle Headquarters Campus, Redwood Shores, CA

What Oracle Big Data + Analytics + Spatial + Cloud + IoT + Everything "Cool" Successes Can You Share? We want to hear your story. Submit your proposal today for Oracle BIWA Summit 2017, January 31 – February 2, 2017, and share your successes with Oracle technology. Speaker proposals are now being accepted through October 1, 2016. Submit now for possible early acceptance and publication in Oracle BIWA Summit 2017 promotion materials. Presentations must be non-commercial. Sales promotions for products or services disguised as proposals will be eliminated. Speakers whose abstracts are accepted will be expected to submit at a later date a presentation outline and a presentation PDF slide deck. Accompanying technical and use case papers are encouraged, but not required. Click HERE to submit your abstract(s) for Oracle BIWA Summit 2017.

BIWA Summits are organized and managed by the Oracle Business Intelligence, Data Warehousing and Analytics (BIWA) SIG, the Oracle Spatial and Graph SIG—both Special Interest Groups in the Independent Oracle User Group (IOUG)—and the Oracle Northern California User Group. BIWA Summits attract presentations and talks from the top BI, DW, Advanced Analytics, Spatial, and Big Data experts. The three-day BIWA Summit 2016 event featured keynotes by industry experts, educational sessions, hands-on labs, and networking events. Click HERE to see presentations and content from BIWA Summit 2016.

Call for Speakers DEADLINE is October 1, 2016 at midnight Pacific Time. Complimentary registration to Oracle BIWA Summit 2017 is provided to the primary speaker of each accepted abstract. Note: One complimentary registration per accepted session will be provided. Any additional co-presenters need to register for the event separately and pay the appropriate registration fees. It is up to the co-presenters' discretion which presenter to designate for the complimentary registration.
Please submit speaker proposals in one of the following tracks: Advanced Analytics Business Intelligence Big Data + Data Discovery Data Warehousing and ETL Cloud Internet of Things Spatial and Graph …Anything else “Cool” using Oracle technologies in “novel and interesting” ways    Learn from Industry Experts from Oracle, Partners, and Customers Come join hundreds of professionals with shared interests in the successful deployment of Oracle Business Intelligence, Data Warehousing, IoT and Analytical products: Cloud & Big Data DW & Data Integration BI & Data Discovery & Visualization Advanced Analytics & Spatial Internet of Things Oracle Database Cloud Service Big Data Appliance Oracle Data Visualization Cloud Service Hadoop abd Spark Big Data Connectors (Hadoop & R)   Oracle Data as a Service Engineered Systems Exadata Oracle Partitioning Oracle Data Integrator (ETL) In-Memory Oracle Big Data Preparation Cloud Service   Big Data Discovery Data Visualization OBIEE OBI Applications Exalytics Cloud Real-Time Decisions Oracle Advanced Analytics Oracle Spatial and Graph Oracle Data Mining & Oracle Data Miner Oracle R Enterprise SQL Patterns Oracle Text Oracle R Advanced Analytics for Hadoop Big Data from sensors Edge Analytics Industrial Internet IoT Cloud Monetizing IoT Security Standards   What To Expect 500+ Attendees | 90+ Speakers | Hands on Labs | Technical Content| Networking Exciting Topics Include:  Database, Data Warehouse, and Cloud, Big Data Architecture Deep Dives on existing Oracle BI, DW and Analytics products and Hands on Labs Updates on the latest Oracle products and technologies e.g. Oracle Big Data Discovery, Oracle Visual Analyzer, Oracle Big Data SQL Novel and Interesting Use Cases of Everything! Spatial, Text, Data Mining, ETL, Security, Cloud Working with Big Data: Hadoop, "Internet of Things", SQL, R, Sentiment Analysis Oracle Big Data Discovery, Oracle Business Intelligence (OBIEE), Oracle Spatial and Graph, Oracle Advanced Analytics—All Better Together Example Talks from BIWA Summit 2016:   [Visit www.biwasummit.org to see the last year’s Full Agenda from BIWA’16 and to download copies of BIWA’16 presentations and HOLs.]   Advanced Analytics Dogfooding – How Oracle Uses Oracle Advanced Analytics To Boost Sales Efficiency, Frank Heilland, Oracle Sales and Support Fiserv Case Study: Using Oracle Advanced Analytics for Fraud Detection in Online Payments, Julia Minkowski, Fiserv Enabling Clorox as Data Driven Enterprise, Yigal Gur, Clorox Big Data Analytics with Oracle Advanced Analytics 12c and Big Data SQL and the Cloud, Charlie Berger, Oracle Stubhub and Oracle Advanced Analytics, Brian Motzer, Stubhub Fault Detection using Advanced Analytics at CERN's Large Hadron Collider: Too Hot or Too Cold, Mark Hornick, Oracle Large Scale Machine Learning with Big Data SQL, Hadoop and Spark, Marcos Arancibia, Oracle Oracle R Enterprise 1.5 - Hot new features!, Mark Hornick, Oracle BI and Visualization Electoral fraud location in Brazilian General Elections 2014, Alex Cordon, Henrique Gomes, CDS See What’s There and What’s Coming with BICS & Data Visualization, Philippe Lions, Oracle Optimize Oracle Business Intelligence Analytics with Oracle 12c In-Memory Database option, Kai Yu, Dell BI Movie Magic: Maps, Graphs, and BI Dashboards at AMC Theatres, Tim Vlamis, Vlamis Defining a Roadmap for Migrating to Oracle BI Applications on ODI, Patrick Callahan, AST Corp. 
Free form Data Visualization, Mashup BI and Advanced Analytics with BI 12c, Philippe Lions, Oracle Big Data How to choose between Hadoop, NoSQL or Oracle Database , Jean-Pierre Djicks, Oracle Enrich, Transform and Analyse Big Data using Big Data Discovery and Visual Analyzer, Mark Rittman, Rittman Mead Oracle Big Data: Strategy and Roadmap, Neil Mendelson, Oracle High Speed Video Processing for Big Data Applications, Melliyal Annamalai, Oracle How to choose between Hadoop, NoSQL or Oracle Database, Shyam Nath, General Electric What's New With Oracle Business Intelligence 12c, Stewart Bryson, Red Pill Leveraging Oracle Big Data Discovery to Master CERN’s Control Data, Antonio Romero Marin, CERN Cloud Computing Hybrid Cloud Using Oracle DBaaS: How the Italian Workers Comp Authority Uses Graph Technology, Giovanni Corcione, Oracle Oracle DBaaS Migration Road Map, Daniel Morgan, Forsythe Meta7 Safe Passage to the CLOUD – Analytics, Rich Solari, Privthi Krishnappa, Deloitte Oracle BI Tools on the Cloud--On Premise vs. Hosted vs. Oracle Cloud, Jeffrey Schauer, JS Business Intelligence Data Warehousing and ETL Making SQL Great Again (SQL is Huuuuuuuuuuuuuuuge!) , Panel Discussion, Andy Mendelsohn, Oracle, Steve Feuerstein, Oracle, George Lumpkin, Oracle The Place of SQL in the Hybrid World, Kerry Osborne and Tanel Poder, Accenture Enkitec Group Is Oracle SQL the best language for Statistics, Brendan Tierney, Oralytics Taking Full Advantage of the PL/SQL Compiler, Iggy Ferenandez, Oracle Internet of Things Industrial IoT and Machine Learning - Making Wind Energy Cost Competitive, Robert Liekar, M&S Consulting Spatial Summit Utilizing Oracle Spatial and Graph with Esri for Pipeline GIS and Linear Asset Management, Dave Ellerbeck, Global Information Systems Oracle Spatial and Graph: New Features for 12.2, Siva Ravada, Oracle High Performance Raster Database Manipulation and Data Processing with Oracle Spatial and Graph, Qingyun (Jeffrey) Xie, Oracle Example Hands-on Labs from BIWA Summit 2016: Scaling R to New Heights with Oracle Database, Mark Hornick, Oracle, Tim Vlamis, Vlamis Software Learn Predictive Analytics in 2 hours!! Oracle Data Miner 4.1, Charlie Berger, Oracle, Brendan Tierney, Oralytics, Karl Rexer, Rexer Analytics Predictive Analytics using SQL and PL/SQL, Oracle Brendan Tierney, Oralytics, Charlie Berger, Oracle Oracle Data Visualization Cloud Service Hands-On Lab with Customer Use Cases, Pravin Patil, Kapstone Lunch & Partner Lightning Rounds Fast and Fun 5 Minute Presentations from Each Partner--Must See!   Submit your abstract(s) today, good luck and hope to see you there! See last year’s Full Agenda from BIWA’16. Dan Vlamis and Shyam Nath , Oracle BIWA Summit '17Conference Co-Chairs


Three Successful Customers Using IoT and Big Data

When I wrote about the convergence of IoT and big data I mentioned that we have successful customers. Here I want to pick three that highlight different aspects of the complete story. There are a lot of different components to a complete big data solution. These customers are using different pieces of the Oracle solution, integrating them with existing software and processes.

Gemü manufactures precision valves used to make things like pharmaceuticals. As you can imagine, it's critical that valves operate correctly to avoid adding too much or too little of an active ingredient. So Gemü turned to the Oracle IoT Cloud Service to help monitor those valves in use on their customers' production lines. This data helps Gemü and their partners ensure the quality of their product. And over time, this data will enable them to predict failures or even the onset of out-of-tolerance performance. Predictive maintenance is a potentially powerful new capability and enables Gemü to maintain the highest levels of quality and safety.

From small valves to the largest machine on the planet: the Large Hadron Collider at CERN. There are many superlatives about this system. Their cryogenics system is also the largest in the world, and has to keep 36,000 tons of superconducting magnets at 1.9 K (-271.3 Celsius) using 120 tons of liquid helium. Failures in that system can be costly. They've had problems with a weasel and a baguette, both of which are hard to predict, but other failures could potentially be stopped. That's why CERN is using Big Data Discovery to help them understand what's going on with their cryogenics system. They are also using predictive analytics, with the ultimate goal of predicting failures before they happen and avoiding the two months it can take to warm up systems long enough to make even a basic repair before cooling them down again.

And finally this one. IoT and big data working together can help a plane to fly, a valve to make pharmaceuticals, and the world's largest machine to stay cool. What can we do for you?


Focus On Big Data at Oracle OpenWorld!

Oracle OpenWorld is fast approaching and you won't want to miss the big data highlights. Participate in our live demos, attend a theater session, or take part in one of our many hands-on labs, user forums, and conference sessions all dedicated to big data. Whether you're interested in machine learning, predictive maintenance, real-time analytics, the Internet of Things (IoT), data-driven marketing, or learning how Oracle supports open source technologies such as Kafka, Apache Spark, and Hadoop as part of our core strategy, we have the information for you. For more details on how to center your attention on big data at OpenWorld, you can access the "Focus On" Big Data program guide link; meanwhile, here are a few things you won't want to miss:

General Session: Oracle Cloud Platform for Big Data [GEN7471]
Tuesday, Sep 20, 11:00 a.m. | Moscone South—103
Oracle Cloud Platform for big data enables complete, secure solutions that maximize value to your business, lower costs, increase agility, and embrace open source technologies. Learn about Oracle's strategy for big data in the cloud.

Oracle Big Data Management in the Cloud [CON7473]
Wednesday, Sep 21, 11:00 a.m. | Moscone South—302
Successful analytical environments require seamless integration of Hadoop, Spark, NoSQL, and relational databases. Data virtualization can eliminate data silos and make this information available to your entire business. Learn to tame the complexity of data management.

Oracle Big Data Lab in the Cloud [CON7474]
Wednesday, Sep 21, 12:15 p.m. | Moscone South—302
Business analysts and data scientists can experiment with and explore diverse data sets and uncover what new questions can be answered in a data lab environment. Learn about the future of the data lab in the cloud and how lab insights can unlock the value of big data for the business.

Oracle Big Data Integration in the Cloud [CON7472]
Tuesday, Sep 20, 4:00 p.m. | Moscone South—302
Oracle Data Integration's cloud services and solutions can help manage your data movement and integration challenges across on-premises, cloud, and other data platforms. Get started quickly in the cloud with data integration for Hadoop, Spark, NoSQL, and Kafka. You'll also see the latest data preparation self-service tools for nontechnical users.

Drive Business Value and Outcomes Using Big Data Platform [THT7828]
Monday, Sep 19, 2:30 p.m. | Big Data Theater, Moscone South Exhibition Hall
Driving business value with big data requires more than big data technology. Learn how to maximize the value of big data by bringing together big data management, big data analytics, and enterprise applications. The session explores several different use cases and shows what it takes to construct integrated solutions that address important business problems.

Oracle Streaming Big Data and Internet of Things Driving Innovation [CON7477]
Wednesday, Sep 21, 3:00 p.m. | Moscone South—302
In the Internet of Things (IoT), a wealth of data is generated, and can be monitored and acted on in real time. Applying big data techniques to store and analyze this data can drive predictive, intelligent learning applications. Learn how the convergence of IoT and big data can reduce costs, generate competitive advantage, and open new business opportunities.

Oracle Big Data Showcase
Moscone South
Visit the Big Data Showcase throughout the show and participate in a live demo or attend one of our many dedicated 20-minute theater sessions with big data experts.
We are looking forward to Oracle OpenWorld 2016 and we can’t wait to see you there! In the meantime, check out oracle.com/bigdata for more information.    


Internet of Things and Big Data - Better Together

What's the difference between the Internet of Things and Big Data? That's not really the best question to ask, because these two are much more alike than they are different. And they complement each other very strongly, which is one reason we've written a white paper on the convergence. Big data is all about enabling organizations to use more of the data around them: things customers write in social media; log files from applications and processes; sensor and device data. And there's IoT! One way to think of it is as one of the sources for big data. But IoT is more than that. It's about collecting all that data, analyzing it in real time for events or patterns of interest, and making sure to integrate any new insight into the rest of your business. When you add the rest of big data to IoT, there's much more data to work with and powerful big data analytics to come up with additional insights. Best to look at an example. Using IoT you can track and monitor assets like trucks, engines, HVAC systems, and pumps. You can correct problems as you detect them. With big data, you can analyze all the information you have about failures and start to uncover the root causes. Combine the two and now you can not just react to problems as they occur: you can predict them, and fix them before they occur. Go from being reactive to being proactive. Check out this infographic. The last data point, down at the bottom right-hand side, may be the most important one. Only 8% of businesses are fully capturing and analyzing IoT data in a timely fashion. Nobody likes to arrive last to a party and find the food and drink all gone. This party's just getting started. You should be asking every vendor you deal with how they can help you take advantage of IoT and big data - they really are better together, and there's lots of opportunity. The next post will highlight three customers who are taking advantage of that opportunity.


Cloud

DIY Hadoop: Proceed At Your Own Risk

Could your security and performance be in jeopardy? Nearly half (3.2 billion, or 45%) of the seven billion people in the world used the Internet in 2015, according to a BBC news report. If you think all those people generate a huge amount of data (in the form of website visits, clicks, likes, tweets, photos, online transactions, and blog posts), wait for the data explosion that will happen when the Internet of Things (IoT) meets the Internet of People. Gartner, Inc. forecast that there will be twice as many--6.4 billion--Internet-connected gadgets (everything from light bulbs to baby diapers to connected cars) in use worldwide in 2016, up 30 percent from 2015, a number that will reach over 20 billion by 2020.

Companies of all sizes and in virtually every industry are struggling to manage the exploding amounts of data. To cope with the problem, many organizations are turning to solutions based on Apache Hadoop, the popular open-source software framework for storing and processing massive datasets. But purchasing, deploying, configuring, and fine-tuning a do-it-yourself (DIY) Hadoop cluster to work with your existing infrastructure can be much more challenging than many organizations expect, even if your company has the specialized skills needed to tackle the job.

But as both business and IT executives know all too well, managing big data involves far more than just dealing with storage and retrieval challenges—it requires addressing a variety of privacy and security issues as well. Beyond the brand damage that companies like Sony and Target have experienced in the last few years from data breaches, there's also the likelihood that companies that fail to secure the life cycle of their big data environments will face regulatory consequences. Early last year, the Federal Trade Commission released a report on the Internet of Things that contains guidelines to promote consumer privacy and security. The Federal Trade Commission's document, Careful Connections: Building Security in the Internet of Things, encourages companies to implement a risk-based approach and take advantage of best practices developed by security experts, such as using strong encryption and proper authentication. While not calling for new legislation (due to the speed of innovation in the IoT space), the FTC report states that businesses and law enforcers have a shared interest in ensuring that consumers' expectations about the security of IoT products are met.

The report recommends several "time-tested" security best practices for companies processing IoT data, such as:

Implementing "security by design" by building security into your products and services at the outset of your planning process, rather than grafting it on as an afterthought.
Implementing a defense-in-depth approach that incorporates security measures at several levels.

Business and IT executives who try to follow the FTC's big data security recommendations are likely to run into roadblocks, especially if they're trying to integrate Hadoop with their existing IT infrastructure. The main problem with Hadoop is that it wasn't originally built with security in mind; it was developed solely to address massive distributed data storage and fast processing, which leads to the following threats:

DIY Hadoop. A do-it-yourself Hadoop cluster presents inherent risks, especially since many times it's developed without adequate security by a small group of people in a laboratory-type setting, closed off from a production environment.
As a cluster grows from small project to advanced enterprise Hadoop, every period of growth—patching, tuning, verifying versions between Hadoop modules, OS libraries, utilities, user management, and so forth—becomes more difficult and time-consuming.

Unauthorized access. Built under the principle of "data democratization"—so that all data is accessible by all users of the cluster—Hadoop has had challenges complying with certain compliance standards, such as the Health Insurance Portability and Accountability Act (HIPAA) and the Payment Card Industry Data Security Standard (PCI DSS). That's due to the lack of access controls on data, including password controls, file and database authorization, and auditing.

Data provenance. With open source Hadoop, it has been difficult to determine where a particular dataset originated and what data sources it was derived from, which means you can end up basing critical business decisions on analytics taken from suspect or compromised data.

2X Faster Performance than DIY Hadoop

In his keynote at last year's Oracle OpenWorld 2015, Intel CEO Brian Krzanich described work Intel has been doing with Oracle to build high-performing datacenters using the pre-built Oracle Big Data Appliance, an integrated, optimized solution powered by the Intel Xeon processor family. Specifically, he referred to recent benchmark testing by Intel engineers showing that an Oracle Big Data Appliance solution with some basic tuning achieved nearly two times better performance than a DIY cluster built on comparable hardware. Not only is it faster, but it was designed to meet the security needs of the enterprise. Oracle Big Data Appliance automates the steps required to deploy a secure cluster – including complex tasks like setting up authentication, data authorization, encryption, and auditing. This dramatically reduces the amount of time required to both set up and maintain a secure infrastructure.

Do-it-yourself (DIY) Apache Hadoop clusters are appealing to many business and IT executives because of the apparent cost savings from using commodity hardware and free software distributions. As I've shown, despite the initial savings, DIY Hadoop clusters are not always a good option for organizations looking to get up to speed on an enterprise big data solution, from both a security and a performance standpoint. Find out how your company can move to an enterprise big data architecture with Oracle's Big Data Platform at https://www.oracle.com/big-data.

Securing the Big Data Life Cycle
Deploying an Apache Hadoop Cluster? Spend Your Time on BI, Not DIY


Innovation

The Surprising Economics of Engineered Systems

The title's not mine. It comes from a video done for us by ESG, based on their white paper, which looks at the TCO of building your own Hadoop cluster vs. buying one ready-built (Oracle Big Data Appliance). You should watch or read, depending on your preference, or even just check out the infographic. The conclusion could be summed up as "better, faster, cheaper, pick all three". Which is not what you'd expect. But they found that it's better (quicker to deploy, lower risk, easier to support), faster (from 2X to 3X faster than a comparable DIY cluster), and cheaper (45% cheaper if you go with list pricing). So while you may not think that an engineered system like the Big Data Appliance is the right system for you, it should always be on your shortlist. Compare it with building your own - you'll probably be pleasantly surprised. There's a lot more background in the paper in particular, but let me highlight a few things:

- We have seen some instances where other vendors offer huge discounts and actually beat the BDA price. If you see this, check two things. First, will that discount be available for all future purchases, or is this just a one-off discount? And second, remember to include the cost that you incur to set up, manage, maintain, and patch the system.

- Consider performance. We worked with Intel to tune Hadoop for this specific configuration. There are something like 500 different parameters on Hadoop that can impact performance one way or the other. That tuning project was a multi-week exercise with several different experts. The end result was performance of nearly 2X, sometimes up to 3X faster than a comparable, untuned DIY cluster. Do you have the resources and expertise to replicate this effort? Would a doubling of performance be useful to you?

- Finally, consider support. A Hadoop cluster is a complex system. Sometimes problems arise that result from the interaction of multiple components. It can be really hard to figure those out, particularly when multiple vendors are involved for different pieces. When no single component is "at fault" it's hard to find somebody to fix the overall system. You'd never buy a computer with 4 separate support contracts for operating system, CPU, disk, and network card - you'd want one contract for the entire system. The same can be true for your Hadoop clusters as well.


Predictions for Big Data Security in 2016

Leading into 2016, Oracle made ten big data predictions, and one in particular around security. We are nearly four months into the year and we've seen these predictions coming to light.

Increase in regulatory protections of personal information

Early February saw the creation of the Federal Privacy Council, "which will bring together the privacy officials from across the Government to help ensure the implementation of more strategic and comprehensive Federal privacy guidelines. Like cyber security, privacy must be effectively and continuously addressed as our nation embraces new technologies, promotes innovation, reaps the benefits of big data and defends against evolving threats." The European Union General Data Protection Regulation is a reform of the EU's 1995 data protection rules (Directive 95/46/EC). Their Big Data fact sheet was put forth to help promote the new regulations: "A plethora of market surveys and studies show that the success of providers to develop new services and products using big data is linked to their capacity to build and maintain consumer trust." As a timeline, the EU expects adoption in Spring 2016, and enforcement will begin two years later in Spring 2018. Earlier this month, the Federal Communications Commission announced a proposal to restrict Internet providers' ability to share the information they collect about what their customers do online with advertisers and other third parties.

Increased use of classification systems that categorize data into groups with pre-defined policies for access, redaction, and masking

An Infosecurity Magazine article highlights the challenge of data growth and the requirement for classification: "As storage costs dropped, the attention previously shown towards deleting old or unnecessary data has faded. However, unstructured data now makes up 80% of non-tangible assets, and data growth is exploding. IT security teams are now tasked with protecting everything forever, but there is simply too much to protect effectively – especially when some of it is not worth protecting at all." The three benefits of classification highlighted include the ability to raise security awareness, prevent data loss, and address records management regulations. All of these are legitimate benefits of data classification that organizations should consider. Case in point: Oracle customer Union Investment increased agility and security by automatically processing investment fund data within their proprietary application, including complex asset classification with up to 500 data fields, which were previously distributed to IT staff using spreadsheets.

Continuous cyber-threats will prompt companies to both tighten security and audit access and use of data

This is sort of a no-brainer. We know more breaches are coming, such as here, here and here. And we know companies increase security spending after they experience a data breach or witness one close to home. Most organizations now know that completely eliminating the possibility of a data breach is impossible, and therefore appropriate detective capabilities are more important than ever. We must act as if the bad guys are on our network and then detect their presence and respond accordingly.

See the rest of the Enterprise Big Data Predictions, 2016.

Image Source: http://www.informationisbeautiful.net/visualizations/worlds-biggest-data-breaches-hacks/

Accelerating SQL Queries that Span Hadoop and Oracle Database

It's hard to deliver "one fast, secure SQL query on all your data". If you look around you'll find lots of "SQL on Hadoop" implementations which are unaware of data that's not on Hadoop. And then you'll see other solutions that combine the results of two different SQL queries, written in two different dialects, and run mostly independently on two different platforms. That means that while they may work, the person writing the SQL is effectively responsible for optimizing that joint query and implementing the different parts in those two different dialects. Even if you get the different parts right, the end result is more I/O, more data movement and lower performance. Big Data SQL is different in several ways. (Start with this blog to get the details). From the viewpoint of the user you get one single query, in a modern, fully functional dialect of SQL. The data can be located in multiple places (Hadoop, NoSQL databases and Oracle Database) and software, not a human, does all the planning and optimization to accelerate performance. Under the covers, one of the key things it tries to do is minimize I/O and minimize data movement so that queries run faster. It does that by trying to push down as much processing as possible to where the data is located. Big Data SQL 3.0 completes that task: now all the processing that can be pushed down, is pushed down. I'll give an example in the next post. What this means is cross-platform queries that are as easy to write, and as highly performant, as a query written just for one platform. Big Data SQL 3.0 further improves the "fast" part of "one fast, secure SQL query on all your data". We'd encourage you to test it against anything else out there, whether it's a true cross-platform solution or even something that just runs on one platform.

It's hard to deliver "one fast, secure SQL query on allyour data". If you look around you'll find lots of "SQL on Hadoop" implementations which are unaware of data that's not on Hadoop. And then...

Delegation and (Data) Management

Every business book you read talks about delegation. It's a core requirement for successful managers: surround yourself with good people, delegate authority and responsibility to them, and get out of their way. It turns out that this is a guiding principle for Big Data SQL as well. I'll show you how. And without resorting to code. (If you want code examples, start here). Imagine a not uncommon situation where you have customer data about payments and billing in your data warehouse, while data derived from log files about customer access to your online platform is stored in Hadoop. Perhaps you'd like to see if customers who access their accounts online are any better at paying up when their bills come due. To do this, you might want to start by determining who is behind on payments, but has accessed their account online in the last month. This means you need to query both your data warehouse and Hadoop together. Big Data SQL uses enhanced Oracle external tables for accessing data in other platforms like Hadoop. So your cross-platform query looks like a query on two tables in Oracle Database. This is important, because it means from the viewpoint of the user (or application) generating the SQL, there's no practical difference between data in Oracle Database, and data in Hadoop. But under the covers there are differences, because some of the data is on a remote platform. How you process that data to minimize both data movement and I/O is key to maximizing performance. Big Data SQL delegates work to Smart Scan software that runs on Hadoop (derived from Exadata's Smart Scan software). Smart Scan on Hadoop does its own local scan, returning only the rows and columns that are required to complete that query, thus reducing data movement, potentially quite dramatically. And using storage indexing, we can avoid some unnecessary I/O as well. For example, if we've indexed a data block and know that the minimum value of "days since accessed accounts online" is 34, then we know that none of the customers in that block has actually accessed their accounts in the last month (30 days). So this kind of optimization reduces I/O. Together, these two techniques increase performance. Big Data SQL 3.0 goes one step further, because there's another opportunity for delegation. Projects like ORC or Parquet, for example, are efficient columnar data stores on Hadoop. So if your data is there, Big Data SQL's Smart Scan can delegate work to them, further increasing performance. This is the kind of optimization that the fastest SQL on Hadoop implementations do. Which is why we think that with Big Data SQL you can get performance that's comparable to anything else that's out there. But remember, with Big Data SQL you can also use the SQL skills you already have (no need to learn a new dialect), your applications can access data in Hadoop and NoSQL using the same SQL they already use (don't have to rewrite applications), and the security policies in Oracle Database can be applied to data in Hadoop and NoSQL (don't have to write code to implement a different security policy). Hence the tagline: One Fast, Secure SQL Query on All Your Data.
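The post above deliberately stays out of code, but for readers who want a rough picture of what an enhanced external table means in practice, here is a minimal sketch. It assumes a hypothetical Hive table of web access logs and a BILLING table in the data warehouse; the cluster, schema and column names are made up for illustration, and the exact access parameters can vary by release.

-- A Big Data SQL external table over a Hive table of web access logs.
-- ORACLE_HIVE is the Big Data SQL access driver; the names below are hypothetical.
CREATE TABLE web_access_logs (
  cust_id     NUMBER,
  access_date DATE,
  page        VARCHAR2(200)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_HIVE
  DEFAULT DIRECTORY default_dir
  ACCESS PARAMETERS (
    com.oracle.bigdata.cluster=hadoop_cl1
    com.oracle.bigdata.tablename=weblogs.access_log
  )
)
REJECT LIMIT UNLIMITED;

-- Customers who are behind on payments (data warehouse) yet have
-- accessed their account online in the last 30 days (Hadoop).
SELECT b.cust_id, b.amount_overdue
FROM   billing b
JOIN   web_access_logs w ON w.cust_id = b.cust_id
WHERE  b.days_overdue > 0
AND    w.access_date >= SYSDATE - 30
GROUP  BY b.cust_id, b.amount_overdue;

On the Hadoop side, Smart Scan filters rows and columns locally before anything crosses the network, and the storage-index behaviour described above (a block whose minimum "days since accessed" is 34 can be skipped entirely) avoids some of the I/O altogether.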

Oracle Big Data SQL 3.0 adds support for Hortonworks Data Platform and commodity clusters

Big Data SQL has been out for nearly two years and version 3.0 is a major update. In addition to increasing performance (next post) we've added support for clusters that aren't built on engineered systems. And alongside several other Oracle big data software products, Big Data SQL now also supports Hortonworks Data Platform. Before 3.0, the requirements were simple. You could use Big Data SQL to deliver "one fast, secure SQL query on all your data" as long as you were using our Big Data Appliance to run Hadoop and Exadata to run Oracle Database. While those configurations continue, they are not required. If you are running Cloudera Enterprise or Hortonworks Data Platform on any commodity cluster, you can now connect that with your existing Oracle data warehouse. And you don't need Exadata to run Oracle Database (you will need version 12.1.0.2 or later), either. Any system running Linux should do the trick. Many of our other big data products also run on Hortonworks HDP. With version 3.0, Big Data SQL joins Big Data Discovery, Big Data Spatial and Graph, Big Data Connectors, GoldenGate for Big Data and Data Integrator for Big Data. Most Oracle data warehouse customers are running Hadoop somewhere in the organization. If that's you, and you're using Cloudera Enterprise or Hortonworks HDP, then it's much easier now to link those two data management components together. So instead of silos, you can have all your data simply and securely accessible via the language you and your applications already know: SQL.

Learn Predictive Analytics in 2 Days - New Oracle University Course!

What you will learn: This Predictive Analytics using Oracle Data Mining Ed 1 training will review the basic concepts of data mining. Expert Oracle University instructors will teach you how to leverage the predictive analytical power of Oracle Data Mining, a component of the Oracle Advanced Analytics option.

Learn to:
- Explain basic data mining concepts and describe the benefits of predictive analysis.
- Understand primary data mining tasks, and describe the key steps of a data mining process.
- Use Oracle Data Miner to build, evaluate, apply, and deploy multiple data mining models.
- Use Oracle Data Mining's predictions and insights to address many kinds of business problems.
- Deploy data mining models for end-user access, in batch or real time, and within applications.

Benefits to you: When you've completed this course, you'll be able to use Oracle Data Miner 4.1, the Oracle Data Mining "workflow" GUI, which enables data analysts to work directly with data inside the database. The Data Miner GUI provides intuitive tools that help you explore the data graphically, build and evaluate multiple data mining models, apply Oracle Data Mining models to new data, and deploy Oracle Data Mining's predictions and insights throughout the enterprise.

Oracle Data Mining's SQL APIs: get results in real time. The SQL APIs automatically mine Oracle data and deploy results in real time. Because the data, models, and results remain in the Oracle Database, data movement is eliminated, security is maximized and information latency is minimized.

Course outline:
- Introduction: Course Objectives; Suggested Course Prerequisites; Suggested Course Schedule; Class Sample Schemas; Practice and Solutions Structure; Review location of additional resources
- Predictive Analytics and Data Mining Concepts: What is Predictive Analytics?; Introducing the Oracle Advanced Analytics (OAA) Option; What is Data Mining?; Why use Data Mining?; Examples of Data Mining Applications; Supervised Versus Unsupervised Learning; Supported Data Mining Algorithms and Uses
- Understanding the Data Mining Process: Common Tasks in the Data Mining Process; Introducing the SQL Developer interface
- Introducing Oracle Data Miner 4.1: Data mining with Oracle Database; Setting up Oracle Data Miner; Accessing the Data Miner GUI; Identifying Data Miner interface components; Examining Data Miner Nodes; Previewing Data Miner Workflows
- Using Classification Models: Reviewing Classification Models; Adding a Data Source to the Workflow; Using the Data Source Wizard; Using Explore and Graph Nodes; Using the Column Filter Node; Creating Classification Models; Building the Models; Examining Class Build Tabs
- Using Regression Models: Reviewing Regression Models; Adding a Data Source to the Workflow; Using the Data Source Wizard; Performing Data Transformations; Creating Regression Models; Building the Models; Comparing the Models; Selecting a Model
- Using Clustering Models: Describing Algorithms used for Clustering Models; Adding Data Sources to the Workflow; Exploring Data for Patterns; Defining and Building Clustering Models; Comparing Model Results; Selecting and Applying a Model; Defining Output Format; Examining Cluster Results
- Performing Market Basket Analysis: What is Market Basket Analysis?; Reviewing Association Rules; Creating a New Workflow; Adding a Data Source to the Workflow; Creating an Association Rules Model; Defining Association Rules; Building the Model; Examining Test Results
- Performing Anomaly Detection: Reviewing the Model and Algorithm used for Anomaly Detection; Adding Data Sources to the Workflow; Creating the Model; Building the Model; Examining Test Results; Applying the Model; Evaluating Results
- Mining Structured and Unstructured Data: Dealing with Transactional Data; Handling Aggregated (Nested) Data; Joining and Filtering data; Enabling mining of Text; Examining Predictive Results
- Using Predictive Queries: What are Predictive Queries?; Creating Predictive Queries; Examining Predictive Results
- Deploying Predictive Models: Requirements for deployment; Deployment Options; Examining Deployment Options
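As a taste of the SQL API side of the course, here is a minimal sketch of in-database scoring with Oracle Data Mining's SQL functions. The CHURN_MODEL model and CUSTOMERS table are hypothetical names used only for illustration; PREDICTION and PREDICTION_PROBABILITY work against whatever classification model you have built.

-- Score each customer with a previously built classification model,
-- entirely inside the database. FETCH FIRST requires Oracle Database 12c.
SELECT cust_id,
       PREDICTION(churn_model USING *)             AS predicted_churn,
       PREDICTION_PROBABILITY(churn_model USING *) AS churn_probability
FROM   customers
ORDER  BY churn_probability DESC
FETCH FIRST 20 ROWS ONLY;

Because the model, the data and the scores all stay in the database, this is the "no data movement" point made above.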

Links to Presentations: BIWA Summit'16 - Big Data + Analytics User Conference Jan 26-28, @ Oracle HQ Conference Center

Note: Cross-posting this BIWA Summit Links to Presentations blog entry. Can also be found at https://blogs.oracle.com/datamining/entry/links_to_presentations_biwa_summit We had a great www.biwasummit.org event with ~425 attendees, in depth technical presentations delivered by experts and even had several 2 hour Hands on Labs training classes that used the Oracle Database Cloud! Watch for more coverage of event in various Oracle marketing and partner content venues. Many thanks to all the BIWA board of directors and many volunteers who have put in so much work to make this BIWA Summit the best BIWA user event ever. Mark your calendars for BIWA Summit’17, January 31, Feb. 1 & Feb. 2, 2017. We’ll be announcing Call for Abstracts in the future, so please direct your best customers and speakers to submit. We’re aiming to continue to make BIWA + Spatial + YesSQL Summit the best focused user gathering for sharing best practices for novel and interesting use cases of Oracle technologies. BIWA is an IOUG SIG run by entirely by customers, partners and Oracle employee volunteers. We’re always looking for people who would like to be involved. Let me know if you’d like to contribute to the planning and organization of future BIWA events and activities. See everyone at BIWA’17! Charlie, on behalf of the entire BIWA board of directors (charlie.berger@oracle.com) (see www.biwasummit.org for more information) See List of BIWA Summit'16 Presentations below. Click on Details to access the speaker’s abstract and download the files (assuming the speaker has posted them for sharing). We now have a schedule at a glance to show you all the sessions in a tabular agenda. See bottom of page for the Session Search capability Below is a list of the sessions and links to download most of the materials for the various sessions. Click on the DETAILS button next to the session you want to download, then the page should refresh with the session description and (assuming the presenter uploaded files, but be aware that files may be limited to 5MB) you should see a list of files for that session. See the full list below: Advanced Analytics Presentations (Click on Details to access file if submitted by presenter) Dogfooding – How Oracle Uses Oracle Advanced Analytics To Boost Sales Efficiency Details Oracle Modern Manufacturing - Bridging IoT, Big Data Analytics and ERP for Better Results Details Predictive Modelling and Forecasting using OER Details Enabling Clorox as Data Driven Enterprise Details Fault Detection using Advanced Analytics at CERN's Large Hadron Collider: Too Hot or Too Cold Details Large Scale Machine Learning with Big Data SQL, Hadoop and Spark Details Stubhub and Oracle Advanced Analytics Details Fiserv Case Study: Using Oracle Advanced Analytics for Fraud Detection in Online Payments Details Advanced Analytics for Call Center Operations Details Machine Learning on Streaming Data via Integration of Oracle R Enterprise and Oracle Stream Explorer Details Learn Predictive Analytics in 2 hours!! Oracle Data Miner 4.0 Hands on Lab Details Scaling R to New Heights with Oracle Database Details Predictive Analytics using SQL and PL/SQL Details Big Data Analytics with Oracle Advanced Analytics 12c and Big Data SQL and the Cloud Details Improving Predictive Model Development Time with R and Oracle Big Data Discovery Details Oracle R Enterprise 1.5 - Hot new features! 
Details Is Oracle SQL the best language for Statistics Details BI and Visualization Presentations (Click on Details to access file if submitted by presenter) Electoral fraud location in Brazilian General Elections 2014 Details The State of BI Details Case Study of Improving BI Apps and OBIEE Performance Details Preparing for BI 12c Upgrade Details Data Visualization at Sound Exchange – a Case Study Details Integrating OBIEE and Essbase, Why it Makes Sense Details The Dash that changed a culture Details Optimize Oracle Business Intelligence Analytics with Oracle 12c In-Memory Database option Details Oracle Data Visualization vs. Answers: The Cage Match Details What's New With Oracle Business Intelligence 12c Details Workforce Analytics Leveraging Oracle Business Intelligence Cloud Serivces (BICS) Details Defining a Roadmap for Migrating to Oracle BI Applications on ODI Details See What’s There and What’s Coming with BICS & Data Visualization Details Free form Data Visualization, Mashup BI and Advanced Analytics with BI 12c Details Oracle Data Visualization Cloud Service Hands-On Lab with Customer Use Cases Details On Metadata, Mashups and the Future of Enterprise BI Details OBIEE 12c and the Leap Forward in Lifecycle Management Details Supercharge BI Delivery with Continuous Integration Details Visual Analyzer and Best Practices for Data Discovery Details BI Movie Magic: Maps, Graphs, and BI Dashboards at AMC Theatres Details Oracle Business Intelligence (OBIEE) the Smart View Way Details Big Data Presentations (Click on Details to access file if submitted by presenter) Oracle Big Data: Strategy and Roadmap Details Oracle Modern Manufacturing - Bridging IoT, Big Data Analytics and ERP for Better Results Details Leveraging Oracle Big Data Discovery to Master CERN’s Control Data Details Enrich, Transform and Analyse Big Data using Big Data Discovery and Visual Analyzer Details Oracle Big Data SQL: Unified SQL Analysis Across the Big Data Platform Details High Speed Video Processing for Big Data Applications Details Enterprise Data Hub with Oracle Exadata and Oracle Big Data Appliance Details How to choose between Hadoop, NoSQL or Oracle Database Details Analytical SQL in the Era of Big Data Details Cloud Computing Presentations (Click on Details to access file if submitted by presenter) Oracle DBaaS Migration Road Map Details Centralizing Spatial Data Management with Oracle Cloud Databases Details End Users data in BI - Data Mashup and Data Blending with BICS , DVCS and BI 12c Details Oracle BI Tools on the Cloud--On Premise vs. Hosted vs. Oracle Cloud Details Hybrid Cloud Using Oracle DBaaS: How the Italian Workers Comp Authority Uses Graph Technology Details Build Your Cloud with Oracle Engineered Systems Details Safe Passage to the CLOUD – Analytics Details Your Journey to the Cloud : From Dedicated Physical Infrastructure to Cloud Bursting Details Data Warehousing and ETL Presentations (Click on Details to access file if submitted by presenter) Getting to grips with SQL Pattern Matching Details Making SQL Great Again (SQL is Huuuuuuuuuuuuuuuge!) 
Details Controlling Execution Plans (without Touching the Code) Details Taking Full Advantage of the PL/SQL Result Cache Details Taking Full Advantage of the PL/SQL Compiler Details Advanced SQL: Working with JSON Data Details Oracle Database In-Memory Option Boot Camp: Everything You Need to Know Details Best Practices for Getting Started With Oracle Database In-Memory Details Extreme Data Warehouse Performance with Oracle Exadata Details Real-Time SQL Monitoring in Oracle Database 12c Details A Walk Through the Kimball ETL Subsystems with Oracle Data Integration Details MySQL 5.7 Performance: More Than 1.6M SQL Queries per Second Details Implement storage tiering in Data warehouse with Oracle Automatic Data Optimization Details Edition-Based Redefinition Case Study Details 12-Step SQL Tuning Method Details Where's Waldo? Using a brute-force approach to find an Execution Plan the CBO hides Details Delivering an Enterprise-Wide Standard Chart of Accounts at GE with Oracle DRM Details Agile Data Engineering: Introduction to Data Vault Data Modeling Details Worst Practice in Data Warehouse Design Details Same SQL Plan, Different Performance Details Why Use PL/SQL? Details Transforming one table to another: SQL or PL/SQL? Details Understanding the 10053 Trace Details Analytic Views - Bringing Star Queries into the Twenty-First Century Details The Place of SQL in the Hybrid World Details The Next Generation of the Oracle Optimizer Details Internet of Things Presentations (Click on Details to access file if submitted by presenter) Oracle Modern Manufacturing - Bridging IoT, Big Data Analytics and ERP for Better Results Details Meet Your Digital Twin Details Industrial IoT and Machine Learning - Making Wind Energy Cost Competitive Details Fault Detection using Advanced Analytics at CERN's Large Hadron Collider: Too Hot or Too Cold Details Big Data and the Internet of Things in 2016: Beyond the Hype Details IoT for Big Machines Details The State of Internet of Things (IoT) Details Oracle Spatial Summit Presentations (Click on Details to access file if submitted by presenter) Build Your Own Maps with the Big Data Discovery Custom Visualization Component Details Massively Parallel Calculation of Catchment Areas in Retail Details Dismantling Criminal Networks with Graph and Spatial Visualization and Analysis Details Best Practices for Developing Geospatial Apps for the Cloud Details Map Visualization in Analytic Apps in the Cloud, On-Premise, and Mobile Details Best Practices, Tips and Tricks with Oracle Spatial and Graph Details Delivering Smarter Spatial Data Management within Ordnance Survey, UK Details Deploying a Linked Data Service at the Italian National Institute of Statistics Details ATLAS - Utilizing Oracle Spatial and Graph with Esri for Pipeline GIS and Linear Asset Management Details Oracle Spatial 12c as an Applied Science for Solving Today's Real-World Engineering Problems Details Assembling a Large Scale Map for the Netherlands Using Oracle 12c Spatial and Graph Details Using Open Data Models to Rapidly Develop and Prototype a 3D National SDI in Bahrain Details Implementation of LBS services with Oracle Spatial and Graph and MapViewer in Zain Jordan Details Interactive map visualization of large datasets in analytic applications Details Gain Insight into Your Graph Data -- A hands on lab for Oracle Big Data Spatial and Graph Details Applying Spatial Analysis To Big Data Details Big Data Spatial: Location Intelligence, Geo-enrichment and Spatial Analytics Details What’s New with 
Spatial and Graph? Technologies to Better Understand Complex Relationships Details Graph Databases: A Social Network Analysis Use Case Details High Performance Raster Database Manipulation and Data Processing with Oracle Spatial and Graph Details 3D Data Management - From Point Cloud to City Model Details The Power of Geospatial Visualization for Linear Assets Using Oracle Enterprise Asset Management Details Oracle Spatial and Graph: New Features for 12.2 Details Fast, High Volume, Dynamic Vehicle Routing Framework for E-Commerce and Fleet Management Details Managing National Broadband Infrastructure at Turk Telekom with Oracle Spatial and Graph Details Other Presentations (Click on Details to access file if submitted by presenter) Taking Full Advantage of the PL/SQL Compiler Details Taking Full Advantage of the PL/SQL Result Cache Details Meet Your Digital Twin Details Making SQL Great Again (SQL is Huuuuuuuuuuuuuuuge!) Details Lightning Round for Vendors Details

Experimental data labs take off in 2016

Oracle's #2 big data prediction out of the 10 predictions for 2016 is that experimental data labs will take off. With more hypotheses to investigate, professional data scientists will see increasing demand for their skills from established companies. For example, watch how banks, insurers, and credit-rating firms turn to algorithms to price risk and guard against fraud more effectively. Many such decisions are hard to migrate from clever judgments to clear rules, so expect a proliferation of experiments in default risk, policy underwriting, and fraud detection as firms try to identify hotspots for algorithmic advantage faster than the competition. Watch how the data scientists at CERN, the European particle physics laboratory, have delivered self-service analytics flexible enough for engineers to better research the physics of particle collisions by providing a full picture of the overall status of the accelerator complex. Oracle Big Data Discovery facilitates that data exploration and leverages the power of Hadoop to transform and analyze a large amount and variety of data. Another sign that experimental data labs will grow into production data factories comes from the big data survey Oracle conducted in August 2015 of 633 global IT decision-makers to gauge capabilities: 64% of all global respondents stated that they are able to use big data in real time for competitive advantage, but only 45% of respondents in the Asia-Pacific region agreed they can respond with big data in real time. Watch StubHub's principal architect discuss how StubHub uses Oracle Advanced Analytics to understand the customers in its online marketplace. Analysis times are much shorter, setup was fast and easy, and data scientists like the integration of R with the data warehouse, giving all fans the choice to buy or sell their tickets in a safe, convenient, and highly reliable environment. Read why StubHub's senior manager of data science declares, “Big data is having a tremendous impact on how we run our business. Oracle Database and its various options—including Oracle Advanced Analytics—combine high-performance data-mining functions with the open source R language to enable predictive analytics, data mining, text mining, statistical analysis, advanced numerical computations, and interactive graphics—all inside the database.” Learn how other customers are capitalizing on big data and what analysts are saying now at oracle.com/big-data.

Big Data For All? Oracle's 2016 Top 10 Predictions

It's time for Oracle's annual big data predictions, identifying the key areas of change for the coming year. This is the year the big data adoption trend will begin to make the leap from the roughly 3,000 organizations that Hadoop and Spark vendors count as paying customers, most of them still only in development, to tens of thousands of organizations in production. The industry will finally begin to shift gears into more mainstream applications, affecting thousands more businesses. We are making 10 predictions for 2016 that fall into three trend categories: a big expansion of the big data user base, major technology advances, and growing effects on society, politics, and business process. The #1 of the Top 10 Predictions: Data civilians will operate more and more like data scientists. While complex statistics may still be limited to data scientists, data-driven decision-making shouldn’t be. In the coming year, simpler big data discovery tools will let business analysts shop for datasets in enterprise Hadoop clusters, reshape them into new mashup combinations, and even analyze them with exploratory machine learning techniques. Extending this kind of exploration to a broader audience will both improve self-service access to big data and provide richer hypotheses and experiments that drive the next level of innovation. Oracle conducted a big data survey in August 2015 of 633 global IT decision-makers to gauge the top benefits and impediments. 55% of the respondents reported that the biggest benefit of big data projects is simplifying access to all data, ahead of 53% for faster and better decision making and 48% for increased business and customer insight. Empowering business users to analyze their own data ranked as the top purpose of big data, with 83% of respondents in agreement, narrowly followed by combining structured and unstructured data in a single query at 82%. Watch Ovum's Tom Pringle and Oracle's Nick Whitehead discuss Thriving in the Age of Big Data Analytics and Self-Service. The demand for organizations to empower business managers and civilians to be as productive as data scientists in decision making is real, and the self-service capabilities to find, transform, discover, explore, and share insight from big data, together with any other data, are available. Learn how other customers are capitalizing on big data and what analysts are saying now at oracle.com/big-data.

DIY for IT is not like going to the hardware store

I enjoy a trip to the hardware store as much as the next person. I like the feeling of achievement after I've built, painted or repaired something. And I know I've saved money when I compare the cost of the parts I bought with the cost of paying a pro to do it all for me. Of course, I don't account for my own time or how much extra time it takes to complete the job (or even the risk that I'll have to pay a pro more to fix a mistake). But if you're working in IT, those costs are real. And that's what our third big data prediction for 2016 is all about. For a few years now we've done a white paper comparing the Big Data Appliance with a DIY cluster. Somewhat paradoxically for many people, it really is cheaper to buy an appliance than to build your own from scratch (unless you can get massive discounts from your hardware and software vendors). This shouldn't be a surprise, of course: go and try to build a fridge, toaster or car from parts, and you'll find the same. But it really is about more than the cost of acquisition. Unlike my trip to the hardware store, every organization needs to account for the cost of time spent building, and the opportunity cost of the delays in completing the project. Average build time for a DIY Hadoop cluster is around 6 months. And while I know some people can do it faster, that's an average time and some people are slower. Meanwhile, we have a customer who installed, configured and tested their first Hadoop cluster, based on the Big Data Appliance, in just one week. In that recent white paper showing that the BDA is cheaper than DIY, Enterprise Strategy Group touched on some of this: "Beyond this, however, ESG’s research shows that most enterprise organizations feel that stakeholders from server, storage, network, security, and other IT infrastructure and operations teams are important to the success of a big data initiative. Thus, despite the hope and hype, Hadoop on a commodity stack of hardware does not always offer a lower cost ride to big data analytics, or at least not as quickly as hoped. Many organizations implementing Hadoop infrastructures based on human expertise working with commodity hardware and open source software may experience unexpected costs, slower speed to market, and unplanned complexity throughout the lifecycle. Accordingly, ESG’s research further shows that more than three-quarters of respondents expect it will take longer than six months to realize significant business value from a new big data initiative." Cost of acquisition is important, but those hidden costs are more significant in the long run. So take a look at that white paper and see how Oracle can help you accelerate time to value in the cloud or in your own datacenter - no trip to the hardware store needed.

Data Swamps Try Provenance to Clear Things Up

It’s a Data Lake; Not the Data Loch Ness

Loch Ness (Loch Nis) is a large, deep, freshwater loch in the Scottish Highlands extending for approximately 23 miles (37 km) southwest of Inverness. It is one of a series of interconnected, murky bodies of water in Scotland; its water visibility is exceptionally low due to a high peat content in the surrounding soil. It would be a wonder if such mysterious conditions did not give rise to Nessie, the Loch Ness monster. It is not a stretch to say that corporate data swamps and reservoirs can easily evolve into a similar series of murky, interconnected pools, where ungoverned data lurks, not only forgotten but potentially dangerous if not secured properly. Line-of-business managers are rightly afraid to dig too deep for fear of what they might not find, or more importantly what they might find. Business decision makers will always need a brave and enterprising data scientist to organize exploratory tours to scour for mythical data treasures from the depths of the data lake.

Data Provenance to the Rescue

Data lineage used to be a nice-to-have capability because so much of the data feeding corporate dashboards came from trusted data warehouses. But in the big data era, data lineage is a must-have because customers are mashing up company data with third-party data sets. Some of these new combinations incorporate high-quality, vendor-verified data. But others will use data that isn't officially perfect, yet is good enough for prototyping. When surprisingly valuable findings come from these opportunistic explorations, managers will look to the lineage to know how much work is required to raise them to production-quality levels. Data provenance, or data lineage, in whatever form, should be a process that is well thought out, organic and flexible (and by flexible, picture Elastigirl from The Incredibles). The data is going to expand, take various forms and arrive at varying speeds. Data provenance should be able to accommodate all of it, stitching and stringing the pieces together wherever they are used, and provide clear answers to questions about the data, because that is what is going to make data transparent and trustworthy. And the more we can trust the data, the greater the propensity to use it in decision making in the right context.

Data Provenance: Technology, Process or Wizardry?

Data provenance is an all-encompassing area. It aims to capture all the ways data is interconnected within an organization. The most common uses of data provenance are, of course, impact analysis (figuring out the downstream effects of a change in data) and data lineage (the traceability and history of a data element), but there are other questions it can answer. Data visibility is a critical criterion for an organization handling sensitive and customer data. So is a data topography map, which can help analyze data usage, performance and bottlenecks within the organization. Historic data is crucial to “travel back in time” to scenarios that need to be recreated to get answers. The trouble is that data is not always digitized. Many decisions are made whose only imprint is in the head of the person who made them. Oracle, and anyone else grappling with data provenance and governance issues, has to acknowledge and account for that gap in the data. This is where Oracle’s suite of technologies offers a distinct advantage.
The Oracle Advantage

Oracle, through its set of product suites, helps capture, stream, store, compute and visualize data and information. From its big data appliance and discovery solutions through to its data integration products, these tools keep track of data diligently. Oracle Metadata Management is built with data provenance in mind. It "harvests" (pulls in) metadata (data about data) from various systems and provides answers to data provenance questions, enhancing transparency, governance and trust about data within organizations. Big data projects generally require a variety of technologies strung together to meet their business mandate. For example, when using an Oracle stack (recommended, though you could of course use any other trusted technology), data elements that pass through Oracle GoldenGate for data ingestion and Oracle Data Integrator for transformation and load into Oracle Big Data Appliance are fully captured, stored and surfaced by Oracle Metadata Management to ensure there are no data black holes. Loch Ness makes for a great tourist attraction, but what we need are crystal-clear Maldivian data lakes.

Analytics

BIWA Summit'16 - Big Data + Analytics User Conference Jan 26-28, @ Oracle HQ Conference Center

Oracle Big Data + Analytics + Spatial + YesSQL User Community, ---PLEASE Share with OTHERS!!!--- BIWA Summit 2016 – Big Data + Analytics 3-Day User Conference at Oracle HQ Conference Center has a great lineup!   See Schedule at a Glance to show all the BIWA Summit'16 sessions in a tabular agenda. Download BIWA Summit'16 Full Agenda & Sessions See BIWA Summit’16 Talks by Tracks with Abstracts and Speaker Bios   See some representative talks: Advanced Analytics · Enabling Clorox as Data Driven Enterprise · Fiserv Case Study: Using Oracle Advanced Analytics for Fraud Detection in Online Payments · Improving Predictive Model Development Time with R and Oracle Big Data Discovery · Learn Predictive Analytics in 2 hours!! Oracle Data Miner 4.0 Hands on Lab · Stubhub and Oracle Advanced Analytics · Is Oracle SQL the Best Language for Statistics?—Brendan Tierney, Oralytics BI & Visualization · BI Movie Magic: Maps, Graphs, and BI Dashboards at AMC Theatres · Data Visualization at Sound Exchange – a Case Study · Business Intelligence Visual Analyzer Cloud Service: View and Analyze Your Data with customer use case · Electoral fraud location in Brazilian General Elections 2014 · See What’s There and What’s Coming with BICS & Data Visualization Cloud Services · Visual Analyzer and Best Practices for Data Discovery Big Data · The Place of SQL in the Hybrid World—Kerry Osborne, Accenture and Tanel Põder, Gluent · Oracle Big Data SQL: Unified SQL Analysis Across the Big Data Platform · Analytical SQL in the Era of Big Data · How to choose between Hadoop, NoSQL or Oracle Database Cloud · Oracle BI Tools on the Cloud--On Premise vs. Hosted vs. Oracle Cloud Data Warehousing & SQL · Panel discussion: Making SQL Great Again (SQL is Huuuuuuuuuuuge!)—Andy Mendelsohn, Executive Vice President for Database Server Technologies, Oracle · Taking Full Advantage of the PL/SQL Result Cache—Steven Feuerstein, Oracle · Why Use PL/SQL?—Bryn Llewellyn, Oracle Journal - Editor’s Pick Spatial & Graph · Deploying a Linked Data Service at the Italian National Institute of Statistics · Gain Insight into Your Graph Data -- A hands on lab for Oracle Big Data Spatial and Graph Internet of Things · Industrial IoT and Machine Learning - Making Wind Energy Cost Competitive · Leveraging Oracle Big Data Discovery to Master CERN’s Control Data   BIWA Inc. is an independent user group SIG of the Oracle Independent User Group (IOUG) See everyone at BIWA Summit’16! Charlie  

On Graph Databases and Cats

We did a series of 10 predictions for 2016 around big data. The fourth one was around data virtualization, a key component of which is Oracle Big Data SQL which we’ve blogged about before. But the prediction was a little broader: "Look for a shifting focus from using any single technology, such as NoSQL, Hadoop, relational, spatial or graph…”. You know about NoSQL, relational and Hadoop. You’ve used a map, so can figure out spatial (though there’s much more to it than just a map). But graphs? Graph databases and graph analytics are growing in use. To some they’re a bit like a cat: difficult to understand and not clear what use they are (though people keep them around). But there are some real uses and I want to give a high level overview, so you can see where they might apply in your organization. Hopefully graph will then be a little easier to understand. I bet you’ve already used graph analytics. You just didn’t realize it at the time. But that game Six Degrees of Kevin Bacon is just graph analytics at work. The most important thing is not the actors, or the movies they were in; it’s the relationships between the actors that matter to generating the answer. Running with this example, one way to store data in a graph database is to capture the actors as nodes along with all their relationships to other actors. Finding someone’s Bacon number is just a matter of following those relationships to find the shortest route between two nodes in the database.** And this is a much quicker and easier task using a graph database than other kinds of data management. Relationships between people play a role in many different applications: - Intelligence agencies want phone call metadata to identify potential terrorism suspects based on who talks to who (independent of what’s actually said). - Insurance companies can identify gangs that make multiple fraudulent claims when they spot relationships between people in multiple, apparently unconnected claims, that couldn’t happen by chance - Communications companies identify influencers and customers likely to churn by seeing who is connected to who (phone call metadata again, amongst other things). - Financial companies can identify and track money laundering by “following the money” (if person A sends $5,000 to business B, who sends it to person C, who sends it back to person A, this is much easier to spot with graph analytics) or by uncovering unexpected relationships between people and organizations. - Any organization can spot productive teams (and find the optimal size for a project team) by studying who calls and emails who (again, independent of the content; metadata is very useful). And it’s not just relationships between people. Master data management (this part is a component of this sub-assembly is a component of this product, to give a manufacturing example) is about relationships between things. Finding the shortest route between two points is in part about the relationships between places. Identifying which distributor can deliver all the needed components in the shortest time potentially mixes both things and places. Big data is not just more data; it’s more types of data. As the relationships within a given data set, or between multiple data sets, grow in complexity, graph analytics should be a tool in your toolbox. Oracle offers Spatial and Graph analytics in Oracle Database and on Hadoop. It’s good to have them around. Just like my cat. Graph databases can be hard to understand. 
Mine likes the dog.

** Which brings up a lovely attribute of graph databases. The shortest route between two nodes is often the key thing you care about. Some node pairs have a shortest route of just 1. Others require 2 hops or even longer routes. And the longest shortest route is called the graph diameter. How often can you say "longest shortest" and have it make sense?
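If "shortest route between two nodes" feels abstract, here is a minimal sketch of a Bacon-number query in plain SQL, assuming a hypothetical CO_APPEARANCES(actor_a, actor_b) edge table that stores each co-starring pair in both directions. A purpose-built graph engine (such as the graph analytics in Oracle Big Data Spatial and Graph) does this traversal far more efficiently; the sketch just shows that the answer comes from the relationships, not from any attribute of the actors themselves.

-- Hop counts (Bacon numbers) from one actor to every reachable actor,
-- using a recursive query over a hypothetical CO_APPEARANCES edge table.
WITH paths (actor, hops) AS (
  SELECT CAST('Kevin Bacon' AS VARCHAR2(100)), 0 FROM dual
  UNION ALL
  SELECT e.actor_b, p.hops + 1
  FROM   paths p
  JOIN   co_appearances e ON e.actor_a = p.actor
  WHERE  p.hops < 6                         -- bound the traversal depth
)
CYCLE actor SET is_cycle TO 'Y' DEFAULT 'N' -- stop if a path revisits an actor
SELECT actor, MIN(hops) AS bacon_number
FROM   paths
GROUP  BY actor
ORDER  BY bacon_number;

MIN(hops) per actor is the shortest route, i.e. the Bacon number; within the depth bound, the largest value returned is the longest shortest route from that one actor, a per-node relative of the graph diameter mentioned in the footnote.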

Enterprise Class Hadoop, the Best Tool for Mining Data

This is a special contributed post by Charles Zedlewski (@zedlewski), VP of Products at Cloudera, about Oracle's and Cloudera's joint work in bringing enterprise-grade Hadoop to the corporate computing environment. Strata + Hadoop World in Singapore is just around the corner and we were reminded about the importance of big data and how it is playing a more critical role in everything we do. Data isn’t just something that corporate executives consume to make business decisions, but rather, it is something that we all consume and it is shaping the customer experience. There are new emerging data sources like IoT data, social media, and machine logs that are increasing the demand to capture and analyze data. Ultimately, this all boils down to what kind of valuable insights can be attained from data that allow people to take action. In a way, we could look at data as the new gold. Gold’s value is determined by market forces primarily based on supply and demand. Data also has market forces - demand to create insight and supply of time to collect and process the data. The better organizations can manage the supply and demand functions of data, or process more valuable data in less time, the better the return on investment. Furthermore, data is similar to gold because it has to be mined; quality data insights only come from a quality mine that uses the right tools. These tools need to have good performance, redundancy, security, and availability. In other words, these tools need to be enterprise class. Over the past seven years Cloudera has been driving the Hadoop ecosystem by creating a more enterprise class solution. Many of the improvements to Hadoop can be seen in how end customers are moving beyond the standard “data lake” model of Hadoop, where customers were simply aggregating data for eventual consumption. With technology like Oracle Big Data Appliance (BDA), analysts and data scientists are actively drawing insights from the combination of data from traditional sources and newer sources. For example, Wargaming.net, a worldwide massively multiplayer online (MMO) gaming company, recently described how they use Oracle BDA in conjunction with the Oracle Database Appliance and Oracle Advanced Analytics to draw insights about game play. With Oracle BDA, Wargaming.net was able to stand up a Cloudera Enterprise cluster quickly, reducing time to value. They could then quickly mine the data in their organization, resulting in increased revenues of 62% in one of their key sales regions. Like Wargaming.net, many companies are looking for agility, but not just in the speed of setting up a Hadoop cluster but also in how quickly they can get real-time insights from their data. They are looking for real-time analytics and Cloudera is leading the industry by adopting Apache Spark as a part of Hadoop. This September, Cloudera announced the One Platform Initiative, which highlights Spark as a key part of the future of Hadoop. The Hadoop ecosystem is a mix of services that go beyond what Spark can offer alone. By becoming more tightly integrated with Hadoop, Cloudera expects Spark to increase the ROI for customers looking for better security, scale, management and streaming. Besides running 100 times faster than MapReduce on certain data workloads, this demonstrates that Hadoop is not purely defined by HDFS and MapReduce. 
As described by Mike Olson, “Cloudera’s customers use Spark for portfolio valuation, tracking fraud and handling compliance in finance, delivering expert recommendations to doctors in healthcare, analyzing risk more precisely in insurance and more.” Of course, the quality of the tools you use to mine gold has a tremendous effect on how much the mine produces. In the case of big data, Oracle BDA is the leading solution to most effectively mine and manage data. In many cases, Oracle customers are already using one of Oracle’s database management systems to support critical enterprise systems along with a data warehouse to analyze the data for business insights. Yet now there are more streams of data flowing in from social media feeds, log files, and new internet-enabled devices. Data analysts need to draw valuable insights from a variety of separate, combined, and streaming data sources to help their organizations gain a competitive advantage. In support of this demand, recent improvements by Cloudera and Oracle take data mining to the next level of enterprise class data management and analytics. Going forward, these improvements will allow data analysts to have even greater confidence in the insights gained from their data mine. As you can see, the Hadoop stack has increased its focus on data quality and integrity. We fully expect this innovation to continue to evolve into more enterprise class software offerings that provide additional manageability, agility, security, and governance features. As Hadoop continues to mature, so will the enterprise feature set. Enterprise-grade big data solutions are here to stay and are only getting stronger by the day. Oracle BDA and Cloudera Enterprise are leading the way in turning data gold into the real thing.

Heading to Strata + Hadoop World Singapore?

Hot off the heels of Oracle OpenWorld San Francisco, and Strata + Hadoop NY, Oracle will be present at Strata + Hadoop World in Singapore. Come along and meet the team and find out more about how Oracle - the biggest data company - is able to make big data effective in the enterprise. You can also learn more about the new Oracle Cloud Platform for Big Data, launched less than a month ago at Oracle OpenWorld. These new capabilities deliver on our promise to enable customers to take advantage of the same technologies on-premises and in the cloud. No wonder our big data business is growing faster than the market as a whole! In fact, VP Product Management, Big Data, Neil Mendelson will be discussing Real-World Success with Big Data, in his session at 4pm on Wednesday December 2nd, during which he will be sharing the best practices and lessons learned from successful big data projects in Asia and around the world. (Location: 333) Or visit us, and experience our innovative demos at booth #102: · Unlocking Value with Oracle Big Data Discovery · Oracle Big Data Preparation Cloud Service—Ready Your Big Data for Use · Best SQL Across Hadoop, NoSQL, and Oracle Database · Graph, Spatial, Predictive, and Statistical Analytics Alternatively, you can hear Oracle’s Hong Eng Koh and Vladimir Videnovic in their session entitled, ‘Don't believe everything you see on CSI: Beyond predictive policing’, which takes place at 4:50pm on December 2. (Location: 331) Sessions you won’t want to miss: Real-World Success with Big Data 4:00pm–4:40pm Wednesday, 2nd December 2015 Location: 333 Neil Mendelson (Oracle) Companies who are successful with big data need to be analytics-driven. During this session, Neil will look at new analytics capabilities that are essential for big data to deliver results, and discuss how to maximize the time you spend providing differentiation for your organization. This session will also cover some common big data use cases in both industry and government. Don't believe everything you see on CSI: Beyond predictive policing 4:50pm–5:30pm Wednesday, 2nd December 2015 Location: 331 Hong Eng Koh (Oracle), Vladimir Videnovic (Oracle) Public safety and national security are increasingly being challenged by technology; the need to use data to detect and investigate criminal activities has increased dramatically. But with the sheer volume of data and noise, law enforcement organisations are struggling to keep up. This session will examine trends and use cases on how big data can be utilised to make the world a safer place.

Big Made Great – Big Data from the Biggest Data Company…Oracle

Big Made Great – Big Data from the Biggest Data Company…Oracle At Oracle OpenWorld last year, big data was big news for over 60,000 attendees. From executive keynotes to the Industry Showcases, from product deep dives to customer panels, there was a wealth of information where the discussions centered on how the phenomenon of big data and the datafication of everything is transforming businesses. This year’s big data discussions at Oracle OpenWorld will center on how the biggest data company is able to make big data effective in the enterprise. Perspectives from Oracle product executive and customers will shed light on big data strategies and best practices for approaching your enterprise infrastructure, data management, security and governance and analytics. Here are some sessions you won’t want to miss: Sunday, October 25 5:00 p.m.–7:00 p.m., Moscone North, Hall D Integrated Cloud Applications and Platform Services Keynote featuring Larry Ellison - Executive Chairman of the Board and Chief Technology Officer, Oracle Oracle has more cloud application, platform, and infrastructure services than any other cloud provider—it has the only truly integrated cloud stack. Larry Ellison will announce a broad set of new products and highlight why integrated cloud will deliver the most innovative and cost-effective benefits to customers Transformation and Innovation in the Data Center: Driving Competitive Advantage for the Enterprise Keynote featuring Intel CEO Brianh Krzanich The last few years have witnessed an incredible transformation in the data center—from the build-out of the cloud, to the power of big data, to the proliferation of connected devices. The pace of this transformation continues to accelerate. This transformation provides both incredible new opportunities as well as new challenges to solve. Intel CEO Brian Krzanich, along with some special guests, will explore these opportunities and challenges, share the innovative solutions Intel and our partners are creating to address them, and show how best-in-class organizations are using this transformation to drive a competitive advantage. Monday, October 26 – Thursday, October 29 Big Data Showcase – Moscone South Exhibition Hall Monday, October 26 2:45 pm – Moscone South 103 Exploiting All Your Data for Business Value: Oracle’s Big Data Strategy [GEN7350] General Session featuring: Andy Mendelson - EVP, Database Server Technologies, Oracle Inderjeet Singh – EVP, Fusion Middleware Development, Oracle Luc Ducrocq, SVP - BI&A NA Leader, HCCg Big data in all its variety is now becoming critical to the development of new products, services and business processes. Organizations are looking to exploit all available data to generate tremendous business value. Generating this new value requires the right approach to discover new insights, predict new outcomes and integrate everything securely with existing data, infrastructure, applications and processes. In this session we’ll explain Oracle’s strategy and architecture for big data, and both present and demonstrate the complete solution including analytics, data management, data integration and fast data. Hitachi Consulting will close the session covering three specific use cases where both companies align to deliver high value, high impact big data solutions. 
4:00 pm – Moscone West 2020 The Rise of Data Capital [CON10053] Session featuring Paul Sonderegger, Oracle Big Data Strategist Data is now a kind of capital—as vital as financial capital to the development of new products, services, and business processes. This creates a land-grab competition to digitize and “datafy” key activities before rivals do, intensified pressure to bring down the overall cost of managing and using data capital, and a new thirst for algorithms and analytics to increase the return on invested data capital. In this session, learn about competitive strategies to exploit data capital and hear examples of companies already putting these ideas into action. 5:15 pm – Moscone South 102 Big Data and the Next Generation of Oracle Database [CON8738] Session featuring George Lumpkin, Vice President, Product Management, Oracle Oracle’s data platform for big data is Oracle Big Data Management System, which combines the performance of Oracle’s market-leading relational database, the power of Oracle’s SQL engine, and the cost-effective, flexible storage of Hadoop and NoSQL. The result is an integrated architecture for managing big data, providing all of the benefits of Oracle Database, Oracle Exadata, and Hadoop, without the drawbacks of independently accessed data repositories. In this session, learn how today’s data warehouses are evolving into tomorrow’s big data platforms, and how Oracle is continuing to enhance Oracle Big Data Management System through new database features and new capabilities on top of Hadoop. Tuesday, October 27 4:00pm – Moscone South 104 Oracle Big Data SQL: Deep Dive—SQL over Relational, NoSQL, and Hadoop [CON6768] Session featuring Dan Mcclary, Senior Principal Product Manager, Big Data, Oracle Big data promises big benefits to your organization, but real barriers exist in achieving these benefits. Today’s big data projects face serious challenges in terms of skills, application integration, and security. Oracle Big Data SQL radically reduces these barriers to entry by providing unified query and management of data in Oracle Database, Hadoop, and beyond. In this session, learn how Oracle Big Data SQL uses its Smart Scan feature, SQL, and storage indexing technology to make big data feel small. Additionally, learn how to use your existing skills and tools to get the biggest bang out of big data with Oracle Big Data SQL. 6:15 pm – Moscone South 303 Meet the Experts: Oracle’s Big Data Management System [MTE9564] Jean-Pierre Dijcks, Sr. Principal Product Manager, Oracle Martin Gubar, Big Data Product Management, Oracle Dan Mcclary, Senior Principal Product Manager, Big Data, Oracle New transformative capabilities delivered with Oracle Database 12c and Oracle Big Data Appliance will have a dramatic impact on how you design and implement your data warehouse and information systems. You now have the opportunity to analyze all your data across the big data platform—including Oracle Database 12c, Hadoop, and NoSQL sources—using Oracle’s rich SQL dialect and data governance policies. Attend this session to ask the experts about Oracle’s big data management system, including Oracle Big Data SQL, Oracle In-Memory Database, Oracle NoSQL Database, and Oracle Advanced Analytics. 
6:15 pm – Moscone South 304
Meet the Expert: Oracle Data Integration and Governance for Big Data [MTE10023]
Alex Kotopoulis, Product Manager, Oracle
Oracle data integration and governance solutions future-proof your big data technology and use a tool- and metadata-driven approach to realizing your data reservoir. You now have the opportunity to logically design your big data integration as part of your enterprise architecture and execute it using native Hadoop technologies such as Spark, Hive, or Pig. Attend this session to ask the experts about Oracle's big data integration and governance, including Oracle Data Integrator, Oracle GoldenGate for Big Data, Oracle Big Data Preparation Cloud Service, and Oracle Enterprise Metadata Management.

6:15 pm – Moscone South 306
Meet the Experts: Oracle Spatial and Graph [MTE9565]
Jean Ihm, Principal Product Manager, Oracle Spatial and Graph, Oracle; Xavier Lopez, Senior Director, Oracle; Jayant Sharma, Director, Product Management, Oracle; James Steiner, Vice President, Product Management, Oracle
This session is for those interested in learning about customer innovations and best practices with Oracle Spatial and Graph and the Oracle Fusion Middleware MapViewer feature. Meet the experts and discuss benefits derived from the extreme performance and advanced analytics capabilities of the platform. Topics include use cases from application areas such as business analytics, mobile tracking, location-based services, interactive web mapping, city modeling, and asset management.

7:15 pm – Moscone South 303
Oracle NoSQL Database: Meet the Experts [MTE9622]
Rick George, Senior Principal Product Manager, Oracle; Ashok Joshi, Senior Director, Oracle; David Rubin, Director of NoSQL Database Development, Oracle
NoSQL is a hot, rapidly evolving landscape, and every Oracle NoSQL Database implementation is different. Whether you're new to NoSQL or an experienced NoSQL application developer, come to this open Q&A session to hear from experienced NoSQL practitioners. The session features senior members of the Oracle NoSQL Database engineering team, field consultants, and customers and partners who have been using the product in production applications. From the theoretical to the practical, this is your opportunity to get your questions answered.

Wednesday, October 28
11:00 am – Park Central – Metropolitan III
Introducing Oracle Internet of Things Cloud Service [CON9472]
Henrik Stahl, Vice President, Product Management, Oracle; Jai Suri, Director, Product Management, Oracle
This session introduces Oracle Internet of Things Cloud Service, which is at the heart of Oracle's Internet of Things (IoT) strategy. In this session, learn how the feature set helps you quickly build IoT applications, connect and manage devices, configure and monitor security policies, manage and analyze massive amounts of data, and integrate with your business processes and applications. See examples of out-of-the-box integration with other Oracle Cloud services and applications, showing the unique value of the end-to-end integration that Oracle provides.

1:15 pm – Moscone South 309
Big Data Security: Implementation Strategies [CON8747]
Martin Gubar, Big Data Product Management, Oracle; Bruce Nelson, Principal Sales Consultant, Big Data Lead, Oracle
Hadoop has long been regarded as an insecure system. However, that is so 2014! A lot has changed in the past year. As Hadoop enters the mainstream, both its security capabilities and Oracle's ability to secure Hadoop are evolving.
Oracle Big Data Appliance facilitates the deployment of secure systems—greatly simplifying the configuration of authentication, authorization, encryption, and auditing—enabling organizations to confidently store sensitive information. This session shares the lessons learned from implementing secure Oracle Big Data Appliance projects for customers and identifies four security levels of increasing sophistication. As a bonus, we describe the roadmap of big data security as we see it playing out.

3:00 pm – Moscone West 2020
Introduction to Oracle Big Data Discovery [CON9101]
Chris Lynskey, Vice President, Product Management, Oracle; Ryan Stark, Director, Product Management, Oracle
Oracle Big Data Discovery and Oracle Big Data Discovery Cloud Service enable anyone to turn raw data in Hadoop into actionable insight in minutes. For organizations eager to get more value out of Hadoop, Oracle Big Data Discovery allows business analysts and data scientists to find, explore, transform, and analyze data, then easily share results with the big data community. This intuitive data discovery solution means analysts don't need to learn complex tools or rely only on scarce resources, and data scientists can spend time on high-value analysis instead of being mired in data preparation. Join us in this session to hear how Oracle Big Data Discovery can help your organization take a huge step forward with big data analytics.

4:15 pm – Moscone South 301
Customer Panel: Big Data and Data Warehousing [CON8741]
Manuel Martin Marquez, Senior Research Fellow and Data Scientist, CERN (Organisation Européenne pour la Recherche Nucléaire); Jake Ruttenberg, Senior Manager, Digital Analytics, Starbucks Coffee Company; Chris Wones, Chief Enterprise Architect, 8451; Reiner Zimmermann, Senior Director, DW & Big Data Global Leaders Program, Oracle; Serdar Özkan, AVEA
In this session, hear how customers around the world are solving cutting-edge analytical business problems using Oracle Data Warehouse and big data technology. Understand the benefits of using these technologies together, and how software and hardware combined can save money and increase productivity. Learn how these customers are using Oracle Big Data Appliance, Oracle Exadata, Oracle Exalytics, Oracle Database In-Memory 12c, or Oracle Analytics to drive their business, make the right decisions, and find hidden information. The conversation is wide-ranging, with customer panelists from a variety of industries discussing business benefits, technical architectures, implementation best practices, and future directions.

Thursday, October 29
10:15 am – Marriott Marquis
Converting Big Data into Economic Value
Jim Gardner, Senior Director, WSJ. Insights, The Wall Street Journal; Rich Clayton, Vice President, Business Analytics Product Group, Oracle


Biggest Data Company @Strata+Hadoop NY

Strata+Hadoop World, September 29–October 1 at the Javits Center in New York, is fast approaching, and you will not want to miss learning about Big Data Cloud Services from the Biggest Data Company! Hear how to "Simplify Big Data with Platform, Discovery and Data Preparation from the Cloud" from Oracle VPs of Product Management Jeff Pollock and Chris Lynskey on Thursday, October 1st at 1:15 p.m. in Room 3D 06/07.

Experience innovative demos at booth #123:
· Unlocking Value with Oracle Big Data Discovery
· Oracle Big Data Preparation Cloud Service—Ready Your Big Data for Use
· Best SQL Across Hadoop, NoSQL, and Oracle Database
· Graph, Spatial, Predictive, and Statistical Analytics

Hear from the experts and partners in a Mini-Theater at the booth.

Wednesday, September 30
11:00 a.m. – Big Data Graph Analytics for the Enterprise – Melli Annamalai, Senior Principal Product Manager, Oracle
11:30 a.m. – Is IT Operations a Big Data Problem? – Tom Yates, Product Marketing, Rocana
1:30 p.m. – Transparently Bursting into Cloud with Hadoop Workloads – Brett Redenstein, Director, Product Management, WANdisco
2:00 p.m. – Unlocking the Insights in Big Data – Prabha Ganapathy, Big Data Strategist, Intel
2:30 p.m. – Achieving True Impact with Actionable Customer Insights on Oracle BDA and BDD and Lily Enterprise – Steven Noels, CTO, NGDATA
3:00 p.m. – Scaling R to Big Data – Mark Hornick, Director, Oracle Advanced Analytics, Oracle
3:30 p.m. – Big Data Preparation – Avoiding the 80% "Hidden Cost" of Big Data – Luis Rivas, Director, Product Management, Oracle

Thursday, October 1
3:00 p.m. – SQL Performance Innovations for Big Data – JP Dijcks, Senior Principal Product Manager, Big Data, Oracle

Finally, you can also visit us in the Cloudera Partner Pavilion - K8 and at oracle.com/big-data.


Rocana partners with Oracle, supports Oracle Big Data Appliance

Guest Blog Author: Eric Sammer, CTO and Founder at Rocana
I'm very excited to announce our partnership with Oracle. We've been spending months optimizing and certifying Rocana Ops for the Oracle Big Data Appliance (BDA) and will be releasing certified support for Oracle's Big Data software. Ever since we worked with the Oracle Big Data team in the early days of the Big Data Appliance Engineered System while still at Cloudera, we've had a vision of the power of a pre-packaged application backed by the BDA. Today, our customers have access to an optimized, all-in-one Big Data system on which they can run Rocana Ops to control their global infrastructure. The Oracle BDA is a platform that helps our customers realize their vision of monitoring everything in one place.
During our certification we saw the incredible power of the BDA. On a half-rack (9-node) system we were clocking 40,000 events per second with subsecond latency to query time. One of our large enterprise customers can monitor an entire application infrastructure with 500 servers using a half-rack Oracle BDA. Fully loaded, these half racks can retain 500 billion events, which amounts to 4.5 months of data retention in just half a rack of floor space. This means that each full rack of a BDA can monitor one thousand nodes of a high-traffic website as well as all of the network status, database events, and metrics for the entire system. A fully loaded BDA system of 20 racks could monitor an entire data center with 10,000 machines and keep full, detailed historical data for 6 months. This kind of power was unheard of just a few years ago.
The raw power and simplicity of the BDA is a boon to our customers, and even more so is the extensibility of Oracle's Big Data offerings. With Oracle Big Data SQL, Oracle Exadata customers can access Rocana data transparently through their existing Exadata-connected applications. Analysts can use the Big Data Discovery software to explore data within Rocana and perform in-depth analysis of data center behavior. Certification for these integrations is coming soon, and we look forward to enabling these capabilities for our customers.
As an Oracle ExaStack Optimized partner, we're thrilled to be able to offer certified support for Rocana Ops on the Oracle BDA. For both Oracle and Rocana customers, this combination sets a new bar for what's possible in global data center monitoring. If you're interested in learning more about Rocana Ops on the Oracle BDA, contact sales@rocana.com. Rocana will also be talking about IT Operations being a Big Data Problem at the Oracle mini-theater at Strata+Hadoop World in New York (September 29 - October 1) as well as at Oracle OpenWorld (October 25 - 29).
Eric (@esammer) is a co-founder of Rocana and serves as its CTO. As CTO, Eric is responsible for Rocana's engineering and product. Eric is the author of Hadoop Operations, published by O'Reilly Media, and speaks frequently on technology and techniques for large-scale data processing, integration, and system management.
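The retention figures above can be sanity-checked with some quick, back-of-the-envelope arithmetic. The sketch below uses only the numbers quoted in the post (40,000 events per second and 500 billion events per half rack); the small gap between the result and the quoted 4.5 months presumably reflects indexing, replication, and other storage overhead.

```python
# Back-of-the-envelope retention math for a half-rack BDA running Rocana Ops,
# using only the figures quoted above (40,000 events/sec, 500 billion events).
events_per_second = 40_000
capacity_events = 500_000_000_000  # 500 billion events per half rack

seconds_of_retention = capacity_events / events_per_second
days_of_retention = seconds_of_retention / 86_400
months_of_retention = days_of_retention / 30

print(f"~{days_of_retention:,.0f} days, or ~{months_of_retention:.1f} months")
# Prints roughly 145 days (~4.8 months); the 4.5 months cited above presumably
# leaves headroom for indexing, replication, and other storage overhead.
```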


Oracle Big Data Discovery v1.1 is Ready

Oracle Big Data Discovery version 1.1 includes major new functionality as well as hundreds of enhancements, providing new value for customers and addressing key feedback from the 1.0 release. Highlights and benefits include:

More Hadoop: Big Data Discovery now runs on Hortonworks HDP 2.2.4+, in addition to Cloudera CDH 5.3+. That makes BDD the first Spark-based big data analytics product to run on the top two Hadoop distributions, significantly expanding the user community. In addition, the changes that enable BDD to run on Hortonworks also make it easier to port to other distributions, paving the way for an even broader community in the future. For Cloudera CDH, customers have the option to run BDD 1.1 on the Oracle Big Data Appliance as part of an engineered system or on commodity hardware; for Hortonworks HDP, customers can run BDD on commodity hardware.

More data: Customers can now access enterprise data sources via JDBC, making it easy to mash up trusted corporate data with big data in Hadoop (a generic sketch of this kind of mash-up appears below). BDD 1.1 elegantly handles changes across all this data, enabling full refreshes, incremental updates, and easy sample expansions. All data is live, which means changes are reflected automatically in visualizations, transformations, enrichments, and joins. BDD 1.1 includes all Oracle Endeca Information Discovery functionality and more.

More visual: Dynamic visualizations fuel discovery – but no product can include every visualization out of the box. This release includes a custom visualization framework that allows customers and partners to create and publish any visual and have it behave as if it were native to BDD. Combined with new visualizations and simpler configuration, this streamlines the creation of discovery dashboards and rich, reusable discovery applications.

More wrangling: Big Data Discovery is unique in allowing customers to find, explore, transform, and analyze big data all within a single product. This release significantly extends BDD Transform, making it both easier and more powerful. New UIs make it easy to derive structure from messy Hadoop sources, guiding users through common functions, like extracting entities and locations, without writing code. Transformation scripts can be shared and published, driving collaboration, and scripts can be scheduled, automating enrichment. Transform also includes a redesigned custom transformation experience and the ability to call external functions (such as R scripts), providing increased support for sophisticated users. Together with an enhanced architecture that makes committing transformations much faster, these capabilities greatly accelerate data wrangling.

More security: Secure data and analytics are a hot topic in the big data community. BDD 1.1 addresses this need by supporting Kerberos for authentication (both MIT and Microsoft versions); enabling authorization via Studio (including integration with LDAP) to support single sign-on (SSO); and providing security at both the project and dataset levels. These options allow customers to leverage their existing security and extend fine-grained control to big data analytics, ensuring people see exactly what they should.

More virtual: BDD 1.1 joins many of the key big data technologies that are part of Oracle's big data platform in the Oracle Big Data Lite Virtual Machine for testing and educational purposes.

Learn more at oracle.com/big-data.
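To picture the JDBC mash-up mentioned under "More data," here is a minimal, generic PySpark sketch rather than BDD itself; the connection URL, credentials, table names, and join key are all hypothetical placeholders.

```python
from pyspark.sql import SparkSession

# Generic illustration of blending a JDBC source with data already in Hadoop.
# The URL, credentials, table, and Hive table names below are placeholders.
spark = SparkSession.builder.appName("jdbc-mashup").enableHiveSupport().getOrCreate()

customers = (spark.read.format("jdbc")
             .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB")
             .option("dbtable", "CRM.CUSTOMERS")
             .option("user", "analyst")
             .option("password", "secret")
             .load())

clicks = spark.table("weblogs.clickstream")  # data already landed in Hadoop/Hive

# Mash up trusted corporate data with big data on a shared key.
enriched = clicks.join(customers, on="customer_id", how="left")
enriched.groupBy("segment").count().show()
```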


Evolution of Your Information Architecture

A Little Background
Information quality is the single most important benefit of an information architecture. If information cannot be trusted, then it is useless. If untrusted information is part of an operational process, then the process is flawed and must be mitigated. If untrusted information is part of an analytical process, then the decisions will be wrong. Architects work hard to create a trustworthy architecture. Furthermore, most architects would agree that regardless of data source, data type, and the data itself, data quality is enhanced by having standardized, auditable processes and a supporting architecture. In the strictest enterprise sense, it is more accurate to say that an information architecture needs to manage ALL data – not just one subset of data. Big Data is not an exception to this core principle. The processing challenges for large, real-time, and differing data sets (aka Volume, Velocity, and Variety) do not diminish the need to ensure trustworthiness. The key task in Big Data is to discover 'the value in there somewhere,' but we cannot expect to find value before the data can be trusted. The risk is that, treated separately, Big Data can easily add to the complexity of a corporate IT environment as it continues to evolve through frequent open source contributions, expanding cloud services, and true innovation in analytic strategies. Oracle's perspective is that Big Data is not an island. Nearly every use case ultimately blends new data and data pipelines with old data and tools, and you end up with an integration, orchestration, and transformation project. Therefore, the more streamlined approach is to think of Big Data as merely the latest aspect of an integrated, enterprise-class information management capability. It is also important to adopt an enterprise architecture approach to navigate your way to the safest and most successful future state. By taking an enterprise architecture approach, both technology and non-technology decisions can be made in a way that ensures business alignment, a value-centric roadmap, and ongoing governance. Learn more about Oracle's EA approach here.
A New White Paper
So, in thinking about coordinated, pragmatic enterprise approaches to Big Data, Oracle commissioned IDC to do a study that illustrates how Oracle customers are approaching Big Data in the context of their existing and planned larger enterprise information architectures. The study was led by Dan Vesset, head of business analytics and big data research at IDC, who authored the paper, titled Six Patterns of Big Data and Analytics Adoption: The Importance of the Information Architecture, and you can get it here.
Highlights - Three Excerpts from the Paper
Patterns of Adoption
The paper explores six Big Data use cases across industries that illustrate various architectural approaches for modernizing their information management platforms. The use cases differ in terms of goals, approaches, and outcomes, but they are united in that each company highlighted has a Big Data strategy based on clear business objectives and an information technology architecture that allows it to stay focused on moving from that strategy to execution.
Case 1 – Banking – Transformational modernization: Transform core business processes to improve decision-making agility, and transform and modernize the supporting information architecture and technology.
Case 2 – Retail – Agility and resiliency: Develop a two-layer architecture that includes a business process–neutral canonical data model and a separate layer that allows agile addition of any type of business interpretation or optimization.
Case 3 – Investment Banking – Complementary expansion: Complement the existing relational data warehouse with a Hadoop-based data store to address near-real-time financial consolidation and risk assessment.
Case 4 – Travel – Targeted enablement: Improve a personalized sales process by deploying a specific, targeted solution based on real-time decision management while ensuring minimal impact on the rest of the information architecture.
Case 5 – Consumer Packaged Goods – Optimized exploration: Enable the ingestion, integration, exploration, and discovery of structured, semi-structured, and unstructured data coupled with advanced analytic techniques to better understand the buying patterns and profiles of customers.
Case 6 – Higher Education – Vision development: Guarantee architectural readiness for new requirements that would ensure a much higher satisfaction level from end users as they seek to leverage new data and new analytics to improve decision making.
Copyright IDC, 2015
Oracle in the Big Data Market
Oracle offers a range of Big Data technology components and solutions that its customers are using to address their Big Data needs. In addition, the company offers Big Data architecture design and other professional services that can assist organizations on their path to addressing evolving Big Data needs. The following figure shows Oracle's Big Data Platform aligned with IDC's conceptual architecture model. Copyright IDC, 2015
Lessons Learned
Henry David Thoreau said, "If you have built castles in the air, your work need not be lost; that's where they should be. Now put the foundations under them." The information foundation and the architecture on which it is based is a key building block of these capabilities. In conducting IDC's research through interviews and surveys with customers highlighted in this white paper and others, we have found the following best practices related to the information architecture for successful Big Data initiatives:
Secure executive sponsorship that emphasizes the strategic importance of the information architecture, and ensure that the information architecture is driven by business goals.
Develop the information architecture in the context of the business architecture, application architecture, and technology architecture — they are all related.
Create an architecture board with representation from the IT, analytics, and business groups, with authority to govern and monitor progress and to participate in change management efforts.
Design a logical architecture distinct from the physical architecture to protect the organization from frequent changes in many of the emerging technologies. This enables the organization to maintain a stable logical architecture in the face of a changing physical architecture.
Consider the full range of big data use cases and end-user requirements. Big Data is not only about exploration of large volumes of log data by data scientists. Even at the early stages of a project, when evaluating technologies, always consider the full range of functional and nonfunctional requirements that will most likely be required in any eventual deployment. Bolting them on later will drive costs and delays and may require a technology reevaluation. This is yet another reason why an architecture-led approach is important.
Oracle also has a variety of business and technical approaches to discussing Big Data and Information Architecture. Here are a few:
Big Data Reference Architecture
Information Architecture Reference Architecture
12 Industry-specific Guides for Big Data Business Opportunities
Oracle Big Data Products


Utilities Are Getting Smarter Using Big Data, from WSJ. Custom Studios

The final industry-specific research from WSJ. Custom Studios on how senior executives plan to empower their organizations with big data is about utilities. Utilities gather huge amounts of operational and customer data but struggle to apply that data to solve critical business problems. They grapple with translating the vast amount of performance data from power plants, transmission lines, and thermostats that are always on, sensing fluctuations in power, temperature, and usage. Many utilities rely on industry-specific applications more than other industries do, yet they rank these apps as less business-critical; 83 percent plan to grow their business analyst teams, and 64 percent contract with vendors to host and manage their business-critical data, second only to financial services. Utilities on the cutting edge of developing data-driven cultures are applying analytics to new uses and new kinds of data, especially with smart metering. Take the issue of unpredictable energy demand: shifts in customer behavior, such as the growing use of plug-in electric vehicles, have made it increasingly difficult for utilities to predict how much energy will be needed by using traditional forecasting techniques. A sudden, unexpected bump in demand can cause blackouts or require a utility to acquire energy at a high cost. Kansas City Power & Light (KCP&L) is an example of a regulated electric utility successfully partnering with Oracle to implement its advanced metering initiative spanning network management, outage analytics, smart grid gateway, and meter data management. KCP&L is now able to bring its reliable, smart-grid operations and business processes together to enable storm-proven outage management and key distribution management functions, which in turn help to improve operational excellence and pave a streamlined pathway for critical customer communications. Additionally, KCP&L gains a wealth of useful, granular insight from Oracle on various aspects of energy production and distribution, such as field-crew performance on short- and long-cycle projects, which establishes a decision-support system to improve business performance and reduce operational costs. To learn more about how to convert big data into value as a utility, you can download the enterprise architect's guide here and visit oracle.com/big-data for more information.


Telecommunications: Timing Is Everything, from WSJ. Custom Studios

Following up on the global summary of the research conducted by WSJ. Custom Studios on how senior executives are investing in the economic potential they view in harnessing big data, we now look at how telecommunications companies are using big data to make customers happier—and save lives. Do you receive advertising offers that know where you live and shop? Wonder how they know who and where you are? The answer may well be your mobile, Internet, cable, and landline service providers. Telecommunications companies collect more information about customers than almost any other industry: where people relocate, who they chat with, what they look at online. And telecom executives are eager to draw more intelligence from their great reservoirs of data—theirs is among the top industries in making this a priority. Telecom companies are relatively more data-driven than the other five industries in the research, distinguishing them as highly effective in data management. Drawing intelligence from data and expanding analytics skills are top priorities. Much of telecom's big-data focus today is on location-based services. For instance, telecom operators can capture a customer's location as he or she enters a certain area (called "geo-fencing") and create targeted promotions. Telecoms are investing in optimizing subscriber and network information processing to better understand subscriber behavior, improve subscriber retention, and increase cross-selling of mobile communications products and services. Optimizing that information requires strengthening analytics and reporting capabilities for better insight into customer preferences—to improve marketing campaigns and new service offerings, from leisure entertainment to vital telemedicine. Customer-care agents must be enabled to respond to customer queries regarding network faults, connection errors, and charging and service options in as close to real time as possible. One successful organization partnering with Oracle to tackle big data is Globacom. It is saving an enormous number of call-processing minutes daily to improve decision-making and customer service, as well as providing vital telemedicine in remote Middle Eastern and African areas. Globacom's COO Jameel Mohammed states, "Oracle Big Data Appliance enables us to capture, store, and analyze data that we were unable to access before. We now analyze network events forty times faster; create real-time, targeted marketing messages for data-plan users; and save 13 million call-center minutes per year by providing first-call resolution to more and more customers. There is no other platform that offers such performance." To learn more about how to convert your data into value for telecom service providers, you can download the enterprise architect's guide here and visit oracle.com/big-data for more information.
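To make the geo-fencing idea above concrete, here is a minimal sketch of a circular geo-fence check. It is a generic illustration, not any specific telecom or Oracle product API, and the coordinates, radius, and subscriber record are invented.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def inside_geofence(subscriber, fence_center, radius_km):
    """True if the subscriber's last known location falls within the fence."""
    return haversine_km(subscriber["lat"], subscriber["lon"],
                        fence_center["lat"], fence_center["lon"]) <= radius_km

# Hypothetical example: a shopping-district promotion zone with a 1 km radius.
mall = {"lat": 40.7580, "lon": -73.9855}
subscriber = {"id": "msisdn-123", "lat": 40.7614, "lon": -73.9776}
if inside_geofence(subscriber, mall, 1.0):
    print("Trigger targeted promotion for", subscriber["id"])
```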


Retailers Get Personal: Improving the Customer Experience, from WSJ. Custom Studios

Following up on the global summary research of WSJ. Custom Studios on how senior executives are investing in the economic potential they view in harnessing big data, we will now journey into how retailers are getting personal to improve the experience of today's omnichannel shoppers. "Omnichannel" refers to customers using more than one channel to buy goods, such as purchasing an item online and then picking it up in a brick-and-mortar store, browsing or sharing sentiment on social media, emailing and texting from mobile phones, and calling customer service from home. These complex, omnichannel shopping models are driving retailers to embrace more scalable and robust analytics and IT platforms that can capture web activity logs and transactional records in stores. Responding to customers who are both mobile and social in real time is no small challenge. Retailers have long gathered customer data tied to loyalty cards, the majority of which show what items customers previously purchased, as well as demographic data. The customer data illustrates past buying patterns, but might not be indicative of future demand. Utilizing additional Hadoop data such as Internet search, clickstream, mobile location-based services, weather, and social media sources can help retailers gain a better understanding of future customer demand, as well as a better view of the customer and his or her family and network buying patterns. The savvy use of predictive analytics and next-best offers also has the potential to please customers and provide them with a better experience: a winning strategy in an increasingly competitive and demanding marketplace. Consumer science company dunnhumby provides an example of delivering big data solutions to retailers. Watch as dunnhumby's Director Denise Day tells how an Oracle big data platform means its analysts are no longer confined to sample-sized data sets and are relieved of the inefficiencies of searching for collected data. Now analysts can view 100 percent of the data, drill down for anomalies, and understand individual behavior at the detail level. "Our analysts don't have to learn new coding languages, they use the same single SQL language and it will bring back results. It doesn't matter where the data is stored for analysts as long as they can get the answers that they need," she says. To learn more about how to convert your data into value for retail, you can download the enterprise architect's guide here and visit oracle.com/big-data for more information.


Manufacturing and the Search for More Intelligence, from WSJ. Custom Studios

As a follow-up to the global summary of the research conducted by WSJ. Custom Studios on how senior executives identify the economic potential they view in harnessing big data, we will now explore how the manufacturing industry manages and analyzes big data to improve processes and supply chains. More than any other industry, executives in manufacturing put a high priority on drawing intelligence from their big-data stores. Factories are becoming more automated and smarter, allowing machines to "talk" to one another and quickly exchange the data necessary to improve the manufacturing process, reduce labor cost, and speed production. Operations managers use advanced analytics to mine historical process data, identifying patterns and relationships among discrete process steps and inputs. This allows manufacturers to optimize the factors that prove to have the greatest effect on yield. Making better use of data allows manufacturers to understand and address the cross-functional drivers of cost, such as warehouses being incentivized to keep stockouts down while production lines are incentivized to reduce costs. For example, Riverbed Technology improved low test yields by empowering engineers to proactively perform root-cause analysis over a greater variety and range of detailed data sets, faster and with less time spent on inefficient data-handling techniques. Watch Riverbed Director Keith Lavery, who says, "The business value that's been received is around getting the product out the door faster and being able to reduce the manpower in testing the applications prior to leaving the factory floor." Using data more effectively can also identify instances where companies are working at cross-purposes, such as at consumer goods manufacturer Procter & Gamble (P&G). To better understand the performance of its myriad brands and varied market conditions, P&G needed to clearly and easily understand its rapidly growing and vast amounts of data across regions and business units. The company integrated structured and unstructured data across research and development, supply chain, customer-facing operations, and customer interactions, both from traditional data sources and new sources of online data. P&G Associate Director Terry McFadden explains, "With Oracle Big Data Appliance, we can use the powerful Hadoop ecosystem to analyze new data sources with existing data to drive profound insight that has real value for the business." To learn more about how to convert your data into value for manufacturing, you can download the enterprise architect's guide here and visit oracle.com/big-data for more information.
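For a concrete, if toy, picture of the yield analysis described above (mining historical process data for the inputs with the greatest effect on yield), here is a minimal sketch using a random forest's feature importances. It is not Riverbed's or P&G's actual pipeline, and the file and column names are invented.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical historical process data: one row per production run,
# with process inputs and the resulting yield percentage.
runs = pd.read_csv("process_history.csv")  # placeholder file name
inputs = ["oven_temp_c", "line_speed", "humidity_pct", "supplier_lot", "tool_age_hours"]
X = pd.get_dummies(runs[inputs], columns=["supplier_lot"])
y = runs["yield_pct"]

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Rank the process factors by how strongly they drive yield in the model.
importance = pd.Series(model.feature_importances_, index=X.columns)
print(importance.sort_values(ascending=False).head(10))
```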


Health Care Looks to Unlock the Value of Data, from WSJ. Custom Studios

As a follow-up to the global summary of the research conducted by WSJ. Custom Studios on how senior executives are investing in the economic potential they view in harnessing big data, we will now examine how personalized medicine in the health care industry hinges on analytics. As an industry, health care faces numerous challenges, and health care companies see data and analytics as a path to resolving them—from improving clinical practices to increasing business efficiencies. Health care providers are facing a growing need to manage costs and understand patients more holistically to improve the quality of care and patient outcomes. In general, the industry’s desire is to move towards evidence-based medicine as opposed to trial-and-error approaches. In order to meet these goals, organizations are analyzing and managing vast volumes of clinical, genomic, financial, and operational data while rapidly evolving the information architecture. Health information organizations have long gathered information about patient encounters, but only in the last few years has much of this information entered the digitized world in the form of EMRs (electronic medical records), wearable devices, smartphones, and social media adoption, which allow quick access to data on a near real-time basis. The data can also help expose different treatments and their associated outcomes. Even though clinical research data is fueling an initial set of analytics platforms, provider organizations are looking beyond clinical information alone to provide superior care while reducing cost. An example of one Oracle customer innovating with big data to improve patient outcomes is the University of Pittsburgh Medical Center. Watch UPMC’s Vice President Lisa Khorey, who states, “With Oracle’s Exadata, Advanced Analytics, and purpose-built applications, we have a high performance platform that can personalize treatment and improve health care outcomes.” Vice President John Houston also speaks to how health care is transforming around mobile, privacy, and regulations, and how Oracle’s cloud platform supports their health system. To learn more about how to convert your data into value for health care payers and life sciences manufacturers, you can download the enterprise architect’s guide here and here, respectively, and visit oracle.com/big-data for more information.


Converting Big Data into Economic Value from WSJ. Custom Studios

Data is now a kind of capital, on par with financial capital for creating new products, services, and business models. The implications for how companies capture, keep, and use data are far greater than the spread of fact-based decision-making through better analytics. In some cases, data capital substitutes for traditional capital and explains most of the market valuation premium enjoyed by digitized companies. But most companies don't think they're ready for big data. Oracle recently commissioned Wall Street Journal Custom Studios and the private research think tank Ipsos to conduct an online survey of over 700 senior executives, along with interviews with subject matter experts, to understand their biggest opportunities, challenges, and areas of investment with big data. You can read the global summary of the research, "Data Mastery: The Global Driver of Revenue," and the snapshot here. The key findings are that garnering insights in the new world of big data is a top-three priority for 86 percent of all respondents—and the number one priority for a third of the participants. In addition, 81 percent plan to expand their business analyst staff. The bottom line: 98 percent of executives who responded believe they are losing an average of 16 percent of annual revenue as a result of not effectively managing and leveraging business information—information that is available at an unprecedented scale and rate from a variety of cloud, mobile, social, and sensor technology devices and platforms. The newer information generated on searches, clickstreams, sentiment, and performance by people and things—in combination with customer and operations data that businesses have traditionally managed—has enormous potential for business value and customer experiences. Stay tuned for more of the industry findings from the research in this blog series, which will also feature Oracle customer-success stories and architecture guides to help you convert big data into economic value. Learn more now at oracle.com/big-data.


Big Data and the Future of Privacy - paper review (Part 3 of 3)

This is part 3 of a review of a paper titled Big Data and the Future of Privacy from the Washington University in St. Louis School of Law, where the authors assert the importance of privacy in a data-driven future and suggest some of the legal and ethical principles that need to be built into that future. Authors Richards and King suggest a three-pronged approach to protecting privacy as information rules:
regulation
soft regulation
big data ethics
New regulation will require new laws; practitioners of big data can seek to influence those laws, but ultimately they can only maintain awareness of, and adherence to, those laws and regulations. Soft regulation occurs when governmental regulatory agencies apply existing laws in new ways, as the Federal Trade Commission is doing, as described in a previous post. It also occurs when entities in one country must comply with the regulatory authority of another country to do business there. Again, this is still a matter of law and compliance. The authors argue that the third prong, big data ethics, will be the future of data privacy to a large extent because ethics do not require legislation or complex legal doctrine. "By being embraced into the professional ethos of the data science profession, they [ethical rules] can exert a regulatory effect at a granular level that the law can only dream of." As those who best understand the capabilities of the technology, we must play a key role in establishing a culture of ethics around its use. The consequences of not doing so are public backlash and, ultimately, stricter regulation. Links to Part 1 and Part 2


Big Data and the Future of Privacy - paper review (Part 2 of 3)

This is part 2 of a review of a paper titled Big Data and the Future of Privacy from the Washington University in St. Louis School of Law, where the authors assert the importance of privacy in a data-driven future and suggest some of the legal and ethical principles that need to be built into that future. Authors Richards and King identify four values that privacy rules should protect, which I will summarize here from my own perspective.
Identity
Identities are more than government ID numbers and credit card accounts. Social Security numbers and credit cards can be stolen, and while that is inconvenient and even financially damaging, losing them doesn't change who we are. However, when companies use data to learn more and more about us without limit, they can cross the boundaries we erect to control our own identity. When Amazon and Netflix give you recommendations for books, music, and movies, are they adapting to your identity or are they also influencing it? If targeted marketing becomes so pervasive that we live in bubbles where we only hear messages driven by our big data profiles, then is our self-determination being manipulated? As the authors state, "Privacy of varying sorts - protection from surveillance or interference - is what enables us to define our identities." This raises the question of whether there is an ethical limit to personal data collection and, if so, where that limit lies.
Equality
Knowledge is power, and data provides knowledge. Knowledge resulting from data collection can be used to influence and even control. Personal data allows the sorting of people, and sorting sits on the same spectrum as profiling and discrimination. One possible use of data-driven sorting is price discrimination. Micro-segmented customer profiles potentially allow companies to charge more to those that are willing to pay more because they can identify that market segment. Another ominous use of big data is to get around discrimination laws. A lender might never ask your race on a loan application, but it might be able to figure out your race from other data points that it has access to. We must be careful that the use of big data does not undermine our progress towards equality.
Security
As pointed out earlier, the sharing of personal data does not necessarily remove an expectation of privacy. Personal privacy requires security by those that hold data in confidence. We provide personal information to our medical providers, banks, and insurance companies, but we also expect them to protect that data from disclosures that we don't authorize. Data collectors are obligated to secure the data they possess with multiple layers of protection that guard data from both internal and external attack.
Trust
Privacy promotes trust. When individuals are confident their information is protected and will not be misused, they are more apt to share. Consider doctor/patient confidentiality and attorney/client privilege. These protections promote trust that enables an effective relationship between the two parties. Conversely, when companies obtain information under one set of rules and then use it in another way, by combining it with other data in ways the consumer did not expect, it diminishes trust. Trust is earned through transparency and integrity. See Part 3 for a three-pronged approach to protecting privacy. Link to Part 1


Announcing Oracle Big Data Spatial and Graph

We recently shipped a new big data product: Oracle Big Data Spatial and Graph. We've had spatial and graph analytics as an option for Oracle Database for over a decade. Now we've taken that expertise and used it to bring Spatial and Graph analytics to Hadoop and NoSQL. But first, what are spatial and graph analytics? I'll just give a quick summary here. Spatial analytics involves analysis that uses location. For example, Big Data Spatial and Graph can look at datasets that include, say, zip code or postcode information and add or update city, state, and country information. It can filter or group customer data from logfiles based on how near one customer is to another. Graph analytics is more about how things relate to each other. It's about relative, rather than absolute, relationships. So you could use graph analytics to analyze friends of friends in social networks, or build a recommendation engine that recommends products to shoppers who are related in the network. The next question is: why move this capability to Hadoop and NoSQL? First, we wanted to support the different kinds of data sets and the different workloads, which included being able to process this data natively on Hadoop and in parallel using MapReduce or in-memory structures. Secondly, our overall big data strategy has always been to minimize data movement, which means doing analysis and processing where the data lies. Oracle Big Data Spatial and Graph is not just suitable for existing Oracle Database customers - if you need spatial or graph analytics on Hadoop, this will meet your needs even if you don't have any other Oracle software. But of course, we're hoping that existing customers will be as interested in it as Ball Aerospace: "Oracle Spatial and Graph is already a very capable technology. With the explosion of Hadoop environments, the need to spatially-enable workloads has never been greater and Oracle could not have introduced "Oracle Big Data Spatial and Graph" at a better time. This exciting new technology will provide value add to spatial processing and handle very large raster workloads in a Hadoop environment. We look forward to exploring how it helps address the most challenging data processing requirements." - Keith Bingham, Chief Architect and Technologist, Ball Aerospace
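To give a feel for the friends-of-friends style of graph analysis mentioned above, here is a minimal sketch using the open source networkx library rather than Oracle Big Data Spatial and Graph itself; the tiny social network is invented.

```python
import networkx as nx

# Toy social graph: nodes are people, edges are friendships.
g = nx.Graph()
g.add_edges_from([
    ("ann", "bob"), ("bob", "carol"), ("bob", "dave"),
    ("carol", "erin"), ("dave", "erin"),
])

def friends_of_friends(graph, person):
    """People exactly two hops away: candidates for 'you may know' or product suggestions."""
    direct = set(graph.neighbors(person))
    two_hop = {fof for friend in direct for fof in graph.neighbors(friend)}
    return two_hop - direct - {person}

print(friends_of_friends(g, "ann"))  # {'carol', 'dave'}
```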


Big Data and the Future of Privacy - paper review (Part 1 of 3)

What is privacy, and what does it really mean for big data? Some say that privacy and big data are incompatible. Recall Mark Zuckerberg's comments in 2010 that the rise in social media means that people no longer have an expectation of privacy. I recently read a paper titled Big Data and the Future of Privacy from the Washington University in St. Louis School of Law where the authors argue the opposite. Their ideas provide more food for thought in the quest for guiding principles on privacy for big data solutions. What do we mean when we talk about data privacy? Can data be private if it is collected and stored by another party? The paper's authors, Neil M. Richards and Jonathan H. King, take on these questions but point out that privacy is difficult to define. We often think of private data as being secret or unobserved, yet we share information with others with an expectation of privacy and protection. Rather than getting wrapped up in the nuances of a legal definition, they suggest that personal information in digital systems exists in intermediate states between public and private, and that it should not lose legal protection in those intermediate states. The authors suggest that a practical approach to dealing with data privacy is to focus on the rules that govern that data. See Parts 2 and 3 for an overview of their suggested values and rules for data privacy.


Oracle Named World Leader in the Decision Management Platform

Decision management platforms are increasingly essential to competitive advantage in the era of big data analytics, providing a more comprehensive organizational view from strategy to execution and generating fast, high returns on investment. The decision management market has only recently emerged, with platforms that combine an integrated set of business rules and advanced analytic modeling for more accurate, high-volume decision making. Decision management has evolved into a more collaborative process in which business analysts and model developers scientifically experiment in digital channels with dynamic content, imagery, products, and services, most commonly to deliver next-best marketing offers. These content- and context-aware decisions prompt customer-facing staff with recommendations, while websites, emails, mobile apps, and supply chain systems are continuously tailored to take more precise, local, and personalized actions. IDC has released its MarketScape for the Worldwide Decision Management Software Platform 2014 Vendor Assessment and named Oracle Real-Time Decisions (RTD) as among the first to market and the world leader, citing strengths such as:
· Appealing to the business user or CMO while maintaining the flexibility and control needed by advanced analytic modelers
· High-performance analytics tuned for a single engineered system
· Self-learning or machine-learning optimizing capabilities (illustrated in the short sketch below)
· Integration and portability across deployment options through Java
"The returns organizations cited during interviews for this study were impressive — so much so that no single organization would permit publication of the outcomes because the specific decision management solutions were viewed as key to creating a competitive advantage," said Brian McDonough, research manager, Business Analytics Solutions, IDC. "Organizations can feel confident that they can see real and impressive business benefits from any of these solutions today."
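As a toy illustration of the "self-learning" next-best-offer idea referenced in the list above (and emphatically not RTD's actual algorithm), here is a minimal epsilon-greedy sketch that keeps updating offer acceptance rates and usually proposes the best-performing offer; the offers and simulated response rates are invented.

```python
import random

class NextBestOffer:
    """Tiny epsilon-greedy learner: explore occasionally, otherwise exploit."""

    def __init__(self, offers, epsilon=0.1):
        self.epsilon = epsilon
        self.shown = {o: 0 for o in offers}
        self.accepted = {o: 0 for o in offers}

    def choose(self):
        if random.random() < self.epsilon:          # explore
            return random.choice(list(self.shown))
        # exploit: pick the offer with the best observed acceptance rate
        return max(self.shown, key=lambda o: self.accepted[o] / (self.shown[o] or 1))

    def record(self, offer, accepted):
        self.shown[offer] += 1
        self.accepted[offer] += int(accepted)

nbo = NextBestOffer(["upgrade", "discount", "bundle"])
for _ in range(1000):                               # simulated customer interactions
    offer = nbo.choose()
    true_rate = {"upgrade": 0.05, "discount": 0.12, "bundle": 0.08}[offer]
    nbo.record(offer, random.random() < true_rate)

best = max(nbo.shown, key=lambda o: nbo.accepted[o] / (nbo.shown[o] or 1))
print("Learned next-best offer:", best)
```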


Big Data Privacy and the Law

In a previous post I discussed a presentation given at Strata+Hadoop. Another one of the Law, Ethics, and Open Data sessions at Strata+Hadoop that I had a chance to attend was by two attorneys, Alysa Z. Hutnik and Lauri Mazzuchetti, from a private law practice, talking about Strategies for Avoiding Big Privacy "Don'ts" with Personal Data. I found it very interesting, and you can see their slides here. They provided the regulatory perspective on personal data, and I must add that lawyers are really good at making you aware of all the ways you can end up in court. Technology moves so quickly, and governmental legislative bodies so glacially, that regulation will likely always lag behind. That doesn't mean that companies are off the hook when it comes to personal data and privacy regulation. I learned that in the absence of specific legislation, governments will find ways to regulate using existing law. In the US, the Federal Trade Commission has taken up the cause of consumer data privacy, consistent with its mission to "protect consumers in the commercial sphere," and, according to the speakers, identified three areas that it focused on in 2014:
Big data
Mobile technology
Protecting sensitive information
The FTC is adding the Internet of Things to that list for 2015 with a report released in January titled Internet of Things: Privacy & Security in a Connected World, based on a workshop it held in November 2013. In terms of regulating security and privacy, the FTC states in the report that it will "continue to use our existing tools to ensure that IoT companies continue to consider security and privacy issues as they develop new devices." When the FTC refers to its "existing tools", it means enforcement of "...the FTC Act, the FCRA, the health breach notification provisions of the HI-TECH Act, the Children's Online Privacy Protection Act, and other laws that might apply to the IoT." The report also said that "...staff will recommend that the Commission use its authority to take action against any actors it has reason to believe are in violation of these laws." It's clear that the industry cannot put its head in the sand by overlooking or ignoring privacy concerns. The speakers made a good case for considering the legal implications when working with personal data, and they made some recommendations:
Think privacy from the start by designing in privacy and security. Suggested methods include limiting data, de-identifying data (one simple approach is sketched below), securely storing retained data, restricting access to data, and safely disposing of data that is no longer needed.
Empower consumer choice. In apps, give users tools that enable choice, make it easy to find and use those tools, and honor the user's choices.
Regularly reassess your data collection practices. Consider your purpose in collecting the data, the retention period, third-party access, and the ability to make a personally identifiable profile of users.
Be transparent. Do not hide or misrepresent what data you are collecting and what you are doing with that data. Be open about third-party access to your data, including what happens after termination and/or deletion of user accounts. Platform providers should provide frequent and prominent disclosures using just-in-time principles and also by providing a holistic view of data collection. Also, consumers should be able to easily contact providers, and there should be a process for responding to consumer concerns. Providers also need to find ways to effectively educate users about privacy settings.
I won't cover all of their recommendations, but there are lessons here that we can apply as we build out big data applications.
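For instance, the de-identification recommendation can be pictured with a short, generic sketch: replacing direct identifiers with keyed pseudonyms before records reach the analytics store. This is one common approach, not something prescribed by the speakers, and the key, field names, and record are invented.

```python
import hashlib
import hmac

# One way to act on the "de-identify data" recommendation: replace direct
# identifiers with keyed pseudonyms before records land in the analytics store.
# The secret key, field names, and record below are illustrative only.
PSEUDONYM_KEY = b"rotate-me-and-store-in-a-vault"

def pseudonymize(value: str) -> str:
    """Keyed hash (HMAC-SHA256) so identifiers can't be reversed or rainbow-tabled."""
    return hmac.new(PSEUDONYM_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

record = {"email": "jane@example.com", "phone": "555-0100", "zip": "02139", "purchase": 42.50}
DIRECT_IDENTIFIERS = {"email", "phone"}

safe_record = {
    k: (pseudonymize(v) if k in DIRECT_IDENTIFIERS else v)
    for k, v in record.items()
}
print(safe_record)  # email/phone become stable pseudonyms; analytics columns stay usable
```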


Unmask Your Data. The Visual Face of Hadoop (part 5 of 5)

Following on from the previous posts, we looked at the fundamental capabilities for an end-to-end big data discovery product. We discussed 1) the ability to 'find' relevant data in Hadoop, 2) the ability to 'explore' a new data set to understand potential, and, in our most recent post, the capability to 3) transform big data to make it better. We now turn our focus to our final concepts:
4. Discover (blend, analyze and visualize). Only once the data is ready can we really start to analyze it. The BDD discover page is an area that allows users to build interactive dashboards that expose new patterns in the data by dragging and dropping from a library of visual components onto the page: anything from a beautiful chart, heat map, or word cloud to a pivot table, a raw list of individual records, or search results. Users can join and combine with other data sets they find or upload to the data lake to widen perspectives and deepen the analytic value. Dashboards created are fully interactive. Users can filter through the data by selecting any combination of attribute values in any order, and they can further refine using powerful keyword search to look for specific terms or combinations of terms in unstructured text. A discovery tool needs to encourage free-form interaction with the data to enable users to ask unanticipated questions, and BDD is founded upon this idea.
5. Share (insights and data for enterprise leverage). Big data analytics is a team sport, so sharing and collaboration are fundamental to the discovery process. To honor this, all Big Data Discovery projects and dashboard pages built by end users are shareable with other users (if granted permission) so they can work together. Users can even share specific analyses within project pages via bookmarks, or take visual snapshots of the pages, put these into galleries, and publish them to tell stories about the data. But it's not just the analysis that is shareable in BDD; one of the most frequently requested capabilities we heard from customers is that the product 'plays nice' with other tools in the big data ecosystem. Perhaps our data scientist wants to use the data sets we prepared in BDD to build a new predictive model in R, or perhaps we want to lock down, secure, and share a discovery with thousands of users via the enterprise BI tool. BDD enables this throughout the product by constantly providing the ability to write results back to Hadoop and even automatically register the improved data in Hive so it can instantly be consumed by any other tool that connects to the data lake.
These 5 concepts (see the previous posts for the first three) are fundamental to Big Data Discovery, and this is what we mean by 'end-to-end': the ability to quickly find relevant data to start working with and then explore it to evaluate and understand its potential. Users can transform and enrich the data, without moving it out of Hadoop, to make it better. Only then are we ready to discover new insights by blending and interacting with the data, and finally to share our results for leverage across the enterprise, in terms of both people and tools, connecting to the big data ecosystem. If you have implemented a data lake, or have plans to, we hope these ideas resonate with you and compel you to take a deeper look at Oracle Big Data Discovery, the "visual" face of Hadoop.
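The write-back-to-Hadoop and register-in-Hive hand-off described under "Share" can be pictured with a short, generic PySpark sketch. This is not BDD's internal code, and the application, table, and column names are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("publish-curated-dataset").enableHiveSupport().getOrCreate()

# A prepared data set (placeholder name) that we want to make available to other tools.
curated = spark.table("discovery.web_sessions_raw").filter("country = 'US'")

# Persist it back to the data lake and register it in Hive so BI tools, R users,
# or any SQL-on-Hadoop engine can pick it up immediately.
(curated.write
        .mode("overwrite")
        .format("parquet")
        .saveAsTable("discovery.web_sessions_curated"))
```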

Unmask Your Data. The Visual Face of Hadoop (part 4 of 5)

Following on from the previous post, we started to discuss the fundamental capabilities that an end-to-end big data discovery product needs in order to allow anyone (not just highly skilled 'techies') to turn raw data in Hadoop into actionable insight. We discussed 1) the ability to 'find' relevant data in Hadoop and 2) the ability to 'explore' a new data set to understand its potential. We now turn our focus to:

3. Transform (to make big data better). We have already discussed that data in Hadoop typically isn't ready for analytics because it needs changing in some way first. Perhaps we need to tease out product names or IDs buried in some text, replace missing values, concatenate fields together, or turn integers into strings and strings into dates. Maybe we want to infer a geographic hierarchy from a customer address or IP address, or a date hierarchy from a single timestamp. The BDD Transform page allows any user to change the data directly in Hadoop without moving it, and without picking up the phone to call IT and then waiting for ETL tools to get the data ready for them. Via an Excel-like view of the data, Transform lets users quickly change data with a simple right-click and preview the results of the transformation before applying the change. More sophisticated data wranglers can draw on a library of hundreds of typical transforms to get the data ready. They can even make the data better and richer by adding new data elements extracted from large text fields using term and named-entity extraction algorithms. Any transform or enrichment can be previewed before applying, but when it is applied, BDD leverages Apache Spark, a massively scalable open source data processing framework, behind the scenes so that transforms can be applied at scale to data sets in Hadoop containing billions of records (a sketch of this kind of Spark-based transformation follows at the end of this post). All of this complexity is masked from the user, who can just sit back and wait for the magic to happen.

In the next and final post we will discuss the final two critical capabilities for an effective big data discovery tool. Until then... please let us know your thoughts!
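
As promised above, here is a hedged sketch of the kinds of transformations described in this post, written directly in PySpark. BDD hides this work behind its UI; the paths and column names below are invented for illustration.

```python
# Typical 'make big data better' transforms expressed in PySpark: type casts,
# a date hierarchy from a timestamp, field concatenation and missing-value
# replacement. Paths and columns are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("transform-sketch").getOrCreate()

raw = spark.read.csv("hdfs:///data/orders_raw", header=True)       # assumed path

clean = (
    raw
    .withColumn("order_ts", F.to_timestamp("order_ts"))            # string -> timestamp
    .withColumn("order_year", F.year("order_ts"))                  # infer a date hierarchy
    .withColumn("order_month", F.month("order_ts"))
    .withColumn("full_name", F.concat_ws(" ", "first_name", "last_name"))
    .fillna({"quantity": "0"})                                      # replace missing values
)

clean.write.mode("overwrite").parquet("hdfs:///data/orders_clean")
```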

Unmask Your Data. The Visual Face of Hadoop (part 3 of 5)

Based on the challenges outlined in part 2 of this series, what Oracle wanted to provide first and foremost is a single product to address them: one product that allows anyone to turn raw data in Hadoop into actionable insight, and fast. We don't want to force customers to learn multiple tools and techniques, then constantly switch between them to solve a problem. We want to provide a set of end-to-end capabilities that offers users everything they need to get the job done. Next, the user interface needed to be intuitive and easy to use, extremely visual and highly compelling. We want our customers to want to use the product, to be excited to use it when they get to work in the morning. Based on initial reactions, we think we have achieved this. During a recent meeting with one of the world's largest insurance companies I was particularly pleased (and slightly amused) to hear one customer remark, "you guys are seriously in danger of making Oracle look sexy". OK, it seems we are headed in the right direction then.

So what are the fundamental capabilities that lie beneath our compelling user interface? Just how do we allow people to turn raw data into actionable insight? To explain this I typically talk about five concepts that relate directly to areas within the product:

1. Find (relevant data). When users log into the product, the first page they see is the Catalog. Right after BDD is installed, it automatically indexes and catalogs all of the data sets in Hadoop, then continually watches for new data to add to the catalog. In the interface, users can quickly find relevant data to start working with using familiar techniques such as keyword search and guided navigation. Just type a word like "log" or "weather" and BDD returns all the data sets that match the keyword, along with a summary of the associated metadata, so the user can refine and filter the results by attributes such as who added the data, when it was added, how large it is and what it contains (such as geographic or time-based attributes). They can even refine using tags added by other users over time. For any individual data set, the user can see how it has been used in other projects and which other data sets it was combined with. We wanted the catalog experience to make finding data to start working with in Hadoop as "easy as shopping online".

2. Explore (to understand potential). Once an interesting data set is found, the logical question is "does it have analytical potential?". Even before users start working with a data set, we need to help them understand whether it is worth investing in. The Explore page in BDD does just that. It allows users to walk up to a data set they have never seen before and understand what is in it in terms of its attributes. Without any work at all, they instantly see all the attributes at once, displayed as different visual tiles depending on the data type (number, string, date, geo, etc.), quickly see raw statistics about each attribute (distribution, max, min, middle values, number of distinct values, missing values and so on), and even start to uncover patterns and relationships between combinations of attributes by placing them together in a scratchpad to quickly visualize initial discoveries (a sketch of this kind of attribute profiling follows at the end of this post).

To decide whether a new analytic project is worth the investment, organizations need to understand the potential of the data quickly, to avoid months of wasted time and millions of wasted dollars. Stay tuned for part 4 of the series, when we will address the remaining three concepts for effective big data discovery on raw data in Hadoop.
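
For readers who like to see what this kind of attribute profiling looks like under the covers, here is a minimal sketch that computes similar per-attribute statistics with PySpark. The input path is an assumption for illustration, and this is not how BDD itself is implemented.

```python
# Rough per-attribute profiling: summary statistics plus distinct and missing
# counts for every column. The input path is hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("explore-sketch").getOrCreate()
df = spark.read.parquet("hdfs:///data/orders_clean")      # assumed path

df.describe().show()    # count, mean, stddev, min, max per column

profile = df.select(
    [F.countDistinct(c).alias(f"{c}_distinct") for c in df.columns]
    + [F.sum(F.col(c).isNull().cast("int")).alias(f"{c}_missing") for c in df.columns]
)
profile.show(truncate=False)
```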

Unmask Your Data. The Visual Face of Hadoop (part 2 of 5)

Over the last 18 months or so, while Oracle Big Data Discovery was in the early stages of design and development, I got to ask lots of customers and prospects some fundamental questions about why they struggle to get analytic value from Hadoop. After a while, common patterns started to emerge that we ultimately used as the basis of the design for Big Data Discovery. Here's what we learned.

1. Data in Hadoop is not typically 'ready' for analytics. The beauty of Hadoop is that you just put raw files into it and worry about how to unpack them later on. This is what people mean when they say Hadoop is "schema on read" (a sketch of what that looks like in practice follows at the end of this post). It is both good and bad. On the one hand it's easy to capture data; on the other, it requires more effort to evaluate and understand it later on. There is usually a ton of manual intervention required before the data is ready to be analyzed. Data in Hadoop typically flows from new and emerging sources like social media, web logs and mobile devices. It is unstructured and raw, not clean, nicely organized and well governed like it is in the data warehouse.

2. Existing BI and data discovery tools fall short. We can't blame them, because they were never designed for Hadoop. How can tools that speak 'structured' query language (SQL) be expected to talk to unstructured data in Hadoop? For example, how do they extract value from the text in a blog post, or from the notes a physician makes after evaluating a patient? BI tools don't help us find interesting data sets to start working with in the first place. They don't provide profiling capabilities to help us understand the shape, quality and overall potential of data before we start working with it. And what about when we need to change and enrich the data? We need to bring in IT resources and ETL tools for that. Sure, BI tools are great at helping us visualize and interact with data, but only when the data is ready... and, as outlined above, data in Hadoop usually isn't.

3. Emerging tools are point solutions. As a result of these challenges we have seen a ton of excitement and investment in new Hadoop-native tooling from various startups, too numerous to mention here: tools for cataloging and governing the data lake, profiling tools to help users understand new data sets in Hadoop, data wrangling tools that let end users change data directly in Hadoop, and a host of analytic and data visualization products to help expose new insights and patterns. An exciting space for sure, but the problem is that, in addition to being new (and possibly gone next month), these tools each cover only one or two aspects of the big data discovery lifecycle. No single product lets us find data in Hadoop and turn it into actionable insight with any kind of agility. Organizations can't be expected to buy a whole collection of immature and non-integrated tools, ask their analysts to learn them all, and ask IT to figure out how to integrate them.

Clearly, then, a fundamentally new approach is required to address these challenges, and that's exactly what we've done with Oracle Big Data Discovery. Over the next three posts I will outline the specific capabilities we designed into the product to address them. As always, I invite your comments and questions on this exciting topic!
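
As mentioned in point 1, here is a small sketch of "schema on read" in practice: raw JSON files are dropped into HDFS with no declared structure, and a schema is inferred only when an engine such as Spark reads them. The path and field names are assumptions for illustration.

```python
# 'Schema on read': no schema was declared when these files landed in HDFS;
# Spark derives one at read time. Path and field names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("schema-on-read").getOrCreate()

events = spark.read.json("hdfs:///landing/mobile_events/")   # raw files as they arrived
events.printSchema()                                         # structure inferred just now
events.select("user_id", "event_type").show(5)
```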

Unmask Your Data. The Visual Face of Hadoop (part 1 of 5)

By now you’ve probably heard a lot of buzz about our big data vision and specifically our new product, Oracle Big Data Discovery (or BDD, as it is affectionately abbreviated). It’s a new ‘Hadoop-native’ visual analytics tool that lets anyone turn raw data in Hadoop into actionable insight in minutes, without needing to learn complex tools or rely solely on highly specialized resources. For underserved business analysts, eager to get value out of the Hadoop data lake (or reservoir, or swamp, depending on your mood), there is finally a product with a full set of capabilities that is easy enough to use to be truly effective. For data scientists, already proficient with complex tools and languages but bottlenecked by messy and generally unprepared data, life just got a little easier. Most important of all, business analysts, data scientists and anyone else who is analytically minded can now work together on problems as a team, on a common platform, which means productivity around big data analytics just took a huge step forward.

Before we dig into BDD in more detail, let’s pause to understand some of the fundamental challenges we are trying to address with the product. To do this we need to ask some basic questions. Why has it been so difficult to get analytic value out of data in Hadoop? Why do data scientists seem to be bottlenecked by activities related to evaluating and preparing data rather than actually discovering new insights? Why are business analysts, effective with BI tools for years, struggling to point those tools at Hadoop? I invite your comments on these important questions related to unmasking data in Hadoop, as I’ll be sharing more of the vision and investment Oracle has been making in delivering a fundamentally new approach to big data discovery.

Announcing Oracle Data Integrator for Big Data

We are proud to announce the availability of Oracle Data Integrator for Big Data. This release is the latest in the series of advanced Big Data updates and features that Oracle Data Integration is rolling out to help customers take their Hadoop projects to the next level.

Increasing Big Data Heterogeneity and Transparency

This release brings significant additions in heterogeneity and governance for customers. Highlights include support for Apache Spark, support for Apache Pig, and orchestration using Oozie. Click here for a detailed list of what is new in Oracle Data Integrator (ODI). Oracle Data Integrator for Big Data helps transform and enrich data within the big data reservoir/data lake without users having to learn the languages necessary to manipulate it. ODI for Big Data generates native code that is then run on the underlying Hadoop platform, without requiring any additional agents. ODI separates the design interface used to build logic from the physical implementation layer that runs the code, so ODI users can build business and data mappings without having to learn HiveQL, Pig Latin or MapReduce.

Oracle Data Integrator for Big Data Webcast

We invite you to join us on the 30th of April for our webcast to learn more about Oracle Data Integrator for Big Data and to get your questions about Big Data Integration answered. We discuss how the newly announced Oracle Data Integrator for Big Data:
- Provides advanced scale and expanded heterogeneity for big data projects
- Uniquely complements Hadoop's strengths to accelerate decision making, and
- Ensures sub-second latency with Oracle GoldenGate for Big Data.
Click here to register.

Big Data and Privacy

We're still in the early days of big data, and a lot of the focus continues to be on use cases and figuring out how to get value from the technology and the data. Many of those use cases involve customer data or personal data in some way. And, as is typical in these early days, secondary considerations are often glossed over until they become critical. One of those considerations, particularly for big data, is privacy. Companies tend to focus on collecting data and extracting value from it, and we technology practitioners are happy to help them. But it's not always clear whether the companies collecting personal data should be using all of it and, if they do, how they should involve consumers. I think it's our duty, as practitioners and enablers of big data, to be knowledgeable about privacy issues and to address privacy in big data endeavors.

I was encouraged when I attended the Strata+Hadoop conference recently because there was a track on Law, Ethics, and Open Data. It wasn't the hottest topic at the event, but it was fairly well attended, indicating that others are thinking about this too. Intuit gave one of the presentations and talked about how they have brought their legal and data science teams together; even though it sounds counterintuitive (pun intended), this marriage of legal and tech is helping them drive innovation. Check out their presentation here. Intuit looked at how they were handling customer data across all of their products and found considerable inconsistency, because those decisions had been made in isolation for each product. In their new approach, they established a mission to democratize the data for the benefit of the customer and then defined data stewardship principles that teams could use to guide decision making. This should be a familiar approach to those of us in the engineering world. As a former consultant, I found that the best projects I was involved in established guiding principles at the outset that helped the project team stay focused and allowed decisions to be made more quickly and effectively.

That raises the question of what the guiding principles around a big data solution should be. I don't think there is any one answer that fits all situations, but I do think there are some common themes that deserve further exploration.

Production workloads blend Cloud and On-Premise Capabilities

Prediction #7, blending production workloads across cloud and on-premise, in Oracle's Enterprise Big Data Predictions 2015 is a tough nut to crack. Yet we at Oracle think this is really the direction we will all go. Sure, we can debate the timing, and whether or not this happens in 2015, but it is something that will come to all of us who are looking towards that big data future. So let’s discuss what we think is really going to happen over the coming years in the big data and cloud world.

Reality #1 – Data will live both in the cloud and on-premise. We see this today. Organizations run Human Capital Management systems in the cloud and integrate them with data from outside cloud-based systems (think, for example, LinkedIn or staffing agencies), while their general data warehouses and new big data systems are all deployed on-premise. We also see the example in the prediction where various auto dealer systems uplink into the cloud to enable the manufacturer to consolidate all of their disparate systems. This data may be translated into data warehouse entries and may well live in two worlds, both in the cloud and on-premise, whether for deep-dive analytics or in aggregated form.

Reality #2 – Hybrid deployments are difficult to query and optimize. We also see this today, and it is one of the major issues of living in the hybrid world of cloud and on-premise. A lot of the issues are driven by low-level technical limitations, specifically in network bandwidth and upload/download capacity into and out of the cloud environment. The other challenges are really (big) data management challenges, in that they go to the art of running queries across two ecosystems with very different characteristics. On-premise, we see a trend towards engineered systems, which deliver optimized performance for the applications, but in the cloud we often see virtualization pushing the trade-off towards ease of deployment and ease of management. These completely different ecosystems make optimizing queries across them very difficult.

Solution – Equality brings optimizations to mixed environments. As larger systems like big data and data warehouse systems move to the cloud, better performance becomes a key success criterion. Oracle is uniquely positioned to drive both standardization and performance optimizations into the cloud by deploying on engineered systems like Oracle Exadata and Oracle Big Data Appliance. Deploying engineered systems enables customers to run large systems in the cloud with the performance they see today in on-premise deployments. That means we do not live in a world divided into slow and fast, but in a world of fast and fast. This equivalence also means we have the same functionality in both worlds, and here we can sprinkle in some future Oracle magic, where we start optimizing queries to take into account where the data lives, how fast we can move it around (the dreaded network bandwidth issue) and where we need to execute code. Now, how are we going to do this? That is a piece of magic, and you will just need to wait a bit... suffice it to say we are hard at work on this challenging topic.

You've just got to be prepared to pay less. (Part 2)

Somebody looked at this earlier post and said "you're not comparing apples with apples". That's because the Big Data Appliance (BDA) is based on a high-specification 2 RU server, rather than the smaller and simpler 1 RU servers that I used in my comparison. Fair enough, although the reason I used that config is that it's what almost everybody has in mind when they say "I can build a Hadoop cluster cheaper than the BDA", and I can do that comparison quickly on a trade show floor or the back of an envelope and show that the approach doesn't do what they think. However, we build the BDA with bigger boxes because we think that makes for a better product with lower TCO.

So this time it's a full 3-year TCO with everything included, based on a comparable server, the HP ProLiant DL380 Gen9, configured as close to the BDA server configuration as we could get it. We also included a flavor of Linux and the complete stack of Cloudera software (which is what we ship with the BDA). We took list prices from the web and applied a 20% discount. We found that the BDA was 38% cheaper over 3 years than building a comparable DIY cluster.

Now, we did this exercise back in January, so it's possible that prices have changed somewhat. And of course, your organization might see different discounts, or may have existing software licenses to apply. Perhaps you have different costs to install a complete rack, from cables to software configuration. So if you did this today for your specific situation, you might get different numbers. I'd encourage you to do this exercise yourself, making sure to include all the relevant components. Or give us a call and we'll help you. Because the BDA really is cheaper than building it yourself, and it's not just Oracle saying so. Next post...

Item                      DIY Cluster (3-Year Cost)    Oracle Big Data Appliance (3-Year Cost)
Servers / Appliance       $455,918                     $420,000
Network Infrastructure    $32,000                      Included
Rack                      $4,000                       Included
Install                   $14,000                      Included
Support                   $48,753                      $151,200
Software                  $367,158                     Included
Total                     $921,829                     $571,000
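
For readers who want to run their own numbers, here is a minimal back-of-the-envelope sketch in Python using the line items from the table above as placeholders; substitute your own discounts, support rates and installation costs.

```python
# A back-of-the-envelope 3-year TCO comparison. The figures simply restate the
# table above; replace them with your own quotes and discounts.
diy = {
    "servers": 455_918,
    "network": 32_000,
    "rack": 4_000,
    "install": 14_000,
    "support": 48_753,
    "software": 367_158,
}
bda = {
    "appliance": 420_000,   # rack, networking, install and software included
    "support": 151_200,
}

diy_total = sum(diy.values())
bda_total = sum(bda.values())        # the table above lists the total as $571,000
savings = 1 - bda_total / diy_total

print(f"DIY: ${diy_total:,}   BDA: ${bda_total:,}   BDA is {savings:.0%} cheaper")
```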

You've just got to be prepared to pay less. (Part 1)

I've spent a lot of time on trade show stands, and I've lost count of the number of times somebody has asked about the Oracle Big Data Appliance (BDA) and then said "I (or my organization) can build that cheaper myself". A couple of approaches usually come up; let's look at the first one.

The most common approach is to find a cheap 1 RU commodity server and build out a rack of those. I looked at the Dell site, where you could go with either the R320 or the R430. Just to get a ballpark estimate, my approach to pricing them was to start with the basic config, add as many of the biggest available disks as possible (eight and ten 1 TB drives respectively), and configure 32 GB of RAM and two additional network ports (10 Gbps and InfiniBand respectively). The website gave me discounted prices of $4,957 and $6,361 respectively. That's $198K or $254K for a full rack of 40 servers. The BDA lists for $525K. Case closed.

Not so fast. You've got to compare comparable setups, and neither of those is comparable. Most Hadoop clusters are gated more by available storage than anything else, and that cheaper full rack contains just 320 TB of raw disk space; the more expensive one contains 400 TB. Meanwhile, the BDA has 864 TB of raw disk space. You'd actually need 108 of the cheaper servers or 86 of the more expensive ones, not 40. Which means we're looking at $535K or $547K, and three racks. $525K for a single rack of BDA is looking much better now.

But the BDA is more than just a collection of servers. It includes the rack (only one is needed, not three), switches, cables, rack rails, redundant power, more powerful CPUs and more RAM. It also includes the entire suite of Cloudera software, and it comes pre-assembled, configured, tuned, tested and working. All of that adds up. So all those people who confidently assured me they could build something comparable but cheaper than the BDA with low-price 1 RU commodity servers were incorrect. Yes, I'm sure you can get discounts for bulk purchases and that there are cheaper servers out there somewhere. But not that much cheaper, and there's all that other stuff you need. And anyway, that BDA price was list. But what happens if you try to build a cluster with larger nodes containing much more storage? That would probably be cheaper, wouldn't it? Another post...
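
Here is the sizing arithmetic from this post as a small, hedged Python sketch; the prices and disk counts are the ones quoted above, and the rounding mirrors the server counts in the post.

```python
# Compare a rack of cheap 1 RU servers with the BDA on raw storage, not just
# on node count. Prices and capacities are the figures quoted in this post.
bda_list_price = 525_000
bda_raw_tb = 864

options = {
    # model: (discounted price per server, raw TB per server)
    "Dell R320": (4_957, 8),
    "Dell R430": (6_361, 10),
}

for model, (price, tb_per_server) in options.items():
    rack_of_40 = 40 * price                                  # the naive comparison
    to_match_storage = round(bda_raw_tb / tb_per_server)     # rounded as in the post
    cost_to_match = to_match_storage * price
    print(f"{model}: 40 nodes = ${rack_of_40:,}, "
          f"{to_match_storage} nodes to match {bda_raw_tb} TB = ${cost_to_match:,}")

print(f"BDA list price: ${bda_list_price:,} for a single rack")
```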

HBase is Free but Oracle NoSQL Database is cheaper

How can Oracle NoSQL Database be cheaper than "free"? There's got to be a catch. And of course there is, but it's not where you are expecting: the problem in that statement isn't with "cheaper", it's with "free". At Strata + Hadoop World last month, one of our product managers, Robert Greene, gave a talk summarizing what one of our customers found. You can check out some of the slides from Oracle OpenWorld, or even a short video (skip to 2 minutes 15 seconds if you are pressed for time), for more details, but I'll cover the main finding here.

When you size a cluster for any NoSQL database, there are a couple of things to think about. You need enough servers to handle the expected data capacity, but you also need enough servers to deliver the required throughput, and this chart focuses on that second issue. There are two axes: the horizontal axis shows transactions per second, while the vertical one shows how many servers are required to deliver that throughput. Bigger is definitely not better here. If you look at the right-hand side, you can see that HBase needed just shy of 70 servers to deliver 20,000 TPS while Oracle NoSQL Database needed just under 50. But look at that bottom line: when you configure those servers with SSDs, that 50 drops to just 3. The customer looked at using SSDs with HBase but did not get that big a boost in throughput.

HBase runs on top of HDFS, and HDFS was originally designed for parallel scans of large blocks of data rather than smaller, more random reads and writes. Because that code is not optimized for random reads and writes, HBase/HDFS simply executes a lot more code per operation than Oracle NoSQL Database; given its purpose, there are some optimizations it simply can't (or doesn't) do. This shows up with HDDs, as you can see on the top two lines of the chart, where Oracle NoSQL Database scales better (disk speed is the same, so it's everything else that's less efficient). When you dramatically speed up the disk by using SSDs, HBase/HDFS can take only limited advantage. See this post, where the author notes that "...a database that is layered above HDFS would not be able to utilize all the iops offered by a single SSD".

An HBase solution isn't really free, because you need hardware to run your software. And when you need to scale out, you have to look at how well the software scales. Oracle NoSQL Database scales much better than HBase, which in this case translated into needing much less hardware. So yes, it was cheaper than free. Just be careful when somebody says software is free.
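
The sizing logic behind that chart can be summarized in a few lines: a cluster needs enough nodes for both its data volume and its required throughput, and the larger of the two estimates wins. The sketch below is illustrative only; the per-server figures are invented placeholders, not benchmark results.

```python
# Size a NoSQL cluster on both capacity and throughput; the bigger estimate
# determines the node count. All per-server numbers are made up for
# illustration and are not benchmark results.
import math

def servers_needed(data_tb, target_tps, tb_per_server, tps_per_server):
    for_capacity = math.ceil(data_tb / tb_per_server)
    for_throughput = math.ceil(target_tps / tps_per_server)
    return max(for_capacity, for_throughput)

# The same 20,000 TPS workload sized against HDD-backed and SSD-backed nodes,
# where SSDs raise per-node throughput far more than per-node capacity.
print(servers_needed(data_tb=10, target_tps=20_000, tb_per_server=4, tps_per_server=300))    # 67 nodes
print(servers_needed(data_tb=10, target_tps=20_000, tb_per_server=4, tps_per_server=7_000))  # 3 nodes
```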

How to Future Proof Your Big Data Investments - An Oracle webcast with Cloudera

Cutting through the Big Data Clutter

The Big Data world is changing rapidly, giving rise to new standards, languages and architectures. Customers are unclear about which Big Data technology will benefit their business the most, and how to future-proof their Big Data investments. This webcast helps customers sift through the changing Big Data architectures and build their own resilient Big Data platform. Oracle and Cloudera experts discuss how enterprise platforms need to provide more flexibility to handle real-time and in-memory computations for Big Data. The speakers introduce the 4th-generation architecture for Big Data, which allows expanded and critical capabilities to exist alongside each other. Customers can now see higher returns on their Big Data investment by ingesting real-time data and improving data transformation for their Big Data analytics solutions. By choosing Oracle Data Integrator, Oracle GoldenGate and Oracle Enterprise Metadata Management, customers gain the ability to keep pace with changing Big Data technologies like Spark, Oozie, Pig and Flume without losing productivity, and to reduce risk through robust Big Data governance.

In this webcast we also discuss the newly announced Oracle GoldenGate for Big Data. With this release, customers can stream real-time data from their heterogeneous production systems into Hadoop and other Big Data systems like Apache Hive, HBase and Flume. This brings real-time capabilities to customers' Big Data architectures, allowing them to enhance their big data analytics and ensure their Big Data reservoirs are up to date with production systems. Click here to mark your calendars and join us for the webcast to understand Big Data Integration and ensure that you are investing in the right Big Data Integration solutions.
