Consumers today have more choices than ever as to when, where, and how to purchase everything from groceries, clothing, and music to major appliances and automobiles—easily shifting their dollars from traditional bricks-and-mortar shops to online-only retailers, big-box stores, discounters, and other options. And because many choices for consumers means companies are competing for those consumers’ business, some of the world’s best-known retailers, consumer packaged goods (CPG) manufacturers, and consumer products and services providers turn to dunnhumby for help attracting and retaining customers. The London, England–based customer science company combines big data, science, analysis, and its own special algorithms with extreme processing power to help its clients create and sustain customer loyalty through tailored programs, promotions, and discounts.
Matt Steell, dunnhumby’s London-based head of global data solutions, says the company tailors its data science services to the specific needs of its clients and their markets. “For instance, in the grocery space, we help our clients understand how to stock shelves, price their goods, and pitch offers to build business,” says Steell. “And building customer loyalty is an amalgamation of all those things.”
A few years ago, dunnhumby faced limits with its existing information technology (IT) infrastructure; in 2012 the company consolidated data and projects to improve performance with Oracle Exadata Database Machine, later adding Oracle Big Data Appliance to its engineered systems roster.
“Like other IT organizations, we had evolved over time and had many separate data warehouses and data marts,” says Enterprise Architect Chris Wones, who is based in the company’s US operational headquarters in Cincinnati, Ohio. “We needed to bring everything back into a centralized environment so that we could perform standardization and governance, and then push out data with confidence.”
Oracle Exadata and Oracle Big Data Appliance enable dunnhumby to do more, faster: the company spends less time managing the data it collects from clients and more time analyzing that data for meaningful business insights. “Using Oracle Exadata Database Machine and Oracle Big Data Appliance allows us to look at more data and to do so more quickly,” continues Wones. “This ultimately allows us to focus deeper on the science of what we do rather than on data manipulations.”
dunnhumby was founded 25 years ago by a husband-and-wife team of data scientists who pioneered the field of customer science, which analyzes various data sources to predict consumer behavior and earn consumer loyalty. Among its first clients was Tesco, the UK’s largest grocery chain and, since 2006, dunnhumby’s corporate parent. To analyze the buying habits of more than 700 million consumers worldwide, dunnhumby maintains more than 100 terabytes of online production data at each of its primary US and UK data centers, plus many additional terabytes of “near-line” data stored on other systems that are easily accessible to its Oracle Exadata Database Machines.
dunnhumby collects, stores, processes, and analyzes terabytes of data, both structured and unstructured, including customer information, weather, geography, demographics, and social media. The type and amount of data collected and the services provided vary according to client requirements.
Location: London, England
Industry: Professional services
Oracle Products: Oracle Big Data Appliance, Oracle Exadata Database Machine, Oracle ZFS Storage Appliance, Oracle Database, Oracle Advanced Analytics, Oracle Advanced Compression, Oracle Advanced Security
“Each client has somewhere between 5 terabytes—for small clients—and 75 terabytes for our biggest clients,” says Wones. “We consume about 200 million shopping baskets per week containing an average of 1.4 billion items per week, plus contextual information, totaling 2 billion rows.”
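As a rough check on the scale Wones describes, the weekly figures can be turned into per-basket and per-second numbers. The short sketch below uses only the figures quoted above. Treating the 2 billion rows as item rows plus contextual rows, and spreading the load evenly across the week, are assumptions made for illustration, not details from dunnhumby.

```python
# Back-of-the-envelope arithmetic on the weekly volumes quoted by Wones.
BASKETS_PER_WEEK = 200_000_000        # "about 200 million shopping baskets per week"
ITEMS_PER_WEEK = 1_400_000_000        # "an average of 1.4 billion items per week"
TOTAL_ROWS_PER_WEEK = 2_000_000_000   # "totaling 2 billion rows"

# Average number of items in one shopping basket.
items_per_basket = ITEMS_PER_WEEK / BASKETS_PER_WEEK

# Assumption: one row per item, with the remainder being contextual information.
contextual_rows = TOTAL_ROWS_PER_WEEK - ITEMS_PER_WEEK

# Assumption: rows arrive evenly over the week (real loads are likely bursty).
SECONDS_PER_WEEK = 7 * 24 * 60 * 60
rows_per_second = TOTAL_ROWS_PER_WEEK / SECONDS_PER_WEEK

print(f"items per basket: {items_per_basket:.1f}")
print(f"contextual rows per week: {contextual_rows:,}")
print(f"average rows per second: {rows_per_second:,.0f}")
```

Under these assumptions, the quoted volumes work out to about seven items per basket and a sustained average of a few thousand new rows every second.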
dunnhumby, adds Steell, does more than just acquire and analyze data for its clients—it helps them change corporate mindsets. “We help companies transition from taking a reactive approach to a customer behavior that just happened, to a proactive approach that helps them infer what will happen next,” Steell says. “A lot of companies have not been using technology effectively to be more predictive.”
When dunnhumby decided to consolidate its many data warehouses and data marts, the company looked at offerings from Oracle, IBM, Teradata, EMC Greenplum, and others before choosing Oracle Exadata Database Machine for its exceptional performance. An Oracle engineered system, Oracle Exadata combines Oracle compute servers, software, storage, and other technology to maximize the performance of dunnhumby’s core database: Oracle Database 12c.
“The heart and soul of our stack right now is Oracle Exadata Database Machine,” says Wones. “Oracle Exadata serves as our enterprise data warehouse—this is where we do the bulk of all our analytical and BI [business intelligence] queries.”
The Oracle Exadata machines allow dunnhumby analysts to do more faster, says Wones. “With Oracle Exadata, we’ve been able to reduce queries from days to minutes, and those that used to take minutes to seconds,” he says.
To date, dunnhumby has installed Oracle Exadata machines at two locations: four full-rack production machines and a half-rack system for testing in the US and five full-rack production systems in the UK. These Oracle Exadata clusters run in an active-active mode to ensure high availability, while two Oracle ZFS Storage Appliances in each of the company’s primary data centers provide data staging, data cleansing, and backup and recovery.
“Oracle ZFS allows us to offload that work from the Oracle Exadata Database Machine so that we can use it primarily for doing analytics,” explains Wones.
dunnhumby typically passes data from external sources—mostly flat files—first through the Oracle ZFS Storage Appliance for data cleansing and formatting before uploading to its Oracle Exadata Database Machines. Conversely, those systems pass outbound data—including all backups—initially to the Oracle ZFS Storage Appliance, and from there the data is written out to tape, saving CPU cycles on dunnhumby’s core systems.
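The idea of cleansing flat files on a staging tier before they reach the data warehouse can be illustrated with a minimal sketch. This is not dunnhumby’s actual process, which the article does not detail: the column names (`basket_id`, `item_id`, `quantity`), the comma-separated layout, and the `cleanse_rows` function are all hypothetical, chosen only to show the kind of work that can be done before the load step.

```python
import csv
import io

# Hypothetical column names; the article does not describe the real file layout.
FIELDS = ["basket_id", "item_id", "quantity"]


def cleanse_rows(raw_text):
    """Return cleaned rows from a comma-separated flat file with a header line.

    Rows with a missing field or a non-numeric quantity are dropped.
    """
    cleaned = []
    reader = csv.DictReader(io.StringIO(raw_text))
    for row in reader:
        # Trim stray whitespace around every expected field.
        values = {name: (row.get(name) or "").strip() for name in FIELDS}
        if not all(values.values()):
            continue  # skip incomplete records
        if not values["quantity"].isdigit():
            continue  # skip records whose quantity is not a whole number
        values["quantity"] = int(values["quantity"])
        cleaned.append(values)
    return cleaned


# Illustrative input: one clean row, one padded row, and two bad rows.
sample = "basket_id,item_id,quantity\nB1, I100 ,2\nB2,,1\nB3,I200,two\nB4,I300,5\n"
rows = cleanse_rows(sample)
print(rows)
```

Because only validated, consistently formatted rows move on to the load step, the database does not have to spend cycles rejecting or repairing bad records, which is the general benefit Wones describes.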
Using these procedures and a number of Oracle data compression techniques, including Oracle Advanced Compression and Oracle Hybrid Columnar Compression, dunnhumby has achieved remarkable performance gains over its previous systems with its Oracle Exadata Database Machines. Although actual improvement varies depending on the work, Wones says, “for the bulk of what we do, we get 20 to 25 times performance improvement, and for some jobs, we have seen performance increases of 100 to 200 times or more.”
Those analyses are also typically deeper and broader. “In the previous environment, there were limitations to what we could do simply because of the server’s memory capacity,” Wones says. “So when you did a trending analysis, for example, you had to narrow the scope so that you could access all the data.
“The Oracle Exadata environment is much, much larger,” he continues, “so it allows us to pull in far more years of data, with much greater depth, to do the analysis. This makes the trends analysis all the more valuable.”
While Oracle Exadata dramatically improved performance for structured data analyses at dunnhumby, the company realized it needed to get similar performance and operational improvements for its unstructured data sets as well. “Unstructured data today comprises about 10 percent of all our data,” notes Wones, “but it is growing faster, at a rate of 10 to 15 percent a year.”
For this reason, dunnhumby recently augmented its data science systems with Oracle Big Data Appliance, another member of the Oracle family of engineered systems. (See the “Big Data Clusters: Build or Buy” sidebar for more information about dunnhumby and Oracle Big Data Appliance.)
With the combined power of Oracle Exadata and Oracle Big Data Appliance, dunnhumby’s data science will continue to help competitive businesses identify and keep loyal customers.
“We are still very early in our journey with Oracle Big Data Appliance,” says Steell. “We knew we needed to change technology for big data, and Oracle was very good at explaining Oracle Big Data Appliance and the leverage we would get by linking out to Oracle Exadata Database Machine.
“One of the things we have asked all of our strategic partners like Oracle,” Steell continues, “is that they help guide us in things like trends in the marketplace, how to implement solutions, what we’re doing right, and how we can do things better. Oracle has been a great partner in doing just that.”
A business can build its own commodity hardware clusters to run Apache Hadoop, the leading open source software framework for collecting and storing unstructured data. While that is one way to scale up a big data solution, building custom clusters is not necessarily the most cost-effective way to run and integrate Hadoop and other technologies to deliver big data results.
“A lot of people out there will say Hadoop is free,” says Matt Steell, head of global data solutions at dunnhumby. “Although the software is free, companies should ask about the cluster hardware and the staff they need to support it, because that’s where the cost comes in.”
dunnhumby recently evaluated its options to improve processing and analysis of the large volumes of unstructured data the company collects for its clients from their website clickstreams, social networks, and the like. dunnhumby chose Oracle Big Data Appliance: a preintegrated Oracle engineered system that combines hardware and software specifically designed for handling big data. The software includes Cloudera’s Distribution including Apache Hadoop (CDH), Oracle R Distribution, Oracle Big Data SQL, and Oracle NoSQL Database.
“What Oracle Big Data Appliance gave us over a custom, hand-built cluster is a system we can run and manage with the same tools we are using on our Oracle Exadata Database Machines,” Steell says. “We don’t have to increase headcount to support an appliance that looks very similar to Oracle Exadata.”
After months of testing, experience has made the case for Oracle Big Data Appliance. “At this point, we’ve reached the scale we need to prove that going with Oracle Big Data Appliance was the best choice for our business,” concludes Steell.
Photography by Hao Wang, Unsplash