This blog takes a closer look at why the Moat Reach team moved their data warehouse from Spark on AWS to Exadata on OCI.
Moat Reach, by Oracle Data Cloud, is a cross-platform TV and digital measurement solution that measures how many people and households your campaigns reach, at what frequency, and how relevant those people are to your marketing objectives.
It integrates Moat Analytics’ digital impression data with TV advertisement impression data from iSpot.tv, which uses 14 million opted-in TV devices, against the people and households in the Oracle ID Graph. This data allows marketers to measure valid and viewable impressions for the audiences that matter to them across TV and digital channels.
In marketing terminology, an impression refers to the number of times an advertisement was seen when browsing the internet on a web browser, watching a sports game on TV, playing a game in an app, and so on. Examples of marketing channels include browsers, apps, traditional cable TV, and streaming video.
For example, a consumer packaged goods (CPG) advertiser can use Moat Reach to measure the unique, people-based reach of its new soft drink campaign using the same metrics across all of its TV and digital channels. Even better, it can also measure its audience based on custom or first-party segments, like soft drink buyers or frequent shoppers, and analyzes key demographics, like age and gender, through each channel.
By comparing the reach, frequency, and viewability of its ads through each channel, the CPG advertiser might discover that it was reaching more of its intended audience through celebrity magazine websites than DIY cable channels. It can then adjust its media strategy to invest more in those channels.
After years of cross-channel confusion, Moat Reach gives marketers a self-service tool to measure the consolidated reach and frequency of their campaigns, so they can see whether they reached their intended audiences through each platform, publisher, and network on a consolidated and de-duplicated basis.
Moat Reach answers the impact of the marketing campaign on an individual across TV and digital media. To do that, it counts distinct records for that individual in the data set from Moat and iSpot.tv.
The Moat team developed their first data warehouse solution using Apache Spark, Qubole, and Scala on AWS infrastructure. This solution operated on a dataset with about 97 billion impressions from Moat and 49 billion impressions from iSpot.tv.
The dataset in this first version architecture spanned 34 clients, 275 thousand campaigns, and two years of data. The Moat team loaded this data from object storage into the Apache Spark cluster and ran analytics using Qubole and Scala. This data was precomputed into dimensions to assist with query processing.
Although this method has limitations, other analytics companies in the industry use similar technologies and the same methodology of precomputing the dimensions because it was too hard a problem to solve doing it in real time.
Getting answers out of this v1 data warehousing solution took roughly seven days from when a user asked the question to the return of a response. This solution also costs a lot of money, with each nightly run costing about $5–8K USD, $100–160K USD for the monthly runs, and about $1.2–2.8M USD annually.
Moat Reach v1 relied on a snapshot model where the Moat team ingested impression event, id-graph, demographic, and audience data into a set of parquet tables on AWS S3. They then ran several large Spark jobs to populate reach and frequency metrics in a web UI. The metrics were stored in a database by the nightly job.
The Moat team wrote the job to quickly return user requests without computing any more information. This approach allowed them a responsive site but at the cost of only computing a finite set of metrics.
This lack of flexibility frustrated customers. For example, they computed the number of 18–30 year-olds who saw an ad, the number of women who saw an ad, how many mobile views, or five-second views an ad received. But changing the range of women to 18–29 year olds would have taken at least a day.
A cross-section of 18-year-old women who saw the ad for five seconds on their mobile phones with all other cross-sections was similarly financially infeasible. It would be too expensive to run a job every night for every value of every dimension across every client and campaign. The Moat team set a goal to gain insights into and across all dimensions of the Moat Reach data.
While the v1 Moat Reach service was already differentiated in the market because of its uniquely consolidated insights, the team felt there was an opportunity to be more comprehensive and offer more customized insights for customers in near real time. The Moat team faced the challenge of finding technology with enough storage to handle terabytes of data—and over a petabyte a year—and make much faster queries than the existing architecture.
Oracle Cloud Infrastructure (OCI) is the only provider to offer Oracle Database and Oracle Exadata as a service. This offering has vast amounts of space and equally impressive compute capabilities. In the on-demand architecture, they only had to ingest impression event, id-graph, demographic, and audience data into their Exadata appliance, and then let an API do the rest. When users visit the site, an API queries the database for counts, based on the selected filters. Data science methodology is applied to estimate total reach and frequency from the observed data. This architecture extends the product use case from broad, top-line insights to precise and deeply detailed metrics that clients can use to optimize their ad campaigns’ reach in real-time.
To query billions of rows quickly, they used Oracle Database’s probabilistic data structures and hyperloglog estimation functions to obtain bounded estimates of the needed inputs for their data science methodology. To further optimize performance, they took advantage of Exadata’s tuning abilities and support team. By working with Oracle engineering staff, they tuned Exadata to be more performant than its competitors AWS Redshift and Clickhouse to run a baseline set of queries meant to mimic the live production system requirements. In-house Oracle expertise and the wide availability of Oracle Database and SQL skill sets were other factors to encourage them toward OCI and Exadata.
The following table compares the two approaches:
|V1 solution||V2 solution|
|Spark and AWS||Exadata on OCI|
|Precomputed dimensions for reporting||Flexible dimensions for reporting|
|Customer answers in days||Custom answers in seconds|
|Limited reports||Unlimited number of custom reports|
|Long development cycles||Rapid development cycles|
|Delayed innovation turnaround||Real-time access for Innovation|
While improved customer experience was the Moat team’s top priority, cost was also a factor in their decision to use Exadata over the V1 architecture and AWS’s data warehouse, Redshift. An AWS Redshift instance cost roughly 2 million dollars for a baseline of 912 CPUs and 940 TB of storage. For the same cost, Exadata on OCI offered much faster performance and real-time data queries with an optimized database shape with 150 CPUs, 23 TB of RAM, persistent memory for an in-memory database, and close to 1 PB of flash and disk storage for historical data. Exadata features like smart scan, storage indexes, and smart flash cache were some of the factors for the faster query performance.
|V1 solution||Alternative||V2 solution|
|Spark and AWS||AWS Redshift||Exadata on OCI|
|Customer answers in days||Customer answers in hours||Customer answers in seconds|
|Limited reports||Less limited reports||Unlimited number of custom reports|
As data sets continue to grow and customers look for new and faster ways to get value out of data about transactions, visitors, viewers, and more across multiple sources, the cloud offers new solutions that you didn’t consider in the past.
Apache Spark solutions are good for many large-scale analytical workloads but aren’t always the answer. Scale-out architectures like Spark have some advantages but can be costly and prohibitive in terms of frequency of results and flexibility. In the use cases that we described, you can precompute dimensions and lose out on the ability to view data instantaneously.
Oracle Cloud Infrastructure’s Exadata offerings use some of the same underlying hardware benefits that drove the rise of Spark: more processing and memory in servers at a lower cost. But Oracle’s Exadata Cloud service applies those hardware benefits into large-scale, real-time processing. For Moat, this underlying architecture choice yielded valuable new features that the competition didn’t offer and better performance at the same cost.