Larry Ellison grabbed the corporate world’s attention when he announced at Oracle OpenWorld in September that Oracle’s next generation of infrastructure-as-a-service (IaaS) offerings will deliver twice the compute power, twice the memory, four times the storage, and 10 times the I/O speeds compared with offerings from Amazon Web Services (AWS).
The natural response to that announcement: How is it possible for Oracle to leapfrog its main IaaS competitor in such a profound way? The short answer: by building the industry’s first Gen 2 cloud platform, says Deepak Patil, vice president of product development for Oracle’s IaaS offerings, who is charged with conveying to customers Oracle’s unique cloud architecture and strategy.
Patil came to Oracle about a year ago after 16 years at Microsoft, where he helped launch that company’s Azure service. The challenge for him—and several hundred engineers like him who came to Oracle from cloud players AWS, Google, Joyent, Microsoft, and others—was to build a second-generation cloud platform “by applying all of your lessons, avoiding all the mistakes, and leveraging both the advancements in software and hardware and also some of the unique strategic benefits that only Oracle can offer,” he relates. “And that opportunity was just too exciting to pass up.”
Oracle’s resulting IaaS design is not only well suited to enterprise customers looking for a better cloud fit, but it’s also superior to competitors’ offerings in three critical categories: performance, predictability, and price. “We’ve used the experiences and the lessons and the evolution of cloud computing as a default paradigm to make some fundamentally different decisions,” Patil explains.
Availability Domains
The first decision is what Patil refers to as differentiated infrastructure design. Oracle’s cloud data centers, also called availability domains, are still standalone structures, each with its own power and cooling systems. But in the new architecture, at least three availability domains, located relatively close together (30 to 40 kilometers apart), will be interconnected with a low-latency network to make up what’s known as a single cloud computing region.
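For a rough sense of what that spacing means for latency, the sketch below estimates propagation delay over fiber at the 30 to 40 kilometer distances mentioned above. It assumes signals travel through fiber at about two-thirds of the speed of light in a vacuum, and it ignores switching, serialization, and indirect cable routes. The function name and the velocity factor are illustrative assumptions, not figures from Oracle.

```python
# Back-of-the-envelope fiber propagation delay between availability domains.
# Assumption: signal speed in fiber is roughly 2/3 the vacuum speed of light.

SPEED_OF_LIGHT_KM_PER_S = 299_792.458
FIBER_VELOCITY_FACTOR = 2 / 3  # assumed; varies with the fiber's refractive index


def one_way_delay_us(distance_km: float) -> float:
    """Return the one-way propagation delay in microseconds."""
    speed_km_per_s = SPEED_OF_LIGHT_KM_PER_S * FIBER_VELOCITY_FACTOR
    return distance_km / speed_km_per_s * 1_000_000


if __name__ == "__main__":
    for km in (30, 40):
        one_way = one_way_delay_us(km)
        print(f"{km} km: ~{one_way:.0f} us one way, ~{2 * one_way:.0f} us round trip")
```

Under these assumptions, distance alone contributes on the order of 150 to 200 microseconds each way, which is why keeping the domains physically close matters for a low-latency region.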
This three-legged application-processing and data-backup structure eliminates any single point of failure and hides any interruption of service due to an unplanned outage. It’s an aggressive approach to fault tolerance and computing uptime, and it’s not new, says Vishvananda Ishaya, a cloud development architect who came to Oracle a year and a half ago from cloud startup Nebula. “Distributed cloud applications require this kind of structure,” he says.
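The value of a third domain can be illustrated with simple probability. The sketch below computes the chance that every domain in a region is unavailable at the same moment, under the strong assumption that domain failures are independent. The per-domain unavailability used here is a made-up illustrative number, not an Oracle figure, and the function name is hypothetical.

```python
# Chance that all availability domains in a region are down at once,
# ASSUMING independent failures (a simplification; real outages can correlate).

def all_domains_down_probability(per_domain_unavailability: float,
                                 domains: int = 3) -> float:
    """Return the probability that every domain is unavailable simultaneously."""
    if not 0.0 <= per_domain_unavailability <= 1.0:
        raise ValueError("unavailability must be between 0 and 1")
    if domains < 1:
        raise ValueError("need at least one domain")
    return per_domain_unavailability ** domains


if __name__ == "__main__":
    p = 0.001  # hypothetical: each domain is unavailable 0.1% of the time
    for n in (1, 2, 3):
        print(f"{n} domain(s): {all_domains_down_probability(p, n):.1e}")
```

Each added domain multiplies the simultaneous-outage probability by the per-domain figure, but only to the extent that failures really are independent, which is the point of giving each domain its own power and cooling.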
What Oracle brings to this design is scale and speed. Oracle’s availability domains are built to accommodate up to a million servers. And the servers themselves are built to the highest performance standards, their hard disks and drives promising more than 4 million “read” and 2.5 million “write” IOPS (input/output operations per second), Patil says. “That’s at least 10 times more than the leading cloud providers in the industry,” he says. “And that’s really, really, really powerful.”
Enough Pipe
Oracle’s second Gen 2 cloud design decision was at the network level. The physical networks in the availability domains are designed to have a million ports, to support that number of high-performance servers. They’re also non-oversubscribed and flat, which makes them highly resilient and very fast.
A non-oversubscribed network eliminates bottlenecks often caused by a switch at the top of a server rack that can’t handle a sudden deluge of traffic. “In this network model, all of the servers can send the full amount of traffic, and there’s enough pipe at the top to support it,” Ishaya says.
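Oversubscription is usually expressed as a ratio of server-facing bandwidth to uplink bandwidth on a switch. The sketch below computes that ratio for a single top-of-rack switch. All port counts and link speeds are made-up illustrative values, and the function names are hypothetical rather than part of any Oracle tooling.

```python
# Oversubscription ratio of a top-of-rack switch:
# total server-facing bandwidth divided by total uplink bandwidth.
# A ratio of 1.0 or less means the switch is non-oversubscribed.

def oversubscription_ratio(servers: int, server_gbps: float,
                           uplinks: int, uplink_gbps: float) -> float:
    """Return downlink-to-uplink bandwidth ratio for one switch."""
    uplink_total = uplinks * uplink_gbps
    if uplink_total <= 0:
        raise ValueError("uplink bandwidth must be positive")
    return (servers * server_gbps) / uplink_total


def is_non_oversubscribed(ratio: float) -> bool:
    """True when uplinks can carry every server sending at full rate."""
    return ratio <= 1.0


if __name__ == "__main__":
    # Hypothetical racks: 40 servers at 10 Gbps each.
    for uplinks, uplink_gbps in ((4, 40), (4, 100)):
        ratio = oversubscription_ratio(40, 10, uplinks, uplink_gbps)
        print(f"{uplinks} x {uplink_gbps} Gbps uplinks: ratio {ratio:.1f}, "
              f"non-oversubscribed={is_non_oversubscribed(ratio)}")
```

In the first hypothetical rack the servers can offer 2.5 times more traffic than the uplinks can carry; in the second, the uplinks match the servers' combined rate, which is the condition Ishaya describes.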
A flat network, based on a design known as a Clos network (named after its original architect, telecom engineer Charles Clos), speeds up traffic by reducing the number of routers and switches it bounces among. Oracle’s flat network, for instance, limits to two the number of “hops” it takes to get from one server to another anywhere in the availability domain. An interaction between any two compute and storage nodes results in “less than 100 microseconds latency,” Patil says. “That’s unprecedented.”
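The bounded hop count follows from the shape of the topology. The sketch below models a generic two-tier (leaf-spine) Clos fabric in which every leaf switch links to every spine switch, then uses breadth-first search to count switch-to-switch links between leaves. This is an illustrative model only: the article does not describe Oracle's actual topology, and counting a "hop" as one switch-to-switch link is an assumption made here.

```python
# Two-tier (leaf-spine) Clos fabric: every leaf connects to every spine.
# Assumption: a "hop" is one switch-to-switch link on the shortest path.
from collections import deque
from itertools import combinations, product


def build_leaf_spine(num_leaves, num_spines):
    """Return an adjacency map for a full leaf-to-spine mesh."""
    graph = {}
    for leaf, spine in product(range(num_leaves), range(num_spines)):
        leaf_name, spine_name = f"leaf{leaf}", f"spine{spine}"
        graph.setdefault(leaf_name, set()).add(spine_name)
        graph.setdefault(spine_name, set()).add(leaf_name)
    return graph


def switch_hops(graph, src, dst):
    """Breadth-first search for the fewest links between two switches."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph[node]:
            if neighbor == dst:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    raise ValueError("no path between switches")


if __name__ == "__main__":
    fabric = build_leaf_spine(num_leaves=8, num_spines=4)  # hypothetical sizes
    leaves = [name for name in fabric if name.startswith("leaf")]
    worst = max(switch_hops(fabric, a, b) for a, b in combinations(leaves, 2))
    print(f"worst-case leaf-to-leaf hops: {worst}")
```

Because any leaf reaches any other leaf through a single spine, the worst case in this model stays at two links no matter how many leaves are added, which is the property that keeps latency predictable as the fabric grows.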
“Fat pipes” (high-bandwidth fiber optic cables with no switching) connect a region’s availability domains. Latency is minimized by their physical proximity, Ishaya says.
Punch-Through
The third decision Oracle made concerning its Gen 2 cloud architecture involved “differentiated software design,” Patil says. The basic building block of cloud computing, since its inception, has been a single server using a hypervisor to run multiple workloads by virtualizing the operating system and network functions they require, such as input/output. No longer.
“Most, if not all, of the cloud providers put their I/O virtualization on the hypervisor,” Patil says. “We took I/O virtualization out of the hypervisor and we put that in the network.”
Such “off-box” I/O virtualization does a couple of things. It obviates a common cloud performance problem known as “noisy neighbor,” whereby server performance degrades because multiple virtual machine workloads are accessing the same hypervisor at the same time, causing I/O bottlenecks.
It also makes a cloud server much more secure because it eliminates the hypervisor as a potential point of access for malware. Because I/O is handled in a virtual network, Oracle’s cloud servers are “completely encapsulated and isolated” from the physical network, Patil says. Therefore, a hypervisor “punch-through” can’t compromise the network, he says, because “there’s no access to the network from the internet.”
Bare Essentials
Such encapsulated and isolated servers are referred to as bare metal, because there’s no software code—in particular, no Oracle code—running on them. It’s meant to let companies approach the Oracle cloud the same way they do their own IT infrastructures.
“No cloud was built before with enterprise customers in mind,” says Aaron Mohrman, senior director of product management in Patil’s group. Mohrman previously worked at AWS, “in the very early days,” he says.
Oracle’s Gen 2 cloud architecture, unlike AWS’s first-generation one, was designed from the start for enterprise computing, bare metal servers being a prime example. “We can give you a bare metal computer that’s completely yours, like it is in your data center, and you can use it any way you want,” Mohrman says. For instance, a customer can run an application on a dedicated server for high performance, or multiple applications using a hypervisor for cost efficiency. “You get all the flexibility in our cloud that you get in your data center,” he says.
That flexibility extends to Oracle itself. The company plans to put its high-performance engineered systems, such as its Exadata Database Machine, into its cloud data centers and offer their capabilities as cloud services. Because of its network virtualization, the architecture extends to just about “any metal box,” Patil says. Oracle also plans to add hypervisors to its bare metal servers for more conventional, cost-effective cloud services, “but better,” Ishaya says.
Performance Matters
Oracle has one Gen 2 cloud region up and running, in Phoenix, Arizona. One in Virginia is scheduled to go live in November, Patil says. Oracle will then roll out the structures steadily worldwide, via both new construction and retrofitting current cloud data centers for the new architecture, he says.
High performance is an immediate draw. “The most excited customers we have right now are the ones with these really big data set workloads that need all that performance,” Ishaya says. “They were doing their work on some lower-performing cloud, and then they can come to ours and get 10 times the performance for the same price—and that’s very compelling,” he says.
YellowDog, a startup in Bristol, England, provides 3-D rendering for graphic-intensive projects such as car commercials and kids’ TV shows. YellowDog taps several cloud providers for the requisite compute resources, a process that involves passing around some very large files. “Our requirements for compute are pretty high, and our requirements for bandwidth are pretty high,” says Gareth Williams, managing director at YellowDog.
Last summer, YellowDog began running its Windows-based rendering application on six 36-core multithreaded servers in the new Oracle cloud data centers in Phoenix. Each application runs natively on the bare metal servers and turns in “obscenely fast performance,” Williams says: from 2x to almost 10x better than YellowDog’s other compute providers, based on benchmark tests the company conducted recently.
More Like a Startup
Oracle’s IaaS platform engineering group, now numbering about 400 people, is based in downtown Seattle, Washington, instead of at Oracle headquarters in Silicon Valley. This contributes to a cultural dynamic that Ishaya describes as “more like a startup than a big company.”
That also might have to do with the group itself. These engineers are “the smartest and most well accomplished of anyone” working in cloud architecture today, Ishaya says.
“It’s been the most fun job I’ve ever had,” says Mohrman. “Great team, coolest product—they’re going to have to pry me away from this group.”
That enthusiasm is reflected in the growing buzz around the new designs. Patil says the number of customers that have signed on to use the new cloud offerings in only the short time they’ve been available is more “than in most of my previous experiences.”
It’s confirmation of his vision. “Real customers using our platform are seeing real results that are unprecedented in the industry, and we’re very excited about that momentum,” Patil says. “The more we tell our story, the more we tell our differentiators, the more it appeals to our customers.”
Photography by Rohan Makhecha, Unsplash