The machine learning expertise at Oracle isn’t concentrated in a single department or R&D effort; it’s dispersed across teams all over the world. That was made clear recently when more than 250 of the company’s experts spent three days together at Oracle headquarters for a machine learning summit. It took an hour just to get through a lightning round of introductions at one minute per team.
“We represent the oldest profession in the world,” joked one engineer introducing his team. “I’m talking about construction, the tools that help the pharaohs build the pyramids.” Seemingly every vertical market, from healthcare to hospitality, was represented—a sign that machine learning is increasingly woven into software, and not treated as a standalone capability. People described machine learning efforts in products for measuring household electricity consumption (OPower), guided learning (Oracle University), AI-sourced company data (Oracle Datafox), and personalized banking. And plenty of attendees weren’t focused on one particular product, instead bringing their expertise from years spent conducting pure research on Internet topology, user experience telemetry, data science workloads on Java, distributed parallel graph analytics, and much more.
Are Businesses Ready for ML?
The mission of building a machine learning community dates back to a request in 2011 from Oracle Chief Architect Edward Screven, said Stephen Green, director of the Machine Learning Research Group at Oracle Labs, which led the Machine Learning Summit. Earlier summits focused on search and natural language processing. Today, the tech industry’s focus has moved beyond how to retrieve information to how to interpret it. When Green meets new data scientists, he asks them how they’re “operationalizing” ML: How do they annotate the data? How do they build, deploy, measure, and update their models? What tools do they use? Above all, however, Green’s discussions are less about tools and tactics and more about what new problems teams are solving using ML.
“We don’t have machine learning problems, we have business problems that we and our customers might solve with machine learning,” Green said. “It doesn’t matter how clean your data is, how well your model generalizes, or how scalable your deployment is if you can’t answer these questions.” That pragmatic, enterprise-oriented approach is also what distinguishes Oracle’s AI work, which isn’t focused on winning ancient games of strategy, Green said: “We don’t want to play Go really well, because playing Go is not important to our customers.”
Below is just a taste of the topics and experiences Oracle’s machine learning experts explored at this year’s summit.
Safe Data Practices
As machine learning becomes more central to every type of application, developers and data scientists need data, and lots of it, to test algorithms at scale. There’s a risk, however: Those records must be obtained and used with appropriate permissions. There are open source data options, but these often come with reciprocal (copyleft) licenses, which require that all code using the data be released as open source. Web scraping is another popular approach, but it too must be done with care so that it doesn’t violate terms of service (Twitter forbids scraping, for example), circumvent paywalls or robots.txt instructions, or even overload site servers with traffic.
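To make the robots.txt point concrete, here is a minimal sketch in Python of how a scraper might check a site's instructions before requesting a page. It uses the standard library's `urllib.robotparser`. The robots.txt content, the `example.com` URLs, the `example-research-bot` user agent, and the helper names are all hypothetical, chosen only for illustration. Passing this check says nothing about a site's terms of service or data licensing, which still need separate review.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
# A real scraper would download it from https://<site>/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

# Hypothetical user-agent name for an internal research crawler.
AGENT = "example-research-bot"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set that can be queried."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules allow user_agent to request url."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

for url in (
    "https://example.com/public/data.csv",
    "https://example.com/private/data.csv",
):
    verdict = "allowed" if may_fetch(parser, AGENT, url) else "disallowed"
    print(f"{url}: {verdict}")

# Seconds the site asks crawlers to wait between requests (None if unset).
print("crawl delay:", parser.crawl_delay(AGENT))
```

Honoring the crawl delay between requests is one simple way to address the server-load concern; the permission questions remain a legal matter that no code check can settle.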
“Data today is a lot like how open source software was when it first came out, and everyone thought, ‘This is great, free software—let’s use it,’” said Geoff Barry, senior corporate counsel at Oracle. “That’s where data is headed. It may appear to be free, but it’s not actually free.”
That’s why, more than five years ago, Oracle Labs created an in-house research data repository that teams can use to build models. Under the leadership and authority of Craig Stephen, head of Oracle Labs, this specially secured repository contains large data sets suitable for a variety of challenging ML projects. All data sets in the repository are carefully tracked and managed through a rigorous process that includes vetting for IP and privacy concerns, and are made available internally for uses that are consistent with those requirements.
Java for ML
Oracle is also evolving with machine learning technology by revving up the language that underlies it. Machine learning code tends to be written in Python, because that language is approachable for data scientists who haven’t studied computer programming.
“Python is primarily used in machine learning as a way to drive libraries that are written in native code,” said Mark Reinhold, chief architect of the Java Platform Group at Oracle. “As soon as you need to do something for which you don’t have a good native language library, you’re stuck. You have to buckle down and write native functions in C or C++ or assembly code.”
That’s about to change, Reinhold said, thanks to four ongoing projects in the OpenJDK Community: Panama, Loom, Amber, and Valhalla. Panama makes it easier to call native functions and access native data from Java programs. Loom simplifies thread-based concurrent programming, a signature Java emphasis, by introducing fast, low-footprint “fibers.” Amber brings in pattern matching, among other features. And Valhalla aims to enable Java programmers to create data structures just as efficient as those that they can create in native languages such as C and C++.
“Java is a broad-spectrum language for the working programmer, and it’s already moving toward being a better foundation for ML than Python,” he said.
A Never-Ending Story
A fun exercise for future summits: Add up the decades of experience present in the room. Reinhold told stories of his early days working on Java and some of the design choices that were made, while Craig Stephen reminded the conference attendees that, “We have been doing this for a long time—we’ve been learning from data for decades.” He pointed to Kenny Gross, who has a PhD in nuclear engineering and published a paper on power plant monitoring and fault detection in the late 1990s. Gross still works at Oracle, where he does research on algorithms, AI, and security (a current project is “Advanced Prognostics for Dense-Sensor IoT Applications”).
The goal for the three-day event was simple—make connections—and based on the hallway conversations and poster sessions, it was met. That pleased Stephen, who reminded everyone that technology doesn’t evolve without talent: “What’s the most expensive thing about this conference? It’s your time.”