Papers, research and other insights from Oracle Labs' Machine Learning Research Group

About MLRG

This blog is a place for members of the Machine Learning Research Group (MLRG, formerly the Information Retrieval and Machine Learning Group) in Oracle Labs to write about what we're up to. We work on the research and development of machine learning, natural language processing, and information retrieval systems to help solve difficult business problems. We're going to talk about our papers, what conferences we're attending, and cool things we're doing with our research.


Stephen Green (Principal Investigator)

Steve is a researcher in Information Retrieval, like his father before him. He's been developing search systems for more than 20 years, both in research and as shipping products. He's been running MLRG in one form or another for more than 10 years, doing work on passage retrieval, document classification, recommendation (for music in particular), and statistical NLP.

Jeffrey Alexander

Jeff has been working on Information Retrieval systems for more than 10 years. He has worked both in the inner reaches of an industrial-strength research search engine and on abstract frameworks for pushing data in and out of many different engines. When possible, he combines his interest in IR with his interest in scalable, distributed systems, building highly scalable distributed systems for search-related tasks.

Pallika Kanani

Pallika works at the intersection of NLP and Machine Learning. She is interested in information extraction, semi-supervised learning, transfer learning, and active information acquisition. She works closely with various product groups at Oracle on real-world applications of Machine Learning, and has worked extensively with social data. She did her PhD at UMass Amherst under Prof. Andrew McCallum. Along the way, she interned for the Watson Jeopardy! project at IBM, worked on Bing at Microsoft Research, played with algorithmic trading at Deutsche Bank, analyzed babies' growth in a psychology lab, tutored students for the GRE, and worked in the family chemical manufacturing business. She also serves as a senior board member for Women in Machine Learning (WiML).

Philip Ogren

Philip has nearly 20 years of software engineering experience, including four years on an Oracle product team and work on a variety of open-source NLP projects during his PhD at the University of Colorado. He enjoys working on a variety of software engineering problems related to NLP, including frameworks, data structures, and algorithms. His current interests include string similarity search and named entity recognition and linking.

Adam Pocock

Adam is a Machine Learning researcher who finished his PhD in information theory and feature selection in 2012. His thesis won the British Computer Society Distinguished Dissertation award in 2013. He's interested in distributed machine learning, Bayesian inference, and structure learning. And writing code that requires as many GPUs as possible, because he enjoys building shiny computers.

Jean-Baptiste Tristan

John has been tempted over to Machine Learning from his first calling in Programming Languages research. His recent work is on scaling Bayesian inference across clusters of GPUs or CPUs, while maintaining guarantees on the statistical quality of the result. During his PhD he helped develop CompCert, the first provably correct optimising C compiler. This work won him the 2011 La Recherche award with his advisor and the CompCert research group.

Michael Wick

Michael works at the intersection of Machine Learning and NLP. He received his PhD in Computer Science from the University of Massachusetts, Amherst, advised by Prof. Andrew McCallum. He has co-organized machine learning workshops and has dozens of machine learning and NLP papers in top conferences. In 2009 he received the Yahoo! Award for Excellence in Search and Mining, and in 2010 he received the Yahoo! Key Scientific Challenges award. Recently, a hierarchical coreference algorithm he created won an international competition held by the U.S. Patent and Trademark Office (USPTO). The algorithm is both the fastest and most accurate at disambiguating inventors, and will soon drive patent search for the USPTO. His research interests are learning and inference in graphical models, and structured prediction in NLP (e.g. coreference).

Jack Sullivan

Jack Sullivan is a data scientist in MLRG. He has an MS in Computer Science from the University of Massachusetts, Amherst. He has worked on large-scale coreference and entity linking, lexicon and word embeddings, and deep learning models for named entity recognition and document structure detection.

Daniel Peterson

Daniel is a data scientist in MLRG. He is also a PhD student at the University of Colorado at Boulder. His academic research is focused on re-creating and extending VerbNet from large corpora, in the hopes that it will make VerbNet-style resources available in more languages, and useful in more domains.

Rob Oberbreckling

Rob has over 25 years of software engineering and applied research experience. He spent several years with the inventors of Latent Semantic Analysis, spearheading various NLP and ML efforts in standardized essay grading, word embeddings, and team communication analysis and modeling. Over the years, he has accumulated broad engineering experience spanning embedded systems, data analysis, web servers, audio and video streaming, mobile devices, GPUs, speech recognition, finance, and, from long ago, a modest video game.

Haniyeh Mahmoudian

Haniyeh is working on machine learning techniques at Oracle Labs. She received her PhD in Astrophysics, on gravitational lensing, from the University of Bonn. During her PhD she developed a new approach to reconstructing Hubble Space Telescope images at higher resolution. After leaving academia she worked on forecasting and recommender systems, and built fraud detection models for large financial institutions and merchants.


Our recent published research has focused on a couple of different areas within ML and NLP.

  • Speeding up Bayesian inference by using clusters of GPUs or CPUs, whilst maintaining statistical guarantees about the stationary distribution. We have several ML papers on this topic, published in NIPS 2014, ICML 2015, and AISTATS 2016, and a paper on approximate counters in PPoPP 2016.
  • Extracting information from noisy, poorly written text. We had a paper in a NIPS 2015 workshop, and several more are in preparation.
  • Extending NLP systems into multiple languages. Our first paper in this area was at AAAI 2016.
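To give a flavour of the approximate-counter work mentioned above: the classic example of the general technique is the Morris counter, which tracks roughly n events using only O(log log n) bits by storing an exponent and incrementing it probabilistically. This sketch illustrates that classic idea, not the specific algorithm in our PPoPP 2016 paper:

```python
import random

class MorrisCounter:
    """Classic Morris approximate counter.

    Stores only an exponent c; each event increments c with
    probability 2**-c, so the expected estimate 2**c - 1 tracks
    the true count while using O(log log n) bits of state.
    """

    def __init__(self):
        self.c = 0

    def increment(self):
        # Flip a coin that succeeds with probability 2**-c.
        if random.random() < 2.0 ** -self.c:
            self.c += 1

    def estimate(self):
        # Unbiased estimator of the number of increments seen.
        return 2 ** self.c - 1
```

Because increments are probabilistic rather than atomic read-modify-write updates, counters in this family are attractive in highly parallel settings, which is what makes them interesting for ML workloads on GPUs.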

There are a few other areas we are interested in:

  • Scalable coreference and entity linking.
  • Improving search results with learning to rank.
  • Applying Recurrent Neural Networks to grammatical inference and program induction.
  • Deep learning, just like everyone else.


Oracle Labs has a summer internship program for talented graduate students who want to work in industry during their studies. In MLRG we take a few interns each year to work on our research goals in IR, NLP and ML. We're based near Boston, MA. Note: our lunchtime sessions of Mario Kart are entirely optional, but any trash talking must be backed up by results.

The MLRG team
