Papers, research and other insights from Oracle Labs' Machine Learning Research Group

  • February 14, 2016

Hello World

Adam Pocock
Principal Member of Technical Staff

This blog is a place for members of the Information Retrieval and Machine Learning (IRML) group in Oracle Labs to write about what we're up to. We work on the research and development of information retrieval, natural language processing, and machine learning systems to help solve difficult business problems. We're going to talk about our papers, what conferences we're attending, and cool things we're doing with our research.


Stephen Green (Principal Investigator)

Steve is a researcher in Information Retrieval, like his father before him. He's been developing search systems for more than 20 years, both in research and in shipping products. He's been running the IRML group in one form or another for more than 10 years, doing work on passage retrieval, document classification, recommendation (for music in particular), and statistical NLP.

Jeffrey Alexander

Jeff has been working on Information Retrieval systems for more than 10 years. He has worked both in the inner reaches of an industrial-strength research search engine and on abstract frameworks for pushing data into and out of many different engines. When possible, he combines his interest in IR with his interest in scalable and distributed systems, building highly scalable distributed systems for search-related tasks.

Pallika Kanani

Pallika works at the intersection of NLP and Machine Learning. She is interested in information extraction, semi-supervised learning, transfer learning, and active information acquisition. She works closely with various product groups at Oracle on real-world applications of Machine Learning, and has worked extensively with social data. She did her PhD at UMass Amherst under Prof. Andrew McCallum. Along the way, she interned for the Watson Jeopardy! project at IBM, worked on Bing at Microsoft Research, played with algorithmic trading at Deutsche Bank, analyzed babies' growth in a psychology lab, tutored students for the GRE, and worked in the family chemical manufacturing business. She also serves as a senior board member for Women in Machine Learning (WiML).

Philip Ogren

Philip has nearly 20 years of software engineering experience, including four years on an Oracle product team and work on several NLP open-source projects during his PhD at the University of Colorado. He enjoys working on a variety of software engineering problems related to NLP, including frameworks, data structures, and algorithms. His current interests include string similarity search and named entity recognition and linking.

Adam Pocock

Adam is a Machine Learning researcher who finished his PhD in information theory and feature selection in 2012. His thesis won the British Computer Society Distinguished Dissertation award in 2013. He's interested in distributed machine learning, Bayesian inference, structure learning, and writing code that requires as many GPUs as possible, because he enjoys building shiny computers.

Jean-Baptiste Tristan

Jean-Baptiste has been tempted over to Machine Learning from his first calling in Programming Languages research. His recent work is on scaling Bayesian inference across clusters of GPUs or CPUs, while maintaining guarantees on the statistical quality of the result. During his PhD he helped develop CompCert, the first provably correct optimising C compiler. This work won him the 2011 La Recherche award with his advisor and the CompCert research group.

Michael Wick

Michael works at the intersection of Machine Learning and NLP. He received his PhD in Computer Science from the University of Massachusetts, Amherst, advised by Prof. Andrew McCallum. He has co-organized machine learning workshops and has dozens of machine learning and NLP papers in top conferences. In 2009 he received the Yahoo! Award for Excellence in Search and Mining, and in 2010 he received the Yahoo! Key Scientific Challenges award. Recently, a hierarchical coreference algorithm he created won an international competition held by the U.S. Patent Office (USPTO). The algorithm is both the fastest and the most accurate at disambiguating inventors, and will soon drive patent search for the USPTO. His research interests are learning and inference in graphical models, and structured prediction in NLP (e.g. coreference).


Our recently published research has focused on a few different areas within ML and NLP.

  • Speeding up Bayesian inference by using clusters of GPUs or CPUs, whilst maintaining statistical guarantees about the stationary distribution. We have several ML papers on this topic, published in NIPS 2014, ICML 2015, and AISTATS 2016, and a paper on approximate counters in PPoPP 2016.
  • Extracting information from noisy, poorly written text. We had a paper in a NIPS 2015 workshop, and several more are in preparation.
  • Extending NLP systems into multiple languages. Our first paper in this area appears at AAAI 2016.

There are a few other areas we are interested in:

  • Scalable coreference and entity linking.
  • Improving search results with learning to rank.
  • Applying Recurrent Neural Networks to grammatical inference and program induction.
  • Deep learning, just like everyone else.


Oracle Labs has a summer internship program for talented graduate students who want to work in industry during their studies. In IRML we take a few interns each year to work on our research goals in IR, NLP and ML. We're based near Boston, MA. Note: our lunchtime sessions of Mario Kart are entirely optional, but any trash talking must be backed up by results.

The IRML team
