We’re pleased to announce the new feature release of Tribuo, v4.2. This release adds ONNX export support for Tribuo models, new models including factorization machines, and an automatic reproducibility system for models trained in Tribuo. Tribuo is a Java machine learning library developed by Oracle Labs, which was open sourced under an Apache 2.0 license in September 2020.
The full release notes for this release are available on the Tribuo website, where you can also explore the Tribuo project.
Export models in ONNX format
ONNX is a cross-platform and cross-library model exchange format. ONNX models can be served by various packages and hardware accelerators, and Tribuo can already serve ONNX models through its ONNX Runtime interface. With the v4.2 release, Tribuo can now export models in ONNX format for serving in cloud services, on edge devices, or in other languages like Python, C#, or JavaScript.
In this release, Tribuo supports exporting linear models (multiclass classification, multilabel classification, and regression), sparse linear regression models, factorization machines (multiclass classification, multilabel classification, and regression), LibLinear models (multiclass classification and regression), LibSVM models (multiclass classification and regression), and ensembles of those models, including arbitrary levels of ensemble nesting. We plan to expand this coverage to more models over time. For TensorFlow models, however, we recommend exporting them as a saved model and converting that with the Python tf2onnx converter.
Tribuo models exported in ONNX format preserve their provenance information in a metadata field which is accessible when the ONNX model is loaded back into Tribuo. This functionality allows ONNX models to interact with the rest of the Tribuo library, including the reproducibility system that we’ve added in this release. You can find more details on our ONNX support in the ONNX export tutorial.
Tribuo’s ONNX support is extensible, and the org.tribuo.util.onnx package that implements ONNX model generation has no dependencies on other Tribuo code. So, the wider JVM ecosystem can use it to write out ONNX models. It’s packaged in a separate jar file, and we plan to expand the set of supported ONNX operations over time. We welcome community contributions to expand ONNX support on the JVM.
Model deployments with OCI Data Science
The ONNX format is widely used across cloud providers, and Oracle Cloud Infrastructure (OCI) has two services that can automatically serve and scale up ONNX model deployments: OCI Data Science and Machine Learning services.
In the v4.2 release, we added support for deploying Tribuo models directly to OCI Data Science with a wrapper that presents a Tribuo model interface to a Data Science model deployment, scoring examples using the REST endpoint for that deployment. You can find more details in the model deployment section of our ONNX export tutorial. You can use our ONNX export support to prepare models for Machine Learning services too, though we don’t have direct deployment support in Tribuo yet.
New models
In this release we’ve added factorization machines, HDBSCAN, and classifier chains. Factorization machines are a powerful non-linear predictor that uses a factorized approximation to learn a feature-feature interaction term and a logistic regression. We’ve added factorization machines for multiclass classification, multilabel classification, and regression.
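To make the factorized interaction term concrete, here is a minimal sketch of how a second-order factorization machine scores one dense example. It is not Tribuo's implementation: the class name FactorizationMachineSketch and its fields are hypothetical, and it only computes the raw score (a classifier would pass this score through a sigmoid or softmax). The pairwise term uses the standard identity that lets the sum over all feature pairs be computed in time linear in the number of features.

```java
/** Illustrative only: a hypothetical class, not part of Tribuo. */
final class FactorizationMachineSketch {
    private final double bias;
    private final double[] weights;   // one linear weight per feature
    private final double[][] factors; // factors[i] is the k-dimensional factor vector of feature i

    FactorizationMachineSketch(double bias, double[] weights, double[][] factors) {
        this.bias = bias;
        this.weights = weights;
        this.factors = factors;
    }

    /**
     * Computes bias + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j.
     * The pairwise sum is evaluated as
     * 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ].
     */
    double score(double[] x) {
        double result = bias;
        for (int i = 0; i < x.length; i++) {
            result += weights[i] * x[i];
        }
        int k = factors[0].length;
        double interaction = 0.0;
        for (int f = 0; f < k; f++) {
            double sum = 0.0;
            double sumOfSquares = 0.0;
            for (int i = 0; i < x.length; i++) {
                double term = factors[i][f] * x[i];
                sum += term;
                sumOfSquares += term * term;
            }
            interaction += sum * sum - sumOfSquares;
        }
        return result + 0.5 * interaction;
    }

    public static void main(String[] args) {
        FactorizationMachineSketch fm = new FactorizationMachineSketch(
                0.5,
                new double[] {1.0, 2.0},
                new double[][] {{1.0, 0.0}, {2.0, 1.0}});
        // Linear part: 0.5 + 1*1 + 2*2 = 5.5; pairwise part: <v_0, v_1> * x_0 * x_1 = 2 * 2 = 4.
        System.out.println(fm.score(new double[] {1.0, 2.0})); // prints 9.5
    }
}
```

The factor vectors are what make the model compact: instead of one weight per feature pair, each feature has a short vector, and the interaction weight for a pair is the dot product of the two vectors.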
HDBSCAN is a hierarchical density-based clustering algorithm, which chooses the number of clusters based on properties of the data rather than as a hyperparameter. The Tribuo implementation can cluster a dataset, and then at prediction time, it reports the cluster that a given data point would belong to without modifying the cluster structure.
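One simple way to assign a new point to an existing clustering without changing it is to keep a few representative points (exemplars) per cluster and return the cluster of the nearest one. The sketch below shows that idea only. It is an assumption for illustration, not a description of Tribuo's HDBSCAN code, and the class ExemplarAssignmentSketch and its method are hypothetical names.

```java
/** Illustrative only: a hypothetical helper, not part of Tribuo. */
final class ExemplarAssignmentSketch {
    private ExemplarAssignmentSketch() {}

    /**
     * Returns the cluster id of the exemplar closest to the point (squared Euclidean distance).
     * The exemplars and their cluster ids are only read, never modified.
     */
    static int assign(double[][] exemplars, int[] exemplarClusterIds, double[] point) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int e = 0; e < exemplars.length; e++) {
            double distance = 0.0;
            for (int j = 0; j < point.length; j++) {
                double diff = exemplars[e][j] - point[j];
                distance += diff * diff;
            }
            if (distance < bestDistance) {
                bestDistance = distance;
                best = e;
            }
        }
        if (best < 0) {
            throw new IllegalArgumentException("At least one exemplar is required");
        }
        return exemplarClusterIds[best];
    }

    public static void main(String[] args) {
        double[][] exemplars = {{0.0, 0.0}, {10.0, 10.0}};
        int[] clusterIds = {1, 2};
        System.out.println(assign(exemplars, clusterIds, new double[] {1.0, 1.0})); // prints 1
        System.out.println(assign(exemplars, clusterIds, new double[] {9.0, 8.0})); // prints 2
    }
}
```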
Classifier chains are an ensemble approach to multilabel classification that, given a specific ordering of the labels, learns a chain of classifiers where each classifier gets the features and predicted labels from earlier in the chain. We also added ensembles of randomly ordered classifier chains, which work well in situations when the ground truth label ordering is unknown (most of the time). As part of the work on classifier chains, we added more performance measures to multilabel evaluations and improved the data loading mechanisms for multilabel data.
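The chaining step can be sketched in a few lines of plain Java. This is an illustration of the general technique rather than Tribuo's classes: ClassifierChainSketch, its links, and the demoChain factory are all hypothetical, and each link is a hand-written rule standing in for a trained binary classifier. Link i receives the original features followed by the i labels predicted earlier in the chain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Illustrative only: a hypothetical class, not part of Tribuo. */
final class ClassifierChainSketch {
    private final int numFeatures;
    private final List<Predicate<double[]>> links; // one binary classifier per label, in chain order

    ClassifierChainSketch(int numFeatures, List<Predicate<double[]>> links) {
        this.numFeatures = numFeatures;
        this.links = links;
    }

    /** Predicts every label in chain order, feeding earlier predictions to later links. */
    boolean[] predict(double[] features) {
        boolean[] labels = new boolean[links.size()];
        double[] augmented = new double[numFeatures + links.size()];
        System.arraycopy(features, 0, augmented, 0, numFeatures);
        for (int i = 0; i < links.size(); i++) {
            // Link i sees the features plus the i labels predicted so far.
            double[] input = Arrays.copyOf(augmented, numFeatures + i);
            labels[i] = links.get(i).test(input);
            augmented[numFeatures + i] = labels[i] ? 1.0 : 0.0;
        }
        return labels;
    }

    /** Builds a hand-written two-label chain over two features, purely for demonstration. */
    static ClassifierChainSketch demoChain() {
        List<Predicate<double[]>> links = new ArrayList<>();
        links.add(in -> in[0] > 0.5);                  // label 0 depends on feature 0 only
        links.add(in -> in[2] == 1.0 && in[1] > 0.0);  // label 1 also needs label 0 (stored at index 2)
        return new ClassifierChainSketch(2, links);
    }

    public static void main(String[] args) {
        ClassifierChainSketch chain = demoChain();
        System.out.println(Arrays.toString(chain.predict(new double[] {1.0, 1.0}))); // prints [true, true]
        System.out.println(Arrays.toString(chain.predict(new double[] {0.0, 1.0}))); // prints [false, false]
    }
}
```

Because each link's input depends on the labels before it, the label ordering matters, which is why ensembles over several random orderings are useful when no natural ordering is known.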
Reproducibility framework
Tribuo has strong model metadata support with its provenance system, which records how models, datasets, and evaluations are created. In this release, we enhance this support by adding a push-button reproduction framework that accepts either a model provenance or a model object and rebuilds the complete training pipeline, ensuring consistent usage of RNGs and other mutable state.
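The core idea behind reproduction can be shown with a small, self-contained sketch: if the random seed and hyperparameters are recorded alongside the model, rerunning the same training code on the same data yields the same weights. Everything here is hypothetical and much simpler than Tribuo's provenance objects. TrainingRecord stands in for a provenance record, and train is a toy stochastic gradient descent loop for linear regression whose example order is shuffled by the seeded RNG.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative only: hypothetical classes, not Tribuo's provenance or reproducibility API. */
final class ReproducibleTrainingSketch {
    private ReproducibleTrainingSketch() {}

    /** A stand-in for a provenance record: the state needed to rerun training. */
    static final class TrainingRecord {
        final long seed;
        final int epochs;
        final double learningRate;

        TrainingRecord(long seed, int epochs, double learningRate) {
            this.seed = seed;
            this.epochs = epochs;
            this.learningRate = learningRate;
        }
    }

    /** Trains linear regression weights with SGD, shuffling the example order each epoch. */
    static double[] train(double[][] xs, double[] ys, TrainingRecord record) {
        Random rng = new Random(record.seed); // all randomness flows from the recorded seed
        int n = xs.length;
        int d = xs[0].length;
        double[] w = new double[d];
        int[] order = new int[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        for (int epoch = 0; epoch < record.epochs; epoch++) {
            // Fisher-Yates shuffle driven by the seeded RNG.
            for (int i = n - 1; i > 0; i--) {
                int j = rng.nextInt(i + 1);
                int tmp = order[i];
                order[i] = order[j];
                order[j] = tmp;
            }
            for (int idx : order) {
                double prediction = 0.0;
                for (int k = 0; k < d; k++) {
                    prediction += w[k] * xs[idx][k];
                }
                double error = prediction - ys[idx];
                for (int k = 0; k < d; k++) {
                    w[k] -= record.learningRate * error * xs[idx][k];
                }
            }
        }
        return w;
    }

    public static void main(String[] args) {
        double[][] xs = {{1.0, 0.0}, {0.0, 1.0}, {1.0, 1.0}, {2.0, 1.0}};
        double[] ys = {1.0, 2.0, 3.0, 4.0};
        TrainingRecord record = new TrainingRecord(42L, 5, 0.05);
        double[] original = train(xs, ys, record);
        double[] rebuilt = train(xs, ys, record);
        System.out.println(Arrays.equals(original, rebuilt)); // prints true
    }
}
```

A real pipeline has more mutable state than one seed (data loading order, feature transformations, nested trainers), which is the kind of state a reproduction framework has to track and reset consistently.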
With this capability, Tribuo can easily rebuild models to see whether updated datasets change performance, or whether a model is reproducible at all, which might be required for regulatory reasons. Over time, we aim to expand this support into a full experimental framework, allowing models to be rebuilt with hyperparameter or data changes as part of the data science process or for debugging models in production.
This framework was written by Joseph Wonsil and his PhD advisor Professor Margo Seltzer at the University of British Columbia as part of a collaboration between Professor Seltzer and Oracle Labs. We’re excited about working with Joe, Margo, and the rest of the Systopia lab at UBC to improve Tribuo’s support for provenance and reproducibility.
Get involved with Tribuo
For the next release of Tribuo, we’re planning to migrate away from Java serialization, start investigating what benefits we can get from newer features available in Java 17 (like the Vector API), and add more models and other smaller improvements.
As always, we welcome community contributions to Tribuo, and you can check out the project on GitHub or at our website.
