We are pleased to announce the general availability of Oracle Machine Learning for Python 2.0 on Oracle Autonomous Database.
Oracle Machine Learning for Python (OML4Py) uses the database as a high-performance computing environment to explore, transform, and analyze data faster and at scale from Python. Database tables and views are manipulated through overloaded functions on pandas DataFrame proxy objects, and the in-database parallelized machine learning algorithms are exposed through a natural Python interface. Data scientists and other Python users can create user-defined Python functions that are managed in the database and can leverage third-party packages to augment the included functionality. Python objects can also be stored directly in the database rather than managed in flat files. These features facilitate collaboration across the data science team by enabling convenient hand-off from data scientists to application developers and production system administrators for immediate deployment. OML4Py also supports automated machine learning (AutoML), which not only enhances data scientist productivity but also enables non-experts to use and benefit from machine learning. AutoML can help produce more accurate models faster through automated algorithm selection, feature selection, and model tuning and selection.
New features in OML4Py 2.0
OML4Py 2.0 includes new data types, including date and integer column types, to enable greater Python-based exploration and manipulation of data. The Datetime type represents database table columns of type TIMESTAMP and DATE. Support for TIMESTAMP is available immediately; support for DATE is coming soon. The Timedelta type represents the difference between two dates or times, and the Timezone type represents time zone data.
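The meaning of these three types parallels Python's standard `datetime` module. The sketch below uses only the standard library, not OML4Py proxy objects, to show how the difference between two timestamps becomes a timedelta and how time zone data attaches to a timestamp. The variable names and sample values are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone

# Two illustrative timestamps (hypothetical values, not read from a database table).
order_placed = datetime(2024, 3, 1, 9, 30, 0)
order_shipped = datetime(2024, 3, 3, 14, 0, 0)

# Subtracting two datetimes yields a timedelta: the difference between two times.
transit_time = order_shipped - order_placed
transit_hours = transit_time.total_seconds() / 3600

# Attaching time zone data makes a timestamp unambiguous across regions.
utc_placed = order_placed.replace(tzinfo=timezone.utc)
local_placed = utc_placed.astimezone(timezone(timedelta(hours=-5)))

print(transit_time)
print(transit_hours)
print(local_placed.isoformat())
```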
Additionally, OML4Py 2.0 exposes three more in-database algorithms through the Python API. Exponential Smoothing (ESM) is a machine learning algorithm used for forecasting univariate time series data. Non-negative Matrix Factorization (NMF) supports feature extraction and is often used where there are many attributes (columns) that individually may have weak predictive power. By combining attributes, NMF can produce meaningful patterns, topics, or themes, which can be fed into other machine learning algorithms. Extreme Gradient Boosting (XGBoost) is a highly efficient, scalable machine learning algorithm for regression, classification, and survival analysis that integrates the open-source XGBoost gradient boosting functionality into the in-database algorithm framework.
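To make the forecasting idea concrete, here is a minimal pure-Python sketch of simple exponential smoothing, the most basic textbook form of the technique. It is not the in-database ESM implementation and does not use the OML4Py API. The function name, the smoothing factor `alpha`, and the sample series are assumptions chosen for illustration.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    Recurrence: level[0] = x[0]; level[t] = alpha * x[t] + (1 - alpha) * level[t - 1].
    """
    if not series:
        raise ValueError("series must contain at least one observation")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the interval (0, 1]")
    levels = [series[0]]
    for x in series[1:]:
        levels.append(alpha * x + (1 - alpha) * levels[-1])
    return levels


# Hypothetical univariate series, for example monthly sales.
monthly_sales = [10.0, 12.0, 14.0, 13.0]
levels = simple_exponential_smoothing(monthly_sales, alpha=0.5)

# In simple exponential smoothing, the one-step-ahead forecast is the last level.
next_forecast = levels[-1]
print(levels, next_forecast)
```

A larger `alpha` weights recent observations more heavily, while a smaller one produces a smoother, slower-reacting forecast.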
Oracle Machine Learning for Python feature summary
Use Oracle Database as a high-performance computing environment
- Leverage database table and view proxy objects to explore, transform, and analyze data in the database at scale
- Use familiar Python syntax on database data through overloaded native Python functions that translate functionality to SQL
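The proxy-object idea can be sketched with a toy example: a Python object that holds a query rather than the data, overloads indexing and comparison, and only emits SQL when results are requested. This is an illustration of the general pattern, not OML4Py's implementation or API. The classes `TableProxy`, `ColumnProxy`, and `ColumnCondition` are hypothetical, and an in-memory SQLite database stands in for Oracle Database.

```python
import sqlite3


class ColumnCondition:
    """A deferred comparison such as "age" > 30, kept as SQL text plus bind values."""

    def __init__(self, sql, params):
        self.sql = sql
        self.params = params


class ColumnProxy:
    """Hypothetical column proxy: comparisons build SQL instead of comparing data."""

    def __init__(self, name):
        self.name = name

    def __gt__(self, value):
        return ColumnCondition(f'"{self.name}" > ?', [value])


class TableProxy:
    """Hypothetical table proxy: it references a table and never holds the rows."""

    def __init__(self, conn, table, where=None):
        self.conn = conn
        self.table = table
        self.where = where

    def __getitem__(self, key):
        if isinstance(key, str):
            return ColumnProxy(key)  # proxy["age"] selects a column
        if isinstance(key, ColumnCondition):
            return TableProxy(self.conn, self.table, key)  # proxy[cond] filters rows
        raise TypeError("expected a column name or a column condition")

    def to_sql(self):
        sql = f'SELECT * FROM "{self.table}"'
        if self.where is not None:
            sql += f" WHERE {self.where.sql}"
        return sql

    def pull(self):
        """Run the generated SQL in the database and fetch only the result."""
        params = self.where.params if self.where is not None else []
        return self.conn.execute(self.to_sql(), params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Ana", 42), ("Ben", 27), ("Chen", 35)],
)

customers = TableProxy(conn, "customers")
over_30 = customers[customers["age"] > 30]  # familiar DataFrame-style filtering
generated_sql = over_30.to_sql()
rows = over_30.pull()
print(generated_sql)
print(rows)
```

The filtering happens inside the database engine, so only the matching rows cross into the Python process.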
Use in-database parallelized and distributed machine learning algorithms
- Build more models on more data and score large volumes of data faster
- Use in-database algorithms via a natural Python API
- Increase productivity with automatic data preparation, partitioned models, and integrated text mining capabilities
Run user-defined Python functions in database-spawned and controlled Python engines and manage Python objects in the database
- Collaborate by handing off data science work products from data scientists to developers
- Run user-defined Python functions using system-provided infrastructure with data-parallel, task-parallel, and non-parallel invocation
- Return structured and image results in Python, SQL, and REST APIs
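The three invocation styles named above can be sketched in plain Python. In this minimal illustration, a thread pool stands in for the database-spawned Python engines, and the helper names `run_once`, `run_data_parallel`, and `run_task_parallel` are hypothetical, not OML4Py functions.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def total_amount(rows):
    """Example user-defined function: sum the amount field of (region, amount) rows."""
    return sum(amount for _, amount in rows)


def run_once(func, rows):
    """Non-parallel: invoke the function a single time on all of the data."""
    return func(rows)


def run_data_parallel(func, rows, key_index=0, workers=2):
    """Data-parallel: partition rows by a key and invoke the function per partition."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row)
    keys = sorted(groups)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(func, (groups[key] for key in keys))
        return dict(zip(keys, results))


def run_task_parallel(func, times, workers=2):
    """Task-parallel: invoke the function several times, passing each call its index."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, range(1, times + 1)))


# Hypothetical sample data.
sales = [("east", 10), ("west", 5), ("east", 7), ("west", 1)]

overall = run_once(total_amount, sales)
by_region = run_data_parallel(total_amount, sales)
squares = run_task_parallel(lambda i: i * i, times=3)
print(overall, by_region, squares)
```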
Automated machine learning (AutoML) and model explainability (MLX)
- Enhance data scientist productivity and enable non-experts to use machine learning
- Perform automated algorithm and feature selection as well as model tuning and selection
- Identify, in a model-agnostic way, the important features that impact model predictions
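One common model-agnostic way to identify important features is permutation importance: shuffle one feature at a time and measure how much the model's accuracy drops. The announcement does not say which technique MLX uses, so the following pure-Python sketch is an illustration of the general idea only. The model, the data, and all function names are assumptions.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction matches the label."""
    correct = sum(1 for row, label in zip(rows, labels) if model(row) == label)
    return correct / len(rows)


def permutation_importance(model, rows, labels, n_features, seed=0):
    """Drop in accuracy after shuffling each feature column in turn.

    The model is treated as a black box: only its predictions are used.
    """
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(n_features):
        shuffled = [row[j] for row in rows]
        rng.shuffle(shuffled)
        permuted = [
            row[:j] + (value,) + row[j + 1:]
            for row, value in zip(rows, shuffled)
        ]
        importances.append(baseline - accuracy(model, permuted, labels))
    return importances


def threshold_model(row):
    """Hypothetical model: predicts from feature 0 only and ignores feature 1."""
    return 1 if row[0] > 0.5 else 0


# Synthetic data: two random features, with labels produced by the model itself.
data_rng = random.Random(42)
rows = [(data_rng.random(), data_rng.random()) for _ in range(200)]
labels = [threshold_model(row) for row in rows]

scores = permutation_importance(threshold_model, rows, labels, n_features=2)
print(scores)
```

Shuffling the ignored feature leaves every prediction unchanged, so its importance is zero, while shuffling the feature the model depends on lowers accuracy.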
For details on OML4Py, please see the Oracle Machine Learning for Python API Documentation.