Earlier this month, Oracle Cloud Infrastructure (OCI) Data Science released support for the Intel scikit-learn extension, daal4py, to accelerate your scikit-learn applications. This extension speeds up your scikit-learn models by using the Intel oneAPI Data Analytics Library (oneDAL).
Daal4py is designed to make machine learning in Python fast and easy to use. With minimal code changes, daal4py dynamically patches scikit-learn estimators to use the Intel oneDAL library as the underlying solver, producing the same results faster.
Several scikit-learn algorithms are available as part of the accelerator.
The following example shows the improvement in training performance when a k-means model is trained with plain scikit-learn and then with the daal4py accelerator.
1. Install the latest version of the Intel scikit-learn extension.
pip install scikit-learn-intelex
2. Load the necessary modules.
import daal4py.sklearn
import importlib
import logging
import numpy as np
import sklearn
import time
import warnings
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
3. Prepare the dataset using the sklearn make_blobs function, which generates isotropic Gaussian blobs for clustering. The following command creates a dataset with 100K rows and 150 columns.
rows, cols = 100000, 150
X, y = make_blobs(n_samples=rows, n_features=cols, centers=8, random_state=42)
4. Train a k-means model using sklearn on the dataset.
from sklearn.cluster import KMeans
estimator = KMeans(n_clusters=8)
print("Module being used: " + estimator.__module__)
t0 = time.perf_counter()
trained = estimator.fit(X)
fit_elapsed = str(time.perf_counter() - t0)
print("Training took " + fit_elapsed + " seconds")
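The timing pattern used in steps 4 and 5 can be factored into a small helper. This is a sketch, and `timed` is a hypothetical name, not part of sklearn or daal4py:

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds).

    Uses time.perf_counter, a monotonic, high-resolution clock suited
    to benchmarking short operations.
    """
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - t0
```

With such a helper, the training call above becomes `trained, fit_elapsed = timed(estimator.fit, X)`, which keeps the benchmarking code identical for both the stock and the patched runs.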
5. Train the k-means model using the daal4py accelerator. Using the accelerator requires only one extra step: call patch_sklearn() so that daal4py dynamically patches the sklearn estimators to use oneDAL as the underlying solver. You get the same solution as before, but faster. The sklearn modules must be imported again after patching is complete.
from sklearnex import patch_sklearn
patch_sklearn()
from sklearn.cluster import KMeans
estimator = KMeans(n_clusters=8)
# After patching, this should indicate daal4py is being used
print("Module being used: " + estimator.__module__)
After patching, train again, now using the daal4py accelerator.
t0 = time.perf_counter()
trained = estimator.fit(X)
elapsed = str(time.perf_counter()-t0)
print("Training took " + elapsed + " seconds")
6. Finally, unpatch and reload sklearn to return to the stock scikit-learn implementations.
from sklearnex import unpatch_sklearn
unpatch_sklearn()
sklearn = importlib.reload(sklearn)
# Remember to re-import all the relevant modules after reloading
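After unpatching and reloading, a quick sanity check is to confirm that the estimator class resolves to a plain sklearn module again. This is a sketch of that check, assuming stock scikit-learn is what remains after the unpatch:

```python
from sklearn.cluster import KMeans

# With the patch removed, the estimator class lives in a plain sklearn
# module again rather than a daal4py/sklearnex one.
print(KMeans.__module__)
assert KMeans.__module__.startswith("sklearn")
```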
Comparing the two runs, the daal4py accelerator provides almost a 50% improvement in training time. Hopefully, this brief overview gives you an idea of how the Intel accelerators for sklearn can help improve performance; they are now available within the Oracle Cloud Infrastructure Data Science environment.
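The reported improvement can be computed directly from the two elapsed times. `speedup_pct` is a hypothetical helper name, and the numbers in the comment are illustrative, not measurements from this run:

```python
def speedup_pct(baseline_s, accelerated_s):
    """Percentage reduction in training time relative to the baseline run."""
    return 100.0 * (baseline_s - accelerated_s) / baseline_s


# Illustrative values only: a run that drops from 10 s to 5.2 s
# is roughly a 48% improvement in training time.
print(f"{speedup_pct(10.0, 5.2):.0f}% faster")
```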
For more information, see the following resources:
Currently working as a Product Manager in the Data & AI group within Oracle Cloud Infrastructure.
Prior to moving to a product role, I was a practitioner in the data science space. Over the years, my experience has been in applying advanced analytics and data science methodologies to various domains: financial services, telecom, entertainment and gaming, and cloud business.
Jize is a Seattle-based software engineer at Oracle with a doctorate in applied mathematics from the University of Washington.