If you can describe your machine learning problem as “predict a number from a table of features,” the new ADS Regression Operator is built for exactly that workflow.

The operator gives you a low-code way to train, evaluate, and package supervised tabular regression models with a simple YAML configuration. Instead of stitching together preprocessing, model selection, metric calculation, artifact writing, and reporting by hand, you can define the problem once and let ADS run the end-to-end workflow for you.

In this post, we will walk through what the ADS Regression Operator is, where it helps, and a hands-on example using the Bike Sharing dataset.

ADS Regression Operator

What Is the Regression Operator?

The ADS Regression Operator is a low-code operator for supervised tabular regression. It takes a training dataset, a target column, and a small set of configuration choices, then produces a trained model and a standard set of outputs such as predictions, metrics, an HTML report, and a serialized model artifact.

At a high level, the operator helps with:

  • loading training data and optional test data
  • using the target_column you specify in the YAML
  • preprocessing mixed tabular features
  • training a regression model
  • evaluating the model on training and optional held-out test data
  • generating artifacts you can review or operationalize

The operator currently supports these model options:

  • auto
  • linear_regression
  • random_forest
  • knn
  • xgboost

When you choose auto, the operator compares the supported regression model families using cross-validation and selects the best one for the configured metric. If you already know the model family you want, you can run one explicitly.
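To make the idea behind auto concrete, here is a minimal sketch of cross-validated model selection written directly against scikit-learn. It is not the operator's implementation: the synthetic data, the candidate set (xgboost is left out to keep dependencies small), the hyperparameters, and the helper name mean_cv_rmse are all assumptions made for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

# Synthetic data stands in for a real training table (assumption).
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

# Illustrative candidates only; hyperparameters are arbitrary choices.
candidates = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "knn": KNeighborsRegressor(n_neighbors=5),
}


def mean_cv_rmse(model, X, y, folds=5):
    """Average RMSE across folds (scikit-learn reports it negated)."""
    scores = cross_val_score(
        model, X, y, cv=folds, scoring="neg_root_mean_squared_error"
    )
    return -scores.mean()


# Score every candidate, then keep the one with the lowest RMSE.
cv_rmse = {name: mean_cv_rmse(model, X, y) for name, model in candidates.items()}
best_model = min(cv_rmse, key=cv_rmse.get)

for name, score in sorted(cv_rmse.items(), key=lambda item: item[1]):
    print(f"{name}: {score:.2f}")
print("selected:", best_model)
```

Because RMSE is an error measure, the lowest cross-validated value wins. A metric where higher is better would flip the comparison.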

Why Is This Operator Helpful?

Many regression projects start with the same boilerplate:

  • read data
  • split features and target
  • detect numeric, categorical, and date columns
  • impute missing values
  • encode categories
  • train a model
  • score predictions
  • save outputs somewhere consistent

That work is necessary, but it is also repetitive. The Regression Operator reduces that overhead so you can focus on the problem itself: what you want to predict, which data you trust, and how to interpret the results.
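For a sense of scale, here is roughly what that boilerplate looks like when written by hand with pandas and scikit-learn. This is a sketch under stated assumptions, not a description of what the operator does internally: the toy table, its column names, the imputation strategies, and the choice of a random forest are all made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Made-up toy table with numeric, categorical, and date columns.
df = pd.DataFrame({
    "temp": [0.3, 0.5, np.nan, 0.7, 0.2, 0.6],
    "weather": ["clear", "mist", "clear", np.nan, "rain", "clear"],
    "day": ["2011-01-01", "2011-01-02", "2011-01-03",
            "2011-01-04", "2011-01-05", "2011-01-06"],
    "demand": [900, 1500, 1300, 2100, 600, 1800],
})

# Split features and target.
X = df.drop(columns=["demand"])
y = df["demand"]

# Expand the date string into numeric parts a model can use.
dates = pd.to_datetime(X.pop("day"))
X["day_of_week"] = dates.dt.dayofweek
X["month"] = dates.dt.month

numeric_cols = ["temp", "day_of_week", "month"]
categorical_cols = ["weather"]

# Impute missing values, then one-hot encode the categorical column.
preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), numeric_cols),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_cols),
])

# Train a model on the preprocessed features.
model = Pipeline([
    ("preprocess", preprocess),
    ("regressor", RandomForestRegressor(n_estimators=20, random_state=0)),
])
model.fit(X, y)

# Score predictions (on the training rows, since the toy table is tiny).
predictions = model.predict(X)
rmse = mean_squared_error(y, predictions) ** 0.5
print(f"training RMSE: {rmse:.1f}")
```

Even this small version runs to about forty lines before any report or artifact handling, which is the scaffolding a YAML-driven workflow is meant to absorb.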

The operator is especially useful when you want:

  • a repeatable YAML-driven workflow instead of one-off notebook code
  • built-in preprocessing for mixed tabular data
  • a consistent artifact layout for review and handoff
  • a fast way to compare baseline models
  • an easier path from experimentation to deployment

Example: Predicting Daily Bike Rental Demand

For this example, we will use the public Bike Sharing dataset from the UCI Machine Learning Repository. The dataset contains hourly and daily rental counts for the Capital Bikeshare system from 2011 and 2012, along with weather and seasonal information. The daily file includes the column cnt, the total rental count for each day. Because cnt is a numeric value, predicting it is a natural regression problem, and we will use it as the target column.

We want to predict daily bike demand using features such as:

  • date
  • season
  • month
  • holiday and working-day indicators
  • weather situation
  • temperature, humidity, and wind speed

One important detail: the dataset also includes casual and registered, and those two values add up to cnt. Because they directly reveal the target, they should not be used as input features for this example.
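A quick check like the following can confirm this kind of leakage before you train. It is a small sketch: the helper name reconstructs_target is hypothetical, and the three rows are made-up values that only mimic the relationship. With the real data you would pass the frame loaded from day.csv instead.

```python
import pandas as pd


def reconstructs_target(df, parts, target, tolerance=0):
    """Return True when the `parts` columns sum to `target` in every row."""
    difference = (df[parts].sum(axis=1) - df[target]).abs()
    return bool((difference <= tolerance).all())


# Made-up rows for illustration; not taken from the real dataset.
sample = pd.DataFrame({
    "casual": [120, 80, 300],
    "registered": [880, 720, 1700],
    "cnt": [1000, 800, 2000],
})

if reconstructs_target(sample, ["casual", "registered"], "cnt"):
    print("casual + registered reproduces cnt; drop both before training")
```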

Step 1: Install the Regression Operator dependencies

python3 -m pip install "oracle_ads[regression]"

Step 2: Generate starter files

ads operator init -t regression --overwrite --output ./regression/

This generates a starter regression.yaml and backend configuration files that you can customize.

Step 3: Prepare the bike-sharing train/test split

Download the dataset from the UCI Machine Learning Repository. After downloading and unzipping the archive, use the day.csv file.

You can create a simple train/test split with pandas or your preferred data prep tool. For example:

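import pandas as pd
from sklearn.model_selection import train_test_split

# Load the daily file from the unzipped UCI archive.
df = pd.read_csv("day.csv")

# Keep the input features plus the target (cnt). The casual and registered
# columns are left out on purpose because they sum to cnt.
selected_columns = [
    "dteday",
    "season",
    "yr",
    "mnth",
    "holiday",
    "weekday",
    "workingday",
    "weathersit",
    "temp",
    "atemp",
    "hum",
    "windspeed",
    "cnt",  # target column
]

df = df[selected_columns]

# Hold out 20% of the rows for testing; the fixed seed makes the split repeatable.
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42, shuffle=True)

# Write the two files that the operator configuration will point to.
train_df.to_csv("day_train.csv", index=False)
test_df.to_csv("day_test.csv", index=False)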

Step 4: Configure the Regression Operator

Replace the contents of ./regression/regression.yaml with a configuration like the one below. Update the file paths so they point to your local copies of day_train.csv and day_test.csv.

The example marks dteday as a date column, asks the operator to choose a model automatically with RMSE as the selection metric, enables the HTML report and explanations, and writes outputs to a results directory.

kind: operator
type: regression
version: v1
spec:
  training_data:
    url: /path/to/day_train.csv
  test_data:
    url: /path/to/day_test.csv
  output_directory:
    url: /path/to/results/blog_bike_sharing_regression
  target_column: cnt
  column_types:
    dteday: date
  model: auto
  metric: rmse
  generate_report: true
  generate_explanations: true

Step 5: Verify the configuration

ads operator verify -f ./regression/regression.yaml

Step 6: Run the operator locally

ads operator run -f ./regression/regression.yaml -b local

Expected Artifacts

After a successful run, you can expect outputs such as:

  • training_predictions.csv
  • test_predictions.csv
  • training_metrics.csv
  • test_metrics.csv
  • report.html
  • model.pkl
  • global_explanations.csv when explanations are enabled

This artifact pattern is useful because it makes review and handoff much easier. A teammate can quickly inspect predictions, open the HTML report, or pick up the saved model without reverse-engineering the training code.
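If you want to confirm a run produced what you expect before handing it off, a short script can check the output directory. This is a sketch that assumes the files listed above sit directly in the configured output directory. The helper name check_artifacts is hypothetical, and the path is the placeholder from the example configuration.

```python
from pathlib import Path

# File names taken from the list of expected artifacts.
EXPECTED_ARTIFACTS = [
    "training_predictions.csv",
    "test_predictions.csv",
    "training_metrics.csv",
    "test_metrics.csv",
    "report.html",
    "model.pkl",
    "global_explanations.csv",  # only when explanations are enabled
]


def check_artifacts(output_dir):
    """Split the expected file names into those present and those missing."""
    root = Path(output_dir)
    present = [name for name in EXPECTED_ARTIFACTS if (root / name).is_file()]
    missing = [name for name in EXPECTED_ARTIFACTS if name not in present]
    return present, missing


# Placeholder path; replace it with your own output directory.
present, missing = check_artifacts("/path/to/results/blog_bike_sharing_regression")
print("present:", present)
print("missing:", missing)
```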

The report.html file includes an Actual vs Predicted scatter plot that helps you quickly assess model quality. Each point represents one record, with the actual target value on the x-axis and the predicted value on the y-axis. Points closer to the diagonal reference line indicate better predictions, while points farther away highlight larger errors.

[Figure: Regression training result graph]
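The quantities behind an Actual vs Predicted plot are easy to compute yourself. The sketch below uses three made-up value pairs (not operator output) to show how the distance from the diagonal relates to residuals and to RMSE.

```python
import numpy as np

# Made-up actual and predicted values, for illustration only.
actual = np.array([100.0, 200.0, 300.0])
predicted = np.array([110.0, 190.0, 300.0])

# A point on the diagonal has a residual of zero; the larger the absolute
# residual, the farther the point sits from the diagonal.
residuals = predicted - actual

# RMSE summarizes all residuals in the units of the target.
rmse = float(np.sqrt(np.mean(residuals ** 2)))

# Index of the record with the largest absolute error (first one on ties).
worst = int(np.argmax(np.abs(residuals)))

print("residuals:", residuals)
print(f"RMSE: {rmse:.3f}")
print("largest error at index:", worst)
```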

If explainability is enabled, the report also includes a Global Feature Importance bar chart. This chart shows which features had the strongest influence on the fitted model, making it easier to understand what drove the predictions.

[Figure: Regression global importance graph]
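To build intuition for what a global importance ranking measures, here is a sketch that uses scikit-learn's permutation importance on synthetic data. This is a stand-in technique chosen for illustration; this post does not say which method the operator uses to produce its explanations, so treat the example as an analogy, not a description of the report.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# Synthetic data in which only 2 of the 5 features carry signal (assumption).
X, y = make_regression(
    n_samples=300, n_features=5, n_informative=2, noise=5.0, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Shuffle one column at a time and measure how much the model's score drops.
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)

# Rank features from most to least influential.
ranking = sorted(
    zip(feature_names, result.importances_mean),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```

Features whose shuffling barely changes the score land near zero, which is the same reading you would apply to short bars in an importance chart.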

A Few Practical Tips

  • Start with auto when you want a quick baseline across supported model families.
  • Add a test_data dataset whenever you want a clearer picture of held-out performance.
  • Be careful about leakage. If a feature is derived from the target or reveals it too directly, leave it out.
  • Use column_types when you want to force a field such as dteday to be treated as a date.
  • Turn on report generation so you get a shareable HTML summary of the run.

Closing Thoughts

The Regression Operator makes it easier to move from a tabular dataset to a working regression workflow without writing the same scaffolding code every time. It gives ADS users a repeatable way to preprocess data, train models, compare options, evaluate results, and save useful artifacts in one place. If your next project involves predicting a continuous value from structured data, the Regression Operator can help you get to a solid baseline quickly, while still leaving room to tune, explain, and productionize the result. You can find more details about the ADS Regression Operator in the official documentation.