Training a Machine Learning Model on Time-Series Telemetry to Optimize Manufacturing

May 19, 2022 | 6 minute read
Joe Hahn
Senior Data Scientist

A blast furnace heats raw iron ore to produce molten pig iron, and this furnace heating is the first major step in the steel-making process that is illustrated in Fig. 1 below.

Figure 1. Schematic shows a skip-cart loading a charge of raw iron ore and coke (purified coal) into the top of a blast furnace that is heated from below via a hot air blast of temperature T~2000 °F. This hot air blast heats the furnace and smelts the molten pig iron that flows out via the iron notch, with waste furnace gases exiting above, where bleeder valves protect the furnace from sudden pressure surges.


The blast furnace at one particular steel-making Oracle cloud customer was prone to problematic 'bleeder' events, in which the furnace gas pressure surges without warning by 20-100% (see Fig. 2 below) and triggers a bleeder valve to vent excess furnace gas. Though rare, these bleeder events are undesirable because they interrupt production and can have environmental impact.

Figure 2. Furnace pressure versus time for small (green curve), medium (blue), and large (orange) bleeder events.


To date, the furnace's subject-matter experts (SMEs) had not determined the root cause of these events, so a traditional machine learning (ML) model was trained on the furnace telemetry to forecast, 30 minutes in advance, whether a bleeder event would occur. But that ML model was only a partial success: although it did alert on most events in sufficient time, testing revealed that it also emitted three times as many false-positive alerts, and those false positives caused furnace operators to ignore the alerting system. This Oracle customer then asked for assistance with this use case, and this blog describes the recommended solution, which was developed using the OCI Data Science service on the furnace's historical telemetry that this cloud customer had archived in its data lake within Oracle Cloud Infrastructure (OCI).


First note that this steel maker had achieved only partial success with its ML forecasting system, which tells us that this manufacturing mishap is a probabilistic event and is not deterministic. In other words, the furnace data does provide useful hints about whether a mishap is more or less likely to happen, but that data is not sufficiently diagnostic to allow an ML model to forecast when that mishap would happen again with the desired certainty.


In this circumstance, the better remedy is to use ML to optimize furnace operations instead, namely, to train an ML model to recommend to the furnace operator those furnace settings that would preserve furnace production while steering clear of the conditions that are more likely to lead to a mishap. The recommended solution trains an ML model to forecast a derived quantity called furnace-health, h = f·p/ṁ, where p is the furnace pressure, ṁ is the furnace's mass-production rate, and the so-called mishap-factor f is almost always unity except during rare moments that are detailed below. The goal of this optimization strategy is to always seek those furnace settings that minimize h in the near future, which implies that production ṁ is sustained in a way that prevents p from ramping up rapidly.
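As a rough illustration of the furnace-health score, the sketch below assumes it takes the form h = f * p / mdot, which is the form implied by the description above (lower pressure and higher production both lower h). The function name, argument names, and the readings are hypothetical and in arbitrary units; they are not taken from the customer's telemetry.

```python
def furnace_health(pressure, production_rate, mishap_factor=1.0):
    """Assumed furnace-health score h = f * p / mdot (lower is better)."""
    if production_rate <= 0:
        raise ValueError("production_rate must be positive")
    return mishap_factor * pressure / production_rate


# Hypothetical readings in arbitrary units.
normal = furnace_health(pressure=2.0, production_rate=4.0)
surging = furnace_health(pressure=3.0, production_rate=4.0)
flagged = furnace_health(pressure=2.0, production_rate=4.0, mishap_factor=3.0)

# Higher pressure at the same production rate worsens (raises) h,
# and a mishap factor of 3 triples it.
print(normal, surging, flagged)
```

Under this assumed form, a pressure surge at constant production raises h, so a model that steers toward low h is steering away from surges without sacrificing output.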


ML for optimization: This new ML model is trained on a short list of the model's most impactful features X, namely, furnace pressure p, furnace production rate ṁ, furnace-health h, plus a few other features relevant to furnace operations. This new ML model is also trained on the furnace's wind-rate w, which is the main control lever that the operator uses to manage furnace operations, so w is the furnace setting that is to be optimized by this ML model.


To make the model time-sequence aware, the ML model is also trained on δX₋₁ = X − X₋₁, which measures how feature X changed from its value X₋₁ at the previous timestep (aka moment-in-time) to its current value X. And because we want to use this model as a recommendation engine, the model is also trained on δw₊₁, which is the change that the furnace operator will make to the furnace's wind-rate during the subsequent timestep. Note that δw₊₁ is a forward-looking quantity that is known when training the model on historical data but is of course unknown when the model is used in production. Lastly, this ML model is trained to predict h₊₂, which is the furnace-health two timesteps into the future.


This results in an ML model that, in production, is a function of a single unknown, δw₊₁, which is the wind-rate change that the operator could make during the subsequent timestep, with that model predicting the furnace health h₊₂ two timesteps out. This ML strategy allows the furnace operator to use the ML model to optimize furnace settings, since the optimal change to furnace wind-rate δw₊₁ is the value that minimizes h₊₂. Consequently, this optimization strategy will always recommend to the operator the furnace setting that would steadily nudge the furnace towards a lower value of h, such that furnace production rate ṁ is maximal while furnace pressure p is kept under control.
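The feature construction and the recommendation step described above can be sketched as follows. This is a minimal illustration, not the customer's pipeline: the record keys (`p`, `mdot`, `h`, `w`), the function names, the candidate grid, and the toy telemetry are all hypothetical, and `toy_model` is a hand-written stand-in for a trained regressor rather than anything fitted to furnace data.

```python
def build_training_rows(telemetry):
    """Turn time-ordered records into (features, target) pairs.

    Row t holds the state at t, the backward deltas X[t] - X[t-1],
    the forward-looking wind-rate change w[t+1] - w[t], and the
    target h[t+2] (furnace health two timesteps ahead).
    """
    rows = []
    for t in range(1, len(telemetry) - 2):
        prev, now = telemetry[t - 1], telemetry[t]
        nxt, later = telemetry[t + 1], telemetry[t + 2]
        features = dict(now)
        for key in ("p", "mdot", "h", "w"):
            features["d_" + key] = now[key] - prev[key]
        # Known in historical data, unknown in production.
        features["dw_next"] = nxt["w"] - now["w"]
        rows.append((features, later["h"]))
    return rows


def recommend_wind_change(predict_h2, features_now, candidates):
    """Scan candidate wind-rate changes; return the one with lowest predicted h."""
    def predicted(dw):
        return predict_h2(dict(features_now, dw_next=dw))
    return min(candidates, key=predicted)


def toy_model(features):
    # Stand-in for a trained model; NOT fitted to any real furnace data.
    # It pretends that cutting wind-rate offsets a recent pressure rise.
    return 0.5 + (features["dw_next"] + features["d_p"]) ** 2


# Hypothetical telemetry, arbitrary units, with h = p / mdot.
telemetry = [
    {"p": 2.0, "mdot": 4.0, "h": 0.5, "w": 10.0},
    {"p": 2.5, "mdot": 4.0, "h": 0.625, "w": 10.0},
    {"p": 2.5, "mdot": 4.0, "h": 0.625, "w": 9.5},
    {"p": 2.0, "mdot": 4.0, "h": 0.5, "w": 9.5},
    {"p": 2.0, "mdot": 4.0, "h": 0.5, "w": 9.5},
]
rows = build_training_rows(telemetry)

# In production the operator's next move is the only unknown,
# so it is replaced by each candidate in turn.
current_features = rows[0][0]
best = recommend_wind_change(toy_model, current_features, [-1.0, -0.5, 0.0, 0.5, 1.0])
print(best)
```

A grid scan is used here only because the search is over a single variable; any one-dimensional minimizer could take its place, and the candidate range would in practice be bounded by what the operator can safely change in one timestep.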


Note that the mishap factor f appearing in the furnace-health score h is always unity except when the furnace is one timestep away from a mishap, for which f=3. When training the ML model on historical data, this setting flags those furnace operations that resulted in a bleeder event. So f is also a forward-looking quantity that ordinarily would not be known when the ML model is being used to predict future furnace health h₊₂. Nonetheless, setting f=1 when using ML to optimize the furnace's δw₊₁ constrains the ML model to consider only those possible future states where the furnace is at least one timestep away from a bleeder event. This f=1 requirement thus allows this optimization strategy to navigate the furnace around or away from a mishap in the future.
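The labeling rule for the mishap factor can be sketched in a few lines. The function name and the boolean per-timestep event flags are hypothetical; the only details taken from the text are that f is 3 one timestep before a bleeder event and 1 otherwise.

```python
def mishap_factors(bleeder_flags, flagged_value=3.0):
    """Return f[t]: flagged_value if a bleeder event occurs at t+1, else 1.0."""
    n = len(bleeder_flags)
    return [
        flagged_value if t + 1 < n and bleeder_flags[t + 1] else 1.0
        for t in range(n)
    ]


# Hypothetical history: one bleeder event at the third timestep.
history = [False, False, True, False]
factors = mishap_factors(history)
print(factors)  # [1.0, 3.0, 1.0, 1.0]

# At recommendation time the future is unknown, so f is fixed at 1,
# which restricts the search to futures with no imminent bleeder event.
f_in_production = 1.0
```

Because the flag looks one timestep ahead, it can only be computed on archived telemetry, which is why it is pinned to 1 when the model is queried for a recommendation.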


Next step: testing the solution. Lastly, note that an ML optimizer is a recommendation engine whose efficacy can only be assessed while the solution is used in production. That testing strategy is known as A/B testing, and it involves temporarily exposing the ML model's recommended furnace settings to the furnace operator and then monitoring whether production is sustained while bleeder frequency is reduced. This is the Oracle customer's next step, the results of which will be reported in a follow-up blog.
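The two quantities monitored in such a test, bleeder frequency and production, can be compared with a sketch like the one below. All names, the 2% production tolerance, and the numbers are made up for illustration; no test results are available yet, and a real comparison would also need to account for statistical uncertainty, since bleeder events are rare.

```python
def summarize_period(hours, bleeder_events, tonnes_produced):
    """Reduce one operating period to an event rate and a production rate."""
    return {
        "bleeders_per_1000h": 1000.0 * bleeder_events / hours,
        "tonnes_per_hour": tonnes_produced / hours,
    }


def recommendations_helped(control, treatment, max_production_drop=0.02):
    """True if bleeder rate fell and production stayed within the tolerance."""
    fewer_bleeders = treatment["bleeders_per_1000h"] < control["bleeders_per_1000h"]
    floor = (1.0 - max_production_drop) * control["tonnes_per_hour"]
    production_sustained = treatment["tonnes_per_hour"] >= floor
    return fewer_bleeders and production_sustained


# Made-up numbers: A = operator without recommendations, B = with them.
period_a = summarize_period(hours=1000, bleeder_events=8, tonnes_produced=200000)
period_b = summarize_period(hours=500, bleeder_events=2, tonnes_produced=99000)
print(period_a, period_b, recommendations_helped(period_a, period_b))
```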



Joe is an Oracle data scientist who specializes in delivering machine learning, analytics, and data visualization on Oracle Cloud Infrastructure (OCI). Joe received a PhD in physics from the University of Notre Dame and also has many years of experience performing astronomy research and scientific computing.
