Developed by Abraham Wald more than a half century ago, the Sequential Probability Ratio Test (SPRT) is a statistical technique for binary hypothesis testing (helping to decide between two hypothesis
H0 and
H1) and extensively used for system monitoring and early annunciation of signal drifting. SPRT is very popular for quality control and equipment surveillance applications, in industries and areas requiring a highly sensitive, reliable and especially fast detection of degradation behavior and/or sensor malfunctions. Examples of SPRT-based applications include process anomaly detection for nuclear power plants, early fault predictions for wind turbines, anomaly surveillance for key satellite components in the aerospace industry, pro-active fault monitoring for entreprise servers, quality control studies in manufacturing, drug and vaccine safety surveillance, construction and administration of computerized adatptive tests (CAT) and many other.
Conventional techniques for signal monitoring rely on simple tests, based, for example on control chart schemes with thresholds, mean values, etc, and are sensitive only to spikes exceeding some limits or abrupt changes in the process mean. They trigger alarms generally just before failures or only after the process drifted significantly. Tighter thresholds lead to high numbers of false alarms and relaxed thresholds result in high numbers of missed alarms. Moreover, these techniques can perform very poorly in the presence of noise.
The popularity of using SPRT for surveillance applications derives from the following characteristics:
- SPRT detects signal discrepancies at the earliest mathematically possible time after the onset of system disturbances and/or sensor degradations. This property results from the fact that SPRT examines the statistical qualities of the monitored signals and catches nuances induced by disturbances well in advance of any measurable changes in the mean values of the signals.
- SPRT examines successive observations of a discrete process and is very cheap to compute. Thus, it is perfectly adapted for real time analysis with streaming data flows.
- SPRT offers control over false-alarms and missed-alarms rates via user-defined parameters.
The mathematical expressions for SPRT were derived under two assumptions: (1) The samples follow an a-priori known distribution function and (2) The samples are independent and identically distributed (
i.i.d). Let us consider a sequence of values {Yn} = y0, y1 ... yn resulting from a stationary process satisfying the previous assumptions. Let us further consider the case where the signal data obeys a Gaussian distribution with mean Âµ0 and variance Ïƒ02. The normal signal behavior is referred to as the null hypothesis, H0. Alternative hypothesis express abnormal behavior. For example one can formulate alternate H1,H2,H3 and H4 to characterize signals with larger or smaller mean or variance. More explicitly, the null and alternate hypothesis could be written as:
H0 : mean Âµ_{0} , variance Ïƒ_{0}^{2}
H1 : mean Âµ_{1} > Âµ_{0} , variance Ïƒ_{0}^{2}
H2 : mean Âµ_{2} < Âµ_{0} , variance Ïƒ_{0}^{2}
H3 : mean Âµ_{0} , variance Ïƒ_{3}^{2} = V Ïƒ_{0}^{2}
H4 : mean Âµ_{0} , variance Ïƒ_{4}^{2} = (1/V) Ïƒ_{0}^{2}
The likelihood ratio (LR) is the ratio of probabilities for observing the sequence {Yn} under an alternate hypothesis Hi, versus the null hypothesis H0 :
At each step of the {Yn} sequence, SPRT calculates a test index as the natural log of LR, referred to as LLR and compares it to two stopping boundaries:
If LLR is greater or equal to the upper boundary log(B) then H0 is rejected and Hi is accepted. If LLR is less or equal to the lower boundary log(A) then Hi is rejected and H0 is accepted. For as long as LLR remains between these two boundaries there is not enough evidence to reach a conclusion; the sampling yn continues and LLR(n) is updated for a new comparison. The decision boundaries are derived from the false and missed alarm probabilities via the following equations:
with
Î± = probability of accepting Hi when H0 is true (false alarm probability)
Î² = probability of accepting H0 when Hi is true (missed alarm probability)
After a decision is reached (Hi or H0 are accepted), LLR is reset to zero, the sampling y1,y2,â€¦ is re-started and a new sequence of calculations and comparisons is performed to validate Hi or H0 anew.
The power of SPRT comes from the simplicity of the LLR(n) expression for some common distributions. For the normal distribution we are considering and under the
i.i.d. assumption mentionded above, the P(Y{n} |H0) and P(Y{n} |H1) probabilities reduce to
leading to the following expression for LLR :
For each new sample yn the update of this expression is trivial with the cost of one multiplication and two additions:
Similarly, the LLR(n) expression for testing H3 (with Ïƒ32 = V Ïƒ02) against H0 is:
leading to an also fast and cheap LLR update as the sampling {Yn} continues.
Change of mean
The script
sprt.R illustrates a simple implementation of these formulas and the SPRT decision process. A random sequence with 1000 points (t=1,â€¦1000) is generated from a normal distribution with Âµ = 0 and Ïƒ = 1. The signal is modified between t=101 and t=200 with samples drawn from a normal distribution with Âµ = 1 and Ïƒ = 1.
R> vals <- rnorm(1000,mean=0,sd=1)
R> vals[101:200] <- rnorm(100,mean=1,sd=1)
The resulting signal is captured in the plot below, for t=1,..200
The SPRT LLR equation for change of mean are used with the following null and alternate hypothesis:
H0 : Âµ0 = 0 and Ïƒ0 = 1
H1 : Âµ1 = 0.8 and Ïƒ1 = 1
The value Âµ1 = 0.8 is arbitrary as we are not supposed to know if the signal mean shifted and by how much. The mysprt(..) function (see the
script) is used to calculate LLR(n), perform the comparisons, decide whether to accept the null or alternate hypothesis or keep sampling and updating LLR or restart the LLR sequence from zero after a decision was taken. The values false alarm and missed alarm parameters Î± and Î² were both set to 0.05. The plot_sprt() function generates several plots as shown below.
The first plot illustrates LLR(t) for t=1,â€¦200. Starting on the left one can see that LLR(1) is in the undecided region log(A) < LLR < log(B). At t=12 LLR falls under log(A) confirming H0. LLR is reset to zero and a new LLR(n) sequence is calculated until, at t=23 H0 is confirmed again. These sequences continue several times until t=101 when the mean shift kicks in. Very fast, at t=105 LLR rises above log(B) and H1 is confirmed. LLR is reset and at t=119 H1 is confirmed again. H1 continues to be confirmed over several sequences until we reach t=200 where
mysprt() was stopped (arbitrarily as in a real application it should continue running)
Over the t=1,..,200 interval, the application of SPRT generated 26 LLR (restart to H1/H0 decision) sequences. The length of a sequence corresponds to the number of samples/time needed to decide in favor of H1 or H0. The next two plots show the length of each successive sequence and the distribution (histogram) of the sequence lengths.
The decision time is generally short; for some sequences only 3 samples are necessary to validate H0 or H1, for some other up to 15 samples are required.
The next plot illustrates LLR without restart, that an LLR updated disregarding the log(A) and log(B) boundaries without stop and without decision. We see that as long as the samples come from the un-shifted signal (t=1,..,100), the non-restarted LLR continues to drop. Once the signal changes, LLR reverts the trend and keeps climbing. A posteriori, it is very clear where the signal change occurred : at t~100.
Change of variance
In this section we discuss the SPRT results when detecting a change in variance. For this purpose the original signal is modified between t=501 and t=600 with samples drawn, this time, from a normal distribution with Âµ = 0 and Ïƒ = 1.4
R> vals[501:600] <- rnorm(100,mean=0,sd=1.4)
The plot below represents the signal for t=401,..,600
We examined this subset of the signal with SPRT by calculating LLR with the following null and alternate hypothesis:
H0 : Âµ0 = 0 and Ïƒ0 = 1
H3 : Âµ3 = 0 and Ïƒ3 = 1.5
Î± and Î² were again set to 0.05. Note that we are constructing an alternate hypothesis with a Ïƒ3 value higher than the actual Ïƒ of the modified signal. The LLR sequences are illustrated below, together with the lengths of each sequence and the sequence lengths distribution.
One can see that the LLR sequences are longer and quite a few require a number of sample of O[30-50] to decide in favor of H1 or H0. The variance-shift detection appears, for this case, to be a tougher problem. But the comparison with the mean-shift case is not apple to apple and the {Âµ0,Ïƒ0,Âµi,Ïƒi,Î±,Î²} parameter space needs to be properly explored before drawing conclusions.
Practical aspects
For real world processes, signal shifts occur in multiple ways and SPRT should be applied for both mean and variance and for shifts in both directions (positive and negative). SPRT could be also run for multiple shift values and false and missed alarms probabilities. The efficient monitoring of a large number of signals with multiple SPRT tests per signal requires task parallelism capabilities and a flexible management of the various types of alarms which can be generated.
Often, SPRT is applied not to the raw signals but to the residuals between signals and signal predictions. The 'normal-operation' signal behavior is learned via specific techniques during a generally off-line training stage, after which the system is put in surveillance mode where departure from â€˜normalityâ€™ is caught, on-line, by SPRT at the earliest stage of process shifts.
Mathematical expressions for LLR can also be derived for distributions other than Gaussian but the analytical complexity can escalate quickly.
Some of these topics and especially how to put this technique into production, using, for example ORE's Embedded R Execution mechanism, will be expanded in future blogs.