Oracle CEP Applications and CQL Aggregation

Stream processing applications commonly rely on aggregation. Typical examples include calculating the average price of an equity over the last N milliseconds, counting the number of calls that a call center processes each minute, or summing the revenue generated by an online ad campaign in real time. Writing a CQL query to handle these use cases may seem straightforward. However, most applications need to do more than compute a simple aggregate value, and this can increase query complexity. In this blog post I take a look at some of the finer points of CQL aggregation that arise when one writes a real-world application.

Let's begin with a simple example. Consider the query below.


istream(
SELECT sum(severity) as totalseverity
FROM alerts [rows 3])


The idea is that this query processes a stream of alert events and produces a stream of derived events, each of which aggregates the last three alerts into a single "complex" event. The results are emitted as a stream, hence the need for the istream operator. The example is simplified for ease of explanation, but aggregating several low-level alerts into a higher-level alert that a system administrator is interested in seeing is a common scenario in system monitoring CEP applications. Given a stream of input events with the following severities:

alerts: 2, 7, 1, 4, 10, 1, 8, …


This query will output the stream:

0, 2, 9, 10, 12, 15, 19, …


So far, so good. But you may be wondering why the initial 0 event is output. This has to do with CQL's SQL heritage: in order for CQL to be SQL compliant, it must generate this event. Also notice that no event was output when the second input event with severity 1 was received. This is because the output value did not change in this instance: the new sum is 4+10+1=15, and the previous output value (1+4+10) was 15 as well. Our query only generates an output event when the query result changes.
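
To make the arithmetic concrete, here is a minimal Python sketch that traces the query by hand. It is not CQL and says nothing about how Oracle CEP is implemented; the function name `sliding_sum_istream` is hypothetical, and the initial value of 0 is simply taken from the output stream described above.

```python
from collections import deque

def sliding_sum_istream(severities, rows=3):
    """Trace istream(SELECT sum(severity) FROM alerts [rows N]) by hand."""
    window = deque(maxlen=rows)  # keeps only the most recent `rows` events
    last = 0                     # assumed result over the empty window, per the post
    output = [last]              # the initial 0 event
    for severity in severities:
        window.append(severity)  # the oldest event drops out once the window is full
        total = sum(window)
        if total != last:        # emit only when the query result changes
            output.append(total)
            last = total
    return output

print(sliding_sum_istream([2, 7, 1, 4, 10, 1, 8]))
# prints [0, 2, 9, 10, 12, 15, 19]
```

The sixth input (the second 1) produces no output because the window sum stays at 15.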


Now, suppose that the system administrator only wants to see an aggregated event that is composed of exactly 3 low-level events. In other words, the system administrator shouldn't be bothered with the initial 0, 2 and 9 events that are output before the input window is full. What does this query look like?


istream(
SELECT sum(severity) as totalseverity
FROM alerts[rows 3]
HAVING COUNT (*) = 3)


As the query above shows, this requirement can be met by using the HAVING clause. The HAVING clause ensures that an output event is only generated when there are three events in the input window. Here is the output stream for this query:


10, 12, 15, 19, …
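
The effect of the HAVING clause can be traced with a similar hand-written Python sketch. Again, this is only a simulation under my own assumptions, with a hypothetical function name `full_window_sums`; it is not how the CQL engine evaluates the query.

```python
from collections import deque

def full_window_sums(severities, rows=3):
    """Trace the sliding sum, but only once the window holds `rows` events."""
    window = deque(maxlen=rows)
    last = None                  # no result row exists until the window is full
    output = []
    for severity in severities:
        window.append(severity)
        if len(window) < rows:   # mirrors HAVING COUNT(*) = 3 failing
            continue
        total = sum(window)
        if total != last:        # emit only when the query result changes
            output.append(total)
            last = total
    return output

print(full_window_sums([2, 7, 1, 4, 10, 1, 8]))
# prints [10, 12, 15, 19]
```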
 

Next, suppose that the system administrators only want to see one aggregated alert for each non-overlapping group of three consecutive low-level alerts. In other words, the administrators don't want to see multiple aggregated events that contain the same low-level event. Here is a query that accomplishes this:




istream(
SELECT sum(severity) as totalseverity
FROM alerts[rows 3 slide 3]
HAVING COUNT (*) = 3)

 


Notice the addition of the slide clause in the window definition. Specifying a slide of 3 means that the query is only evaluated once for every three events, instead of being evaluated for every event. Here is the new sequence of output events:


 

10, 15, …
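
The batching behavior of a slide equal to the window size can be sketched in Python as well. This is a simplified simulation with a hypothetical function name `tumbling_sums`; I am assuming that a trailing partial batch produces no output and that, as with the earlier queries, a result is only emitted when it differs from the previous one.

```python
def tumbling_sums(severities, rows=3):
    """Sum each non-overlapping batch of `rows` events (rows N slide N)."""
    output = []
    last = None
    # Step through the input one full batch at a time; a trailing
    # partial batch is never reached by this range.
    for start in range(0, len(severities) - rows + 1, rows):
        total = sum(severities[start:start + rows])
        if total != last:        # assumption: emit only when the result changes
            output.append(total)
            last = total
    return output

print(tumbling_sums([2, 7, 1, 4, 10, 1, 8]))
# prints [10, 15]
```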
 

10 (2+7+1) is the first value output, followed by 15 (4+10+1), etc. We have cleaned up the output stream and reduced the number of events seen by system administrators quite a bit, from seven down to two. These examples illustrate the capabilities that CQL offers for controlling the output of aggregation queries. Every CQL developer should understand these techniques.
