Oracle CEP: Enriching the Results of CQL Aggregation Queries

In this post I continue to look at CQL aggregation queries and the issues that arise when writing real-world applications. One requirement that I have seen over and over in my interactions with CEP customers is the need to include some additional data in the output of the aggregation query, in addition to the aggregated value itself.  I shall call the addition of this additional data to the query result "enriching" the aggregation query. Enrichment examples include adding the symbol of an equity to the query result when calculating the moving average price, including the call center location when calculating the number of calls processed in the last minute, or selecting the campaign ID in the case of an advertising application that is aggregating revenue in real-time.

Let's consider some examples of enrichment queries in the context of the system monitoring example that was used in the previous post to this blog. Previously, we defined the following query

 
istream(
SELECT sum(severity) as totalseverity
FROM alerts[rows 3 slide 3]
HAVING COUNT (*) = 3)




 

This query processes a stream of incoming alert events each of which contains a severity code. The query sums up the severities of every three alert events into an outgoing aggregate or "complex" event. The query makes sure that it doesn't generate any output unless there are three alert events that compose the result and it makes sure that only consecutive sequences of incoming alert events generate output. For example, consider the input stream


alerts: 2, 7, 1, 4, 10, 1, 8, ...

The query will generate the following output:




10, 15,...



10 (2+7+1) is the first value output, followed by 15 (4+10+1), etc.  Now, suppose that in addition to selecting the totalseverity value, we also need to include the name of the machine room where the alerts were generated. For simplicity, I shall assume that all of the alerts in the incoming stream come from the same machine room. Here is the query:


 

istream (
SELECT machineroom as mr,
sum(severity) as totalseverity
FROM alerts[rows 3 slide 3]
GROUP BY machineroom
HAVING COUNT (*) = 3)
 


If we assume that all of the incoming alerts come from machine room 'mr10', this query will generate outbound events like the following:

 

('mr10', 10), ('mr10', 15), ...
 

Another common enrichment requirement is to enrich the aggregation result with a value from the most recent event participating in the aggregation.  For example, suppose that each incoming alert event contains the name of the system administrator who should be contacted and that we want to include the name of a system administrator in each outgoing complex event, as well. One way to do this is to use the name of the system administrator from the last event received. Here is a query that implements this strategy:

istream (
SELECT alerts.machineroom as mr,
recent.administrator as admin,
sum(alerts.severity) as total
FROM alerts[rows 3 slide 3] as alerts,
alerts [rows 1] as recent
GROUP BY alerts.machineroom, recent.administrator
HAVING COUNT (*) = 3)
 

Given the following input stream:

 

alerts: ('mr10','Hoyong',2), ('mr10','Alex',7), ('mr10','Eric',1) , ('mr10','Anand',4), 
('mr10','Seth',10), ('mr10','Andy',1), ('mr1','Manju',8), ...
 

This query produces the following output stream:

 

('mr10', 'Eric', 10), ('mr10', 'Andy', 15), ...

 

Notice that Eric and Andy are the administrators in the third and sixth incoming alert events, respectively.  It's worth saying a few words about how this query works conceptually, especially for those not familiar with CQL/SQL.  The query executes by first evaluating the FROM clause. The FROM clause specifies a type of join called a Cartesian product between the alerts and recent windows. Conceptually, this appends the recent row to each of the  rows in the alerts window. Then, the query groups these concatenated rows by the alerts.machineroom and recent.administrator columns, but since these columns have the same value in every row, this operation doesn't actually do anything.  Next, the HAVING clause is evaluated to see if there are three row concatenated rows or not. This test eliminates the startup events that don't include three rows. Finally, the select clause is evaluated which selects the machineroom and administrator columns (which are the same in every row) and aggregates the severity column. Note also, that the slide clause causes the query to be executed (modulo startup) once for every three events.

 


Technorati Tags: ,,,

Comments:

Post a Comment:
  • HTML Syntax: NOT allowed
About

bocadmin_ww

Search

Categories
Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today