Email Article

Needles in Haystacks: Pattern Matching on Events in the Aleri Streaming Platform

by Jon Riecke, Lead Platform Architect

Suppose you are a systems administrator, and want to respond to potential break-ins quickly. The system continuously produces records of login attempts (perhaps encoded in a log file), and you'd like an alert if there were three unsuccessful attempts to login to an account within five minutes. How do you look for that pattern, especially when most of the attempts are not suspicious? But that's hardly the only example of “pattern matching”. One might look, for instance, for patterns of stock trades, or price movements, to react quickly if certain market conditions arise. It's important to respond efficiently, in real-time. And it's important that the pattern-matching rules be simple to write, transparent to read, and easy to maintain.

Complex Event Processing (CEP) systems seem to be divided into camps: those based on “rules” (and who derive a heritage from expert systems), and those based on “relational databases” or SQL. A relational approach to these pattern-matching problems is possible, but usually messy, involving complicated “self-joins” and filters that obscure the problem.

It's for that reason that the next release, Release 3.0, of the Aleri Streaming Platform, builds in primitive support for pattern matching. It uses a powerful but simple syntax based on three ideas: Microsoft's LINQ syntax, matching syntax from functional languages, and the syntax of linear-time temporal logic.

The syntax starts from Microsoft's LINQ, which stands for "Language INtegrated Query". LINQ allows one to write queries using a “from-where-select” notation instead of SQL's “select-from-where”. Here's an example:


              from c in colors
              let middle = c.Substring (1, c.Length - 2)
              where middle.Contains ("e")
              select new { color = c.color, mid = middle }
            

Pattern matching also borrows ideas from functional languages too, e.g., CAML, Standard ML, Haskell, and F#. Here's an example of the match operator in CAML that looks for certain values (Left, Right) in Button records, and runs different code for each:


               match event with
               Button{type1=Left} -> ...
               | Button{type1=Right} -> ...
            

Finally, the syntax uses certain operations from temporal logic. More precisely, it has a “within” clause to specify a time window and contains Boolean operations for combining events.

It's easier to explain with a few examples.

Example 1: This example checks to see whether a broker sends a buy order on the same stock as one of his or her customers, then inserts a buy order for the customer, and then sells that stock. It creates a “buy ahead” event when those actions have occurred in that sequence.


              within 5 minutes
              from 
                  BuyStock[Symbol=sym; Shares=n1; Broker=b; Customer=b] as Buy1,
                  BuyStock[Symbol=sym; Shares=n2; Broker=b; Customer=c] as Buy2,
                  SellStock[Symbol=sym; Shares=n1; Broker=b; Customer=b] as Sell
              on Buy1 fby Buy2 fby Sell
              output [Symbol=sym; Shares=n1; Broker=b];
            

The on clause specifies the temporal relationships between events. The fby construct is an abbreviation of “followed by”, meaning that one event is followed by another. That event might not be the next event—it must just occur at some point after the preceding event. Note the use of the variables sym, n1, n2, b, and c. These are bound to the values of the matching fields. Because the same variable sym is used in three patterns, the values in the three events must be the same (this is what is meant by “unification”). Different variables might have the same value, though (e.g.,n1 and n2.)

Example 2: The following example shows Boolean operations on events. The rule describes a possible theft condition, when there has been a product reading on a shelf (possibly through RFID), followed by a non-occurrence of a checkout on that product, followed by a reading of the product at a scanner near the door.


              within 12 hours
              from
                  ShelfReading[TagId=tag; ProductName=pname] as onShelf,
                  CounterReading[TagId=tag] as checkout,
                  ExitReading[TagId=tag; AreaId=area] as exit
              on onShelf fby not(checkout) fby exit
              output [TagId=t; ProductName=pname; AreaId=area];
            

One may also use the Boolean “and” and “or” operations inside the “on” clause.

Example 3: The next example shows the example at the beginning of the article, where we want to raise an alert if a user tries to login to an account unsuccessfully three times.


              within 5 minutes
              from
                LoginAttempt[IpAddress=ip; Account=acct; Result=false] as login1,
                LoginAttempt[IpAddress=ip; Account=acct; Result=false] as login2,
                LoginAttempt[IpAddress=ip; Account=acct; Result=false] as login3,
                LoginAttempt[IpAddress=ip; Account=acct; Result=true] as login4
              on (login1 fby login2 fby login3) and not(login4)
              output [Account=acct];
            

This example uses constants true and false instead of variables in part of the pattern. This means that the Result fields must hold exactly these values.

Example 4: People wishing to break into computer systems often scan a number of TCP/IP ports for an open one, and attempt to exploit vulnerabilities in the programs listening on those ports. Here’s a rule that checks whether a single IP address has attempted connections on three ports, and whether those have been followed by the use of the “sendmail” program.


              within 30 minutes
              from 
                Connect[Source=ip; Port=22] as c1,
                Connect[Source=ip; Port=23] as c2,
                Connect[Source=ip; Port=25] as c3
                SendMail[Source=ip] as send
              on (c1 and c2 and c3) fby s1
              output [Source=ip];
            

Again, notice the use of constants in the pattern.

These examples barely scratch the surface of what's possible. For instance, rules are contained in Pattern Streams, and multiple rules can be used with one Pattern Stream. That means that multiple rules could be used to look for the same type of event. Also, the “output” statement can be replaced by arbitrary imperative code, written in the SPLASH language of the Aleri Streaming Platform. In that code, one could, for instance, check further conditions with if-statements, or loop over data previously stored. Finally, Pattern Streams can be combined with more relational-style computations to track, for example, dynamic statistics.

The combination of these features makes the Aleri Streaming Platform a particularly powerful CEP engine. Users don't need to choose between the rules approach or the relational approach—they get both in one package.