Friday, December 26, 2008

Footnotes to Philip Howard's - "Untangling Events"

My employer, IBM, does not allow carrying vacation days over from one year to the next, so even though I do not celebrate any major holiday this week, I decided that this is a good time to use the rest of my vacation days for the year and take two weeks off (one of them is already behind me) - spending some time with my children, taking care of some neglected health issues, and also reading books (it is rainy and cold, not a time to wander around much...). I looked a little on the Web today to see if I had missed something, and found on David Luckham's site a reference to Philip Howard from Bloor, who writes about - untangling events. I understood that Philip is trying to look at the various event-related marketing terms and determine whether they are synonyms, and whether there is a distinct market for each... In doing that he tries to list the various functions performed by event processing applications, and then reaches the (unsurprising) conclusion that each application does some subset of this functionality. But at the end he admits that he did not get very far and left the question unanswered, promising to dive deeper into it.

In essence his conclusion is right -- the various functions form a continuum, of which a specific application may need all or only a subset. Typically there is a progression: starting from getting events and disseminating them (pub/sub with some filtering), then advancing to do the same with transformation, aggregation, enrichment etc. -- so the dissemination relates to derived events and not just to the raw events -- and then advancing to pattern detection, to determine which cases need reactions ('situations') and which events should participate in the derived events (yes - I still owe one more posting to formally define derived events).
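To make the progression concrete, here is a minimal sketch in Python of the first two stages - filtered dissemination, and enrichment/aggregation that produces derived events; pattern detection is sketched further down in the next post. The event kinds, field names, and data are illustrative assumptions of mine, not taken from any particular product.

# Sketch of the progression: filter raw events, then enrich and aggregate
# them into derived events that are disseminated instead of (or alongside) raw ones.
from collections import defaultdict
from dataclasses import dataclass, field
import time

@dataclass
class Event:
    kind: str                    # e.g. "order", "payment" (illustrative)
    payload: dict
    ts: float = field(default_factory=time.time)

SUBSCRIBED = {"order", "payment"}        # pub/sub with filtering

def filter_stage(event):
    # Stage 1: pass through only the event kinds we subscribe to.
    return event if event.kind in SUBSCRIBED else None

def enrich_stage(event, customer_db):
    # Stage 2a: enrichment - attach reference data, producing a derived event.
    payload = dict(event.payload)
    payload["tier"] = customer_db.get(payload.get("customer"), "unknown")
    return Event("enriched_" + event.kind, payload, event.ts)

class Aggregator:
    # Stage 2b: aggregation - a running total per customer, emitted as a
    # derived event so that dissemination is not limited to raw events.
    def __init__(self):
        self.totals = defaultdict(float)

    def process(self, event):
        cust = event.payload.get("customer")
        self.totals[cust] += event.payload.get("amount", 0.0)
        return Event("daily_total", {"customer": cust, "total": self.totals[cust]})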

One can also go beyond all of these and deal with uncertain events, mine event patterns, or apply decision techniques for routing.

I think that there are multiple dimensions of classification of applications:
  • Based on functionality, as noted above.
  • Based on non-functional requirements -- QoS, scalability in state, event throughput, etc.
  • Based on the type of developers -- programmers vs. business developers.
  • Based on the goal of the application -- e.g. diagnostics, observation, real-time action...

There may be more classifications -- the question is whether we can determine distinct market segments? Probably yes -- with some overlaps. This requires empirical study, and indeed this is one of the targets of the EPTS use-cases working group, which is chartered to analyze many different use cases and try to classify them. Conceptually, for each type there should be a distinct benchmark that determines its important characteristics.

Still - I think that all the vendors that are going after "event processing" in the broad sense will strive to support all of the functionality. As an analogy: not all programs require the rich set of built-in functions that exist in programming languages, but languages are typically not offered for subsets of the functionality. Likewise, looking at DBMS products, most vendors support the general case. Note that there is some tension between supporting the general case and supporting a specific function in the most efficient way, but I'll leave this topic for when I am blogging at an earlier hour of the day --- happy holidays.

Wednesday, December 24, 2008

On Data Mining and Event Processing

Today I travelled to Beer Sheva, the capital of the Negev, the southern part of Israel, which consists mostly of desert. I visited Ben-Gurion University, met some old friends, and gave a talk in a well-attended seminar on "the next generation of event processing". I travelled by train (2 hours and 15 minutes in each direction), and since my last visit there, five years ago or so, they have built a train station from which a bridge goes to the campus - very convenient. Since I am not a frequent train rider in Israel, I discovered that at both ends of the line there are no signs saying which trains go on which track; this is assumed to be common knowledge... They do announce, when a train enters the station, where it is going and from which track, but they still have a point to improve.

Since some of the people who attended my talk were data mining people, they wondered about the relationship between event processing and data mining. I have heard this question before, so I thought the answer would be of interest to more people.

In the "pattern detection" function of event processing, there is a detection in run-time of patterns that have been predetermined, thus, the system knows what patterns it is looking for, and the functionality is to ensure correct and efficient detection of the predetermined patterns.

Data mining is about looking at historical data to find various things. One of the things that can be found is patterns that have some meaning, where we know that if such a pattern occurs again it requires some reaction. The classic example is "fraud detection", in which, based on mining past data, it is determined that a certain pattern of actions indicates suspicion of fraud. In this case the data mining determines the pattern, and the event processing system finds at run-time that this pattern occurs. Note that not all patterns can be mined from past data; for example, if the pattern is looking for a violation of regulations, then the pattern stands for the regulation, and it is not mined but determined by some regulator and given explicitly.
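A hedged sketch of that division of labor, continuing in Python: an off-line step derives a pattern parameter (here, simply a threshold computed from historical transaction amounts - a deliberately simple stand-in for a real mining algorithm), and the resulting pattern definition is then evaluated by the run-time detector. The statistic and the field names are illustrative assumptions, not a recipe for real fraud detection.

# Sketch: mining happens off-line and yields a pattern definition;
# the event processing engine only evaluates it at run-time.
import statistics

def mine_threshold(historical_amounts, k=3.0):
    # Off-line step: derive a suspicion threshold from past data
    # (mean + k standard deviations, standing in for a mining algorithm).
    mu = statistics.mean(historical_amounts)
    sigma = statistics.pstdev(historical_amounts)
    return mu + k * sigma

class ThresholdPattern:
    # Run-time side: detect transactions above the mined threshold.
    def __init__(self, threshold):
        self.threshold = threshold

    def on_event(self, event):
        if event["type"] == "transaction" and event["amount"] > self.threshold:
            return {"type": "situation:suspected_fraud", "account": event["account"]}
        return None

threshold = mine_threshold([35.0, 42.5, 29.9, 51.0, 38.2])    # off-line
pattern = ThresholdPattern(threshold)                          # deployed on-line
print(pattern.on_event({"type": "transaction", "account": "a1", "amount": 950.0}))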

So - typically data mining is done off-line and event processing is performed on-line, but again, this is not true for all cases; there are examples in which event processing and mining are mixed at run-time. An example: there is a traffic model according to which the traffic light policies are set, but there is also constant monitoring of the traffic, and when the monitored traffic deviates significantly, the traffic model has to be changed and the policies set according to the new traffic model. This is a kind of mix between event processing and mining, since the actual mining process is triggered by events, and the patterns may change dynamically as a result of this mining process.
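Here is a rough sketch of that mix, again in Python and again with made-up numbers: the monitor consumes traffic observations, compares them with what the current model expects, and when the deviation is large enough it triggers a (stubbed) re-mining step that replaces the model and, with it, would drive new traffic-light policies.

# Sketch: event processing triggering mining at run-time.
# The model, the deviation test, and the re-mining step are placeholders.

class TrafficModel:
    def __init__(self, expected_rate):
        self.expected_rate = expected_rate   # e.g. vehicles per minute

def remine_model(recent_rates):
    # Stand-in for the real mining step: refit the model to recent observations.
    return TrafficModel(sum(recent_rates) / len(recent_rates))

class TrafficMonitor:
    def __init__(self, model, tolerance=0.3, batch=5):
        self.model = model
        self.tolerance = tolerance
        self.recent = []
        self.batch = batch

    def on_event(self, observed_rate):
        self.recent.append(observed_rate)
        if len(self.recent) < self.batch:
            return
        avg = sum(self.recent) / len(self.recent)
        deviation = abs(avg - self.model.expected_rate) / self.model.expected_rate
        if deviation > self.tolerance:
            # Significant deviation: the monitored events trigger re-mining,
            # and the new model would in turn drive new traffic-light policies.
            self.model = remine_model(self.recent)
        self.recent = []

monitor = TrafficMonitor(TrafficModel(expected_rate=40.0))
for rate in (70, 65, 72, 68, 75):           # sustained heavier traffic
    monitor.on_event(rate)
print(monitor.model.expected_rate)           # model updated to ~70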

More - Later.