Saturday, December 22, 2007

On the envelope for CEP


In two recent blogs - Mark Tsimelzon from Coral8 argued for CEP server vs. embedded CEP libraries, while Paul Vincent from TIBCO argued for the need for infrastructure stack outside the
Since I avoid making product evaluations in this Blog, I'll talk about the principle.
Conceptually we have a model of "event producer" that produces the events, "event consumer" that consumes the event, and the "event processor" which processes the events and stands in the middle. There are two questions about envelopes:
(1). Is the "event processor" an embedded capability inside applications, or a server?
(2). If the "event processor" is a server, is it an independent server or part of a larger middleware?
The answer, as with anything, is not binary (zero or one). There are cases in which event processing is needed as an embedded capability (e.g. inside pervasive devices), but Mark is right that the big majority of "event processors" tend towards the server.
The second question is more interesting. Today we have both stand-alone servers and servers that are part of a larger middleware. The rationale behind being part of a larger middleware stems from the fact that event processing is not isolated and has various relations with different applications in the enterprise. It is true that the loose-coupling nature of event-driven architectures eases the task of separating it from the applications, but integration is still the most costly part of building event processing applications, and the means to ease the integration have already been built into application integration middleware. If the event processing server is a stand-alone one, this integration has to be re-invented. As Paul Vincent rightly says: every $ a CEP vendor spends on middleware integration is a $ less on interesting CEP functionality.
Furthermore, some event processing infrastructure and functions are already there: pub/sub and routing for simple event processing, and ESB mediations such as enrichment, transformation etc. Thus, it seems that the ROI will be higher if event processors are implemented on top of a "middleware stack".
An interesting observation is that, from the point of view of application integration middleware, event processing is becoming a key feature, and there are already some predictions that the standard event processing programming model (which is still not there!) will be the basis for the application servers of the future, e.g. Gartner's XTP, which I have discussed once and should discuss more.

Thursday, December 20, 2007

On - "one size fits all" and Event Processing


Like a commercial TV station, if a Blog wants to get "rating" one has to put in something somewhat controversial: the number of visitors to this Blog has more than doubled in the last few days, when I had exchanges of opinions and folk stories with Tim Bass. Anyway, I got tired and did not continue that discussion. One somewhat related question that I have received was: does the fact that I don't think it is worth talking about ESP and CEP as separate entities mean that I believe that there is a "one size fits all" in event processing? Well, this is a fair question. In the past I did believe it is true, until I read Mike Stonebraker's immortal assertion: "One size fits all is a concept whose time has come and gone". Actually, I ceased to believe in it a little earlier. I think that the event processing area is not a monolithic area, and there are some variations needed - however:
  • I don't believe that ESP vs. CEP is the right type of partition in this area;
  • There may be a need to have various implementations under one roof (the heterogeneous framework approach).

For the first point -- what is the right type of partition? This is a multi-dimensional question and we still have to learn more to know the most useful combinations.

One of the important dimensions is the "reason for use" dimension, and here, in an internal IBM study, we got to five different reasons for use; I'll write about it in one of the next postings.

EPTS has recently launched a workgroup that tries to identify these classifications by doing a comprehensive survey of use cases that will be compared using the same template. A team that consists of Tao Lin (SAP), Dieter Gawlick (Oracle) and Pedro Bizarro (University of Coimbra, Portugal) is working on this template, and a larger team will handle the survey and analysis. The end result, a collaborative white paper about the state of the practice in event processing, is expected somewhere in the second quarter of 2008. Stay tuned.

More - Later.

Wednesday, December 19, 2007

On deleted event, revised event and converse event

First, congratulations to my student Ayelet Biger, who successfully passed her M.Sc. thesis defense exam today. Ayelet's thesis topic is "Complex Event Processing Scalability by Partition", which deals with parallel execution of CEP logic when there are complex dependencies among the different agents. I'll discuss this issue in one of the later postings - we still need to compose a paper on this thesis for one of the coming conferences. Ayelet is my 17th M.Sc. student to graduate (together with 5 Ph.D. students, this makes it the 22nd thesis exam). Most of the students have done theses on active databases and temporal databases (my past interest areas), and in the last few years on event processing. Supervising graduate students is a great way to work on new ideas that I don't have the ability to work on in my regular work; the only thing needed is three more hours in each day...

Today's topic is inspired by a blog posting by Marco Seiriö that I have recently read. Marco is one of the pioneers in EP blogging. I started reading his blog in January 2006, when he started it as "Blog on ESP"; however, at some point his blog became "Marco writes about complex event processing", another piece of evidence that the name ESP has disappeared. Anyway, in his Blog, Marco talks about the event model. I'll not discuss the event model today, but concentrate on one interesting point that Marco raises about "undoing events". This is indeed a pragmatic issue with some semantic difficulties. There are systems in which events can be deleted, and some actions can be triggered by the event deletion. However, an event is not regular data and cannot be treated as such: since an event represents something that happened in reality, events are conceptually "append only" - in database terms, one can only insert events, but not modify or delete them. Deleting events also blocks the ability to trace decisions/actions or to have retrospective processing of the events. So - when, in reality, do we need to delete/undo/revise events?

  1. when an event is entered by mistake - typically it is not the event itself that is wrong, but some details in the event attributes - and we need a possibility to revise the event.
  2. when we wish an event to no longer affect the processing.
  3. when the event itself has expired, or we no longer need it and will not use it in any other processing - including retrospective processing.

The first case is a revision case. If we are in an "append only" mode, then the way to do it is to enter another event, and have the possibility that it will override an existing event (or set of events) for the purpose of processing. Example: somebody sent a bid to an electronic auction and realized that one of the details (say: the price he is ready to pay) is wrong; he can then add another bid that will override the first bid. Why not delete the original bid? It may be that the original bid is already in process, and the overriding cannot stop this process; even if not, there is a possibility that for retrospective processing we'll need to reconstruct a past state which includes the original bid. (These considerations are actually not new; we thoroughly discussed these issues within the temporal database community a decade ago, when Sushil Jajodia, Sury Sripada and myself edited a book about temporal databases research and practice.)
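
To make the revision case concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the names `BidEvent`, `AppendOnlyLog`, `overrides` and `effective` are hypothetical and do not come from any product or from Marco's posting. The log only ever appends; a revising bid points at the bid it overrides, so the processing view skips the overridden bid while the full history stays available for retrospective processing.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class BidEvent:
    """Hypothetical bid event; frozen so an entered event cannot be modified."""
    event_id: int
    bidder: str
    price: float
    overrides: Optional[int] = None  # id of the event this one revises, if any


class AppendOnlyLog:
    """Hypothetical event store: events are only appended, never changed or deleted."""

    def __init__(self) -> None:
        self._events: List[BidEvent] = []

    def append(self, event: BidEvent) -> None:
        self._events.append(event)

    def history(self) -> Tuple[BidEvent, ...]:
        # Everything that was ever entered, including overridden events.
        return tuple(self._events)

    def effective(self) -> List[BidEvent]:
        # Events that still count for processing: those not overridden by a later event.
        overridden = {e.overrides for e in self._events if e.overrides is not None}
        return [e for e in self._events if e.event_id not in overridden]


log = AppendOnlyLog()
log.append(BidEvent(1, "alice", 100.0))
log.append(BidEvent(2, "bob", 90.0))
log.append(BidEvent(3, "alice", 120.0, overrides=1))  # revision of bid 1

effective_ids = [e.event_id for e in log.effective()]
```

In this sketch `effective_ids` contains bids 2 and 3, while `log.history()` still holds all three events, so a past state that included the original bid can be reconstructed.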

The second case is even more interesting, but similar in the type of thinking. Here we would like to eliminate an event from taking effect; this can be done by sending a "converse event" that reverses the effect of the event - e.g. cancel bid. The implementation problem is that this event, and maybe its descendant events, may have been flowing all over the event processing network (EPN), with some even getting out of the EPN with actions triggered, some in process, and some being part of a state but not processed yet (e.g. since a pattern has not been detected yet). Theoretically there is a possibility to apply something similar to a "truth maintenance system" in AI that also includes the actions, and compensate for all actions, but this complicates the system, so it is recommended only when it is critical to do it (I'll discuss such cases in another posting). When the event has not gone out of the EPN, it is still possible to stop it, but most systems do not provide a language primitive to do it globally in an EPN, and I have recently watched a concrete customer case where they had to do it manually.

The third case is the "vacuuming" case - when an event is no longer needed (in agents' state, in the global state etc.). I never got deep into this issue, but thought intuitively that it is a relatively easy problem; however, when this issue was discussed in the Dagstuhl seminar last year, the claim was that the general issue of event vacuuming is still an open question.
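
A minimal sketch of the converse-event idea for the easy situation only, where the original event is still held in a single agent's state and nothing has left the EPN yet. The names `Bid`, `CancelBid` and `HighestBidAgent` are hypothetical and chosen for illustration. The cancel is itself an appended event, so the trace keeps both the bid and its cancellation; compensating for actions that have already been triggered is deliberately not covered.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass(frozen=True)
class Bid:
    bid_id: int
    price: float


@dataclass(frozen=True)
class CancelBid:
    """Hypothetical converse event: reverses the effect of the bid with this id."""
    bid_id: int


class HighestBidAgent:
    """Hypothetical agent that keeps the currently active bids as its state."""

    def __init__(self) -> None:
        self._active: Dict[int, float] = {}
        self.trace: List[Union[Bid, CancelBid]] = []  # append-only record of all inputs

    def on_event(self, event: Union[Bid, CancelBid]) -> None:
        self.trace.append(event)
        if isinstance(event, Bid):
            self._active[event.bid_id] = event.price
        elif isinstance(event, CancelBid):
            # The bid stops affecting the state, but remains in the trace.
            self._active.pop(event.bid_id, None)

    def highest(self) -> Optional[float]:
        return max(self._active.values(), default=None)


agent = HighestBidAgent()
for ev in (Bid(1, 100.0), Bid(2, 150.0), CancelBid(2)):
    agent.on_event(ev)

highest_after_cancel = agent.highest()
```

After the cancel, the highest active bid is the first one again, and the trace still contains all three events. Doing the same across a whole network of agents is exactly the part that, as noted above, most systems leave to manual work.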

I'll stop here now -- spent enough time on this one... more - later

Monday, December 17, 2007

CEP and the story of the captured traveller


After reading the recent posting of my friend Tim Bass entitled "CEP and the story of the Fish", I decided to answer with another story (from the other side of Asia):

A traveller went into the jungle somewhere on the globe and unfortunately was captured by a tribe that is still using ancient weapons. He is brought to the chief, and the chief says - "You have trespassed into the tribe's territory, which is punishable by death; however, I am a very curious person, and if you show me something I haven't seen before I'll let you go". Our unlucky traveller started to look in his pockets and the only meaningful thing he found was a lighter, so he took his chance, showing it to the chief and saying: "this thing makes fire". However, since he was under big pressure, he pressed once - no fire, pressed twice - no fire; on the third time the lighter indeed produced the promised fire. The chief did not hesitate and said "let him go", so our relieved traveller muttered to himself - "I knew that they have not seen a lighter", but to his surprise the chief said - "oh, I have seen many lighters, but a Zippo lighter that does not light the first time I have never seen".

When someone disagrees with somebody else, it is very easy to assume that my point of view is right since I am smarter / know more / more qualified / older / more experienced / generally always right etc... My preference is not to doubt the wisdom, experience or qualification of anybody that I am arguing / discussing / debating with, but to make the arguments on the issue and not on the person who makes the arguments....

Enough introduction -- now for the main message of this posting. The term CEP (Complex Event Processing) has now been more or less agreed upon in the industry to denote "computing that performs operations on complex events", where a complex event is an "abstraction or aggregation of events". The term complex does not say that the processing is complex, but that it deals with complex events, as defined. Complex event processing typically means detecting predefined patterns that can be expressed by queries/rules/patterns/scripts and are deterministic in nature. Regardless of whether I think that this is the best term, I think that it is important to have a commonly agreed terminology, otherwise we are confusing the industry, the customers (and sometimes ourselves). Now, Tim Bass claims that since event processing of a stochastic/probabilistic/uncertain nature is more complex than what we call "complex event processing", we have to call this one "complex event processing", and rename what we call "complex event processing" to "simple event processing". Unfortunately, it is too late for that - and also not justified, again, since the "complex" in "complex event processing" does not say that this is "complex processing of events" but that this is "processing of complex events" (a very common misconception!). Bottom line: yes - there is another class of event processing capabilities that requires techniques from AI, machine learning, OR etc. and that is not deterministic in nature; no - I don't think we should call it "complex event processing". We have suggested the term "intelligent event processing", which I have already referred to in a previous posting; there are a variety of other postings that I have dedicated to terminology.
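
To illustrate "processing of complex events" in the sense used above, here is a small sketch of a deterministic, predefined pattern whose detection produces a complex event as an aggregation of simple events. The scenario (repeated failures by the same user within a time window) and all names in the code are assumptions made for the example, not part of any agreed terminology or product. Input events are assumed to arrive sorted by timestamp.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, List, Tuple


def detect_repeated_failures(
    events: Iterable[Tuple[int, str]],  # (timestamp in seconds, user), sorted by timestamp
    threshold: int = 3,
    window: int = 60,
) -> List[dict]:
    """Emit one complex event whenever a user has `threshold` failures within `window` seconds."""
    recent: Dict[str, Deque[int]] = defaultdict(deque)
    complex_events: List[dict] = []
    for ts, user in events:
        q = recent[user]
        q.append(ts)
        # Drop failures that fell out of the time window.
        while q and ts - q[0] > window:
            q.popleft()
        if len(q) >= threshold:
            # The complex event aggregates the simple events that matched the pattern.
            complex_events.append(
                {"type": "RepeatedFailure", "user": user, "members": list(q)}
            )
            q.clear()
    return complex_events


simple_events = [(0, "u1"), (10, "u2"), (20, "u1"), (50, "u1"), (200, "u2")]
detected = detect_repeated_failures(simple_events)
```

Here the three failures of `u1` at 0, 20 and 50 seconds are aggregated into a single complex event, while `u2` never matches. The processing itself is simple; what makes it "complex event processing" under the definition above is that its output is a complex event.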

More - later

Sunday, December 16, 2007

On Event Stream Processing


This is in part a response to my friend and colleague Claudi for his recent post in the CEP Interest Group.

There are many types of streams in the universe - the Gulf Stream that affects the weather, a water stream that provides a pastoral nature sight, and an audio stream, to name just a few.
In the event processing area the name "stream" appeared first in the database research community, as a research project at Stanford. Interestingly, the name "event" is never mentioned there, and the term "data stream" is the central concept. The first one who blended the "stream" concept and the "event processing" concept is my friend Mark Palmer from Progress, who did not like the word "complex" and thought that the term "event stream processing" would be more accepted; Mark certainly did not mean to talk about data streams in the academic sense. In the discussion page of the term "event stream processing" in Wikipedia, Mark writes:
ESP SHOULD GO AWAY AND I HELPED CREATE THE PROBLEM!!!
You are completely correct in my opinion; these should be merged. And I say this from the perspective of the software vendor that popularized and caused the confusion in the first place. I'm the general manager of the Progress Apama software division and we coined the term "event stream processing" in April of 2005 when we acquired Apama for $30M - we didn't like the term "complex event processing" and decided to make up another term. Yes, stream processing, and data stream processing have been used as terms in academia, but we made up the term ESP as a synonym for CEP. Some on this list will argue that there are subtle, technical differences, but, being in the center of this quagmire of a debate, I think they should be merged, and that ESP should basically go away!
- Mark Palmer, General Manager, Progress Apama, mpalmer@PROGRESS.COM

Another indication of the blurring between ESP and CEP is that the vendor descendants of the academic projects - Streambase and Coral8 - now position themselves as "complex event processing" vendors. Both have "complex event processing" all over their homepages: Streambase labels its product as "complex event processing platforms" (well -- we'll discuss platforms in another posting); Coral8 has a portal which offers self-service CEP. Aleri, which also provides an SQL-oriented API, uses the term CEP as well, although they are also using the term "Aleri streaming platform" as the way to do CEP. Thus, while the term "stream processing" is very much alive in the academic database community - see the VLDB 2007 program, for example - it seems that the market has already voted on the unification of these two terms, behind the CEP term.
Why did it happen? In the beginning we saw some 2 x 2 matrices, showing that CEP is complex and low-performance, while ESP is simple and high-performance. It does not seem that any vendor thought it was positioned in one of the extremes, since most applications are somewhat in the middle, and confusing the customers with two names, from vendors who competed on roughly the same applications and customers, did not help any of the vendors. Thus, the market wisely moved to one name (BTW - this name could also have been "event stream processing" as Progress suggested, but for some reason the term CEP has caught on; some potential customers are still nervous about the word "complex", but it got the traction nevertheless).
So far this has been a discussion on branding, and it did not answer the question of whether there are real differences between ESP and CEP. In some cases, people have indicated theoretical differences; the most notable is that stream processing is totally ordered, while CEP is partially ordered.
It may be true, though I was never convinced that "total order" is an inherent property of a stream; it is just the way it happened to be defined in the academic projects. I think that the more important difference is whether we start from set-oriented thinking (stream processing) or from individual-event-oriented thinking (event processing). There are pros and cons to each of them, but the bottom line is that real applications may be mixed: they may have ordered events of the same type (e.g. when we are looking at trends in time-series), or unordered events of the same type (e.g. when we are looking at information from various sensors whose original timestamps may not be synchronized); in fact, both can occur in the same application. It is true that the space of CEP applications is not monolithic, but there are other classifications that are more useful than the classification of partially ordered vs. totally ordered sets. Thus, for practical purposes, let's assume that "stream processing", as defined by those who are looking for the theoretical differences, indeed covers a subset of the space of functionality; however, this subset is not important enough to have separate products covering it, or even to mention it as a sub-class.
Last but not least -- an answer to Claudi on his claim that there is not really a CEP engine, since none of the current products know how to obtain general relations among events and calculate transitive closures.
My answer is that event relationship definitions do exist, but this is not the main point. The point is that one may claim that "there is not really a CEP engine that contains all the possible language features that one can think of", and this is true: the EP discipline is young, I am sure that we have just scratched the surface, and EP products will include many features that we have not even thought of today (otherwise it is an indication that the area has failed!). However, without talking about a certain feature, CEP engines do exist today; none is perfect, but they are probably sufficient for the big majority of the existing applications today. So theoretical perfection may not be the criterion to call something a "CEP engine"; we'll have to settle for "sufficient conditions".
I'll relate to relations among events, including transitive closure, in another posting - but whether they exist or not does not really matter for the question. Long posting today - so this is all for now.
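
The difference between totally ordered and partially ordered events can be sketched in a few lines. This is only an illustration under a simplifying assumption: events from the same source carry a sequence number and can always be compared, while events from different sources (e.g. sensors with unsynchronized clocks) are treated as incomparable. The names `SensorEvent` and `happened_before` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SensorEvent:
    source: str  # which sensor produced the event
    seq: int     # per-source sequence number


def happened_before(a: SensorEvent, b: SensorEvent) -> Optional[bool]:
    """Return True/False when the two events are comparable, None when they are not."""
    if a.source != b.source:
        # Different sources: no trustworthy common clock, so no order is claimed.
        return None
    return a.seq < b.seq


a1, a2 = SensorEvent("sensor-A", 1), SensorEvent("sensor-A", 2)
b1 = SensorEvent("sensor-B", 1)

same_source = happened_before(a1, a2)
cross_source = happened_before(a1, b1)
```

Within one source the events form a total order (`same_source` is `True`); across sources the comparison yields `None`, which is the partially ordered situation. An application that looks at both per-sensor trends and cross-sensor patterns has both kinds of events at once.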