Saturday, September 15, 2007

Meta-language as a standard for EPL (event processing language)

I started my welcome address to the participants of the Dagstuhl seminar on event processing by saying "Bonan matenon" - nobody in the audience knew what it was; well, it is "good morning" in Esperanto. This is indicative that Zamenhof's fine idea has not been very successful (although we do have a Zamenhof Street in Haifa).

In a previous post, "Event processing and the Babylon tower", I discussed the multiplicity of languages, and stated that it does not seem feasible, in the near future, to reach agreement on a single programming style for event processing. However, this does not mean that we cannot advance in this area, and the topic I am raising for discussion at the coming EPTS event processing symposium is working on a "standard meta-language" as a starting point.
Such a meta-language should include the following:

  • Semantics of the meta-language primitives
  • Definition of event flow between the primitives (the event processing network)
  • Definition of non-functional properties for a given application

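To make this a bit more concrete, here is a minimal sketch of what a meta-language description of an event processing network might look like, expressed as plain Python data structures. All primitive names, kinds, and properties are invented for illustration only; they are not part of any proposed standard.

```python
# A hypothetical, minimal meta-language description of an event
# processing network (EPN): primitives with declared semantics,
# the event flow between them, and non-functional properties.
# Everything here is invented for illustration.
epn = {
    "primitives": [
        {"name": "filter_high_value", "kind": "filter",
         "semantics": "pass orders with amount > 1000"},
        {"name": "enrich_customer", "kind": "enrich",
         "semantics": "attach the customer record to the order event"},
        {"name": "detect_cancellation", "kind": "pattern",
         "semantics": "cancel-order event following an order event"},
    ],
    # Event flow: the edges of the event processing network.
    "flow": [
        ("filter_high_value", "enrich_customer"),
        ("enrich_customer", "detect_cancellation"),
    ],
    # Non-functional properties for this particular application.
    "non_functional": {"max_latency_ms": 100, "recoverable": True},
}

def downstream(epn, primitive):
    """Which primitives does `primitive` feed events into?"""
    return [dst for (src, dst) in epn["flow"] if src == primitive]

print(downstream(epn, "enrich_customer"))  # ['detect_cancellation']
```

A vendor-specific translator would walk such a description and emit code in its own language, which is exactly the role the meta-language is meant to play.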
The meta-language will serve as the standard basis, and vendors will be able to provide translators to their own languages and implementations; one of the descendants of this meta-language may one day become THE event processing language... But we still have to get there. As said, at the coming EPTS meeting I intend to issue a call for a community effort to work on this topic. Stay tuned.

Friday, September 14, 2007

In-Line vs. Observations

I am back in business after spilling water on my laptop (a total loss)... Fortunately, the hard disk was not damaged, and after getting the excellent IBM support to work on a holiday, I have (another) laptop. Not recommended: spilling water, coffee, or anything else on computers...

Anyway, in my last post I discussed RTE vs. BAM, and managed to confuse some people, so some clarification is needed. First, as said before, event processing is a collection of functions that derive events, detect patterns, enrich, transform, and other functions that take events as input, process them, and produce (possibly other) events as output.

The distinction is where, relative to running transactions, the event processing is done. In-line event processing means that the event processing is done inside a transaction, not observing the transaction from the outside. In this case, the event creation may be part of the transaction (e.g. an application emits an event, and the resulting actions belong to the same transaction), or the starting point of a transaction. To give a concrete example (repeating the example from the previous post): some application emits an event saying "cancel order" (which by itself may or may not be the result of an event processing function); this should be atomic with the execution of the cancellation, and if cancellation is not possible, some compensation is needed in the producer application. Likewise, an event can start a transaction that spans all its descendants and actions; if an action in the consumer cannot execute, or there is some validation issue, the entire thread of the "event processing network" should abort or compensate.

This is in contrast to "observation mode": observe whether there is some anomaly, and if there is, notify somebody. In the in-line case, the transaction can change its course due to event processing; in the observation case, the impact is indirect. Of course, in some cases we have combinations of the two.

There are two issues that should be discussed: what is the relationship between the "observation" mode of event processing and what the market knows as BAM? And what is the semantic meaning of rollback or compensation when talking about events? More discussion on each of these issues later. Happy New Year.
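As a toy sketch of the distinction, here are the two modes side by side in Python. All class and function names are hypothetical, invented for illustration; real transaction managers are of course far more involved.

```python
# Toy contrast of in-line vs. observation-mode event processing.
# All names here are hypothetical, for illustration only.

class Transaction:
    """A minimal transaction: actions run atomically or are compensated."""
    def __init__(self):
        self.actions, self.compensations = [], []

    def do(self, action, compensation):
        self.actions.append(action)
        self.compensations.append(compensation)

    def commit(self):
        done = []
        try:
            for act, comp in zip(self.actions, self.compensations):
                act()
                done.append(comp)
        except Exception:
            # In-line mode: a failure anywhere aborts the whole thread of
            # the event processing network, compensating in reverse order.
            for comp in reversed(done):
                comp()
            raise

# In-line: the consequences of the "cancel order" event belong to
# the same transaction as the cancellation itself.
def process_cancel_inline(order, tx):
    tx.do(lambda: order.update(status="cancelled"),
          lambda: order.update(status="open"))

# Observation: watch events from the outside and notify somebody;
# the running transaction is never touched directly.
def process_cancel_observed(order, alerts):
    if order["status"] == "cancelled" and order["amount"] > 1000:
        alerts.append(f"large order {order['id']} was cancelled")
```

In the in-line variant, a failure of any action rolls the whole thing back; in the observation variant, the only output is a notification.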

Thursday, September 13, 2007

Event processing and transactions - real, real time and real time enterprise

One of the common beliefs about "event-driven architectures" is that they are loosely coupled, in the sense that the producer, processing, and consumer are all distinct, with no dependencies among them. This is true for some types of applications, but in this short article I would like to discuss another type of event processing application that is turning out to be significant; I am using the TLA RTE (Real-Time Enterprise) to classify these applications.

Since sensitivity to "correct terminology" is one of my known eccentricities, I will say a few words about the "real time" issue before continuing. The term "real-time" in the "correct" sense denotes an action that has to be completed by a time constraint; the types of real-time systems (hard or soft) are determined by the damage inflicted if the time constraint is not satisfied. In marketing slang, however, real-time is interpreted as "as fast as possible", which of course is different: there are not necessarily any time constraints, just a best effort to act fast. The term "Real-Time Enterprise" belongs to the latter interpretation; the use of "real time" here is loose, and typically says that an enterprise reacts to changing conditions fast, but not necessarily deterministically. Sometimes "real time" is used to denote "online", in contrast with "batch".

One of the properties of RTE is that event processing is done within transactions. As noted before, this may be contrary to some beliefs, but it is a very common misconception to look at one type of event processing application and think it represents the whole spectrum. Why do we need event processing as part of transactions? Take a simple example: there is a transaction that processes some order, and there is an event that cancels the order; this can be a direct cancellation event, or a derived event, derived from some other event that relates to the customer making the order.
The order itself is an atomic transaction (possibly a long-duration transaction); the event affects the transaction (e.g. causes a "rollback" or "compensate"), and the transaction affects the event (e.g. if the transaction is in a phase where the order can no longer be cancelled, it may force the event processing part to compensate). Since the event processing part typically does not manage transactions, and the transaction is typically larger than the event processing part, RTE also requires some level of integration between the EP part and the transaction management middleware.
RTE applications, in the event processing context, are thus defined as those applications that impact running transactions directly. Other types of applications that use event processing in observation mode (e.g. BAM) typically look at running transactions from the outside, while RTE applications are indeed part of the transaction processing. More on BAM applications and other types - later.
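The two-way interplay between the event and the transaction can be sketched as follows. This is a hypothetical sketch under stated assumptions (the phase names and helper functions are all invented); it only illustrates the decision: if the order transaction is still in a cancellable phase, the event forces a rollback, and otherwise the event processing side must compensate.

```python
# Sketch of the interplay: the event can affect the transaction
# (rollback), and the transaction's state can force the event
# processing side to compensate. All names are invented.

CANCELLABLE_PHASES = {"received", "validated"}

def apply_cancellation(order_phase, rollback, compensate_producer):
    """Try to apply a cancel-order event to an in-flight transaction.

    If the order is still in a cancellable phase, roll the transaction
    back; otherwise the event processing side compensates instead
    (e.g. notifies the producer that the cancellation failed).
    """
    if order_phase in CANCELLABLE_PHASES:
        rollback()
        return "rolled back"
    compensate_producer()
    return "compensated"
```

The `rollback` and `compensate_producer` callbacks stand in for the integration hooks into the transaction management middleware mentioned above.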

Wednesday, September 12, 2007

Event processing - hard-coded ? specialized software ? generic software ?

Some people ask: what's new about event processing? Haven't we been doing event processing for ages?
So what is the news here? It is true that we have done event processing for a long time, sometimes in "exception" mode, but it has been done in a hard-coded way, embedded in regular programming. I guess that even today, if we take most of the implemented functionality that falls under the category of event processing, we will find that it is hard-coded within applications. The new thing is the existence of generic software. When should we use generic software? If all that is needed in event processing sums up to one or two functions, it is probably not cost-effective to purchase, install, learn, and develop in a generic tool. It starts to be cost-effective if one of these conditions is satisfied:
  • The quantity of "event processing" functions required in the application is at least medium.
  • The complexity of these functions is not trivial, so there is a benefit to programming them in a high-level language rather than a lower-level one.
  • The event processing is not internal to a single application, so there is a connectivity issue (e.g. pulling events, using adapters to re-format events, listening to events, publishing events etc.). Again, hard-coding all of this may not be cost-effective.
  • Agility requirement - frequent changes. Again, easier when there are higher-level abstractions.
  • A need to let business users control the behavior without involving programmers - this is a mountain we have not conquered yet, but conceptually it will be easier with a generic tool.

The analog, in a way, is a generic DBMS with all its utilities vs. implementing your own database using file systems. There are certainly cases in which it is worthwhile to use file systems and forgo the abstraction layer that a DBMS provides, but in most cases the TCO (total cost of ownership) is substantially lower when the level of abstraction goes up. Note that the main cost saving is in maintenance time rather than development time, especially if agility is a consideration.

There is another variation between "generic" software and hard-coding, which is a specialized function. It still provides a high-level language, but the function is limited to a specific application or application type. Should we use generic software for all such cases? Or are we better off using specialized software? This raises an interesting question - should we strive for "one size fits all"?

This discussion will continue later.

Tuesday, September 11, 2007

Event Processing - A paradigm shift ?

In the Dagstuhl seminar we held earlier this year, Roy Schulte raised the question: "Will Event Processing (EDA) become a paradigm shift in the next few years or not?" If my memory does not mislead me, we discussed it during one of the evenings in the wine cellar, which is the reason I don't remember much from that discussion. Anyway, thinking back about this issue, and looking at a recent article claiming that "complex event processing is still on the launch pad", we can see that the paradigm shift has not happened yet (maybe it has happened for those of us who live and breathe this topic, but not on a large scale). Looking at one paradigm shift that did succeed - relational databases - we can analyze some of the reasons for that success:
  • We need an underlying theory in two areas: a semantic one (like relational algebra) and an engineering-oriented one (like query optimization) that is built on top of the first;
  • We need vendors that understand the theory and build products that implement it;
  • We need to be able to explain to the developer community (with its various types of developers - a topic for another article) how to use it - a good textbook like Date's book on relational databases is a solid step, but the development of usage patterns, methodologies etc. will help too;
  • Standards (the topic of another forthcoming article) are complementary - but relational databases were a paradigm shift before the SQL standard was published.

The state of the practice in event processing is similar to that of the database area in pre-relational times: there are several approaches, all of which grew out of implementations. This does not at all undermine the importance of the first generation of event processing products; without more experience in the field, we cannot get anywhere...

While the various vendors continue to incrementally advance their products, some of the effort (perhaps a community effort) should now go towards bridging the gap between the first generation and the "paradigm shift"...

Bottom line: Event processing has great potential to make the paradigm shift that Roy Schulte is talking about and become a major paradigm in enterprise computing. It can happen, it should happen, and we should make it happen - but there are mountains to climb and oceans to cross.

More on the challenges and obstacles as well as the futuristic vision - later

Monday, September 10, 2007

If SQL extensions are the answer then what is the question ?

Surprisingly, I realized that some people really do read this blog, so from time to time I'll touch on controversial topics to make it spicier... The title is, of course, inspired by the title of a famous paper http://portal.acm.org/citation.cfm?id=4583, and the question is indeed: does SQL fit "event processing"? Or which types of event processing applications, if any, does the SQL style of programming fit?

There have been some discussions in this area, some of them with the passion of a religious nature (well, this was also true for Prolog in its day). Some arguments for SQL: event processing applications require the use of a database anyway, thus the developer will need to use SQL anyway, thus it will be easier for the developer to use only a single style of programming... Well, this argument can go in the opposite direction: let's assume we decide that for event processing there is a more natural style of programming; we can use that style and generate SQL "under the cover" to communicate with the database. Another argument is that SQL is declarative; however, contrary to the popular belief among database people (and I am originally part of this community), SQL is not the only declarative language in the universe. Another claim is that there is a lot of prior knowledge about query optimization in SQL; this is true, but much of it is unhelpful for the EP case anyway.

If the idea (as most vendors aspire to) is to have the most general event processing language, there are some cases in which I find a mismatch between the SQL way of thinking and the EP way of thinking; let me point out some of them. First, SQL is set-oriented, which means that in the case of a "join" it conceptually starts from the Cartesian product and then creates subsets via the select and project operators. In event processing, some applications are set-oriented (e.g. finding trends in time series), but many of them are "event at a time": for each individual event, there is a check whether some pattern is matched. While it is possible (sometimes with difficulty) to express pattern matching in SQL, it is not a natural way to think about it. Second, SQL lacks abstractions that allow fine-tuning of the semantics. In the past I presented a relatively simple example on the Yahoo CEP-Interest group, and was shown SQL solutions that can solve it, but at the price of highly complex queries. Anybody interested in the details can read the example at http://tech.groups.yahoo.com/group/CEP-Interest/message/678; there are some follow-ups that show how it is done in SQL, and you can form your own impression.
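To illustrate what "event at a time" means, here is a hypothetical sketch in Python (the event shapes and the pattern are invented for illustration; this is not any vendor's language). Each arriving event is fed to the matcher individually, and a derived event is emitted the moment the pattern completes, with no Cartesian product in sight:

```python
# Event-at-a-time pattern matching: each arriving event is checked
# against a pattern, here "a cancel-order within 60 seconds of the
# matching order". A hypothetical sketch for illustration only.

def make_matcher(window_seconds=60):
    pending = {}  # order_id -> timestamp of the order event

    def on_event(event):
        """Feed one event; return a derived event when the pattern fires."""
        kind, order_id, ts = event["kind"], event["order_id"], event["ts"]
        if kind == "order":
            pending[order_id] = ts
        elif kind == "cancel" and order_id in pending:
            if ts - pending.pop(order_id) <= window_seconds:
                return {"kind": "quick-cancellation", "order_id": order_id}
        return None

    return on_event

match = make_matcher()
match({"kind": "order", "order_id": 7, "ts": 0})
print(match({"kind": "cancel", "order_id": 7, "ts": 30}))
# {'kind': 'quick-cancellation', 'order_id': 7}
```

The state kept between events (the `pending` dictionary) is exactly the kind of incremental, per-event bookkeeping that a set-oriented join formulation tends to obscure.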

However, since event processing is not a monolithic area, there may be a place for specific cases which do not aim to provide a general language. Is there a benefit to using SQL in such specific cases?
This goes into the issue of relationships between databases and event processing which deserves more attention and will be a topic of one of the next postings on this blog.


More - Later

Sunday, September 9, 2007

EP, CEP, ESP, DSP, SEP, MEP, BPEP and more

Still in the Babylon tower, there is some discussion about terms, and thus the importance of the consensus glossary, which will hopefully be finalized soon. Today I'll discuss some of the *-EP terms. Two years ago I suggested using the common name "Event Processing" for the discipline we are dealing with. This was, first, a compromise between two competing names: "event stream processing", which Mark Palmer from Progress tried to promote, and "complex event processing", which David Luckham coined and some supported. The first point of confusion was that Mark Palmer did not mean what academia defined as "data stream processing"; if we look at Apama, it is clearly not the same concept. The rationale behind the name "event processing" is that nobody had strong objections to it, and also that names of areas typically consist of two words, not three (examples: information retrieval, data management, image processing, autonomic computing and more...). It seems there is now agreement that event processing is the name of the entire discipline, so how are the other names positioned against it? Roy Schulte had a presentation saying that EP = {SEP + MEP + CEP + BPEP [which he omitted in some later presentations]}; this is a good starting direction.
According to this distinction, SEP (simple event processing) deals with filtering and routing (e.g. pub/sub) and is the most pervasive means of event processing; MEP (mediated event processing) enables transformation (translation, aggregation, splitting), validation, and enrichment of events; and CEP (complex event processing) derives events based on patterns detected in the event history. BPEP really makes the producer and consumer ready for event processing, through instrumentation on one side and orchestration on the other; thus, while it is part of the architecture, it is a somewhat different creature.

There is, however, some potential overlap between the different levels - e.g. aggregation can be either MEP or CEP. One possible border line is "stateless" (MEP) vs. "stateful" (CEP), but this would limit the concept of MEP; another possible border is that MEP may have a state, but one in which raw events are not preserved and only accumulated information is kept.
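The second border line can be sketched in a few lines of Python (the classes and the sample pattern are invented for illustration): a MEP-style aggregator retains only accumulated information, while a CEP-style pattern detector must retain the raw events themselves.

```python
# Sketch of the proposed border line between MEP and CEP state.
# All names and the sample pattern are invented for illustration.

class MepAggregator:
    """Stateful, but only a running total survives - no raw events."""
    def __init__(self):
        self.count, self.total = 0, 0.0

    def on_event(self, amount):
        self.count += 1
        self.total += amount

class CepPatternDetector:
    """Keeps the raw event history, needed to detect patterns over it."""
    def __init__(self):
        self.history = []

    def on_event(self, amount):
        self.history.append(amount)
        # Derive an event when three increasing amounts arrive in a row.
        if len(self.history) >= 3 and \
           self.history[-3] < self.history[-2] < self.history[-1]:
            return "increasing-trend"
        return None
```

Under this border line, the aggregator stays on the MEP side even though it holds state, because no raw event can be reconstructed from its counters.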

And what about DSP, ESP etc.? Some people in the community make the distinction that CEP processes "posets" (partially ordered sets), while ESP processes sequences (totally ordered sets). Since a sequence is a special case of a poset, one can claim that if posets are supported then sequences are also supported. In some applications the order is important, and we can indeed look at them as a class of event processing applications, but there are other classes of applications with other characteristics - for example: transactional applications, applications that support uncertain events, applications that require retrospective processing, applications that do not require recoverability, and more. Thus, "total order" is just one of many sub-types of event processing. Furthermore, within the same application, some patterns may require total order while others do not.

A common misconception about "complex event processing" stems from the possible ambiguity in parsing this expression. While some people interpret it as "complex processing of events", the meaning is "processing of complex events"; this processing can be quite simple, indeed...

Interestingly, we see that the term "stream processing", which came from academia, did not survive as a marketing term. Looking at the homepage of StreamBase, I find the terms CEP and complex event processing all over (the term "stream processing" has disappeared); Coral8 also has "complex event processing" in its title, and so does Apama, which used "event stream processing" in the past. Apama now talks about an "event processing platform" and a "complex event processing language", which seems consistent with what's written here.

Bottom line: It seems that Event Processing is catching on as the name of the area (or the end-2-end game), Complex Event Processing as the name of the part that processes multiple events (detects patterns and derives events), and stream processing is very much alive in the academic community but did not survive in the market.

More - about the relations of event processing to databases -- later ...