Enabling Knowledge-Based Complex Event Processing

                                 Kia Teymourian

                         Supervisor: Prof. Adrian Paschke
                    Freie Universitaet Berlin, Berlin, Germany
                       {kia, paschke}@inf.fu-berlin.de


      Abstract. Background knowledge about events and their relations to
      other concepts in the application domain can improve the expressiveness
      and flexibility of complex event processing systems. Large amounts of
      domain background knowledge stored in external knowledge bases can be
      combined with event processing to achieve more knowledgeable complex
      event processing. In this dissertation, I address the challenges of
      adding formalized vocabularies/ontologies and declarative rules to event
      processing in order to enable more intelligent event processors that
      can understand the semantics of events.


1   Motivation
In many business organizations, some important complex events cannot be used
in process management because they are not detected in the workflows, so
decision makers cannot be informed about them. Event detection is a critical
factor for event-driven systems and business process management. Thanks to
recent successes in business process management (BPM) and enterprise
application integration (EAI), many organizations know a lot about their own
activities, but this huge amount of event information cannot be used in
decision making. The continuous stream of low-level events in business
organizations requires an intelligent real-time event processor. Detecting
complex events as they occur in the organization can help to optimize the
management of business processes.
    Semantic models of events can improve event processing quality by using
event metadata in combination with ontologies and rules (knowledge bases).
The combination of event processing and knowledge representation can lead to
novel, semantically rich event processing engines. Such intelligent engines
can understand what is happening in terms of events, can (process) state, and
know which reactions and processes they can invoke and which new events they
can signal. Identifying critical events and situations requires processing
vast amounts of data and metadata within and outside the systems. Knowledge
about event types and their hierarchies, i.e., specialization, generalization,
or other relations between events, is therefore useful. Event knowledge bases
can represent complex event data models which link to existing semantic
domain knowledge such as domain vocabularies/ontologies and existing domain
data. Semantic inference is used to infer relations between events, such as
transitivity or equality between event types and their properties. Temporal
and spatial reasoning on events can be based on their data properties, e.g.,
using a time ontology that describes temporal quantities.
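To illustrate the kind of inference meant here, the following minimal sketch computes the transitive closure of a small event-type hierarchy, so that a pattern written against a general type also matches events of its specialized types. The type names and the `SUBTYPE_OF` list are hypothetical; a real system would delegate this to a description logic reasoner over the ontology.

```python
from collections import defaultdict

# Hypothetical event-type hierarchy as (subtype, supertype) pairs.
# The type names are illustrative and not taken from a published ontology.
SUBTYPE_OF = [
    ("PriceDropEvent", "PriceChangeEvent"),
    ("PriceChangeEvent", "StockMarketEvent"),
    ("StockMarketEvent", "Event"),
]


def supertypes(event_type: str, edges: list[tuple[str, str]]) -> set[str]:
    """Return all direct and inferred supertypes (transitive closure of subtype-of)."""
    parents: dict[str, set[str]] = defaultdict(set)
    for sub, sup in edges:
        parents[sub].add(sup)
    found: set[str] = set()
    stack = [event_type]
    while stack:
        for sup in parents[stack.pop()]:
            if sup not in found:
                found.add(sup)
                stack.append(sup)
    return found


def matches_type(event_type: str, pattern_type: str, edges: list[tuple[str, str]]) -> bool:
    """An event matches a pattern written for its own type or any inferred supertype."""
    return event_type == pattern_type or pattern_type in supertypes(event_type, edges)
```

For example, `matches_type("PriceDropEvent", "StockMarketEvent", SUBTYPE_OF)` holds although no pair links the two types directly.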
    Background knowledge in event processing has several use cases, such as
e-health, business activity monitoring, and fraud detection.
    Use Case - High Level Stock Market Monitoring: Companies have business
dependencies on each other. For example, company C1 produces raw material M1;
another company C2 depends on this raw material for its production and might
run into serious trouble if C1 cannot supply it. A third company C3 has
financed C2 and might face financial problems if C2 has supply problems.
Consider Mr. Smith, a stock broker with access to a stock exchange event
stream of events like (Name, "VOW"), (Price, 20.24), (Volume, 8,835).
Mr. Smith might be interested in this dependency chain and can define a
complex event detection pattern for it without even knowing which companies
are involved. He might want to know when the prices of these three companies
start falling.
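A minimal sketch of such a pattern, assuming the dependency chain is stored as triples with hypothetical predicates `produces`, `dependsOn`, and `finances`, and simplifying "falling" to "the latest price is lower than the previous one". The pattern names no company; the chain is found in the background knowledge.

```python
# Hypothetical background knowledge for the dependency chain in the use case:
# C1 produces M1, C2 depends on M1, and C3 finances C2.
KB = {
    ("C1", "produces", "M1"),
    ("C2", "dependsOn", "M1"),
    ("C3", "finances", "C2"),
}


def dependency_chains(kb):
    """Yield (supplier, dependent, financier) chains inferred from the KB."""
    for supplier, p1, material in kb:
        if p1 != "produces":
            continue
        for dependent, p2, needed in kb:
            if p2 != "dependsOn" or needed != material:
                continue
            for financier, p3, target in kb:
                if p3 == "finances" and target == dependent:
                    yield (supplier, dependent, financier)


def detect_falling_chains(events, kb):
    """Report each chain once the last price tick of all three companies was a drop.

    `events` is a sequence of (name, price) pairs.
    """
    chains = list(dependency_chains(kb))
    previous, latest, alerts = {}, {}, []
    for name, price in events:
        if name in latest:
            previous[name] = latest[name]
        latest[name] = price
        for chain in chains:
            falling = all(c in previous and latest[c] < previous[c] for c in chain)
            if falling and chain not in alerts:
                alerts.append(chain)
    return alerts
```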
    Mr. Smith might also be interested in special kinds of stocks and would
like to be informed when interesting stocks are available for sale. His
special interest or stock handling strategy can be described in a high-level
language that expresses the interest using background knowledge about
companies. Mr. Smith would like to run a query on the event stream similar to
the following: Buy stocks of companies that have production facilities in
Europe, produce products from iron, have more than 10,000 employees, are
currently in a reconstruction phase, and whose price/volume has increased
steadily in the past 5 minutes.
    The above query cannot be processed without background knowledge that
defines the concepts it uses. Mr. Smith needs an intelligent system that can
use background knowledge about companies. Background knowledge like the
following should be integrated and processed together with the event data
stream in real time, so that interesting complex events are detected in time.
{ (OPEL, belongsTO, GM),(OPEL, isA, automobilCompany),
  (automobilCompany, produce, Cars), (Cars, areFrom, Iron),
  (OPEL, hatProductionFacilitiesIn, Germany), (Germany, isIn, Europe),
  (OPEL, isA, MajorCorporation), (MajorCorporation, have, over10,000employees),
  (OPEL, isIn, reconstructionPhase), ... }
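A minimal sketch of how Mr. Smith's query could be split into a knowledge part answered from the triples above and a stream part answered from a 5-minute price window. It assumes events of the form (timestamp in seconds, name, price), considers price only (not volume), and hard-codes the query's concepts as simple graph walks; a real system would express the knowledge part as a SPARQL query against the KB.

```python
# The triples from the example above, as (subject, predicate, object) tuples.
TRIPLES = {
    ("OPEL", "belongsTO", "GM"), ("OPEL", "isA", "automobilCompany"),
    ("automobilCompany", "produce", "Cars"), ("Cars", "areFrom", "Iron"),
    ("OPEL", "hatProductionFacilitiesIn", "Germany"), ("Germany", "isIn", "Europe"),
    ("OPEL", "isA", "MajorCorporation"),
    ("MajorCorporation", "have", "over10,000employees"),
    ("OPEL", "isIn", "reconstructionPhase"),
}


def objects(subject, predicate):
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def located_in(place, region):
    """Follow isIn transitively (Germany isIn Europe)."""
    frontier, seen = {place}, set()
    while frontier:
        current = frontier.pop()
        if current == region:
            return True
        seen.add(current)
        frontier |= objects(current, "isIn") - seen
    return False


def matches_profile(company):
    """Knowledge part of the query, answered from the triples only."""
    classes = objects(company, "isA")
    products = {prod for cls in classes for prod in objects(cls, "produce")}
    sites = objects(company, "hatProductionFacilitiesIn")
    return (
        any(located_in(site, "Europe") for site in sites)
        and any("Iron" in objects(prod, "areFrom") for prod in products)
        and any("over10,000employees" in objects(cls, "have") for cls in classes)
        and "reconstructionPhase" in objects(company, "isIn")
    )


def rising(prices):
    return len(prices) >= 2 and all(a < b for a, b in zip(prices, prices[1:]))


def buy_signals(events, now, horizon=300.0):
    """Stream part: names whose price rose at every tick in the last `horizon` seconds."""
    series = {}
    for ts, name, price in sorted(events):
        if now - ts <= horizon:
            series.setdefault(name, []).append(price)
    return sorted(n for n, ps in series.items() if rising(ps) and matches_profile(n))
```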


2    Research Problem
Existing event processing approaches deal primarily with the syntactic
processing of low-level signals, constructive event database views, streams,
and primitive actions. They provide inadequate expressiveness for describing
the ontological semantics of events, actions, agents, states, processes,
temporal/spatial concepts, and other related concepts. Nor do they provide
adequate description methods for complex decisions and behavioral logic,
including expressive situations, pre- and post-conditions, complex
transactional (re-)actions, and workflow-like executions. All of these are
needed to represent many real-world domain problems declaratively at a higher
level of abstraction. My dissertation will address the following two main
problems of existing event processing approaches:
    1. Lacking Knowledge Representation Methods: Event processing needs
a methodology for representing knowledge (metadata). Current event processing
systems provide no knowledge representation methods for events, and there are
no precise logical semantics for events and related concepts. Methods are
needed that bring the ontological semantics of all related concepts into
event processing without affecting scalability and real-time processing.
A formal specification provides the stable foundation needed for describing
and reasoning about a system, and for comparing different systems without
misunderstandings. Event processing needs as its basis a formalization and
specification that can describe events, event patterns, situations, pre- and
post-conditions, (re-)actions, etc. The logical definition of events is not
addressed in current complex event processing solutions.
    In this dissertation, I will address the challenge of knowledge
representation for complex event processing (CEP), integrating domain- and
application-specific ontologies for events, complex events, situations,
actions, and other concepts related to CEP.
    2. Limited Methods for Processing and Integrating Background Knowledge
with Event Streams: Current event processing engines often rely on processing
simple event signals. They make no use of metadata about events or other
related concepts from the application domain. The existing on-the-fly,
in-memory processing methods do not address the challenges of integrating
background knowledge and semantically enriching events or event queries
(complex event definitions/patterns). In this dissertation, I will address
the trade-off between real-time, high-performance event processing and
expressive reasoning on background information. The advantages and
disadvantages of alternative methods for fusing event streams and background
knowledge, which can be used without affecting real-time processing or
scalability, should be investigated.

3   Fusion of Events and Background Knowledge
The fusion of background knowledge with data from an event stream helps the
event processing engine know more about incoming events and their
relationships to other related concepts. I propose to use a Knowledge Base
(KB) that provides background knowledge (conceptual and assertional, i.e., the
T-Box and A-Box of an ontology) about events and other non-event resources.
Events can then be detected by reasoning on their type hierarchy, their
temporal/spatial relationships, or their relationships to other objects in the
application domain. An example of such a connection to other relevant concepts
and objects is the relationship of a stock market event (a price change) to
the products or services of a company.
    The benefits of using background knowledge in CEP are higher
expressiveness and flexibility. Expressiveness means that an event processing
system can precisely express CEP patterns and reactions. Flexibility means
that a CEP system can integrate business changes quickly, without rewriting
all of its event processing rules. Furthermore, complex event processing can
benefit from knowledge representation and Semantic Web technologies, because
a central problem of event processing is information integration, for which
these technologies have already proven to be a valid solution.
    I propose to use external KBs for storing and reasoning on background
knowledge. The background knowledge about events and other non-event
concepts/objects is described in description logic. The knowledge in the KB
can be stored in the Resource Description Framework (RDF) data format¹ in an
external triple store (a special kind of database for storing and managing
RDF data). Event processing agents can query this knowledge as required by
the event query rules. The external KB also provides description logic
reasoning over the relations between events and other relevant non-event
objects in the application domain. The KB can be queried using SPARQL²
queries, and the results are then fed into the event processing engine.
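To make the query step concrete, the following minimal sketch evaluates a SPARQL basic graph pattern (a conjunction of triple patterns with `?variables`) over an in-memory list of triples, as a join of the individual patterns. It corresponds to a query such as `SELECT ?c WHERE { ?c :isA :MajorCorporation . ?c :isIn :reconstructionPhase . }`. This is only an illustration of the matching semantics under simplifying assumptions (no prefixes, literals, or reasoning), not a triple store or a SPARQL engine.

```python
from typing import Optional

Triple = tuple[str, str, str]


def is_variable(term: str) -> bool:
    return term.startswith("?")


def match_pattern(pattern: Triple, triple: Triple, binding: dict) -> Optional[dict]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if is_variable(term):
            if extended.setdefault(term, value) != value:
                return None  # variable already bound to a different value
        elif term != value:
            return None
    return extended


def evaluate_bgp(patterns: list[Triple], triples: list[Triple]) -> list[dict]:
    """Evaluate a basic graph pattern as a join of its triple patterns."""
    bindings: list[dict] = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings
```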

4     Knowledge Representation for CEP
Ontologies play a key role in knowledge-based CEP. They should conceptualize
the application domain to allow reasoning on events and other non-event
concepts. I propose that the event processing domain be described by a
modular, layered ontology model that can be reused in different application
areas. Important general concepts such as event, action, situation,
space/place, time, agent, and process should be defined based on meta-models
and pluggable ontologies organized in a modular top-level ontological
structure. The general concepts defined in the top-level ontologies can be
further specialized by existing domain ontologies and by ontologies for
generic tasks and activities. Application ontologies then specialize these
domain and task concepts for a specific application, often at a more
technical, platform-specific level.
Event Query Rules: Event query rules (complex event patterns) can be
considered declarative rules that detect complex events in streams of raw
events. These event queries have hybrid semantics: they use an event
operation algebra to detect events, and SPARQL queries to include background
knowledge about these events and their relationships.

¹ http://www.w3.org/RDF/
² SPARQL http://www.w3.org/TR/rdf-sparql-query/

    The event query rules allow simple event algebra operations similar to
Snoop [6], i.e., event operations like Sequence (Ordered), Disjunction (Or),
Xor (Mutually Exclusive), Conjunction (And), Concurrent (Parallel), Any,
Aperiodic, Periodic, and Operator (generic operator). More advanced
interval-based event operations like BEFORE, MEETS, OVERLAP, etc. can also be
used. My event query rules also include a SPARQL query predicate for querying
external KBs; the SPARQL queries are used in a rule in combination with the
event operation algebra. This hybrid use of SPARQL queries and event
operation algebra can be divided into several categories.
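A minimal sketch of some of these operators, assuming events carry a kind and a start/end time and that every operator is evaluated over one window of events. The semantics are deliberately simplified: Snoop's parameter contexts, which decide which event instances are consumed, are not modeled, and the interval relations follow Allen's definitions for BEFORE, MEETS, and OVERLAPS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str
    start: float
    end: float  # equal to start for instantaneous events


def seq(window, first, second):
    """SEQ: some `first` event ends before some `second` event starts."""
    return any(a.kind == first and b.kind == second and a.end < b.start
               for a in window for b in window)


def conj(window, k1, k2):
    """AND: both kinds occur in the window, in any order."""
    kinds = {e.kind for e in window}
    return k1 in kinds and k2 in kinds


def disj(window, k1, k2):
    """OR: at least one of the kinds occurs in the window."""
    kinds = {e.kind for e in window}
    return k1 in kinds or k2 in kinds


def xor(window, k1, k2):
    """XOR: exactly one of the kinds occurs in the window."""
    kinds = {e.kind for e in window}
    return (k1 in kinds) != (k2 in kinds)


# Allen-style interval relations between two (possibly durative) events.
def before(a, b):
    return a.end < b.start


def meets(a, b):
    return a.end == b.start


def overlaps(a, b):
    return a.start < b.start < a.end < b.end
```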
4.1   Categories of Event Query Rules
Event query rules can be divided into several categories based on how
knowledge queries (SPARQL queries) are used inside the rule. As described
above, the semantics of a whole event query is a hybrid of description logic
and the event operation algebra that defines the semantics of event
detection. In this section, we describe the most important and interesting
categories of event sQuery rules. This categorization is not a complete
classification of all possible rule combinations; our aim is rather to
highlight interesting combinations that can be processed using different
event processing approaches.
Category A - Single SPARQL Query: In this category, the event query rule
includes only a single knowledge query and uses its results in one or more
variables within the event detection rule. A SPARQL query is used to import
knowledge about event instances or types. One or more attributes of events
are used to build the basic triple pattern inside the SPARQL query. Category
A event sQuery rules can be divided into three subcategories:
Category A1 - Raw SPARQL: This category of sQuery rule is the simplest form
of event query rule. The included SPARQL query refers only to resources in
the background knowledge. The background knowledge query is independent of
the event stream; however, the complex event detection is defined on the
results of this query in combination with the event stream. In some cases,
the SPARQL query should be resent to the KB on each event to retrieve the
latest results.
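A minimal sketch of a category A1 rule, assuming events are (name, price) pairs and that `run_query` is a hypothetical stand-in for sending a fixed SPARQL query such as `SELECT ?c WHERE { ?c :isIn :reconstructionPhase }` to the KB. The flag shows the two options described above: run the query once, or resend it on each event.

```python
from typing import Callable, Iterable, Iterator


def a1_rule(
    events: Iterable[tuple[str, float]],
    run_query: Callable[[], set[str]],
    requery_each_event: bool = False,
) -> Iterator[tuple[str, float]]:
    """Category A1: a fixed KB query whose result set filters the event stream."""
    # The query does not depend on the events, so it can run once up front.
    result = None if requery_each_event else run_query()
    for name, price in events:
        if requery_each_event:
            result = run_query()  # resend the same query to see the latest KB state
        if name in result:
            yield name, price
```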
Category A2 - Generated SPARQL: In this category of sQuery rules, a different
SPARQL query is generated for each incoming event and sent to the target
knowledge base. The attribute/value pairs of an event instance are used to
generate the basic triple patterns of a SPARQL query. Based on user
definitions, some of the (attribute, value) tuples of an event instance are
selected and used to generate a single SPARQL query.
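A minimal sketch of query generation for category A2, assuming a user-defined mapping from event attributes to KB predicates. The namespace `http://example.org/kb#` and the predicate `:hasStockSymbol` are hypothetical; only the selected attributes (here `Name`) become triple patterns, and the rest of the event is ignored.

```python
def sparql_literal(value) -> str:
    """Render a Python string or number as a SPARQL literal."""
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return str(value)


def generate_sparql(event: dict, selected: list[str], predicate_for: dict[str, str]) -> str:
    """Category A2: build one SPARQL query from selected attribute/value pairs of one event."""
    patterns = [
        f"  ?company {predicate_for[attr]} {sparql_literal(event[attr])} ."
        for attr in selected
    ]
    return "\n".join([
        "PREFIX : <http://example.org/kb#>",
        "SELECT ?p ?o WHERE {",
        *patterns,
        "  ?company ?p ?o .",  # fetch everything the KB knows about the match
        "}",
    ])
```

For the event (Name, "VOW"), (Price, 20.24), (Volume, 8,835) with only `Name` selected, the generated query contains the pattern `?company :hasStockSymbol "VOW" .`.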
Category A3 - Generated SPARQL from Multiple Events: This category is similar
to A2, but the SPARQL query is generated from multiple events. A single
SPARQL query is generated from two or more events within a data window (e.g.,
a sliding time window): event processing waits until it has received some
new events, then generates a SPARQL query based on the emitted events and
queries the background knowledge about them.
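A minimal sketch of category A3, assuming events are (timestamp, stock symbol) pairs and the same hypothetical namespace and `:hasStockSymbol` predicate as before. All symbols seen in the time window are folded into one query through a SPARQL 1.1 `VALUES` block, and the agent keeps waiting (returns `None`) until the window holds at least `min_events` events.

```python
def quote(text: str) -> str:
    """Render a string as a SPARQL string literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def window_query(events, now, width, min_events=2):
    """Category A3: one SPARQL query for all events in the time window (now - width, now]."""
    in_window = [symbol for ts, symbol in events if now - width < ts <= now]
    if len(in_window) < min_events:
        return None  # not enough events yet; keep waiting
    values = " ".join(quote(s) for s in sorted(set(in_window)))
    return (
        "PREFIX : <http://example.org/kb#>\n"
        "SELECT ?company ?p ?o WHERE {\n"
        f"  VALUES ?symbol {{ {values} }}\n"
        "  ?company :hasStockSymbol ?symbol .\n"
        "  ?company ?p ?o .\n"
        "}"
    )
```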
Category B - Several SPARQL Queries: Rules in this category include several
SPARQL queries and combine them with event detection rules. In other words,
several category A rules are combined to build a category B rule. Category B
rules combine results from KBs with events using the event operation algebra.
Category B1 - Several SPARQL Queries in AND, OR and SEQ Operations:
Category B1 refines category B: the results of the SPARQL query predicates
are combined with AND, OR, SEQ, or similar event algebra operations. The
whole query is evaluated on sliding windows of event streams. The SPARQL
query predicates are independent of each other, i.e., the results of one
SPARQL predicate are not used in another.
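Because B1 predicates do not depend on each other, they can be evaluated in any order or concurrently. A minimal sketch, assuming each predicate is a hypothetical callable standing in for a SPARQL ASK query on one event; only AND and OR are shown, SEQ over a window is omitted.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

AskQuery = Callable[[dict], bool]  # hypothetical stand-in for a SPARQL ASK query


def evaluate_b1(event: dict, predicates: list[AskQuery], combine=all) -> bool:
    """Category B1: independent KB predicates, evaluated concurrently, then combined.

    `combine` is `all` for AND and `any` for OR.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(predicates))) as pool:
        results = list(pool.map(lambda ask: ask(event), predicates))
    return combine(results)
```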
Category B2 - Chaining SPARQL Queries: In category B2, several SPARQL queries
are generated and executed in sequence. Each can be generated based on the
results of the previous SPARQL query. Each SPARQL query can also be generated
from a set of events (e.g., the events in a slice of the event stream selected
by a sliding, count-based, or time-based window). Different data windows can
thus be defined that wait until certain events have occurred before a SPARQL
query is executed. SPARQL queries can be arranged in a chain, and each
result is either used directly for event processing or passed to the next
SPARQL query in the chain.
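A minimal two-step chain, assuming a hypothetical in-memory KB where each lookup stands in for one SPARQL query. The first query selects, among the symbols seen in a window, the companies in a reconstruction phase; the second query is built from that result and asks for their parent companies. When the first step returns nothing, the chain stops and the second KB round trip is skipped.

```python
# Hypothetical KB content; in practice each lookup below would be a SPARQL query.
KB = {
    ("OPEL", "isIn", "reconstructionPhase"),
    ("OPEL", "belongsTO", "GM"),
    ("VOW", "isIn", "growthPhase"),
}


def subjects_with(predicate, obj, candidates):
    """Query 1 stand-in: which candidate subjects have (predicate, obj)?"""
    return {s for s, p, o in KB if p == predicate and o == obj and s in candidates}


def objects_of(subjects, predicate):
    """Query 2 stand-in: objects of `predicate` for the given subjects."""
    return {o for s, p, o in KB if p == predicate and s in subjects}


def b2_chain(window_symbols):
    """Category B2: the second query is built from the first query's results."""
    step1 = subjects_with("isIn", "reconstructionPhase", set(window_symbols))
    if not step1:
        return step1, set()  # chain stops early; no second KB round trip
    step2 = objects_of(step1, "belongsTO")
    return step1, step2
```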
Category B3 - Chained and Combined SPARQL Queries: In this category, SPARQL
queries are used in combination with all possible event algebra operations,
such as AND (∧), OR (∨), SEQ (⊕), Negation (¬), etc. The event operations are
used to combine the results of several SPARQL queries, or several SPARQL
queries are used in combination with event algebra operations, e.g.:
((sparql1 ⊕ sparql2) ∧ sparql3 ∨ ¬sparql4)
    This category of event query rules is the general form of queries and has
the highest possible complexity, because the results from external KBs are
used in combination with event operations, or the attribute/value pairs of
incoming events are used to generate complex SPARQL queries.
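A minimal evaluator for the B3 example rule, under several assumptions: ∧ binds tighter than ∨, so the rule is read as ((sparql1 ⊕ sparql2) ∧ sparql3) ∨ ¬sparql4; ⊕ is read as SEQ; each SPARQL query contributes the time at which its result became true in the current window; and a negation can only be confirmed at the end of the window (`now`). The tuple encoding of expressions is hypothetical.

```python
from typing import Optional

# Expressions are nested tuples:
#   ("q", name) | ("not", e) | ("and", e1, e2) | ("or", e1, e2) | ("seq", e1, e2)
# `satisfied` maps a SPARQL query name to the time its result became true in
# the current window; a missing name means "not satisfied".


def evaluate(expr, satisfied: dict[str, float], now: float) -> Optional[float]:
    """Return the detection time of `expr`, or None if it is not satisfied."""
    op = expr[0]
    if op == "q":
        return satisfied.get(expr[1])
    if op == "not":
        # Absence is only certain once the window closes.
        return now if evaluate(expr[1], satisfied, now) is None else None
    left = evaluate(expr[1], satisfied, now)
    right = evaluate(expr[2], satisfied, now)
    if op == "and":
        return max(left, right) if left is not None and right is not None else None
    if op == "or":
        times = [t for t in (left, right) if t is not None]
        return min(times) if times else None
    if op == "seq":
        ok = left is not None and right is not None and left < right
        return right if ok else None
    raise ValueError(f"unknown operator {op!r}")


# ((sparql1 SEQ sparql2) AND sparql3) OR NOT sparql4
RULE = ("or",
        ("and", ("seq", ("q", "sparql1"), ("q", "sparql2")), ("q", "sparql3")),
        ("not", ("q", "sparql4")))
```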
5   Integration and Processing of Event Stream with KB
Based on the categories of event query rules discussed above, different event
processing approaches are possible to satisfy the requirements of an event
processing agent (EPA), e.g., high performance, scalability, and elasticity.
The following processing approaches are discussed:
Polling the Knowledge Base: The basic approach is to execute a query on the
KB for each incoming event. After events are emitted and received by the EPA,
the EPA sends one or more queries to the KB for every event. The scalability
and latency problems of this approach make it unsuitable for time-sensitive
use cases like algorithmic trading or fraud detection systems.
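A minimal sketch of polling, assuming a hypothetical `kb_lookup` callable that performs one blocking KB round trip. The helper beside it is simple arithmetic: for a single sequential agent that ignores every other cost, throughput cannot exceed 1000 / (latency in ms × queries per event) events per second, so an assumed 20 ms round trip caps the agent at 50 events per second.

```python
from typing import Callable, Iterable, Iterator


def polling_epa(events: Iterable[dict], kb_lookup: Callable[[str], set]) -> Iterator[tuple]:
    """Polling: the EPA sends a KB query for every incoming event."""
    for event in events:
        knowledge = kb_lookup(event["Name"])  # blocking round trip per event
        yield event, knowledge


def max_events_per_second(kb_latency_ms: float, queries_per_event: int = 1) -> float:
    """Upper bound on throughput of one sequential EPA, ignoring all other costs."""
    return 1000.0 / (kb_latency_ms * queries_per_event)
```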
Knowledge Query First (KQF): For some rule categories, the SPARQL query can
be executed in advance, independently of the live event stream, i.e., before
the events are emitted to the system. The results of the knowledge queries
can be cached in main memory and processed together with the events.
However, the cached results may become outdated relative to the knowledge
base, so they should be refreshed from time to time, e.g., by re-executing
the whole query or by pushing result differences to the event processing
agent.
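A minimal KQF cache, assuming a hypothetical `run_query` callable for the knowledge query and a time-to-live policy for re-executing it. `apply_diff` models the push-based alternative in which the KB sends added and removed results. The clock is injectable so the refresh logic can be tested without waiting.

```python
import time
from typing import Callable


class KnowledgeCache:
    """Knowledge Query First: run the KB query before the stream, keep the result in memory."""

    def __init__(self, run_query: Callable[[], set], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._run_query = run_query
        self._ttl = ttl_seconds
        self._clock = clock
        self._refresh()  # executed before any event arrives

    def _refresh(self) -> None:
        self._result = set(self._run_query())
        self._loaded_at = self._clock()

    def lookup(self) -> set:
        """Serve events from memory; re-run the whole query once the result is older than the TTL."""
        if self._clock() - self._loaded_at >= self._ttl:
            self._refresh()
        return self._result

    def apply_diff(self, added: set, removed: set) -> None:
        """Push-based alternative: apply result differences sent by the KB."""
        self._result = (self._result - removed) | added
```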
Plan-Based Processing (PBP): This approach processes an event query
according to an optimal plan for its sub-queries, avoiding unnecessary costs
and delays. Some rule categories, such as category B1, contain several SPARQL
queries that acquire knowledge from external KBs in multiple steps. These
SPARQL queries are combined with AND, OR, SEQ, or similar event operations,
and the whole query should be evaluated within a time window. The SPARQL
queries can therefore be executed sequentially, one after another, or in
parallel. Candidate execution plans can be generated and compared to find out
which plan has the lowest cost and which can be considered the highest
performance plan.
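One possible cost model for an AND of independent SPARQL sub-queries, sketched under assumptions: each sub-query has an estimated cost and a probability of returning true (both hypothetical inputs a planner would have to estimate), a sequential plan stops at the first false result, and a parallel plan always runs everything. Under this model, ordering by c / (1 - p) minimizes the expected sequential cost, which follows from an adjacent-swap argument; the brute-force function checks this for small inputs.

```python
from itertools import permutations

# Each sub-query of an AND rule is (cost_ms, probability_result_is_true).


def sequential_cost(plan):
    """Expected cost of running the sub-queries in order, stopping at the first false one."""
    cost, reach = 0.0, 1.0
    for c, p in plan:
        cost += reach * c  # this query runs only if all earlier ones were true
        reach *= p
    return cost


def best_sequential(queries):
    """Order by c / (1 - p): optimal for independent predicates under this cost model."""
    return sorted(queries, key=lambda q: q[0] / (1 - q[1]) if q[1] < 1 else float("inf"))


def brute_force_best(queries):
    """Exhaustive check over all orders (only feasible for a few sub-queries)."""
    return min(sequential_cost(plan) for plan in permutations(queries))


def parallel_latency(queries):
    """Running all sub-queries at once: latency is the slowest one, work is the sum."""
    return max(c for c, _ in queries)
```

With sub-queries (10 ms, 0.9), (5 ms, 0.1), and (20 ms, 0.5), the best sequential order has an expected cost of 7.5 ms but a worst case of 35 ms, while the parallel plan always costs 35 ms of work at 20 ms latency. This illustrates why the lowest-cost plan and the highest-performance plan can differ.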
Event Query Preprocessing (EQPP): In EQPP, the complex query is preprocessed
before it is executed against the incoming stream of event data. The
original complex event query Qa is preprocessed using a KB and rewritten into
a set of new simple event queries {q1, ..., qn}. Here, a simple query is one
that can be processed using only information from the event stream, with no
need for background knowledge. In the next step, these new queries can be
processed syntactically on a network of event processing agents. The complex
query Qa can be considered a propositional formula that can be converted to
conjunctive normal form (CNF) Qa ← q1 ∧ ... ∧ qn, i.e., if all of the simple
queries are satisfied, then the complex event query is satisfied. The
preprocessing is done by a processing agent that can access the KB and
divide the complex query into several simple queries. The complex query Qa
can also be mapped to disjunctive normal form (DNF) Qa ← q1 ∨ ... ∨ qn, i.e.,
when any one of the simple queries is triggered, the complex event query is
satisfied and triggered.
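A minimal sketch of EQPP, assuming Qa is "a rising price of any company with a given KB property" and that the KB answer (a set of stock symbols) has been computed once, before streaming. Each resulting `SimpleQuery` needs only stream data and could be deployed on a separate agent. The event fields `Name` and `Change` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleQuery:
    """A query answerable from stream data alone, without KB access."""
    symbol: str

    def triggered(self, event: dict) -> bool:
        # Hypothetical event fields: "Name" and "Change" (price change).
        return event["Name"] == self.symbol and event["Change"] > 0


def preprocess(kb_matches: set[str]) -> list[SimpleQuery]:
    """Rewrite Qa into q1..qn using the KB answer computed before streaming."""
    return [SimpleQuery(s) for s in sorted(kb_matches)]


def dnf_triggered(queries: list[SimpleQuery], event: dict) -> bool:
    """Qa <- q1 v ... v qn: any single simple query firing triggers Qa."""
    return any(q.triggered(event) for q in queries)


def cnf_satisfied(queries: list[SimpleQuery], window: list[dict]) -> bool:
    """Qa <- q1 ^ ... ^ qn: every simple query must fire within the window."""
    return all(any(q.triggered(e) for e in window) for q in queries)
```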

6     Related Work
State-of-the-art approaches to event processing can be divided into two
categories: rule-based approaches and non-rule-based approaches. Some event
processing systems, like Cayuga [4] or Esper³, use non-deterministic finite
state automata. Many event processing languages have been proposed, such as
Snoop [6], the Cayuga Event Language (CEL) [4], and XChangeEQ [5]. Several
data stream processing systems, such as TelegraphCQ [8], have also been
proposed; they target continuous queries over high-throughput data streams.
These systems are also related to event processing systems [7].
    Some stream reasoning languages and processing approaches have also been
proposed. Barbieri et al. propose Continuous SPARQL (C-SPARQL) [3] as a
language for continuous query processing and stream reasoning. Stream
reasoning approaches for reasoning on RDF streams are not designed for fusing
background KBs with event streams. One of the recent rule-based systems is
ETALIS [2], a rule-based stream reasoning and complex event processing (CEP)
system. ETALIS is implemented in Prolog and uses the Prolog inference engine
for event processing. EP-SPARQL [1] is a language for complex events and
stream reasoning; its formal semantics follows the same lines as that of
SPARQL. EP-SPARQL can be used in ETALIS for reasoning on RDF triple streams
(an event stream can be mapped to an RDF stream). I have discussed the CEP
approaches most closely related to our knowledge-based CEP.

³ Esper: http://esper.codehaus.org, May 2012

7    Future Work
My next steps are to elaborate the knowledge representation for events,
situations, actions, and other related concepts. One of my tasks is to work
out the details of event query preprocessing algorithms that rewrite complex
event queries into several simple queries which can be distributed over an
event processing network. Furthermore, I will work on the plan-based approach
described above and specify heuristics for selecting the optimal processing
plan for a given query.
References
1. Darko Anicic, Paul Fodor, Sebastian Rudolph, and Nenad Stojanovic. EP-SPARQL: a
   unified language for event processing and stream reasoning. In Proceedings of the
   20th international conference on World wide web, WWW ’11, pages 635–644, New
   York, NY, USA, 2011. ACM.
2. Darko Anicic, Paul Fodor, Sebastian Rudolph, Roland Stühmer, Nenad Stojanovic,
   and Rudi Studer. ETALIS: Rule-based reasoning in event processing. In Sven Helmer,
   Alexandra Poulovassilis, and Fatos Xhafa, editors, Reasoning in Event-Based Dis-
   tributed Systems, volume 347 of Studies in Computational Intelligence, pages 99–124.
   Springer Berlin / Heidelberg, 2011.
3. Davide Francesco Barbieri, Daniele Braga, Stefano Ceri, and Michael Grossniklaus.
   An execution environment for C-SPARQL queries. In Proceedings of the 13th
   International Conference on Extending Database Technology, EDBT ’10, pages 441–452,
   New York, NY, USA, 2010. ACM.
4. Lars Brenna, Alan Demers, Johannes Gehrke, Mingsheng Hong, Joel Ossher,
   Biswanath Panda, Mirek Riedewald, Mohit Thatte, and Walker White. Cayuga:
   a high-performance event processing engine. In SIGMOD ’07: Proceedings of the
   2007 ACM SIGMOD international conference on Management of data, pages 1100–
   1102, New York, NY, USA, 2007. ACM.
5. François Bry and Michael Eckert. Rule-based composite event queries: The language
   XChangeEQ and its semantics. In Proceedings of First International Conference on
   Web Reasoning and Rule Systems, Innsbruck, Austria (7th–8th June 2007), volume
   4524 of LNCS, pages 16–30, 2007.
6. S. Chakravarthy and D. Mishra. Snoop: an expressive event specification language
   for active databases. Data Knowl. Eng., 14:1–26, November 1994.
7. Sharma Chakravarthy and Qingchun Jiang. Stream Data Processing: A Quality
   of Service Perspective Modeling, Scheduling, Load Shedding, and Complex Event
   Processing. Springer Publishing Company, Incorporated, 1st edition, 2009.
8. Sirish Chandrasekaran, Owen Cooper, Amol Deshpande, Michael J. Franklin,
   Joseph M. Hellerstein, Wei Hong, Sailesh Krishnamurthy, Samuel R. Madden, Fred
   Reiss, and Mehul A. Shah. TelegraphCQ: continuous dataflow processing. In
   Proceedings of the 2003 ACM SIGMOD international conference on Management of
   data, SIGMOD ’03, pages 668–668, New York, NY, USA, 2003. ACM.