=Paper= {{Paper |id=Vol-2361/short8 |storemode=property |title=None |pdfUrl=https://ceur-ws.org/Vol-2361/short8.pdf |volume=Vol-2361 |dblpUrl=https://dblp.org/rec/conf/benevol/HubrechtsKRWSRM18 }} ==None== https://ceur-ws.org/Vol-2361/short8.pdf
           Mete: a Meta Rete Interface for Distributed
                     Rule-based Systems
        Maarten Hubrechts∗ , Kennedy Kambona† , Thierry Renaux‡ , Simon Van de Water§ , Mathijs Saey¶ ,
                                 Coen De Rooverk and Wolfgang De Meuter∗∗

                                        Software Languages Lab, Vrije Universiteit Brussel
                                                       Brussels, Belgium

                   Email: ∗ maarten.hubrechts@vub.be, † kennedy.kambona@vub.be, ‡ thierry.renaux@vub.be,
        § simon.van.de.water@vub.be, ¶ mathijs.saey@vub.be, k coen.de.roover@vub.be, ∗∗ wolfgang.de.meuter@vub.be



   Abstract—In Complex Event Processing (CEP), continuous                One way of implementing Complex Event Processing is by
streams of information are analyzed to deduce useful insights         using a rule-based approach [2]. In rule-based CEP systems,
from the incoming data. CEP can be implemented using rule-            declarative rules are used to identify complex events. Declar-
based systems. These are systems which are used to act upon
states in which an object domain can be. It does so by providing      ative rules are combinations of a pattern and a consequent;
facts that represent that object domain and rules that describe       when input data matches the pattern, the consequent should
when certain actions must happen. In order to determine this, a       be executed. In order to determine which rules are matched
rule interpreter is used to find the rules that match the available   by the available facts, a rule interpreter is needed [3].
facts. A widely used example of such an interpreter is the               A widely used example of such an interpreter is the Rete
Rete algorithm. However, since Rete was first introduced in the
seventies, changes to the algorithm are needed to make it relevant    algorithm [4]. Rete is used to efficiently match many facts to
to today’s standards in CEP. One of those changes is the support      many patterns. However, the traditional algorithm was devel-
for deployment in a cloud setting or on a computer cluster to         oped for single-threaded use. This means that the algorithm
enhance the processing performance. Previous research describes       does not specify how to take advantage of the additional
a distributed variant of the Rete algorithm which distributes the     compute power of multi-core computers or computer clusters.
workload across a computer cluster. Inescapably, distributing
the work over different machines introduces additional meta-          This problem was tackled by [5] who adapted the algorithm
concerns such as load balancing and fault tolerance.                  to operate on a machine consisting of a large number of
   In this paper, we claim that the complexity of maintaining         processing elements, each with its own local storage. Another
CEP systems can be reduced by disentangling these meta-               extension based on this by [6], introduced the distributed
concerns from the base application. We introduce a meta Rete          rule-based CEP system CloudParte. Such distributed variants
system called Mete. Mete reasons about, and intercedes in, a
distributed base Rete network. Mete allows programmers to add         of Rete made it possible to scale up rule processing by
and modify meta-concerns such as auditing, fault tolerance, load      distributing the work over different machines.
balancing, or logging, without requiring any modification to the         However, a distributed variant of the Rete algorithm comes
base application code.                                                with a new set of challenges. Examples of such challenges
   Index Terms—CEP, Meta Programming, Rete, Rule-based                are making the algorithm resistant to partial failures of the
Systems
                                                                      computer cluster and implementing a load balancing technique
                                                                      to evenly distribute the work over the different machines.
                       I. I NTRODUCTION
                                                                      These challenges are all meta-concerns. Ideally, implementing
   In today’s world enterprises continuously generate a sub-          and modifying solutions to these challenges should be done in
stantial amount of data. This leads to never ending streams           such a way that the application logic is left unaffected. By dis-
of information. Yet, the true potential of having this data is        entangling the concerns, adaptive and perfective maintenance
only reached when it is analysed; processing data allows en-          [7] could be significantly simplified.
terprises to gain useful insights. This is where Complex Event           So far, there is no abstract and general way to tackle
Processing, or CEP, comes into play. CEP is a method of               the meta-concerns of a distributed rule-based CEP system.
analyzing continuous streams of data and deriving conclusions         While the declarative nature of rule-based systems naturally
based on them [1]. The data itself concerns actions that happen       lends itself to disentangling the meta-concerns from the base
in the scope of the enterprise. These data points are called          application logic, current rule based systems impose a hard
events. A combination of events is called a complex event.            line between the declarative rules and their implementation.
Complex events are typically some situation, opportunity or           While changes to the application logic require rewriting the
threat to the enterprise that must be acted upon. Hence, when         declarative rules, user demands for enhancements and ex-
complex event gets detected, a certain action is executed as a        tensions to the rule-based system require rewriting elements
consequence.                                                          of the rule-based system’s core implementation. In practice
this entails that modifications are implemented by writing               modifications to its core implementation. This is cumbersome
software in the rule-based system’s implementation language.             and error-prone, and discourages maintenance and adaptation.
For far-reaching concerns such as distribution, fault tolerance,            In order to facilitate evolving CloudParte’s code base, we
and load balancing, this requires the programmer to fully                introduce a meta Rete system called Mete. Mete extends a
understand the entire code base of the rule-based system.                CloudParte system by providing a secondary Rete network.
   We propose to instead tackle such meta-concerns using                 In this secondary Rete network, meta-concerns of the base
a meta Rete protocol. Through such a meta Rete protocol,                 Rete network are reified as logical facts. Logical rules in this
programmers can introspect the workings of a rule-based                  secondary Rete network hence reasons about the base Rete
system, and intercede in its workings. Crucially, the meta Rete          network, effectively constituting a meta Rete network. We refer
protocol is exposed to the rule-based language as logical facts          to the rules of the meta Rete network as meta rules. The facts
and actions which can be invoked from the consequent of a                in the meta Rete network are called meta facts.
rule matching.                                                              More concretely, the meta facts describe how the distributed
   We developed a prototypical meta Rete system called Mete.             base Rete network is structured. For example, they define
Mete provides an abstract and general way to make extensions             which nodes are present in the base system and on which ma-
to a distributed CEP system based on the Rete algorithm. The             chine they are located. The meta Rete network is responsible
novelty of Mete is that it introduces a second Rete network,             for spawning the base Rete network, restoring failed nodes,
called the meta Rete network. The meta Rete network reasons              and relocating nodes to balance the load.
about the distributed base Rete network. This enables us to                 When Mete initializes, it first spawns the meta Rete network
write meta rules which can make fundamental changes to the               on a single machine, called the master machine. Then, the
base Rete network in a declarative manner, which reduces                 distributed base Rete program is parsed and compiled into
the burden of maintenance and configuration. Tackling other              meta facts representing it. Next, the meta facts are passed to
meta-concerns, or modifying the existing solutions, can be               the meta Rete network. Since the initial situation is merely an
done without having to touch — nor know the details of —                 exceptional case of a valid configuration — all nodes of the
either the base application logic or the rule-based system’s             base Rete are present but none are scheduled on a machine —
implementation.                                                          the meta Rete can perform its function by spawning the base
   Section II describes the architecture of Mete. In section IV          Rete network on the different machines.
we sketch an evaluation of our prototype. In order to validate              Not all meta rules are devoted to spawning the base Rete
our system, we implement the meta-concern of fault tolerance             network. Some meta rules can, e.g., implement checkpointing
using only declarative meta-rules. To verify that the system             strategies. In general, meta rules provide solutions to any user
is indeed fault tolerant, we run it with and without induced             requirement that might present themselves in the future.
failures, and confirm that the results computed by the base                 Summarized, the system contains two active Rete networks,
Rete are unaffected by partial failures of the computer cluster          shown in fig. 1:
it runs on.                                                                 1) The first network is the meta Rete network which has the
                                                                                distributed base Rete network as its object domain. Meta
                      II. A RCHITECTURE                                         rules allow to describe situations that the distributed base
                                                                                Rete network can be in. These rules make it possible to
   The Rete algorithm was initially designed for single-                        influence and extend the behavior of that base network.
threaded use. However, these days we are dealing with data on                   This means that the base Rete network’s behavior can
too large a scale too tackle that way. Much more processing                     be adapted according to new requirements, which means
power can be brought to bear by running workloads on                            that the algorithm can evolve without knowing a priori
multiple cores or even multiple machines. This is why variants                  what these requirements are.
of Rete were introduced that can be parallelized [8]–[12]                   2) The second network is the base Rete network. The
or distributed across a computer cluster [5], [6]. All those                    base Rete network reasons about a specific application
systems must take care of the additional challenges that present                domain, e.g., a supermarket’s customers and their pur-
themselves in a parallel or distributed context. For instance, the              chases. The base Rete network can be distributed across
algorithm must be made resource-aware. For distributed cases,                   multiple physical machines, in which case we refer to it
the algorithm must be made resistant against partial failures                   as the distributed base Rete network.
of the hardware.
   Our work expands on the distributed Rete system Cloud-                                III. A M ETA R ETE I NTERFACE
Parte [6]. CloudParte splits up a Rete network and spreads                  The meta Rete network implements a specific meta Rete in-
the Rete nodes among multiple machines that are part of the              terface. This interface specifies which constructs are available
computer cluster CloudParte runs on. CloudParte provides the             to user-defined meta rules. User-defined meta rules provide a
means for asynchronously exchanging messages between the                 way for application developers to extend and configure the
Rete nodes. However, CloudParte’s solution to the challenges             CEP system. They reason about meta facts, i.e., facts that
of a distributed setting are hard-coded. Extending its capa-             represent the state and structure of the distributed base Rete
bilities, or changing the trade-offs that were made, requires            network.



                                                                     2
                                                                          1   Architecture
                                                                          2   [Node (id: NodeId)]
                                                                          3   [AlphaNode (id: NodeId)]
                                                                          4   [BetaNode (id: NodeId)]
                                                                          5   [NodeF (node_id: NodeID, f: Fct]
                                                                          6   [NodeInputs (node_id: NodeId, sides: Sides)]]
                                                                          7   [Edge (from_id: FromNodeId, to_id: ToNodeId,
                                                                          8          side: Side)]
                                                                          9   [NodePid (node_id: NodeID, node_pid: NodePid)]
                                                                      10      [ManagerPid (manager_id: ManagerId,
                                                                      11                   manager_pid: ManagerPid)]
                                                                      12      [Location (node_id: NodeId, manager_id: ManagerID)]
                                                                      13
                                                                      14      Timing
                                                                      15      [TimerElapsed (timer_id: TimerId)]
                                                                      16
                                                                      17      Info
      Fig. 1. Schematic representation of the architecture of Mete.   18      [LoadCheck (id: NodeId, output_log_size: OutputLogSize)]
                                                                      19
                                                                      20      Snapshotting
                                                                      21      [Snapshot (id: NodeId, timestamp: Timestamp,
                                                                      22                 sides: Sides, output_log: OutputLog)]
                                                                      23      [AgendaSnapshot (timestamp: Timestamp,
   The meta interface consists of two parts. First, there are the     24                       activations: Activations)]
meta facts produced by the rule engine. These can be matched          25
                                                                      26      Wildcard
as logical facts by the user-defined meta rules. Second, there        27      [_ ()]
are the meta facts that are understood by the rule engine.                    Listing 1. Patterns of meta facts that can be matched in the patterns of user-
These can be asserted into — or retracted from — the rule                     defined meta rules.
engine’s knowledge base by user-defined meta rules. The                      The meta facts by which the rule engine’s state can be
former enables the meta-rules to respond to the state of the              modified, are shown in Listing 2. To modify the state, meta
rule engine. The latter enables meta-rules to modify the state            facts matching these patterns can be asserted into the rule
of the rule engine.                                                       engine’s knowledge base — or retracted from the knowledge
                                                                          base. There are three categories of meta facts by which the
   The meta facts by which the state of the rule engine can
                                                                          rule engine’s state can be modified:
be inspected, are shown in Listing 1. They are divided in five
different categories:                                                     Timing meta facts make or remove timers. When a Re-
                                                                               questTimer meta fact gets asserted a timer is created.
                                                                               Whenever the specified time is up, a TimerElapsed meta
Architecture patterns match meta facts which represent the                     fact (described previously) is asserted into the knowledge
    structure of the distributed base Rete network. For ex-                    base.
    ample, line 2 shows a pattern of a Node meta fact.                    Info meta facts serve a similar role, but instead of requesting
    This pattern matches every Rete node that is part of                       timer events, they request state-introspection meta facts
    the distributed Rete network, since each such node is                      from the Info category.
    represented in the knowledge base by a Node meta fact.                Snapshotting meta facts request different kinds of snapshots
    Another example is the pattern describing Edge meta                        to be made. Line 8 shows the meta fact RequestSnapshot
    facts, shown on lines 7 and 8. This pattern matches the                    that when asserted, fires a built-in meta rule that will
    meta facts of all edges that connects two Rete nodes of                    assert a Snapshot meta fact (described previously). Line 9
    the distributed base Rete network.                                         shows the RequestMetaSnapshot that when asserted, fires
Timing patterns match TimerElapsed meta facts. The pres-                       built-in meta rules that take a snapshot of the meta Rete
    ence of a TimerElapsed meta fact in the knowledge base                     network itself, and stores it to persistent storage.
    indicates that a certain timers threshold has elapsed.
                                                                          1   Timing
Info patterns match meta facts containing additional infor-               2   [RequestTimer (timer_id: TimerId, ms: Ms)]
    mation about a base Rete node that is not related to                  3
                                                                          4   Info
    its structure. A concrete example are LoadCheck meta                  5   [RequestLoadCheck (node_id: NodeId)]
    facts. These communicate the amount of facts that the                 6
                                                                          7   Snapshotting
    corresponding base Rete node forwarded to its succes-                 8   [RequestSnapshot (node_id: NodeId)]
    sor(s). Info meta facts can for example be useful for                 9   [RequestMetaSnapshot ()]
    implementing a load-balancing strategy.                                   Listing 2. Patterns of meta facts that can be asserted in the consequents of
Snapshotting patterns match meta facts that are related to                    user-defined meta rules.
    snapshotting [13] the state of the different Rete networks.
Wildcard patterns are all patterns that are automatically
    matched exactly once, when the system gets started.
    Wildcard patterns can be used to initialize components
    of the meta Rete network, or to initialize the distributed
    base Rete network when the system gets started.



                                                                      3
                                                                                    We introduce Mete, a meta Rete system that reasons about a
                                                                                 base Rete network by means of a set of special rules and facts.
                                                                                 The architecture of Mete contains two active Rete networks.
                                                                                 The distributed base Rete network is reasons about the object
                                                                                 domain of the application via facts and user-defined rules.
                                                                                 Mete then provides a meta Rete network — a non-distributed
                                                                                 network that reasons about the base Rete network using meta
                                                                                 facts and meta rules. The distributed base Rete network is
                                                                                 represented by meta facts, describing its state and structure.
                                                                                 Built-in meta rules are provided that can match these meta
                                                                                 facts. The meta rules describe the extensions to the behavior of
                                                                                 the distributed base Rete network. We validate our approach by
                                                                                 enforcing fault tolerance in a distributed rule-based system. By
                                                                                 introducing Mete, perfective maintenance is improved since
Fig. 2. Throughput (in activations per 5 second interval) with a simulated       it offers an abstract and general way of extending the Rete
failure of entire distributed Rete network.                                      algorithm.
                                                                                                               R EFERENCES
                          IV. E VALUTION                                          [1] David C. Luckham. The Power of Events: An Introduction to Complex
                                                                                      Event Processing in Distributed Enterprise Systems. Addison-Wesley
   In order to validate Mete, we implemented fault tolerance                          Longman Publishing Co., Inc., Boston, MA, USA, 2001.
as a set of user-defined rules, using only the meta Rete API                      [2] Adrian Paschke and Alexander Kozlenkov. Rule-based event processing
                                                                                      and reaction rules. In International Workshop on Rules and Rule Markup
from section III. Together, these meta rules ensure that any                          Languages for the Semantic Web, pages 53–66. Springer, 2009.
node involved in either the distributed base Rete network or                      [3] Frederick Hayes-Roth. Rule-based systems. Communications of the
the meta Rete network can fail without influencing what the                           ACM, 28(9):921–932, 1985.
                                                                                  [4] Charles L. Forgy. Expert systems. chapter Rete: A Fast Algorithm for
rule engine detects. For space constraints and to limit the                           the Many Pattern/Many Object Pattern Match Problem, pages 324–341.
scope of this paper — we introduce a meta Rete interface,                             IEEE Computer Society Press, Los Alamitos, CA, USA, 1990.
not fault tolerance — we do not list the exact set of meta                        [5] Michael A Kelly and Rudolph E Seviora. A multiprocessor architecture
                                                                                      for production system matching. In AAAI, pages 36–41, 1987.
rules by which we implemented fault tolerance. The gist is                        [6] Janwillem Swalens, Thierry Renaux, Lode Hoste, Stefan Marr, and
as follows: each situation in which a fault of a component                            Wolfgang De Meuter. Cloud parte: elastic complex event processing
would lead to a global failure is handled by one or more                              based on mobile actors. In Proceedings of the 2013 workshop on
                                                                                      Programming based on actors, agents, and decentralized control, pages
user-defined rules. These make sure that each node (Node)                             3–12. ACM, 2013.
is regularly (RequestTimer/TimerElapsed) persisted (Request-                      [7] Bennet P Lientz and E Burton Swanson. Problems in application
Snapshot/RequestMetaSnapshot), and respawn the nodes with                             software maintenance. Communications of the ACM, 24(11):763–769,
                                                                                      1981.
that state after a crash.                                                         [8] A. Gupta, C. Forgy, A. Newell, and R. Wedig. Parallel algorithms and
   We conducted an experiment where 5000 facts were sent                              architectures for rule-based systems. SIGARCH Comput. Archit. News,
to the base Rete network, which contains a rule which fires                           14(2):28–37, May 1986.
                                                                                  [9] Mostafa M. Aref and Mohammed A. Tayyib. Lana-match algorithm:
whenever one of those facts arrives. That base rule hence fires                       A parallel version of the rete-match algorithm. Parallel Computing,
exactly 5000 times. After a delay of five seconds, we induce a                        24:763–775, 1998.
failure in all the nodes of the distributed base Rete network. As                [10] Kemal Oflazer. Partitioning in parallel processing of production systems.
                                                                                      1987.
fig. 2 shows, the user-defined meta rules succeed in respawning                  [11] A Gupta, CL Forgy, D Kalp, A Newell, and M Tambe. Parallel ops5 on
the base Rete nodes. After the induced failure, no base rule                          the encore multimax. In Proc. Int’l Conf. on Parallel Processing, pages
fires for a period of fifty seconds. After this period, the rule                      271–280, 1988.
                                                                                 [12] Toru Ishida. Parallel rule firing in production systems. IEEE Transac-
gets fired again until all 5000 activations took place. This                          tions on Knowledge and Data Engineering, 3(1):11–17, 1991.
demonstrates that our system is robust against crashes of the                    [13] E. Gendelman, L. F. Bic, and M. B. Dillencourt. An application-
distributed base Rete network.                                                        transparent, platform-independent approach to rollback-recovery for mo-
                                                                                      bile agent systems. In Proceedings 20th IEEE International Conference
                                                                                      on Distributed Computing Systems, pages 564–571, April 2000.
                          V. C ONCLUSION
   In this paper we propose an abstract and general way of
extending the capabilities of rule-based systems running on the
Rete algorithm in a distributed setting. Previously, extending
the algorithm had to be done by manually adding language-
specific code directly into the implementation. However, this
approach is undesirable because it lacks the advantages that
stem from suitable software maintenance techniques such as
perfective maintenance.



                                                                             4