<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Series</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Transformation of Pipeline Stage Algorithms to Event-Driven Code</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Michal Brabec</string-name>
          <email>brabec@ksi.mff.cuni.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Bednárek</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Petr Malý</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Prague</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <volume>1214</volume>
      <fpage>13</fpage>
      <lpage>20</lpage>
      <abstract>
        <p>Many software systems publish their interface using event-driven programming, where the application programmer creates routines that are called in response to events. One such system is the Bobox parallel framework, where the elements of a parallel pipeline react to events signaling the arrival of input data. However, many algorithms are more efficiently described using the classical serial approach, where the application code calls system routines to wait for the required events. In this paper, we present a method of code transformation producing event-driven code from serial code, and we concentrate mostly on its first part, responsible for the management of variables. Besides a friendlier programming environment, the transformation often improves the structure of the code with respect to compiler optimizations.</p>
      </abstract>
      <kwd-group>
        <kwd>Method Inversion</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Many frameworks and libraries present their interface as
event-driven programming [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], where the user code reacts
to a set of events produced by the framework at runtime.
      </p>
      <p>Implementing an application in this fashion is not as
difficult in areas like user interface, where most complex
operations are performed by an application kernel and the
interface just presents the results.</p>
      <p>
        This approach is not very practical for complex
algorithms, where the event-driven design can significantly
increase complexity [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], making the implementation
difficult to maintain and test. The difference in complexity is
illustrated by a simple merge algorithm: the basic
implementation is in Listing 1 and the event-driven version is in
Listing 2.
      </p>
      <p>The merge algorithm takes two input streams and
merges them into a single output stream. The input
streams provide the data one value at a time: the method get()
returns the current value and the method next() obtains the next
value. The results are passed on to the output stream using
the put() method.
The merge algorithm, presented in Listing 1, merges two
ordered sets and then passes the results to the output
stream. The implementation is simple to understand and
test, but there is hidden event management in the code.
The method next() retrieves new data and, if there is
nothing in the input buffer, it waits. Here the application is
waiting for an event (the arrival of input data).</p>
      <p>while (left.has_data() &amp;&amp; right.has_data())</p>
      <p>
        Listing 1: Basic join implementation for Bobox
An event-driven implementation is usually less
comprehensible and it can be difficult to maintain or verify [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
at least in the case of complex algorithms. Listing
2 shows an event-driven implementation of the merge
algorithm. It is more complex than the simple version
presented in Listing 1 and it contains significant code
duplication. The implementation is further complicated by the
fact that we must check whether the current event is the event we
were waiting for.
      </p>
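      <p>The serial behaviour described above can be sketched as follows. This is a minimal sketch: InStream and OutStream are illustrative stand-ins for the Bobox stream interface (get(), next(), has_data(), put()) described in the text, not the real framework classes.</p>

```cpp
#include <deque>
#include <vector>
#include <cassert>

// Hypothetical stand-ins for the stream interface described in the text:
// get() returns the current value, next() advances, has_data() tests input.
struct InStream {
    std::deque<int> buf;
    bool has_data() const { return !buf.empty(); }
    int get() const { return buf.front(); }
    void next() { buf.pop_front(); }   // in Bobox this call may block
};
struct OutStream {
    std::vector<int> out;
    void put(int v) { out.push_back(v); }
};

// Serial merge of two ordered streams (the shape of Listing 1).
void merge(InStream& left, InStream& right, OutStream& o) {
    while (left.has_data() && right.has_data()) {
        if (left.get() <= right.get()) { o.put(left.get()); left.next(); }
        else                           { o.put(right.get()); right.next(); }
    }
    while (left.has_data())  { o.put(left.get());  left.next();  }
    while (right.has_data()) { o.put(right.get()); right.next(); }
}
```

      <p>The hidden event management sits inside next(): in the framework, the call blocks on an empty input buffer, which is exactly what the inversion later removes.</p>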
      <p>The event-driven implementation uses the method
notify_next() instead of next(). This method only
requests the framework to call the task again once new data
is available. The task does not wait for the data.</p>
      <p>We propose a special technique called method inversion
that removes waiting for events. Instead, the code reacts
to an event and it ends when the event is processed. The
transformation produces an algorithm with a behavior
similar to the code in Listing 2. In this paper we concentrate
mostly on the first part of the optimization, responsible for
data management, also called movement of variables. In
this first part, we must identify the currently used variables,
preserve their values and later restore them, so that the
computation can be safely interrupted and resumed.</p>
      <p>The rest of the paper is organized as follows:
Section 2 describes our motivation for this work along with
the main problems we try to solve; we also present an
overview of our development environment. In
Section 3 we put method inversion in the context of other
works related to Bobox and we discuss similar works
concentrating on parallel pipelines. Section 4 contains the
algorithm that implements method inversion. Section 5
concentrates on the effects the transformation has on the
entire pipeline and it summarizes the results of our research;
we also discuss possible ways to improve the presented
optimizations and similar future work.</p>
    </sec>
    <sec id="sec-2">
      <title>Motivation</title>
      <sec id="sec-2-1">
        <title>Parallel Pipelines</title>
        <p>
          Parallel pipelines represent a great way to utilize the full
capacity of contemporary computers [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], because they are
highly scalable and they offer a suitable framework for
many applications, including database operations [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
        <p>
          A parallel pipeline is a set of independent tasks
connected by data streams, where the tasks can communicate
only via passing messages and data through the streams.
The streams pass the output of one task to the next task
which uses it as input. Tasks can be run in parallel, with
each task processing different data [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
        <sec id="sec-2-1-4">
          <title>Figure 1: A pipeline of Source (left), Source (right), Merge and Output tasks</title>
          <p>Figure 1 shows a simple pipeline for merging data.
Source tasks produce input, the Merge task merges its two
input streams and the Output task stores the results. The
arrows represent data streams that transport data between
tasks.</p>
          <p>Parallel pipelines can be treated as event-driven
systems, where the main event is the arrival of input data.
Each task processes input data and then waits for new data,
basically reacting to new data as an event.</p>
          <p>Calling a method that waits for an event (new data) is
referred to as a blocking call in this paper, because the
task is blocked until there is new data available.</p>
          <p>
            One of the most important problems of pipelines is that
different tasks may process their data at
different rates, which may lead to a situation where parts of the
pipeline are engaged in complex computations while
others are waiting for input [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ]. It is necessary that the
waiting tasks stop their computations and release any system
resources that can be used by other tasks, but doing that
by hand can be difficult (Listing 2).
          </p>
          <p>We propose a way to automatically produce
event-driven code from serial code without any additional
information such as code annotations, attributes or special
comments. This allows users to develop simple tasks that
can be easily maintained. The compiler removes blocking
calls and transforms the code for the target environment.
We designed this transformation for the Bobox framework,
but it can be used for other environments.</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>Bobox</title>
        <p>
          In this paper we focus on parallel pipelines developed for
the Bobox framework. The Bobox framework provides
a runtime environment used to execute a general (even
non-linear) pipeline in parallel [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. The actual execution of
the pipeline is managed by the framework, which executes
tasks (called boxes) in parallel according to the pipeline
structure defined by the users.
        </p>
        <p>
          There are two ways to use the Bobox framework: users can
either define the pipeline by hand or they can use an
interface that produces the pipeline from other code, such as
a SPARQL query [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
        <p>We developed the method inversion to expose all the
blocking calls (waiting for data) so they can be removed.
This way we can optimize the separate tasks and improve the
efficiency of the entire pipeline.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Development Environment</title>
        <p>
          We designed a special development environment to
apply presented optimizations automatically. The
environment should simplify the implementation of the Bobox
boxes and we intend to allow programmers to design entire
Bobox pipelines in the environment [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. The structure of
the environment is in Figure 2, where the bold components
are part of the development environment.
        </p>
        <p>The boxes are implemented in the C# programming
language and they are compiled to a CIL assembly. The
optimized CIL code is then transformed to C++ source code
for the Bobox framework.</p>
      </sec>
      <sec id="sec-2-4">
        <title>Goals</title>
        <p>We have designed the method inversion to optimize tasks
in parallel pipelines and to simplify their development.
This way we can improve the pipeline efficiency while
making the development simpler at the same time.</p>
        <p>The second goal is to use the method inversion as a general
transformation that can simplify the development of many
applications.</p>
        <p>The last goal is to improve the structure of the Bobox tasks,
which might allow the compiler to apply more advanced
optimizations.</p>
        <p>Figure 2 (diagram) shows the following components: C# source code,
the Bobox C# interface library, the C# compiler, CIL code, the ParallaX
optimizer, optimized CIL code, the ParallaX C++ compiler, C++ source code,
a C++ compiler, the Bobox framework and the resulting Bobox application;
the ParallaX optimizer and the ParallaX C++ compiler are parts of the
ParallaX development environment.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Related Work</title>
      <sec id="sec-3-1">
        <title>Event-Driven Programming</title>
        <p>
          There are very few works concentrating on the
generation of event-driven programs from serial code. There
are works that discuss event-driven programming, but
they consider mostly applications already implemented as
event based [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. There are a few systems that try to
simplify the development of event-driven programs [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], but
they do not fully hide the events from the users; instead they
require special annotations or comments.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>Parallel Pipeline Optimizations</title>
        <p>
          Optimizations for parallel pipelines concentrate mostly
on task distribution and scheduling according to available
CPUs [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] and other system resources [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], because load
balancing has a major impact on the pipeline efficiency.
In the solution used in Bobox, tasks are distributed among
processors at the beginning of the pipeline execution [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]
and the system migrates them very carefully to minimize
communication overhead, which is especially useful in
distributed environments [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
        </p>
        <p>
          Most parallel pipeline environments leave task
optimization to the users and that is why there are not many
task-specific optimizations. There are development
environments that generate tasks or even pipeline structure
from user code [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] and these systems can optimize the
entire pipeline because they can influence its structure. Our
development environment is designed for a similar
purpose [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], but the current implementation concentrates on
separate boxes.
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>Bobox Optimizations</title>
        <p>
          Bobox optimizations concentrate mostly on the structure
and execution of the entire pipeline, leaving the task
optimization to the users. Most works concentrate on
task scheduling. There are general scheduling
optimizations [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] as well as special strategies based on
dataflow [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] or input structure.
        </p>
        <p>
          Bobox is used mostly for query processing and there are
many works presenting optimizations specific for different
query languages [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], but there are no optimizations for
the tasks produced by users.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Method Inversion</title>
      <p>The inversion transformation is used to expose the places
where a pipeline task is waiting for input data to arrive.
These so-called blocking calls represent a problem
because the task waits for data without doing
anything useful, thus blocking an execution unit (CPU).
This is a problem even when a task is waiting passively,
because then it is necessary to swap it for another task.</p>
      <p>The inversion requires three main steps. First, we must
back up all the required data so we can safely interrupt
and resume execution. Second, blocking calls must be
replaced by non-blocking methods that only notify the
framework without waiting. Third, it is necessary to
restructure the code so it is possible to safely resume the
execution.</p>
      <p>We present the last two steps in a simplified version, but
the first step is the main focus of this paper because it is
necessary for any implementation of the inversion and it
can be used for other optimizations.</p>
      <p>The inversion is illustrated on the simple code shown in
Listing 3: the code applies a function f() to all values
in an input stream and passes the results to an output
stream.
Our ultimate goal is to restructure the code to expose all
the blocking calls so we do not have to jump inside a
complex control flow to resume computations. Such an
algorithm would produce code that can be efficiently
parallelized and further optimized.</p>
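      <p>A minimal sketch of such a serial task follows; the placeholder f() and the containers standing in for the blocking streams are our illustrative assumptions, not the code of Listing 3 itself.</p>

```cpp
#include <deque>
#include <vector>
#include <cassert>

// Placeholder transformation; any pure function would do here.
int f(int x) { return 2 * x + 1; }

// Serial task in the shape described for Listing 3: apply f() to every
// input value and pass the result to the output. The containers stand in
// for the Bobox streams; in the framework, next() is the blocking call.
struct Task {
    std::deque<int> in;    // input stream stand-in
    std::vector<int> out;  // output stream stand-in
    void run() {
        while (!in.empty()) {              // has_data()
            out.push_back(f(in.front()));  // put(f(get()))
            in.pop_front();                // next() - blocking in Bobox
        }
    }
};
```

      <p>The loop contains no state machine at all; it is the inversion that later breaks it apart at the blocking call.</p>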
      <p>Results of such a transformation are presented in
Listing 4, while the original code is in Listing 3. The switch
statement manages the restoration of variables and then it
resumes the computation by jumping into the original code.
We use goto to minimize code duplication.</p>
      <p>void do_something()
{
    switch (event)
    {
    case INPUT_RECEIVED:
        ic = 0; goto loop;
    case OUTPUT_SENT:
        oc = 0; goto loop;
    }
loop:
    k = min(iSize - ic, oSize - oc);
    for (i = 0; i &lt; k; ++i)
        oBuf[oc + i] = f(iBuf[ic + i]);
    if (ic + i &gt;= iSize)
    {
        oc += k;
        return WAIT_FOR_INPUT;
    } else
    {
        ic += k;
        return WAIT_FOR_OUTPUT;
    }
}</p>
      <p>Listing 4: Product of the advanced inversion</p>
      <p>This transformation is still under development, but we
implemented a simpler version. It lacks certain advantages
of the advanced version, but it is easier to implement.</p>
      <p>The simple version does not move the blocking calls
outside of the loop; instead it simply jumps inside any
control flow after restoring all the necessary variables. The
result of this inversion algorithm is in Listing 5, where the
methods used for input and output are not blocking; they
just inform the system to send or receive data when there
are resources available.</p>
      <sec id="sec-4-1">
        <title>Movement of Variables</title>
        <p>The goal of our transformation is to release the thread
of execution during blocking operations. Releasing the
thread causes loss of the local storage associated with the
thread, namely the registers and stack. Therefore, we need
a backup storage for the local variables that are live
during a blocking operation. The backup storage will usually
consist of data members of a dynamically allocated class
object.</p>
        <p>switch (state)
{
case WAIT_FOR_INPUT:
    if (event != input) return;
    iCurr = 0; iSize = size; goto iLoop;
case WAIT_FOR_OUTPUT:
    if (event != output) return;
    oCurr = 0; oSize = size; goto oLoop;
}</p>
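      <p>The backup-storage idea can be illustrated with a small sketch; the member names here are ours, not the Bobox API. A variable that is live across a blocking return is loaded from and stored back into a data member of the box object:</p>

```cpp
#include <cassert>

// Sketch of local vs. backup storage. While step() runs, the work happens
// in the local variable ic (registers/stack); before the thread is released
// by returning, the value is stored (S) into the member, and it is loaded
// (L) again on the next invocation. Names are illustrative.
struct Box {
    int ic_backup = 0;       // backup storage: survives between invocations

    int step(int input) {
        int ic = ic_backup;  // L: load backup -> local storage
        ic += input;         // compute on fast local storage
        ic_backup = ic;      // S: store local -> backup before returning
        return ic;           // returning releases the thread
    }
};
```

      <p>The planner described next decides where such L and S operations must be inserted, and where they would be redundant.</p>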
        <p>Moving aggregate variables, i.e. arrays and structures,
is expensive or even impossible (due to aliasing); in most
cases, the best strategy is storing aggregate variables in the
backup storage for their complete lifetime.</p>
        <p>Unaliased scalar variables (including dissociated
structures) may be moved easily and their movement between
the backup storage and the local storage offers the
advantage of faster access to the local storage. The faster
access to the local storage is primarily given by the
ability of compilers to allocate registers for important
localstorage objects (i.e. local variables); furthermore, scalar
local-storage variables allow the compilers to apply many
advanced optimization steps while similar optimization
for backup storage members would require difficult
interprocedural analysis of code.</p>
        <p>The movement of unaliased scalar variables between
the backup storage and the local storage must be carefully
statically planned in order to achieve correctness and
efficiency. For the planning of movement, we developed the
static movement planner algorithm described in the
following paragraphs. The algorithm inserts the necessary
movement operations at appropriate places of the code; in
addition, it may minimize the number of movements at the
cost of duplication of code.</p>
        <p>Let GCFG = (VCFG, ECFG) be the control-flow graph of
a procedure, where VCFG are the individual statements and
ECFG the transitions between them. Let I ∈ VCFG denote
the initial node (entry point) of the control-flow graph and
F ⊆ VCFG denote the set of final nodes (return statements).
Each statement is either non-blocking or blocking.</p>
        <p>We will assume in the following paragraphs that there is
only one type of blocking operation and that there are no
parameters (nor return values) to this operation. The set of
blocking statements is denoted B ⊆ VCFG.</p>
        <p>The non-blocking statements include access to variables
and memory, calculations, and calls to non-blocking
procedures. For simplicity of the description, we will assume
that all write operations in a statement are ordered after
all read operations in the same statement. The relations
R,W ⊆ VCFG × X where X is the set of variables determine
which statements read/write which variables.</p>
        <p>Let x ∈ X be a scalar local variable with no alias. With
respect to x, the input code may contain only two
operations: read (Rx) and write (Wx). In the output code, we
keep the read/write operations working on the local
storage and we add two additional operations: store (Sx) for
copying the variable from the local storage to the backup
storage and load (Lx) for copying in the opposite direction.
The load/store operations are always inserted between the
original statements, i.e., they are attached to the edges of
the original control flow graph. Note that after an Sx or Lx
operation, both the local storage and the backup storage
contain identical copies of the variable.</p>
        <p>In the input code, the blocking statements do not
affect the local storage. In the inverted code, blocking
statements are broken into a pair of a return statement and an
additional entry point. The inversion therefore causes the
local storage to be destroyed by every B operation; it must be
substituted by the backup storage.</p>
        <p>The purpose of the static movement planner is to
insert L and S operations so that the semantics of the code
is not affected by the destruction of the local storage by
B operations. The simplest solution would be to insert
an S-L pair around every B operation; however, this is not
optimal with respect to the number of load/store
operations required.</p>
        <p>Essentially, the variable x is used to transport a value
from a W operation to an R operation, across a path in the
control-flow graph which contains no other W operations.</p>
      <p>If such a path contains a B operation, the transported
value must be saved by inserting S somewhere between the
W and the first B, and loaded back to the local storage using
an L operation inserted after any B followed by R.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Static Movement Planner</title>
        <p>The first phases of the algorithm determine the following
relations between the control-flow graph nodes and
the variables:</p>
        <p>NB, NL, N̄B, N̄L, PB, PL, P̄B, P̄L ⊆ VCFG × X</p>
        <p>For a statement s ∈ VCFG and a variable x ∈ X, ⟨s, x⟩ ∈
NB indicates that there is a control-flow path starting at s
which needs x located in the backup storage. Similarly,
⟨s, x⟩ ∈ NL indicates that there is a path where x is needed
in the local storage.</p>
        <p>The next two relations are complementary: ⟨s, x⟩ ∈ N̄B
means that there is a path starting at s where x is not needed
in the backup storage; similarly for N̄L and the local
storage.</p>
        <p>Furthermore, ⟨s, x⟩ ∈ PB indicates that there is a
control-flow path ending at s which leaves x present in the backup
storage; similarly for PL and the local storage. Finally,
P̄B and P̄L indicate the ends of paths where x is not present
in the backup or local storage, respectively.</p>
        <p>The sets NB, NL, N̄B, and N̄L are computed by the backward
propagation Algorithm 1.</p>
        <p>Algorithm 1 Backward propagation algorithm
Require: (VCFG, ECFG) control flow graph; F, B ⊆ VCFG; X
set of variables; R, W ⊆ VCFG × X
Ensure: NB, NL, N̄B, N̄L ⊆ VCFG × X
1: NB, NL, N̄B, N̄L := ∅
2: NB′ := ∅
3: NL′ := R
4: N̄B′ := W ∪ (F × X)
5: N̄L′ := W ∪ ((B ∪ F) × X)
6: while NB′ ∪ NL′ ∪ N̄B′ ∪ N̄L′ ≠ ∅ do
7: NB := NB ∪ NB′; NL := NL ∪ NL′
8: N̄B := N̄B ∪ N̄B′; N̄L := N̄L ∪ N̄L′
9: NB′, NL′, N̄B′, N̄L′ := ∅
10: for ⟨a, b⟩ ∈ ECFG do
11: NB′[a] := NB′[a] ∪ (NB[b] \ W[a] \ NB[a])
12: N̄L′[a] := N̄L′[a] ∪ (N̄L[b] \ R[a] \ N̄L[a])
13: if a ∈ B then
14: NB′[a] := NB′[a] ∪ (NL[b] \ W[a] \ NB[a])
15: else
16: NL′[a] := NL′[a] ∪ (NL[b] \ W[a] \ NL[a])
17: N̄B′[a] := N̄B′[a] ∪ (N̄B[b] \ N̄B[a])
18: end if
19: end for
20: end while</p>
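      <p>As a concrete (and deliberately naive) reading of Algorithm 1, the following sketch computes NB, NL and the barred sets by a fixed-point loop over the edges. Nodes are ints and variables are chars, and the primed increment sets of the pseudocode are replaced by a "did any relation grow" flag, which is equivalent at the fixed point; none of this is the authors' implementation.</p>

```cpp
#include <map>
#include <set>
#include <utility>
#include <vector>
#include <cassert>

// Relations map each CFG node to a set of variables; nB/nL stand for the
// barred relations N̄B/N̄L. Nodes are ints, variables are chars.
using VarSet = std::set<char>;
using Rel = std::map<int, VarSet>;

static VarSet diff(VarSet s, const VarSet& t) {
    for (char v : t) s.erase(v);
    return s;
}
// Insert s into r[n]; report whether anything new was added. Returning
// false for known elements implements the "\ NB[a]" terms implicitly.
static bool add(Rel& r, int n, const VarSet& s) {
    bool grown = false;
    for (char v : s) grown |= r[n].insert(v).second;
    return grown;
}

// Backward propagation (Algorithm 1) iterated to a fixed point.
void backward(const std::vector<std::pair<int,int>>& edges,
              const std::set<int>& B, const std::set<int>& F,
              const VarSet& X, Rel R, Rel W,
              Rel& NB, Rel& NL, Rel& nB, Rel& nL) {
    std::set<int> nodes;
    for (auto [a, b] : edges) { nodes.insert(a); nodes.insert(b); }
    for (int n : nodes) {                        // initial state, lines 2-5
        add(NL, n, R[n]);
        add(nB, n, W[n]);
        add(nL, n, W[n]);
        if (F.count(n)) { add(nB, n, X); add(nL, n, X); }
        if (B.count(n)) add(nL, n, X);
    }
    bool grown = true;                           // lines 6-20
    while (grown) {
        grown = false;
        for (auto [a, b] : edges) {
            grown |= add(NB, a, diff(NB[b], W[a]));      // line 11
            grown |= add(nL, a, diff(nL[b], R[a]));      // line 12
            if (B.count(a)) {
                grown |= add(NB, a, diff(NL[b], W[a]));  // line 14
            } else {
                grown |= add(NL, a, diff(NL[b], W[a]));  // line 16
                grown |= add(nB, a, nB[b]);              // line 17
            }
        }
    }
}
```

      <p>On the three-statement path W(x) → blocking → R(x), the planner learns that x is needed in the backup storage at the blocking statement, but not before the write.</p>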
        <p>The algorithm essentially enlarges the four relations
from the initial state until stable. The primed versions
of the relations represent the increments added to the
relations at lines 7 and 8. The initial state, formed at the
lines 2 to 5, reflects the fact that a read operation induces
immediate need for local storage while a write operation
or a return statement causes that the previous value is no
longer needed. In addition, any blocking operation aborts
any need for local storage.</p>
        <p>Lines 11 to 18 propagate the relations backward across
a control-flow edge ha, bi. Line 11 states that a need for
backup propagates backward unless a write is
encountered. Line 12 propagates the absence of requirement for
local storage unless a read is encountered.</p>
        <p>Line 14 corresponds to blocking operations – if a
variable is needed in the local storage after a blocking
statement, it is needed before the blocking statement in the
backup storage, unless the blocking statement writes the
variable. Lines 16 and 17 propagate through non-blocking
statements – the need for local storage propagates unless
a write is encountered, the absence for backup storage
propagates unconditionally.</p>
        <p>The sets PB, PL, P̄B, and P̄L are computed by the forward
propagation Algorithm 2.</p>
        <p>Algorithm 2 Forward propagation algorithm
Require: (VCFG, ECFG) control flow graph; I ∈ VCFG; B ⊆ VCFG; X
set of variables; R, W ⊆ VCFG × X
Ensure: PB, PL, P̄B, P̄L ⊆ VCFG × X
1: PB, PL, P̄B, P̄L := ∅
2: PB′ := ∅
3: PL′ := W ∪ R
4: P̄B′ := W ∪ (I × X)
5: P̄L′ := (B ∪ I) × X
6: while PB′ ∪ PL′ ∪ P̄B′ ∪ P̄L′ ≠ ∅ do
7: PB := PB ∪ PB′; PL := PL ∪ PL′
8: P̄B := P̄B ∪ P̄B′; P̄L := P̄L ∪ P̄L′
9: PB′, PL′, P̄B′, P̄L′ := ∅
10: for ⟨a, b⟩ ∈ ECFG do
11: PB′[b] := PB′[b] ∪ (PB[a] \ W[b] \ PB[b])
12: P̄L′[b] := P̄L′[b] ∪ (P̄L[a] \ R[b] \ W[b] \ P̄L[b])
13: if b ∈ B then
14: PB′[b] := PB′[b] ∪ (PL[a] \ W[b] \ PB[b])
15: else
16: PL′[b] := PL′[b] ∪ (PL[a] \ PL[b])
17: P̄B′[b] := P̄B′[b] ∪ (P̄B[a] \ P̄B[b])
18: end if
19: end for
20: end while</p>
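      <p>Under the same conventions as the backward sketch, the forward pass can be written as follows (again a naive fixed point, not the authors' implementation; nodes are ints, variables chars, and pB/pL denote the barred relations P̄B/P̄L):</p>

```cpp
#include <map>
#include <set>
#include <utility>
#include <vector>
#include <cassert>

using VarSet = std::set<char>;
using Rel = std::map<int, VarSet>;

static VarSet diff(VarSet s, const VarSet& t) {
    for (char v : t) s.erase(v);
    return s;
}
static bool add(Rel& r, int n, const VarSet& s) {
    bool grown = false;
    for (char v : s) grown |= r[n].insert(v).second;
    return grown;
}

// Forward propagation (Algorithm 2) iterated to a fixed point;
// I is the single entry node of the CFG.
void forward(const std::vector<std::pair<int,int>>& edges,
             const std::set<int>& B, int I, const VarSet& X,
             Rel R, Rel W,
             Rel& PB, Rel& PL, Rel& pB, Rel& pL) {
    std::set<int> nodes;
    for (auto [a, b] : edges) { nodes.insert(a); nodes.insert(b); }
    for (int n : nodes) {                 // initial state, lines 2-5
        add(PL, n, W[n]); add(PL, n, R[n]);
        add(pB, n, W[n]);
        if (B.count(n)) add(pL, n, X);
    }
    add(pB, I, X); add(pL, I, X);         // the (I × X) terms
    bool grown = true;                    // lines 6-20
    while (grown) {
        grown = false;
        for (auto [a, b] : edges) {
            grown |= add(PB, b, diff(PB[a], W[b]));              // line 11
            grown |= add(pL, b, diff(diff(pL[a], R[b]), W[b]));  // line 12
            if (B.count(b)) {
                grown |= add(PB, b, diff(PL[a], W[b]));          // line 14
            } else {
                grown |= add(PL, b, PL[a]);                      // line 16
                grown |= add(pB, b, pB[a]);                      // line 17
            }
        }
    }
}
```

      <p>On the W(x) → blocking → R(x) path used before, the backup copy becomes present at the blocking statement, while the local copy is invalidated there.</p>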
        <p>The algorithm is analogous to the backward propagation
with different propagation rules reflecting the following
observations: After any write or read operation, the value
is present in the local storage (line 3, 12); after any write,
the value of backup storage is obsolete (line 4, 11). The
backup storage is initialized by moving from local storage
before a blocking statement (line 14, 17); any blocking
statement invalidates the local storage (line 5, 16).</p>
        <p>The final stage of the static movement planner is the
determination of control-flow edges where the variables shall
be loaded or stored, i.e. the determination of the relations
L, S ⊆ ECFG × X .</p>
        <p>The relations are calculated locally from the relations
created in the previous phases using the following rules:
⟨⟨a, b⟩, x⟩ ∈ S ⇔ ⟨a, x⟩ ∈ P̄B ∧ ⟨b, x⟩ ∈ NB \ (N̄B \ PB)
⟨⟨a, b⟩, x⟩ ∈ L ⇔ ⟨a, x⟩ ∈ P̄L ∧ ⟨b, x⟩ ∈ NL \ (N̄L \ PL)
The two rules are based on the same observation:
A copy towards a storage may be placed at an ⟨a, b⟩ edge if
the value was not present (P̄) in the storage after the
statement a and it may be required (N) by a path starting at b.
Nevertheless, if the value is not needed (N̄) at some of the
other paths starting at b, copying would be premature and
shall be postponed until the point where all paths need the
variable. However, if the value may already be present (P)
at b (due to a different path ending there), postponing is
not possible because the copy operation would disturb the
existing value.</p>
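      <p>The store rule can be transcribed directly. Given the membership of a and b in the four relations for a single variable x, it decides whether S is placed on the edge ⟨a, b⟩ (the boolean parameters are our simplification for one variable, not part of the paper's notation):</p>

```cpp
#include <cassert>

// Direct transcription of the store-placement rule for one edge (a,b) and
// one variable x: S holds iff x is absent after a (in P̄B) and b needs it
// (in NB), except when placement would be premature (in N̄B) and is not
// forced by an already-present copy (not in PB).
bool place_store(bool a_in_PBbar, bool b_in_NB, bool b_in_NBbar, bool b_in_PB) {
    return a_in_PBbar && b_in_NB && !(b_in_NBbar && !b_in_PB);
}
```

      <p>The load rule has exactly the same shape with the L-relations substituted for the B-relations.</p>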
        <p>The intersection ZB = NB ∩ N̄B ∩ PB ∩ P̄B denotes the
places where the rules create suboptimal code: In such
nodes, there exist outgoing paths that need the validity of
the storage as well as outgoing paths where the storage is
not needed; at the same time, incoming paths that ensure
the presence as well as incoming paths without the
presence exist. Consequently, placing a store operation here is
both necessary and premature. A similar sub-optimality
exists at the intersection ZL = NL ∩ N̄L ∩ PL ∩ P̄L for the load
operation.</p>
        <p>These sub-optimal placements may be avoided by code
duplication: For each node-variable pair ⟨b, x⟩ in the
intersection ZB, the node b shall be split into b1 and b2
and the incoming edges arranged so that ⟨b1, x⟩ ∉ PB and
⟨b2, x⟩ ∉ P̄B. The definitions of PB and P̄B ensure that this
process is feasible because the source nodes of the
incoming edges will be split, too. A similar splitting process
shall be applied for ZL. When repeated for all variables,
the ZB and ZL relations become empty and, consequently,
the insertion of the S and L operations becomes optimal.
However, it comes at the cost of code expansion, in the
worst case exponential with respect to the number of
variables.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Simple Method Inversion</title>
        <p>We can perform the actual inversion once all the necessary
variables are identified and saved. The simple inversion
algorithm is shown in Listing 6. The steps are explained
in the following text. Listing 5 shows an example of code
produced by this algorithm.</p>
        <p>The blocking methods are internal to the Bobox
interface library (provided as part of the development
environment), so they are called only inside the library methods
and we expose them by integrating (inlining) all the library
methods into the inverted method.</p>
        <p>Locate blocking calls This step is very simple, because
we have a complete list of all potentially blocking methods;
we simply scan the code and record
all the places where these methods are called.</p>
        <p>// locate all the blocking methods
LocateBlockingCalls();
// save a state for each call
StoreStates();
// use non-blocking methods instead
ReplaceBlockingCalls();
// insert return and label
InsertReturnResume();
// state machine for event dispatch
CreateStateMachine();
// save and restore local variables
PatchLocalData();</p>
        <p>Listing 6: The simple inversion algorithm</p>
        <p>Store states We must create a separate state for each
located blocking call, not just each method, but for each
actual call instruction in the code. This way we can resume
the computation at the exact location where we left off.
We simply create State1 - StateN, where N is the number
of located blocking calls. We store the state into a member
variable of the box.</p>
        <p>Replace blocking calls In this step we must replace the
blocking methods with methods that do not wait for data.
These methods just notify the system to call the task once
an event occurs. There is such a method for each blocking
method, and we can ensure this because all the methods
are hidden in the Bobox interface library that is part of
the development environment. So we simply replace each
called blocking method with its non-blocking version.</p>
        <p>Insert return and resume. Now we must insert constructs
that allow us to interrupt the computation and later resume
it at the same place. To do this, we place a return
statement after the former blocking call, and immediately
after the return we place a label so that we can jump back to it
via a goto statement.</p>
        <p>Create state machine. We must create a switch statement to
complete the state machine mechanics, so that we can jump
to the correct place based on the current state. The switch
takes the current state and jumps to the appropriate label
placed in the code in the previous step. There is one thing we
must do before the jump: we must check whether the received event
is the event we are waiting for. If it is not,
we simply jump out. We can safely ignore such events, since they
only inform us about new data in an input buffer, which we will
process later.
Patch local data. In the last step we use the results of
the variable movement algorithm (Section 4.2) to save and
restore the necessary variables.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion and Future Work</title>
      <p>We have successfully implemented a first version of
method inversion for CIL intermediate code. We are able
to eliminate blocking calls and to produce
event-driven code from a sequential implementation.</p>
      <p>The current implementation produces modified CIL
code that can be used in the .NET implementation of Bobox, but
the code cannot yet be used in the main C++ implementation,
because the compiler from CIL to C++ is not yet ready.
We are therefore not yet able to measure the effect of method
inversion on the efficiency of a C++ Bobox pipeline; on the
other hand, the transformation produces working boxes that we have
tested in the .NET implementation of Bobox.</p>
      <p>We are not yet able to present reliable experimental
results, because we can use the optimization only in the
.NET implementation of Bobox, which is designed mostly
for testing and prototype evaluation. The .NET version is
not optimized for speed, so method inversion has only
a minimal effect on its efficiency.</p>
      <p>In the future we will concentrate mainly on the
development of an advanced method inversion that produces
code with better structure and behavior.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgment</title>
      <p>This work was supported in part by the grant
P103/13/08195S of the Grant Agency of the Czech
Republic and the grant 122214 of the Charles University
Grant Agency.</p>
    </sec>
  </body>
  <back>
  </back>
</article>