<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Series</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Transformation of Pipeline Stage Algorithms to Event-Driven Code</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Michal Brabec</string-name>
          <email>brabec@ksi.mff.cuni.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Bednárek</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Petr Malý</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Prague</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <volume>1214</volume>
      <fpage>13</fpage>
      <lpage>20</lpage>
      <abstract>
        <p>Many software systems publish their interface using event-driven programming, where the application programmer creates routines that are called in response to events. One such system is the Bobox parallel framework, where the elements of a parallel pipeline react to events signaling the arrival of input data. However, many algorithms are more efficiently described using the classical serial approach, where the application code calls system routines to wait for the required events. In this paper, we present a method of code transformation producing event-driven code from serial code, and we concentrate mostly on its first part, responsible for the management of variables. Besides a friendlier programming environment, the transformation often improves the structure of the code with respect to compiler optimizations.</p>
      </abstract>
      <kwd-group>
        <kwd>Method Inversion</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Many frameworks and libraries present their interface as
event-driven programming [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], where the user code reacts
to a set of events produced by the framework at runtime.
      </p>
      <p>Implementing an application in this fashion is not as
difficult in areas like user interface, where most complex
operations are performed by an application kernel and the
interface just presents the results.</p>
      <p>
        This approach is not very practical for complex
algorithms, where the event-driven design can significantly
increase complexity [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], making the implementation
difficult to maintain and test. The difference in complexity is
illustrated by a simple merge algorithm: the basic
implementation is in Listing 1 and the event-driven version is in
Listing 2.
      </p>
      <p>The merge algorithm takes two input streams and
merges them into a single output stream. The input
streams provide the data one value at a time: the method get()
returns the current value and the method next() obtains the next
value. The results are passed on to the output stream using
the put() method.
The merge algorithm, presented in Listing 1, merges two
ordered sets and then passes the results to the output
stream. The implementation is simple to understand and
test, but there is hidden event management in the code.
The method next() retrieves new data and, if there is
nothing in the input buffer, it waits. Here the application is
waiting for an event (the arrival of input data).</p>
      <p>while (left.has_data() &amp;&amp; right.has_data())</p>
      <p>
        Listing 1: Basic join implementation for Bobox
An event-driven implementation is usually less
comprehensible and it can be difficult to maintain or verify [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
at least in the case of complex algorithms. Listing
2 shows an event-driven implementation of the merge
algorithm. It is more complex than the simple version
presented in Listing 1 and it contains significant code
duplication. The implementation is further complicated by the
fact that we must check whether the current event is the event we
were waiting for.
      </p>
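      <p>The serial behaviour described above can be sketched as follows. This is a minimal sketch: InStream and OutStream are illustrative stand-ins for the Bobox stream interface (get(), next(), has_data(), put()) described in the text, not the real framework classes.</p>

```cpp
#include <deque>
#include <vector>
#include <cassert>

// Hypothetical stand-ins for the stream interface described in the text:
// get() returns the current value, next() advances, has_data() tests input.
struct InStream {
    std::deque<int> buf;
    bool has_data() const { return !buf.empty(); }
    int get() const { return buf.front(); }
    void next() { buf.pop_front(); }   // in Bobox this call may block
};
struct OutStream {
    std::vector<int> out;
    void put(int v) { out.push_back(v); }
};

// Serial merge of two ordered streams (the shape of Listing 1).
void merge(InStream& left, InStream& right, OutStream& o) {
    while (left.has_data() && right.has_data()) {
        if (left.get() <= right.get()) { o.put(left.get()); left.next(); }
        else                           { o.put(right.get()); right.next(); }
    }
    while (left.has_data())  { o.put(left.get());  left.next();  }
    while (right.has_data()) { o.put(right.get()); right.next(); }
}
```

      <p>The hidden event management sits inside next(): in the framework, the call blocks on an empty input buffer, which is exactly what the inversion later removes.</p>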
      <p>The event-driven implementation uses the method
notify_next() instead of next(). This method only
requests the framework to call the task again once new data
is available. The task does not wait for the data.</p>
      <p>We propose a special technique called method inversion
that removes waiting for events. Instead, the code reacts
to an event and it ends when the event is processed. The
transformation produces an algorithm with a behavior
similar to the code in Listing 2. In this paper we concentrate
mostly on the first part of the optimization, responsible for
data management, also called movement of variables. In
this first part, we must identify the currently used variables,
preserve their values and later restore them, so that the
computation can be safely interrupted and resumed.</p>
      <p>The rest of the paper is organized as follows:
Section 2 describes our motivation for this work along with
the main problems we try to solve; we also present an
overview of our development environment. In
Section 3 we put method inversion in the context of other
works related to Bobox and we discuss similar works
concentrating on parallel pipelines. Section 4 contains the
algorithm that implements method inversion. Section 5
concentrates on the effects the transformation has on the
entire pipeline and it summarizes the results of our research;
we also discuss possible ways to improve the presented
optimizations and similar future work.</p>
    </sec>
    <sec id="sec-2">
      <title>Motivation</title>
      <sec id="sec-2-1">
        <title>Parallel Pipelines</title>
        <p>
          Parallel pipelines represent a great way to utilize the full
capacity of contemporary computers [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], because they are
highly scalable and they offer a suitable framework for
many applications, including database operations [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
        <p>
          A parallel pipeline is a set of independent tasks
connected by data streams, where the tasks can communicate
only via passing messages and data through the streams.
The streams pass the output of one task to the next task
which uses it as input. Tasks can be run in parallel, with
each task processing different data [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
        <sec id="sec-2-1-4">
          <title>Figure 1: A pipeline of Source (left), Source (right), Merge and Output tasks</title>
          <p>Figure 1 shows a simple pipeline for merging data.
Source tasks produce input, the Merge task merges its two
input streams and the Output task stores the results. The
arrows represent data streams that transport data between
tasks.</p>
          <p>Parallel pipelines can be treated as event-driven
systems, where the main event is the arrival of input data.
Each task processes input data and then waits for new data,
basically reacting to new data as an event.</p>
          <p>Calling a method that waits for an event (new data) is
referred to as a blocking call in this paper, because the
task is blocked until there is new data available.</p>
          <p>
            One of the most important problems of pipelines is that
different tasks may process their data at
different rates, which may lead to a situation where parts of the
pipeline are engaged in complex computations while
others are waiting for input [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ]. It is necessary that the
waiting tasks stop their computations and release any system
resources that can be used by other tasks, but doing that
by hand can be difficult (Listing 2).
          </p>
          <p>We propose a way to automatically produce
event-driven code from serial code without any additional
information such as code annotations, attributes or special
comments. This allows users to develop simple tasks that
can be easily maintained. The compiler removes blocking
calls and transforms the code for the target environment.
We designed this transformation for the Bobox framework,
but it can be used for other environments.</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>Bobox</title>
        <p>
          In this paper we focus on parallel pipelines developed for
the Bobox framework. The Bobox framework provides
a runtime environment used to execute a general (even
non-linear) pipeline in parallel [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. The actual execution of
the pipeline is managed by the framework, which executes
tasks (called boxes) in parallel according to the pipeline
structure defined by the users.
        </p>
        <p>
          There are two ways to use the Bobox framework: users can
either define the pipeline by hand or they can use an
interface that produces the pipeline from other code, such as
a SPARQL query [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
        <p>We developed the method inversion to expose all the
blocking calls (waiting for data) so they can be removed.
This way we can optimize the separate tasks and improve the
efficiency of the entire pipeline.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Development Environment</title>
        <p>
          We designed a special development environment to
apply presented optimizations automatically. The
environment should simplify the implementation of the Bobox
boxes and we intend to allow programmers to design entire
Bobox pipelines in the environment [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. The structure of
the environment is in Figure 2, where the bold components
are part of the development environment.
        </p>
        <p>The boxes are implemented in the C# programming
language and they are compiled to a CIL assembly. The
optimized CIL code is then transformed to C++ source code
for the Bobox framework.</p>
      </sec>
      <sec id="sec-2-4">
        <title>Goals</title>
        <p>We have designed the method inversion to optimize tasks
in parallel pipelines and to simplify their development.
This way we can improve the pipeline efficiency while
making the development simpler at the same time.</p>
        <p>The second goal is to use the method inversion as a general
transformation that can simplify the development of many
applications.</p>
        <p>The last goal is to improve the structure of the Bobox tasks,
which might allow the compiler to apply more advanced
optimizations.</p>
        <p>Figure 2 (diagram) shows the following components: C# source code,
the Bobox C# interface library, the C# compiler, CIL code, the ParallaX
optimizer, optimized CIL code, the ParallaX C++ compiler, C++ source code,
a C++ compiler, the Bobox framework and the resulting Bobox application;
the ParallaX optimizer and the ParallaX C++ compiler are parts of the
ParallaX development environment.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Related Work</title>
      <sec id="sec-3-1">
        <title>Event-Driven Programming</title>
        <p>
          There are very few works concentrating on the
generation of event-driven programs from serial code. There
are works that discuss event-driven programming, but
they consider mostly applications already implemented as
event based [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. There are a few systems that try to
simplify the development of event-driven programs [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], but
they do not fully hide the events from the users; instead they
require special annotations or comments.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>Parallel Pipeline Optimizations</title>
        <p>
          Optimizations for parallel pipelines concentrate mostly
on task distribution and scheduling according to available
CPUs [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] and other system resources [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], because load
balancing has a major impact on the pipeline efficiency.
In the solution used in Bobox, tasks are distributed among
processors at the beginning of the pipeline execution [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]
and the system migrates them very carefully to minimize
communication overhead, which is especially useful in
distributed environments [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
        </p>
        <p>
          Most parallel pipeline environments leave task
optimization to the users and that is why there are not many
task-specific optimizations. There are development
environments that generate tasks or even pipeline structure
from user code [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] and these systems can optimize the
entire pipeline because they can influence its structure. Our
development environment is designed for a similar
purpose [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], but the current implementation concentrates on
separate boxes.
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>Bobox Optimizations</title>
        <p>
          Bobox optimizations concentrate mostly on the structure
and execution of the entire pipeline, leaving the task
optimization to the users. Most works concentrate on
task scheduling. There are general scheduling
optimizations [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] as well as special strategies based on
dataflow [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] or input structure.
        </p>
        <p>
          Bobox is used mostly for query processing and there are
many works presenting optimizations specific for different
query languages [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], but there are no optimizations for
the tasks produced by users.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Method Inversion</title>
      <p>The inversion transformation is used to expose the places
where a pipeline task is waiting for input data to arrive.
These so-called blocking calls represent a problem
because the task waits for data without doing
anything useful, thus blocking an execution unit (CPU).
This is a problem even when a task is waiting passively,
because then it is necessary to swap it for another task.</p>
      <p>The inversion requires three main steps. First, we must
back up all the required data so we can safely interrupt
and resume execution. Second, blocking calls must be
replaced by non-blocking methods that only notify the
framework without waiting. Third, it is necessary to
restructure the code so it is possible to safely resume the
execution.</p>
      <p>We present the last two steps in a simplified version, but
the first step is the main focus of this paper because it is
necessary for any implementation of the inversion and it
can be used for other optimizations.</p>
      <p>The inversion is illustrated on the simple code shown in
Listing 3: the code applies a function f() to all values
in an input stream and passes the results to an output
stream.
Our ultimate goal is to restructure the code to expose all
the blocking calls so we do not have to jump inside a
complex control flow to resume computations. Such an
algorithm would produce code that can be efficiently
parallelized and further optimized.</p>
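      <p>A minimal sketch of such a serial task follows; the placeholder f() and the containers standing in for the blocking streams are our illustrative assumptions, not the code of Listing 3 itself.</p>

```cpp
#include <deque>
#include <vector>
#include <cassert>

// Placeholder transformation; any pure function would do here.
int f(int x) { return 2 * x + 1; }

// Serial task in the shape described for Listing 3: apply f() to every
// input value and pass the result to the output. The containers stand in
// for the Bobox streams; in the framework, next() is the blocking call.
struct Task {
    std::deque<int> in;    // input stream stand-in
    std::vector<int> out;  // output stream stand-in
    void run() {
        while (!in.empty()) {              // has_data()
            out.push_back(f(in.front()));  // put(f(get()))
            in.pop_front();                // next() - blocking in Bobox
        }
    }
};
```

      <p>The loop contains no state machine at all; it is the inversion that later breaks it apart at the blocking call.</p>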
      <p>Results of such a transformation are presented in
Listing 4, while the original code is in Listing 3. The switch
statement manages the restoration of variables and then it
resumes the computation by jumping into the original code.
We use goto to minimize code duplication.</p>
      <p>void do_something()
{
    switch (event)
    {
    case INPUT_RECEIVED:
        ic = 0; goto loop;
    case OUTPUT_SENT:
        oc = 0; goto loop;
    }
loop:
    k = min(iSize - ic, oSize - oc);
    for (i = 0; i &lt; k; ++i)
        oBuf[oc + i] = f(iBuf[ic + i]);
    if (ic + i &gt;= iSize)
    {
        oc += k;
        return WAIT_FOR_INPUT;
    } else
    {
        ic += k;
        return WAIT_FOR_OUTPUT;
    }
}</p>
      <p>Listing 4: Product of the advanced inversion</p>
      <p>This transformation is still under development, but we
implemented a simpler version. It lacks certain advantages
of the advanced version, but it is easier to implement.</p>
      <p>The simple version does not move the blocking calls
outside of the loop; instead it simply jumps inside any
control flow after restoring all the necessary variables. The
result of this inversion algorithm is in Listing 5, where the
methods used for input and output are not blocking; they
just inform the system to send or receive data when there
are resources available.</p>
      <sec id="sec-4-1">
        <title>Movement of Variables</title>
        <p>The goal of our transformation is to release the thread
of execution during blocking operations. Releasing the
thread causes loss of the local storage associated with the
thread, namely the registers and stack. Therefore, we need
a backup storage for the local variables that are live
during a blocking operation. The backup storage will usually
consist of data members of a dynamically allocated class
object.</p>
        <p>switch (state)
{
case WAIT_FOR_INPUT:
    if (event != input) return;
    iCurr = 0; iSize = size; goto iLoop;
case WAIT_FOR_OUTPUT:
    if (event != output) return;
    oCurr = 0; oSize = size; goto oLoop;
}</p>
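      <p>The backup-storage idea can be illustrated with a small sketch; the member names here are ours, not the Bobox API. A variable that is live across a blocking return is loaded from and stored back into a data member of the box object:</p>

```cpp
#include <cassert>

// Sketch of local vs. backup storage. While step() runs, the work happens
// in the local variable ic (registers/stack); before the thread is released
// by returning, the value is stored (S) into the member, and it is loaded
// (L) again on the next invocation. Names are illustrative.
struct Box {
    int ic_backup = 0;       // backup storage: survives between invocations

    int step(int input) {
        int ic = ic_backup;  // L: load backup -> local storage
        ic += input;         // compute on fast local storage
        ic_backup = ic;      // S: store local -> backup before returning
        return ic;           // returning releases the thread
    }
};
```

      <p>The planner described next decides where such L and S operations must be inserted, and where they would be redundant.</p>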
        <p>Moving aggregate variables, i.e. arrays and structures,
is expensive or even impossible (due to aliasing); in most
cases, the best strategy is storing aggregate variables in the
backup storage for their complete lifetime.</p>
        <p>Unaliased scalar variables (including dissociated
structures) may be moved easily and their movement between
the backup storage and the local storage offers the
advantage of faster access to the local storage. The faster
access to the local storage is primarily given by the
ability of compilers to allocate registers for important
localstorage objects (i.e. local variables); furthermore, scalar
local-storage variables allow the compilers to apply many
advanced optimization steps while similar optimization
for backup storage members would require difficult
interprocedural analysis of code.</p>
        <p>The movement of unaliased scalar variables between
the backup storage and the local storage must be carefully
statically planned in order to achieve correctness and
efficiency. For the planning of movement, we developed the
static movement planner algorithm described in the
following paragraphs. The algorithm inserts the necessary
movement operations at appropriate places of the code; in
addition, it may minimize the number of movements at the
cost of duplication of code.</p>
        <p>Let GCFG = (VCFG, ECFG) be the control-flow graph of
a procedure, where VCFG are the individual statements and
ECFG the transitions between them. Let I ∈ VCFG denote
the initial node (entry point) of the control-flow graph and
F ⊆ VCFG denote the set of final nodes (return statements).
Each statement is either non-blocking or blocking.</p>
        <p>We will assume in the following paragraphs that there is
only one type of blocking operation and that there are no
parameters (nor return values) to this operation. The set of
blocking statements is denoted B ⊆ VCFG.</p>
        <p>The non-blocking statements include access to variables
and memory, calculations, and calls to non-blocking
procedures. For simplicity of the description, we will assume
that all write operations in a statement are ordered after
all read operations in the same statement. The relations
R,W ⊆ VCFG × X where X is the set of variables determine
which statements read/write which variables.</p>
        <p>Let x ∈ X be a scalar local variable with no alias. With
respect to x, the input code may contain only two
operations: read (Rx) and write (Wx). In the output code, we
keep the read/write operations working on the local
storage and we add two additional operations: store (Sx) for
copying the variable from the local storage to the backup
storage and load (Lx) for copying in the opposite direction.
The load/store operations are always inserted between the
original statements, i.e., they are attached to the edges of
the original control flow graph. Note that after an Sx or Lx
operation, both the local storage and the backup storage
contain identical copies of the variable.</p>
        <p>In the input code, the blocking statements do not
affect the local storage. In the inverted code, blocking
statements are broken into a pair of a return statement and an
additional entry point. The inversion therefore causes the
local storage to be destroyed by every B operation; it must be
substituted by the backup storage.</p>
        <p>The purpose of the static movement planner is to
insert L and S operations so that the semantics of the code
is not affected by the destruction of the local storage by
B operations. The simplest solution would be to insert
an S-L pair around every B operation; however, this is not
optimal with respect to the number of load/store
operations required.</p>
        <p>Essentially, the variable x is used to transport a value
from a W operation to an R operation, across a path in the
control-flow graph which contains no other W operations.</p>
      <p>If such a path contains a B operation, the transported
value must be saved by inserting S somewhere between the
W and the first B, and loaded back to the local storage using
an L operation inserted after any B followed by R.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Static Movement Planner</title>
        <p>The first phases of the algorithm determine the following
relations between the control-flow graph nodes and
the variables:</p>
        <p>NB, NL, N̄B, N̄L, PB, PL, P̄B, P̄L ⊆ VCFG × X</p>
        <p>For a statement s ∈ VCFG and a variable x ∈ X, ⟨s, x⟩ ∈
NB indicates that there is a control-flow path starting at s
which needs x located in the backup storage. Similarly,
⟨s, x⟩ ∈ NL indicates that there is a path where x is needed
in the local storage.</p>
        <p>The next two relations are complementary: ⟨s, x⟩ ∈ N̄B
means that there is a path starting at s where x is not needed
in the backup storage; similarly for N̄L and the local
storage.</p>
        <p>Furthermore, ⟨s, x⟩ ∈ PB indicates that there is a
control-flow path ending at s which leaves x present in the backup
storage; similarly for PL and the local storage. Finally,
P̄B and P̄L indicate the ends of paths where x is not present
in the backup or local storage, respectively.</p>
        <p>The sets NB, NL, N̄B, and N̄L are computed by the backward
propagation Algorithm 1.</p>
        <p>Algorithm 1 Backward propagation algorithm
Require: (VCFG, ECFG) control flow graph; F, B ⊆ VCFG; X
set of variables; R, W ⊆ VCFG × X
Ensure: NB, NL, N̄B, N̄L ⊆ VCFG × X
1: NB, NL, N̄B, N̄L := ∅
2: NB′ := ∅
3: NL′ := R
4: N̄B′ := W ∪ (F × X)
5: N̄L′ := W ∪ ((B ∪ F) × X)
6: while NB′ ∪ NL′ ∪ N̄B′ ∪ N̄L′ ≠ ∅ do
7: NB := NB ∪ NB′; NL := NL ∪ NL′
8: N̄B := N̄B ∪ N̄B′; N̄L := N̄L ∪ N̄L′
9: NB′, NL′, N̄B′, N̄L′ := ∅
10: for ⟨a, b⟩ ∈ ECFG do
11: NB′[a] := NB′[a] ∪ (NB[b] \ W[a] \ NB[a])
12: N̄L′[a] := N̄L′[a] ∪ (N̄L[b] \ R[a] \ N̄L[a])
13: if a ∈ B then
14: NB′[a] := NB′[a] ∪ (NL[b] \ W[a] \ NB[a])
15: else
16: NL′[a] := NL′[a] ∪ (NL[b] \ W[a] \ NL[a])
17: N̄B′[a] := N̄B′[a] ∪ (N̄B[b] \ N̄B[a])
18: end if
19: end for
20: end while</p>
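      <p>As a concrete (and deliberately naive) reading of Algorithm 1, the following sketch computes NB, NL and the barred sets by a fixed-point loop over the edges. Nodes are ints and variables are chars, and the primed increment sets of the pseudocode are replaced by a "did any relation grow" flag, which is equivalent at the fixed point; none of this is the authors' implementation.</p>

```cpp
#include <map>
#include <set>
#include <utility>
#include <vector>
#include <cassert>

// Relations map each CFG node to a set of variables; nB/nL stand for the
// barred relations N̄B/N̄L. Nodes are ints, variables are chars.
using VarSet = std::set<char>;
using Rel = std::map<int, VarSet>;

static VarSet diff(VarSet s, const VarSet& t) {
    for (char v : t) s.erase(v);
    return s;
}
// Insert s into r[n]; report whether anything new was added. Returning
// false for known elements implements the "\ NB[a]" terms implicitly.
static bool add(Rel& r, int n, const VarSet& s) {
    bool grown = false;
    for (char v : s) grown |= r[n].insert(v).second;
    return grown;
}

// Backward propagation (Algorithm 1) iterated to a fixed point.
void backward(const std::vector<std::pair<int,int>>& edges,
              const std::set<int>& B, const std::set<int>& F,
              const VarSet& X, Rel R, Rel W,
              Rel& NB, Rel& NL, Rel& nB, Rel& nL) {
    std::set<int> nodes;
    for (auto [a, b] : edges) { nodes.insert(a); nodes.insert(b); }
    for (int n : nodes) {                        // initial state, lines 2-5
        add(NL, n, R[n]);
        add(nB, n, W[n]);
        add(nL, n, W[n]);
        if (F.count(n)) { add(nB, n, X); add(nL, n, X); }
        if (B.count(n)) add(nL, n, X);
    }
    bool grown = true;                           // lines 6-20
    while (grown) {
        grown = false;
        for (auto [a, b] : edges) {
            grown |= add(NB, a, diff(NB[b], W[a]));      // line 11
            grown |= add(nL, a, diff(nL[b], R[a]));      // line 12
            if (B.count(a)) {
                grown |= add(NB, a, diff(NL[b], W[a]));  // line 14
            } else {
                grown |= add(NL, a, diff(NL[b], W[a]));  // line 16
                grown |= add(nB, a, nB[b]);              // line 17
            }
        }
    }
}
```

      <p>On the three-statement path W(x) → blocking → R(x), the planner learns that x is needed in the backup storage at the blocking statement, but not before the write.</p>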
        <p>The algorithm essentially enlarges the four relations
from the initial state until stable. The primed versions
of the relations represent the increments added to the
relations at lines 7 and 8. The initial state, formed at the
lines 2 to 5, reflects the fact that a read operation induces
immediate need for local storage while a write operation
or a return statement causes that the previous value is no
longer needed. In addition, any blocking operation aborts
any need for local storage.</p>
        <p>Lines 11 to 18 propagate the relations backward across
a control-flow edge ha, bi. Line 11 states that a need for
backup propagates backward unless a write is
encountered. Line 12 propagates the absence of requirement for
local storage unless a read is encountered.</p>
        <p>Line 14 corresponds to blocking operations – if a
variable is needed in the local storage after a blocking
statement, it is needed before the blocking statement in the
backup storage, unless the blocking statement writes the
variable. Lines 16 and 17 propagate through non-blocking
statements – the need for local storage propagates unless
a write is encountered, the absence for backup storage
propagates unconditionally.</p>
        <p>The sets PB, PL, P̄B, and P̄L are computed by the forward
propagation Algorithm 2.</p>
        <p>Algorithm 2 Forward propagation algorithm
Require: (VCFG, ECFG) control flow graph; I ∈ VCFG; B ⊆ VCFG; X
set of variables; R, W ⊆ VCFG × X
Ensure: PB, PL, P̄B, P̄L ⊆ VCFG × X
1: PB, PL, P̄B, P̄L := ∅
2: PB′ := ∅
3: PL′ := W ∪ R
4: P̄B′ := W ∪ (I × X)
5: P̄L′ := (B ∪ I) × X
6: while PB′ ∪ PL′ ∪ P̄B′ ∪ P̄L′ ≠ ∅ do
7: PB := PB ∪ PB′; PL := PL ∪ PL′
8: P̄B := P̄B ∪ P̄B′; P̄L := P̄L ∪ P̄L′
9: PB′, PL′, P̄B′, P̄L′ := ∅
10: for ⟨a, b⟩ ∈ ECFG do
11: PB′[b] := PB′[b] ∪ (PB[a] \ W[b] \ PB[b])
12: P̄L′[b] := P̄L′[b] ∪ (P̄L[a] \ R[b] \ W[b] \ P̄L[b])
13: if b ∈ B then
14: PB′[b] := PB′[b] ∪ (PL[a] \ W[b] \ PB[b])
15: else
16: PL′[b] := PL′[b] ∪ (PL[a] \ PL[b])
17: P̄B′[b] := P̄B′[b] ∪ (P̄B[a] \ P̄B[b])
18: end if
19: end for
20: end while</p>
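      <p>Under the same conventions as the backward sketch, the forward pass can be written as follows (again a naive fixed point, not the authors' implementation; nodes are ints, variables chars, and pB/pL denote the barred relations P̄B/P̄L):</p>

```cpp
#include <map>
#include <set>
#include <utility>
#include <vector>
#include <cassert>

using VarSet = std::set<char>;
using Rel = std::map<int, VarSet>;

static VarSet diff(VarSet s, const VarSet& t) {
    for (char v : t) s.erase(v);
    return s;
}
static bool add(Rel& r, int n, const VarSet& s) {
    bool grown = false;
    for (char v : s) grown |= r[n].insert(v).second;
    return grown;
}

// Forward propagation (Algorithm 2) iterated to a fixed point;
// I is the single entry node of the CFG.
void forward(const std::vector<std::pair<int,int>>& edges,
             const std::set<int>& B, int I, const VarSet& X,
             Rel R, Rel W,
             Rel& PB, Rel& PL, Rel& pB, Rel& pL) {
    std::set<int> nodes;
    for (auto [a, b] : edges) { nodes.insert(a); nodes.insert(b); }
    for (int n : nodes) {                 // initial state, lines 2-5
        add(PL, n, W[n]); add(PL, n, R[n]);
        add(pB, n, W[n]);
        if (B.count(n)) add(pL, n, X);
    }
    add(pB, I, X); add(pL, I, X);         // the (I × X) terms
    bool grown = true;                    // lines 6-20
    while (grown) {
        grown = false;
        for (auto [a, b] : edges) {
            grown |= add(PB, b, diff(PB[a], W[b]));              // line 11
            grown |= add(pL, b, diff(diff(pL[a], R[b]), W[b]));  // line 12
            if (B.count(b)) {
                grown |= add(PB, b, diff(PL[a], W[b]));          // line 14
            } else {
                grown |= add(PL, b, PL[a]);                      // line 16
                grown |= add(pB, b, pB[a]);                      // line 17
            }
        }
    }
}
```

      <p>On the W(x) → blocking → R(x) path used before, the backup copy becomes present at the blocking statement, while the local copy is invalidated there.</p>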
        <p>The algorithm is analogous to the backward propagation
with different propagation rules reflecting the following
observations: After any write or read operation, the value
is present in the local storage (line 3, 12); after any write,
the value of backup storage is obsolete (line 4, 11). The
backup storage is initialized by moving from local storage
before a blocking statement (line 14, 17); any blocking
statement invalidates the local storage (line 5, 16).</p>
        <p>The final stage of the static movement planner is the
determination of control-flow edges where the variables shall
be loaded or stored, i.e. the determination of the relations
L, S ⊆ ECFG × X .</p>
        <p>The relations are calculated locally from the relations
created in the previous phases using the following rules:
⟨⟨a, b⟩, x⟩ ∈ S ⇔ ⟨a, x⟩ ∈ P̄B ∧ ⟨b, x⟩ ∈ NB \ (N̄B \ PB)
⟨⟨a, b⟩, x⟩ ∈ L ⇔ ⟨a, x⟩ ∈ P̄L ∧ ⟨b, x⟩ ∈ NL \ (N̄L \ PL)
The two rules are based on the same observation:
A copy towards a storage may be placed at an ⟨a, b⟩ edge if
the value was not present (P̄) in the storage after the
statement a and it may be required (N) by a path starting at b.
Nevertheless, if the value is not needed (N̄) at some of the
other paths starting at b, copying would be premature and
shall be postponed until the point where all paths need the
variable. However, if the value may already be present (P)
at b (due to a different path ending there), postponing is
not possible because the copy operation would disturb the
existing value.</p>
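      <p>The store rule can be transcribed directly. Given the membership of a and b in the four relations for a single variable x, it decides whether S is placed on the edge ⟨a, b⟩ (the boolean parameters are our simplification for one variable, not part of the paper's notation):</p>

```cpp
#include <cassert>

// Direct transcription of the store-placement rule for one edge (a,b) and
// one variable x: S holds iff x is absent after a (in P̄B) and b needs it
// (in NB), except when placement would be premature (in N̄B) and is not
// forced by an already-present copy (not in PB).
bool place_store(bool a_in_PBbar, bool b_in_NB, bool b_in_NBbar, bool b_in_PB) {
    return a_in_PBbar && b_in_NB && !(b_in_NBbar && !b_in_PB);
}
```

      <p>The load rule has exactly the same shape with the L-relations substituted for the B-relations.</p>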
        <p>The intersection ZB = NB ∩ N̄B ∩ PB ∩ P̄B denotes the
places where the rules create suboptimal code: In such
nodes, there exist outgoing paths that need the validity of
the storage as well as outgoing paths where the storage is
not needed; at the same time, incoming paths that ensure
the presence as well as incoming paths without the
presence exist. Consequently, placing a store operation here is
both necessary and premature. A similar sub-optimality
exists at the intersection ZL = NL ∩ N̄L ∩ PL ∩ P̄L for the load
operation.</p>
        <p>These sub-optimal placements may be avoided by code
duplication: For each node-variable pair ⟨b, x⟩ in the
intersection ZB, the node b shall be split into b1 and b2
and the incoming edges arranged so that ⟨b1, x⟩ ∉ PB and
⟨b2, x⟩ ∉ P̄B. The definitions of PB and P̄B ensure that this
process is feasible because the source nodes of the
incoming edges will be split, too. A similar splitting process
shall be applied for ZL. When repeated for all variables,
the ZB and ZL relations become empty and, consequently,
the insertion of the S and L operations becomes optimal.
However, it comes at the cost of code expansion, in the
worst case exponential with respect to the number of
variables.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Simple Method Inversion</title>
        <p>We can perform the actual inversion once all the necessary
variables are identified and saved. The simple inversion
algorithm is shown in Listing 6. The steps are explained
in the following text. Listing 5 shows an example of code
produced by this algorithm.</p>
        <p>The blocking methods are internal to the Bobox
interface library (provided as part of the development
environment), so they are called only inside the library methods
and we expose them by integrating (inlining) all the library
methods into the inverted method.</p>
        <p>Locate blocking calls This step is very simple, because
we have a complete list of all potentially blocking methods;
we simply scan the code and record
all the places where these methods are called.</p>
        <p>// locate all the blocking methods
LocateBlockingCalls();
// save a state for each call
StoreStates();
// use non-blocking methods instead
ReplaceBlockingCalls();
// insert return and label
InsertReturnResume();
// state machine for event dispatch
CreateStateMachine();
// save and restore local variables
PatchLocalData();</p>
        <p>Listing 6: The simple inversion algorithm</p>
        <p>Store states We must create a separate state for each
located blocking call, not just each method, but for each
actual call instruction in the code. This way we can resume
the computation at the exact location where we left off.
We simply create State1 - StateN, where N is the number
of located blocking calls. We store the state into a member
variable of the box.</p>
        <p>Replace blocking calls In this step we must replace the
blocking methods with methods that do not wait for data.
These methods just notify the system to call the task once
an event occurs. There is such a method for each blocking
method, and we can ensure this because all the methods
are hidden in the Bobox interface library that is part of
the development environment. So we simply replace each
called blocking method with its non-blocking version.</p>
        <p>Insert return and resume. Now we must insert constructs
that allow us to interrupt the computation and later resume
it at the same place. To do this, we place a return
statement after the former blocking call, and immediately
after the return we place a label so that we can jump back to it
via a goto statement.</p>
        <p>Create state machine. We must create a switch statement to
complete the state machine mechanics, so that we can jump
to the correct place based on the current state. The switch
takes the current state and jumps to the appropriate label
placed in the code in the previous step. There is one thing we
must do before the jump: we must check whether the received event
is the event we are waiting for. If it is not,
we simply jump out. We can safely ignore such events, since they
only inform us about new data in an input buffer, which we will
process later.
Patch local data. In the last step we use the results of
the variable movement algorithm (Section 4.2) to save and
restore the necessary variables.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion and Future Work</title>
      <p>We have successfully implemented a first version of
method inversion for CIL intermediate code. We are able
to eliminate blocking calls and to produce
event-driven code from a sequential implementation.</p>
      <p>The current implementation produces modified CIL
code that can be used in the .NET implementation of Bobox, but
the code cannot yet be used in the main C++ implementation,
because the compiler from CIL to C++ is not yet ready.
We are therefore not yet able to measure the effect of method
inversion on the efficiency of a C++ Bobox pipeline; on the
other hand, the transformation produces working boxes that we have
tested in the .NET implementation of Bobox.</p>
      <p>We are not yet able to present reliable experimental
results, because we can use the optimization only in the
.NET implementation of Bobox, which is designed mostly
for testing and prototype evaluation. The .NET version is
not optimized for speed, so method inversion has only
a minimal effect on its efficiency.</p>
      <p>In the future we will concentrate mainly on the
development of an advanced method inversion that produces
code with better structure and behavior.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgment</title>
      <p>This work was supported in part by the grant
P103/13/08195S of the Grant Agency of the Czech
Republic and the grant 122214 of the Charles University
Grant Agency.</p>
    </sec>
  </body>
  <back>
  </back>
</article>