=Paper= {{Paper |id=None |storemode=property |title=Hybrid Flow Graphs: Towards the Transformation of Sequential Code into Parallel Pipeline Networks |pdfUrl=https://ceur-ws.org/Vol-1422/9.pdf |volume=Vol-1422 |dblpUrl=https://dblp.org/rec/conf/itat/BrabecB15 }} ==Hybrid Flow Graphs: Towards the Transformation of Sequential Code into Parallel Pipeline Networks== https://ceur-ws.org/Vol-1422/9.pdf

J. Yaghob (Ed.): ITAT 2015 pp. 9–16
Charles University in Prague, Prague, 2015

Hybrid Flow Graphs: Towards the Transformation of Sequential Code into
Parallel Pipeline Networks

Michal Brabec, David Bednárek

Department of Software Engineering
Faculty of Mathematics and Physics, Charles University Prague
{brabec,bednarek}@ksi.mff.cuni.cz

Abstract: Transforming procedural code for execution by Our approach is focused at target environments in which
specialized parallel platforms requires a model of compu- an application is presented in two layers – the declarative
tation sufficiently close to both the sequential program- upper layer represented as a graph of operators, the lower
ming languages and the target parallel environment. In layer consisting of a procedural implementation of indi-
this paper, we present Hybrid Flow Graphs, encompassing vidual operators. In such a setting, a language front-end
both control flow and data flow in a unified, pipeline based transforms the application source code into an intermedi-
model of computation. Besides the definition of the Hy- ate representation; then, a strategic phase decides where
brid Flow Graph, we introduce a formal framework based to put the boundary between the declarative upper layer
on graph rewriting, used for the specification of Hybrid and the procedural lower layer. Finally, the upper layer
Flow Graph semantics as well as for the proofs of cor- is converted into the final graph while procedural code (in
rectness of the associated code transformations. As a for- source, bytecode, or binary form) is generated from the
malism particularly close to pipeline-based runtime envi- lower layer.
ronments which include many modern database engines, Realization of this idea involves many particular prob-
the Hybrid Flow Graphs may become a powerful means lems; in this paper, we deal with the intermediate represen-
for the automatic parallelization of sequential code under tation which serves as the backbone connecting the front-
these environments. end, strategy, and generator phases. Since the upper layer
Keywords: compiler, graph, optimization, parallelism will finally be represented by a graph, it seems reasonable
that also the intermediate representation be graph based.
However, such a graph shall also be able to completely
1 Introduction represent the lower, procedural part of the application. Al-
Parallelization of procedural code has been studied for though the theory of compiling uses graphs in many ap-
several decades, resulting in a number of mature im- plications including the representation of the code, the
plementations, usually tightly coupled with compilers control-flow and data-flow aspects of the code are usually
of FORTRAN or C. In most cases, the parallelization approached differently. In our approach, the control flow
techniques are targeted at multi-threaded environment, and the data flow is represented by the same mechanism
sometimes augmented with task-based parallelism. This because the strategic phase must be able to make the cuts
shared-memory approach is perfectly suited for numeric inside both the control flow and the data flow.
applications; however, there are niches of computing In this paper, we present Hybrid Flow Graphs (HFG) as
which may benefit from a different run-time paradigm. In an intermediate representation of procedural code, capa-
particular, database systems make use of execution plans ble of representing control-flow and data-flow in the same
which explicitly denote the data flows between operators; layer. We will demonstrate two forms of the HFG, the
similar graph-based representation is often used in stream- Sequential HFG and the General HFG. The former form
ing systems. Besides other advantages, the explicitly de- has its execution constrained to sequential, equivalent to
noted data flow allows easier distribution of the computa- the execution of the procedural code which it was gen-
tion across different nodes, compared to the complexity of erated from. The latter form exhibits independent exe-
potentially random access in the shared-memory model of cution of individual operations, showing available paral-
computation. lelism. Of course, making operations independent requires
Nevertheless, database and streaming systems lack the careful dependency analysis – we assume that the required
generality of procedural programming languages. To ex- points-to/alias/dependency analyses were made before the
tend the applicability of such runtime systems towards conversion to HFG, using some of the algorithms known
general programming, a system capable of compiling (a from the theory of compiler construction.
subset of) a well-known procedural language is required. We will describe the operational semantics of both the
This paper demonstrates an important step towards such sequential and the independent forms of HFG and we will
a system – an intermediate representation capable of de- also demonstrate the conversion between them. We will
scribing procedural code using the language of data-flow use graph rewriting as the vehicle for the description of
oriented graphs. both the semantics and conversions. The use of graph
10 M. Brabec, D. Bednárek

rewriting systems allows for some generality of the ap- The flow graph is not only a compiler data represen-
proach, both in the description and in the implementation. tation, it is a processing model as well, similar to KPN
In particular, the set of operations provided by the inter- graphs [14]. It can be used as a source code for specialized
mediate code may be defined almost arbitrarily as long as processing environments, where frameworks for pipeline
their semantics is depicted by graph rewriting rules. On parallelism are the best target, since these frameworks use
the other hand, the set of control-flow operations is fixed similar models for applications [15]. One such a system is
since it forms the non-trivial part of the transformation the Bobox framework [16], where the flow graph can be
from sequential code. used to generate the execution plan similarly to the way
In this paper, we will use the set of types and operators Bobox is used to execute SPARQL queries [17].
borrowed from the C# language, more exactly, from its
standard bytecode representation called CIL [1]. This in-
2.1 Graph Rewriting
cludes some artifacts specific to the CIL, namely the stack
and the associated operations. Since the Sequential HFG A graph rewriting system (GRS) is a set of rules R trans-
mimics the behavior of the CIL abstract machine, the stack forming one graph to another. The rewriting systems are
operations are preserved in the HFG. However, they must very similar to grammars where each rule has a left side
disappear in the transformation to the General HFG; oth- that has to be matched to the graph and a right side that
erwise, the presence of stack operations would prohibit al- replaces it.
most any parallelism. Since the removal of stack opera- The GRS are very simple and they are best explained
tions is a non-trivial sub-problem, we include the corre- by an example. Figure 2 shows the CIL based rewriting
sponding (CIL-specific) algorithm in this paper, as well as system, where each rule emulates a single CIL instruction.
the description of the preceding conversion of CIL into the We use extended rewriting rules where ∗ represents
sequential HFG. any node (wildcard) and we add Greek letters to identify
The rest of the paper is organized as follows: We review unique wildcard nodes where necessary. Variable names
the related work in Section 2. The hybrid flow graph and (x or y) represent only nodes with data. Expressions are
its semantics is defined in Section 3. We define the se- used to simplify data manipulation. This is necessary so
quential hybrid flow graph in Section 4. In Section 5, we we do not have to create a rule for each combination of
present an algorithm producing a sequential HFG from a instruction or inputs.
source code. Section 6 presents the algorithms that trans-
form a general HFG to a sequential HFG.
3 Hybrid Flow Graph
2 Related Work A hybrid flow graph (HFG) is an annotated directed graph,
where nodes together with their labels represent operations
The flow graph described in this paper is similar but not and edges represent queues for transferring data. The di-
identical to other modeling languages, like Petri nets [2] rection of the edge indicates the direction of data flow.
or Kahn process networks [3]. The main difference is that Figure 1 shows a general hybrid flow graph for a simple
the flow graph was designed for automatic generation from function (see Listing 1 for source code).
the source code, where the other languages are generally
used to model the application prior to implementation [4] v o i d SimpleLoop ( i n t a ) {
or to verify a finished system [5]. The flow graph is sim- f o r ( i n t i =a ; i < 5 ; )
ilar to the graph rewriting system [6], which can be used {
to design and analyze applications, but it is not convenient i = call ( i );
for execution. There are frameworks that generate GRS }
from procedural code like Java [7], though the produced }
graphs are difficult to optimize. The flow graph has simi- Listing 1 Simple function containing a single loop
lar traits to frameworks that allow applications to be gen-
erated from graphs, like UML diagrams [8] [9], but we A hybrid flow graph is a general concept that can be
concentrate both on graph extraction and execution. used regardless of operations O or data types T . A plat-
The flow graph is closely related to graphs used form specific flow graph model is defined by specify-
in compilers, mainly the dependence and control flow ing O and T based on the target platform. The plat-
graphs [10], where the flow graph merges the informa- form specific definition is HFG_MODEL plat f orm = (O ∪
tion from both. The construction of the flow graph and {special operations}, T ), where O and T are platform
its subsequent optimization relies on compiler techniques, specific and special operations are platform independent
mainly points-to analysis [11], dependence testing [12] (see Section 3.1). The hybrid flow graph semantics (ex-
and control-flow analysis [13]. In compilers, graphs result- plained in Section 3.2) are also platform specific.
ing from these techniques are typically used as additional As our research is focused on C#, we use CIL [1] in-
annotation over intermediate code. structions to specify node operations. The most common
Hybrid Flow Graphs: Towards the Transformation of Sequential Code into Parallel Pipeline Networks 11

Figure 2: Semantics of selected basic operations
Figure 1: The HFG representation of the C# code from
Listing 1

instruction in our examples are: ldl x (load variable or
constant), stl x (write to a variable), cle (compare less or
equal), add etc. (mathematical operations) [1]. We omit
data types for the edges, because they are not important
for the graph construction.

3.1 Representation of Control Flow
Control flow operations are transformed into platform in-
dependent special operations which interact with the data-
flow carried through the platform specific basic opera-
tions. Graphical notation of special operations is in Fig-
ure 3, where a broadcast is a black square and a merge is
a white square. Figure 3: Semantics of selected special operations
A broadcast node has a single input and a variable num-
ber of outputs. It represents an operation that creates
a copy of the input for each output. it repeats while it has an input, we specify this behavior
A merge node has a single output and three inputs. It by changing its label to ldc − r. The definition of special
represents an operation that accepts data from two sources operations is illustrated in Figure 3, the figure contains the
and passes them to a single operation based on a condition, definition of broadcast and merge. We use special graphi-
it is used to merge data flow after a conditional branch. cal notation for special operations, since they are platform
A loop merge node is a special version of the merge independent (broadcast is black square, merge is a white
node with two inputs for conditional branches, it is used square).
to merge data in loops. The input is split into two pairs, We can use the graph rewriting rules defined in Figure 2
where each contains one data input and one condition in- and 3 to perform the computation of a hybrid flow graph.
put. The node first reads data from the first pair and then The computation of a flow graph with a for-loop is shown
it loops, while the second pair has a positive value in the in Figure 4. Each part of Figure 4 contains the HFG state
condition input. after application of a single rule and the computation starts
after application the ldc5 rule.
3.2 Semantics of Hybrid Flow Graphs
The hybrid flow graph semantics define the way nodes 4 Sequential HFG
process data and communicate. Basically, nodes represent
operations or data and edges represent unbounded queues In this section, we define the semantics of a sequential hy-
(FIFO) for data transfer. brid flow graph using the graph transformation systems.
Semantics are platform specific, similar to the hybrid A sequential hybrid flow graph is a special HFG that uses
flow graph model. We define the hybrid flow graph se- a program counter a memory nodes instead of special op-
mantics using the graph rewriting systems according to the erations defined in Section 3.1. It has a strict control flow
platform specific operations logic. that closely follows that of the source program. It is use-
We define the HFG semantics for CIL using graph less for the parallel pipelines or other applications, but it
rewriting. Figure 2 shows the example of definition for allows us to create a general HFG from the source code.
the basic instructions. Instruction load constant (ldc) has We use special operations for variables and the stack
a special behavior, it produces a single constant and then (CIL virtual machine is stack based [1]), these operations
12 M. Brabec, D. Bednárek

(a) application of ldc5 rule (b) application of broadcast

Figure 6: Sequential HFG created from code in Listing 1
Figure 4: Computation of a hybrid flow graph

proper number of stack levels and variables, based on the
instruction definition.
The actual computation is the same as in the case of the
.NET virtual machine, the control flow follows the same
path, variables and stack contain the same value at after
every evaluated instruction. Graphically, the computation
is similar to Figure 4.

5 Transformation of CIL to Sequential
HFG

In this section, we explain how to create a sequential hy-
brid flow graph from a CIL code. We present an algorithm
that produces a sequential HFG that follows the CIL com-
putation step by step, including control and data flow. It
Figure 5: Semantics of selected CIL sequential HFG oper-
allows us to define the advanced transformation algorithm
ations
in Section 6.
The basic transformation of a CIL code to a sequential
are used to model the data flow. Plus we add a special HFG requires two steps: 1) create nodes based on the in-
node P (program counter) to model the control flow. structions 2) create edges based on the control and data
flow.
The entire transformation is in Algorithm 1, function v
4.1 Semantics of a CIL Sequential HFG maps nodes to operations. The node P is basically a pro-
The sequential hybrid flow graph semantics is defined us- gram counter, it points to the actual instruction and it is
ing the graph rewriting, same as any other hybrid flow used to handle the control flow. The input sets I, M, Pi are
graph. The transformation rules for the CIL based sequen- obtained from the source code. Sri and Swi are based di-
tial hybrid flow graph are defined in Figure 5, we present rectly on CIL specification [1]. The transformation is best
only the rules necessary for our examples. presented on an example, Listing 1 contains CIL source
The GRS rules defined in Figure 5 are completely based code and Figure 6 contains the resulting graph.
on the CIL instruction definition [1]. The control flow is
controlled by the node P, which follows the edges between
instructions. The instruction edges are constructed accord- 6 Transformation of Sequential HFG to
ing to the following rules: 1) The standard instructions fol- General HFG
low one another 2) The branches have two outgoing edges,
one following the jump target and the other leading to the In this section, we present an algorithm that transforms
next instruction. a procedural code to a hybrid flow graph. We create the
The data management is handled by the stack and CIL sequential HFG (Section 5) and convert it to a general
variable operations, where each instruction modifies the HFG that requires neither the program counter P, nor the
Hybrid Flow Graphs: Towards the Transformation of Sequential Code into Parallel Pipeline Networks 13

Algorithm 1 Transformation of CIL to Sequential HFG
Require: I – set of instruction addresses
M – set containing all variables
Ci , Pi – code and parameter of the instruction i
Sri – number of stack values read by instruction i
Swi – number of stack values written by instruction i
Ensure: V – nodes of the flow graph
v(Vi ) – label of the node Vi
E – edges
ofSTACKthe flow graph
1: V := V P ,V
2: V := V ∪ ViINSTR : i ∈ I
3: v(ViINSTR ):= hCi Pi i for each i ∈ I
Figure 8: General HFG of the SimpleLoop function before
4: V := V ∪ VmVAR : m ∈ M
empty operation elimination
5: v(VmVAR ) := hvar mi for each m ∈ M
6: E := 0/
7: for all i ∈ I do memory nodes. The input for the algorithm is an interme-
8: if i = branchnthen o diate code; in our case CIL.
9: E := E ∪ (ViINSTR ,VPINSTR
i
) The basic idea of the algorithm is that every CIL in-
10: end if struction is transformed to a node. The data and control
11: E := E ∪ (ViINSTR ,Vi+1 INSTR
) flow is then modeled using the sequential HFG presented
12: end for in Section 5. We emulate computation on the graph and
13: for all i ∈ I do note what data can reach which instruction, by what path.
14: if i = stl then
n o Finally, we create edges according to the collected infor-
15: E := E ∪ (ViINSTR ,VPVAR i
) mation and we introduce special nodes where necessary.
16: end if
17: if i = ldl then
n o 6.1 Flow Graph Construction
18: E := E ∪ (VPVAR i
,V i
INSTR
)
19: end if To construct a complete flow graph, we create nodes based
20: if Swi > 0 then
on instructions and use a modified sequential HFG to cre-
21: E := E ∪ (ViINSTR ,V STACK ) ate edges and special nodes. We use basic blocks to model
22: end if all valid execution paths which we use to create all the
23: if Sri > 0 then
necessary edges. Algorithm 2 describes the entire process,
24: E := E ∪ (V STACK ,ViINSTR ) the other algorithms are discussed in the following sec-
25: end if tions. The result of Algorithm 2, applied to the code in
26: end for Listing 1, is the HFG in Figure 1 and Figure 8 shows the
graph before NOP elimination.
The first step of the algorithm is to create basic nodes
according to the instructions of the source code (lines 1
to 3 in Algorithm 2).
To crate the edges, we generate necessary execution
paths using the CIL path generator (Section 6.4) and an-
alyze all the possible inputs by the CIL reachability algo-
rithm (Section 6.3). Basically, we emulate computations
for every valid execution path recording the possible in-
puts, and the paths leading to them, for every instruction.
First, we convert the basic blocks to a regular expression
(function to_regular_expression), then we generate all the
valid paths using the CIL path generator. Next, we have
to update the GCIL so it contains the correct number of it-
erations (function update) for every loop (this due to the
IDs assigned to branches as shown in Figure 9). Finally,
we use the CIL reachability to create sets S[0:N] containing
all the possible inputs for each instruction (lines 4 to 10).
Figure 7: Transformation from CIL to a general HFG
S[0:N] represents sets S0 to SN that contain inputs for in-
structions indexed 0 through N.
14 M. Brabec, D. Bednárek

Algorithm 2 Hybrid flow graph construction
Require: I – set of instructions
B – graph of basic blocks
GCIL – modified CIL graph
RCIL – modified rules for CIL graph
Ensure: N – nodes of the flow graph
E – edges of the flow graph
1: N := {Ni : i ∈ I ∧ v(Ni ) = Oi }
2: regex := to_regular_expression(B)
3: S[0:N] := 0/
4: for all W ∈ CIL_path_generator(regex, GCIL ) do
5: for all si ∈ CIL_reachability(I, update(GCIL ), RCIL )
do
6: Si := Si ∪ si
7: end for
8: end for
9: for all si ∈ S[0:N] do
Figure 9: Symbolic semantics of selected Sequential HFG
10: for all p ∈ si do
operations
11: while ∃q ∈ si : q. f irst = p. f irst ∧ q 6= p do
12: si := si \ q
We can create edges, once the input sets S[0:N] are ready. 13: end while
We start by discarding duplicate sources, because each in- 14: end for
struction can produce only one value, which can reach the 15: if |si | = input_count(i) then
instruction i through multiple paths (different iterations in- 16: for all p ∈ si do
between), we can keep any instance as they are equivalent 17: E := E ∪ {(p, i)}
(lines 12 to 16). Then we check the number of inputs left, 18: end for
if there is the same number of sources as the instruction i 19: else
has inputs then we simply create edges (lines 17 to 21). 20: while |si | > input_count(i) do
If there is more sources than i has inputs then we create 21: a := si [0]
merge nodes to select the correct input value and we con- 22: b := si [1]
nect them to a condition that decides what input is used. 23: V := V ∪ {merge}
We iterate over the sources and we compare the attached 24: E := E ∪ {a. f irst, merge}
paths, the paths always start the same (i.e. the first instruc- 25: E := E ∪ {b. f irst, merge}
tion) and they continue until there is a conditional branch 26: E := E ∪ {merge, i}
that changes the execution order for one of them. We cre- 27: while a.second[pos] = b.second[pos] do
ate a merge node, connect both sources to it and we con- 28: pos := pos + 1
nect the branch condition to the condition input. This way, 29: end while
we reduce the number of incoming edges, until it matches 30: E := E ∪ {pos, merge}
the number of inputs. 31: end while
The last step of Algorithm 2 is the elimination of empty 32: end if
33: end for
operations that no longer have any purpose. This means
34: eiminateN OP(V, E)
mainly loading and storing local variables (ldl and stl).
Eliminate NOP removes any nodes that do not change the
data.
use this to drive the control flow (explained in the next sec-
tion). Last modification is that the rules output identifiers
6.2 Symbolic Semantics of Sequential HFG of consumed values, this is indicated by the bold dashed
boxes on the right side in Figure 9.
We use modified semantics for the CIL sequential HFG
defined (Section 5). The modified rules work with instruc-
tion identifiers instead of actual values. This way, we can 6.3 CIL Reachability
analyze which instructions consume the value produced
other instructions. The rule modification is shown in Fig- We use the modified HFG semantics to locate all possible
ure 9. inputs for each instruction i, where an input is an instruc-
We assign each instruction an identifier equal to its in- tions producing a value consumed by i. We do this by em-
dex (∀i ∈ I : id(i) = i). The identifiers of branches decide ulating CIL computation and storing the inputs for each
how many times the branch is true before it is f alse, we instruction, implemented by Algorithm 3.
Hybrid Flow Graphs: Towards the Transformation of Sequential Code into Parallel Pipeline Networks 15

Algorithm 3 CIL reachability count, because multiple loop iterations do not add new in-
Require: I – set of instructions formation (see Section 6.5 for more details).
GCIL – modified CIL graph
RCIL – modified rules for CIL graph 6.5 Algorithm Correctness
Ensure: S[0:N] – sets of sources for each instruction
1: path := () – empty path
In the following sections, we discuss the reasoning behind
2: while ∃r ∈ RCIL : applicable(GCIL , r) do
the hybrid flow graph construction presented in Section 6,
3: i := program_counter(GCIL ) – current instruction mainly why it is equivalent to the input source code. Here,
4: consumed := apply(GCIL , r) – apply rule we present only the main ideas of a proof, because a com-
5: for all id ∈ consumed do plete proof would require too much space.
6: Si := Si ∪ {(id, path)} We have to focus mainly on three parts of the transfor-
7: end for mation algorithm: 1) the CIL sequential hybrid flow graph
8: path := path · i – update execution path is equivalent to the CIL source code 2) the analyzed ex-
9: end while
ecution paths contain all the possible inputs 3) the merge
nodes created according to the execution paths are placed
correctly.
Algorithm 3 uses several simple functions: applicable
tells whether the rule can be matched on the graph, apply 6.6 CIL HFG Equivalence To Code
applies the rule and returns the input IDs (dashed boxes in
Figure 9), program_counter returns the instruction con- We prove the equivalence of the CIL code and the CIL
nected to P. hybrid graph (Section 5) by making sure that they have
the same control and data flow. We can do this, because
the CIL graph contains a node for program counter and
6.4 CIL Path Generator nodes for variables and stack.
We base the proof on the fact that the HFG operations
We have to examine all valid execution paths to make sure
behave the same way as the instructions. The program
that the HFG is complete. We do this by creating a regular
counter is passed from one instruction to another in the
expression representing the control flow and then generat-
same way as in CIL – control flow is the same. The in-
ing all valid words (paths).
structions store data using special operations (nodes) for
We create basic block graph based on the source code,
stack and variables in the same way they do in memory –
this is a very simple algorithm explained in [10]. The ba-
data flow is equivalent. We omit the full proof, because it
sic block graph can be interpreted as a finite state machine,
would be too long and technical.
the source code is finite, producing only a finite number
of states. We disregard the actual values produced in the
blocks that define the actual control flow. Instead, we treat 6.7 Flow Graph Logic
the basic block graph as a finite state machine that accepts
We cannot use the same approach for a general hybrid flow
a set of words where each word defines a valid order of
graph as for CIL sequential HFG, because a general HFG
basic blocks. Therefore the execution paths can be rep-
does not model the platform specific memory features and
resented by a regular expression [18]. Based on the ba-
control logic. Instead, we focus on the transformation of
sic blocks, we create a regular expression modeling all the
a CIL sequential HFG to a general HFG. We prove their
valid execution paths.
equivalence by tracing all valid execution paths.

Algorithm 4 CIL path generator Lemma 1 (Necessary execution paths). Let W we a set
Require: R – regular expression of words produced by the CIL path generator algorithm
Ensure: W – set of words (Algorithm 4) and E := {e∗ : e ∈ Σ} set of all words, then
S S
1: for all l ∈ [1 : 3 ∗ length(R)] do w∈W reachable(w) = e∈E reachable(e), where the func-
2: for all w ∈ {v : v = e∗ ∧ e ∈ Σ ∧ |v| = l} do tion reachable represents the CIL reachability algorithm.
3: if valid(w, R) then Proof. The lemma says that the paths generated by Al-
4: W := W ∪ {w} gorithm 4 produce all possible inputs for all instructions.
5: end if This is true because there is only a limited number of paths
6: end for that can produce new inputs.
7: end for The basic block graph can be viewed as a finite state ma-
chine. The regular language accepted by the state machine
Algorithm 4 generates all valid execution paths, where can be infinite, but only due to the Pumping lemma [19],
the function valid simply checks if the word is valid ac- which means infinite loop iterations.
cording to the regular expression and Σ is the alphabet. The important effect of loops is that the instructions use
We limit the word length, thus limiting the loop iteration values produced outside the loop in the first iteration and
16 M. Brabec, D. Bednárek

then they consume values produced in the loop. The only [2] Peterson, J. L.: Petri nets. ACM Comput. Surv., 9
possible exception is a conditional branch in the loop, but (3) (Sep. 1977), 223–252, [Online]. Available: http:
we check the paths with both branches taken in the first //doi.acm.org/10.1145/356698.356702
iteration. [3] Gilles, K.: The semantics of a simple language for parallel
programming. In Information Processing: Proceedings of
Lemma 2 (Data flow path merging). Let pairs the IFIP Congress 74 (1974), 471–475
(S[1:N] , P[1:N] ) be sources for instruction i, then [4] Josephs, M. B.: Models for data-flow sequential pro-
input_count(i) = N =⇒ (S[1:N] , i) ∈ E(HFG) or cesses. In: Communicating Sequential Processes. The First
input_count(i) < N =⇒ ∃m ∈ V (HFG) ∧ (S[1:N] , m) ∈ 25 Years, Springer, 2005, 85–97
E(HFG) [5] Ezpeleta, J., Colom, J. M., Martinez, J.: A Petri net based
deadlock prevention policy for flexible manufacturing sys-
Proof. The lemma says that either all the sources were tems. Robotics and Automation, IEEE Transactions on 11
used to create appropriate edges or that a merge nodes (2) (1995), 173–184
were introduced if there is too many sources. [6] Ehrig, H., Ehrig, K., Prange, U., Taentzer, G.: Graph
The first implication is true, because Algorithm 2 cre- transformation systems. Fundamentals of Algebraic Graph
ates an edge for every source (lines 17 to 20). Transformation (2006), 37–71
The second implication is correct, because the algo- [7] Corradini, A., Dotti, F. L., Foss, L., Ribeiro, L.: Translat-
rithm creates merge nodes when there is too many sources ing java code to graph transformation systems. In: Graph
(line 25). The merge is created based on the execution path Transformations, Springer, 2004, 383–398
stored for each input, it is connected to the last instruction [8] Geiger, L., Zündorf, A.: Graph based debugging with fu-
of the prefix shared by both paths (lines 29 to 32). This jaba. Electr. Notes Theor. Comput. Sci. 72 (2) (2002), 112
means that the merge is driven by the conditional branch [9] Balasubramanian, D., Narayanan, A., van Buskirk, C., Kar-
that changed the execution path and decided what source sai, G.: The graph rewriting and transformation language:
would be used. Great. Electronic Communications of the EASST 1 (2007)
[10] Allen, R., Kennedy, K.: Optimizing compilers for modern
architectures. Morgan Kaufmann San Francisco, 2002
7 Conclusions [11] Sridharan, M., Bodík, R.: Refinement-based context-
sensitive points-to analysis for Java. ACM SIGPLAN No-
We designed the flow graph to represent a procedural code tices 41 (6) (2006), 387–400
along with important information about its structure and [12] Muchnick, S. S.: Advanced compiler design implementa-
behavior. We defined the HFG semantics using graph tion. Morgan Kaufmann Publishers, 1997
rewriting, which is basically a grammar on graphs. We de- [13] Reps, T.: Program analysis via graph reachability. Informa-
fined semantics for both the sequential and general HFG tion and software technology 40 (11) (1998), 701–726
using the formalism, thus making the sequential HFG a [14] Geilen, M., Basten, T.: Requirements on the execution of
special version of the general HFG. We designed an algo- Kahn process networks. In: Programming languages and
systems, Springer, 2003, 319–334
rithm that transforms a sequential source code (C# com-
piled to CIL) to a general HFG using mainly the HFG se- [15] Navarro, A., Asenjo, R., Tabik, S., Cascaval, C.: Analytical
modeling of pipeline parallelism. In: Parallel Architectures
mantics and graph algorithms. Therefore, the algorithm
and Compilation Techniques, 2009, PACT’09, 18th Inter-
can be modified for other platforms.
national Conference on, IEEE, 2009, 281–290.
We have implemented a prototype of the general HFG
[16] Falt, Z., Kruliš, M., Bednárek, D., Yaghob, J., Zavoral,
that uses the defined semantics and is capable of com-
F.: Locality aware task scheduling in parallel data stream
pletely parallel computation. The prototype serves as the processing. In: Intelligent Distributed Computing VIII,
proof of concept, showing that the HFG is defined cor- ser. Studies in Computational Intelligence, D. Camacho,
rectly. It is implemented using the Bobox framework [16]. L. Braubach, S. Venticinque, and C. Badica, Eds., Springer
International Publishing, 2015, 570, 331–342
[17] Falt, Z., Čermák, M., Dokulil, J., Zavoral, F.: Paral-
Acknowledgments lel SPARQL query processing using Bobox. International
Journal On Advances in Intelligent Systems 5 (3,4) (2012),
This paper was partially supported by the Grant Agency 302–314
of Charles University (GAUK) project 122214 and [18] McNaughton, R., Yamada, H.: Regular expressions and
by the Czech Science Foundation (GACR) project state graphs for automata, 1960.
P103/13/08195. [19] Jaffe, J.: A necessary and sufficient pumping lemma for
regular languages. ACM SIGACT News 10 (2) (1978), 48–
49
References

[1] “TG3. Common Language Infrastructure (CLI). Standard
ECMA-335, June 2005.”