J. Yaghob (Ed.): ITAT 2015, pp. 9–16, Charles University in Prague, Prague, 2015

Hybrid Flow Graphs: Towards the Transformation of Sequential Code into Parallel Pipeline Networks

Michal Brabec, David Bednárek

Department of Software Engineering, Faculty of Mathematics and Physics, Charles University Prague
{brabec,bednarek}@ksi.mff.cuni.cz

Abstract: Transforming procedural code for execution by specialized parallel platforms requires a model of computation sufficiently close to both the sequential programming languages and the target parallel environment. In this paper, we present Hybrid Flow Graphs, encompassing both control flow and data flow in a unified, pipeline-based model of computation. Besides the definition of the Hybrid Flow Graph, we introduce a formal framework based on graph rewriting, used for the specification of Hybrid Flow Graph semantics as well as for the proofs of correctness of the associated code transformations. As a formalism particularly close to pipeline-based runtime environments, which include many modern database engines, Hybrid Flow Graphs may become a powerful means for the automatic parallelization of sequential code under these environments.

Keywords: compiler, graph, optimization, parallelism

1 Introduction

Parallelization of procedural code has been studied for several decades, resulting in a number of mature implementations, usually tightly coupled with compilers of FORTRAN or C. In most cases, the parallelization techniques are targeted at multi-threaded environments, sometimes augmented with task-based parallelism. This shared-memory approach is perfectly suited for numeric applications; however, there are niches of computing which may benefit from a different run-time paradigm. In particular, database systems make use of execution plans which explicitly denote the data flows between operators; a similar graph-based representation is often used in streaming systems. Besides other advantages, the explicitly denoted data flow allows easier distribution of the computation across different nodes, compared to the complexity of potentially random access in the shared-memory model of computation.

Nevertheless, database and streaming systems lack the generality of procedural programming languages. To extend the applicability of such runtime systems towards general programming, a system capable of compiling (a subset of) a well-known procedural language is required. This paper demonstrates an important step towards such a system – an intermediate representation capable of describing procedural code using the language of data-flow oriented graphs.
Our approach is focused on target environments in which an application is presented in two layers – the declarative upper layer represented as a graph of operators, the lower layer consisting of a procedural implementation of individual operators. In such a setting, a language front-end transforms the application source code into an intermediate representation; then, a strategic phase decides where to put the boundary between the declarative upper layer and the procedural lower layer. Finally, the upper layer is converted into the final graph while procedural code (in source, bytecode, or binary form) is generated from the lower layer.

Realization of this idea involves many particular problems; in this paper, we deal with the intermediate representation which serves as the backbone connecting the front-end, strategy, and generator phases. Since the upper layer will finally be represented by a graph, it seems reasonable that the intermediate representation also be graph based. However, such a graph shall also be able to completely represent the lower, procedural part of the application. Although the theory of compiling uses graphs in many applications, including the representation of the code, the control-flow and data-flow aspects of the code are usually approached differently. In our approach, the control flow and the data flow are represented by the same mechanism, because the strategic phase must be able to make the cuts inside both the control flow and the data flow.

In this paper, we present Hybrid Flow Graphs (HFG) as an intermediate representation of procedural code, capable of representing control flow and data flow in the same layer. We will demonstrate two forms of the HFG, the Sequential HFG and the General HFG. The former has its execution constrained to be sequential, equivalent to the execution of the procedural code from which it was generated. The latter exhibits independent execution of individual operations, exposing the available parallelism. Of course, making operations independent requires careful dependency analysis – we assume that the required points-to/alias/dependency analyses were performed before the conversion to HFG, using some of the algorithms known from the theory of compiler construction.

We will describe the operational semantics of both the sequential and the independent forms of HFG and we will also demonstrate the conversion between them. We will use graph rewriting as the vehicle for the description of both the semantics and the conversions. The use of graph rewriting systems allows for some generality of the approach, both in the description and in the implementation. In particular, the set of operations provided by the intermediate code may be defined almost arbitrarily as long as their semantics is depicted by graph rewriting rules. On the other hand, the set of control-flow operations is fixed, since it forms the non-trivial part of the transformation from sequential code.

In this paper, we will use the set of types and operators borrowed from the C# language, more exactly, from its standard bytecode representation called CIL [1]. This includes some artifacts specific to the CIL, namely the stack and the associated operations. Since the Sequential HFG mimics the behavior of the CIL abstract machine, the stack operations are preserved in the HFG. However, they must disappear in the transformation to the General HFG; otherwise, the presence of stack operations would prohibit almost any parallelism. Since the removal of stack operations is a non-trivial sub-problem, we include the corresponding (CIL-specific) algorithm in this paper, as well as the description of the preceding conversion of CIL into the Sequential HFG.

The rest of the paper is organized as follows: We review the related work in Section 2. The hybrid flow graph and its semantics are defined in Section 3. We define the sequential hybrid flow graph in Section 4. In Section 5, we present an algorithm producing a sequential HFG from source code. Section 6 presents the algorithms that transform a sequential HFG to a general HFG.
2 Related Work

The flow graph described in this paper is similar but not identical to other modeling languages, like Petri nets [2] or Kahn process networks [3]. The main difference is that the flow graph was designed for automatic generation from source code, whereas the other languages are generally used to model an application prior to implementation [4] or to verify a finished system [5]. The flow graph is similar to the graph rewriting system [6], which can be used to design and analyze applications, but is not convenient for execution. There are frameworks that generate a GRS from procedural code like Java [7], though the produced graphs are difficult to optimize. The flow graph has similar traits to frameworks that allow applications to be generated from graphs, like UML diagrams [8, 9], but we concentrate on both graph extraction and execution.

The flow graph is closely related to graphs used in compilers, mainly the dependence and control flow graphs [10], where the flow graph merges the information from both. The construction of the flow graph and its subsequent optimization relies on compiler techniques, mainly points-to analysis [11], dependence testing [12] and control-flow analysis [13]. In compilers, graphs resulting from these techniques are typically used as additional annotation over intermediate code.

The flow graph is not only a compiler data representation, it is a processing model as well, similar to KPN graphs [14]. It can be used as a source code for specialized processing environments, where frameworks for pipeline parallelism are the best target, since these frameworks use similar models for applications [15]. One such system is the Bobox framework [16], where the flow graph can be used to generate the execution plan similarly to the way Bobox is used to execute SPARQL queries [17].

2.1 Graph Rewriting

A graph rewriting system (GRS) is a set of rules R transforming one graph into another. Rewriting systems are very similar to grammars, where each rule has a left side that has to be matched against the graph and a right side that replaces it.

A GRS is very simple and best explained by an example. Figure 2 shows the CIL-based rewriting system, where each rule emulates a single CIL instruction. We use extended rewriting rules where ∗ represents any node (wildcard) and we add Greek letters to identify unique wildcard nodes where necessary. Variable names (x or y) represent only nodes carrying data. Expressions are used to simplify data manipulation; this is necessary so that we do not have to create a rule for each combination of instruction and inputs.
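To fix intuition, the following is a minimal C# sketch of a graph rewriting system in the sense used here: a rule carries a left-side pattern that is matched against the graph and a right-side action that replaces the matched part. The representation (label strings, chain-shaped patterns) and all names are our illustrative assumptions, not the formalism of [6] nor the paper's implementation.

using System;
using System.Collections.Generic;
using System.Linq;

// A node-labelled directed graph; labels stand for operations or data values.
class GrsNode { public int Id; public string Label; }

class GrsGraph
{
    public List<GrsNode> Nodes = new List<GrsNode>();
    public List<(int From, int To)> Edges = new List<(int From, int To)>();
}

// A rule in the sense of Section 2.1: a left side that is matched against the
// graph and a right side that replaces the matched part.
class GrsRule
{
    public string Name;
    public Func<GrsNode, bool>[] LeftSide;        // predicates on a chain of connected nodes
    public Action<GrsGraph, GrsNode[]> RightSide; // in-place replacement of the match

    // Try to match the left side along edges and, on success, apply the right side.
    public bool TryApply(GrsGraph g)
    {
        foreach (var first in g.Nodes.Where(LeftSide[0]))
        {
            var match = new List<GrsNode> { first };
            bool ok = true;
            for (int k = 1; k < LeftSide.Length && ok; k++)
            {
                var next = g.Edges.Where(e => e.From == match[k - 1].Id)
                                  .Select(e => g.Nodes.First(n => n.Id == e.To))
                                  .FirstOrDefault(LeftSide[k]);
                if (next == null) ok = false; else match.Add(next);
            }
            if (ok) { RightSide(g, match.ToArray()); return true; }
        }
        return false;
    }
}

A rule emulating, for example, the add instruction would match an add node together with its two available data inputs and replace them by a single data node carrying the sum, in the spirit of Figure 2.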
3 Hybrid Flow Graph

A hybrid flow graph (HFG) is an annotated directed graph, where nodes together with their labels represent operations and edges represent queues for transferring data. The direction of an edge indicates the direction of data flow. Figure 1 shows a general hybrid flow graph for a simple function (see Listing 1 for the source code).

void SimpleLoop(int a)
{
    for (int i = a; i < 5; )
    {
        i = call(i);
    }
}

Listing 1: Simple function containing a single loop

Figure 1: The HFG representation of the C# code from Listing 1

A hybrid flow graph is a general concept that can be used regardless of the operations O or data types T. A platform-specific flow graph model is defined by specifying O and T based on the target platform. The platform-specific definition is HFG_MODEL_platform = (O ∪ {special operations}, T), where O and T are platform specific and the special operations are platform independent (see Section 3.1). The hybrid flow graph semantics (explained in Section 3.2) are also platform specific.

As our research is focused on C#, we use CIL [1] instructions to specify node operations. The most common instructions in our examples are: ldl x (load a variable or constant), stl x (write to a variable), cle (compare less or equal), add etc. (arithmetic operations) [1]. We omit data types for the edges, because they are not important for the graph construction.

3.1 Representation of Control Flow

Control flow operations are transformed into platform-independent special operations which interact with the data flow carried through the platform-specific basic operations. The graphical notation of special operations is shown in Figure 3, where a broadcast is a black square and a merge is a white square.

A broadcast node has a single input and a variable number of outputs. It represents an operation that creates a copy of the input for each output.

A merge node has a single output and three inputs. It represents an operation that accepts data from two sources and passes them to a single operation based on a condition; it is used to merge data flow after a conditional branch.

A loop merge node is a special version of the merge node with two inputs for conditional branches; it is used to merge data in loops. The input is split into two pairs, where each contains one data input and one condition input. The node first reads data from the first pair and then it loops while the second pair has a positive value on the condition input.

Figure 2: Semantics of selected basic operations

Figure 3: Semantics of selected special operations

3.2 Semantics of Hybrid Flow Graphs

The hybrid flow graph semantics define the way nodes process data and communicate. Basically, nodes represent operations or data and edges represent unbounded queues (FIFO) for data transfer.

The semantics are platform specific, similar to the hybrid flow graph model. We define the hybrid flow graph semantics using graph rewriting systems according to the platform-specific operation logic.

We define the HFG semantics for CIL using graph rewriting. Figure 2 shows an example of the definition for the basic instructions. The instruction load constant (ldc) has a special behavior: it produces a single constant and then it repeats while it has an input; we specify this behavior by changing its label to ldc-r. The definition of the special operations is illustrated in Figure 3; the figure contains the definition of broadcast and merge. We use a special graphical notation for the special operations, since they are platform independent (a broadcast is a black square, a merge is a white square).

We can use the graph rewriting rules defined in Figures 2 and 3 to perform the computation of a hybrid flow graph. The computation of a flow graph with a for-loop is shown in Figure 4. Each part of Figure 4 contains the HFG state after the application of a single rule, and the computation starts with the application of the ldc5 rule.

Figure 4: Computation of a hybrid flow graph – (a) application of the ldc5 rule, (b) application of broadcast, (c) application of the cle rule, (d) application of the merge rule
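The following sketch gives one possible concrete reading of the HFG structure: nodes labelled with operations, edges acting as unbounded FIFO queues, and the special operations broadcast and merge defined purely by queue manipulation. The type and member names (HfgNode, HfgEdge, Fire) and the integer value type are illustrative assumptions, not the authors' implementation.

using System;
using System.Collections.Generic;

// An edge of the HFG: an unbounded FIFO queue transferring data values.
class HfgEdge
{
    public readonly Queue<int> Values = new Queue<int>();
}

// A node of the HFG: an operation label plus its input and output edges.
class HfgNode
{
    public string Operation;                         // e.g. "ldc 5", "cle", "broadcast", "merge"
    public List<HfgEdge> Inputs = new List<HfgEdge>();
    public List<HfgEdge> Outputs = new List<HfgEdge>();

    // Fire the node once if its inputs allow it; returns false otherwise.
    public bool Fire()
    {
        switch (Operation)
        {
            case "broadcast":                        // copy one input value to every output
                if (Inputs[0].Values.Count == 0) return false;
                int v = Inputs[0].Values.Dequeue();
                foreach (var o in Outputs) o.Values.Enqueue(v);
                return true;

            case "merge":                            // condition input (index 2) selects one of two data inputs
                if (Inputs[2].Values.Count == 0) return false;
                int cond = Inputs[2].Values.Dequeue();
                var selected = cond != 0 ? Inputs[0] : Inputs[1];
                if (selected.Values.Count == 0) return false;  // simplification: the condition was already consumed
                Outputs[0].Values.Enqueue(selected.Values.Dequeue());
                return true;

            default:                                 // platform-specific operations would be handled here
                return false;
        }
    }
}

In a pipeline runtime such as Bobox [16], Fire would correspond to one scheduling step of the corresponding operator; here it merely illustrates that the special operations are defined entirely by queue operations.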
4 Sequential HFG

In this section, we define the semantics of a sequential hybrid flow graph using graph transformation systems. A sequential hybrid flow graph is a special HFG that uses a program counter and memory nodes instead of the special operations defined in Section 3.1. It has a strict control flow that closely follows that of the source program. It is useless for parallel pipelines or other applications, but it allows us to create a general HFG from the source code.

We use special operations for variables and the stack (the CIL virtual machine is stack based [1]); these operations are used to model the data flow. In addition, we add a special node P (program counter) to model the control flow.

4.1 Semantics of a CIL Sequential HFG

The sequential hybrid flow graph semantics is defined using graph rewriting, the same as for any other hybrid flow graph. The transformation rules for the CIL-based sequential hybrid flow graph are defined in Figure 5; we present only the rules necessary for our examples.

Figure 5: Semantics of selected CIL sequential HFG operations

The GRS rules defined in Figure 5 are completely based on the CIL instruction definition [1]. The control flow is driven by the node P, which follows the edges between instructions. The instruction edges are constructed according to the following rules: 1) the standard instructions follow one another; 2) the branches have two outgoing edges, one following the jump target and the other leading to the next instruction.

The data management is handled by the stack and variable operations, where each instruction modifies the proper number of stack levels and variables, based on the instruction definition. The actual computation is the same as in the case of the .NET virtual machine: the control flow follows the same path, and the variables and the stack contain the same values after every evaluated instruction. Graphically, the computation is similar to Figure 4.
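Under our reading of Figure 5, one evaluation step of the CIL sequential HFG can be sketched as follows: the program counter node P selects the current instruction, the instruction rewrites the stack and variable nodes, and P then moves along an instruction edge. The reduced instruction set and all names (SeqState, Step) are simplifying assumptions for illustration only.

using System;
using System.Collections.Generic;

// State of a CIL sequential HFG: program counter node P, one stack node,
// and one node per local variable (Section 4).
class SeqState
{
    public int P;                                   // index of the instruction P points to
    public Stack<int> StackNode = new Stack<int>();
    public Dictionary<string, int> VarNodes = new Dictionary<string, int>();
    public List<(string Op, string Arg)> Code = new List<(string Op, string Arg)>();

    // Apply the rewriting rule of the current instruction and advance P.
    public bool Step()
    {
        if (P < 0 || P >= Code.Count) return false;
        var (op, arg) = Code[P];
        int next = P + 1;
        switch (op)
        {
            case "ldc": StackNode.Push(int.Parse(arg)); break;          // load constant
            case "ldl": StackNode.Push(VarNodes[arg]); break;           // load variable
            case "stl": VarNodes[arg] = StackNode.Pop(); break;         // store variable
            case "cle": { var b = StackNode.Pop(); var a = StackNode.Pop();
                          StackNode.Push(a <= b ? 1 : 0); break; }      // compare less or equal
            case "br":  next = StackNode.Pop() != 0 ? int.Parse(arg) : next; break; // conditional branch
            default:    throw new InvalidOperationException("unknown operation " + op);
        }
        P = next;                                   // P follows an instruction edge
        return true;
    }
}

Iterating Step reproduces the behavior of the .NET virtual machine on the same instruction sequence, which is the property used later in Section 6.6.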
5 Transformation of CIL to Sequential HFG

In this section, we explain how to create a sequential hybrid flow graph from CIL code. We present an algorithm that produces a sequential HFG following the CIL computation step by step, including the control and data flow. It allows us to define the advanced transformation algorithm in Section 6.

The basic transformation of CIL code to a sequential HFG requires two steps: 1) create nodes based on the instructions; 2) create edges based on the control and data flow. The entire transformation is given in Algorithm 1; the function v maps nodes to operations. The node P is basically a program counter: it points to the current instruction and it is used to handle the control flow. The input sets I, M, P_i are obtained from the source code. Sr_i and Sw_i are based directly on the CIL specification [1]. The transformation is best presented on an example: Listing 1 contains the source code and Figure 6 contains the resulting graph.

Algorithm 1 Transformation of CIL to Sequential HFG
Require: I – set of instruction addresses
         M – set containing all variables
         C_i, P_i – code and parameter of the instruction i
         Sr_i – number of stack values read by instruction i
         Sw_i – number of stack values written by instruction i
Ensure:  V – nodes of the flow graph
         v(V_i) – label of the node V_i
         E – edges of the flow graph
 1: V := {V^P, V^STACK}
 2: V := V ∪ {V_i^INSTR : i ∈ I}
 3: v(V_i^INSTR) := ⟨C_i P_i⟩ for each i ∈ I
 4: V := V ∪ {V_m^VAR : m ∈ M}
 5: v(V_m^VAR) := ⟨var m⟩ for each m ∈ M
 6: E := ∅
 7: for all i ∈ I do
 8:   if i = branch then
 9:     E := E ∪ {(V_i^INSTR, V_{P_i}^INSTR)}
10:   end if
11:   E := E ∪ {(V_i^INSTR, V_{i+1}^INSTR)}
12: end for
13: for all i ∈ I do
14:   if i = stl then
15:     E := E ∪ {(V_i^INSTR, V_{P_i}^VAR)}
16:   end if
17:   if i = ldl then
18:     E := E ∪ {(V_{P_i}^VAR, V_i^INSTR)}
19:   end if
20:   if Sw_i > 0 then
21:     E := E ∪ {(V_i^INSTR, V^STACK)}
22:   end if
23:   if Sr_i > 0 then
24:     E := E ∪ {(V^STACK, V_i^INSTR)}
25:   end if
26: end for

Figure 6: Sequential HFG created from the code in Listing 1
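Algorithm 1 can be read in code roughly as follows; we assume each instruction is a (code, parameter) pair and that Sr_i and Sw_i are supplied as lookup tables. Node names such as "INSTR3" or "VAR:i" are purely illustrative stand-ins for V_i^INSTR and V_m^VAR.

using System.Collections.Generic;

static class SequentialHfgBuilder
{
    // Build the node and edge sets of a sequential HFG from a CIL-like instruction list.
    public static (HashSet<string> V, HashSet<(string, string)> E) Build(
        IList<(string Code, string Param)> instr,
        IList<int> stackReads, IList<int> stackWrites,   // Sr_i and Sw_i
        ISet<string> variables)                          // M
    {
        var V = new HashSet<string> { "P", "STACK" };    // line 1
        var E = new HashSet<(string, string)>();

        for (int i = 0; i < instr.Count; i++) V.Add($"INSTR{i}");   // lines 2-3
        foreach (var m in variables) V.Add($"VAR:{m}");             // lines 4-5

        for (int i = 0; i < instr.Count; i++)                       // lines 7-12: control flow edges
        {
            if (instr[i].Code == "branch")
                E.Add(($"INSTR{i}", $"INSTR{instr[i].Param}"));     // edge to the jump target
            if (i + 1 < instr.Count)
                E.Add(($"INSTR{i}", $"INSTR{i + 1}"));              // fall-through edge
        }

        for (int i = 0; i < instr.Count; i++)                       // lines 13-26: data flow edges
        {
            if (instr[i].Code == "stl") E.Add(($"INSTR{i}", $"VAR:{instr[i].Param}"));
            if (instr[i].Code == "ldl") E.Add(($"VAR:{instr[i].Param}", $"INSTR{i}"));
            if (stackWrites[i] > 0) E.Add(($"INSTR{i}", "STACK"));
            if (stackReads[i] > 0) E.Add(("STACK", $"INSTR{i}"));
        }
        return (V, E);
    }
}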
6 Transformation of Sequential HFG to General HFG

In this section, we present an algorithm that transforms procedural code to a hybrid flow graph. We create the CIL sequential HFG (Section 5) and convert it to a general HFG that requires neither the program counter P nor the memory nodes. The input of the algorithm is an intermediate code; in our case CIL.

The basic idea of the algorithm is that every CIL instruction is transformed into a node. The data and control flow is then modeled using the sequential HFG presented in Section 5. We emulate the computation on the graph and record what data can reach which instruction, and by what path. Finally, we create edges according to the collected information and we introduce special nodes where necessary.

Figure 7: Transformation from CIL to a general HFG

6.1 Flow Graph Construction

To construct a complete flow graph, we create nodes based on the instructions and use a modified sequential HFG to create the edges and the special nodes. We use basic blocks to model all valid execution paths, which we then use to create all the necessary edges. Algorithm 2 describes the entire process; the other algorithms are discussed in the following sections. The result of Algorithm 2, applied to the code in Listing 1, is the HFG in Figure 1, while Figure 8 shows the graph before NOP elimination.

Figure 8: General HFG of the SimpleLoop function before empty operation elimination

The first step of the algorithm is to create basic nodes according to the instructions of the source code (line 1 in Algorithm 2).

To create the edges, we generate the necessary execution paths using the CIL path generator (Section 6.4) and analyze all the possible inputs by the CIL reachability algorithm (Section 6.3). Basically, we emulate the computation for every valid execution path, recording the possible inputs, and the paths leading to them, for every instruction. First, we convert the basic blocks to a regular expression (function to_regular_expression); then we generate all the valid paths using the CIL path generator. Next, we have to update the G_CIL so it contains the correct number of iterations (function update) for every loop (this is due to the IDs assigned to branches, as shown in Figure 9). Finally, we use the CIL reachability to create the sets S[0:N] containing all the possible inputs for each instruction (lines 4 to 8). S[0:N] represents the sets S_0 to S_N that contain the inputs for instructions indexed 0 through N.

Algorithm 2 Hybrid flow graph construction
Require: I – set of instructions
         B – graph of basic blocks
         G_CIL – modified CIL graph
         R_CIL – modified rules for the CIL graph
Ensure:  N – nodes of the flow graph
         E – edges of the flow graph
 1: N := {N_i : i ∈ I ∧ v(N_i) = O_i}
 2: regex := to_regular_expression(B)
 3: S[0:N] := ∅
 4: for all W ∈ CIL_path_generator(regex, G_CIL) do
 5:   for all s_i ∈ CIL_reachability(I, update(G_CIL), R_CIL) do
 6:     S_i := S_i ∪ s_i
 7:   end for
 8: end for
 9: for all s_i ∈ S[0:N] do
10:   for all p ∈ s_i do
11:     while ∃q ∈ s_i : q.first = p.first ∧ q ≠ p do
12:       s_i := s_i \ q
13:     end while
14:   end for
15:   if |s_i| = input_count(i) then
16:     for all p ∈ s_i do
17:       E := E ∪ {(p, i)}
18:     end for
19:   else
20:     while |s_i| > input_count(i) do
21:       a := s_i[0]
22:       b := s_i[1]
23:       V := V ∪ {merge}
24:       E := E ∪ {(a.first, merge)}
25:       E := E ∪ {(b.first, merge)}
26:       E := E ∪ {(merge, i)}
27:       while a.second[pos] = b.second[pos] do
28:         pos := pos + 1
29:       end while
30:       E := E ∪ {(pos, merge)}
31:     end while
32:   end if
33: end for
34: eliminateNOP(V, E)

We can create the edges once the input sets S[0:N] are ready. We start by discarding duplicate sources, because each instruction can produce only one value, which can reach the instruction i through multiple paths (with different iterations in-between); we can keep any instance, as they are equivalent (lines 10 to 14). Then we check the number of inputs left; if there is the same number of sources as the instruction i has inputs, then we simply create the edges (lines 15 to 18). If there are more sources than i has inputs, then we create merge nodes to select the correct input value and we connect them to a condition that decides which input is used. We iterate over the sources and compare the attached paths; the paths always start the same (i.e., with the first instruction) and they continue until there is a conditional branch that changes the execution order for one of them. We create a merge node, connect both sources to it and connect the branch condition to the condition input. This way, we reduce the number of incoming edges until it matches the number of inputs.

The last step of Algorithm 2 is the elimination of empty operations that no longer serve any purpose. This means mainly loading and storing local variables (ldl and stl). Eliminate NOP removes any nodes that do not change the data.
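The merge-insertion loop of Algorithm 2 (lines 20 to 31) can be sketched as follows, assuming each source is a (producer, path) pair as collected by the CIL reachability; the record type, the helper names and the way the merged pair is replaced are our assumptions, since Algorithm 2 leaves them implicit.

using System.Collections.Generic;
using System.Linq;

static class MergeInsertion
{
    // A source: the producing instruction plus the execution path along which its value arrives.
    record Source(int Producer, IReadOnlyList<int> Path);

    // Reduce the sources of instruction `target` to `inputCount` by inserting merge nodes.
    // Returns the extra edges created; `nextMergeId` names the synthetic merge nodes.
    static List<(int From, int To)> Reduce(List<Source> sources, int target,
                                           int inputCount, ref int nextMergeId)
    {
        var edges = new List<(int From, int To)>();
        while (sources.Count > inputCount)
        {
            Source a = sources[0], b = sources[1];
            int merge = nextMergeId++;

            edges.Add((a.Producer, merge));          // both candidate values feed the merge
            edges.Add((b.Producer, merge));
            edges.Add((merge, target));              // the merge output replaces the two inputs

            // The shared path prefix ends at the conditional branch that separates a and b;
            // that branch drives the merge condition. Paths share at least the first
            // instruction (Section 6.1), so pos >= 1 here.
            int pos = 0;
            while (pos < a.Path.Count && pos < b.Path.Count && a.Path[pos] == b.Path[pos]) pos++;
            int branch = a.Path[pos - 1];
            edges.Add((branch, merge));              // condition edge

            // Replace the two merged sources by the merge output (not spelled out in Algorithm 2).
            sources.RemoveRange(0, 2);
            sources.Add(new Source(merge, a.Path.Take(pos).ToList()));
        }
        return edges;
    }
}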
6.2 Symbolic Semantics of Sequential HFG

We use a modified semantics for the CIL sequential HFG defined in Section 5. The modified rules work with instruction identifiers instead of actual values. This way, we can analyze which instructions consume the values produced by other instructions. The rule modification is shown in Figure 9.

Figure 9: Symbolic semantics of selected Sequential HFG operations

We assign each instruction an identifier equal to its index (∀i ∈ I : id(i) = i). The identifiers of branches decide how many times the branch is true before it is false; we use this to drive the control flow (explained in the next section). The last modification is that the rules output the identifiers of consumed values; this is indicated by the bold dashed boxes on the right side of Figure 9.

6.3 CIL Reachability

We use the modified HFG semantics to locate all possible inputs for each instruction i, where an input is an instruction producing a value consumed by i. We do this by emulating the CIL computation and storing the inputs for each instruction, as implemented by Algorithm 3.

Algorithm 3 CIL reachability
Require: I – set of instructions
         G_CIL – modified CIL graph
         R_CIL – modified rules for the CIL graph
Ensure:  S[0:N] – sets of sources for each instruction
 1: path := () – empty path
 2: while ∃r ∈ R_CIL : applicable(G_CIL, r) do
 3:   i := program_counter(G_CIL) – current instruction
 4:   consumed := apply(G_CIL, r) – apply rule
 5:   for all id ∈ consumed do
 6:     S_i := S_i ∪ {(id, path)}
 7:   end for
 8:   path := path · i – update execution path
 9: end while

Algorithm 3 uses several simple functions: applicable tells whether the rule can be matched on the graph, apply applies the rule and returns the input IDs (the dashed boxes in Figure 9), and program_counter returns the instruction connected to P.

6.4 CIL Path Generator

We have to examine all valid execution paths to make sure that the HFG is complete. We do this by creating a regular expression representing the control flow and then generating all valid words (paths).

We create a basic block graph based on the source code; this is a very simple algorithm explained in [10]. The basic block graph can be interpreted as a finite state machine; the source code is finite, producing only a finite number of states. We disregard the actual values produced in the blocks that define the actual control flow. Instead, we treat the basic block graph as a finite state machine that accepts a set of words, where each word defines a valid order of basic blocks. Therefore, the execution paths can be represented by a regular expression [18]. Based on the basic blocks, we create a regular expression modeling all the valid execution paths.

Algorithm 4 CIL path generator
Require: R – regular expression
Ensure:  W – set of words
 1: for all l ∈ [1 : 3 ∗ length(R)] do
 2:   for all w ∈ {v : v = e∗ ∧ e ∈ Σ ∧ |v| = l} do
 3:     if valid(w, R) then
 4:       W := W ∪ {w}
 5:     end if
 6:   end for
 7: end for

Algorithm 4 generates all valid execution paths, where the function valid simply checks whether the word is valid according to the regular expression and Σ is the alphabet. We limit the word length, thus limiting the loop iteration count, because multiple loop iterations do not add new information (see Section 6.5 for more details).
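A direct reading of Algorithm 4 is sketched below: basic blocks are encoded as single characters, the control-flow regular expression is given in .NET regex syntax, and all words up to the bound 3·length(R) are enumerated exhaustively and filtered. Only the bound is taken from the algorithm; everything else is an illustrative assumption.

using System.Collections.Generic;
using System.Text.RegularExpressions;

static class CilPathGenerator
{
    // Enumerate all words over `alphabet` up to length 3*pattern.Length that match the
    // control-flow regular expression (each letter stands for one basic block).
    public static List<string> Generate(string pattern, char[] alphabet)
    {
        var regex = new Regex("^(" + pattern + ")$");
        var words = new List<string>();
        int maxLen = 3 * pattern.Length;             // bound on word length (Algorithm 4, line 1)

        var frontier = new List<string> { "" };
        for (int len = 1; len <= maxLen; len++)
        {
            var next = new List<string>();
            foreach (var prefix in frontier)         // exhaustive enumeration, as in Algorithm 4
                foreach (var c in alphabet)
                    next.Add(prefix + c);
            foreach (var w in next)
                if (regex.IsMatch(w)) words.Add(w);  // keep only valid execution paths
            frontier = next;
        }
        return words;
    }
}

For the single loop of Listing 1, with hypothetical blocks e (entry), t (test), b (body) and x (exit), the pattern would be roughly et(bt)*x, and the generator returns etx, etbtx, etbtbtx, and so on up to the length bound.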
6.5 Algorithm Correctness

In the following sections, we discuss the reasoning behind the hybrid flow graph construction presented in Section 6, mainly why it is equivalent to the input source code. Here, we present only the main ideas of a proof, because a complete proof would require too much space.

We have to focus mainly on three parts of the transformation algorithm: 1) the CIL sequential hybrid flow graph is equivalent to the CIL source code; 2) the analyzed execution paths contain all the possible inputs; 3) the merge nodes created according to the execution paths are placed correctly.

6.6 CIL HFG Equivalence to Code

We prove the equivalence of the CIL code and the CIL hybrid graph (Section 5) by making sure that they have the same control and data flow. We can do this because the CIL graph contains a node for the program counter and nodes for the variables and the stack.

We base the proof on the fact that the HFG operations behave the same way as the instructions. The program counter is passed from one instruction to another in the same way as in CIL – the control flow is the same. The instructions store data using the special operations (nodes) for the stack and variables in the same way they do in memory – the data flow is equivalent. We omit the full proof, because it would be too long and technical.

6.7 Flow Graph Logic

We cannot use the same approach for a general hybrid flow graph as for the CIL sequential HFG, because a general HFG does not model the platform-specific memory features and control logic. Instead, we focus on the transformation of a CIL sequential HFG to a general HFG. We prove their equivalence by tracing all valid execution paths.

Lemma 1 (Necessary execution paths). Let W be the set of words produced by the CIL path generator algorithm (Algorithm 4) and E := {e* : e ∈ Σ} the set of all words; then ⋃_{w∈W} reachable(w) = ⋃_{e∈E} reachable(e), where the function reachable represents the CIL reachability algorithm.

Proof. The lemma says that the paths generated by Algorithm 4 produce all possible inputs for all instructions. This is true because there is only a limited number of paths that can produce new inputs.

The basic block graph can be viewed as a finite state machine. The regular language accepted by the state machine can be infinite, but only due to the pumping lemma [19], which means infinite loop iterations.

The important effect of loops is that the instructions use values produced outside the loop in the first iteration and then they consume values produced in the loop. The only possible exception is a conditional branch in the loop, but we check the paths with both branches taken in the first iteration.
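To illustrate the argument of Lemma 1 on the single-loop case, consider hypothetical basic blocks E (entry), T (test), B (body) and X (exit), which are our own notation rather than the paper's:

\[
L(R) \;=\; \{\, E\,T\,(B\,T)^{k}\,X \;:\; k \ge 0 \,\}, \qquad
\bigcup_{k \ge 0} \mathrm{reachable}\bigl(E\,T\,(B\,T)^{k}\,X\bigr)
\;=\; \bigcup_{k \le 2} \mathrm{reachable}\bigl(E\,T\,(B\,T)^{k}\,X\bigr),
\]

i.e., under the argument above, after the first repetition every instruction in B already consumes values produced inside the loop (with conditional branches covered by taking both directions early), so longer words contribute no new (producer, path) pairs; the bound 3·length(R) in Algorithm 4 is large enough to include these short words.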
Lemma 2 (Data flow path merging). Let the pairs (S[1:N], P[1:N]) be the sources for instruction i; then input_count(i) = N ⟹ (S[1:N], i) ∈ E(HFG), or input_count(i) < N ⟹ ∃m ∈ V(HFG) ∧ (S[1:N], m) ∈ E(HFG).

Proof. The lemma says that either all the sources were used to create the appropriate edges, or that merge nodes were introduced if there were too many sources.

The first implication is true because Algorithm 2 creates an edge for every source (lines 15 to 18).

The second implication is correct because the algorithm creates merge nodes when there are too many sources (line 23). The merge is created based on the execution path stored for each input; it is connected to the last instruction of the prefix shared by both paths (lines 27 to 30). This means that the merge is driven by the conditional branch that changed the execution path and decided which source would be used.

7 Conclusions

We designed the flow graph to represent procedural code along with important information about its structure and behavior. We defined the HFG semantics using graph rewriting, which is basically a grammar on graphs. We defined the semantics for both the sequential and the general HFG using this formalism, thus making the sequential HFG a special version of the general HFG. We designed an algorithm that transforms sequential source code (C# compiled to CIL) to a general HFG using mainly the HFG semantics and graph algorithms. Therefore, the algorithm can be modified for other platforms.

We have implemented a prototype of the general HFG that uses the defined semantics and is capable of completely parallel computation. The prototype serves as a proof of concept, showing that the HFG is defined correctly. It is implemented using the Bobox framework [16].

Acknowledgments

This paper was partially supported by the Grant Agency of Charles University (GAUK) project 122214 and by the Czech Science Foundation (GACR) project P103/13/08195.

References

[1] TG3. Common Language Infrastructure (CLI). Standard ECMA-335, June 2005.
[2] Peterson, J. L.: Petri nets. ACM Comput. Surv. 9 (3) (Sep. 1977), 223–252. [Online]. Available: http://doi.acm.org/10.1145/356698.356702
[3] Kahn, G.: The semantics of a simple language for parallel programming. In: Information Processing: Proceedings of the IFIP Congress 74 (1974), 471–475
[4] Josephs, M. B.: Models for data-flow sequential processes. In: Communicating Sequential Processes. The First 25 Years, Springer, 2005, 85–97
[5] Ezpeleta, J., Colom, J. M., Martinez, J.: A Petri net based deadlock prevention policy for flexible manufacturing systems. Robotics and Automation, IEEE Transactions on 11 (2) (1995), 173–184
[6] Ehrig, H., Ehrig, K., Prange, U., Taentzer, G.: Graph transformation systems. Fundamentals of Algebraic Graph Transformation (2006), 37–71
[7] Corradini, A., Dotti, F. L., Foss, L., Ribeiro, L.: Translating Java code to graph transformation systems. In: Graph Transformations, Springer, 2004, 383–398
[8] Geiger, L., Zündorf, A.: Graph based debugging with Fujaba. Electr. Notes Theor. Comput. Sci. 72 (2) (2002), 112
[9] Balasubramanian, D., Narayanan, A., van Buskirk, C., Karsai, G.: The graph rewriting and transformation language: GReAT. Electronic Communications of the EASST 1 (2007)
[10] Allen, R., Kennedy, K.: Optimizing compilers for modern architectures. Morgan Kaufmann, San Francisco, 2002
[11] Sridharan, M., Bodík, R.: Refinement-based context-sensitive points-to analysis for Java. ACM SIGPLAN Notices 41 (6) (2006), 387–400
[12] Muchnick, S. S.: Advanced compiler design and implementation. Morgan Kaufmann Publishers, 1997
[13] Reps, T.: Program analysis via graph reachability. Information and Software Technology 40 (11) (1998), 701–726
[14] Geilen, M., Basten, T.: Requirements on the execution of Kahn process networks. In: Programming Languages and Systems, Springer, 2003, 319–334
[15] Navarro, A., Asenjo, R., Tabik, S., Cascaval, C.: Analytical modeling of pipeline parallelism. In: Parallel Architectures and Compilation Techniques (PACT'09), 18th International Conference on, IEEE, 2009, 281–290
[16] Falt, Z., Kruliš, M., Bednárek, D., Yaghob, J., Zavoral, F.: Locality aware task scheduling in parallel data stream processing. In: Intelligent Distributed Computing VIII, ser. Studies in Computational Intelligence, D. Camacho, L. Braubach, S. Venticinque, and C. Badica, Eds., Springer International Publishing, 2015, 570, 331–342
[17] Falt, Z., Čermák, M., Dokulil, J., Zavoral, F.: Parallel SPARQL query processing using Bobox. International Journal On Advances in Intelligent Systems 5 (3,4) (2012), 302–314
[18] McNaughton, R., Yamada, H.: Regular expressions and state graphs for automata, 1960
[19] Jaffe, J.: A necessary and sufficient pumping lemma for regular languages. ACM SIGACT News 10 (2) (1978), 48–49