On the Possibility of Parallel Programs Synthesis from Algorithm Graph Specification

Ark.V. Klimov [0000-0002-7030-1517]
Institute of Design Problems in Microelectronics (IPPM) of RAS
arkady.klimov@gmail.com

Abstract. To address the pressing problem of automated parallel programming, the UPL language is proposed for specifying the M-graph of an algorithm, and a roadmap is presented for automatically generating programs from it for various parallel platforms. For this purpose, the user must additionally specify distribution functions that map computations to platform resources. An algorithm for generating a loop nest from the M-graph and the combined distribution function is described, and its operation is demonstrated on a simple example.

Keywords: Parallel Programming, Dataflow Computation Model, Algorithm Graph, Graph Representation Language, Distribution Function, Automated Program Synthesis.

1 Introduction

At present, program efficiency is impossible without parallelism. But, unlike the classical von Neumann sequential computer, there is still no common computation model and corresponding programming language that could be automatically translated with acceptable efficiency (as the C language is) to all existing platforms for parallel and distributed computation. Worse, there are many different types of platforms (OpenMP, MPI, CUDA, TBB, ..., FPGA), for each of which one has to radically rebuild the entire algorithm. The AlgoWiki project [1] appeared as an answer to this situation. Its goal was to present all known fundamental computational algorithms in such a way that the algorithm is first described in some abstract, platform-independent form, and then, in a second part, the peculiarities of its implementations on various specific platforms are described. By now, more than a hundred algorithms have been described in this way.
For the abstract representation of algorithms, the concept of an algorithm graph is introduced, in which information links between computing nodes are expressed explicitly: nodes represent calculations, arcs represent connections between outputs and inputs of nodes. Unfortunately, the AlgoWiki project offers no formal language for describing the abstract algorithm graph in which one could pose questions of verification and further transformation of algorithms, including code generation for various platforms. Here we try to fill this gap.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

To this end, having specified the concept of an algorithm graph (Section 2), we propose the algorithm graph representation language UPL (Section 3). This language can be considered a programming language with full-fledged semantics that allows it to be executed as a program on some abstract machine (Section 4). To turn a graph into efficiently working code, we first need to distribute the computations in space (placement, Section 5) and time (scheduling, Section 6). Section 7 introduces the concept of a unified distribution function, Section 8 describes a general method for constructing it, and Section 9 presents a universal method for generating code from an algorithm graph and a distribution function. The method is illustrated with a simple example. The novelty of our approach is that, unlike many systems for automatic parallelization, where these distributions are generated automatically, we regard them as a tool for manual programming and compilation control: by setting these functions, the user controls the code generation, whose efficiency depends on the user's choices.
In this paper we try to show that the concept of a distribution function not only allows code to be generated from it automatically, but is also a convenient and adequate means of controlling the properties of the generated code.

2 The Concept of Algorithm Graph

First of all, let us pay attention to an ambiguity associated with the concept of an algorithm graph. We ought to distinguish between a computation graph, in which the nodes are individual computational acts, and the algorithm graph proper, which describes in compact form all the computation graphs that arise when the algorithm runs on specific input data. The first can be huge, while the second usually fits on one or a few pages. For brevity and definiteness, we will call the former the R-graph (running graph; also called a lattice graph by the Rostov-on-Don school [3]), and the actual graph of the algorithm the M-graph (macro-graph, meta-graph, etc.). To avoid ambiguity, the nodes of the R-graph will also be called virtual nodes. In the literature, the terms graph of information dependencies and information graph are also found; these can denote either an M-graph or an R-graph, depending on the context.

An M-graph can be built automatically from a fragment of code in C or Fortran if the fragment belongs to the so-called affine class. Such a fragment can contain only loops and assignment statements, where all accesses to array elements and all loop bounds are affine expressions in the enclosing loop variables and fixed structural parameters (for example, 2*i+j, N-1). Conditional statements with affine conditions are allowed as well. Fig. 2 (left) shows the M-graph built for the Heat1 program, assuming that M>1 and N>2, and that the whole array A[0:N] is the input as well as the output. This M-graph exactly mimics the Heat1 program up to transition nodes, such as node A (which has a single input that is transmitted to the output as is). Fig. 2 (right) shows the corresponding R-graph for N = 5 and M = 3.
Each node has an index with one or two components, according to the number of loops enclosing the corresponding statement. In the text, indices are usually written in angle brackets; in the figures they appear in hexagons. A regular node is drawn as a block with one or more named input ports, shown as truncated yellow circles attached to the block. A transition node is depicted as a yellow oval. A node tag is declared in a pink hexagon. Each arc (edge) has a label with a list of index expressions in a blue hexagon: this is the tag of the target node. Near the beginning of the arc, an emission condition may be written. An expression that defines the transmitted value is normally written in a small box on the arc; otherwise the value is either the formula of the sender node or the value of its first input port. All index expressions on arcs compute the recipient index from the sender's context. We say that the M-graph works in the scatter paradigm, meaning that a producer knows where and to whom the produced value should be transmitted. If you reverse all arcs, making the index expressions compute the sender index in the context of the receiver, you get a graph in the gather paradigm, where a consumer knows where to get its input data from but knows nothing about where to transfer its results: it just puts them into its output data storage, and whoever needs them picks them up from there. For polyhedral models such a transformation is always possible, but not in the general case. The scatter paradigm is, in principle, better suited to parallel and distributed settings, since it involves just one-way data transfer along the arcs, whereas in the gather paradigm the data is first saved and then transmitted in response to a request.

[Fig. 2: M-graph (left) and R-graph (right) of the Heat1 program; node labels include i=0..N, i=0, Ain, i=N.]