Using Algebra-Algorithmic and Term Rewriting Tools for Developing Efficient Parallel Programs

Anatoliy Doroshenko1, Kostiantyn Zhereb1 and Olena Yatsenko1

1 Institute of Software Systems of National Academy of Sciences of Ukraine, Glushkov prosp. 40, 03187 Kyiv, Ukraine
doroshenkoanatoliy2@gmail.com, zhereb@gmail.com, oayat@ukr.net

Abstract. An approach to program design and synthesis using algebra-algorithmic specifications and rewriting rules techniques is proposed. An algebra-algorithmic toolkit based on the approach allows building syntactically correct and easy-to-understand algorithm specifications. The term rewriting system supplements the algebra-algorithmic toolkit with facilities for the transformation of sequential and parallel algorithms, enabling their improvement.

Keywords. Algebra of algorithms, code generation, formalized design of programs, parallel computation, term rewriting

Key terms. FormalMethod, HighPerformanceComputing, ConcurrentComputation, Integration

1 Introduction

Nowadays uniprocessor systems have been almost completely displaced by multiprocessor ones, as the latter allow a considerable increase in program performance. Thus the need for program parallelization arises [10]. There are libraries, such as Pthreads, OpenMP, TBB and others [1], that allow developers to write parallel programs. Using these libraries, a programmer manually divides the code into independent sections and describes data exchange and synchronization between them. However, this method has substantial drawbacks, in particular the errors introduced into program code and the time required for parallelization and debugging. Therefore, the parallelization process has to be automated as much as possible and, ideally, should be carried out fully automatically, without the participation of a programmer.

This paper continues our research on automating the design and development of efficient parallel programs, started in [2], [9], [10], [11]. Our approach is based on the Integrated toolkit for Designing and Synthesis of programs (IDS) [2], [19]. The process of algorithm design in IDS consists in the composition of reusable algorithmic components (language operations, basic operators and predicates) represented in Systems of Algorithmic Algebras (SAA) [2], [9], [19]. We used IDS for the generation of sequential and parallel programs in Java and C++ on the basis of high-level algorithm specifications (schemes). To automate the transformation of algorithms and programs we use the term rewriting system Termware [8], [11]. The novelty of this paper is 1) adjusting IDS to generate parallel code in the Cilk++ language, an extension of the C and C++ programming languages designed for multithreaded parallel computing [7], and 2) a closer integration between the IDS and Termware systems. The approach is illustrated on a recursive sorting algorithm (quicksort).

The problem of automated synthesis of program code from specifications has been studied extensively and many approaches have been proposed [13], [14]. Important aspects of program synthesis include 1) the format of inputs (specifications), 2) methods for supporting concrete subject domains and 3) techniques for implementing the transformation from specifications to output program code (these aspects roughly correspond to the three dimensions of program synthesis discussed in [14]).
For input specifications, a popular option is domain-specific languages (DSLs) [4], [17] that capture the requirements of a subject domain. Other options include graphical modeling languages [5], [17], formal specification languages [16], ontologies [6] and algebraic specifications [3]. Using such formalisms enables analysis and verification of specifications and generated code. There are also approaches that specify not the program or algorithm, but the problem to be solved, in the form of functional and non-functional constraints [18], examples of input/output pairs [15], or natural language descriptions [14].

Another crucial aspect of program synthesis is specialization for a subject domain. Some approaches are restricted to a single domain, such as statistical data analysis [12] or mobile application development [17]; others provide facilities for changing the domain-specific parts by using ontological descriptions [6], grammars [16], or a generic framework complemented by domain-specific tools [18].

Finally, an important aspect is the transformation from the input specification into source code in a target language. A transformation algorithm can be hand-coded [12], but this reduces the flexibility of the system. Therefore, the transformation is often described in a declarative form, such as rewriting rules [16], visualized graph transformations [17], or code templates [6]. More complex approaches require searching the space of possible programs [18], possibly using genetic programming or machine learning [14]. In [4], partial synthesis is proposed: the generic parts of an application are generated and then completed with specific details manually.

In comparison, our approach uses algebraic specifications based on Glushkov algebra of algorithms [2], but they can be represented in three equivalent forms: algebraic (formal language), natural-linguistic and graphical, which simplifies the understanding of specifications and facilitates achieving the required program quality. Another advantage of IDS is a method of interactive design of syntactically correct algorithm specifications [2], [19], which eliminates syntax errors during the construction of algorithm schemes. Specialization for a subject domain is done by describing the basic operators and predicates of this domain. Our approach uses code templates to specify implementations of operators and predicates; program transformations, such as from a sequential to a parallel algorithm, are implemented as rewriting rules. Such separation simplifies changing the subject domain or the transformations.

2 Formalized Design of Programs in IDS and Termware

The developed IDS toolkit is based on Systems of Algorithmic Algebras (SAA), which are used for formalized representation of algorithmic knowledge in a selected subject domain [2], [9], [19]. SAA is the two-sorted algebra SAA = <{U, B}; Ω>, where U is a set of logical conditions (predicates) and B is a set of operators defined on an information set; Ω = Ω1 ∪ Ω2 is the signature of operations, consisting of the systems Ω1 and Ω2 of logical operations and operator operations, respectively (these will be considered below). Operator representations of algorithms in SAA are called regular schemes. The algorithmic language SAA/1 [2] is based on the mentioned algebra and is used to describe algorithms in a natural language form. Algorithms represented in SAA/1 are called SAA schemes.

Operators and predicates can be basic or compound.
A basic operator (predicate) is an operator (predicate) that is considered in SAA schemes as a primary atomic abstraction. Compound operators are built from elementary ones by means of the operations of sequential and parallel execution of operators, branching and loops, and the synchronizer WAIT 'condition', which delays the computation until the value of the condition is true (see also Table 1 in the next section). The advantage of using SAA schemes is the ability to describe algorithms in an easy-to-understand form, facilitating achievement of the required program quality.

IDS is intended for the interactive design of algorithm schemes in SAA and the generation of programs in target programming languages (Java, C++, Cilk++). In IDS, algorithms are designed as syntactically correct programs, ensuring the syntactical regularity of schemes. IDS integrates three forms of design-time representation of algorithms: regular schemes, SAA schemes (the textual representation of SAA formulae) and flow graphs. For integration with Termware, in this paper IDS was also adapted to generate programs in the Termware language.

Fig. 1. Architecture of the IDS toolkit

The IDS toolkit consists of the following components (Fig. 1): the constructor, intended for dialogue-based design of syntactically correct sequential and concurrent algorithm schemes and for program generation; the flow graph editor; the generator of SAA schemes on the basis of higher-level schemes, called hyper-schemes [19]; and the database, containing the descriptions of SAA operations, basic operators and predicates in the three mentioned forms, as well as their program implementations.

The constructor carries out the design of algorithm schemes by superposition of SAA language constructs, which the user chooses from a list of reusable components for the construction of algorithms. The design process is represented by a tree of the algorithm [2], [19]. At each step of the design process the constructor allows the user to select only those operations whose insertion into the algorithm tree does not break the syntactical correctness of the scheme. The constructed algorithm tree is then used for automatic generation of the text of the SAA scheme, the flow graph and the program code in a target programming language.
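To make the notion of an algorithm tree more tangible, the following C++ sketch models a tiny fragment of such a tree and a recursive generator that substitutes the children into code templates. All type and function names here are our own illustrative choices, not those used inside IDS, and the templates are simplified stand-ins for the ones stored in the IDS database.

#include <memory>
#include <string>
#include <vector>

// Hypothetical model of an algorithm tree: every node is either a basic
// operator (a leaf) or an SAA operation applied to child subtrees.
struct Node {
    enum Kind { BasicOp, Seq, If } kind;
    std::string text;                         // operator identifier or condition text
    std::vector<std::unique_ptr<Node>> children;
};

// Recursive generation: each SAA operation has a code template into which the
// generated code of its children is substituted (cf. the placeholders
// ^operator1^, ^condition1^ of Table 1 below); basic operators are replaced by
// the implementations stored in the database.
std::string generate(const Node& n) {
    switch (n.kind) {
    case Node::BasicOp:                       // looked up in the database in the real system
        return n.text + ";\n";
    case Node::Seq:                           // "operator1"; "operator2"
        return generate(*n.children[0]) + generate(*n.children[1]);
    case Node::If:                            // IF 'condition' THEN "operator1" END IF
        return "if (" + n.text + ") {\n" + generate(*n.children[0]) + "}\n";
    }
    return "";
}

A full generator would cover the complete set of SAA operations and retrieve the templates from the database, but the recursive substitution scheme is the same.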
Example 1. We illustrate the use of SAA on the Quicksort algorithm, which is given below in the form of an SAA scheme. The identifiers of basic operators in the SAA scheme are written with double quotes, and basic predicates are written with single quotes. Notice that identifiers can contain any text explaining the meaning of an operator or predicate. This text is not interpreted; it only has to match exactly the specification in the database (however, since constructs are not entered manually but selected from a list, misspellings are prevented). The comments and the implementations of compound operators and predicates in SAA schemes begin with a string of "=" characters.

SCHEME QUICKSORT_SEQUENTIAL ====
"main(n)"
==== Locals (
         "Declare an array (a) of type (int) and size (n)";
         "Declare a variable (i) of type (int)";
         "Declare a variable (end) of type (int)");
     "Fill the array (a) of size (n) with random values";
     "end := a + n";
     "qsort(a, end)";
"qsort(begin, end)"
==== IF NOT('begin = end')
         "Reduce (end) by (1)";
         "Reorder array (a) with range (begin) and (end) so that elements less than pivot (end) come before it and greater ones come after it; save pivot position to variable (middle)";
         "qsort(begin, middle)";
         "Increase (middle) by (1)";
         "Increase (end) by (1)";
         "qsort(middle, end)"
     END IF
END OF SCHEME QUICKSORT_SEQUENTIAL

To automate the transformation (e.g. parallelization) of programs, we augment the capabilities of IDS with the rewriting rules technique [8], [11]. At the first step we construct high-level algebraic models of algorithms based on SAA in IDS (see also [2], [9], [19]). After the high-level program model is created, we use parallelizing transformations to implement a parallel version of the program on a given platform (multicore in this paper). Transformations are represented as rewriting rules and therefore can be applied in an automated manner. The declarative nature of the rewriting technique simplifies adding new transformations. Transformations are also separated from language definitions (unlike the approach used in [16]), which simplifies the addition of new transformations or new languages.

We use the rewriting rules system Termware [8], [11]. Termware describes transformations of terms, i.e. expressions of the form f(t1, ..., tn). Transformations are described as Termware rules, i.e. expressions of the form source [condition] -> destination [action]. Here source is a source term (a pattern for matching), condition is a condition of rule application, destination is the transformed term, and action is an additional action performed when the rule fires. Each of the four components can contain variables (denoted as $var), so that rules are more generally applicable. The condition and action components are optional. They can execute arbitrary procedural code and, in particular, use additional data on the program.

3 Generation of Terms and Programs and Experimental Results

IDS performs the generation of program code on the basis of an algorithm tree, obtained as a result of designing the algorithm in the IDS constructor (see Section 2), and of code templates, i.e. implementations of basic operators and predicates in a target language (Java, C++, Cilk++), which are stored in the IDS database. In the process of generation, IDS translates SAA operations into the corresponding operators of the programming language. Compound operators can be represented as subroutines (methods). The IDS database contains various code patterns for the generation of parallel programs, namely using WinAPI threads, the Message Passing Interface (MPI), and Cilk++ operations [7]. For the implementation of the parallel version of our illustrative example (the Quicksort algorithm), we used Cilk++, as it facilitates the programming of recursive parallel programs [7]. Cilk++ is a general-purpose programming language based on C/C++ and designed for multithreaded parallel computing.

Table 1 gives a list of the main SAA operations and the templates of their implementation in Termware and Cilk++ that are stored in the IDS database. The implementations contain placeholders like ^condition1^, ^operator1^, etc., which are replaced with program code during program generation.
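To give a concrete idea of how such templates compose, a hand-written C++ rendering of the QUICKSORT_SEQUENTIAL scheme from Example 1 might look roughly as follows. This is only an illustrative sketch, not the code emitted by IDS; the function is named qsort_range here (instead of qsort, to avoid clashing with the C library function) and the array size is an arbitrary constant.

#include <algorithm>
#include <cstdlib>

// "qsort(begin, end)": sort the half-open range [begin, end)
void qsort_range(int* begin, int* end) {
    if (begin != end) {                               // IF NOT('begin = end')
        --end;                                        // "Reduce (end) by (1)": last element is the pivot
        int pivot = *end;
        int* middle = std::partition(begin, end,      // "Reorder array ... save pivot position to (middle)"
            [pivot](int x) { return x < pivot; });
        std::swap(*middle, *end);                     // put the pivot into its final position
        qsort_range(begin, middle);                   // "qsort(begin, middle)"
        ++middle;                                     // "Increase (middle) by (1)"
        ++end;                                        // "Increase (end) by (1)"
        qsort_range(middle, end);                     // "qsort(middle, end)"
    }
}

int main() {
    const int n = 1000;                               // illustrative size
    int* a = new int[n];                              // "Declare an array (a) of type (int) and size (n)"
    for (int i = 0; i < n; ++i) a[i] = std::rand();   // "Fill the array (a) ... with random values"
    int* end = a + n;                                 // "end := a + n"
    qsort_range(a, end);                              // "qsort(a, end)"
    delete[] a;
    return 0;
}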
For the transformation of an algorithm, IDS generates the corresponding term and the developer specifies a set of rules for the transformation. Termware then carries out the actual transformation, and the result can further be used for code generation in a programming language.

Table 1. The main SAA operations and the templates of their implementation in the Termware and Cilk++ languages

SAA operation:  "operator1"; "operator2"
Termware:       then(^operator1^, ^operator2^)
Cilk++:         ^operator1^; ^operator2^

SAA operation:  IF 'condition' THEN "operator1" ELSE "operator2" END IF
Termware:       IF(^condition1^, ^operator1^, ELSE(^operator2^))
Cilk++:         if (^condition1^) { ^operator1^ } else { ^operator2^ }

SAA operation:  FOR '(var) from (begin) to (end)' LOOP "operator1" END OF LOOP
Termware:       FOR(%1, %2, %3, ^operator1^)
Cilk++:         for (%1; %2; %3) { ^operator1^ }

SAA operation:  ("operator1" PARALLEL "operator2")
Termware:       Parallel(^operator1^, ^operator2^)
Cilk++:         cilk_spawn ^operator1^; ^operator2^

SAA operation:  WAIT 'condition'
Termware:       WAIT(^condition1^)
Cilk++:         cilk_sync;

Example 2. We will parallelize the sequential Quicksort algorithm (see Example 1) using IDS and Termware. For the parallelization, the function qsort has to be transformed, so we generated the term for this function:

qsort(Params(begin, end),
  IF(NOT(Equal(begin, end)),
    then(Dec(end, 1),
    then(Partition(a, begin, end, end),
    then(CALL(qsort(begin, middle)),
    then(Inc(middle, 1),
    then(Inc(end, 1),
      CALL(qsort(middle, end)))))))))

Then the operation of parallel execution has to be added to this term. This is done by applying the following two Termware rules:

1. then(CALL($x), then($y, $z)) ->
   Parallel(CALL($x), then($y, $z))

2. then($x1, Parallel($x2, $x3)) ->
   then($x1, then(Parallel($x2, $x3), WAIT(AllThreadsCompleted(n))))

The first rule replaces the operation of sequential execution of operators with parallel execution. The second rule adds the synchronizer WAIT(AllThreadsCompleted(n)), which delays the computation until all threads complete their work. The result of the transformation is given below.

qsort(Params(begin, end),
  IF(NOT(Equal(begin, end)),
    then(Dec(end, 1),
    then(Partition(a, begin, end, end),
    then(Parallel(
           CALL(qsort(begin, middle)),
           then(Inc(middle, 1),
           then(Inc(end, 1),
             CALL(qsort(middle, end))))),
         WAIT(AllThreadsCompleted(n)))))))

Thus, as a result of the parallelization, the first operator (thread) of the Parallel operation executes the operator qsort(begin, middle), and the second one calls the two Inc operators and qsort(middle, end). The operation WAIT(AllThreadsCompleted(n)) performs the synchronization of threads. The threads are created recursively; their quantity is specified as an input parameter of the function main. Notice that these transformations are only valid if the two qsort calls are independent. The system does not check this property; it has to be asserted by the developer.
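To make the effect of the transformation concrete, the parallel qsort corresponding to the transformed term could be rendered in Cilk++ roughly as follows, following the templates of Table 1: the Parallel operation becomes cilk_spawn and the WAIT synchronizer becomes cilk_sync. This is again only a sketch, not the code emitted by IDS; the header name varies between Cilk++ and Cilk Plus releases.

#include <algorithm>
#include <cilk/cilk.h>   // cilk_spawn / cilk_sync (header name depends on the Cilk++/Cilk Plus distribution)

// Parallel counterpart of the sequential sketch above.
void qsort_range(int* begin, int* end) {
    if (begin != end) {
        --end;                                        // pivot is the last element
        int pivot = *end;
        int* middle = std::partition(begin, end,
            [pivot](int x) { return x < pivot; });
        std::swap(*middle, *end);                     // pivot to its final position
        cilk_spawn qsort_range(begin, middle);        // Parallel: first recursive call runs in a spawned strand
        ++middle;
        ++end;
        qsort_range(middle, end);                     // second call runs in the current strand
        cilk_sync;                                    // WAIT: join the spawned call before returning
    }
}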
The resulting parallel Quicksort algorithm scheme was used for the generation of Cilk++ code with the IDS system. The parallel program was executed on an Intel Core 2 Quad CPU, 2.51 GHz, Windows XP machine. Fig. 2 shows the program execution time in seconds. The speedup obtained with 2, 3 and 4 processors was 2, 2.9 and 3.8, respectively, which shows that the program has a good degree of parallelism and is scalable.

Fig. 2. The execution time of the parallel Quicksort program on a quad-core processor; the size of the input array is 5·10^7 elements

4 Conclusion

We have described our approach to constructing efficient parallel programs using high-level algebra-algorithmic specifications and the rewriting rules technique. The algebra-algorithmic toolkit IDS and the rewriting rules engine Termware are combined to enable formal, yet easy-to-understand algorithm specifications and to automate the program synthesis and parallelization process. The combined development toolkit can be retargeted to various subject domains and implementation languages, as exemplified by Cilk++. The developed system could be further extended with automated code analysis facilities based on the rewriting technique.

References

1. Akhter, S., Roberts, J.: Multi-Core Programming. Intel Press, Hillsboro (2006)
2. Andon, F. I., Doroshenko, A. Y., Tseytlin, G. O., Yatsenko, O. A.: Algebra-Algorithmic Models and Methods of Parallel Programming. Akademperiodika, Kyiv (2007) (in Russian)
3. Apel, S. et al.: An Algebraic Foundation for Automatic Feature-Based Program Synthesis. Science of Computer Programming 75(11), 1022–1047 (2010)
4. Bagheri, H., Sullivan, K.: Pol: Specification-Driven Synthesis of Architectural Code Frameworks for Platform-Based Applications. In: Proc. 11th Int. Conf. on Generative Programming and Component Engineering, pp. 93–102. ACM, New York (2012)
5. Batory, D.: Program Refactoring, Program Synthesis, and Model-Driven Development. In: Proc. 16th Int. Conf. on Compiler Construction. LNCS 4420, pp. 156–171. Springer-Verlag, Berlin Heidelberg (2007)
6. Bures, T. et al.: The Role of Ontologies in Schema-Based Program Synthesis. In: Proc. Workshop on Ontologies as Software Engineering Artifacts, Vancouver (2004)
7. Cilk Home Page, http://cilkplus.org/
8. Doroshenko, A., Shevchenko, R.: A Rewriting Framework for Rule-Based Programming Dynamic Applications. Fundamenta Informaticae 72(1–3), 95–108 (2006)
9. Doroshenko, A., Tseytlin, G., Yatsenko, O., Zachariya, L.: A Theory of Clones and Formalized Design of Programs. In: Proc. Int. Workshop on Concurrency, Specification and Programming (CS&P'2006), pp. 328–339, Wandlitz, Germany (2006)
10. Doroshenko, A. Y., Zhereb, K. A., Yatsenko, Ye. A.: On Complexity and Coordination of Computation in Multithreaded Programs. Problems in Programming 2, 41–55 (2007) (in Russian)
11. Doroshenko, A., Zhereb, K.: Parallelizing Legacy Fortran Programs Using Rewriting Rules Technique and Algebraic Program Models. In: Ermolayev, V. et al. (eds.) ICT in Education, Research, and Industrial Applications. CCIS 347, pp. 39–59. Springer-Verlag, Berlin Heidelberg (2013)
12. Fischer, B., Schumann, J.: AutoBayes: a System for Generating Data Analysis Programs from Statistical Models. J. Funct. Program. 13(3), 483–508 (2003)
13. Flener, P.: Achievements and Prospects of Program Synthesis. In: Kakas, A. C., Sadri, F. (eds.) Computational Logic: Logic Programming and Beyond. LNCS 2407, pp. 310–346. Springer-Verlag, London (2002)
14. Gulwani, S.: Dimensions in Program Synthesis. In: Proc. 12th Int. ACM SIGPLAN Symposium on Principles and Practice of Declarative Programming, pp. 13–24. ACM, New York (2010)
15. Kitzelmann, E.: Inductive Programming: a Survey of Program Synthesis Techniques. In: Approaches and Applications of Inductive Programming. LNCS 5812, pp. 50–73. Springer-Verlag, Berlin Heidelberg (2010)
16. Leonard, E. I., Heitmeyer, C. L.: Automatic Program Generation from Formal Specifications using APTS. In: Automatic Program Development. A Tribute to Robert Paige, pp. 93–113. Springer Science, Dordrecht (2008)
17. Mannadiar, R., Vangheluwe, H.: Modular Synthesis of Mobile Device Applications from Domain-Specific Models. In: Proc. 7th Int. Workshop on Model-Based Methodologies for Pervasive and Embedded Software, pp. 21–28. ACM, New York (2010)
18. Srivastava, S., Gulwani, S., Foster, J. S.: From Program Verification to Program Synthesis. In: Proc. 37th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 313–326. ACM, New York (2010)
19. Yatsenko, O.: On Parameter-Driven Generation of Algorithm Schemes. In: Proc. Int. Workshop on Concurrency: Specification and Programming (CS&P'2012), pp. 428–438, Berlin, Germany (2012)