=Paper=
{{Paper
|id=Vol-2046/bozo-et-al2
|storemode=property
|title=Analysing the hierarchical structure of Erlang applications
|pdfUrl=https://ceur-ws.org/Vol-2046/bozo-et-al2.pdf
|volume=Vol-2046
|authors=István Bozó,Bence János Szabó,Melinda Tóth
}}
==Analysing the hierarchical structure of Erlang applications==
<pdf width="1500px">https://ceur-ws.org/Vol-2046/bozo-et-al2.pdf</pdf>
<pre>
            Analysing the hierarchical structure of Erlang
                             applications

                               István Bozó                 Bence János Szabó
                       bozoistvan@caesar.elte.hu           szbence@hotmail.com
                                             Melinda Tóth
                                     tothmelinda@caesar.elte.hu
                                            ELTE Eötvös Loránd University
                                               Faculty of Informatics
                                                 Budapest, Hungary


                                                        Abstract
                       Understanding and maintaining legacy software is a well-known prob-
                       lem. Static analysis tools aims to support developers in these tasks. We
                       present a static analysis method for extracting the supervisor hierarchy
                       of Erlang programs, which can support program comprehension and
                       maintenance. Beside the algorithm we introduce two different views of
                       the supervisor graph, a compact supervisor graph with only the essen-
                       tial information, and a more detailed full supervisor graph. We demon-
                       strate the method on an example Erlang application and present the
                       two different views of the supervisor hierarchy. The method has been
                       implemented as an extension of the RefactorErl analyser framework.

1    Introduction
RefactorErl [1, 2] is a static source code analyser, comprehension and refactoring tool for Erlang [3, 4]. The aim
of the tool is to support source code comprehension and maintenance. RefactorErl is an extensible framework
that provides several analysis features [5] and tools. The extracted information of the source code is stored in a
Semantic Program Graph [6] (SP G) which is available for users through a query language [7].
   The tool provides several basic semantic analyses, like function call analysis, variable binding and reference
analysis, data flow analysis, etc. There are several advanced analyses built on the basic semantic analyses,
for example duplicate code analysis, dead code analysis, and dependency graphs, etc. However, the analysis of
supervisors and applications [8] of Erlang programs was not supported. Both concepts are fundamental in Erlang
because they describe dependencies and connections between Erlang modules.
   A supervisor is a process that monitors and manages other processes and handle errors based on the given
strategies. A supervised process can be a worker or an other supervisor process. The hierarchically built structure
of supervisor processes are called supervision trees. Using supervisors is a good programming practice to build
fault-tolerant applications. Understanding the hierarchy of complex and concurrent or distributed applications
is essential in maintenance and support phases.
   In this paper we are going to extend the semantic layer of the SP G with information about supervisors.
We provide formal definitions both for the extensions of the model of the semantic program graph and for the
algorithms of the semantic analyses. In addition we define two new graphs, a compact view of the supervision
hierarchy and a view of the full supervisor graph including detailed information.

Copyright c by the paper’s authors. Copying permitted for private and academic purposes.
In: E. Vatai (ed.): Proceedings of the 11th Joint Conference on Mathematics and Computer Science, Eger, Hungary, 20th – 22nd of
May, 2016, published at http://ceur-ws.org


                                                              42
   The paper is structured as follows. Section 2 describes the necessary background information for understanding
the approach. Section 3 describes the formal extension of the used source code representation (SP G) and the
algorithm of the analysis. Section 4 introduces two graphs/views with different granularity level. Section 5
demonstrates the presented algorithm and views on an example. Section 6 surveys the supervisor concept used
in other programming languages. Finally, Section 7 summarises our results and presents our conclusions.

2     Background
Before discussing the method in details, it is necessary to survey the Erlang supervisor and the RefactorErl
framework used as a basis for the analysis.

2.1    Erlang supervisors
The supervisor behaviour is a library module that contains generic code for process monitoring and managing.
The specific code is implemented in the callback module. The only mandatory function in the callback module
is the function init/1. This function defines the restart strategy, the maximum restart intensity and the child
specifications of the supervisor process. The child specification describes the identifier of the child, the function to
start the process, the restart option, the timeout of the shutdown, the type of child process (worker or supervisor)
and its callback module.
   The supervisor process can be started with the supervisor:start link/2,3 functions. These functions
require the name of the supervisor (optional), the callback module and startup arguments. When the function
is applied the init/1 function is invoked from the provided callback module with the specified arguments.
   We must note here that besides the static initialising arguments, there are several supervisor manipulating
functions such as adding, restarting, deleting a child process, or querying and checking child specifications, etc
in runtime.

2.2    RefactorErl
RefactorErl stores the analysed data in a special directed graph called semantic program graph (SP G). This
graph includes all the information that can be statically extracted from the source code with different analyses.
The SP G consists of three layers. The first is the lexical layer, the second is the syntactic layer and the third is
the semantic layer.
   The semantic layer is constructed by several analyses which can be divided into two groups. The first is the
group of pre analyses which are executed when an Erlang source file is being loaded into RefactorErl, and the
second is the group of post analyses which need to be run manually after the pre analysis phase.
   The SP G is a directed graph that can be described as an ordered hextuple:

                                             SP G = (V, AN , AV , l, E, e)

The components of the hextuple are the following:
    • V = (Vlex ∪ Vsyn ∪ Vsem ∪ Vr ): The nodes of the SP G are the union of the lexical, syntactic and semantic
      nodes, extended with a special singleton set of the root node.
    • AN : The set of attribute names of the nodes.
    • AV : The set of possible attribute values.
    • l : V × AN → AV : The node labelling partial function.
    • E: The set of edge labels.
    • e : V × E × N → V : The partial function that describes labelled, ordered edges between the graph nodes.
   The SP G has several type of nodes with different attributes describing the properties of the nodes. The nodes
of the SP G and their attributes that are important from the aspect of the supervisor analysis are as follows:

Root: A special singleton set that includes the root node of the SP G. Symbol: Vr
File: Set of syntactic nodes representing the files in the SP G. Symbol: Vf i ⊆ Vsyn


                                                          43
Expr: Set of syntactic nodes representing the expressions in the SP G. The expression nodes have a type
   attribute which describes the type of the expression. The supervisor analysis uses expression nodes with
   type application, which means that the expression is a function invocation. Symbol: Ve ⊆ Vsyn
Module: Set of semantic nodes representing an Erlang module in the SP G. The module nodes have a name
  attribute which describes the name of the module. Symbol: Vm ⊆ Vsem
Behaviour: Set of semantic nodes representing the behaviours in the SP G. A behaviour node is present in the
   SP G if the analysed module implements a behaviour. The type attribute of the node describes the name
   of the implemented behaviour. Symbol: Vb ⊆ Vsem
Func: Set of semantic nodes representing the functions in the SP G. The functions are identified by the triple
   of the name of the defining module, and the name and arity of the function ((M,F,A)). Symbol: Vf ⊆ Vsem

3     Supervisor analysis
The analysis defined in this paper extends the SPG with the supervisor semantic node and new labelled links
between semantic and syntactic nodes of the graph according to the semantics of the supervisor behaviour.

3.1    SP G extension
We formally specify how do we extend the previously described set of semantic nodes with a set of Supervisor
nodes (Vsup ) and the directed edges between the nodes of the graph.

Attributes of the Supervisor node:
    • name: The name of the supervisor. Type: atom()
    • type: The type of the supervisor. Type: [supervisor,worker]
    • registered name: This attribute contains the possible registered names of the supervisor.
      Type: supRegName() = [{[via], [Module :: atom()], [Name :: atom()]}],
      {[Scope :: atom()], [Name :: atom()]}.
    • data: If the supervisor is a worker then this attribute contains every information about the worker.
      Type: supPropList() = [{child id, [atom()]}, {restart, [atom()]}, {shutdown, [integer()]},
      {modules, [atom()]}].
    • strategy: The restart strategy of the supervisor.
      Type:      supStrategy() = [{[Strategy :: strategy()], [MaxTime :: integer()], [MaxRestart
      :: integer()]}] where the strategy() is the type of restart strategies (one for all, one for one, rest -
      for one, simple one for one), MaxTime and MaxRestart describe the maximum intensity of restarts within
      the given time period.
    • start args: Contains the last argument of the supervisor:start link/2,3 function calls which can start
      the actual supervisor. The last argument is passed to the init/1 callback function. Type: [term()]

New edges from the Supervisor node:
    • sup cb module : Vsup → Vm : Points to the node of the callback module of the supervisor.
    • sup start : Vsup → Vf : Points to the function nodes that apply the functions supervisor:start link/2,3
      in their definition (potentially start the supervisor process).
    • sup init : Vsup → Vf : Points to the function node init/1 in the callback module.
    • sup worker : Vsup → Vsup : Points from a supervisor to its direct successor supervisor nodes (in the
      supervision tree).
    • w def : Vsup → Vf : Points from a worker supervisor node to the function node that defines the worker.
    • sup def : Vsup → Vsup : Points from a supervisor to its defining supervisor node.


                                                        44
  • sup ref : Vsup → Ve : Points to the function calls (expressions) that can modify or query the supervisor in
    runtime.

New edges to the Supervisor node:
  • beh sup : Vb → Vsup : This edge points from the behaviour node to the supervisor node if its type attribute
    is supervisor.

Formal extension of the SPG
Let SP G0 denote the extended semantic program graph:

                                          SP G0 = (V 0 , A0N , A0V , l0 , E 0 , e0 )
The set of semantic nodes (Vsem ) is extended with the set of Supervisor nodes (Vsup ):
                                                   0
                                                 Vsem = Vsem ∪ Vsup

                                                    V 0 = V ∪ Vsem
                                                                0

The sets of attribute names (AN ) and values (AV ) are extended with the attributes and possible values of
attributes of the Supervisor node:
                       Asup
                        N = {name, type, registered name, data, strategy, start args}

     Asup
      V   = {atom() ∪ [supervisor|worker] ∪ supRegN ame() ∪ supP ropList() ∪ supStrateg() ∪ [term()]}

                                                   A0N = AN ∪ Asup
                                                               N
                                                   A0V = AV ∪ Asup
                                                               V
The set of edge labels (E) is extended with the new labels:
                       Esup = {sup def, sup cb module, sup start, sup init, sup worker,
                                             w def, sup ref, beh sup}
                                                    E 0 = E ∪ Esup

The node labelling partial function l is extended with the following cases for an arbitrary x ∈ Vsup node:
                                              l0 : x × name → atom()
                                       l0 : x × type → [supervisor|worker]
                                   l0 : x × registered name → supRegN ame()
                                          l0 : x × data → supP ropList()
                                        l0 : x × strategy → supStrategy()
                                          l0 : x × start args → [term()]

The partial function e is extended with the following cases for an arbitrary x ∈ Vsup node and an arbitrary n ∈ N
natural number:
                                      e0 : x × sup cb module × n → y ∈ Vm
                                e0 : x × {sup start, sup init, w def } × n → y ∈ Vf
                                 e0 : x × {sup worker, sup def } × n → y ∈ Vsup
                                          e0 : x × sup ref × n → y ∈ Ve
For an arbitrary z ∈ Vb node:
                                         e0 : z × beh sup × n → y ∈ Vsup


                                                             45
3.2   The algorithm
The analysis is performed in two phases. The first step is the pre analysis which is performed during the basic
semantic analysis. The second step is the post analysis, which extracts all the statically available information
about the supervisors. We separated our analysis according to the asynchronous, incremental analyser framework
of RefactorErl. The first phase of the analysis (pre analysis) is the syntax based node identification part that
can be performed incrementally form by form asynchronously. In this phase we have only a partial view on
the semantic layer. The second phase (post analysis) uses the information calculated by the first phase of the
supervisor analysis and other semantic analyses performed on the full SP G in an interfunctional way (such as
the data-flow reaching). The pseudo code of the analysis is presented in Algorithm 1.

Pre analysis
In the pre analysis phase a new Supervisor semantic node is created and inserted into the SP G if the module
implements the supervisor behaviour. That is, the module contains the -behaviour(supervisor) attribute.
This module is the callback module of the behaviour. A new edge with label sup cb module is inserted between
the Supervisor node and the module and the name attribute of the Supervisor is set to the name of the callback
module. Finally, a new edge with label beh sup is inserted between the corresponding behaviour node and the
Supervisor node.

Post analysis
The pre analysis phase did the initialisation, new Supervisor nodes have been created, initialised and inserted
to the SP G. The post phase of the algorithm extracts the most possible information about the supervisors.
The algorithm determines the starting arguments, workers, hierarchy, etc. The post phase creates new semantic
nodes and extends the semantic information available in the SP G. The Algorithm 1 shows the pseudo code of
the post analysis phase for a given Supervisor node.
    First of all, the algorithm finds the init/1 callback function of the supervisor using the f ind init f un routine.
Once the callback function node is found we insert an edge with label sup init between the examined Supervisor
node and the function node.
    The second step of the algorithm is to determine the functions that potentially start the supervisor process.
The algorithm searches for supervisor:start link/2,3 function applications. For each function application
we get its arguments with the get args routine, then we apply data-flow reaching by invoking the f low routine to
determine the possible values of the arguments. The M odule variable contains the name of the callback module
of the supervisor that is started with the examined application. We filter the result for the actual supervisor
(Supervisor) and we update its registered name and args attributes. Finally, we create a link with label
sup start between the Supervisor and the node of the function that applies the supervisor:start link/2,3
functions.
    The third step of the algorithm is the analysis of restart strategies and child specifications of the Supervisor.
This information can be found in the return value of the init/1 callback function or the supervisor:start -
child/2 function. At first, we examine the return points of the init/1 callback function. The return points
of the function are queried by the return points routine. For each return point we apply data-flow reaching
to determine the possible strategies and child specifications. We update the attribute restart strategy with
the newly found restart strategies (Strategies) and collect the child specifications in the set GoodChildSpecs.
Next, we need to get the child specifications from the supervisor:start child/2 function applications. We
query the function applications and for each function application we try to determine the values of its arguments
with data-flow reaching. The gathered child specifications are inserted to the GoodChildSpec set if the set of
registered names of the examined Supervisor and the set SupRef N ames has a non empty section. Next we
create a new worker Supervisor node using the create worker routine for every child specification from the
GoodChildSpec and create a link with label sup worker between every newly created worker and the examined
Supervisor. Finally, based on the type of the worker process we use a sup def or w def edge to create a link
between the worker and its definition.
    The last step is to analyse the supervisor references. We examine the function from the supervisor module
listed in variable F unsW ithArity. For each function we query the function applications, then with data-flow
reaching we try to determine the referred supervisor from its arguments. If the function application refers the
examined Supervisor, then a link with label sup ref is created between the function application and the Supervisor
node.


                                                          46
Algorithm 1 analyse supervisor(Supervisor)
 1: InitF un ← f ind init f un(Supervisor)
 2: create link(Supervisor, InitF un, sup init)
 3: %%     Analyse supervisor : start link/2, 3 f unctions %%
 4: StartF unCalls ← get f un calls(supervisor, start link, {2, 3})
 5: for F unCall ∈ StartF unCalls do
 6:    (SupN ame, M odule, Args) ← f low(get args(F unCall))
 7:    if Supervisorname ∩ M odule 6= ∅ then
 8:        Supervisorregistered name ← Supervisorregistered names ∪ SupN ame
 9:        Supervisorargs ← Supervisorargs ∪ Args
10:        create link(Supervisor, get f un(F unCall), sup start)
11:    end if
12: end for
13: %%     Analyse child specif ications %%
14: RetP oints ← return points(InitF un)
15: GoodChildSpecs ← ∅
16: for RetP oint ∈ RetP oints do
17:    (ok, (Strategies, ChildSpecs)) ← f low(RetP oint)
18:    Supervisorstrategy ← Supervisorstrategy ∪ Strategies
19:    GoodChildSpecs ← GoodChildSpecs ∪ ChildSpecs
20: end for
21: StartChildF unCalls ← get f un calls(supervisor, start child, 2)
22: for F unCall ∈ StartChildF unCalls do
23:    (SupRef N ames, ChildSpecs) ← f low(get args(F unCall))
24:    if Supervisorregistered name ∩ SupRef N ames 6= ∅ then
25:        GoodChildSpecs ← GoodChildSpecs ∪ ChildSpecs
26:    end if
27: end for
28: for ChildSpec ∈ GoodChildSpecs do
29:    N ewW orker ← create worker(ChildSpec)
30:    W orkerDef ← get worker def (ChildSpec)
31:    create link(Supervisor, N ewW orker, sup worker)
32:    if supervisor ∈ N ewW orkertype then
33:        create link(N ewW orker, W orkerDef, sup def )
34:    else if worker ∈ N ewW orkertype then
35:        create link(N ewW orker, W orkerDef, w def )
36:    end if
37: end for
38: %%     Analyse supervisor ref erences %%
39: F unsW ithArity ← {(start child, 2), (terminate child, 2), (delete child, 2),
40:                      (restart child, 2), (which children, 1), (count children, 1)}
41: for (F un, Arity) ∈ F unsW ithArity do
42:    F unCalls ← get f un calls(supervisor, F un, Arity)
43:    for F unCall ∈ F unCalls do
44:        (SupRef N ames, ) ← f low(get args(F unApp))
45:        if Supervisorregistered name ∩ SupRef N ames 6= ∅ then
46:            create link(Supervisor, F unCall, sup ref )
47:        end if
48:    end for
49: end for


4   The supervisor graph
In Section 3 we extended the SP G with additional semantic information about supervisors. It is possible to
visualise the SP G, however it is too extensive and it is hard to find relevant information about supervisors. In


                                                        47
this section we define a supervisor graph which is a view of the SP G that contains information only about the
supervisors.
   The introduced graphs contain explicit information about the supervisor structure that can be used for code
comprehension, debugging or checking the hierarchy of the software.

4.1    Definition of the supervisor graph
Let the supervisor graph Gsup be an ordered pair. The first component of the pair is the set of nodes of the
supervisor graph. The second component of the pair is the set of directed edges represented as ordered triples.

                                                Gsup = (V sup , E sup )
The set of nodes is the union of supervisor nodes, worker nodes, function nodes and expression nodes, formally:

                                       V sup = (Vssup ∪ Vwsup ∪ Vfsup ∪ Vesup )

The set of edges is represented by ordered triples. The components of a triple are the head node, the tail node
and the label of the edge:
                                      E sup ⊆ V sup × V sup × LABELsup
The set of edge labels of the supervisor graph is:

                 LABELsup = {start, init, supervisor, worker, sup def, worker def, ref erence}

Nodes of the supervisor graph
  • The definition of the supervisor node is the following:

               Sup node = {name :: string(), strategy :: supStrategy(), registered name :: [term()],
                                                 start args = [term()]}

      The Sup nodes are representing the supervisors in the supervisor graph. The Sup node has the following
      attributes:
        – name: The name of the supervisor. This will be the caption of the node in the supervisor graph.
        – strategy: The restart strategy of the supervisor.
        – registered name: If the supervisor is registered, this attribute contains the type of the registration along
          with the registered name.
        – start args: The arguments which are passed to the init/1 callback function.
  • The definition of the worker node is the following:

                            SupW orker node = {name :: string(), data :: supP ropList()}

      The SupW orker nodes are representing the workers in the supervisor graph. The SupW orker node has the
      following attributes:
        – name: The name of the worker. This name will be the caption of the node in the supervisor graph.
        – data: This attribute contains every information available about the worker, for example: the type of
          the worker.
  • The definition of function node is the following:

                                        SupF unction node = {name :: string()}

      The function nodes represent the callback functions in the supervisor graph. The SupF unction node has
      the following attributes:
        – name: The name of the function. This will be the caption of the node in the supervisor graph.


                                                          48
  • The definition of the expression node is the following:

          SupExpression node = {name :: string(), f unc :: {module(), f un(), arity()}, args : [term()]}

    The expression nodes represent the supervisor references in the supervisor graph. Supervisor references
    are function applications, which might modify or query the supervisor arguments during runtime. The
    SupExpression node has the following attributes:
       – name: The text of the expression node which will be the caption of the node in the supervisor graph.
       – func: A module, function and arity triple ({M,F,A}) that identifies the function which is referred by
         the application expression.
       – args: The arguments of the function application.

Notations
In the previous section we introduced the Vsup set of Supervisor semantic nodes in the SP G. In this section we
use the V sup notation for the set of supervisor nodes in the new supervisor graph. We define two subsets of set
Vsup :
  • Vsups ⊆ Vsup – set of supervisor nodes which type is supervisor. Formally:

                                  Vsups = { x | x ∈ Vsup , supervisor ∈ l(x, type)}

  • Vsupw ⊆ Vsup – set of supervisor nodes which type is worker. Formally:

                                    Vsupw = { x | x ∈ Vsup , worker ∈ l(x, type)}

Auxiliary functions
We introduce some auxiliary functions to shorten the supervisor graph building rules:
  • edges(x, y): The function takes two nodes from the SP G and returns the set of labels of directed edges with
    head x and tail y.
  • updateSup(Sup, Sup0 ):

                                            Sup ∈ Vsups ∧ Sup0 ∈ Vssup
                                  Sup0 [name = Supname , strategy = Supstrategy ,
                         registered name = Supregistered name , start args = Supstart args ]

    The rule says that the Sup0 node (type of Sup node) shall have the same name, strategy, registered name
    and start arguments as the Sup Supervisor node from the SP G.
  • updateSupW orker(Sup, W orker):

                                          Sup ∈ Vsupw ∧ W orker ∈ Vwsup
                                    W orker[name = Supname , data = Supdata ]

    The rule says that the W orker node (type of SupW orker node) shall have the same name and the same
    data as the Sup Supervisor node (which type is worker) from the SP G.
  • updateSupF unction(F un, F un0 ): The function updates the name attribute of F un0 node (type of set
    SupF unction node) with the (M,F,A) of the Func node from the SP G.
  • updateSupExpression(Expr, Expr0 ): The function updates the attributes of Expr0 node (type of
    SupExpression node). The name attribute shall be the string created from (M,F,A) of the function that
    contains the SP G expression Expr. The args attribute contains the arguments of the function application
    which information is gathered from. Finally, the func attribute contains the ordered triple {M,F,A} of the
    function which the supervisor reference refer. E.g.: {supervisor,start child,2}.


                                                       49
COMPACT supervisor graph
We introduce two types of supervisor graphs. The full supervisor graph contains every available information
about the supervisors and workers, like callback functions, supervisor references, worker definitions, etc. The
COMPACT supervisor graph only contains information about the hierarchy of the supervisors and workers. We
use the COMPACT flag to distinguish the COMPACT and full supervisor graphs in the rules.

Rules
In the followings we define the rules which are used to build the full and COMPACT supervisor graphs.

 1. Supervisor node
                                                           x ∈ Vsups
                                                       sup
                                                 x ∈ Vs ∧ updateSup(x, x0 )
                                                  0

    If an x Supervisor node is present in the SP G then the x0 node which is synthesised from x using the
    updateSup function will be in the supervisor graph. This x0 will be a member of the Sup nodes of the
    supervisor graph.

 2. Supervisor edge (COMPACT )

                {x, y, z} ⊆ Vsups ∧ sup worker ∈ edges(x, y) ∧ sup def ∈ edges(y, z) ∧ COM P ACT
                            x0 ∈ Vssup ∧ z 0 ∈ Vssup ∧ updateSup(x, x0 ) ∧ updateSup(z, z 0 )∧
                                                   (x0 , z 0 , supervisor) ∈ E sup

    Let the x, y and z be Supervisor nodes from SP G. If there is a sup worker edge between x and y and
    a sup def edge between y and z, then there will be a supervisor edge between the x0 and z 0 synthesised
    Sup node nodes in the COMPACT supervisor graph. The not COMPACT (full) supervisor edge rule can
    be analogously defined.

 3. Worker edge

                                        x ∈ Vsups ∧ y ∈ Vsupw ∧ sup worker ∈ edges(x, y)
                            0
                       x        ∈ Vs ∧ y 0 ∈ Vwsup ∧ updateSup(x, x0 ) ∧ updateSupW orker(y, y 0 )∧
                                    sup

                                                        0 0             sup
                                                     (x , y , worker) ∈ E

    Let x and y be Supervisor nodes in the SP G and the type of y is worker. If there is a sup worker edge
    between x and y, then there will be a worker edge between the x0 and y 0 synthesized Sup node nodes in
    the supervisor graph.

 4. Supervisor init

                            x ∈ Vsups ∧ f ∈ Vf ∧ sup init ∈ edges(x, f ) ∧ ¬COM P ACT
                       x ∈ Vssup ∧ f 0 ∈ Vfsup ∧ updateSup(x, x0 ) ∧ updateSupF unction(f, f 0 )∧
                        0

                                                       (x0 , f 0 , init) ∈ E sup

    If a Supervisor node is connected to a function node with a sup init edge in the SP G, then there will be
    an init edge between the synthesised x0 and f 0 nodes in the supervisor graph (if the supervisor graph is not
    COMPACT ). This rule describes that the supervisor shall be connected to its init callback function.

 5. Supervisor start

                            x ∈ Vsups ∧ f ∈ Vf ∧ sup start ∈ edges(x, f ) ∧ ¬COM P ACT
                       x0 ∈ Vssup ∧ f 0 ∈ Vfsup ∧ updateSup(x, x0 ) ∧ updateSupF unction(f, f 0 )∧
                                                      (x0 , f 0 , start) ∈ E sup


                                                               50
            If a Supervisor node is linked to a function node with a sup start edge in the SP G, then there will be an
            start edge between the synthesised x0 and f 0 nodes in the supervisor graph (if the supervisor graph is not
            COMPACT ). This rule describes that the supervisor shall be connected to the function that can start the
            supervisor with an application of the supervisor:start link/2,3 functions.

         6. Supervisor reference


                                             x ∈ Vsups ∧ e ∈ Ve ∧ sup ref ∈ edges(x, e) ∧ ¬COM P ACT
                                      x0 ∈ Vssup ∧ e0 ∈ Vesup ∧ updateSup(x, x0 ) ∧ updateSupExpression(e, e0 )∧
                                                                  (x0 , e0 , ref erence) ∈ E sup

            If the x Supervisor node is linked to an expression node x with a sup ref edge in the SP G, then there
            will be a ref erence edge between the synthesised x0 and e0 nodes in the supervisor graph (if the supervisor
            graph is not COMPACT ).


     5      Demonstrating Example
     In this section we illustrate the algorithm of the analysis with an example implementation of a supervisor. First
     we introduce the example implementation, then we discuss the algorithm step by step focusing mostly on the
     post analysis phase of the SPG extension. Finally we present the supervisor graph and the compact supervisor
     graph.


     5.1      Example Supervisor
     The example supervisor can be seen in Figure 1. This supervisor is a simplified model of a theatre. The theatre
     has a director, a technician, and a bandmaster. The bandmaster supervises the musicians as can be seen in
     Figure 2.

                                                            Figure 1: Theatre supervisor
 1   −module( t h e a t r e ) .
 2   −b e h a v i o u r ( s u p e r v i s o r ) .
 3
 4   −export ( [ s t a r t l i n k / 1 , add crew /1 ] ) .
 5   −export ( [ i n i t /1 ] ) .
 6
 7   s t a r t l i n k ( Args ) −>
 8           s u p e r v i s o r : s t a r t l i n k ( { l o c a l , ?MODULE} , ?MODULE, Args ) .
 9
10   i n i t ( Args ) −>
11           Children = [
12                {director ,
13                {director , start link , [ ]} ,
14                  t r a n s i e n t , 1 0 0 , worker , [ d i r e c t o r ] } ,
15                { tech ,
16                { tech , s t a r t l i n k , [ ] } ,
17                  t r a n s i e n t , 1 0 0 , worker , [ t e c h ] } ,
18                {bandmaster ,
19                {bandmaster , s t a r t l i n k , [ Args ] } ,
20                  t r a n s i e n t , 1 0 0 , s u p e r v i s o r , [ bandmaster ] } ] ,
21           {ok , {{ o n e f o r a l l , 3 , 500} , C h i l d r e n }} .
22
23   add crew ( C h i l d S p e c ) −>
24       s u p e r v i s o r : s t a r t c h i l d ( { l o c a l , ?MODULE} , C h i l d S p e c ) .


                                                                              51
                                                        Figure 2: Bandmaster supervisor
 1   −module( bandmaster ) .
 2   −b e h a v i o u r ( s u p e r v i s o r ) .
 3
 4   −export ( [ s t a r t l i n k /1 ] ) .
 5   −export ( [ i n i t /1 ] ) .
 6
 7   s t a r t l i n k ( Args ) −>
 8           s u p e r v i s o r : s t a r t l i n k ( { l o c a l , ?MODULE} , ?MODULE, Args ) .
 9
10   i n i t ( l e n i e n t ) −>
11           i n i t ( { o n e f o r o n e , 3 , 60} ) ;
12   i n i t ( angry ) −>
13           i n i t ( { r e s t f o r o n e , 2 , 60} ) ;
14   i n i t ( j e r k ) −>
15           i n i t ( { o n e f o r a l l , 1 , 60} ) ;
16
17   i n i t ( { R e s t a r t S t r a t e g y , MaxRestart , MaxTime} ) −>
18          Mod = m u s i c i a n s ,
19           Type = worker ,
20           Children = [
21                  {guitar ,
22                  {Mod, s t a r t l i n k , [ g u i t a r , good ] } ,
23                    t r a n s i e n t , MaxRestart , Type , [Mod] } ,
24                  {drummer ,
25                  {Mod, s t a r t l i n k , [ drummer , a v e r a g e ] } ,
26                    t r a n s i e n t , MaxRestart , Type , [Mod] } ] ,
27           {ok , {{ R e s t a r t S t r a t e g y , MaxRestart , MaxTime} , C h i l d r e n }} .


     Pre analysis
     The pre analysis phase looks for ”-behaviour(supervisor).” attribute in each module. In this case both
     theatre (in line 2) and bandmaster module (in line 2) has such attribute. Two new Supervisor nodes will be
     inserted into the SP G. One for theatre module and one for bandmaster module, and two sup cb module links
     will be inserted between the Supervisor nodes and the module nodes. The name attribute of the Supervisor
     nodes will be equal to the names of the module which they are connected to. Also two beh sup links will be
     inserted between the Behaviour nodes and the Supervisor nodes.

     Post analysis
     We discuss only the analysis of the theatre supervisor since the analysis of the bandmaster can be executed
     analogously. The only difference will be the amount of extracted information about the restart strategy and
     maximal restart intensity as the information is passed as a parameter to function init/1. This additional
     information is not always trivially available statically, but the data-flow analysis can calculate the possible values
     of variables statically.
        The first step of the post analysis is to find the definition of the function init/1 from theatre callback module
     in the SP G. It can be found in the line 10 of Figure 1. We insert a link sup init between the supervisor node
     and the function node.
        The second step of the analysis is the examination of the supervisor:start link/2,3 function applications.
     We query the references of the function and we get two function applications, one application in the defini-
     tion of function theatre:start link/1 (in line 8 in Figure 1) and one other one in the definition of function
     bandmaster:start link/1 (in line 8 in Figure 2). The next step is the filtering of the relevant applications. To
     do that we query the possible values of arguments for each application (using data flow analysis) and we look
     for the second argument which describes the started supervisor. We find that the function application from the
     theatre module refers to the theatre supervisor because the ?MODULE macro is substituted to the name of the
     containing module. We update the attributes of the theatre supervisor node with the gathered information. The


                                                                         52
attributes registered name and args are updated with the possible values of the first and the third arguments
of the application. Finally we insert a link sup start between the node of the function theatre:start link/1
(in line 7 in Figure 1) and the node of the supervisor theatre.
   The third step is the analysis of restart strategies and child specifications. To find the child specifications
we need the possible return values of the init/1 callback function. We query the possible return values and
extract the information about the restart strategy and maximum intensity of restarts from the result. In our
case it is the tuple {one for all, 3, 500} (in line 21 in Figure 1). We update the strategy attribute of the
theatre node with this information. Next we try to extract the child specifications from the return values using
backward data-flow reaching. The Children variable (in line 11 in Figure 1) contains the child specifications,
and we need to determine the possible values of this variable using data-flow analysis. There are three different
children of the theatre supervisor. For each children we create a new Supervisor node of type worker and
update the data attributes with the information can be found about each supervisor. We link the theatre
supervisor to the workers with sup worker edges. We insert worker def links between the workers and their
function definition. The name of the worker function will be the second component of the child specification.
For example, {director, start link, []} (in line 13 in Figure 1) means, that the children will be started
using the director:start link/0 function. We also examine the child specifications found in the applications
of supervisor:start child/2 function, but in our case there is no information about them.
   In the final step of the analysis we examine the supervisor references that potentially query or change the
attributes of the theatre supervisor. There is a reference to our supervisor in the theatre:add crew/1 function
(in line 23 in Figure 1) . We insert a link sup ref between the supervisor node an the reference.
   The post analysis of the bandmaster supervisor can be analogously executed. The only important difference is
that the data flow analysis will have heavier impact on the data extracted, because the init/1 callback function
of the bandmaster module have four clauses. Thus, the restart strategy and the maximal restart intensity can
be given as parameters to the supervisor. For example the data-flow analysis will determine that the supervisor
can have three different restart strategies (one for one, rest for one, one for all.

5.2   Results
Using the rules defined in Section 4 the COMPACT and full supervisor graphs can be constructed from the SP G
after the supervisor analysis is completed. We show the COMPACT and full supervisor graphs for our example
supervisor implementation presented in subsection 5.1.

COMPACT supervisor graph
The Figure 3 shows the COMPACT supervisor graph for the theatre supervisor. The graph shows only the
hierarchy of the supervisors and the workers. The green diamond nodes represent the supervisors and the blue
oval nodes are the workers of the supervisors. The attributes of the nodes contains every information that can
be extracted statically about the supervisors or the workers as it can be seen in the picture, we highlighted the
attributes of three nodes.

                                    Figure 3: COMPACT supervisor graph


Full supervisor graph
The Figure 4 shows the full supervisor graph for the theatre supervisor. We can see that the structure of this
graph is more complex than in the case of the COMPACT supervisor graph. We can find additional information


                                                       53
in the graph like the functions of the callback module, the function nodes of the worker definitions and the
supervisor references. The orange hexagons represent the function nodes.

                                         Figure 4: Full supervisor graph


6   Related work
The supervisor concept is present in other programming languages as well. In this section we examine some tools
which are capable of analysing supervisors in Erlang and in other programming languages as well.

Percept2
Percept2 [9] is a profiler tool for Erlang programs with focus on processes and process communication. Percept2
is a dynamic analyser tool which analyses the program in runtime. In comparison with our results Percept2 can
detect the hierarchy of the supervisors correctly. However it does not know the supervisor specific information,
for example restart strategy, child specifications, etc. In some cases Percept2 can provide more accurate results,
because it can detect new children added to supervisors dynamically. Our tool is a static code analyser, and can
detect only those children which specification is present in the source code. However our tool can detect child
specifications that are not necessarily added to the supervisors in a certain runtime configurations. In conclusion,
the two tools have slightly different focus, as Precept2 is a dynamic and RefactorErl is a static analyser tool.
The tools should be used as complementary tools.

Akka
Akka [10] is a library for Scala and Java programs. It supports the implementation of distributed concurrent and
fault tolerant software using the Actor model. The Actor model takes care of the thread handling and resource
locking, thus the developer can focus on the business logic of the software. Supervisors are present in Akka and
it is very similar to the supervisors in Erlang. The processes are represented as actors and the supervisor can
monitor these actors. Supervisors can start, stop and restart the monitored processes, and also halt them. Akka
supports two restart strategies only. Currently there is no static code analyser support for supervisors in aid of
Akka. Several IDE-s can provide dynamic information, like call stacks and class diagrams which can be used to
get some information about the supervisors, but it is not nearly as accurate as our results for Erlang supervisors.

Cloud Haskell
Cloud Haskell [11] is a library for Haskell programs. It supports the implementation of distributed and concurrent
software mainly for clusters. In Cloud Haskell there is a package called distributed-process-supervisor [12] which
supports supervisors very similar to the supervisors used in Erlang. There is no static code analyser tool that
support supervisor analysis in Cloud Haskell. The SourceGraph [13] can be used to make function call graphs and
detect module import hierarchy. From these graphs there are some information available about the supervisors,
but our results for Erlang supervisors are more accurate and user friendly.

7   Conclusions
In this paper we discussed a new method for extracting the relevant information from the source code written in
Erlang implementing the supervisor behaviour. For the analysis we used the static analysis tool RefactorErl and


                                                        54
the information available in its Semantic Program Graph. RefactorErl performs lexical, syntactic and different
semantic analyses on the source code and stores the extracted results in the SP G. We presented the algorithm
of the new analysis with its pseudocode and explained the steps in detail. We introduced the formal extension
of the SP G with the new nodes and the way the new semantic information is embedded in the SP G. To make
the supervisor hierarchy more explicit we defined a supervisor graph that is a view of the SP G that includes
only information about supervisors. We defined two levels of granularity for supervisor graphs, that is a full and
a compact view. We discussed the steps and rules how these graphs are generated from the SP G. Finally we
demonstrated our approach on an example and showed the generated graphs, both a full and a compact view.
   The presented method was examined on different open source Erlang applications, and the results seemed to
be useful in software comprehension.

References
 [1] RefactorErl - Refactoring Erlang Programs. http://plc.inf.elte.hu/erlang, 2015. Accessed: 2016-06-
     30.
 [2] István Bozó, Dániel Horpácsi, Zoltán Horváth, Róbert Kitlei, Judit Kőszegi, Máté Tejfel, and Melinda
     Tóth. RefactorErl, Source Code Analysis and Refactoring in Erlang. In Proceeding of the 12th Symposium
     on Programming Languages and Software Tools, Tallin, Estonia, 2011.
 [3] Erlang Programming Language. http://www.erlang.org. Accessed: 2016-06-30.
 [4] Francesco Cesarini and Simon Thompson. Erlang Programming. O’Reilly Media, 2009.

 [5] Melinda Tóth and István Bozó. Static Analysis of Complex Software Systems Implemented in Erlang.
     In Central European Functional Programming School, volume 7241 of Lecture Notes in Computer Science,
     pages 440–498. Springer, 2012.
 [6] Zoltán Horváth, László Lövei, Tamás Kozsik, Róbert Kitlei, István Bozó, Melinda Tóth, and Roland Király.
     Modeling Semantic Knowledge in Erlang for Refactoring. In International Conference on Knowledge Engi-
     neering, Principles and Techniques, KEPT 2009, Selected papers, pages 38–53, 2009.
 [7] M. Tóth, I. Bozó, and Z. Horváth. Applying the Query Language to Support Program Comprehension.
     In Proceeding of International Scientific Conference on Computer Science and Engineering, ISBN 978-80-
     8086-164-3, pages 52–59, Stara Lubovna, Slovakia, September 2010.
 [8] Martin Logan, Eric Merritt, and Richard Carlsson. Erlang and OTP in Action. Manning Publications Co.,
     Greenwich, CT, USA, 1st edition, 2010.
 [9] Huiqing Li and Simon Thompson. Multicore profiling for erlang programs using percept2. In Proceedings
     of the Twelfth ACM SIGPLAN Workshop on Erlang, Erlang ’13, pages 33–42, New York, NY, USA, 2013.
     ACM.

[10] Akka. http://akka.io. Accessed: 2016-06-30.
[11] Jeff Epstein, Andrew P. Black, and Simon Peyton-Jones. Towards haskell in the cloud. In Proceedings of
     the 4th ACM Symposium on Haskell, Haskell ’11, pages 118–129, New York, NY, USA, 2011. ACM.
[12] Supervisors for The Cloud Haskell Application Platform.             https://hackage.haskell.org/package/
     distributed-process-supervisor. Accessed: 2016-06-30.

[13] SourceGraph: Static code analysis using graph-theoretic techniques. https://hackage.haskell.org/
     package/SourceGraph. Accessed: 2016-06-30.


                                                          55

</pre>