=Paper= {{Paper |id=Vol-1742/MRT16_paper_3 |storemode=property |title=DeSARM: A Decentralized Mechanism for Discovering Software Architecture Models at Runtime in Distributed Systems |pdfUrl=https://ceur-ws.org/Vol-1742/MRT16_paper_3.pdf |volume=Vol-1742 |authors=Jason Porter,Daniel Menasce,Hassan Gomaa |dblpUrl=https://dblp.org/rec/conf/models/PorterMG16 }} ==DeSARM: A Decentralized Mechanism for Discovering Software Architecture Models at Runtime in Distributed Systems== https://ceur-ws.org/Vol-1742/MRT16_paper_3.pdf
          DeSARM: A Decentralized Mechanism for
         Discovering Software Architecture Models at
               Runtime in Distributed Systems
                  Jason Porter                           Daniel A. Menascé                            Hassan Gomaa
       Department of Computer Science              Department of Computer Science             Department of Computer Science
          George Mason University                     George Mason University                    George Mason University
           Fairfax, Virginia 22030                     Fairfax, Virginia 22030                    Fairfax, Virginia 22030
          Email: jporte10@gmu.edu                     Email: menasce@gmu.edu                     Email: hgomaa@gmu.edu




   Abstract—Runtime models play a critical role in modern              self-configuration, self-optimization, self-healing and self-
self-adaptive systems. Hence, runtime architectural models are         protection, and are also called self-* or autonomic sys-
needed when making adaptation decisions in architecture-based          tems [17]. In architecture-based self-adaptive systems, com-
self-adaptive systems. However, when these systems are dis-
tributed and highly dynamic, there is an added need to discover        ponents dynamically change in order to continuously adhere
the systems software architecture model at runtime. Current            to architectural properties and system goals. As a result, these
methods of runtime architecture discovery take a centralized           systems require runtime architectural models when making
approach, in which the process is carried out from a single            adaptation decisions [11]. However, when these systems are
location. These methods are inadequate for large distributed           distributed and highly dynamic there is an added need to
systems because they do not scale up well and have a single
point of failure. Also, systems of such size consist of nodes          discover the system’s architectural model at runtime.
that are typically highly dynamic in nature. Existing approaches          Current methods of runtime architecture discovery take a
to architecture discovery are not capable of addressing these          centralized approach to constructing architectures dynamically.
concerns. This paper describes DeSARM (Decentralized Software          These methods are inadequate for large distributed systems
Architecture discoveRy Mechanism), a completely decentralized          because they do not scale up well and have a single point of
and automated approach for runtime discovery of software archi-
tecture models of distributed systems based on gossiping and mes-      failure. Also, systems of such size are often dynamic in nature
sage tracing. DeSARM is able to identify at runtime important          due to changes in the runtime environment, where nodes join
architectural characteristics such as components and connectors,       and leave the system unpredictably, and are subject to fail-
in addition to synchronous and asynchronous communication              ures. Existing approaches do not address the aforementioned
patterns. Furthermore, through its use of gossiping, it exhibits the   concerns.
properties of scalability, global consistency among participating
nodes, and resiliency to failures. The paper discusses DeSARM’s           We describe a method for runtime software architecture
architecture and detailed design, and demonstrates its properties      discovery in a completely decentralized manner. This is ac-
through experimentation.                                               complished by keeping message logs at each node and dis-
                                                                       seminating among the nodes identified component interaction
                       I. I NTRODUCTION                                patterns (i.e., synchronous, asynchronous, single or multiple
   Software architecture—the high level structure of a software        destination) derived from the logs through selective gossip
system including a collection of components, connectors and            exchanges. With selective gossiping, information dissemina-
constraints—plays an increasingly critical role in the design          tion takes place only between nodes that are part of the
and development of any large complex software system. These            same application. This results in the containment of network
artifacts (i.e., components, connectors, and constraints) are          traffic, which in effect reduces communication overhead. All
needed to reason about the system and act as a bridge                  references to DeSARM’s gossiping method heretofore refer
between requirements and implementation as well as provide             to selective gossiping. Through its use of gossiping, our
a blueprint for system construction and composition. The               mechanism addresses the limitations of the current approaches
architecture helps in the understanding of complex systems,            and exhibits the following properties: (1) resiliency to failures,
supports reuse at both the component and architectural level,          where the gossiping protocol always converges irrespective
indicates the major components to be developed together with           of node failures, (2) global consistency among participating
their relationships and constraints, exposes changeability of          nodes, where all nodes converge to the same architectural
the system, and allows for verification and validation of the          view, and (3) scalability, where the gossiping protocol can
target system at a high level [32].                                    accomodate systems of increasing size [29]. Accordingly, the
   Software architectures have been very influential in                scope of our work is the derivation of architectural models
self-adaptive systems, i.e., systems that are capable of               at runtime to be used in decentralized decision making for
architecture-based adaptation in large distributed systems. This    from which interaction patterns are identified using pattern
effort is part of a larger project, RASS: Resilient Autonomic       detection in order to define architectural elements. Yuan and
Software Systems [27], aimed at developing a highly-resilient       Malek [35] take a dynamic approach to discovering the ar-
and highly-dependable system, which is capable of making            chitectural model of a distributed application by generating a
adaptation decisions in a distributed fashion without cen-          component interaction model using data mining. Huang et al.
tralized control. To achieve this goal, each node requires a        [15] use the reflective ability of the component framework to
complete model of the system at runtime.                            discover an up-to-date architecture from the running system.
   Thus, the main contribution of this paper is DeSARM:             In contrast to the aforementioned, DeSARM is based on a de-
Decentralized Software Architecture discoveRy Mechanism,            centralized approach to architecture discovery. This approach
a completely decentralized and automated method and sys-            is effective for large distributed systems which pose scalability,
tem for runtime discovery of software architecture models           single point of failure and dynamicity concerns, which are not
of distributed systems using gossiping [6][18] and message          addressed by the existing methods.
tracing. The information dissemination and relatively fast             Also related to our work is architecture-based software
convergence capability of the gossiping protocol aid each node      dynamic adaptation, which addresses the dynamic software
in deriving a complete model of the architecture. We also           reconfiguration of the architecture model and corresponding
demonstrate experimentally DeSARM properties of resiliency,         implementation for the purpose of runtime adaptation and
global consistency, and scalability. It is important to note that   evolution (see e.g., [10], [21], [23], [25], [34]).
these properties relate to the architecture discovery process
and not to application failure recovery or adaptation, which                              III. A SSUMPTIONS
are outside the scope of this paper.                                  This paper makes the following assumptions:
   The rest of this paper is organized as follows. Section II         1) If a component fails, it is restarted on the same node
discusses related work. Section III provides some basic as-              if the node is still running. This is common in modern
sumptions. Section IV describes the architecture of DeSARM.              component-based systems whereby components can be
Section V presents the results of our experiments with De-               restarted when they are detected to have failed.
SARM. Section VI provides a discussion on important is-               2) If a node fails, all the components running on that node
sues related to the architecture discovery mechanism. Finally,           fail. This assumption is obvious and does not require
Section VII presents some concluding remarks and discusses               further comments.
possible future work.                                                 3) If a node cannot be restarted after a failure, its compo-
                                                                         nents can be moved to other node(s) using an existing
                     II. R ELATED W ORK
                                                                         component recovery mechanism not within the scope of
   In this paper we use architecture discovery instead of recov-         this work. This assumption is based on existing work on
ery as is often found in the literature. Architecture recovery           a runtime application recovery mechanism with which
refers to cases where the architecture was known but may have            DeSARM is being integrated. See Section VII.
been lost due to erosion or improper documentation [3], [15],         4) Software components can communicate either through
[33]. In contrast, architecture discovery refers to situations in        a connection-less transport protocol such as UDP or a
which the architecture was not previously known. This is due             connection-oriented protocol such as TCP. Both types
to the fact that the architecture may not have previously existed        of communication are common in distributed systems
or may have changed at runtime due to the dynamic nature of              and DeSARM supports either.
the system as is the case with dynamic software adaptation.           5) The software architecture is not known because it may
   Software architecture discovery approaches can be classi-             dynamically change due to churn and failures. As ex-
fied into dynamic, static, and hybrid, i.e., combining both              plained in the introduction, this is the focus of this work.
dynamic and static analysis. Some examples of static analysis         6) The software deployment on physical nodes is not
approaches include [24][4][19] and examples of hybrid ap-                known. The set of nodes on which the software system is
proaches are [28][30]. However, as the objective of this paper           deployed is not assumed to be known and is discovered
is discovering architectural models at runtime, we shall focus           by DeSARM.
only on dynamic approaches as follows.
   Israr et al. [16] describe SAMEtech, a dynamic approach for        IV. S OFTWARE A RCHITECTURE D ISCOVERY M ETHOD
automating the discovery of architecture models and layered            This section discusses DeSARM’s architecture. We first
performance models from message trace information. Disco-           describe the structure of a node running DeSARM. Then, we
Tect [31] uses a set of pattern recognizers and knowledge           give an overview of how gossiping and message tracing are
of the architectural style being implemented to map low-            incorporated into the discovery process. We then delve into
level system events into high-level architecturally meaningful      the details of the architecture model discovery method.
events. Bojic and Velasevic [3] use test cases that cover rele-        A node in a distributed system consists of three layers
vant use-cases, and concept analysis to group system entities       according to Fig. 1: application layer, DeSARM layer, and
together that implement similar functionality. Vasconcelos et       communication middleware. The application layer consists
al. [33] use specified use-cases to generate execution traces       of the distributed application components that communicate
                                                       Fig. 2. Details of the DeSARM Layer.



                                                                               components are aggregated to form an aggregate mes-
                                                                               sage log (AML) at each node. The AML contains the
                                                                               interactions between all components and their respective
                                                                               destination components. This is further merged with the
                                                                               AMLs from incoming gossip messages received from
                                                                               other nodes (see below).
                                                                             • Gossip-based dissemination: This forms the core of our
                                                                               architecture discovery method and enables the distribu-
                                                                               tion of AMLs from each node throughout the system.
                                                                             • Peer node selection: This is achieved through the main-
                                                                               tenance of a component/node database which is derived
                                                                               by identifying component IDs and their related node IDs
                                                                               from incoming and outgoing messages. This ensures that
                                                                               only nodes running components that are part of the same
      Fig. 1. Layered Structure at a Node of a Distributed System.             application are selected for dissemination.
                                                                             • Control: The DeSARM controller manages the execution
                                                                               of the gossip process by maintaining the timing between
over the network via messaging events. Each component has                      consecutive rounds.
two logs: message sent log (MSL) and message received log                    • Architecture discovery: This is used to derive the archi-

(MRL) as shown in Fig. 2. The DeSARM layer forms a                             tectural model of the system based on the message traces.
wrapper around the communication middleware and provides                  Finally, the communication middleware provides network ac-
the same interface to the application layer components as the             cess allowing the sending and receiving of component-level
communication middleware. DeSARM consists of a number                     and gossiping messages between nodes.
of modules each providing different functionality (see Fig.2):
                                                                             Gossip is an epidemic protocol, which due to its simplicity,
  • Message logging: All incoming messages are logged be-                 robustness and flexibility makes it ideal for reliable dissem-
    fore being passed to the application layer components and             ination of data in large-scale distributed systems. In gossip-
    all outgoing messages are logged before being passed to               based dissemination, data spreads exponentially fast and takes
    the communication middleware. All messages are logged                 O (logN) rounds to reach all N nodes [18]. The essence
    to stable storage.                                                    of this approach, which lies at the core of all gossip-based
  • Message log aggregation: The MSLs and MRLs of all                     dissemination approaches, was first introduced in the seminal
!!!!"##$%&#'()'*&+,*'-)                                          &&&&&!"##$%&*'.'$/'*&+,*'-)&                                                • Destination Type: single destination (SD) or multiple
&&&&0%''*&12&&&&&&&&&&&&&&&&&&&&&&                               &&&&&0%''*&32&                                                                destination (MD)
!!!!)"&'/'*4&!"56'&7($+#!!!!!!!!!!!!!!!!!!!!!!!!!!!!!*'.'%5"(&"8&*'97'#+&8*"6&1&                                                             • Message Type (mt): request reply (RQ), no-reply re-
!!!!!""!#$%$&'!$(&)*+,$!-*.'+$.!                                                                                                               quested (NR), a reply to previous request (RP)
!!!!!!!!!#"$"%&'&()*&&+,-."
                                                                                                                                             • Transaction ID
!!!!!""!-./&$$0!'/!$(&)*+,$!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!""!.$&$12$!3$##*,$!
!!!!!!!!!%&/0,#1"234-."""""""""""""""""""""""""""""""""""""""""""""""""""""""""234*"$"+&(&56&7+89,*-."                                       • Message Unique ID (mid)
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"!                                                        • Return ID: equals 0 if mt 6= RP, otherwise equals “mid”
!                   !                     !                    !!!!!!!!!!!!!!!!#!!!!!!!!!!!!!!!
!!!!""!4*1'!5/.!.$#-/+#$!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!""!-./&$$0!'/!$(&)*+,$!                                           of original request message
!!!!!!!!234#"$"+&(&56&7+89,#-.""""""""""""""""""""""""""""""""""""""""%&/0,*1"234-."                                                         • Source Component: component sending the message
!!!""!3$.,$!678#!'/!,$'!+$4!*,,.$,*'$0!678!!!!!!!!!!!!!""3$.,$!678#!'/!,$'!+$4!*,,.$,*'$0!678!
!!!!!!!"234"$"9&+:&48:%,2341"234#-.&&&&&&&&&&&&&&&&&&&&&&&&&&&234"$"9&+:&48:%,2341"234*-."                                                   • Destination Component: component receiving the mes-
!!!!'()&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&'()&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&       sage
                                                                                                                                             • Source Node ID: ID of sending node
             Fig. 3. Design of the Gossip-based Dissemination Framework                                                                      • Destination Node ID: ID of receiving node

                                                                                                                                              The MRL and MSL for each component are scanned and
 paper by Demers et al. [5] and involves the dissemination of                                                                              the interaction patterns for these messages are identified. The
 data by allowing randomly chosen pairs of nodes to exchange                                                                               following types of messages are considered: Reply requests
 new information. After the exchange, the two nodes forming                                                                                (RQ), No-reply requests (NR), and Replies (RP).
 a pair should have the same information effectively reducing                                                                                 These message types allows us to identify synchronous
 the entropy of the system [29].                                                                                                           vs asynchronous interactions. In the former case, since reply
    The main elements of the gossip-based dissemination frame-                                                                             messages are guaranteed to have a request, then the original
 work are: (1) peer selection, where a peer (node) selects                                                                                 request reply message and its associated reply message are
 another peer uniformly at random from the set of available                                                                                treated as a single synchronous interaction (SY). If the original
 peers, (2) data exchanged, which involves the exchange of                                                                                 message was sent as a unicast (SD) then the tuple (source,
 data between peers and is specific to the use of the gossip                                                                               destination, SY, SD) is added to the AML. Otherwise, if the
 mechanism, and (3) data processing, which details how each                                                                                original message was sent as a multicast (MD), then the tuple
 peer handles the information received from other peers and                                                                                (source, destination, SY, MD) is added to the AML. No-reply
 is also specific to the use of the gossip mechanism [18].                                                                                 requested messages on the other hand are treated as asyn-
 DeSARM’s gossip-based dissemination is depicted in Fig. 3                                                                                 chronous interactions (AS) and added to the AML as (source,
 and consists of:                                                                                                                          destination, AS, SD) if the message was sent as a unicast, or
    • Peer selection: A peer P periodically chooses another                                                                                (source, destination, AS, MD) if the message was multicasted.
      peer Q uniformly at random from the set of peers that                                                                                The AML is treated as a set so only unique tuple entries are
      participate in the same application, i.e., DeSARM uses                                                                               allowed for each component interaction, irrespective of the
      selective gossiping.                                                                                                                 frequency of such interactions in the message logs because
    • Data exchanged: The AML is sent from one peer to                                                                                     a software architecture does not consider how many times
      another.                                                                                                                             a certain type of interaction occurred between components.
    • Data processing: The received AML is merged with the                                                                                 Further details on the algorithms implemented by DeSARM
      local AML at each node to produce an updated AML.                                                                                    can be found in [26].
    Once a node detects that there are no further updates to the                                                                              After each round of gossiping, the updated AML is used to
 gossiped AMLs, it assumes that convergence was reached and                                                                                incrementally discover the architecture, which is represented
 a complete software architecture can be derived.                                                                                          as a labeled directed graph. The vertices of this graph cor-
                                                                                                                                           respond to unique component ids and the edges correspond
                                                                                                                                           to unique component interactions. Edges are labeled with the
                                                                                                                                           interaction patterns (SY or AS) and destination types (SD or
                                                                                                                                           MD). Further details on this process can also be found in [26].
                                                                                                                                              To depict how DeSARM works, we use an example ar-
                                                                                                                                           chitecture of a distributed emergency monitoring system (see
                                                                                                                                           Fig. 4). The architecture consists of five types of components
                                                                                                                                           with three instances each of the Monitoring Sensor and Re-
                                                                                                                                           moteSystem Proxy components, and a single instance of the
                                                                                                                                           other components. This example assumes that each component
                                                                                                                                           is assigned to a single node. When referring to components we
                                                                                                                                           mean component instances unless otherwise mentioned. The
                                 Fig. 4. Emergency Monitoring System.
                                                                                                                                           communication patterns within the system are:
       Each log entry in the MSL or MRL has the following fields:                                                                            •   Operator Presentation sends synchronous messages with
       • Timestamp                                                                                                                               reply to Alarm Service and Monitoring Data Service.
                                                                                      A. Experiment I
 !"#$%"&$#'(                           /*-"%*(
                                                                         3.*&4%"&(
   )*#+"&(                             )0+%*-(                                           We use the application described in Fig. 4, whose archi-
                                                                       1&*+*#%45"#(
 ,"-."#*#%(                              1&"20(
                6);():(
                                 6);():(
                                                             6);(!:(                  tecture is known, and show that DeSARM converges exactly
                                                6);():(
                                                                                      to that known architecture. Table I shows the mapping be-
           6);():(                                         )<;():(     6);(!:(        tween nodes, components, and physical machines for this
                                     )<;():(          !"#$%"&$#'(                     architecture. As discussed above, the known and discovered
                       674&-(
                      )*&8$9*(
                                                        :4%4((                        architectures are represented as graphs; we compare the simi-
                                                       )*&8$9*(
                                                                                      larity between the two (the known and current version of the
                                                                                      discovered architecture) over time. For that, we use the graph
      Fig. 5. Discovered Architecture Model as Labeled Directed Graph
                                                                                      comparison algorithms proposed in [20][36][9] and a graph
                                                                                      similarity metric that ranges from 0 to 1, where 0 indicates
  •  Alarm Service and Monitoring Data Service send asyn-                             no similarity and 1 indicates that the two graphs are identical.
     chronous multicast messages to Operator Presentation.                            The use of graph comparison using the similarity measure is
   • Monitoring Sensor and Remote System Proxy send asyn-
                                                                                      only for convergence checking during the experiments and is
     chronous unicast messages to Alarm Service and Moni-                             not part of DeSARM’s implementation. We plot the evolution
     toring Data Service.                                                             of the similarity metric over time to display the convergence
                                                                                      speed of the discovery mechanism.
   The discovered architecture model corresponding to Fig. 4 is
the graph shown in Fig.5. This shows the five component types
                                                                                                                 TABLE I
of the system and their respective communication patterns as                             M APPING OF NODES TO COMPONENTS AND PHYSICAL MACHINES .
edge labels, which would have been derived by the architecture
discovery process.                                                                       Node      Software Component                     Machine
                                                                                         Node 1    Monitoring Sensor Component (MSC)      Machine 1
      V. D E SARM I MPLEMENTATION AND E XPERIMENTS                                       Node 2    Monitoring Sensor Component 2 (MSC2)   Machine 1
                                                                                         Node 3    Monitoring Sensor Component 3 (MSC3)   Machine 1
   DeSARM was implemented in Java, and was developed by                                  Node 4    Remote System Proxy (RSP)              Machine 1
extending an open source implementation of gossip [14] that                              Node 5    Operator Presentation (OP)             Machine 2
manages a list of nodes on a network for cluster membership.                             Node 6    Alarm Service (ArmS)                   Machine 2
                                                                                         Node 7    Remote System Proxy 2 (RSP2)           Machine 1
We extended the open source gosssip implementation by                                    Node 8    Monitoring Data Service (MDS)          Machine 2
adding the core functionality of DeSARM, as described in                                 Node 9    Remote System Proxy 3 (RSP3)           Machine 2
Section IV above, as well as incorporated push-pull based
gossip (discussed further below). We emulated a distributed                              Figure 6 shows how the architecture converges over time
system by implementing each node of the distributed system                            to the known architecture at each of the nodes shown in
on a different virtual machine (VM) and spread the VMs                                Table I under a no-failure case. Different nodes converge at
over physical machines connected over a network. The VMs                              different rates but at time 80 sec all nodes have converged
communicate over TCP/IP so they can be located anywhere on                            to the correct software architecture. Nodes 6 and 9 are the
the network. The DeSARM implementation is heavily multi-                              first to converge and node 7 is the last. Note that in our
threaded with different functions of DeSARM implemented as                            implementation of gossiping, each time a node i sends a gossip
different threads. Some examples of threads include sending                           message to node j, node j replies with a gossip message.
and receiving of gossip messages, message log aggregation,                            This way, two nodes will exchange AMLs more often, leading
architecture discovery, component/node database maintenance,                          to faster convergence. Because of the random nature of peer
and sending and receiving of component messages. All the                              selection in the gossip protocol, some nodes may gossip more
communication between nodes uses Java sockets. DeSARM                                 often with others, leading to different convergence rates among
modules normally communicate using UDP, however if gossip                             nodes. Additionally, the convergence rate is affected by the
exchanges become too large, resulting in message fragmenta-                           communication pattern among components.
tion in multiple packets, TCP is used to guarantee message                               Figure 7 shows the evolution of the architecture similarity
integrity.                                                                            when components fail. As the figure shows, failures start to
   Our experiments demonstrate the operation of DeSARM                                occur after the first 80 sec, before convergence was achieved.
and assess its convergence and the number of messages                                 In fact, the value of the similarity metric was equal to 0.9375
exchanged by the DeSARM middleware. Two sets of exper-                                (i.e., < 1) at all nodes at t = 80 sec. The failure probability of
iments were completed. In the first experiment, two types                             each component, while processing, is set at 20% (a relatively
of tests were conducted. In the first test, there were no                             high value) and the average component down time is set at
component/node failures. In the second test, we added random                          180 seconds. Thus, at approximately t = 260 sec, the failed
failures with subsequent recovery for each of the components.                         components will start to recover from the failure and resume
This second test reveals the impact of failures on the con-                           sending messages. DeSARM automatically resumes its mes-
vergence of DeSARM to the final architecture. In the second                           sage collection and gossiping of newly updated AMLs when
experiment, the scalability of the mechanism is examined.                             components start to recover. When that happens, convergence
                                                  $"!#
                                                                                                                                                                                                         TABLE III
                                                                                                                                                                                         N UMBER OF GOSSIP MESSAGES SENT PER NODE .
                                                  !",#
 !"#$%&'#&("')*%+%,-"%&.)/01)2-%,("'34)




                                                  !"+#                                                                                                                                        Sent          Received         Sent      Received
                                                  !"*#                                                                                                                                    (no failures)   (no failures)   (failures)   (failures)
                                                  !")#                                                                                                                            1            24              14             61           36
                                                  !"(#                                                                                                                            2            22              15             59           52
                  )




                                                                                                                                                                                  3            20              22             63           50
                                                  !"'#
                                                                                                                                                                                  4            22              14             60           55
                                                  !"&#                                                                                                                            5            21              19             39           54
                                                  !"%#                                                                                                                            6            22              21             60           68
                                                  !"$#                                                                                                                            7            25              11             65           69
                                                  !"!#
                                                                                                                                                                                  8            22              32             50           59
                                                                  $!#           %!#            &!#          '!#       (!#            )!#           *!#          +!#               9            21              21             63           59
                                                                                                             5%+')/3'#4)                                                          Avg.         22              19             58           56
                                                                        -./0#$#             -./0#%#              -./0#&#              -./0#'#            -./0#(#
                                                                        -./0#)#             -./0#*#              -./0#+#              -./0#,#

                                                                                                                                                                             an event log (not part of DeSARM) during the experiments.
Fig. 6. Architecture similarity at the 9 nodes as a function of time with no                                                                                                 Entries in these logs are timestamped in nanoseconds and cor-
failures.                                                                                                                                                                    respond to events such as sending application-level messages,
                                                                                                                                                                             sending/receiving DeSARM gossip messages, and computing
                                                      $"!#                                                                                                                   architecture similarity metrics. At the end of the experiment,
                                                      !",#                                                                                                                   the event logs of all nodes were sort-merged offline to produce
           !"#$%&'#&("')*%+%,-"%&.)/0%&$)1-%,("'23)




                                                      !"+#                                                                                                                   a single log. Figure 8 shows a few excerpts of this log. The first
                                                      !"*#                                                                                                                   two entries of this log show application-level messages sent
                                                      !")#                                                                                                                   by component MSC at Node 1 to components ArmS (at Node
                                                      !"(#                                                                                                                   6) and MDS (at Node 8). The next two entries correspond
                                                      !"'#                                                                                                                   to similar messages sent by MSC2 at Node 2, to components
                                                      !"&#                                                                                                                   ArmS and MDS. Later in time, DeSARM at Node 1 sends a
                                                      !"%#                                                                                                                   gossip message to Node 8 with Node 1’s current view of the
                                                      !"$#                                                                                                                   AML, namely [(MSC,ArmS,AS,SD), (MSC,MDS,AS,SD)].
                                                      !"!#                                                                                                                   This view only reflects the messages that component MSC
                                                             !#           (!#           $!!#         $(!#        %!!#         %(!#          &!!#         &(!#         '!!#   at Node 1 sent to nodes 6 and 8. Later in time, De-
                                                                                                              4%+')/2'#3)
                                                                                  -./0#$#        -./0#%#      -./0#&#       -./0#'#        -./0#(#                           SARM at Node 8 receives the following gossip message from
                                                                                  -./0#)#        -./0#*#      -./0#+#       -./0#,#                                          Node 4: [(RSP,ArmS,AS,SD), (RSP, MDS,AS, SD), (MDS,
                                                                                                                                                                             OP, AS, MD), (MSC3,MDS,AS,SD), (MSC2,MDS,AS,SD),
Fig. 7. Architecture similarity at the 9 nodes as a function of time with                                                                                                    (ArmS,OP,AS,MD), (MSC3, ArmS, AS, SD), (OP, ArmS, SY,
component failures.                                                                                                                                                          SD), (MSC2, ArmS, AS, SD)]. As a result, Node 8’s similarity
                                                                                                                                                                             metric becomes 0.6875. Later in time, Node 8 receives a gossip
                                                                                                                                                                             message from Node 9 with an AML that reflects Node 9’s
is achieved as illustrated in Table II, which shows the times at                                                                                                             current view of the architecture. This AML is aggregated with
which nodes 1-9 converge after failure recovery. As shown in                                                                                                                 Node 8’s AML resulting in an AML that reflects the entire
Table II, node convergence times are spread between t = 340                                                                                                                  software architecture. When the similarity metric for Node 8
sec and t = 400 sec.                                                                                                                                                         is next computed, it shows a value of 1, indicating convergence
   Figures 6 and 7 are representative of similar results we                                                                                                                  at Node 8.
obtained in other experiments.
                                                                                                                                                                             B. Experiment II
                                                          TABLE II                                                                                                             To test the scalability of the approach we tested DeSARM
                                  C ONVERGENCE TIME OF NODES 1-9 UNDER COMPONENT FAILURES .
                                                                                                                                                                             on Argo [2], a high performance computing cluster operated
                                                         1               2             3          4           5          6        7              8         9                 by the Office of Research Computing at George Mason
                                                        390             380           390        360         340        340      400            340       380                University. For this purpose, we put together a synthetic
                                                                                                                                                                             application with 30 components, each one of them residing
   Table III shows the number of gossip messages sent and                                                                                                                    on a different node of the research cluster. Some components
received per node until convergence is achieved in the case                                                                                                                  have a synchronous communication interface only, sending
when failures do not occur and in the case when failures occur.                                                                                                              and receiving only synchronous messages, some have an
As the table shows, the average number of sent and received                                                                                                                  asynchronous communication interface only, sending and re-
gossip messages when no failures occur is almost 1/3 of the                                                                                                                  ceiving only asynchronous messages, while others comprise
corresponding number when there are failures.                                                                                                                                both synchronous and asynchronous interfaces, sending syn-
   For illustration and debugging purposes, each node collected                                                                                                              chronous messages and receiving asynchronous messages or
                                                                                         9004AC3S1.(4-;-4(
                 !"#$%&!#'"#&%&#()*+(,-./(01,/2343567(6-,,38-(/1(956*(
                                                                                         6-,,38-,(
                 !"#$%&!#'%#:"&:()*+(,-./(01,/2-;-./7(6-,,38-(/1()<*(
                 !"#$%#%=&:!#:'!()*+=(,-./(01,/2343567(6-,,38-(/1(956*(
                 !"#$%#%=&:%=>>#()*+=(,-./(01,/2-;-./7(6-,,38-(/1()<*(
                 !"#$#''#$>!$%!=(?@-(,A6A435A/B(6-/5AC(3/(D1E-(!(A,F(>G!=&(
                 HHHHH(
                 !"#">'!#$=>:%=:(D1E-(!(,-./(81,,A0(6-,,38-(/1F(D1E-(%(IA/@(9)JF(K2)*+L956*L9*L*<7L(
                 2)*+L)<*L9*L*<7M(
                 (
                                                                                       T1,,A0(6-,,38-(
                 (                                                                      N516(.1E-(:(
                 HHHHH(
                 !"#":%'!=&":'$'(D1E-(%(5-C-A;-E(81,,A0(6-,,38-(N516F(D1E-(:(IA/@(9)J(K2O*PL956*L9*L*<7L(
                                                                             - : IA/@ 9)J K2O*P 956* 9* *<
                 2O*PL)<*L9*L*<7L(2)<*LQPL9*L)<7L(2)*+'L)<*L9*L*<7L(2)*+=L)<*L9*L*<7L(2956*LQPL9*L)<7L(
                 2)*+'L956*L9*L*<7L(2QPL956*L*RL*<7L(2)*+=L956*L9*L*<7M(
                 HHHHH(                                                               T1,,A0(6-,,38-(
                 !"#"&':%=!:&'&$(?@-(,A6A435A/B(6-/5AC(3/(D1E-(%(A,F(>G$%"&(            N516(.1E-(#((
                 (
                 HHHHH(
                 !"#"&'&:%$'#>>'(D1E-(%(5-C-A;-E(81,,A0(5-,01.,-(6-,,38-(N516F(D1E-(#(IA/@(9)J(
                 K2O*P'L956*L9*L*<7L(2O*P'L)<*L9*L*<7L(2)*+'L956*L9*L*<7L(2)*+'L)<*L9*L*<7L(2)*+=L956*L9*L*<7L(
                 2)*+=L)<*L9*L*<7L(2)*+L956*L9*L*<7L(2)*+L)<*L9*L*<7L(2O*P=L956*L9*L*<7L(2O*P=L)<*L9*L*<7L(
                 2956*LQPL9*L)<7L(2QPL956*L*RL*<7L(2O*PL956*L9*L*<7M(
                 HHHHH(                                                              D1E-(%(C1.;-58-E(
                 !"#"$':%:#%'">$(?@-(,A6A435A/B(6-/5AC(3/(D1E-(%(A,F(!G>(

Fig. 8. Excerpts of event trace. Components: Monitoring Sensor (MSC, MSC2, and MSC3), Remote System Proxy (RSP, RSP2, and RSP3), Operator
Presentation (OP), Alarm Service (ArmS), and Monitoring DataService (MDS).



vice versa. All communication between components take place              The first issue we address is whether DeSARM’s selec-
at a random time interval to emulate local processing between         tive gossiping mechanism always converges, i.e., if there
message exchanges.                                                    is the possibility of different nodes converging to different
   Each component communicates with only two other com-               discovered architectural models. To address this issue and to
ponents so that the initial AML at each node is a very small          ensure convergence, we employed anti-entropy based gossip
fraction of the complete AML. Therefore, the value of the             in DeSARM. In anti-entropy gossip, the gossiping protocol
similarity metric is very small to begin with. It starts at zero in   runs indefinitely albeit at specified regular intervals [29]; this
some cases, when components on a node send a synchronous              means that all nodes continuously make gossip exchanges. Be-
message to another component and have to wait for the reply.          cause gossiping never stops, updates will reach all nodes and
This way the DeSARM instance at each node would require               the system will always eventually converge. However, such
more information to derive the complete architectural model           a mechanism can result in unnecessary message exchanges
than in the previous experiment.                                      especially if no architectural changes have taken place within
   The convergence time at the 30 nodes varied significantly          the system. This problem is exacerbated in large distributed
due to the randomness in message exchange. The node that              systems. To address this issue we adjusted the protocol to
took the longest time to converge converged in 260 sec (i.e.,         gradually decrease the gossip rate δ when no new updates
4.3 minutes) as shown in Table IV. This table shows the               to the AML have been detected over a period of time. Then,
progression of the similarity metric over time. The table also        when an update is received, the original gossip rate is restored
shows that the rate of convergence, roughly defined as the            to speedup convergence.
increase in convergence over time is slower at the beginning
and faster at the end. For example, after 58% of the time, the           The second issue is the relationship between gossip conver-
similarity metric has only achieved the value of 0.27. At 85%         gence speed and the gossip rate δ. A higher value of δ implies
of the time, the similarity metrics achieved 0.73. While the          faster convergence at the cost of higher message overhead.
slowest node to converge took over four minutes, the fastest          Other factors such as system size, component communication
took 30 seconds.                                                      patterns, and gossip mode (see below) also affect convergence
                                                                      speed. For example, convergence speed is generally affected
                       VI. D ISCUSSION                                by the communication pattern among components, including
  We discuss here some additional issues of interest, some of         how often they communicate and the number of components
which are being addressed in ongoing work.                            they communicate with. Two modes of gossiping can be used:
                                                             TABLE IV
                                         S LOWEST CONVEGENCE RATE IN THE 30- NODE EXPERIMENT.

             Time (sec)          0-110   120    130    140    150    160-170   180    190-200   210-220   230-250   260
             Similarity Metric    0.00   0.10   0.13   0.20   0.27     0.30    0.47     0.57      0.73      0.97    1.00



push gossip (where a peer sends a message but does not               to failures. These properties were demonstrated through vari-
receive one in return) vs. push-pull gossip (where a peer sends      ous experiments, with and without component failures. These
a message and receives one in return). Our implementation            experiments assessed the rate of convergence of the DeSARM
of DeSARM uses push-pull gossip, which accelerates conver-           nodes towards the software architecture being discovered.
gence.                                                               These experiments showed that DeSARM is resilient and
   The third issue is the implication of dynamic architectural       is able to discover the architecture even in the presence of
changes during gossiping and how it affects convergence. We          failures, albeit at a lower pace than the one when no failures
are currenty addressing this issue through the integration of        occur. DeSARM was implemented in Java using a multi-
DeSARM with an application recovery layer that runs above            threaded architecture.
DeSARM and provides DeSARM with updates to events that                  In future work we plan to examine DeSARM’s capability
cause architectural changes. Examples of such events include         of discovering architectures that change over time as in the
component failures as well as component removal due to adap-         case of dynamic software adaptation [13], [12]. This work will
tation. When such events occur, this layer informs DeSARM            examine different recovery and adaptation models as well as
of the affected component(s). Due to the decentralized nature        dynamic components, i.e., components that change their com-
of DeSARM, this update can be received at any node. Upon             munication patterns at runtime based on their state or phase
receipt, the node will remove from its AML all entries related       of execution, and the impact on DeSARM. Also of interest,
to the removed component(s) that contain the component(s)            is addressing the mechanism’s capacity for discovering the
as a sender or receiver and use gossip to propogate these            full behavioral architecture of a system such as identifying
deletions to other nodes within the system. Note that archi-         component interaction protocols.
tectural changes that result in new components being added              In the current experiments, we used a fixed architecture to
are handled automatically by DeSARM as described here.               illustrate how DeSARM converges with and without failures
   The fourth issue is the overhead of the gossip mechanism          and how it operates equally well with two or thirty physical
in relation to both network traffic as well as latency in the        machines. In the future, we will run experiments in which a va-
sending and receiving of component messages. DeSARM has              riety of software architectures are generated following certain
very little network overhead for two reasons: the size of the        distributions for parameters such as number of components
AMLs and its use of selective gossiping. As the AMLs contain         and communication pattern types.
only unique tuple entries of component interactions, the size           As mentioned previously, DeSARM is being integrated
of the AMLs are kept small. This in effect reduces the overall       with an existing application recovery mechanism (ARM) that
bandwidth needed for gossip exchanges. Also, as mentioned            allows for the runtime recovery of component(s) due to node
earlier, selective gossiping allows for the containment of           failure [1]. Through the peer sampling capability of gossip
network traffic to only those nodes that are part of the same        [18], DeSARM is able to identify suspected node failures and
application which further reduces the communication overhead         alert the ARM. Once the node failure has been verified, the
of the mechanism. With respect to the latency of component           ARM proceeds to recover the failed component(s) to a new
messaging, DeSARM can keep it at a minimum because, as               node and re-instantiates them based on the current architecture
previously mentioned, all major functionality is separated into      provided by DeSARM. As a result of the decentralized nature
different threads. This allows for all other processing to take      of DeSARM, component recovery can be initiated from any
place in the background thus reducing any major impact the           node as each maintains a complete model of the architecture.
gossip mechanism will have handling component messages.              This integration will allow for the development of applications
                                                                     that are resilient to failures.
               VII. C ONCLUDING R EMARKS                                                 ACKNOWLEDGEMENTS
  This paper described DeSARM, a completely decentralized              This work was partially supported by the AFOSR grant
and automated approach for software architecture discovery           FA9550-16-1-0030 and the Office of Research Computing at
of distributed systems based on gossiping and message trac-          George Mason University.
ing. Through message tracing, DeSARM is able to identify
important architectural characteristics such as components
and connectors in addition to synchronous and asynchronous
communication patterns. Furthermore, through the use of
gossiping, DeSARM exhibits the properties of scalability,
global consistency among participating nodes, and resiliency
                             R EFERENCES                                       [18] Kermarrec, Anne-Marie, and Maarten Van Steen. ”Gossiping in dis-
                                                                                    tributed systems.” ACM SIGOPS Operating Systems Review 41.5
 [1] Albassam, Emad, et al. “Model-based Recovery Connectors for Self-
                                                                                    (2007): 2-7.
     Adaptation and Self-Healing.” Proc. 11th Intl. Joint Conf. Software
                                                                               [19] Koschke, Rainer. “Architecture reconstruction.” Software Engineering.
     Technologies (ICSOFT 2016), July 24-26, 2016, Lisbon, Portugal.
                                                                                    Springer Berlin Heidelberg, 2009. pp. 140–173.
 [2] http://orc.gmu.edu/research-computing/argo-cluster/
                                                                               [20] Koutra, Danai, et al. Algorithms for graph similarity and subgraph
 [3] Bojic, Dragan, and Dusan Velasevic. “A use-case driven method of ar-
                                                                                    matching. Technical Report Carnegie-Mellon-University, 2011.
     chitecture recovery for program understanding and reuse reengineering.”
                                                                               [21] Kramer, Jeff, and Jeff Magee. “Self-managed systems: an architectural
     IEEE Conf. Software Maintenance and Reengineerig, 2000, pp. 23–32.
                                                                                    challenge.” Future of Software Engineering, 2007. FOSE’07. IEEE,
 [4] Corazza, Anna, Sergio Di Martino, and Giuseppe Scanniello. “A proba-
                                                                                    2007.
     bilistic based approach towards software system clustering.” IEEE 14th
                                                                               [22] Menascé, Daniel A., John Ewing, Hassan Gomaa, Sam Malek, and
     Conf. Software Maintenance and Reengineering, 2010, pp. 88-96.
                                                                                    Joao P. Sousa, “A framework for utility-based service-oriented design in
 [5] Demers, Alan, et al. “Epidemic algorithms for replicated database
                                                                                    SASSY,” First Joint WOSP/SIPEW Intl. Conf. Performance Engineering,
     maintenance.” Proc.Sixth Annual ACM Symp. Principles of distributed
                                                                                    San Jose, CA, USA, January 28-30, 2010.
     computing. ACM, 1987.
                                                                               [23] Menascé, Daniel A., Hassan Gomaa, Sam Malek, and Joao P. Sousa.
 [6] Dulman, Stefan, and Eric Pauwels. “Self-Stabilized Fast Gossiping
                                                                                    ”SASSY: A framework for self-architecting service-oriented systems.”
     Algorithms.” ACM Tr. Autonomous and Adaptive Systems (TAAS), 10.4
                                                                                    Software, IEEE 28.6 (2011): pp. 78–85.
     (2015): 29.
                                                                               [24] Naseem, Rashid, Onaiza Maqbool, and Siraj Muhammad. “Improved
 [7] Esfahani, Naeem, Eric Yuan, Kyle R, Canavera and Sam Malek. “In-
                                                                                    similarity measures for software clustering.” , 2011 15th IEEE European
     ferring Software Component Interaction Dependencies for Adaptation
                                                                                    Conf. Software Maintenance and Reengineering (CSMR), 2011.
     Support.””ACM Tr. Autonomous and Adaptive Systems (TAAS), 10.4            [25] Oreizy, Peyman, et al. “An architecture-based approach to self-adaptive
     (2016): 26.                                                                    software.” IEEE Intelligent systems 14.3 (1999): pp. 54–62.
 [8] Ewing, John, and Daniel A. Menascé, “A Meta-controller method            [26] Porter, Jason, Daniel A. Menascé, and Hassan Gomaa, “Decentralized
     for improving run-time self-architecting in SOA systems,” Proc. 5th            Software Architecture Discovery in Distributed Systems.” Technical
     ACM/SPEC Intl. Conf. Performance Engineering (ICPE 2014), Dublin,              Report GMU-CS-TR-2016-2, Dept. of Computer Science, George Ma-
     Ireland, March 23-26, 2014.                                                    son University, 2016. (http://cs.gmu.edu/ tr-admin/papers/GMU-CS-TR-
 [9] Foggia, Pasquale, Carlo Sansone, and Mario Vento. “A performance               2016-2.pdf).
     comparison of five algorithms for graph isomorphism.” Proc. 3rd IAPR      [27] RASS Project, http://cs.gmu.edu/˜menasce/rass/
     TC-15 Workshop on Graph-based Representations in Pattern Recogni-         [28] Riva, Claudio, and Jordi Vidal Rodriguez. “Combining static and dy-
     tion. 2001.                                                                    namic views for architecture reconstruction.” Proc. 6th IEEE European
[10] Garlan, David, et al. “Rainbow: Architecture-based self-adaptation with        Conf. Software Maintenance and Reengineering, 2002.
     reusable infrastructure.” Computer 37.10 (2004): pp. 46–54.               [29] Riviere, Etienne, and Spyros Voulgaris. “Gossip-based networking for
[11] Garlan, David, and Bradley Schmerl. “Using architectural models at             internet-scale distributed systems.” E-Technologies: Transformation in a
     runtime: Research challenges.” European Workshop on Software Archi-            Connected World. Springer Berlin Heidelberg, 2011. pp. 253–284.
     tecture. Springer Berlin Heidelberg, 2004.                                [30] Sartipi, Kamran, and Nima Dezhkam. “An Amalgamated Dynamic and
[12] Gomaa, Hassan, and Koji Hashimoto, 2012. “Dynamic self-adaptation              Static Architecture Reconstruction Framework to Control Component
     for distributed service-oriented transactions,” Proc. 7th Intl. Symp.          Interactions 259.” 14th IEEE Working Conf. Reverse Engineering, 2007.
     Softw. Eng. for Adaptive and Self-Managing Systems, SEAMS 12. IEEE        [31] Schmerl, Bradley, et al. “Discovering architectures from running sys-
     Press, Piscataway, NJ, USA, pp. 11-20.                                         tems.” IEEE Tr. Software Engineering, 32.7 (2006): pp. 454–466.
[13] Gomaa, Hassan, Koki Hashimoto, Minseong Kim, Sam Malek, and               [32] Taylor, Richard N., Nenad Medvidovic, and Eric M. Dashofy. Software
     Daniel A. Menascé, 2010. “Software adaptation patterns for service-           architecture: foundations, theory, and practice. Wiley Publishing, 2009.
     oriented architectures,” Proc. 2010 ACM Symp. Applied Computing,          [33] Vasconcelos, Aline, and Cludia Werner. “Software architecture recovery
     New York, NY, USA, pp. 462469. doi:10.1145/1774088.1774185.                    based on dynamic analysis.” Brazilian Symp. on Softw. Engineering.
[14] https://code.google.com/archive/p/java-gossip/                                 2004.
[15] Huang, Gang, Hong Mei, and Fu-Qing Yang. “Runtime recovery and            [34] Weyns, Danny, and Tanvir Ahmad. “Claims and evidence for
     manipulation of software architecture of component-based systems.”             architecture-based self-adaptation: A systematic literature review.” Soft-
     Automated Software Engineering, 13.2 (2006): pp. 257–281.                      ware Architecture. Springer Berlin Heidelberg, 2013. pp. 249–265.
[16] Israr, Tauseef, Murray Woodside, and Greg Franks. “Interaction tree       [35] Yuan, Eric, and Sam Malek. “Mining Software Component Interactions
     algorithms to extract effective architecture and layered performance           to Detect Security Threats at the Architectural Level.” Proc. 13th
     models from traces.” J. Systems and Software 80.4 (2007): pp. 474–             Working IEEE/IFIP Conf. Software Architecture. 2016.
     492.                                                                      [36] Zager, Laura A., and George C. Verghese. “Graph similarity scoring and
[17] Kephart, Jeffrey O., and David M. Chess. “The vision of autonomic              matching.” Applied mathematics letters 21.1 (2008): pp. 86–94.
     computing.” IEEE Computer 36.1 (2003): pp. 41–50.