5th Symposium on Conceptual Modelling Education (SCME 2017)


    Visualizing Code Variabilities for Supporting Reuse
                        Decisions

                        Anna Zamansky and Iris Reinhartz-Berger

               Department of Information Systems, University of Haifa, Israel
                        annazam@is.haifa.ac.il, iris@is.haifa.ac.il


       Abstract. Software reuse is the practice of using artifacts from existing systems
       to build new ones. It has been shown effective for improving quality and main-
       tainability and for reducing cost and development time. Human factors have been
       identified as significant barriers to a wider adoption of reuse practices in industry.
       In this paper we consider a tool-supported approach for systematic reuse of ob-
       ject-oriented programs (written in Java) based on polymorphism-inspired mech-
       anisms. The suggested tool gets as input implementations of multiple products,
       and produces a visual representation of the similarities and variabilities between
       their classes in terms of exhibits behaviors, as well as presents possible reuse
       options. We discuss the suitability of this approach for educational and training
       settings, and specifically for supporting reuse decisions of novice developers.

       Keywords: Software reuse, Education, Decision Support, Software Product Line
       Engineering, Variability Analysis, Variability Mechanisms, Polymorphism


1      Introduction

Development has become increasingly complex while reducing time-to-market remains
a critical issue. Software reuse has been shown to be effective for reducing cost and
development time [14]. However, there are significant barriers in the adoption of reuse
practices in industry. As pointed out in [2], “initial research on software reuse has fo-
cused on the technological issues (e.g., programming language support, creating and
retrieving reusable artifacts, repositories, etc.), and only later non-technical factors
(e.g., organization, processes, business drivers) were found to be important for the suc-
cess of a reuse strategy”.
   Recently more attention has been drawn to the human factors of reuse practices,
focusing mainly on decision making processes of developers [12], [20]. In particular,
empirical studies suggest reuse training is an important factor for improving reuse prac-
tices [5], [4]. Nevertheless, work on how to educate for reuse is scarce. Frakes and Kang
[6] stress the need for addressing reuse education: “Industry studies have shown that
education is a primary factor in better reuse, yet there had been little systematic study
of how best to do reuse education. Certainly, both academia and industry could improve
educational practices”.


                                                25
                   5th Symposium on Conceptual Modelling Education (SCME 2017)


   In this paper we propose a tool-supported approach for educating and training nov-
ices and supporting reuse decisions. In [19] we presented a tool for comparing pairs of
software artifacts (object-oriented code) and representing their similarities and varia-
bilities. The tool, called VarMeR – a Variability Mechanisms Recommender, is based
on an ontological framework that compares software behaviors rather than concrete
implementations [16], [17], [18]. This way software systems that have similar inten-
sions (i.e., exhibit similar behaviors) can be considered for reuse, even if their imple-
mentations are different (e.g., contains different components).
   We particularly explore the suitability of VarMeR to assist novice developers in re-
use decisions. To this end, VarMeR was extended to compare an arbitrary number of
software artifacts (rather than pairs of artifacts). In other words, the input of VarMeR
is object-oriented code artifacts (in Java) that belong to multi software systems and the
output is a graph that captures the similarities and variabilities of the classes of those
systems in terms of their exhibited behavior. The tool further recommends how to in-
crease reuse by utilizing suitable polymorphism-inspired mechanisms.
   The rest of this paper is structured as follows. Section 2 provides an overview of our
proposed approach, while Section 3 presents the capabilities of the extended version of
the VarMeR tool (for supporting reuse when multi software products are available).
Section 4 describes the benefits of the tool and its possible use scenarios in educational
and training contexts, as well as some preliminary usability feedback. Finally, Section
5 summarizes and refers to future plans.


2            The VarMeR Approach

VarMeR analyzes the commonality and variability of products behaviors and presents
the analysis outcomes in the form of polymorphism-inspired mechanisms among clas-
ses that behave similarly (even if their realizations are different). Specifically, the ap-
proach is composed of three steps, shown in Figure 1: Extract Behaviors, Compare
Behaviors, and Analyze Variability.

                                         Products’
                                         representations                Similar elements
        P1


    .
    .                                              Compare                             Analyze           Reuse
                   Extract
                  Behaviors                        Behaviors                          Variability        Recommendations
    .


        Pn          Ontological                       Similarity                           Variability
                    foundation                        measures                             mechanisms


                                  Figure 1. A high level overview of the approach


2.1          Extracting Behaviors
   Based on ontological considerations [16], a software behavior can be represented as
a triplet of initial state – the status of the software before the behavior occurs, external


                                                                   26
               5th Symposium on Conceptual Modelling Education (SCME 2017)


event – that triggers the behavior, and final state– the status of the software after the
behavior occurs. Those behavioral components are extracted from the public operations
of the different classes1. Each public class operation specifies some behavior of the
software product that is widely relevant within the product. We assume that the opera-
tion name captures the essence of the behavior and thus can describe its trigger (the
external event), e.g., Borrow and Return of a Book Copy class in a library management
system (see Figure 2).


           Borrow:
           InitialState = {AvailabilityStatus:Boolean, BorrowingPeriod:int}
           ExternalEvent = Borrow
           FinalState = {AvailabilityStatus:Boolean, Borrow:void}

                          Figure 2. An example of behavior extraction

   For extracting initial and final states, we distinguish between two levels of operation
descriptor: shallow – which refers to the signature of the operation, and deep – which
takes into consideration the behavior in terms of attributes used and modified through-
out the operation (including those that are used and modified indirectly by operations
called from the analyzed operation). We consider only attributes and ignore local vari-
ables, as the later can be defined for implementation and realization purposes and may
hinder the operation’s behavior essence. The initial state of the behavior is composed
of all the parameters passed to the operation (part of shallow) and all the class attributes
used (read) by the operation (part of deep). The final state consists of the operation
name and its returned type (part of shallow) and all the class attributes modified (set)
by the operation (part of deep). Figure 2 exemplifies the behavior extraction outcome
for the operation Borrow of the Book Copy class. Note that each attribute is presents
via its name and type which provide the basis for comparison.


2.2    Compare Behaviors
   Different methods have been proposed for measuring the similarity of applications
or software systems. McMillan et al. [11], for example, propose an approach called
CLAN that measures similarity of Java applications using the notion of semantic layers
that correspond to packages and class hierarchies. As opposed to that approach and


1 The assumption is that private and protected operations are introduced for implementation pur-

   poses and thus hinder the exhibited behavior of the analyzed software product.


                                                  27
                5th Symposium on Conceptual Modelling Education (SCME 2017)


many other methods, which take into account structural implementation and realization
considerations, VarMeR measures the behavioral similarity of software systems.
    To this end, a similarity mapping between the behavior constituents (namely, initial
state, external event, and final state) is applied. This mapping can be based on existing
general-purpose or domain-specific similarity metrics or some combination of such
metrics. The metrics can be based on semantic nets or statistical techniques to measure
the distances among words and terms [13]. Alternatively, they can use type or sche-
matic similarities, potentially ignoring the semantic roles or essence of the compared
elements [7]. The similarity mapping associates to each operation’s constituent (param-
eter, attribute used, or attribute modified) all of its similar counterparts in the other
operation (i.e., elements whose similarity with the given constituent exceeds some pre-
defined threshold).
    Returning to our example of Book Copy, assume a class named Car which has an
operation named Rent that changes the In Agency status of a car from true to false. It
further calculates the Back Date according to the Rental Period. Figure 3 exemplified
two potential similarity mappings: the first one (a) is based on a schematic type-based
similarity according to which two attributes are similar if and only if their types are
similar. The second mapping (b) is based on a semantic measure named Latent Seman-
tic Analysist (LSA) [10].


                 bo                    ca
                   ok opy                r
                                                          bo
                                                            ok opy               ca
                      c                                        c                   r


      Figure 3. Examples of similarity mappings based on: (a) type-based similarity and
                              (b) semantic similarity (LSA)


2.3      Analyze Variability
  Based on the similarity mapping, we can distinguish between the following cases
among (public) operations:
  1.     USE – the similarity mapping is bijection (each constituent of operation 1 has exactly
         one counterpart in operation 2 and vice versa).
  2.     REF (abbreviation for refinement) – at least one constituent in operation 1 has more than
         one counterpart in operation 2.


                                                28
                 5th Symposium on Conceptual Modelling Education (SCME 2017)


     3.   EXT (abbreviation for extension) – at least one constituent in operation 1 has no coun-
          terpart in operation 2.
    Note that REF and EXT are not mutually exclusive; we refer to a combination of
both as REF-EXT (abbreviation for refined extension).
    Aggregating the above notions from the level of operations to the level of classes,
we take inspiration from the polymorphism mechanisms. Polymorphism is the provi-
sion of a single interface to entities of different types. Therefore, the cases of polymor-
phism are characterized by similar signatures of operations (namely, the USE category
in the shallow level of the operations). We further focus on three types of polymorphism
which are widely used in industry:
     1.   Subtyping (inclusion) polymorphism which includes refinement or extension of behav-
          iors (e.g., function pointers, inheritance).
     2.   Parametric polymorphism which includes name or type analogy (e.g., C++ templates).
     3.   Overloading which includes behavior change while maintaining the same signature.
   Table 1 presents recommendations for those polymorphism-inspired mechanisms
based on the reuse mapping characteristics.
               Table 1. Characteristics of Polymorphism-Inspired Mechanisms
    Shallow       Deep             Description                           Polymorphism-Inspired
                                                                         mechanism
    USE           USE              Both signatures and behaviors are     Parametric
                                   similar
    USE           REF              Signatures are similar and behavior
                                   is refined
    USE           EXT              Signatures are similar and behavior
                                                                         Subtyping
                                   is extended
    USE           REF-EXT          Signatures are similar and behavior
                                   is both refined and extended
    USE           Not mapped       Signatures are similar and behavior   Overloading
                                   is different


3         The VarMeR Tool

    In order to make our approach accessible to developers (potentially novice ones and
students), we developed a tool named VarMeR – Variability Mechanisms Recom-
mender. While the first version of the tool, described in [19], concentrated on analyzing
the commonality and variability between a pair of software products, the current ver-
sion extends the scope to a multi-product setting. This way the tool aims at supporting
reuse decisions and particularly the selection of the most suitable products (or product
parts) to reuse.
    The inputs of the tool, namely the software products, are provided as (paths to) jar
files. Those files are reverse engineered into class diagrams (in XMI format) and Pro-
                                   2
gram Dependence Graphs (PDG) [8] (in JSON format). The shallow and deep levels
of the behaviors are extracted from those representations and the tool proceeds in the

2   PDG explicitly represents the data and control dependencies of a program.


                                                 29
              5th Symposium on Conceptual Modelling Education (SCME 2017)


three stages described in the previous section. The outcome is presented visually in
three levels of analysis, using graph-based representations:
• Product level with nodes representing the software products and edges representing
     their degrees of similarity.
• Class level with nodes representing the classes of the analyzed software products and
     edges representing recommendation on polymorphism-inspired mechanisms (paramet-
     ric, subtyping, and overloading).
• Operation level with nodes representing operations (of a certain sub-set of classes of
     software products) and edges representing the mechanism characteristics listed in Table
     1 (USE, REF, EXT, REF-EXT).
   In the product and class levels, the size of the nodes is proportional to the size of
objects they represent; the larger the node is, the more operations the class have or the
more classes the product has. The width of an edge, as well as its length, represents the
degree of evidence (e.g., the number of operations related with a certain type of poly-
morphism); the thicker/longer the link is, the more evidence exist.
   An example of VarMeR output at the class level is depicted in Figure 4. The com-
parison is done between three different software products, the classes of which are rep-
resented using different colors. To support scalability, VarMer provides the user with
several possibilities for fine-tuning and information hiding, which are further discussed
in the next section.


4      Potential Use of VarMeR in Education and Training Contexts

Although lack of training has been identified as a major barrier to a wider adoption of
reuse practices in industry, no systematic way to address this problem has been pro-
posed [6]. We suggest the VarMeR tool presented above as a starting point for devel-
oping methods for education and decision support of novice developers and software
engineering students due to its intuitive abstractions, its visualization, and its support
for scalability. These features are discussed below, as well as some usability scenarios
and feedback we have collected regarding VarMeR.


4.1 VarMeR in Education and Training Contexts
   Intuitive abstractions. Krueger [9] highlights the importance of choosing the right
abstraction in the context of reuse: “Why is software reuse difficult? Useful abstractions
for large, complex, reusable software artifacts will typically be complex. In order to use
these artifacts, software developers must either be familiar with the abstractions a priori
or must take time to study and understand the abstractions. The latter case can defeat
some or all of the gains in reusing an artifact.”


                                            30
              5th Symposium on Conceptual Modelling Education (SCME 2017)


                              Figure 4. Class Level Analysis
   VarMeR makes use of graph-based abstractions which are intuitive and easy to un-
derstand: nodes to represent (different types of) elements, and edges to represent their
similarity relations. It also supports switching between different abstraction levels by
supporting multi-level analysis (product-, class- and operation-levels). Another kind of
abstraction made by the VarMeR approach is that comparison between software ele-
ments is made in terms of intensions (i.e., exhibited behavior) rather than in terms of
realizations and implementations. Starting with understanding behavior may be much
easier than starting by understanding old code, especially for novices. As highlighted
by Agresti [1]: “It takes effort to understand old code. The developer is trying to estab-
lish whether the old code will meet some or all of the new requirements so it can be
part of a new system. When that old code is written such that it makes it especially
difficult to understand, a developer can reasonably conclude that her effort is better
spent developing new code from scratch.”
   Visualization. While a large body of research on software visualization exists [3],
to the best of our knowledge visualization for reuse has not yet been addressed. The
type of visualization offered by VarMeR can be classified as what is called in [15]
‘changing the perspective’, or “bringing large software engineering problems within
the scope of a single view,.., an attempt at complexity control, helping to keep a large
problem ‘in a single head’ by visualizing the overall structure and providing some as-
sistance for navigating or traversing that structure.” In VarMeR’s visualization we aim
to increase cognitive effectiveness by the use of simple and intuitive graph-based ele-
ments (nodes and edges), employing also other visual variables such as color (to encode
different products), size (to encode the number of operations/classes in each class/prod-
uct) and edge thickness (to encode extent of similarity).
   Scalability: fine-tuning and information hiding. Reuse decisions might be easy
when considering two simple operations such as Borrow and Rent from Figure 3, but


                                           31
               5th Symposium on Conceptual Modelling Education (SCME 2017)


dealing with large scale projects with hundreds of classes and thousands of behaviors
introduces an additional dimension of complexity when making reuse decisions. To
reduce the user cognitive load, VarMeR supports several ways of information hiding
and fine-tuning at the class level analysis (see Figure 4):
- Modifying thresholds for presenting recommendations for each type of polymor-
     phism (namely, the minimal percentages of “similar” operations to present para-
     metric, subtyping, and overloading edges can be set; the lower these thresholders
     are, the more abstraction is needed to apply reuse for those classes).
- Hiding classes which have no similarity to other classes (Hide Classes button).
- Filtering the graph-based visualization according to different software packages of
     the product (Filter Files button).


4.2 Usability Scenarios and Feedback
    The above features provide a starting point for developing a method for supporting
novice developers and software engineering students in reuse decisions. Consider, e.g.,
a scenario in which a novice developer, or a student in a programming course, needs to
develop a software system. In an industrial setting, the company may have already de-
veloped various similar systems and maintain a repository of software artifacts. In other
settings, some open-source applications may be available. Searching in such reposito-
ries, e.g., using keywords or queries, our developer may discover several similar sys-
tems, not all of which have informative descriptions, and each of which may have many
different versions. Let us assume that our developer finally decides to select five sys-
tems, which according to their descriptions are quite similar to the system he/she needs
to develop. For making a decision which system (or system parts) to reuse, it is useful
to comprehend how the five systems differ.
    This is exactly the point where VarMeR enters the scene, providing assistance to
support such decisions. The developer can run the tool on the five selected systems3
and browse the similarity analysis and recommendations at different levels of abstrac-
tion. The product-level analysis will show him/her which of the systems are most sim-
ilar. Zooming into class level, he/she can identify clusters of classes which can poten-
tially be reused for implementing different behaviors. Zooming again into the operation
level, we obtain information on the reuse relations among operations. At this stage the
developer can zoom into the implementation itself, by looking at the relevant segment
of code in an Eclipse-like environment and adapt the code to the task at hand.
    We are currently in the process of evaluating VarMeR’s usability. In a pilot setting,
we gave the tool to two pairs of students studying in the Information Systems depart-
ment at the University of Haifa. Each pair received two portions of similar Java projects
from SourceForge. They were requested to inspect VarMeR’s outcomes, and grade the
appropriateness of its recommendations. Our analysis of the open-text feedback, pro-
vided by the students and classified by us according to VarMeR’s features, shows that
visualization was positively mentioned – and particularly the use of colors and size.
3 Note that theoretically the developer could run VarMeR on all the game applications. However,

   the complexity of multi-object comparison dramatically increases as the number of objects
   increases. Hence, some a-priori filtering is recommended.


                                              32
               5th Symposium on Conceptual Modelling Education (SCME 2017)


The intuitive abstractions, and specifically the ability to zoom-in and zoom-out between
the product, class, and operation levels, was also considered very useful. The scalability
support was also mentioned as important, but the comprehensibility of three independ-
ent bars (for parametric, subtyping, and overloading mechanisms) was questioned.


5      Summary and Future Work

The human factor is one of the most significant barriers in wider adoption of reuse
practices in industry. Yet aspects of reuse training and decision support have so far been
overlooked in the software engineering literature. We addressed this problem by pre-
senting a tool-supported approach aiming to support developers in making reuse deci-
sions, and discussed its applications in training settings. The VarMeR tool, supporting
our approach, has several features which make it attractive in the context of reuse edu-
cation. It applies intuitive abstractions of software artifacts, and allows for easy switch-
ing between abstraction levels. Furthermore, it uses visualization which employs intu-
itive visual constructs and variables, and aims to reduce cognitive load of developers
when dealing with large scale software projects by allowing for fine-tuning and infor-
mation hiding.
    In the future, we intend to explore several paths for further development of VarMeR
for educational purposes. First, we intend to develop a querying language on top of
VarMeR, in order to retrieve the most relevant artifacts to a given development task.
Second, we intend to systematically support reuse activities. After retrieving the rele-
vant (portions of) software artifacts, we need to explore how to guide the developer in
applying the reuse recommendations. Finally, we are in the process of adapting
VarMeR in an academic software engineering course. Empirical studies with the stu-
dents are planned to evaluate the benefits and limitations of the tool and further improve
the tool. Our long term vision is for VarMeR to be fully integrated in standard devel-
opment environments to promote reuse thinking as an integral part of development.
Acknowledgment. The authors thank Jonathan Liberman for his help in the implemen-
tation of the VarMeR tool. We also thank Alex Kogan and Asaf Mor for their assistance
in the initial steps of the development. The first author was supported by the Israel
Science Foundation under grant agreement 817/15.

References

[1] Agresti, W.W. (2011). Software reuse: developers’ experiences and perceptions. Journal of
    Software Engineering and Applications, 4 (1), pp. 48-58.
[2] Anisa, S. (2015). Do Developers Make Unbiased Decisions? The Effect of Mindfulness and
    Not-Invented-Here Bias on the Adoption of Software Components. ECIS’2015.
[3] Diehl, S. (2007). Software visualization: visualizing the structure, behaviour, and evolution
    of software. Springer Science & Business Media, 2007.
[4] Favaro, J. (1991). What price reusability?: a case study. ACM SIGAda Ada Letters, 11
    (3), ACM, pp. 115-124.


                                               33
               5th Symposium on Conceptual Modelling Education (SCME 2017)


[5] Frakes, W.B. and Fox C.J. (1995). Sixteen questions about software reuse. Communications
     of the ACM, 38 (6): 75-ff.
[6] Frakes, W.B. and Kang, K. (2005). Software reuse research: Status and future. IEEE trans-
     actions on Software Engineering, 31 (7), pp. 529-536.
[7] Kashyap, V. and Sheth, A., (1996). Semantic and schematic similarities between database
     objects: a context-based approach. VLDB Journal, 5(4), pp. 276-304.
[8] Krinke, J. (2001). Identifying Similar Code with Program Dependence Graphs. 8th Working
     Conference on Reverse Engineering, pp. 301-309.
[9] Krueger, C. W. (1992). Software reuse. ACM Computing Surveys (CSUR) 24 (2), 131-183.
[10] Landauer, T. K., Foltz, P. W. and Laham, D. (1998). Introduction to Latent Semantic Anal-
     ysis. Discourse Processes 25, pp. 259-284.
[11] McMillan, C., Grechanik, M. and Poshyvanyk, D. (2012). Detecting similar software appli-
     cations. 34th International Conference on Software Engineering (ICSE’2012), pp. 364-374.
[12] Mellarkod, V., Appan, R., Jones, D. R., & Sherif, K. (2007). A multi-level analysis of factors
     affecting software developers’ intention to reuse software assets: An empirical investiga-
     tion. Information & Management, 44(7), 613-625.
[13] Mihalcea, R., Corley, C., and Strapparava, C. (2006). Corpus-based and knowledge-based
     measures of text semantic similarity. American Association for Artificial Intelligence
     (AAAI’06), pp. 775-780.
[14] Mohagheghi, P., and Conradi, R. (2007). Quality, productivity and economic benefits of
     software reuse: a review of industrial studies. Empirical Software Engineering 12(5), 471-
     516.
[15] Petre, M., A. F. Blackwell, and T. R. G. Green. (1998) Cognitive questions in software
     visualization. Software visualization: Programming as a multimedia experience, 453-480.
[16] Reinhartz-Berger, I., Zamansky, A., & Wand, Y. (2016). An Ontological Approach for Iden-
     tifying Software Variants: Specialization and Template Instantiation. 35th International
     Conference on Conceptual Modeling (ER’2016), pp. 98-112.
[17] Reinhartz-Berger, I., Zamansky, A., and Kemelman, M. (2015). Analyzing Variability of
     Cloned Artifacts: Formal Framework and Its Application to Requirements. Enterprise, Busi-
     ness-Process and Information Systems Modeling, EMMSAD’2015, pp. 311-325.
[18] Reinhartz-Berger, I., Zamansky, A., and Wand, Y. (2015). Taming Software Variability:
     Ontological Foundations of Variability Mechanisms. 34th International Conference on Con-
     ceptual Modeling (ER'2015), LNCS 9381, pp. 399-406.
[19] Reinhartz-Berger, I. Zamansky, A. (2017). VarMeR - A Variability Mechanisms Recom-
     mender for Software Artifacts. CAiSE-Forum-DC2017, 57-64.
[20] Sojer, M. and Henkel, J. (2010). Code reuse in open source software development: Quanti-
     tative evidence, drivers, and impediments. Journal of the Association for Information Sys-
     tems, 11 (12), 868-901.


                                                34