=Paper=
{{Paper
|id=None
|storemode=property
|title=Inter-Project Dependencies in Java Software Ecosystems
|pdfUrl=https://ceur-ws.org/Vol-837/poster17.pdf
|volume=Vol-837
|dblpUrl=https://dblp.org/rec/conf/dateso/ProchazkaLR12
}}
==Inter-Project Dependencies in Java Software Ecosystems==
<pdf width="1500px">https://ceur-ws.org/Vol-837/poster17.pdf</pdf>
<pre>
                         Inter-Project Dependencies
                        in Java Software Ecosystems

              Antonín Procházka¹, Mircea Lungu², Karel Richta³

                  ¹ Czech Technical University in Prague
                  ² University of Bern
                  ³ Charles University in Prague


             Abstract     Understanding the legacy of code in a software ecosystem
             is critical both for the organization that owns the ecosystem and
             for the individual developers who work on particular systems in
             the ecosystem. Model driven development (MDD) and model driven
             architecture (MDA) techniques for describing inter-project dependencies
             are rarely used, or their descriptions are not kept up to date as the
             software evolves. Describing the dependencies by hand can be a painful
             and error-prone process. Another solution is to recover the dependencies
             through a reverse-engineering process, and some technologies for this
             already exist. One of them is Ecco, a model of inter-project dependencies
             with a set of methods for recovering the dependencies from Smalltalk-based
             software ecosystems, developed by Lungu et al. The aim of our research
             is to apply this model and its methods to Java-based software ecosystems.


   Keywords
   Model Driven Development, Software Ecosystems, Inter-Project Dependencies,
   Java, Reverse Engineering


   1        Introduction
   Software engineering nowadays concentrates mostly on individual projects.
   We have sophisticated methods and tools for project management, version
   management, refactoring, testing, deployment and so on. But projects are rarely
   developed in isolation. They coexist, evolve together and benefit from
   each other. We call such systems of projects software ecosystems. Like other
   computing terms, the term ecosystem comes from biology. In nature
   we define an ecosystem as the complex of a community of organisms and its
   environment functioning as an ecological unit¹. In the context of software engineering,
   an ecosystem is defined as a collection of software projects that are developed
   and co-evolve in the same environment [2]. Examples of such software ecosystems
   are a company developing software, an open-source community or a research
   group. As every project is located in its own version control repository, we define
   a super-repository as a collection of all the version control repositories of
   multiple software projects [3].

   ¹ Webster's Dictionary definition.


J. Pokorný, V. Snášel, K. Richta (Eds.): Dateso 2012, pp. 135–142, ISBN 978-80-7378-171-2.
136       Antonín Procházka, Mircea Lungu, Karel Richta


      Looking at software from the point of view of software ecosystems uncovers
a wide range of important information that helps managers manage their
teams and projects and also helps individual developers better understand
their work. Analysis of software at the abstraction level of software ecosystems
can focus either on the projects or on the developers in the ecosystem. Our
work currently focuses on projects and their relationships inside a software
ecosystem. We extend the previous work of Lungu et al. [4], which focused on recovering
inter-project dependencies in Smalltalk ecosystems. In that work they argued for
the importance of raising the level of abstraction at which software is viewed, from
individual projects to whole software ecosystems. They presented several viewpoints at
this abstraction level, including the inter-project dependency viewpoint. Each
viewpoint, including this one, opens two areas of research. The first is visualization:
having interesting information is not enough, we also need to know how to
present it to the user. The second is information retrieval: before we can
present any information, we need to obtain it from some source using some technique.
We first focus on retrieving inter-project dependency information from Java-based
software ecosystems.
      The structure of this paper is as follows. In Section 2 we describe the model
used to store the retrieved information. Section 3 summarizes information about
inter-project dependencies specific to Java-based software ecosystems. The evaluation
of different methods for dependency information retrieval is described in Section 4.
In Section 5 we discuss the contribution of this work and outline our further
research on this topic.


2      Ecco model
Lungu et al. presented a lightweight model of inter-project dependencies
called Ecco. They defined the model and populated it with information
about the inter-project dependencies present in selected Smalltalk-based software
ecosystems.
      The Ecco model consists of four main elements.


Ecosystem. In the Ecco model, an ecosystem is a set of software
      projects and the dependencies between them.
Project. Every software ecosystem consists of one or more projects. The modules
      of each project call some methods and define others. A project can call
      a method that is defined in another project. Such methods are called
      requirements.
Dependency. When one project requires a method and another project defines
      it, we call this relationship a dependency. The dependency consists of a
      client project, which requires the methods, and a provider project, which
      provides the required methods. The methods that make up the dependency between
      two projects are called the elements of the dependency.
                     Inter-Project Dependencies in Java Software Ecosystems   137


Fig. 1. Ecco is a very lightweight model aimed at extracting dependencies between
projects in an ecosystem [4]


Dependency Extraction Strategy. There are several existing techniques for
    gathering information about inter-project dependencies, and others can be
    defined in the future. Such techniques are called dependency extraction
    strategies. We include them in the model so that we can compare them during
    our research.
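To make the four elements concrete, here is a minimal Java sketch of the model (Java 16 or later, for records). All type and method names (Project, Dependency, Ecosystem, DependencyExtractionStrategy, RequirementMatching) are hypothetical and are not taken from the original Ecco implementation; methods are represented as plain strings for simplicity. The strategy shown is a naive one that matches every requirement of a client against the methods defined by the other projects.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical names throughout; this is not the original Ecco implementation.
final class EccoSketch {

    // A project: the methods it defines and the methods it requires (calls).
    record Project(String name, Set<String> defines, Set<String> requires) {}

    // A dependency: client project, provider project and the elements (methods) linking them.
    record Dependency(String client, String provider, Set<String> elements) {}

    // An ecosystem: here simply the list of its projects.
    record Ecosystem(List<Project> projects) {}

    // A dependency extraction strategy derives dependencies from an ecosystem.
    interface DependencyExtractionStrategy {
        List<Dependency> extract(Ecosystem ecosystem);
    }

    // Naive strategy: match each client's requirements against the definitions of other projects.
    static final class RequirementMatching implements DependencyExtractionStrategy {
        @Override
        public List<Dependency> extract(Ecosystem ecosystem) {
            List<Dependency> result = new ArrayList<>();
            for (Project client : ecosystem.projects()) {
                for (Project provider : ecosystem.projects()) {
                    if (client == provider) {
                        continue;
                    }
                    // The elements are the required methods that this provider defines.
                    Set<String> elements = new HashSet<>(client.requires());
                    elements.retainAll(provider.defines());
                    if (!elements.isEmpty()) {
                        result.add(new Dependency(client.name(), provider.name(), elements));
                    }
                }
            }
            return result;
        }
    }

    public static void main(String[] args) {
        Project lib = new Project("lib", Set.of("Util.log"), Set.of());
        Project app = new Project("app", Set.of("App.main"), Set.of("Util.log", "String.length"));
        DependencyExtractionStrategy strategy = new RequirementMatching();
        // "String.length" is defined outside the ecosystem, so it yields no dependency.
        System.out.println(strategy.extract(new Ecosystem(List.of(lib, app))));
    }
}
```

Requirements that no project in the ecosystem defines simply produce no dependency, so only relationships inside the ecosystem are recorded.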


3    Java Dependencies

In general there are two types of dependency extraction strategies. The first
type reuses information that exists explicitly in software super-repositories. The
disadvantages of such sources are their limited availability across ecosystems and
their error-prone and time-consuming maintenance. On the other hand, this source is
very important during research because it tells us what results to expect during
the evaluation of the second type of dependency extraction strategies.

    The second type is based on reverse-engineering the source code. In contrast to
the first type, it can be used on any kind of super-repository and does not
need any maintenance at all. However, it is harder to retrieve the information
this way.


3.1     Project Object Model


To find a reverse-engineering strategy for recovering inter-project
dependencies in Java-based software ecosystems, we first need a proper
source of data: a super-repository that provides both the explicit data
and the source code to be reverse-engineered.
      Looking for such a super-repository, we found that Apache Maven best suits our
needs. Maven is a project-centric tool for software development. Its data structures
contain various information about each project, which enables it to manage the
project's build, reporting and documentation. The whole of Maven is built on a
technology called the Project Object Model (POM) [1]. Every project has its own
so-called POM file, an XML file containing all the information relevant to the
project, such as the developers working on it, the path of its sources, the required
binaries, the build settings, the documentation, the bug tracking system and much
more. It also includes explicit information about inter-project dependencies.
This information has to be assembled from four inter-project relationships
described in the POM: dependencies, exclusions, inheritance and aggregation.
There is also a file called the Super POM, which defines default values common to
all projects unless they are redefined. A simple POM with one dependency can
look like this:


<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>cz.cvut.fit.swing</groupId>
  <artifactId>my-project</artifactId>
  <version>1.0</version>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.0</version>
      <type>jar</type>
      <scope>test</scope>
      <optional>true</optional>
    </dependency>
  </dependencies>
</project>
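The explicit dependency information could be read from such a file with the JDK's built-in DOM parser (javax.xml.parsers). The following is a sketch under simplifying assumptions (Java 16 or later): it reads only the dependency entries directly under the project's dependencies section, ignores dependencyManagement and plugin dependencies, treats a missing scope as compile (Maven's default), and does not resolve inherited versions or property placeholders. The class and record names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

final class PomDependencies {

    // One direct dependency; scope defaults to "compile" when the POM omits it.
    record Dep(String groupId, String artifactId, String version, String scope) {}

    // Direct child elements of parent with the given tag name.
    static List<Element> children(Element parent, String name) {
        List<Element> out = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element e && e.getTagName().equals(name)) {
                out.add(e);
            }
        }
        return out;
    }

    // Trimmed text of the first child element with the given name, or null if absent.
    static String text(Element parent, String name) {
        List<Element> found = children(parent, name);
        return found.isEmpty() ? null : found.get(0).getTextContent().trim();
    }

    // Reads project/dependencies/dependency only (not dependencyManagement or plugins).
    static List<Dep> parse(String pomXml) {
        Element project;
        try {
            project = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(pomXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse POM", e);
        }
        List<Dep> deps = new ArrayList<>();
        for (Element section : children(project, "dependencies")) {
            for (Element d : children(section, "dependency")) {
                String scope = text(d, "scope");
                deps.add(new Dep(text(d, "groupId"), text(d, "artifactId"),
                        text(d, "version"), scope == null ? "compile" : scope));
            }
        }
        return deps;
    }

    public static void main(String[] args) {
        String pom = """
                <project>
                  <modelVersion>4.0.0</modelVersion>
                  <groupId>cz.cvut.fit.swing</groupId>
                  <artifactId>my-project</artifactId>
                  <version>1.0</version>
                  <dependencies>
                    <dependency>
                      <groupId>junit</groupId>
                      <artifactId>junit</artifactId>
                      <version>4.0</version>
                      <scope>test</scope>
                    </dependency>
                  </dependencies>
                </project>
                """;
        for (Dep dep : parse(pom)) {
            System.out.println(dep);
        }
    }
}
```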


Dependencies. If one project depends directly on another, this information
is described in the dependencies section of the POM file of the project that
requires the dependency, i.e. the Client Project from Ecco's point of view.
Dependencies can also be transitive: if a client project A requires a project B,
which in turn requires a provider project C, then C becomes a requirement of
both A and B. Dependencies are divided into five scopes:


   The Compile Scope is the default scope. It represents projects that are
necessary for a successful build of the Client Project and are available on all
of its classpaths. Compile Scope dependencies are transitive.

   The Provided Scope is much like the Compile Scope, but represents projects
expected to be supplied at runtime by the Java Development Kit (JDK), a container
or some other means. Provided Scope dependencies are not transitive.

   The Runtime Scope represents projects that are not needed for compilation
but are needed for execution. Runtime Scope dependencies are transitive as well.

   The Test Scope represents projects needed only for compiling and running
tests. Test Scope dependencies are not transitive.

   The System Scope is similar to the Provided Scope, but requires the developer
to provide the dependency explicitly. Like Provided Scope dependencies, System
Scope dependencies are not transitive.
   As we examine only projects contained in a given ecosystem, we are
interested only in the Compile Scope dependencies. We may also become
interested in the Test Scope dependencies if we extend our analysis to projects
used for testing purposes.
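Maven's documentation of its dependency mechanism summarizes in a table which scope a transitive dependency receives, given the scope under which A declares B and the scope under which B declares C. The sketch below encodes that table as we understand it (Java 14 or later, for the switch expression); the names are hypothetical, and treating a system-scoped direct dependency as propagating nothing is our assumption, since the table does not cover the System Scope.

```java
import java.util.Optional;

final class ScopeMediation {

    enum Scope { COMPILE, PROVIDED, RUNTIME, TEST, SYSTEM }

    // Scope that C gets in project A, where A declares B with scope `direct`
    // and B declares C with scope `transitive`. Empty means C is not propagated to A.
    static Optional<Scope> effectiveScope(Scope direct, Scope transitive) {
        // Only B's compile and runtime dependencies propagate to A.
        if (transitive != Scope.COMPILE && transitive != Scope.RUNTIME) {
            return Optional.empty();
        }
        return switch (direct) {
            case COMPILE -> Optional.of(transitive);   // compile keeps C's own scope
            case PROVIDED -> Optional.of(Scope.PROVIDED);
            case RUNTIME -> Optional.of(Scope.RUNTIME);
            case TEST -> Optional.of(Scope.TEST);
            case SYSTEM -> Optional.empty();           // assumption: not covered by Maven's table
        };
    }

    public static void main(String[] args) {
        // Rows: scope of the direct dependency; columns: scope of the transitive one.
        System.out.printf("%-10s", "");
        for (Scope t : Scope.values()) {
            System.out.printf("%-10s", t);
        }
        System.out.println();
        for (Scope d : Scope.values()) {
            System.out.printf("%-10s", d);
            for (Scope t : Scope.values()) {
                System.out.printf("%-10s", effectiveScope(d, t).map(Scope::name).orElse("-"));
            }
            System.out.println();
        }
    }
}
```

For populating Ecco with Compile Scope dependencies only, a transitive dependency would be kept exactly when this function returns COMPILE.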


Exclusions. Transitive dependencies can produce unwanted behavior. If a developer
needs to exclude some project from the dependency list, she adds it to the
exclusions section of the dependency that causes the problem. The meaning of
exclusions when populating the Ecco model is obvious: we should respect them
and discard the dependencies they exclude.
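How exclusions prune the transitive dependencies could be sketched as a breadth-first traversal over Compile Scope dependencies. The sketch assumes a hypothetical in-memory graph keyed by project name, in which each edge carries the exclusions declared on it; an exclusion removes the excluded project only from the subtree below that edge, so the project can still be reached through another path. Maven's real resolution (version mediation, wildcard exclusions, optional dependencies) is more involved and is not reproduced here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TransitiveClosure {

    // A direct compile-scope dependency on `target`, declaring the given exclusions.
    record Edge(String target, Set<String> exclusions) {}

    // Traversal state: a project reached so far and the exclusions active on its path.
    private record Step(String project, Set<String> excluded) {}

    // Breadth-first resolution of root's transitive dependencies. Simplification: each
    // project is expanded only the first (nearest) time it is reached.
    static Set<String> resolve(String root, Map<String, List<Edge>> graph) {
        Set<String> resolved = new LinkedHashSet<>();
        Deque<Step> queue = new ArrayDeque<>();
        queue.add(new Step(root, Set.of()));
        while (!queue.isEmpty()) {
            Step step = queue.poll();
            for (Edge edge : graph.getOrDefault(step.project(), List.of())) {
                String target = edge.target();
                if (target.equals(root) || step.excluded().contains(target) || !resolved.add(target)) {
                    continue;   // cycle back to root, excluded on this path, or already resolved
                }
                // Exclusions declared on this edge apply to the whole subtree below it.
                Set<String> excluded = new HashSet<>(step.excluded());
                excluded.addAll(edge.exclusions());
                queue.add(new Step(target, excluded));
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        // A -> B (excluding D), B -> C, B -> D
        Map<String, List<Edge>> graph = Map.of(
                "A", List.of(new Edge("B", Set.of("D"))),
                "B", List.of(new Edge("C", Set.of()), new Edge("D", Set.of())));
        System.out.println(resolve("A", graph));   // prints [B, C]
    }
}
```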


Inheritance. The Project Object Model provides a feature that enables us to
build an inheritance tree of projects. From the POM's point of view, this means
that if we define something in an ancestor project's POM file, all its child
projects inherit these definitions unless they are redefined in the child
projects' POM files. Two points are important for us. First, the inheritance
relationship itself represents a dependency and we have to treat it as one.
Second, dependencies of an ancestor client project become dependencies of its
child client projects, because the two projects are in an inheritance relationship.
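Both points can be illustrated with a small sketch (Java 16 or later). It uses a hypothetical Pom record whose dependencies are keyed by groupId:artifactId; a child's declaration replaces an inherited one with the same key, which simplifies Maven's actual merge rules. The inheritance edge is recorded with the child as client and the parent as provider, which is our assumption about its direction, and the parent chain is assumed to be acyclic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PomInheritance {

    // Hypothetical minimal POM: its id, its parent id (null for a root) and its
    // dependencies as "groupId:artifactId" -> version.
    record Pom(String id, String parent, Map<String, String> dependencies) {}

    // Effective dependencies of a project: ancestors first, each descendant overriding
    // entries with the same key. Assumes the parent chain is acyclic.
    static Map<String, String> effectiveDependencies(String id, Map<String, Pom> poms) {
        List<Pom> chain = new ArrayList<>();
        for (Pom p = poms.get(id); p != null; p = p.parent() == null ? null : poms.get(p.parent())) {
            chain.add(0, p);   // prepend, so the root ancestor ends up first
        }
        Map<String, String> effective = new LinkedHashMap<>();
        for (Pom p : chain) {
            effective.putAll(p.dependencies());
        }
        return effective;
    }

    // The inheritance relationship itself as dependencies: child (client) -> parent (provider).
    static List<Map.Entry<String, String>> inheritanceDependencies(Map<String, Pom> poms) {
        List<Map.Entry<String, String>> edges = new ArrayList<>();
        for (Pom p : poms.values()) {
            if (p.parent() != null) {
                edges.add(Map.entry(p.id(), p.parent()));
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        Map<String, Pom> poms = Map.of(
                "parent", new Pom("parent", null, Map.of("junit:junit", "4.0", "log:log", "1.0")),
                "child", new Pom("child", "parent", Map.of("log:log", "2.0")));
        System.out.println(effectiveDependencies("child", poms));
        System.out.println(inheritanceDependencies(poms));
    }
}
```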


Aggregation. If a project is made of modules, Maven treats the modules as
separate projects that are aggregated into another project called a
multi-module project. This relationship is described in the modules section
of the multi-module project's POM file. As the modules are expected to belong
to the same group as their multi-module project, they are referenced only by
their names. From our point of view, the aggregation relationship represents
another way to express inter-project dependencies between the modules and the
multi-module project.
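A possible sketch (Java 16 or later) reads the modules section of a multi-module POM and turns each module into an aggregation dependency. It assumes that a module's entry equals its artifactId (in Maven the entry is actually a relative path to the module's directory, which often coincides with it), that the multi-module project declares its groupId directly rather than inheriting it, and that the multi-module project is the client; all names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

final class Aggregation {

    // Direct child elements of parent with the given tag name.
    static List<Element> children(Element parent, String name) {
        List<Element> out = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element e && e.getTagName().equals(name)) {
                out.add(e);
            }
        }
        return out;
    }

    // Trimmed text of the first child element with the given name, or null if absent.
    static String text(Element parent, String name) {
        List<Element> found = children(parent, name);
        return found.isEmpty() ? null : found.get(0).getTextContent().trim();
    }

    // Aggregation dependencies (multi-module project -> module) read from a multi-module POM.
    static List<Map.Entry<String, String>> aggregationDependencies(String pomXml) {
        Element project;
        try {
            project = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(pomXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse POM", e);
        }
        String groupId = text(project, "groupId");   // assumed declared here, not inherited
        String aggregator = groupId + ":" + text(project, "artifactId");
        List<Map.Entry<String, String>> edges = new ArrayList<>();
        for (Element modules : children(project, "modules")) {
            for (Element module : children(modules, "module")) {
                // Assumption: the module entry equals the module's artifactId.
                edges.add(Map.entry(aggregator, groupId + ":" + module.getTextContent().trim()));
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        String pom = """
                <project>
                  <groupId>cz.cvut.fit</groupId>
                  <artifactId>app</artifactId>
                  <packaging>pom</packaging>
                  <modules>
                    <module>core</module>
                    <module>web</module>
                  </modules>
                </project>
                """;
        aggregationDependencies(pom).forEach(System.out::println);
    }
}
```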


3.2     Java Bytecode


When we think about reverse-engineering Java software, we are not limited
to the Java language. We can consider any language that can be compiled to
Java bytecode. The original information can easily be recovered from the
bytecode by disassembly [6]. Consider this simple class definition written in Java:

import java.awt.*;
import java.applet.*;

public class DocFooter extends Applet {
    String date;
    String email;

    public void init() {
        resize(500, 100);
        date = getParameter("LAST_UPDATED");
        email = getParameter("EMAIL");
    }

    public void paint(Graphics g) {
        g.drawString(date + " by ", 100, 15);
        g.drawString(email, 290, 15);
    }
}

      If we run javap DocFooter to disassemble DocFooter.class, we get this
output:

Compiled from DocFooter.java
public class DocFooter
    extends java.applet.Applet {
  java.lang.String date;
  java.lang.String email;
  public DocFooter();
  public void init();
  public void paint(java.awt.Graphics);
}

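      Passing the -c option additionally gives us a disassembly of the behavior,
i.e. the bytecode instructions of every method, in which each invoked method
appears together with its owner class and signature. The declarations alone
already give us the fully qualified name of every type used in the member
signatures and of every method the class defines.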
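Because the declarations are printed in a regular format, the defined methods could be collected from javap output with a couple of regular expressions. The sketch below (Java 15 or later, for the text block) is hypothetical and deliberately simple: it takes the first class or interface header as the owner, treats every later line ending in "(...);" as a method or constructor declaration, reports constructors under the JVM name <init>, and does not attempt to handle every javap output format (for example the verbose -v output).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class JavapParser {

    // The class or interface header, e.g. "public class DocFooter".
    private static final Pattern TYPE =
            Pattern.compile("\\b(?:class|interface|enum)\\s+([\\w$.]+)");
    // A member ending in "(...);", e.g. "public void paint(java.awt.Graphics);".
    private static final Pattern MEMBER =
            Pattern.compile("\\s([\\w$.]+)\\(([^)]*)\\)[^;]*;\\s*$");

    // Returns "Owner.method(parameter types)" for each declared method or constructor.
    static List<String> declaredMethods(String javapOutput) {
        String owner = null;
        List<String> methods = new ArrayList<>();
        for (String line : javapOutput.split("\\R")) {
            if (owner == null) {
                Matcher type = TYPE.matcher(line);
                if (type.find()) {
                    owner = type.group(1);
                }
                continue;
            }
            Matcher member = MEMBER.matcher(line);
            if (member.find()) {
                // javap prints constructors under the class name; the JVM calls them <init>.
                String name = member.group(1).equals(owner) ? "<init>" : member.group(1);
                methods.add(owner + "." + name + "(" + member.group(2) + ")");
            }
        }
        return methods;
    }

    public static void main(String[] args) {
        String output = """
                Compiled from DocFooter.java
                public class DocFooter
                    extends java.applet.Applet {
                  java.lang.String date;
                  java.lang.String email;
                  public DocFooter();
                  public void init();
                  public void paint(java.awt.Graphics);
                }
                """;
        declaredMethods(output).forEach(System.out::println);
    }
}
```

Field declarations such as date and email contain no parentheses, so they are skipped.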
      Our reverse-engineering dependency extraction strategies will work as
follows. First, we take a Java Archive (JAR); Java projects are typically
distributed as Java Archives. The archive is a regular compressed package
containing class files, and every class file contains the bytecode of one Java
class. We open the archive, disassemble every class file and see which methods
are called and which are defined. We fill this information into the Ecco model.
Information gathered this way needs further processing before it yields reliable
results. This post-processing is a topic of our further research.
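The first step of this strategy, enumerating the classes stored in an archive, needs only the JDK's java.util.jar API. The sketch below (Java 16 or later, for Stream.toList) lists the fully qualified class names in a JAR; skipping entries under META-INF/ (such as versioned copies in multi-release JARs) and module-info and package-info files is our own choice. Disassembling each class to find the called methods would be the next step, for example by running javap -c or by using a bytecode library such as ASM, and is not shown.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Objects;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

final class JarClasses {

    // "cz/cvut/Foo$Bar.class" -> "cz.cvut.Foo$Bar"; null for anything that is not a class file.
    static String toClassName(String entryName) {
        if (!entryName.endsWith(".class")
                || entryName.startsWith("META-INF/")        // e.g. multi-release copies
                || entryName.endsWith("module-info.class")
                || entryName.endsWith("package-info.class")) {
            return null;
        }
        return entryName.substring(0, entryName.length() - ".class".length()).replace('/', '.');
    }

    // Fully qualified names of all classes stored in the JAR at jarPath, sorted.
    static List<String> classNames(String jarPath) {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.stream()
                    .map(JarEntry::getName)
                    .map(JarClasses::toClassName)
                    .filter(Objects::nonNull)
                    .sorted()
                    .toList();   // terminal operation runs before the JAR is closed
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Usage: java JarClasses.java some-library.jar
        for (String jarPath : args) {
            classNames(jarPath).forEach(System.out::println);
        }
    }
}
```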


4      Evaluation of Results

To compare different inter-project dependency retrieval techniques, we need
a measure that assigns a value to each technique. For this purpose we use the
well-known information retrieval metrics precision, recall and F-measure [5],
as adapted to our case by Lungu et al. [4]. To use them we first need a gold
standard, or oracle. This is the information we retrieve from Maven's POM files.
Thanks to this information we are able to distinguish relevant dependencies,
which are present in the oracle, from nonrelevant ones, which are not. In
addition, we can divide the dependencies into those that have and have not been
retrieved by a particular reverse-engineering technique. Altogether we get four
different sets of dependencies, which are shown in Table 1.


Table 1. Statistical sets of retrieved inter-project dependencies [5]

                             Relevant               Nonrelevant
                             (TP ∪ FN)              (FP ∪ TN)
  Retrieved (TP ∪ FP)        True Positives (TP)    False Positives (FP)
  Not Retrieved (FN ∪ TN)    False Negatives (FN)   True Negatives (TN)


      The metrics are then defined as follows. Precision (P) is the fraction of
retrieved dependencies that are relevant. Recall (R) is the fraction of relevant
dependencies that are retrieved. The F-measure (F) is the weighted harmonic mean
of precision and recall. It is a single measure that trades off precision
against recall and thus indicates the overall accuracy of the measured technique.


                   P = \frac{|TP|}{|TP \cup FP|}

                   R = \frac{|TP|}{|TP \cup FN|}

                   F_1 = \frac{2PR}{P + R}


      We use the default balanced F-measure (F1), which weights precision and
recall equally, because we want to emphasize neither recall nor precision.

      During the evaluation of our reverse-engineering techniques we will calculate
these values for each technique and compare them. This comparison will give us
the required information about each technique's effectiveness.
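The metrics above reduce to a few set operations. In this sketch (hypothetical names, Java 16 or later), dependencies are encoded as "client->provider" strings, the oracle is the set read from the POM files and the retrieved set comes from a reverse-engineering technique. The true negatives are never needed, which is convenient because counting them would require enumerating every possible pair of projects.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class RetrievalMetrics {

    record Scores(double precision, double recall, double f1) {}

    // Dependencies are encoded as "client->provider" strings (an assumption of this sketch).
    static Scores evaluate(Set<String> retrieved, Set<String> oracle) {
        Set<String> truePositives = new HashSet<>(retrieved);
        truePositives.retainAll(oracle);
        int tp = truePositives.size();
        // |TP| / |TP u FP|: the denominator is everything that was retrieved.
        double precision = retrieved.isEmpty() ? 0.0 : (double) tp / retrieved.size();
        // |TP| / |TP u FN|: the denominator is everything in the oracle.
        double recall = oracle.isEmpty() ? 0.0 : (double) tp / oracle.size();
        double f1 = (precision + recall) == 0.0 ? 0.0 : 2 * precision * recall / (precision + recall);
        return new Scores(precision, recall, f1);
    }

    public static void main(String[] args) {
        Set<String> oracle = Set.of("a->b", "b->c", "b->d");            // e.g. read from POM files
        Set<String> retrieved = Set.of("a->b", "a->c", "b->c", "c->d");  // e.g. from bytecode
        Scores s = evaluate(retrieved, oracle);
        System.out.println(String.format(Locale.ROOT, "P=%.2f R=%.2f F1=%.2f",
                s.precision(), s.recall(), s.f1()));   // prints P=0.50 R=0.67 F1=0.57
    }
}
```

With the sample sets, two of the four retrieved dependencies are in the oracle (P = 1/2) and two of the three oracle dependencies are retrieved (R = 2/3), giving F1 = 4/7.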


5     Conclusion
The information summarized in this paper gives us an excellent base for our further
research on different reverse-engineering techniques for retrieving inter-project
dependencies in Java-based software ecosystems. We have a rich source of data that
will help us develop the techniques. Using the explicitly given dependency
information and the metrics described above, we are able to compare the techniques
and tell which one best suits our needs. We found a way that lets us retrieve the
dependencies from any language that can be compiled to Java bytecode. Together with
the work done by Lungu et al. on Smalltalk-based software ecosystems, we will also
be able to summarize the differences between dependency retrieval from statically
and dynamically typed languages.


6     Acknowledgments
We gratefully acknowledge the financial support of the Student Grant Competition of
CTU in Prague, grant number SGS12/093/OHK3/1T/18.


References
1. Apache. Maven project, 2002.
2. Lungu, M. Reverse Engineering Software Ecosystems. PhD thesis, University of
   Lugano, 2009.
3. Lungu, M., Lanza, M., Girba, T., and Heeck, R. Reverse engineering super-repositories.
   In Proceedings of the 14th Working Conference on Reverse Engineering
   (Washington, DC, USA, 2007), IEEE Computer Society, pp. 120–129.
4. Lungu, M., Robbes, R., and Lanza, M. Recovering inter-project dependencies
   in software ecosystems. In Proceedings of the IEEE/ACM International Conference
   on Automated Software Engineering (New York, NY, USA, 2010), ASE '10, ACM,
   pp. 309–312. ACM ID: 1859058.
5. Manning, C., Raghavan, P., and Schütze, H. Introduction to Information
   Retrieval. Cambridge University Press, New York, NY, USA, 2008.
6. Oracle. Java SE documentation, February 2010.

</pre>