<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Antonín Procházka</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mircea Lungu</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Karel Richta</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Charles University in Prague</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Czech Technical University in Prague</institution>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Bern</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2012</year>
      </pub-date>
      <fpage>135</fpage>
      <lpage>142</lpage>
      <abstract>
        <p>Understanding the code legacy of a software ecosystem is critical both for the organization that owns the ecosystem and for the individual developers who work on particular systems within it. Model driven development (MDD) and model driven architecture (MDA) techniques for describing inter-project dependencies are rarely used, or the models are not kept up to date as the software evolves. Describing the dependencies by hand is a painful and error-prone process. An alternative is to recover the dependencies through reverse engineering, and several technologies for this already exist. One of them is Ecco, a model of inter-project dependencies with a set of methods for recovering the dependencies from Smalltalk-based software ecosystems, developed by Lungu et al. The aim of our research is to apply this model and its methods to Java-based software ecosystems.</p>
      </abstract>
      <kwd-group>
        <kwd>Model Driven Development</kwd>
        <kwd>Software Ecosystems</kwd>
        <kwd>Inter-Project Dependencies</kwd>
        <kwd>Java</kwd>
        <kwd>Reverse Engineering</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        a super-repository as a collection of all the version control repositories for
multiple software projects [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Looking at software from the point of view of software ecosystems uncovers
a wide range of important information that helps managers manage their
teams and projects and also helps individual developers better understand
their work. Analysis of software at the abstraction level of software ecosystems
can focus either on the projects or on the developers in the ecosystem. Our
work currently focuses on projects and their relationships inside a software
ecosystem. We extend the previous work of Lungu et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] on recovering
inter-project dependencies in Smalltalk ecosystems. In their work, they argued for
the importance of raising the level of abstraction at which software is viewed, from individual
projects to whole software ecosystems. They presented several viewpoints at
this abstraction level, including the inter-project dependency viewpoint. Each
viewpoint, including this one, offers two areas of research. The first is visualization:
having interesting information is not enough; we also need to know how to
present it to the user. The second is information retrieval: before we can
present any information, we need to obtain it from some source using some technique.
We first focus on retrieving inter-project dependency information from Java-based software
ecosystems.
      </p>
      <p>The structure of this paper is as follows. In Section 2 we describe the model used
to store the retrieved information. Section 3 summarizes information about
inter-project dependencies specific to Java-based software ecosystems. The evaluation
of different methods for dependency information retrieval is described in Section
4. In Section 5 we discuss the contribution of this work and outline further
research to be performed on this topic.</p>
    </sec>
    <sec id="sec-2">
      <title>Ecco model</title>
      <p>In their work, Lungu et al. presented Ecco, a lightweight model describing inter-project
dependencies. They defined the model and populated it with information
about the inter-project dependencies present in selected Smalltalk-based software
ecosystems.</p>
      <p>The Ecco model consists of four main elements.</p>
      <p>Ecosystem. In the Ecco model, an ecosystem is a set of software
projects and the dependencies between them.</p>
      <p>Project. Every software ecosystem consists of one or more projects. The modules
of each project call some methods and define others. A project can call
a method that is defined in another project; such methods are called
requirements.</p>
      <p>Dependency. When one project requires a method and another project defines
it, we call this relationship a dependency. A dependency consists of a
client project, which requires the methods, and a provider project, which
provides them. The methods that make up the dependency between
two projects are called the elements of the dependency.</p>
      <p>Dependency Extraction Strategy. There are several existing techniques for
gathering information about inter-project dependencies, and others may be
defined in the future. Such techniques are called dependency extraction
strategies. We include them in the model so that we can compare them during
our research.</p>
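      <p>To make the four elements concrete, the following Python sketch models projects, dependencies, an ecosystem, and an extraction strategy. It is a minimal illustration under our own assumptions, not the Ecco implementation of Lungu et al.: the class names, the fields defines and requires, the method dependencies(), and the alias ExtractionStrategy are hypothetical, and methods are identified simply by fully qualified name strings. A dependency is derived wherever a client's requirements intersect a provider's definitions.</p>
      <preformat>
```python
# Hypothetical sketch of the four Ecco elements; all names are illustrative,
# not taken from the Ecco tool itself.
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Project:
    name: str
    defines: set[str] = field(default_factory=set)   # methods defined by its modules
    requires: set[str] = field(default_factory=set)  # methods its modules call


@dataclass(frozen=True)
class Dependency:
    client: str                # project that requires the methods
    provider: str              # project that defines them
    elements: frozenset[str]   # methods making up the dependency


@dataclass
class Ecosystem:
    projects: list[Project] = field(default_factory=list)

    def dependencies(self) -> list[Dependency]:
        """Derive one dependency for every client/provider pair sharing methods."""
        result = []
        for client in self.projects:
            for provider in self.projects:
                if provider is client:
                    continue
                shared = client.requires.intersection(provider.defines)
                if shared:
                    result.append(Dependency(client.name, provider.name, frozenset(shared)))
        return result


# A dependency extraction strategy maps an ecosystem to the dependencies it finds.
ExtractionStrategy = Callable[[Ecosystem], list[Dependency]]
```
      </preformat>
      <p>For example, if a project app calls lib.Util.parse and a project lib defines it, dependencies() yields a single dependency with app as the client, lib as the provider, and lib.Util.parse as its only element. Treating strategies as plain functions keeps different strategies interchangeable, so they can be compared on the same ecosystem.</p>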
    </sec>
    <sec id="sec-3">
      <title>Java Dependencies</title>
      <p>In general, there are two types of dependency extraction strategies. The first
type reuses information that exists explicitly in software super-repositories. The
disadvantages of such sources are their limited availability across ecosystems and
their error-prone, time-consuming maintenance. On the other hand, this source is
very important for our research because it tells us what results to expect while
developing strategies of the second type.</p>
      <p>The second type is based on reverse engineering of source code. In contrast to
the first type, it can be used on any kind of super-repository and needs
no maintenance at all. However, retrieving the information this way is harder.</p>
      <sec id="sec-3-1">
        <title>Project Object Model</title>
        <p>If we want to find a reverse-engineering strategy for recovering inter-project
dependencies in Java-based software ecosystems, we first need a proper
source of data: a super-repository that provides both
the explicit dependency data and the source code to reverse-engineer.</p>
        <p>
          Looking for such a super-repository, we found that Apache Maven best suits our
needs. Maven is a project-centric tool for software development. Its data structures
contain a variety of information about each project, which lets it manage the project's
build, reporting, and documentation. All of Maven rests on a technology called the
Project Object Model (POM) [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Every project has its own
POM file, an XML file containing all the information relevant to the project,
such as the developers working on it, the location of its sources, the required binaries,
the build settings, the documentation manager, the bug tracking system, and much
more. It also includes explicit information about inter-project dependencies.
This information has to be assembled from four inter-project relationships
described in the POM: dependencies, exclusions, inheritance, and aggregation.
There is also a file called the Super POM, which defines values common to all projects
in the Maven repository unless they are redefined. A simple POM with one
dependency can look like this:
          <preformat>&lt;project&gt;
  &lt;modelVersion&gt;4.0.0&lt;/modelVersion&gt;
  &lt;!-- coordinates of this (client) project --&gt;
  &lt;groupId&gt;cz.cvut.fit.swing&lt;/groupId&gt;
  &lt;artifactId&gt;my-project&lt;/artifactId&gt;
  &lt;version&gt;1.0&lt;/version&gt;
  &lt;dependencies&gt;
    &lt;!-- provider project, needed only to compile and run the tests --&gt;
    &lt;dependency&gt;
      &lt;groupId&gt;junit&lt;/groupId&gt;
      &lt;artifactId&gt;junit&lt;/artifactId&gt;
      &lt;version&gt;4.0&lt;/version&gt;
      &lt;type&gt;jar&lt;/type&gt;
      &lt;scope&gt;test&lt;/scope&gt;
      &lt;optional&gt;true&lt;/optional&gt;
    &lt;/dependency&gt;
  &lt;/dependencies&gt;
&lt;/project&gt;</preformat>
        </p>
        <p>Dependencies. If one project depends directly on another, this information
is described in the dependencies section. This section is located in the POM file of the
project that requires the dependencies, which is the client project from Ecco's
point of view. Dependencies can also be transitive: if a client project A requires
a project B, which in turn requires a provider project C, then C becomes a requirement
of both A and B. Dependencies are divided into five scopes:
        </p>
        <p>A Compile Scope is the default scope. It represents regular projects
that are available with their source code and are necessary for a successful build
of the client project. Compile Scope dependencies are transitive.</p>
        <p>A Provided Scope represents precompiled projects that are needed for compilation
but are expected to be supplied by the Software Development Kit (SDK), a container, or some other
means. Provided Scope dependencies are not transitive.</p>
        <p>A Runtime Scope represents projects that are not needed for compilation but are
needed at runtime. Runtime Scope dependencies are transitive.</p>
        <p>A Test Scope is like the Compile Scope but represents projects needed only for
compiling and running tests. Test Scope dependencies are not transitive.</p>
        <p>A System Scope is similar to the Provided Scope, but the developer has to
supply the dependency explicitly instead of having it looked up in a repository.
Like Provided Scope dependencies, System Scope dependencies are not transitive.</p>
        <p>Since we examine only projects contained in a given ecosystem, we are
interested only in Compile Scope dependencies. We may also be
interested in Test Scope dependencies if we extend our analysis to projects
used for testing purposes.</p>
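        <p>Selecting the Compile Scope dependencies from a POM can be sketched with Python's standard xml.etree.ElementTree module. The function name compile_dependencies is hypothetical, and the sketch reads only the direct dependencies section of a single POM: unlike Maven itself, it does not resolve parent POMs, the Super POM, dependencyManagement, properties, or profiles. A dependency without a scope element is treated as compile scope, which is Maven's default.</p>
        <preformat>
```python
# Hypothetical helper; it reads a single POM and ignores parent POMs, the
# Super POM, dependencyManagement, properties and profiles.
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip a '{namespace}' prefix so POMs with or without xmlns both work."""
    return tag.rsplit("}", 1)[-1]


def _child_text(elem, name, default=None):
    for child in elem:
        if _local(child.tag) == name:
            return (child.text or "").strip()
    return default


def compile_dependencies(pom_xml, scopes=("compile",)):
    """Return (groupId, artifactId) pairs of direct dependencies in `scopes`.

    A dependency without a scope element gets Maven's default scope, compile.
    """
    root = ET.fromstring(pom_xml)
    found = []
    for section in root:  # only project/dependencies, not dependencyManagement
        if _local(section.tag) != "dependencies":
            continue
        for dep in section:
            if _local(dep.tag) != "dependency":
                continue
            if _child_text(dep, "scope", "compile") in scopes:
                found.append((_child_text(dep, "groupId"), _child_text(dep, "artifactId")))
    return found
```
        </preformat>
        <p>Applied to the example POM above, the function returns an empty list, because JUnit is declared in the Test Scope; calling it with scopes=("compile", "test") would include JUnit as well.</p>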
        <p>Exclusions. Transitive dependencies can produce unwanted behavior. If a developer
needs to exclude some project from the dependency list, she adds it to the
exclusions section of the dependency that causes the problem. The meaning of
exclusions when populating the Ecco model is straightforward: we respect
them and discard the dependencies they exclude.</p>
        <p>Inheritance. The Project Object Model makes it possible to build an inheritance tree of projects.
From the POM's point of view, this means that if something is defined in an
ancestor project's POM file, all its child projects inherit these definitions unless
they are redefined in the child projects' POM files. Two points are important for us.
First, the inheritance relationship itself represents a dependency, and we have to
treat it as such. Second, the dependencies of an ancestor client project become
dependencies of its child client projects, since the two projects are in an inheritance
relationship.</p>
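        <p>The interplay of transitive dependencies and exclusions can be illustrated with a small graph traversal. In this hypothetical sketch, the graph maps each project name to a list of (provider, excluded) pairs taken from its dependencies section, and an exclusion declared on a dependency removes the excluded project from everything reached through that dependency. Maven's version mediation and scope propagation are deliberately left out.</p>
        <preformat>
```python
# Hypothetical sketch: transitive dependencies with Maven-style exclusions.
# Version mediation and scopes are ignored on purpose.
from collections import deque


def transitive_dependencies(graph, root):
    """Return every project reachable from `root`, honoring exclusions.

    `graph` maps a project name to a list of (provider, excluded) pairs, where
    `excluded` lists projects removed from everything reached via that provider.
    """
    found = set()
    seen = set()
    queue = deque([(root, frozenset())])
    while queue:
        project, excluded = queue.popleft()
        for provider, exclusions in graph.get(project, []):
            if provider == root or provider in excluded:
                continue
            # the exclusions of this edge apply below the provider, not to it
            state = (provider, excluded.union(exclusions))
            if state not in seen:
                seen.add(state)
                found.add(provider)
                queue.append(state)
    return found
```
        </preformat>
        <p>Tracking the accumulated exclusions per visited state, rather than visiting each project once, ensures that a project excluded along one path is still found if another, unexcluded path reaches it.</p>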
        <p>Aggregation. If a project is made of modules, Maven treats the modules
as separate projects aggregated into another project called a
multi-module project. This relationship is described in the modules section of the
multi-module project's POM file. As the modules are expected to belong to the
same group as their multi-module project, they are identified only by their
names. From our point of view, the aggregation relationship represents another
way to express inter-project dependencies between the modules and the
multi-module project.</p>
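        <p>Inheritance and aggregation can likewise be turned into dependency edges for the Ecco model. The sketch below, again using xml.etree.ElementTree with hypothetical helper names, emits a child-to-parent edge for the parent element and an edge from the multi-module project to each listed module. The direction of the aggregation edge and the assumption that each module entry equals the module's artifactId are our own choices, not something the POM prescribes (a module entry is a directory path). Merging the dependency lists inherited from ancestors is not shown.</p>
        <preformat>
```python
# Hypothetical sketch: dependency edges implied by inheritance and aggregation.
import xml.etree.ElementTree as ET


def _local(tag):
    return tag.rsplit("}", 1)[-1]  # drop an optional '{namespace}' prefix


def _find(elem, name):
    for child in elem:
        if _local(child.tag) == name:
            return child
    return None


def _text(elem, name):
    child = None if elem is None else _find(elem, name)
    return None if child is None else (child.text or "").strip()


def structural_edges(pom_xml):
    """Return (client, provider, kind) edges as 'groupId:artifactId' strings."""
    root = ET.fromstring(pom_xml)
    parent = _find(root, "parent")
    # the groupId may itself be inherited from the parent
    group = _text(root, "groupId") or _text(parent, "groupId")
    me = f"{group}:{_text(root, 'artifactId')}"
    edges = []
    if parent is not None:
        # inheritance: the child project depends on its parent
        provider = f"{_text(parent, 'groupId')}:{_text(parent, 'artifactId')}"
        edges.append((me, provider, "inheritance"))
    modules = _find(root, "modules")
    if modules is not None:
        for module in modules:
            if _local(module.tag) == "module":
                # assumption: modules share the aggregator's groupId and the
                # module entry (a directory name) equals the module's artifactId
                name = (module.text or "").strip()
                edges.append((me, f"{group}:{name}", "aggregation"))
    return edges
```
        </preformat>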
      </sec>
      <sec id="sec-3-2">
        <title>Java Bytecode</title>
        <p>
          When we think about reverse engineering of Java software, we are not limited
to the Java language itself: any language that compiles to
Java bytecode can be analyzed the same way. The declarations of the original source
can be recovered by disassembling the bytecode [
          <xref ref-type="bibr" rid="ref6">6</xref>
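          ]. Consider a simple class DocFooter written in the Java language, which extends
java.applet.Applet, declares two String fields, date and email, and defines the methods init() and paint(Graphics).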
        </p>
        <p>If we run javap DocFooter to disassemble DocFooter.class, we get the following
output:
<preformat>Compiled from "DocFooter.java"
public class DocFooter extends java.applet.Applet {
  java.lang.String date;
  java.lang.String email;
  public DocFooter();
  public void init();
  public void paint(java.awt.Graphics);
}</preformat></p>
        <p>This declaration lists the fully qualified names of the class and of the methods
it defines. Passing additional options, such as -c, also disassembles the method bodies,
whose invoke instructions reference the fully qualified names of the methods the class calls.</p>
        <p>Our reverse-engineering dependency extraction strategies will work as
follows. First, we take a Java Archive (JAR); Java projects are distributed as
Java Archives. The archive is a regular compressed package containing
class files, each of which holds the bytecode of one Java class. We open
the archive, disassemble every class file, and see which methods are called and
which are defined. We then fill this information into the Ecco model. Information
gathered this way needs further processing before we get reliable results.
This post-processing is a topic of our further research.</p>
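        <p>The strategy above can be sketched without javap by reading class files directly. The following Python code opens a JAR with the standard zipfile module and parses each class file's constant pool and method table as laid out in the class file format of the Java Virtual Machine Specification: methods in the method table count as defined, and Methodref and InterfaceMethodref constants count as called. The function names parse_class and scan_jar are hypothetical, and the output is raw input for the post-processing mentioned above. For example, a call to an inherited method is recorded under the class named in the constant pool rather than the class that actually declares the method, and modified UTF-8 is decoded only approximately.</p>
        <preformat>
```python
# Hypothetical sketch: names such as parse_class and scan_jar are illustrative.
import struct
import zipfile

# Payload sizes (bytes after the 1-byte tag) of fixed-length constant-pool
# entries, following the JVM specification; Utf8 (tag 1) is variable-length.
_CP_SIZES = {3: 4, 4: 4, 5: 8, 6: 8, 7: 2, 8: 2, 9: 4, 10: 4, 11: 4,
             12: 4, 15: 3, 16: 2, 17: 4, 18: 4, 19: 2, 20: 2}


def _skip_attributes(data, pos, count):
    """Skip `count` attribute_info structures (u2 name, u4 length, payload)."""
    for _ in range(count):
        (length,) = struct.unpack_from(">I", data, pos + 2)
        pos += 6 + length
    return pos


def parse_class(data):
    """Return (defined, called) method names of one class file.

    Names have the form 'pkg.Owner.name(descriptor)', e.g. 'Bar.go()V'.
    """
    if data[:4] != b"\xca\xfe\xba\xbe":
        raise ValueError("not a class file")
    (count,) = struct.unpack_from(">H", data, 8)
    pool = [None] * count  # index 0 is unused, as in the JVM specification
    pos, i = 10, 1
    while count > i:
        tag = data[pos]
        if tag == 1:  # CONSTANT_Utf8: u2 length followed by the bytes
            (length,) = struct.unpack_from(">H", data, pos + 1)
            pool[i] = (1, data[pos + 3:pos + 3 + length].decode("utf-8", "replace"))
            pos += 3 + length
        else:
            size = _CP_SIZES[tag]
            pool[i] = (tag, data[pos + 1:pos + 1 + size])
            pos += 1 + size
        i += 2 if tag in (5, 6) else 1  # Long and Double take two slots

    def class_name(index):
        (name_index,) = struct.unpack(">H", pool[index][1])
        return pool[name_index][1].replace("/", ".")

    called = set()
    for entry in pool:
        if entry is not None and entry[0] in (10, 11):  # (Interface)Methodref
            owner, name_and_type = struct.unpack(">HH", entry[1])
            name, descriptor = struct.unpack(">HH", pool[name_and_type][1])
            called.add(f"{class_name(owner)}.{pool[name][1]}{pool[descriptor][1]}")

    # access_flags, this_class, super_class, interfaces_count, interfaces[]
    _, this_index, _, n_interfaces = struct.unpack_from(">HHHH", data, pos)
    pos += 8 + 2 * n_interfaces
    owner_name = class_name(this_index)

    (n_fields,) = struct.unpack_from(">H", data, pos)
    pos += 2
    for _ in range(n_fields):  # field_info: u2 x3, u2 attributes_count, attributes
        (n_attrs,) = struct.unpack_from(">H", data, pos + 6)
        pos = _skip_attributes(data, pos + 8, n_attrs)

    defined = set()
    (n_methods,) = struct.unpack_from(">H", data, pos)
    pos += 2
    for _ in range(n_methods):  # method_info has the same layout as field_info
        _, name, descriptor, n_attrs = struct.unpack_from(">HHHH", data, pos)
        defined.add(f"{owner_name}.{pool[name][1]}{pool[descriptor][1]}")
        pos = _skip_attributes(data, pos + 8, n_attrs)
    return defined, called


def scan_jar(jar):
    """Collect defined methods and external calls over all classes in a JAR.

    `jar` is a path or a binary file object. Calls to methods defined in the
    same archive are dropped, leaving candidate requirements for Ecco.
    """
    defined, called = set(), set()
    with zipfile.ZipFile(jar) as archive:
        for entry in archive.namelist():
            if entry.endswith(".class"):
                d, c = parse_class(archive.read(entry))
                defined.update(d)
                called.update(c)
    return defined, called.difference(defined)
```
        </preformat>
        <p>The two sets returned by scan_jar correspond to the defines and requires sides of a project in the Ecco model; matching one archive's requirements against other archives' definitions yields candidate inter-project dependencies.</p>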
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Evaluation of Results</title>
      <p>
        To compare different inter-project dependency retrieval techniques, we need
a measuring method that assigns a value to each technique. For this
purpose we use the well-known information retrieval metrics precision, recall,
and F-measure [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], as adapted for our case by Lungu et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. To use them, we
first need a gold standard, or oracle: the dependency information we retrieve
from Maven's POM files. This information lets us distinguish
relevant dependencies, which are present in the oracle, from nonrelevant ones, which
are not. Independently, we can divide the dependencies into
those that have and those that have not been retrieved by a particular reverse-engineering
technique. Combining the two divisions gives four sets of dependencies, shown in Table 1:
retrieved relevant dependencies are true positives (TP), retrieved nonrelevant ones are false
positives (FP), relevant dependencies that were not retrieved are false negatives (FN), and the
remaining ones are true negatives (TN).
      </p>
      <p>The metrics are then defined as follows. Precision (P) is the fraction of
retrieved dependencies that are relevant. Recall (R) is the fraction of relevant
dependencies that are retrieved. The F-measure (F) is the weighted harmonic mean
of precision and recall; as a single measure that trades
off precision against recall, it indicates the overall accuracy of the
measured technique.</p>
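      <disp-formula>
        <tex-math>P = \frac{|TP|}{|TP \cup FP|}</tex-math>
      </disp-formula>
      <disp-formula>
        <tex-math>R = \frac{|TP|}{|TP \cup FN|}</tex-math>
      </disp-formula>
      <disp-formula>
        <tex-math>F_1 = \frac{2PR}{P + R}</tex-math>
      </disp-formula>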
      <p>We use the default balanced F-measure (F<sub>1</sub>), which weights precision
and recall equally, because we want to emphasize neither of them.</p>
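      <p>The three formulas translate directly into code. This sketch assumes that each dependency is represented as a hypothetical (client, provider) pair and that the oracle is the set of dependencies recovered from the POM files. Since the retrieved set is the union of TP and FP, and the oracle is the union of TP and FN, their sizes give the two denominators. The function name evaluate is our own.</p>
      <preformat>
```python
# Hypothetical helper; dependencies are assumed to be (client, provider) pairs.
def evaluate(retrieved: set, oracle: set) -> tuple[float, float, float]:
    """Return (precision, recall, F1) of a technique against the oracle.

    `oracle` holds the relevant dependencies recovered from the POM files,
    `retrieved` those found by the reverse-engineering technique.
    """
    tp = len(retrieved.intersection(oracle))  # |TP|
    # retrieved is TP union FP; oracle is TP union FN
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(oracle) if oracle else 0.0
    # balanced F-measure: harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```
      </preformat>
      <p>For example, if the oracle contains four dependencies and a technique retrieves three, two of which are in the oracle, then P = 2/3, R = 1/2, and F<sub>1</sub> = 4/7, roughly 0.57.</p>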
      <p>During the evaluation of our reverse-engineering techniques, we will calculate these
values for each technique and compare them. This comparison will tell us how
effective each technique is.</p>
      <p>The information summarized in this paper gives us a good basis for further
research on reverse-engineering techniques for retrieving
inter-project dependencies in Java-based software ecosystems. Maven provides a rich
source of data that will help us develop these techniques. Using
the explicitly declared dependency information and the metrics described above,
we can compare the techniques and tell which one best suits
our needs. We have also found a way to retrieve dependencies from any
language that can be compiled to Java bytecode. Combined with the
work of Lungu et al. on Smalltalk-based software ecosystems, this will
also allow us to summarize the differences between dependency retrieval from statically
and dynamically typed languages.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>We would like to thank the Student Grant Competition of CTU in Prague for financial
support, grant number SGS12/093/OHK3/1T/18.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Apache</surname>
          </string-name>
          . Maven project,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Lungu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>Reverse Engineering Software Ecosystems</article-title>
          .
          <source>PhD thesis</source>
          , University of Lugano,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Lungu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lanza</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Girba</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Heeck</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <article-title>Reverse engineering super-repositories</article-title>
          . In
          <source>Proceedings of the 14th Working Conference on Reverse Engineering</source>
          (Washington, DC, USA,
          <year>2007</year>
          ), IEEE Computer Society, pp.
          <fpage>120</fpage>
          –
          <lpage>129</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Lungu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Robbes</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Lanza</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>Recovering inter-project dependencies in software ecosystems</article-title>
          .
          In
          <source>Proceedings of the IEEE/ACM International Conference on Automated Software Engineering</source>
          (ASE '10) (New York, NY, USA,
          <year>2010</year>
          ), ACM, pp.
          <fpage>309</fpage>
          –
          <lpage>312</lpage>
          . ACM ID: 1859058.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raghavan</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Schütze</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          . Introduction to Information Retrieval. Cambridge University Press, New York, NY, USA,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Oracle</surname>
          </string-name>
          .
          <article-title>Java SE documentation</article-title>
          ,
          <year>February 2010</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>