<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Software Re-Documentation Process and Tool</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nicolas Anquetil</string-name>
          <email>anquetil@ucb.br</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kathia M. Oliveira</string-name>
          <email>kathia@ucb.br</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anita G.M. dos Santos</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paulo C.S. da Silva jr.</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Laesse C. de Araujo jr.</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Susa D.C.F. Vieira</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>UCB - Catholic University of Brasília</institution>
          ,
          <addr-line>Brasília</addr-line>
          ,
          <country country="BR">Brazil</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Researchers and professionals know the importance of documentation for the efficient maintenance of legacy software. Unfortunately, many legacy systems lack this important artifact. Maintenance then becomes a difficult process in which software engineers must study and understand the system over and over again. One way out of this situation is to re-document the legacy system. In this article we present a software re-documentation process, its main features, and its constituent activities. We also present a tool we are developing to automate this process as much as possible. This tool runs in Java and is currently designed for Visual Basic legacy systems.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>We were called to help redocument the main system of an organization. For
this, we had to define a redocumentation process. Common sense dictated that
the process should have the following three characteristics:
– Reverse engineering process: A redocumentation process should be based on
a bottom-up approach, taking advantage of the existing code.
– Lightweight documentation: To lower the costs and maximize the chances
of the recreated documentation being maintained afterward, we follow
Pressman’s recommendation [<xref ref-type="bibr" rid="ref4">4</xref>, p.807] to limit it to the minimum required.
– Good quality/price ratio: We tried to favor documentation artifacts that
could be produced automatically or semi-automatically and still offered
valuable information for maintenance.</p>
      <p>As illustrated in Figure 1, our process is composed of three main phases,
which together include seven activities:
Preparation Phase: analyze the state of the software and its documentation.
Planning Phase: decide which parts of the system should be redocumented
first and what the general approach will be.
Redocumentation Phase: recreate the various documents; this phase constitutes the
core of the redocumentation process.</p>
      <p>[Figure 1: phases and activities of the redocumentation process.
PREPARATION: System Inventory, System Assessment.
PLANNING: Redocumentation Planning.
REDOCUMENTATION: High Level View Definition, Cross Reference Extraction,
Subsystems Definition, Low Level Documentation.]</p>
      <p>To keep the documentation to a minimum, we decided to do mostly without
what we call the intermediate documentation, which, during development, would
be generated by the analysis and design activities.</p>
      <p>In the process, we try to concentrate on what we call the high level and the
low level documentation. Traceability between these two levels is ensured by a
set of cross references (e.g. between implemented functionalities and implementing
routines) that are extracted automatically.</p>
      <p>There are two activities in the Preparation phase:
System Inventory: The goal of the first activity is to get an idea of the size
of the problem and to provide the basic information needed in the following
activities. It answers questions like: What exactly constitutes the system? What
is known about it? Where can this information, and more, be found? The
inventory is performed along three main axes: (i) software components and
functionalities, (ii) documentation and (iii) people.</p>
      <p>System Assessment: The second step consists in assessing the level of
confidence one can have in the code, the documentation and the other sources
of information. This is useful for planning the redocumentation itself and the
maintenance in general.</p>
      <p>Footnote 1: The automation of some activities is discussed in the section
“A Software Re-Documentation Environment”.</p>
      <p>There is only one activity in the Planning phase:
Redocumentation Planning: This activity is the prelude to the
redocumentation work itself. It consists in defining how the redocumentation will
be performed and what the priorities are. The planning is based on the
results of the preceding phase. Other important points to consider are the
expected maintenance load and the strategic evaluation of the importance
of each part of the system.</p>
      <p>There are four activities in the Redocumentation phase:
High Level View Definition: In this activity, one must document a first high
level view of the system: functionalities, interaction with other systems or
specific hardware, etc. Each functionality listed in the System Inventory
should be briefly described.</p>
      <p>Cross References Extraction: This activity results in the identification of
cross references: “routine to routine” (call graph), “routine to data” (CRUD
table), “data to data” (data model), and “functionality to routine”. These cross
references are important for tasks such as impact analysis and feature location.</p>
      <p>
        Subsystems Definition: This activity results in a top-down view of the
system, its subsystems and their components. If the architectural decomposition
is known and agreed upon by all, each subsystem listed in the System
Inventory must be documented, describing its objective and the components
and functionalities it contains. If there is no agreed-upon decomposition,
we propose to create one using some clustering algorithm (e.g. [
        <xref ref-type="bibr" rid="ref1 ref3 ref7">1,3,7</xref>
        ]).
      </p>
      <p>Low Level Documentation: In this final activity, each independent item
identified during the planning is commented. This activity is to a large
extent a manual one: the software engineers must consider each item
independently, analyse it and document it.</p>
    </sec>
    <sec id="sec-2">
      <title>Existing Approaches to Software Re-Documentation</title>
      <p>Redocumentation is mainly a problem for large systems, where size alone is
already a significant complexity factor. This is one of the reasons why there has
been a lot of work on automating this task over the years.</p>
      <p>
        Freeman and Munro, in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], discuss some requirements for a
redocumentation tool and what documents should be produced during redocumentation.
      </p>
      <p>Some tools already exist to help with redocumentation. The simplest are
the tools that extract some documentation from the source code (e.g. javadoc).
These tools extract the signatures of classes, methods, etc. and sometimes also
format comments.</p>
      <p>
        Rajlich [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] proposes a tool to incrementally generate hypertext
documentation of a software system as it is maintained. However, no specific process for
redocumentation per se is given.
      </p>
      <p>
        Rigi [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] is a reverse engineering environment to help understand, restructure,
and visualize the components of a legacy system. It could help in a
redocumentation effort, but it is primarily a program comprehension tool and it does
not, in itself, specify how one goes about redocumenting. A more organizational
approach is adopted in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], where Tilley et al. specify some requirements for
redocumentation and show how Rigi could help in producing it. However, the
work does not specify a process (sequence of steps).
      </p>
    </sec>
    <sec id="sec-3">
      <title>A Software Re-Documentation Environment</title>
      <p>We have started to develop a software environment to support software engineers
in the execution of the various activities of the redocumentation process. The
“Redoc” environment has two goals:
– First, it should guide its users through the execution of the various activities,
allowing them to register the results of these activities.
– Second, it should automate as much as possible the activities of the
process that can be automated.</p>
      <p>The Redoc environment is still at an early stage. It is developed in Java
using the graphical library Swing. It currently parses systems written in Visual
Basic and implements the two activities of the software redocumentation process
that can be automated (in black in Figure 1).</p>
      <p>In the System Inventory activity, one must list the components of the
system and its functionalities. Since Visual Basic (VB) is object oriented<sup>2</sup>, the
components are classes and their methods. One must also list the database tables
that the system uses.</p>
      <p>The Redoc environment uses javacc<sup>3</sup> to parse the Visual Basic code:
– Extraction of classes and methods is straightforward: they are readily
available in the language grammar.
– To discover all the functionalities implemented in the system, we use the
menus of the application. This is possible because, in VB, the graphical
interface is built through a tool which generates the code in a standardized
way. Other languages may raise more difficulties.
– To identify the tables used in the system, we also parse the code to identify
the SQL queries it contains and the tables they refer to. Usually the SQL
queries are manually programmed and do not follow the same strict patterns
as graphical instructions do. They may also be dynamically constructed in
the program from data entered by the user. This would make them impossible
to analyze automatically in the general case. Fortunately, in practice,
SQL queries are dynamic only with regard to the values of the columns, and
not the tables accessed. This allows our approach to work in most cases.</p>
      <p><sup>2</sup> Actually, VB is not truly OO, but it does contain classes, methods . . .</p>
      <p><sup>3</sup> https://javacc.dev.java.net/</p>
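      <p>The table identification step can be illustrated with a minimal sketch. It is
not the Redoc implementation, which relies on a javacc grammar; it assumes instead
that a regular expression over the string literals of a VB statement is enough, and
the class and method names (SqlTableFinder, staticSqlText, tablesIn) are hypothetical.
Dynamic parts of a query, such as values concatenated with the VB operator for string
concatenation, are replaced by a placeholder, which is safe under the observation above
that only column values, not table names, are dynamic.</p>
      <preformat><![CDATA[
```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: finds the tables named in SQL embedded in a VB statement. */
final class SqlTableFinder {
    // A VB string literal: text between double quotes ("" is an escaped quote).
    private static final Pattern VB_STRING =
            Pattern.compile("\"((?:[^\"]|\"\")*)\"");

    // SQL keywords that are directly followed by a table name in common statements.
    private static final Pattern TABLE_REF = Pattern.compile(
            "\\b(?:FROM|JOIN|INTO|UPDATE)\\s+([A-Za-z_][A-Za-z0-9_]*)",
            Pattern.CASE_INSENSITIVE);

    /** Joins the literal fragments of a statement; dynamic parts become " ? ". */
    static String staticSqlText(String vbStatement) {
        StringBuilder sql = new StringBuilder();
        Matcher m = VB_STRING.matcher(vbStatement);
        while (m.find()) {
            if (sql.length() > 0) {
                sql.append(" ? ");
            }
            sql.append(m.group(1));
        }
        return sql.toString();
    }

    /** Returns the table names found after FROM, JOIN, INTO or UPDATE. */
    static Set<String> tablesIn(String vbStatement) {
        Set<String> tables = new LinkedHashSet<>();
        Matcher m = TABLE_REF.matcher(staticSqlText(vbStatement));
        while (m.find()) {
            tables.add(m.group(1));
        }
        return tables;
    }

    public static void main(String[] args) {
        String vb = "rs.Open \"SELECT name FROM Customer WHERE id = \" & txtId.Text, conn";
        System.out.println(tablesIn(vb)); // prints [Customer]
    }
}
```
]]></preformat>
      <p>This sketch only sees the first table of a comma-separated FROM list and ignores
queries assembled across several statements; a grammar-based parser, as used in Redoc,
would be needed to handle such cases.</p>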
      <p>These two examples illustrate the two goals of the environment: automating
some activities for the user (left part of the picture), and keeping track of the
realization of the activities and registering their result (right part).</p>
      <p>The second automated activity is the Cross Reference Extraction. There are
four types of cross references:
Data X data: The data X data cross-reference corresponds to finding the
relations between the tables used in the system. This is done, again, by parsing
the SQL queries to detect the joins made between tables.</p>
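      <p>As a hedged illustration of join detection, the sketch below looks for
equalities between two qualified columns, which covers both ON clauses and
joins written in a WHERE clause. It is an assumption-laden simplification, not the
Redoc parser: table aliases are not resolved, and the name JoinFinder and its method
are hypothetical.</p>
      <preformat><![CDATA[
```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: derives table-to-table relations from join conditions. */
final class JoinFinder {
    // Matches "table.column = table.column", as found in ON and WHERE clauses.
    private static final Pattern COLUMN_EQUALITY = Pattern.compile(
            "\\b([A-Za-z_]\\w*)\\.([A-Za-z_]\\w*)\\s*=\\s*([A-Za-z_]\\w*)\\.([A-Za-z_]\\w*)");

    /**
     * Returns relations formatted as "A.col = B.col", with the two sides
     * ordered alphabetically so that the same join is reported only once.
     */
    static Set<String> relationsIn(String sql) {
        Set<String> relations = new TreeSet<>();
        Matcher m = COLUMN_EQUALITY.matcher(sql);
        while (m.find()) {
            if (m.group(1).equalsIgnoreCase(m.group(3))) {
                continue; // both columns belong to the same table: not a join
            }
            String left = m.group(1) + "." + m.group(2);
            String right = m.group(3) + "." + m.group(4);
            if (left.compareToIgnoreCase(right) <= 0) {
                relations.add(left + " = " + right);
            } else {
                relations.add(right + " = " + left);
            }
        }
        return relations;
    }

    public static void main(String[] args) {
        String sql = "SELECT * FROM Orders INNER JOIN Customer "
                + "ON Orders.custId = Customer.id WHERE Customer.city = 'X'";
        System.out.println(relationsIn(sql)); // prints [Customer.id = Orders.custId]
    }
}
```
]]></preformat>
      <p>Collecting these relations over all the queries of a system gives a first
approximation of the data model, under the assumption that queries use full table
names rather than aliases.</p>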
      <p>Routine X data: The routine X data cross-reference is easy to compute once
the table inventory problem is solved. Knowing which tables are accessed
by each SQL query, it is simple to determine in which method the query occurs
and therefore which methods access which tables. A bit more difficult is to
build a CRUD table where each access is marked as Create, Read, Update,
or Delete. For this, the tool analyzes the first word of each query (insert,
select, update, or delete).</p>
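      <p>The first-word rule described above can be sketched as follows. The data
structure and the names (CrudTable, record, accesses) are assumptions made for the
example and are not taken from the Redoc tool; the routine and table names passed to
record would come from the inventory step.</p>
      <preformat><![CDATA[
```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical CRUD table: for each routine and table, the kinds of access. */
final class CrudTable {
    enum Access { CREATE, READ, UPDATE, DELETE }

    // routine name -> table name -> kinds of access observed
    private final Map<String, Map<String, Set<Access>>> cells = new TreeMap<>();

    /** Classifies a query by its first word (insert, select, update, delete). */
    static Access classify(String sql) {
        String firstWord = sql.trim().split("\\s+", 2)[0].toLowerCase(Locale.ROOT);
        switch (firstWord) {
            case "insert": return Access.CREATE;
            case "select": return Access.READ;
            case "update": return Access.UPDATE;
            case "delete": return Access.DELETE;
            default:
                throw new IllegalArgumentException("Not a CRUD query: " + sql);
        }
    }

    /** Records that the given routine runs the given query on the given table. */
    void record(String routine, String table, String sql) {
        cells.computeIfAbsent(routine, r -> new TreeMap<>())
             .computeIfAbsent(table, t -> EnumSet.noneOf(Access.class))
             .add(classify(sql));
    }

    /** Returns the accesses recorded for one cell, or an empty set. */
    Set<Access> accesses(String routine, String table) {
        Map<String, Set<Access>> row = cells.get(routine);
        if (row == null || !row.containsKey(table)) {
            return EnumSet.noneOf(Access.class);
        }
        return row.get(table);
    }

    public static void main(String[] args) {
        CrudTable crud = new CrudTable();
        crud.record("saveOrder", "Orders", "INSERT INTO Orders (id) VALUES (1)");
        crud.record("saveOrder", "Orders", "update Orders set total = 0");
        System.out.println(crud.accesses("saveOrder", "Orders")); // prints [CREATE, UPDATE]
    }
}
```
]]></preformat>
      <p>Statements that do not start with one of the four keywords are rejected
here; how the tool treats them is not stated in the text.</p>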
      <p>Routine X routine: The routine X routine cross-reference is a simple call
graph among the methods and offers no special difficulties.</p>
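      <p>A call graph of this kind can be held in a simple adjacency map. The sketch
below is only an illustration with hypothetical names (CallGraph, addCall,
reachableFrom); it also shows the reachability computation that a transitive
closure from a given starting routine amounts to.</p>
      <preformat><![CDATA[
```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical call graph: caller name -> names of the routines it calls. */
final class CallGraph {
    private final Map<String, Set<String>> callees = new LinkedHashMap<>();

    void addCall(String caller, String callee) {
        callees.computeIfAbsent(caller, k -> new LinkedHashSet<>()).add(callee);
    }

    /** All routines reachable from start through calls, start included. */
    Set<String> reachableFrom(String start) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(start);
        while (!pending.isEmpty()) {
            String routine = pending.pop();
            if (!seen.add(routine)) {
                continue; // already visited; this also stops on recursive cycles
            }
            Set<String> called = callees.get(routine);
            for (String callee : called == null
                    ? Collections.<String>emptySet() : called) {
                pending.push(callee);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        CallGraph graph = new CallGraph();
        graph.addCall("mnuSave_Click", "saveOrder");
        graph.addCall("saveOrder", "validate");
        graph.addCall("validate", "saveOrder"); // mutual recursion
        graph.addCall("printReport", "formatLine");
        System.out.println(graph.reachableFrom("mnuSave_Click"));
    }
}
```
]]></preformat>
      <p>Starting the traversal from the routine attached to a menu entry yields the
set of routines that may take part in the corresponding functionality, subject to
the caveat about windows and buttons discussed in the text.</p>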
      <p>Functionality X routine: The functionality X routine cross-reference
consists in identifying which routines implement a functionality. As we identify
functionalities from the application menus, it is a simple matter to identify
the starting point of a functionality and then compute the transitive closure
on the call graph from that point. However, there is more to it than that,
because a functionality will usually open one or several windows where the
actual execution of the functionality is triggered by clicking a button.</p>
      <p>It is generally accepted in software engineering that most legacy software
suffers from a lack of up-to-date documentation. Redocumentation is the natural
solution to help maintain these software systems. However, there is little work
on how this can be done and what tools are needed to actually redocument.</p>
      <p>In this article, we presented a process for software redocumentation and the
steps we are taking to automate it as much as possible. We are developing a
software redocumentation environment that will (a) help people register the
results of the various activities of the process, and (b) help people gather the
information they need by automating some activities.</p>
      <p>Two activities (System Inventory and Cross-Reference Extraction) have
already been automated, and we are now working on a new project to help automate
a third activity (System Quality Assessment).</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgment</title>
      <p>This work is part of the “Knowledge Management in Software Engineering” project,
which is supported by the CNPq, an institution of the Brazilian government for
scientific and technological development.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Nicolas</given-names>
            <surname>Anquetil</surname>
          </string-name>
          and
          <string-name>
            <given-names>Timothy C.</given-names>
            <surname>Lethbridge</surname>
          </string-name>
          .
          <article-title>Experiments with Clustering as a Software Remodularization Method</article-title>
          .
          <source>In Working Conference on Reverse Engineering</source>
          , pages
          <fpage>235</fpage>
          -
          <lpage>255</lpage>
          . IEEE, IEEE Comp. Soc. Press, Oct.
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Robert M.</given-names>
            <surname>Freeman</surname>
          </string-name>
          and
          <string-name>
            <given-names>Malcolm</given-names>
            <surname>Munro</surname>
          </string-name>
          .
          <article-title>Redocumentation for the maintenance of software</article-title>
          .
          <source>In Proceedings of the ACM 30th Annual Southeast Conference</source>
          , pages
          <fpage>413</fpage>
          -
          <lpage>16</lpage>
          . ACM, ACM Press,
          <year>Apr 1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Arun</given-names>
            <surname>Lakhotia</surname>
          </string-name>
          .
          <article-title>A unified framework for expressing software subsystem classification techniques</article-title>
          .
          <source>J. of Systems and Software</source>
          ,
          <volume>36</volume>
          :
          <fpage>211</fpage>
          -
          <lpage>231</lpage>
          ,
          <year>Mar 1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Roger S.</given-names>
            <surname>Pressman</surname>
          </string-name>
          .
          <source>Software Engineering: A Practitioner's Approach</source>
          .
          <publisher-name>McGraw-Hill</publisher-name>
          ,
          <edition>5th</edition>
          edition,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Václav</given-names>
            <surname>Rajlich</surname>
          </string-name>
          .
          <article-title>Incremental redocumentation using the web</article-title>
          .
          <source>IEEE Software</source>
          ,
          <volume>17</volume>
          (
          <issue>5</issue>
          ):
          <fpage>102</fpage>
          -
          <lpage>6</lpage>
          ,
          <year>Sep 2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Scott R.</given-names>
            <surname>Tilley</surname>
          </string-name>
          .
          <article-title>Documenting-in-the-large vs. documenting-in-the-small</article-title>
          .
          <source>In Proceedings of CASCON'93</source>
          , pages
          <fpage>1083</fpage>
          -
          <lpage>90</lpage>
          . IBM Centre for Advanced Studies, Oct.
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Theo A.</given-names>
            <surname>Wiggerts</surname>
          </string-name>
          .
          <article-title>Using Clustering Algorithms in Legacy Systems Remodularization</article-title>
          .
          <source>In Working Conference on Reverse Engineering</source>
          , pages
          <fpage>33</fpage>
          -
          <lpage>43</lpage>
          . IEEE, IEEE Comp. Soc. Press, Oct.
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Kenny</given-names>
            <surname>Wong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Scott R.</given-names>
            <surname>Tilley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Hausi A.</given-names>
            <surname>Müller</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Margaret-Anne D.</given-names>
            <surname>Storey</surname>
          </string-name>
          .
          <article-title>Structural redocumentation: A case study</article-title>
          .
          <source>IEEE Software</source>
          ,
          <volume>12</volume>
          (
          <issue>1</issue>
          ):
          <fpage>46</fpage>
          -
          <lpage>54</lpage>
          ,
          <year>Jan 1995</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>