<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Software Re-Documentation Process and Tool</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Nicolas</forename><surname>Anquetil</surname></persName>
							<email>anquetil@ucb.br</email>
							<affiliation key="aff0">
								<orgName type="institution">UCB -Catholic University of Brasília</orgName>
								<address>
									<settlement>Brasília</settlement>
									<country key="BR">Brazil</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Kathia</forename><forename type="middle">M</forename><surname>Oliveira</surname></persName>
							<email>kathia@ucb.br</email>
							<affiliation key="aff0">
								<orgName type="institution">UCB -Catholic University of Brasília</orgName>
								<address>
									<settlement>Brasília</settlement>
									<country key="BR">Brazil</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Anita</forename><forename type="middle">G M</forename><surname>Dos Santos</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">UCB -Catholic University of Brasília</orgName>
								<address>
									<settlement>Brasília</settlement>
									<country key="BR">Brazil</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Paulo</forename><forename type="middle">C S</forename><surname>Da Silva</surname><genName>jr</genName></persName>
							<affiliation key="aff0">
								<orgName type="institution">UCB -Catholic University of Brasília</orgName>
								<address>
									<settlement>Brasília</settlement>
									<country key="BR">Brazil</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Laesse</forename><forename type="middle">C</forename><surname>De Araujo</surname><genName>jr</genName></persName>
							<affiliation key="aff0">
								<orgName type="institution">UCB -Catholic University of Brasília</orgName>
								<address>
									<settlement>Brasília</settlement>
									<country key="BR">Brazil</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Susa</forename><forename type="middle">D C F</forename><surname>Vieira</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">UCB -Catholic University of Brasília</orgName>
								<address>
									<settlement>Brasília</settlement>
									<country key="BR">Brazil</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Software Re-Documentation Process and Tool</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">0695B4F8440334CAEAE0316837987690</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T06:10+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Researchers and professionals know the importance of documentation for the efficient maintenance of legacy software. Unfortunately, many legacy systems lack this important artifact. Maintenance then becomes a difficult process in which software engineers must study and understand the system over and over again. A possible way out of this situation is to re-document the legacy system. In this article, we present a software re-documentation process, its main features, and its constituent activities. We also present a tool we are developing to automate this process as much as possible. This tool runs in Java and is currently designed for Visual Basic legacy systems.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>It is an accepted fact that legacy software systems are generally poorly documented, which makes it extremely difficult to understand and maintain them. Redocumenting them could be a great help in keeping them "alive". In this paper, we describe a redocumentation process we designed and our first efforts to automate it. We also describe a tool, called Redoc, which currently automates two of the activities of our process for legacy systems written in Visual Basic.</p><p>In the following sections, we first present the software redocumentation process (section 2). We then discuss some existing approaches to redocumentation, with a focus on existing tools (section 3). In section 4, we present the Redoc tool. Finally, we present our conclusions and possible future work (section 5).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">The Redocumentation Process</head><p>We were asked to help redocument the main system of an organization. For this, we had to define a redocumentation process. Common sense dictated that the process should have the following three characteristics:</p><p>-Reverse engineering process: A redocumentation process should be based on a bottom-up approach, taking advantage of the existing code.</p><p>-Lightweight documentation: To lower costs and maximize the chances that the recreated documentation is maintained afterward, we follow Pressman's recommendation <ref type="bibr">[4, p.807]</ref> to limit it to the minimum required.</p><p>-Good quality/price ratio: We favored documentation artifacts that can be produced automatically or semi-automatically and still offer valuable information for maintenance.</p><p>As illustrated in Figure <ref type="figure">1</ref>, our process is composed of three main phases (Preparation, Planning, and Redocumentation), which together include seven activities. To keep the documentation to a minimum, we decided to do mostly without what we call the intermediate documentation, that is, the documentation which, during development, would be generated by the analysis and design activities.</p><p>In the process, we concentrate on what we call the high level and the low level documentation. Traceability between these two levels is guaranteed by a set of cross references (e.g. between implemented functionalities and implementing routines) that are extracted automatically.</p><p>There are two activities in the Preparation phase:</p><p>System Inventory: The goal of the first activity is to get an idea of the size of the problem and to provide basic information needed in the following activities. It answers questions like: What exactly constitutes the system? What is known about it? Where can this information, and new information, be found? The inventory is performed along three main axes: (i) software components and functionalities, (ii) documentation, and (iii) people.</p><p>System Assessment: The second activity consists in assessing the level of confidence one can have in the code, the documentation, and the other sources of information. This is useful to plan the redocumentation itself and the maintenance in general.</p><p>There is only one activity in the Planning phase:</p><p>Redocumentation Planning: This activity is the prelude to the redocumentation work itself. It consists in defining how the redocumentation will be performed and what the priorities are. The planning is based on the results of the preceding phase. Other important points to consider are the expected maintenance load and the strategic importance of each part of the system.</p><p>There are four activities in the Redocumentation phase:</p><p>High Level View Definition: In this activity, one must document a first high level view of the system: functionalities, interaction with other systems or specific hardware, etc. Each functionality listed in the System Inventory should be briefly described.</p><p>Cross References Extraction: This activity results in the identification of cross references: "routine to routine" (call graph), "routine to data" (CRUD table), "data to data" (data model), and "functionality to routine". These are important for tasks such as impact analysis and feature location.</p><p>Subsystems Definition: This activity results in a top-down view of the system, its subsystems, and their components. If the architectural decomposition is known and agreed upon by all, each subsystem listed in the System Inventory must be documented, describing its objective and the components and functionalities it contains. If there is no agreed-upon decomposition, we propose to create one using a clustering algorithm (e.g. <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b2">3,</ref><ref type="bibr" target="#b6">7]</ref>).</p><p>Low Level Documentation: In this final activity, each independent item identified during the planning is commented. This activity is to a large extent manual: the software engineers must consider each item independently, analyze it, and document it.</p></div>
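<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the role of the cross references more concrete, the following minimal sketch shows one way "functionality to routine" references could be derived from a "routine to routine" call graph by reachability, and then used for a simple impact analysis. It is an illustration only, not a description of any existing tool: the routine names, the CALL_GRAPH and ENTRY_POINTS data, and both helper functions are hypothetical.</p>

```python
from collections import deque

# Hypothetical "routine to routine" cross references (call graph):
# each routine maps to the routines it calls directly.
CALL_GRAPH = {
    "mnuInvoice_Click": ["OpenInvoiceForm"],
    "OpenInvoiceForm": ["LoadCustomer", "LoadItems"],
    "LoadCustomer": ["RunQuery"],
    "LoadItems": ["RunQuery"],
    "mnuReport_Click": ["PrintReport"],
    "PrintReport": ["RunQuery"],
    "RunQuery": [],
}

# Hypothetical functionalities and the routine where each one starts.
ENTRY_POINTS = {"Invoicing": "mnuInvoice_Click", "Reporting": "mnuReport_Click"}


def reachable_routines(call_graph, entry_point):
    """Return every routine reachable from entry_point (entry point included)."""
    seen = {entry_point}
    queue = deque([entry_point])
    while queue:
        routine = queue.popleft()
        for callee in call_graph.get(routine, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def impacted_functionalities(call_graph, entry_points, changed_routine):
    """Impact analysis: functionalities that reach changed_routine."""
    return sorted(
        name
        for name, entry in entry_points.items()
        if changed_routine in reachable_routines(call_graph, entry)
    )


if __name__ == "__main__":
    # Feature location: routines implementing the "Invoicing" functionality.
    print(sorted(reachable_routines(CALL_GRAPH, ENTRY_POINTS["Invoicing"])))
    # Impact analysis: functionalities affected by a change to LoadItems.
    print(impacted_functionalities(CALL_GRAPH, ENTRY_POINTS, "LoadItems"))
```

<p>A breadth-first traversal is used here only because it is short and handles recursive calls through the set of visited routines. Any graph reachability algorithm would serve the same purpose.</p></div>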
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Existing Approaches to Software Re-Documentation</head><p>Redocumentation is mainly a problem for large systems, where size alone is already a significant complexity factor. This is one of the reasons why there has been a lot of work on automating this task over the years.</p><p>Freeman and Munro, in <ref type="bibr" target="#b1">[2]</ref>, discuss some requirements for a redocumentation tool and what documents should be produced during redocumentation.</p><p>Some tools already exist to help with redocumentation. The simplest are tools that extract some documentation from the source code (e.g. javadoc). These tools extract the signatures of classes, methods, etc. and sometimes also format comments.</p><p>Rajlich <ref type="bibr" target="#b4">[5]</ref> proposes a tool to incrementally generate hypertext documentation of a software system as it is maintained. However, there is no specific process for redocumentation per se.</p><p>Rigi <ref type="bibr" target="#b7">[8]</ref> is a reverse engineering environment that helps understand, restructure, and visualize the components of a legacy system. It could help in a redocumentation effort, but it is primarily a program comprehension tool and it does not, in itself, specify how one goes about redocumenting. A more organizational approach is adopted in <ref type="bibr" target="#b5">[6]</ref>, where Tilley et al. specify some requirements for re-documentation and show how Rigi could help in producing it. However, the work does not specify a process (sequence of steps).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">A Software Re-Documentation Environment</head><p>We have started to develop a software environment to support software engineers in the execution of the various activities of the redocumentation process. The "Redoc" environment has two goals:</p><p>-First, it should guide its users through the execution of the various activities, allowing them to register the results of these activities.</p><p>-Second, it should automate, as much as possible, the activities of the process that can be automated.</p><p>The Redoc environment is still at an early stage. It is developed in Java using the Swing graphical library. It currently parses systems written in Visual Basic and implements the two automated activities of the software redocumentation process (in black in Figure <ref type="figure">1</ref>).</p><p>In the System Inventory activity, one must list the components of the system and its functionalities. Since Visual Basic (VB) is object-oriented<ref type="foot" target="#foot_2">2</ref>, the components are classes and their methods. One must also list the tables that make up the system.</p><p>The Redoc environment uses javacc<ref type="foot" target="#foot_3">3</ref> to parse the Visual Basic code:</p><p>-Extraction of classes and methods is straightforward: they are readily available in the language grammar.</p><p>-To discover all the functionalities implemented in the system, we use the menus of the application. This is possible because, in VB, the graphical interface is built through a tool which generates the code in a standardized way. Other languages may raise more difficulties.</p><p>-To identify the tables used in the system, we also parse the code to identify the SQL queries it contains and the tables they refer to. SQL queries are usually programmed manually and do not follow the same strict patterns as graphical instructions do. They may also be constructed dynamically in the program from data entered by the user, which would make them impossible to analyze automatically in the general case. Fortunately, in practice, SQL queries are dynamic only with regard to the values of the columns, and not the tables accessed. This allows our approach to work in most cases.</p><p>The two windows shown in Figure <ref type="figure">2</ref> illustrate the two goals of the environment: automating some activities for the user (left part of the picture), and keeping track of the realization of the activities and registering their results (right part).</p><p>The second automated activity is the Cross Reference Extraction. There are four types of cross references:</p><p>Data X data: The data X data cross-reference corresponds to finding the relations between the tables used in the system. This is done, again, by parsing the SQL queries to detect the joins made between tables.</p><p>Routine X data: The routine X data cross-reference is easy to compute once the table inventory problem is solved. Knowing which tables are accessed by each SQL query, it is simple to determine in which method the query occurs and therefore which methods access which tables. Building a CRUD table, where each access is marked as Create, Read, Update, or Delete, is a bit more difficult. For this, the tool analyzes the first word of each query (insert, select, update, or delete).</p><p>Routine X routine: The routine X routine cross-reference is a simple call graph among the methods and presents no special difficulty.</p><p>Functionality X routine: The functionality X routine cross-reference consists in identifying which routines implement a functionality. As we identify functionalities from the application menus, it is a simple matter to identify the starting point of a functionality and then compute the transitive closure on the call graph from that point. However, there is more to it than that, because a functionality will usually open one or several windows where the actual execution of the functionality is triggered by clicking a button.</p></div>
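<div xmlns="http://www.tei-c.org/ns/1.0"><p>The routine X data extraction described in this section can be sketched as follows. This is a simplified illustration under stated assumptions, not the Redoc implementation: a regular expression stands in for the javacc-based parser, the queries are assumed to be available as plain strings per method, and the names QUERIES_BY_METHOD, CRUD_BY_KEYWORD, TABLE_PATTERN and build_crud_table are hypothetical. As in the text, the CRUD letter is taken from the first word of each query.</p>

```python
import re

# Hypothetical inventory result: SQL query strings found in each method.
QUERIES_BY_METHOD = {
    "Customer.Load": ["SELECT name FROM customer WHERE id = ?"],
    "Customer.Save": [
        "UPDATE customer SET name = ? WHERE id = ?",
        "INSERT INTO audit_log (msg) VALUES (?)",
    ],
    "Order.Purge": ["DELETE FROM orders WHERE created = ?"],
    "Order.List": [
        "SELECT o.id FROM orders o JOIN customer c ON o.cust = c.id",
    ],
}

# First word of a query mapped to its CRUD letter.
CRUD_BY_KEYWORD = {"insert": "C", "select": "R", "update": "U", "delete": "D"}

# Simplifying assumption: a table name directly follows FROM, JOIN, INTO or UPDATE.
TABLE_PATTERN = re.compile(
    r"\b(?:from|join|into|update)\s+([A-Za-z_]\w*)", re.IGNORECASE
)


def build_crud_table(queries_by_method):
    """Return a dict mapping (method, table) to the set of CRUD letters."""
    crud = {}
    for method, queries in queries_by_method.items():
        for query in queries:
            words = query.split()
            if not words:
                continue
            letter = CRUD_BY_KEYWORD.get(words[0].lower())
            if letter is None:
                continue  # not one of the four statement kinds
            # Every table named in the query gets the letter of the statement.
            # Limitation: in "INSERT ... SELECT", the source table is marked C.
            for table in TABLE_PATTERN.findall(query):
                crud.setdefault((method, table.lower()), set()).add(letter)
    return crud


if __name__ == "__main__":
    table = build_crud_table(QUERIES_BY_METHOD)
    for (method, name), letters in sorted(table.items()):
        print(method, name, "".join(sorted(letters)))
```

<p>Because only the table names and the leading keyword are inspected, the sketch is unaffected by column values supplied at run time, which matches the observation that queries are usually dynamic only in those values. Tables listed together in one query (as in the join above) also give a starting point for the data X data cross-reference.</p></div>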
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusion and Future Work</head><p>It is generally accepted in software engineering that most legacy software suffers from a lack of up-to-date documentation. Redocumentation is the natural solution to help maintain these software systems. However, there is little work on how this can be done and on the tools needed to actually redocument.</p><p>In this article, we presented a process for software redocumentation and the steps we are taking to automate it as much as possible. We are developing a software redocumentation environment that will (a) help people register the results of the various activities of the process, and (b) help people gather the information they need by automating some activities.</p><p>Two activities (System Inventory and Cross-Reference Extraction) have already been automated, and we are now working on a new project to help automate a third activity (System Quality Assessment).</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 2 ,</head><label>2</label><figDesc>Figure 2, left part, presents a snapshot of the inventory window. The window shows the functionalities, the classes and their methods, and the tables. All this information is extracted automatically. The right part of the figure presents a snapshot of the window used to enter (manually) a new contact person. A similar window exists for documents.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 2 .</head><label>2</label><figDesc>Fig. 2. Result of the automated System Inventory for a system in Visual Basic (left) and a window to enter a new contact person that may help in the redocumentation process (right)</figDesc><graphic coords="5,58.35,57.50,155.62,170.28" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Fig. 1.</head><label>1</label><figDesc>Fig. 1. Activities of the Redocumentation Process (black box: activity automated<ref type="foot" target="#foot_1">1</ref>, gray box: activity is partially automated). Preparation Phase: analyze the state of the software and its documentation. Planning Phase: decide what parts of the system should be redocumented first and what will be the general approach. Redocumentation Phase: recreate the various documents, it constitutes the core of the redocumentation process.</figDesc><table><row><cell cols="2">PREPARATION</cell><cell cols="2">PLANNING</cell></row><row><cell>System Inventory</cell><cell>System Assessment</cell><cell cols="2">Redocumentation Planning</cell></row><row><cell cols="4">REDOCUMENTATION</cell></row><row><cell>High Level View Definition</cell><cell>Cross Reference Extraction</cell><cell>Subsystems Definition</cell><cell>Low Level Documentation</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0">Proceedings of the CAiSE'05 Forum -O. Belo, J. Eder, J. Falcão e Cunha, O. Pastor (Eds.) © Faculdade de Engenharia da Universidade do Porto, Portugal 2005 -ISBN 972-752-078-2</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_1">The automation of some activities will be discussed in section 4</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_2">Actually, VB is not truly OO, but it does contain classes, methods . . .</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_3">https://javacc.dev.java.net/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="100" xml:id="foot_4">Nicolas Anquetil, Kathia M. de Oliveira, Anita G. M. dos Santos, Paulo C. S. Silva Jr. ...</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgment</head><p>This work is part of the "Knowledge Management in Software Engineering" project, which is supported by the CNPq, an institution of the Brazilian government for scientific and technological development.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Experiments with Clustering as a Software Remodularization Method</title>
		<author>
			<persName><forename type="first">Nicolas</forename><surname>Anquetil</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Timothy</forename><forename type="middle">C</forename><surname>Lethbridge</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Working Conference on Reverse Engineering</title>
				<imprint>
			<publisher>IEEE Comp. Soc. Press</publisher>
			<date type="published" when="1999-10">Oct. 1999</date>
			<biblScope unit="page" from="235" to="255" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Redocumentation for the maintenance of software</title>
		<author>
			<persName><forename type="first">Robert</forename><forename type="middle">M</forename><surname>Freeman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Malcolm</forename><surname>Munro</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the ACM 30th Annual Southeast Conference</title>
				<meeting>the ACM 30th Annual Southeast Conference</meeting>
		<imprint>
			<publisher>ACM, ACM Press</publisher>
			<date type="published" when="1992-04">Apr 1992</date>
			<biblScope unit="page" from="413" to="416" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">A unified framework for expressing software subsystem classification techniques</title>
		<author>
			<persName><forename type="first">Arun</forename><surname>Lakhotia</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. of Systems and Software</title>
		<imprint>
			<biblScope unit="volume">36</biblScope>
			<biblScope unit="page" from="211" to="231" />
			<date type="published" when="1997-03">Mar 1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Software Engineering: A Practitioner&apos;s Approach</title>
		<author>
			<persName><forename type="first">Roger</forename><forename type="middle">S</forename><surname>Pressman</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2001">2001</date>
			<publisher>McGraw-Hill</publisher>
		</imprint>
	</monogr>
	<note>5th edition</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Incremental redocumentation using the web</title>
		<author>
			<persName><forename type="first">Václav</forename><surname>Rajlich</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Software</title>
		<imprint>
			<biblScope unit="volume">17</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="102" to="106" />
			<date type="published" when="2000-09">Sep 2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Documenting-in-the-large vs. documenting-in-the-small</title>
		<author>
			<persName><forename type="first">Scott</forename><forename type="middle">R</forename><surname>Tilley</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of CASCON&apos;93</title>
				<meeting>CASCON&apos;93</meeting>
		<imprint>
			<publisher>IBM Centre for Advanced Studies</publisher>
			<date type="published" when="1993-10">Oct. 1993</date>
			<biblScope unit="page" from="1083" to="1090" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Using Clustering Algorithms in Legacy Systems Remodularization</title>
		<author>
			<persName><forename type="first">Theo</forename><forename type="middle">A</forename><surname>Wiggerts</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Working Conference on Reverse Engineering</title>
				<imprint>
			<publisher>IEEE Comp. Soc. Press</publisher>
			<date type="published" when="1997-10">Oct. 1997</date>
			<biblScope unit="page" from="33" to="43" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Structural redocumentation: A case study</title>
		<author>
			<persName><forename type="first">Kenny</forename><surname>Wong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Scott</forename><forename type="middle">R</forename><surname>Tilley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hausi</forename><forename type="middle">A</forename><surname>Müller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Margaret-Anne</forename><forename type="middle">D</forename><surname>Storey</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Software</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="46" to="54" />
			<date type="published" when="1995-01">Jan 1995</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
