Interactive User-Oriented Views for Better
                    Understanding Software Systems

                             Truong Ho-Quang, Michel R.V. Chaudron
                         Department of Computer Science and Engineering
                    Chalmers University of Technology and Gothenburg University
                                        Gothenburg, Sweden
                                 {truongh, chaudron}@chalmers.se


                                                       Abstract

                       Understanding software artefacts is a crucial task for people who want
                       to participate in any software development process. However, because
                       of the large amount of detailed and scattered information in software
                       artefacts, understanding them is usually time-consuming and vulnera-
                       ble to human errors and subjectivities. A system that aids practitioners
                       to investigate understanding about software artefacts could reduce the
                       vulnerabilities and speed up software development/maintenance pro-
                       cess. Our research focuses on building a comprehensive view of soft-
                       ware system in order for developers to achieve the two goals: (i) to save
                       the time spending on searching and navigating on source code; and
                       (ii) to gain better understanding about software artefacts regarding to
                       domain-specific tasks. To achieve these goals, we propose an emprical
                       approach in which the visualisation and the generation of high-level
                       design and architectural views from source code and design documen-
                       tations have been played central roles. The research is on-going and
                       could potentially be extended to di↵erent software artefacts (such as
                       requirements, use-cases, test-cases, revision logs).


1    Introduction
Software artefacts are created, maintained and evolved as part of a software development project. Understanding
software artefacts is a crucial task for every person who wants to participate in any phase of software development
life cycle [26]. However, because of the large amount of detailed and scattered information in software artefacts,
understanding them is usually very time-consuming and vulnerable to human errors and subjectivities [9][20].
The task becomes even more difficult when it comes to large-size software projects, which contain a huge amount
of code, designs and documentation.
    Recently, a significant number of research and tools has been conducted in order to investigate better un-
derstanding of software artefacts. It can be listed as reverse engineering [3], feature location [8], document
summarization, etc. However, there is lack of automatic approaches were proposed. With regards to those
that can automatically perform the task, the accuracy is not high [12]. Soh et.al, with an observation on 2408

Copyright c by Truong Ho-Quang, Michel R.V. Chaudron. Copying permitted for private and academic purposes.
Proceedings of SATToSE, Mons, Belgium, 6-7-2015, published at http://ceur-ws.org


                                                             1
developers interaction logs, has pointed out that 62% of files explored during the implementation of a task are
not significantly relevant to the final implementation of the task [20].
   In addition, the approaches are usually applied on a single software artefact (e.g. source code, revision history),
resulting in a single view(navigation or search results). This fact, at some points, limits developers ability to
obtain the overview of the whole (OR a part of) system, which is an essential part of understanding, with
regards to the development/maintenance task. Thus, creating of a comprehensible view which can automatically
navigate and generate suitable views on di↵erent software artefacts would be very beneficial.
   To this end, our research has been focusing on the visualisation and the generation of high-level design and
architectural views from source code and design documentations. The research could potentially be extended to
di↵erent software artefacts (such as requirements, use-cases, test-cases, revision logs [6]).


                         Figure 1: A comprehensible view for task ”Maintain a feature”

   Figure 1 shows a prototype design of such the view. Given a maintenance task, the sub-views show the task-
related parts on di↵erent software artefacts. Sub-view Architecture Overview shows an overview of the system
with highlights on the task-related components. Sub-view Editor locates to the relevant part of source code.
Sub-view Chat shows the list of the responsible developers and the historical chat regarding the observed source
code. Sub-views are linkable between themselves and automatically or manually updated. We take developers as
the main audience of our research. The following example reveals how the developers perform the understanding
task using the view. We consider it as the motivation in our research.
   Motivation example. Developer X has to conduct a task: Maintain Feature A. Software artefacts are stored
in projects database. X starts by logging into his work space (e.g. an IDE), then performs searching for the
Feature A by key words. Sub-view Editor will automatically address the related parts of source code which could
potentially be changed during the maintenance work. Sub-views Architecture Overview and Sequence Diagram
will provide the developer with an overview about the code structure and probably a suggestion about which
parts could be subsequently changed. The developer can ask for recommendations from responsible persons
through Chat space. Using such the view could allow the developer to better investigate understanding about
the task and the system, and to reduce the implementation and maintenance time.
   The rest of this paper is organized as follows. In Section 2, we dicuss the problem definition and formulate
two main research questions. Section 3 presents the outlines of our approach.

2     Problem Statement and Research Questions
2.1   Problem Statement
Lack of empirical research on developers cognitive task during software maintenance phase. Despite
the large body of work on software maintenance [26][20][2], there are very few studies that empirically investi-
gated how developers achieve the understanding of software artefacts during software development/maintenance
activities. Ko et.al [13][14] has revealed a number of issues that cause developers more time on navigation
between source files. The authors have suggested ideas for tools that help developers seek, relate, and collect
information in a more e↵ective and explicit manner. On the other hand, it seems that there is a huge gap
between state-of-the-art research and practice in software comprehension. An observational study by Roehm
et.al shows that no one in the 28 professional developers (from seven software companies) observes any use of
state-of-the-art comprehensive tools [18].
   Hard to collect relevant task-based information e↵ectively and automatically. For most tasks,
developers begin by searching then navigating by search results. However, traditional searching methods seem


                                                          2
not very e↵ective. A. J. Ko et.al have revealed that an average of 88 percent (±11) of developers searches led
to nothing of later use in the task. Those failed searches were at least partially responsible for approximately
36 percent of their time spent on inspecting irrelevant code [14]. Recently, a number of task-specific searching
methods (such as features location, program slicing, UML slicing, etc.) has been introduced. However, they are
not easy to apply and sensitive to inputs quality [8]. Thus, developing an easy-to-use solution could improve
searching and navigating efficiencies.
   Lack of visualisation of relevant information in understandable manners. Apart from searching
and navigation, visualisation of software artefacts is widely used in the areas of software maintenance, reverse
engineering, and re-engineering, where typically large amounts of complex data need to be understood and a
high degree of interaction between software engineers and automatic analyses is required. Over the past few
years, software visualisation has greatly evolved. However, despite the fact that software visualisation tools have
a great potential, when it comes to contextual information, finding a suitable solution is not an easy work [21][10]
(e.g. UML design layouting, personalised view, etc.).

2.2     Research Questions
Our research focuses on the visualisation and the generation of high-level design and architectural views from
source code and design documentations. In order to come up with a systematic answer for the question, it’s
necessary to find out what is practitioner’s mind when performing the understanding task. Thus, we would think
about the following research questions:
   RQ1. What are practitioners needs in order to understand a part of system with regards to a specific task?
   RQ2. How to generate and present the information by an e↵ective way?

3     Research Approaches


                                          Figure 2: Research Activities

   In order to investigate the two research questions, we conduct two research activities as shown in the Figure
2. In Research Activity 1, we use both qualitative and quantitative approaches to learn practitioners needs and
strategies during the understanding phase. In particular, by conducting interviews with industrial and academic
practitioners, we could achieve better understanding on what is their cognitive thinking and possibly the strategy
that was used to understand the system. By logging practitioners activities and analysing the logging file, we
could statistically investigate their unconscious behaviors and the difficulties performing the understanding task.
This approach is discussed in detail in subsection A.
   Research Activity 2 aims at answering RQ2 with a focus on generating high-level abstraction of design and ar-
chitectural views from source code. We have been studying sense-making and software architectural visualisation.
The approaches are discussed in subsection B.
   Activity 1 and Activity 2 are concurrently performed. On one hand, outcomes of Activity 1 can be considered
as requirements for Activity 2. On the other hand, research ideas and the views that are generated from Activity
2 will be introduced to pratitioners. Validation will be made during the iterations of the two activities.

3.1     An exploratory study of practitioners
3.1.1    Conduct interviews
Developers are often not up-to-date with state-of-the-art comprehension tools. On the other hand, academia has
a limited knowledge about industrial practitioners. For example, when it comes to questions like: How could we


                                                         3
understand a (part of a) software system? Referring to software design seems to be an obvious answer. However,
none of the observed research has mentioned the use of architecture design as part of the understanding process.
  Thefore, semi-structured interviews will be used to get a better understanding of software practitioners.
We consider both academic and industrial developers as targeted interviewees. We split them into groups
by several ways: level of software comprehension expertise; familiar with a specific software system/software
maintenance task. Shedding some lights in the di↵erences between groups in understanding software systems
could be beneficial for us in order to generate suitable views for each group. We take our colleagues and Software
Engineering students at the University of Gothenburg and Chalmers University of Technology as academic
candidates. We have been inviting a number of local companies (such as Volvo Cars, Ericsson) and out-of-border
companies (which locate in Vietnam, The Netherland) to involve in this research.

3.1.2    Process Mining
In the quest for knowledge about strategies and struggles that practitioners have found during the understand-
ing phase, process mining is possible research tool. Process mining techniques make use of historical data to
graphically represent and analyse a particular process [23].
   Blikstein reported of the use of a logging module for programming tasks [1] and identified student strategies
that could help lecturers identify student problems in an early stage of the task. Ko et al. conducted a study
in which they used the combination of a logging file and a visual interpretation tool to analyse the behaviour of
software developers during a maintenance task [14]. They successfully identified di↵erent strategies the developers
used. Claes, Pingerra et.al [4][17] logged students events during business process modeling sessions. They used
visual analysis [24] and found di↵erent styles and related them to model quality.
   By using a process mining approach, we have conducted an exploratory study on students strategies performing
software modeling tasks [5]. We found out that students use di↵erent strategies for solving the tasks. We
categorised these strategies into four main strategies: Depthless, Depth First, Breadth First and Ad-Hoc. From
our results Depth First indicates to support better layout and richness (detail). We wanted to examine our
insights by conducting this experiment on a bigger sample size of students, and possibly on industrial side.

3.2     Data generation and development of the views
3.2.1    Sense-making on source code
Sensemaking, as described by Weick [7], literally means making sense of events. According A. von Mayrhauser,
sensemaking is a term used to refer to humans capability to actively comprehend the significance of ambiguous
events and data [25]. To our point of view, sense-making is considered as a process where software artefacts are
manipulated and presented in a higher level of abstraction. With the focus on high-level design and architectural
concepts from source code, we take software reverse engineering (RE) and natural language processing (NLP) as
the main drivers for the sense-making process.
   Reverse Engineering. Reverse engineering aims to analyse the source code of a system and create design
representations of the system [3]. Open source and commercial tools have been developed to generate software
design from source code. However, the reverse engineered presentation often contains too much details. When
a RE class diagram becomes too large, it provides little benefit towards program. We have been working on
possible solutions to present the RE diagrams in a more informative way.
   We take the prior research done by Osman et.al as inspiration. The authors have proposed a supervised
machine learning approach to condense RE class diagrams into another class diagram that is close to forward
design diagram [15][16]. The authors compute values of a number of design metrics from source code and use those
to predict classes as important or not. The condensed diagram is then constructed from the reverse-engineered
diagram by keeping the important classes and eliminating unimportant ones. Thung et.al have extended Osmans
work by using networks metrics as predictors [22]. This work could be extended by considering more predictor
features, i.e. dynamic metrics (from execution traces), text mining metrics (from use cases, requirements).
   Natural Language Processing. NLP can be considered as a process of extracting information from human
or natural language inputs. By adapting NLP to source code analysis, one could be able to extract semantically-
related parts of source code, which in the end results in the reducing of maintenance cost. Shepherd et.al has
introduced a Find-Concept search process which makes use of NLP analysis that captures the relations between
actions (verbs) and the objects (nouns) that these actions act upon [19]. Hill et.al propose a technique and
a tool to score method relevance with respect to natural language descriptions of a specific maintenance task


                                                        4
[11]. Starting with a seed method and a natural language description of the bug to be fixed or feature to be
added, the tool automatically generate and show a reduced version of the call-graph of the method by pruning
irrelevant structure edges from consideration. We have been working on automatic recognition of classs roles
(such as UI, security, persistence, etc.) by using text-analysis on source code. The result could then be displayed
in a role-based view of reverse engineering design.

3.2.2   Data visualisation
Visualisation depends on target audience and its information needs which are not available at the first phases
of the research. Therefore, visualisation is not our main focus at current time. So far, we have been working
on the two main directions: 1) With regards to the visualisation of RE diagrams: We are considering di↵erent
visualisation strategies for class’s roles; 2) Regarding presentation of activity logging data: We have developed
the tool LogViz which is capable of showing multiple logging files and filtering the logging by activities and
architectural elements [5]. In the time to come, we tend to contact with companies for validating our approaches.

References
 [1] P. Blikstein. Using learning analytics to assess students’ behavior in open-ended programming tasks. In
     Proceedings of the 1st International Conference on Learning Analytics and Knowledge, LAK ’11, pages
     110–116, New York, NY, USA, 2011. ACM.
 [2] B. W. Boehm. Software engineering. IEEE Trans. Comput., 25(12):1226–1241, Dec. 1976.
 [3] E. Chikofsky and I. Cross, J.H. Reverse engineering and design recovery: a taxonomy. Software, IEEE,
     7(1):13–17, Jan 1990.
 [4] J. Claes, I. Vanderfeesten, J. Pinggera, H. Reijers, B. Weber, and G. Poels. A visual analysis of the process
     of process modeling. Information Systems and e-Business Management, 13(1):147–190, 2015.
 [5] M. R. C. Dave R. Stikkolorum, Truong Ho-Quang. Revealing students uml class diagram modelling strategies
     with webuml and logviz. In (accepted for presentation at the Euromicro SEAA conference, August 26-28,
     2015 and publication in the conference proceedings).
 [6] S. C. B. de Souza, N. Anquetil, and K. M. de Oliveira. A study of the documentation essential to software
     maintenance. In Proceedings of the 23rd Annual International Conference on Design of Communication:
     Documenting &Amp; Designing for Pervasive Information, SIGDOC ’05, pages 68–75, New York, NY, USA,
     2005. ACM.
 [7] C. A. Decker. Sensemaking in organizations, by k. e. weick. (1995). thousand oaks, ca: Sage. 321 pp.,
     44.00cloth,19.95 paper. Human Resource Development Quarterly, 9(2):198–201, 1998.
 [8] B. Dit, M. Revelle, M. Gethers, and D. Poshyvanyk. Feature location in source code: a taxonomy and
     survey. Journal of Software: Evolution and Process, 25(1):53–95, 2013.
 [9] A. Dunsmore, M. Roper, and M. Wood. The role of comprehension in software inspection. Journal of
     Systems and Software, 52(23):121 – 129, 2000.
[10] Q. Gan, M. Zhu, M. Li, T. Liang, Y. Cao, and B. Zhou. Document visualization: an overview of current
     research. Wiley Interdisciplinary Reviews: Computational Statistics, 6(1):19–36, 2014.
[11] E. Hill, L. Pollock, and K. Vijay-Shanker. Exploring the neighborhood with dora to expedite software
     maintenance. In Proceedings of the Twenty-second IEEE/ACM International Conference on Automated
     Software Engineering, ASE ’07, pages 14–23, New York, NY, USA, 2007. ACM.
[12] T. Ishio, S. Hayashi, H. Kazato, and T. Oshima. On the e↵ectiveness of accuracy of automated feature
     location technique. In Reverse Engineering (WCRE), 2013 20th Working Conference on, pages 381–390,
     Oct 2013.
[13] A. J. Ko, H. Aung, and B. A. Myers. Eliciting design requirements for maintenance-oriented ides: A detailed
     study of corrective and perfective maintenance tasks. In Proceedings of the 27th International Conference
     on Software Engineering, ICSE ’05, pages 126–135, New York, NY, USA, 2005. ACM.


                                                        5
[14] A. J. Ko, B. A. Myers, M. J. Coblenz, and H. H. Aung. An exploratory study of how developers seek, relate,
     and collect relevant information during software maintenance tasks. IEEE Trans. Softw. Eng., 32(12):971–
     987, Dec. 2006.

[15] M. Osman, M. Chaudron, and P. Van Der Putten. An analysis of machine learning algorithms for con-
     densing reverse engineered class diagrams. In Software Maintenance (ICSM), 2013 29th IEEE International
     Conference on, pages 140–149, Sept 2013.
[16] M. Osman, M. Chaudron, P. Van Der Putten, and T. Ho-Quang. Condensing reverse engineered class
     diagrams through class name based abstraction. In Information and Communication Technologies (WICT),
     2014 Fourth World Congress on, pages 158–163, Dec 2014.
[17] J. Pinggera, P. So↵er, S. Zugal, B. Weber, M. Weidlich, D. Fahland, H. Reijers, and J. Mendling. Modeling
     styles in business process modeling. In I. Bider, T. Halpin, J. Krogstie, S. Nurcan, E. Proper, R. Schmidt,
     P. So↵er, and S. Wrycza, editors, Enterprise, Business-Process and Information Systems Modeling, volume
     113 of Lecture Notes in Business Information Processing, pages 151–166. Springer Berlin Heidelberg, 2012.
[18] T. Roehm, R. Tiarks, R. Koschke, and W. Maalej. How do professional developers comprehend software?
     In Proceedings of the 34th International Conference on Software Engineering, ICSE ’12, pages 255–265,
     Piscataway, NJ, USA, 2012. IEEE Press.
[19] D. Shepherd, Z. P. Fry, E. Hill, L. Pollock, and K. Vijay-Shanker. Using natural language program analysis
     to locate and understand action-oriented concerns. In Proceedings of the 6th International Conference on
     Aspect-oriented Software Development, AOSD ’07, pages 212–224, New York, NY, USA, 2007. ACM.
[20] Z. Soh, F. Khomh, Y.-G. Gueheneuc, and G. Antoniol. Towards understanding how developers spend their
     e↵ort during maintenance activities. In Reverse Engineering (WCRE), 2013 20th Working Conference on,
     pages 152–161, Oct 2013.

[21] M.-A. D. Storey, D. Čubranić, and D. M. German. On the use of visualization to support awareness of
     human activities in software development: A survey and a framework. In Proceedings of the 2005 ACM
     Symposium on Software Visualization, SoftVis ’05, pages 193–202, New York, NY, USA, 2005. ACM.
[22] F. Thung, D. Lo, M. H. Osman, and M. R. V. Chaudron. Condensing class diagrams by analyzing design
     and network metrics using optimistic classification. In Proceedings of the 22Nd International Conference on
     Program Comprehension, ICPC 2014, pages 110–121, New York, NY, USA, 2014. ACM.
[23] W. M. P. van der Aalst. Process Mining: Discovery, Conformance and Enhancement of Business Processes.
     Springer Publishing Company, Incorporated, 1st edition, 2011.
[24] B. van Dongen, A. de Medeiros, H. Verbeek, A. Weijters, and W. van der Aalst. The prom framework: A
     new era in process mining tool support. In G. Ciardo and P. Darondeau, editors, Applications and Theory
     of Petri Nets 2005, volume 3536 of Lecture Notes in Computer Science, pages 444–454. Springer Berlin
     Heidelberg, 2005.
[25] A. von Mayrhauser and A. Vans. From code understanding needs to reverse engineering tool capabilities. In
     Computer-Aided Software Engineering, 1993. CASE ’93., Proceeding of the Sixth International Workshop
     on, pages 230–239, Jul 1993.
[26] A. von Mayrhauser and A. Vans. Program comprehension during software maintenance and evolution.
     Computer, 28(8):44–55, Aug 1995.


                                                       6