=Paper=
{{Paper
|id=None
|storemode=property
|title=Automated Quality Defect Detection in Software Development Documents
|pdfUrl=https://ceur-ws.org/Vol-708/sqm2011-dautovic-et-al-11-autoQualityDefectDetect.pdf
|volume=Vol-708
}}
==Automated Quality Defect Detection in Software Development Documents==

Andreas Dautovic, Reinhold Plösch
Institute for Business Informatics - Software Engineering
Johannes Kepler University Linz
Altenberger Straße 69, 4040 Linz, Austria
andreas.dautovic | reinhold.ploesch@jku.at

Matthias Saft
Corporate Technology
Siemens AG
Otto-Hahn-Ring 6, 81739 Munich, Germany
matthias.saft@siemens.com
Abstract—Quality of software products typically has to be assured throughout the entire software development life-cycle. However, software development documents (e.g. requirements specifications, design documents, test plans) are often not as rigorously reviewed as source code, although their quality has a major impact on the quality of the evolving software product. Due to the narrative nature of these documents, more formal approaches beyond software inspections are difficult to establish. This paper presents a tool-based approach that supports the software inspection process in order to determine defects of generally accepted documentation best practices in software development documents. By means of an empirical study we show how this tool-based approach helps to accelerate inspection tasks and facilitates gathering information on the quality of the inspected documents.

Keywords-quality defect detection; software development document; tool-based approach; software inspection

I. INTRODUCTION

Software quality assurance aims at ensuring explicitly or implicitly defined quality goals for a software product. Assuring the quality of a software product basically deals with the fulfillment of specified functional and quality requirements, where the checks are often realized by static and dynamic testing of the software product. Software development documents like requirements specifications, which define how to build the right software product, or design documents, which define how to build the software product right, are also an essential part of the entire software product. However, they are often not treated with the same enthusiasm as source code. Consequently, software bugs are fixed in a later phase of the software product life-cycle, which leads to increased costs for software changes [1]. For instance, a cost/benefit model reveals that the introduction of design inspections can save 44 percent of defect costs compared to testing alone [2]. Therefore, to positively influence the development of a software product, quality assurance also has to systematically deal with the quality of software development documents.

Natural language is a commonly used representation for software development documents. However, as a result of its informal nature, natural language text can easily lead to inadequate or poor project documentation, which makes software hard to understand, change or modify. Based on a comprehensive literature study, Chen and Huang [3] identified five quality problems of software documentation:

• Documentation is obscure or untrustworthy.
• System documentation is inadequate, incomplete or does not exist.
• Documentation lacks traceability, as it is difficult to trace back to design specifications and user requirements.
• Changes are not adequately documented.
• Documentation lacks integrity and consistency.

In order to improve the overall quality of natural language project documents throughout the software life-cycle, the use of inspections is generally accepted. Since the introduction of inspections in the mid-1970s by Fagan [4], some modifications have been made to the original process. Improved reading techniques, including checklist-based reading [5] [6], usage-based reading [6] [7] and perspective-based reading [8] [9], are nowadays available for checking the consistency and completeness of natural language texts. However, analyses [10] [11] show that currently available inspection methods are mainly used for source code reviews. This is surprising and can be explained by the lack of tools that fully support software inspections [12] [13], especially in dealing with specific artifact types and locating potential defects in these artifacts. Furthermore, high inspection costs due to the resource-intensive nature of reviews and tedious searching, sorting or checking tasks often restrain the application of software inspections [11].

In this paper we present a tool-based approach that tries to identify potential document quality defects. This tool-based analysis relies on best practices for software documentation. In section II we give an overview of related work in the context of software inspection and document quality defect management tools. Section III shows how documentation best practices can be used to identify document quality defects. In section IV we present our tool-based document quality defect detection approach. Section V gives an overview of the results of an empirical study where we used
our approach to detect document quality defects in real-world project documentation. Finally, in section VI we give a conclusion and discuss further work.

II. RELATED WORK

In this section we give an overview of existing tools that can be used in the document inspection process. As there are different criteria for categorizing and distinguishing inspection tools [12] [13], we focus on tools that directly address defect detection, i.e. tools that enable locating potential quality defects in the documents. In the following, we also discuss work from software engineering domains apart from software inspections that enables comprehensive document quality analysis and assessment. However, tools that support e.g. the collaborative inspection process or process improvement are out of the scope of this work and will not be discussed in this section.

Wilson, Rosenberg and Hyatt [14] present an approach for the quality evaluation of natural language software requirements specifications, introducing a quality model containing eleven quality attributes and nine quality indicators. Furthermore, a tool called ARM (Automatic Requirements Measurement) is described, which enables performing analyses of natural language requirements against the quality model with the help of quality metrics. Lami and Ferguson [15] describe a methodology for the analysis of natural language requirements based on a quality model that addresses the expressiveness, consistency and completeness of requirements. Moreover, to provide support for the methodology on the linguistic level of requirements specifications, they present the tool QuARS (Quality Analyzer of Requirements Specifications) [16]. Further tools that also support the automatic analysis of natural language requirements documents are described e.g. by Jain, Verma, Kass and Vasquez [17] and Raven [18]. However, all these tools are limited to the analysis of requirements specifications that are available as plain text documents. Although the quality of a software project strongly depends on its requirements, there are a number of additional document types and formats that have to be considered throughout the software development life-cycle.

Farkas, Klein and Röbig [19] describe an automated review approach for ensuring standard compliance of multiple software artifacts (e.g. requirements specifications, UML models, SysML models) for embedded software using a guideline checker called Assessment Studio. The tool performs checks on XML-based software artifacts by using rules formalized in LINQ (Language Integrated Query) [20]. As they use XML as a common file basis, their approach is not limited to one specific document type or format. Moreover, traceability checks of multiple software artifacts are facilitated. Nödler, Neukirchen and Grabowski [21] describe a comparable XQuery-based Analysis Framework (XAF) for assuring the quality of various software artifacts. XAF enables the specification of XQuery analysis rules based on standardized queries and pattern matching expressions. In contrast to the approach presented in [19], XAF uses a facade layer for transforming XQuery rules to the individual XML representation of the underlying software artifact. As a result of this layered architecture, XAF enables the creation of reusable analysis rules that are independent from the specific target software artifact.

III. DOCUMENTATION BEST PRACTICES FOR MEASURING DOCUMENT QUALITY

International documentation and requirements specification standards like NASA-STD-2100-91 [22], IEEE Std 830-1998 [23], IEEE Std 1063-2001 [24], ISO/IEC 18019:2004 [25], and ISO/IEC 26514:2008 [26] provide best practices and guidelines for information required in software documentation. Most of these documentation standards focus on guidelines for technical writers and editors producing manuals targeted towards end users. Hargis et al. [27] focus on quality characteristics and distinguish nine quality characteristics of technical information, namely “task orientation”, “accuracy”, “completeness”, “clarity”, “concreteness”, “style”, “organization”, “retrievability”, and “visual effectiveness”. Moreover, they provide checklists and a procedure for reviewing and evaluating technical documentation according to these quality characteristics. In order to determine the quality of project documents, Arthur and Stevens [28] identified in their work four characteristics (“accuracy”, “completeness”, “usability”, and “expandability”) that are directly related to the quality of adequate documentation. Nevertheless, documentation quality is difficult to measure. Therefore, Arthur and Stevens [28] refined each documentation quality attribute to more tangible documentation factors, which can be measured by concrete quantifiers. In other words, similar to static code analysis, some quality aspects of project documentation can be determined by means of metrics. Moreover, we think that violations of documentation best practices or generally accepted documentation guidelines can also serve as measurable quantifiers. Consequently, violations of defined rules, which represent such best practices or guidelines, can be used for determining quality defects of documentation.

So far we have identified and specified more than 60 quantifiable documentation rules. Most of these document quality rules cover generally accepted best practices according to the documentation and requirements standards mentioned above. In order to get a better understanding of document quality rules, we show four typical examples. Furthermore, we try to emphasize the importance of these rules for software projects, as well as the challenges of checking them automatically.

A. Adhere to document naming conventions

Each document name has to comply with naming conventions. The usage of document naming conventions helps recognizing the intended and expected content of a document from its name, e.g., requirements document, design specification, test plan. A consistent naming scheme for documents is especially important in large-scale software projects. However, defining generally accepted naming conventions for arbitrary projects is not always simple and requires support for easy configuration of a specific project.
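To illustrate how such a check can be made configurable per project, the following minimal Java sketch validates file names against a project-specific regular expression. The class name, the example convention and the sample file names are our own illustrative assumptions, not the actual implementation described in this paper.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of a configurable naming-convention check. The
// regular expression would come from a project-specific settings file,
// e.g. encoding "<type>_<title>_v<major>.<minor>.doc(x)".
public class NamingConventionCheck {

    private final Pattern convention;

    public NamingConventionCheck(String projectSpecificRegex) {
        this.convention = Pattern.compile(projectSpecificRegex);
    }

    /** Returns true if the document file name follows the convention. */
    public boolean isCompliant(String fileName) {
        return convention.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        // Example convention: REQ|DES|TST prefix, a title, and a version id.
        NamingConventionCheck check = new NamingConventionCheck(
            "(REQ|DES|TST)_[A-Za-z0-9-]+_v\\d+\\.\\d+\\.docx?");
        System.out.println(check.isCompliant("REQ_OrderManagement_v1.2.docx")); // true
        System.out.println(check.isCompliant("notes.docx"));                    // false
    }
}
```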
B. Each document must have an author

Each document must explicitly list its authors, as the content of documents without explicitly specified authors cannot be traced back to its creators. This is important e.g., for requirements documents, in order to clarify ambiguous specifications with the authors. However, identifying document content particles describing author names is difficult and needs sophisticated heuristics.
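One plausible shape of such a heuristic, sketched below in Java: trust the file's meta-information first, then fall back to scanning paragraphs for keyword patterns. Both the keyword list and the name pattern are illustrative assumptions, not the tool's actual heuristics.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative author-detection heuristic: check the document's
// meta-information, then look for lines such as "Author: Jane Doe".
public class AuthorCheck {

    private static final Pattern AUTHOR_LINE =
        Pattern.compile("(?i)(author|authors|prepared by)\\s*[:\\-]\\s*(\\p{Lu}[\\p{L}.\\- ]+)");

    public static boolean hasAuthor(String metadataAuthor, List<String> paragraphs) {
        if (metadataAuthor != null && !metadataAuthor.isBlank()) {
            return true; // explicit author in the document meta-information
        }
        for (String paragraph : paragraphs) {
            Matcher m = AUTHOR_LINE.matcher(paragraph);
            if (m.find()) {
                return true; // keyword indicating an author name was found
            }
        }
        return false; // rule violated: no author could be identified
    }
}
```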
C. Ensure that each figure is referenced in the text

If a figure is not referenced in the text, this reference might either be missing or the intended reference might be wrong. Typically, many figures are not self-explanatory and have to be described in the text. It is good style (e.g., in a software design document) to explain a UML sequence diagram or a class diagram. In order to make this explanation readable and consistent, it must always be clear which specific UML artifacts are explained in the text.
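A simplified, string-based Java sketch of this check follows; a real implementation would work on extracted document elements rather than raw text, so the method and its reference pattern are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of the figure-reference check: every figure number
// that has a caption must be mentioned in the body text, e.g. as
// "Figure 3" or "Fig. 3".
public class FigureReferenceCheck {

    public static List<Integer> unreferencedFigures(List<Integer> figureNumbers,
                                                    String bodyText) {
        List<Integer> unreferenced = new ArrayList<>();
        for (int number : figureNumbers) {
            boolean referenced =
                bodyText.matches("(?s).*\\b(Figure|Fig\\.)\\s*" + number + "\\b.*");
            if (!referenced) {
                unreferenced.add(number); // potential defect: figure never mentioned
            }
        }
        return unreferenced;
    }
}
```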
D. Avoid duplicates in documents

Within one document, duplicated paragraphs exceeding a defined length should be avoided and explicitly be referenced instead, as duplicates make it difficult to maintain the document content. Therefore, word sequences of a specified length that are similar (e.g., defined by a percent value) to other word sequences in the same document violate this rule. However, the suitable length of the word sequence strongly depends on the document type and differs from project to project.
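As an illustration of this rule, the following Java sketch slides a window of words over the text and reports window pairs whose word-by-word similarity reaches a configurable threshold. The quadratic windowing scheme and the similarity measure are our own simplifications, not the tool's actual algorithm.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified duplicate detector: window length and similarity threshold
// (e.g. 0.8 = 80 percent matching words) are the configurable settings.
public class DuplicateCheck {

    public static List<int[]> findDuplicates(String text, int windowLength,
                                             double minSimilarity) {
        String[] words = text.toLowerCase().split("\\s+");
        List<int[]> duplicates = new ArrayList<>();
        for (int i = 0; i + windowLength <= words.length; i++) {
            for (int j = i + windowLength; j + windowLength <= words.length; j++) {
                int matches = 0;
                for (int k = 0; k < windowLength; k++) {
                    if (words[i + k].equals(words[j + k])) {
                        matches++;
                    }
                }
                if ((double) matches / windowLength >= minSimilarity) {
                    duplicates.add(new int[] { i, j }); // word offsets of both sequences
                }
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        String text = "the server stores all session data locally "
                    + "and later the server stores all session data remotely";
        for (int[] pair : findDuplicates(text, 6, 0.8)) {
            System.out.println("similar sequences at word offsets "
                + pair[0] + " and " + pair[1]);
        }
    }
}
```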
IV. THE AUTOMATED DOCUMENT QUALITY DEFECT DETECTION APPROACH

As shown in section III, the identification of documentation defects in software development documents can rely on finding violations of document quality rules, which represent generally accepted documentation best practices and guidelines. However, manual checks of these rules can be very resource and time consuming, especially in large-scale software projects. Due to this, we developed a document quality defect detection tool, which checks software development documents against implemented document quality rules.

Similar to existing static code analysis suites for source code, our tool analyzes document information elements to find out whether documents adhere to explicitly defined documentation best practices. In contrast to the approaches and tools mentioned in section II, our document quality defect detection tool is not restricted to elementary lexical or linguistic document content analysis. Furthermore, it is also not limited to specific software development artifacts but covers the range from requirements across system, architecture and design, up to test specifications. The introduction of open, standardized document formats like Office Open XML [29] or Open Document Format [30] has enabled the extraction of document information in a way that goes beyond unstructured text. In fact, our tool facilitates, besides content quality analysis, also the check of document metadata like directory information and version information. The use of standardized and structured document models also allows checking more specific content attributes based on the meaning of specific document particles. Moreover, it enables traceability checks to prove document information for project-wide consistency. In order to support inspectors in their task, the tool can therefore be used to automatically check cross-references within the document under inspection as well as from the document under inspection to other related documents. The latter aspect is especially important for identifying missing or broken relations between e.g., design documents and software requirements specifications.

Figure 1. Conceptual overview of the document quality defect detection tool usage process

Fig. 1 gives a conceptual overview of our developed tool and describes the process of quality defect detection. First of all, the tool user (e.g. project manager, software inspector, quality manager) has to choose the software development documents as well as the document quality rules respectively rule sets, which will be used for the defect detection. If necessary, the selected rules can be configured by the user to meet defined and project-specific documentation requirements. After the user has started the tool, all relevant information is extracted from the selected software development documents. The information is represented as a hierarchical data structure containing information on specific document elements (e.g. sections, paragraphs, references, figures, sentences). In a next step, each rule is applied to this document information to check whether the document adheres to the rule conditions. Finally, all detected potential document quality defects are linked to the original document to provide a comprehensive quality defect detection report to the user.
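The following Java sketch shows one possible shape of such a rule-checking core: a hierarchical document model whose nodes are traversed and handed to rules, in the spirit of the visitor pattern discussed below. All type names are illustrative assumptions, not the actual API of the tool.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (not the tool's real API): a hierarchical document
// model whose elements are visited by quality rules.
abstract class DocumentElement {
    final List<DocumentElement> children = new ArrayList<>();

    // Pass this element to the rule, then traverse all child elements.
    void accept(DocumentRule rule) {
        rule.check(this);
        for (DocumentElement child : children) {
            child.accept(rule);
        }
    }
}

class Section extends DocumentElement { String heading; }
class Paragraph extends DocumentElement { String text; }
class Figure extends DocumentElement { String caption; }

// A document quality rule inspects elements and collects violations.
interface DocumentRule {
    void check(DocumentElement element);
    List<String> violations();
}

// Example rule in the spirit of AES (Avoid Empty Sections), simplified to
// "a section must contain at least one paragraph".
class AvoidEmptySections implements DocumentRule {
    private final List<String> found = new ArrayList<>();

    public void check(DocumentElement element) {
        if (element instanceof Section) {
            Section section = (Section) element;
            boolean hasText = section.children.stream()
                .anyMatch(c -> c instanceof Paragraph);
            if (!hasText) {
                found.add("Empty section: " + section.heading);
            }
        }
    }

    public List<String> violations() { return found; }
}
```

In the real tool, extraction from Office Open XML or the binary Office formats would populate such a structure before the selected rules are applied to it.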
As software development documents can exist in many different document formats, considering each format for automated quality defect detection with our tool is challenging. Due to this, as a first step we developed a tool that is applicable to Office Open XML documents [29] and Microsoft Office Binary Format documents [31], as these are among the most commonly used formats for software development documents. Furthermore, these standardized document formats provide access to the particular document information that is required, as some rules are applied to specific document elements. Consequently, a traversal strategy to visit all these elements is needed. Due to this, we have implemented the visitor pattern [32] [33] for our tool. Using this pattern, which provides a methodology to visit all nodes of a hierarchical data structure, enables applying rules to each specified element of the extracted document information. A similar rule-checking mechanism is used by the Java source code measurement tool PMD [34]. However, instead of using software development rules for source code, our document quality defect detection tool uses this methodology in order to check software development documents by means of easily adaptable and highly configurable document quality rules.

V. FEASIBILITY STUDY OF THE AUTOMATED DOCUMENT QUALITY DEFECT DETECTION APPROACH

This section describes the results of a feasibility study conducted to test whether an automated quality defect detection tool using 24 document quality rules is able to reveal additional documentation defects human inspectors did not find before. Furthermore, the trustworthiness of these quality rules is shown as well as the effort to fix documentation defects and rule settings. Before that, we give in (A) a brief description of the software project documentation we used in our study and in (B) an overview of the applied document quality rules.

A. Description of the used software project documentation

In order to get appropriate software development documents for our feasibility study, we used project documents of a real-world software project. The software as well as the associated documentation was developed by Siemens AG Corporate Technology in several iterations using a semi-formal development process. The software is used for monitoring the communication and control flow in distributed applications. As our focus is on the quality of software development documents, further technical or organizational background information on the project is not necessary for the purpose of our study.

TABLE I. SOFTWARE PROJECT DOCUMENT FORMAT TYPES

document format type    no. documents in project
DOC                     124
XLS                     11
PPT                     37
PDF                     7

The entire documentation of the software project consists of 179 software development documents of four different document format types. As shown in Table I, more than two-thirds of them are Microsoft Office Word Binary Format documents. However, some of them are internal documents with intentionally lower quality. Due to this, we used a set of 50 officially published Microsoft Word project documents consisting of different document types (requirements specifications, system specifications, concept analyses, market analyses, delta specifications, function lists, user documentation, etc.) as objects of analysis for our feasibility study. These 50 documents should meet high documentation quality standards and had already been checked by software inspectors. Therefore, they can be considered to be of a high maturity level and ready to be checked by our tool.

B. Applied document quality rules

In the following, we list and give a short description of all document quality rules we used in our study and motivate their importance for software development documents. However, the used rule settings are not discussed in this work.

• ADNC - Adhere to Document Naming Conventions: Each software development document name has to comply with explicitly specified naming conventions, as project members can better grasp the document content if documents follow a defined project-wide document naming scheme.
• ADNS - Avoid Deeply Nested Sections: Documents should not contain a deeply nested section hierarchy. Particularly in software development documents the content structure should be flat, simple and clear in order to support clarity.
• ADUP - Avoid Duplicates in Document: Similar to duplicated source code, within software development documents duplicated paragraphs exceeding a defined length (number of characters) should be omitted and are better referenced explicitly. Otherwise the document content will be more difficult to maintain.
• AES - Avoid Empty Sections: Each section of a software development document must contain at least one sentence, otherwise the content may not be complete or lacks clarity and conciseness.
• AESD - Avoid Extremely Small Documents: Extremely small software development documents are indicators of unfinished content or of a bad project-wide document structure, as small software development documents might be better combined into larger documents of reasonable size.
• AIDOC - Avoid Incomplete Documents: Particularly in later phases of the software development process documents should contain all information that is required. Therefore, documents that are formally incomplete, i.e., contain phrases like “TBD” or “TODO”, are not yet complete by definition.
• ALS - Avoid Long Sentences: Identify those sentences in a project document that exceed a given length, where length is expressed by the number of words contained in the sentence. Long sentences harm the readability of e.g. requirements specifications or test plans and are therefore indicators of difficult-to-understand content of software development documents.
• AULD - Avoid Ultra Large Documents: Ultra-large software development documents should be avoided as they are more difficult to maintain and to keep consistent. Furthermore, it is harder to check whether all information needed is present.
• ARHT - Avoid Repeated Heading Text: In a software development document, paragraphs of a section should not only consist of a copy of the heading text, as this is an indicator of an underspecified and incomplete section.
• ASPE - Avoid Spelling Errors: Each software development document should be free of spelling errors, regardless of whether it is written in one language or contains a mix of languages.
• ATSS - Adhere To Storage Structure: Each software development document should be put in the right place of the storage system, i.e. it should typically be stored in a directory according to project-wide rules (typically for different types and/or phases of the software development process).
• DESOR - Define Expected Skills Of Readers: For each software development document the skills of readers should be explicitly defined, as depending on the skills of readers, the content of the software development document has to be presented in a different way. So, depending on the expected skills of the readers, data might be presented more formally using e.g., UML, or must definitely avoid any formalisms.
• DMHA - Document Must Have Author: Each software development document must explicitly list its authors, as in the case of changes each document has to be traceable to its creators. Therefore, this rule is violated if there is no author defined in the document meta-information and no keyword is found that indicates the existence of an author name.
• DMHV - Document Must Have Version Id: Similar to source code, each document in a software project should have an explicit version identifier.
• DMHVH - Document Must Have Version History: In order to keep software development documents comprehensible, each document must provide a version history that roughly outlines the changes over time (versions) during the entire software development process.
• DMS - Document Must have a State: Each software development document should outline its defined state (e.g., draft, in review, final, customer approved), in order to present the current document state to the project members.
• ECNF - Ensure Continuous Numbering of Figures: In software development documents ascending numbering of figures improves the quality of documentation, as this contributes to a higher consistency and comprehensibility of the documents.
• ECNT - Ensure Continuous Numbering of Tables: In software development documents an ascending numbering of tables improves the document quality, as this leads to higher consistency and comprehensibility of the document.
• EFRT - Ensure that each Figure is Referenced in the Text: Each figure has to be referenced in the text of software development documents; otherwise it is incoherent or might be ambiguous.
• ETRT - Ensure that each Table is Referenced in the Text: Each table has to be referenced in the text of software development documents; otherwise it is incoherent or might be ambiguous.
• FMHC - Figures Must Have a Caption: Each figure in a software development document must have a caption in order to express the visualized topics linguistically; otherwise it may be ambiguous for the readers.
• PIFF - Provide Index For Figures: If a software development document contains figures, there must be an index listing all figures in order to keep information quickly retrievable for all project members.
• PIFT - Provide Index For Tables: If a software development document contains tables, there must be an index listing all tables in order to keep the information quickly retrievable for all project members.
• TMHC - Tables Must Have a Caption: Each table in a software development document must have a caption in order to express the visualized data of the table linguistically; otherwise it may be ambiguous for the readers.
C. Violations

In this section we give an overview of the results of our software development document defect detection analysis.

TABLE II. DEFECT DETECTION TOOL RESULTS

no. documents analyzed              50
no. document quality rules          24
total no. violations found          8,955
avg. false positive rate per rule   0.172

In our feasibility study 50 project documents were automatically checked by 24 document quality rules, which revealed a total number of 8,955 violations. For these findings we determined an average false positive rate per rule of 17.2 percent and an average false negative rate per rule of 0.4 percent. False positive findings are (in our case) over-detected defects that are no documentation defects in the sense of human software inspectors. On the other hand, false negative findings are defects that have not been found by our tool but that are definitely documentation defects in the sense of human software inspectors.
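As a small worked example of how these two rates can be computed per rule from inspector-labeled findings, consider the Java sketch below. The numbers are invented, and the denominator chosen for the false negative rate is one plausible reading of the definitions above, not the study's documented procedure.

```java
// Illustrative computation of a single rule's false positive and false
// negative rates, following the definitions in the text. All figures here
// are made up for the example; they are not the study's data.
public class RuleRates {
    public static void main(String[] args) {
        int reported = 100;  // violations reported by the rule
        int confirmed = 83;  // reported violations confirmed by inspectors
        int missed = 2;      // real defects inspectors found that the rule missed

        // Over-detection: share of reported findings that are no real defects.
        double falsePositiveRate = (double) (reported - confirmed) / reported;

        // Under-detection (one plausible definition): share of all real
        // defects that the rule did not find.
        double falseNegativeRate = (double) missed / (confirmed + missed);

        System.out.printf("false positive rate: %.3f%n", falsePositiveRate); // 0.170
        System.out.printf("false negative rate: %.3f%n", falseNegativeRate); // 0.024
    }
}
```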
Figure 2. Rule violations per document quality rule distribution

TABLE III. RESULTS OVERVIEW PER RULE

rule     no. violations   false positive rate   false negative rate
ADNC     11               0                     0
ADNS     14               0                     0
ADUP     823              0                     0
AES      337              0.033                 0
AESD     30               0.933                 0
AIDOC    0                0                     0.020
ALS      49               0                     0
AULD     9                0                     0.100
ARHT     13               0.846                 0
ASPE     5,956            0.602                 0
ATSS     25               0                     0
DESOR    50               0                     0
DMHA     0                0                     0.040
DMHV     21               0                     0
DMHVH    14               0                     0
DMS      48               0                     0.040
ECNF     43               0.488                 0
ECNT     46               0.326                 0
EFRT     106              0.274                 0
ETRT     75               0.160                 0
FMHC     329              0.365                 0
PIFF     50               0                     0
PIFT     50               0                     0
TMHC     856              0.093                 0

During our investigations we also found out that the violations per rule are unequally distributed. As shown in Table III, the rules ADUP, AES, ASPE, FMHC and TMHC identified more than 300 violations each. Due to this, we accumulated the number of violations found by these five rules and compared it with the total number of violations. Consequently, as can be seen in the ABC analysis diagram in Fig. 2, we revealed that these five document quality rules are responsible for more than 90 percent of all thrown violations.

Figure 3. Trustworthiness of all applied document quality rules

D. Trustworthiness

The trustworthiness of a rule specifies how reliable the detection of a violation is. We classify trustworthiness into:

• very low: There is too much over- and/or under-detection in order to rely on the results.
• low: There is significant over- and under-detection.
• medium: Most issues are found, but there is over-detection.
• high: Almost no over- and under-detection. Very reliable findings.
• very high: No known over- or under-detection. Absolutely reliable findings.

As a result of this classification scheme, a main factor to determine the trustworthiness of a document quality rule is its false positive rate. In addition, we also take false negative findings, as far as they can be identified, and known weaknesses of the rule implementation into account, i.e., a rule with a false positive rate of 0.0 and/or a false negative rate of 0.0 does not implicitly have to have a trustworthiness rating of ‘very high’.

As shown in Fig. 3, we rated the trustworthiness of the document violations for eight of our 24 applied rules as ‘very high’, i.e., these violations are very reliable.

TABLE IV. ‘VERY HIGH’ TRUSTWORTHY RULES

ADNC    Adhere to Document Naming Conventions
ADNS    Avoid Deeply Nested Sections
ADUP    Avoid Duplicates in Document
ALS     Avoid Long Sentences
ATSS    Adhere To Storage Structure
DMHVH   Document Must Have Version History
PIFF    Provide Index For Figures
PIFT    Provide Index For Tables

Furthermore, we also determined for eight document quality rules a ‘high’ trustworthiness, as we identified almost no over- or under-detection for these rules. As a result, more than two-thirds of our rules are identified to be ‘very high’ or ‘high’ trustworthy.
TABLE V. ‘HIGH’ TRUSTWORTHY RULES

AES     Avoid Empty Sections
AIDOC   Avoid Incomplete Documents
AULD    Avoid Ultra Large Documents
DESOR   Define Expected Skills Of Readers
DMHA    Document Must Have Author
DMHV    Document Must Have Version Id
ETRT    Ensure that each Table is Referenced in the Text
TMHC    Tables Must Have a Caption

However, our feasibility study also revealed three rules with a ‘low’ trustworthiness.

TABLE VI. ‘LOW’ TRUSTWORTHY RULES

AESD    Avoid Extremely Small Documents
ARHT    Avoid Repeated Heading Text
ASPE    Avoid Spelling Errors

These rules have to deal with a false positive rate of more than 60 percent; e.g. most of the ASPE violations are thrown as domain-specific terms or abbreviations are falsely identified as misspelled words. Nevertheless, some of the violations of these three rules are informative, so we think that, although there is much over- and under-detection, they can be categorized as ‘low’ trustworthy. Moreover, we think that small rule improvements, e.g. adding the usage of a domain-specific dictionary for the ASPE rule, would lead to a higher trustworthiness.

E. Effort to fix defects

The effort to fix true positive findings specifies how much effort is needed to remove a defect (qualitatively):

• low: Only some local lines in a document have to be changed.
• medium: Document-wide changes are necessary.
• high: Project-wide document changes are necessary.

Figure 4. ‘Effort to change defects’ of all applied document quality rules

As shown in Fig. 4, most violations thrown by 19 of our 24 applied rules affect only some lines in the documents, i.e. these defects can be quickly corrected and represent easy wins. Moreover, for fixing the defects of four of our rules we determined that document-wide changes are required.

TABLE VII. ‘MEDIUM’ EFFORT TO FIX DEFECTS

ADNS    Avoid Deeply Nested Sections
ADUP    Avoid Duplicates in Document
AESD    Avoid Extremely Small Documents
DMHVH   Document Must Have Version History

Nevertheless, during our feasibility study we also determined that all true positive AULD violations lead to project-wide document changes. In this case, high effort is needed as an ultra-large document has to be split into separate documents. Furthermore, all references are affected and have to be checked for correctness. It is very hard to determine whether defects of a specific rule generally affect only some lines in a document or the entire software project, as e.g. small changes in some lines can also lead to broken references in other documents.

F. Effort to change settings

The effort to adapt configuration settings of the rules to the needs of the specific project specifies how much effort has to be spent on adapting the rule configurations before the document defect detection tool can be applied:

• low: No or only very small adaptations are necessary in a settings file.
• medium: Some lines have to be changed in a settings file. Some knowledge of the analyzed project documents is necessary to define, e.g., suitable regular expressions.
• high: Settings files have to be changed considerably. Detailed information on the project document content and structure is necessary to define, e.g., suitable regular expressions.

Figure 5. ‘Effort to change settings’ of all applied document quality rules

As stated in Fig. 5, more than two-thirds of all applied document quality rules do not need considerable effort to be suitably configured. In order to correctly configure six of our rules it is necessary to have some further knowledge of specifying correct regular expressions.

TABLE VIII. ‘MEDIUM’ EFFORT TO CHANGE SETTINGS

ADNC    Adhere to Document Naming Conventions
ASPE    Avoid Spelling Errors
ATSS    Adhere To Storage Structure
DMHV    Document Must Have Version Id
EFRT    Ensure that each Figure is Referenced in the Text
ETRT    Ensure that each Table is Referenced in the Text

Furthermore, it is required to have an overview of the document structure and document content. However, to correctly configure the DESOR rule (effort to change settings = ‘high’), there must also be some knowledge of the used expressions and languages in order to identify and extract the specific document content properties that define the skills of readers.
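To make the notion of a 'medium' settings change concrete, the following Java sketch loads a project-specific regular expression for the ADNC rule from a properties file, so adapting the rule means editing a single line. The file layout and the property key are invented for illustration; the paper does not describe the tool's actual settings format.

```java
import java.io.FileReader;
import java.io.IOException;
import java.util.Properties;
import java.util.regex.Pattern;

// Hypothetical illustration of a 'medium' settings change: the ADNC rule
// reads its naming-convention pattern from a settings file containing,
// for example, the line
//   adnc.namingConvention=(REQ|DES|TST)_[A-Za-z0-9-]+_v\\d+\\.\\d+\\.docx?
public class RuleSettings {
    public static Pattern loadNamingConvention(String settingsFile) throws IOException {
        Properties settings = new Properties();
        try (FileReader reader = new FileReader(settingsFile)) {
            settings.load(reader);
        }
        return Pattern.compile(settings.getProperty("adnc.namingConvention"));
    }
}
```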
VI. CONCLUSION AND FURTHER WORK

Empirical studies show that tool support can significantly increase the performance of the overall software inspection process [10] [11] [12] [13]. However, most available software inspection tools are optimized for code inspections and usually provide support for plain text documents only. Due to this, they are inflexible with respect to different artifact types and limit inspectors in their work. For natural language text, inspection tools cannot fully replace human inspectors in detecting defects. Nevertheless, software inspection tools can be used to make defect detection tasks easier [11]. Encouraged by this, we developed a tool-based quality defect detection approach to support the inspection process by checking documentation best practices in software development documents.

International documentation and specification standards [22] [23] [24] [25] [26] define a set of generally accepted documentation best practices. Furthermore, checklists and reviewing procedures [27] are widely used, as well as documentation quantifiers, in order to check specific documentation characteristics representing quality aspects [28]. As a result of these studies we came to the conclusion that measurable document quality rules expressing best practices can also be used to detect defects in software development documents and to help enhance documentation quality. So far we have implemented a document quality defect detection tool, which is applicable to Office Open XML documents [29] and Microsoft Office Binary Format documents [31]. The tool allows checking whether software development documents adhere to explicitly defined document quality rules. In a feasibility study we showed that our automatic defect detection tool is capable of finding significant additional documentation defects that had been overlooked by human inspectors.

During our analysis, we automatically checked 50 Microsoft Office Word documents of a real-world software project with 24 document quality rules. The tool revealed 8,955 violations with an average false positive rate per rule of 17.2 percent and an average false negative rate per rule of 0.4 percent. As our study shows, two-thirds of all applied document quality rules were rated with a ‘high’ or ‘very high’ trustworthiness. Furthermore, it has been pointed out that most of the violations found can be easily removed (effort to change defect = ‘low’), as they often only affect some few lines.

In our feasibility study we determined that nearly 75 percent of all rules did not need any further configuration changes before they could be suitably applied to software development documents. Nevertheless, seven rules had to be adapted to project-specific document conventions before they could be applied. In the case of the project documentation used for our study the configuration of these rules took us approximately six hours, as we were not familiar with the conventions defined for the document naming and content structure. However, we saw that after the rules had been suitably configured, the trustworthiness of the rule violations rose considerably, i.e., the configuration effort paid off well.

In a next step, we will apply our document quality defect detection tool to the documents of additional software projects to improve the implementation of the rules, with an emphasis on reducing the false positive rate, and to validate the results of our feasibility study in more detail. Moreover, as we have seen that some of our rules are applicable to most technical documents, we also want to implement some document quality rules that are even more specific for software development documents. For instance, we will add rules that deal with domain-specific terms and glossaries used in software documents or the traceability of references between various software development documents (of different software life-cycle phases).

We currently also work on transferring Adobe PDF documents in a way that the already developed document quality rules for Office Open XML documents and Microsoft Office Binary Format documents can be used without changes. As a result, we think that the definition of an abstract document structure that separates the rules from the underlying software artifacts is essential. Consequently, this would enable a much easier development of rules that can be applied to elements of a general document model, as there is no need to deal with the complexity of specific document formats for the rule development. Furthermore, we recognized that our rules are too loosely grouped. Based on our experience with rules in the context of code quality [35], we will develop a quality model that allows a systematic clustering of rules by means of quality attributes. This will give us the possibility to evaluate document quality on more abstract levels, like readability or understandability of a document.

ACKNOWLEDGMENTS

We would like to thank Siemens AG Corporate Technology for supporting our empirical investigations by providing us with software development documentation data in order to conduct our feasibility study and test our approach.
REFERENCES

[1] B. W. Boehm, Software Engineering. Barry W. Boehm's lifetime contributions to software development, management, and research. Hoboken, N.J., Wiley-Interscience, 2007.
[2] L. Briand, K. E. Emam, O. Laitenberger, and T. Fussbroich, Using Simulation to Build Inspection Efficiency Benchmarks for Development Projects. International Conference on Software Engineering, IEEE Computer Society, 1998.
[3] J. C. Chen and S. J. Huang, “An empirical analysis of the impact of software development problem factors on software maintainability,” in Journal of Systems and Software. Elsevier Science Inc., 2009, vol. 82, pp. 981-992.
[4] M. E. Fagan, “Design and code inspections to reduce errors in program development,” in IBM Systems Journal. vol. 15 (3), 1976, pp. 182-211.
[5] T. Gilb and D. Graham, Software Inspection. Addison-Wesley Publishing Company, 1993.
[6] T. Thelin, P. Runeson, and C. Wohlin, “An Experimental Comparison of Usage-Based and Checklist-Based Reading,” in IEEE Trans. Software Engineering, vol. 29, no. 8, Aug. 2003, pp. 687-704.
[7] T. Thelin, P. Runeson, C. Wohlin, T. Olsson, and C. Andersson, “Evaluation of Usage-Based Reading - Conclusions after Three Experiments,” in Empirical Software Engineering: An Int'l J., vol. 9, no. 1, 2004, pp. 77-110.
[8] F. Shull, I. Rus, and V. Basili, “How Perspective-Based Reading Can Improve Requirements Inspections,” in Computer, vol. 33, no. 7, July 2000, pp. 73-79.
[9] J. Carver, F. Shull, and V. R. Basili, “Can Observational Techniques Help Novices Overcome the Software Inspection Learning Curve? An Empirical Investigation,” in Empirical Software Engineering: An Int'l J., vol. 11, no. 4, 2006, pp. 523-539.
[10] O. Laitenberger and J.-M. DeBaud, “An encompassing life cycle centric survey of software inspection,” in Journal of Systems and Software. vol. 50, 2000, pp. 5-31.
[11] S. Biffl, P. Grünbacher, and M. Halling, “A family of experiments to investigate the effects of groupware for software inspection,” in Automated Software Engineering, Kluwer Academic Publishers, vol. 13, 2006, pp. 373-394.
[12] H. Hedberg and J. Lappalainen, A Preliminary Evaluation of Software Inspection Tools, with the DESMET Method. Fifth International Conference on Quality Software, IEEE Computer Society, 2005, pp. 45-54.
[13] V. Tenhunen and J. Sajaniemi, An Evaluation of Inspection Automation Tools. International Conference on Software Quality, Springer-Verlag, 2002, pp. 351-362.
[14] W. M. Wilson, L. H. Rosenberg, and L. E. Hyatt, Automated quality analysis of Natural Language Requirement specifications. PNSQC Conference, October 1996.
[15] G. Lami and R. W. Ferguson, “An Empirical Study on the Impact of Automation on the Requirements Analysis Process,” in Journal of Computer Science and Technology. vol. 22, 2007, pp. 338-347.
[16] G. Lami, QuARS: A tool for analyzing requirements. Software Engineering Institute, 2005.
[17] P. Jain, K. Verma, A. Kass, and R. G. Vasquez, Automated review of natural language requirements documents: generating useful warnings with user-extensible glossaries driving a simple state machine. Proceedings of the 2nd India software engineering conference, ACM, 2009, pp. 37-46.
[18] Raven: Requirements Authoring and Validation Environment, www.ravenflow.com.
[19] T. Farkas, T. Klein, and H. Röbig, “Application of Quality Standards to Multiple Artifacts with a Universal Compliance Solution,” in Model-Based Engineering of Embedded Real-Time Systems. International Dagstuhl Workshop, Dagstuhl Castle, Germany, 2007.
[20] Microsoft Developer Network: The LINQ Project, http://msdn.microsoft.com/en-us/netframework/aa904594.aspx
[21] J. Nödler, H. Neukirchen, and J. Grabowski, “A Flexible Framework for Quality Assurance of Software Artefacts with Applications to Java, UML, and TTCN-3 Test Specifications,” in Proceedings of the 2009 International Conference on Software Testing Verification and Validation. IEEE Computer Society, 2009, pp. 101-110.
[22] NASA Software Documentation Standard, NASA-STD-2100-91. National Aeronautics and Space Administration, NASA Headquarters, Software Engineering Program, July 1991.
[23] IEEE Recommended Practice for Software Requirements Specifications, IEEE Std 830-1998. 1998.
[24] IEEE Standard for Software User Documentation, IEEE Std 1063-2001. 2001.
[25] ISO/IEC 18019:2004: Software and system engineering - Guidelines for the design and preparation of user documentation for application software, 2004.
[26] ISO/IEC 26514:2008: Systems and software engineering - Requirements for designers and developers of user documentation, 2008.
[27] G. Hargis, M. Carey, A. K. Hernandez, P. Hughes, D. Longo, S. Rouiller, and E. Wilde, Developing Quality Technical Information: A Handbook for Writers and Editors, 2nd ed. IBM Press, 2004.
[28] J. D. Arthur and K. T. Stevens, Document Quality Indicators: A Framework for Assessing Documentation Adequacy. Virginia Polytechnic Institute, State University, 1990.
[29] ISO/IEC 29500:2008. Information technology - Document description and processing languages - Office Open XML File Formats, 2008.
[30] ISO/IEC 26300:2006. Information technology - Open Document Format for Office Applications (OpenDocument), 2006.
[31] Microsoft Office Binary File Format: http://www.microsoft.com/interop/docs/OfficeBinaryFormats.mspx
[32] P. Buchlovsky and H. Thielecke, “A Type-theoretic Reconstruction of the Visitor Pattern,” in Electronic Notes in Theoretical Computer Science. vol. 155, 2006, pp. 309-329.
[33] B. C. Oliveira, M. Wang, and J. Gibbons, The visitor pattern as a reusable, generic, type-safe component. Proceedings of the 23rd ACM SIGPLAN conference on Object-oriented programming systems languages and applications. ACM, 2008, pp. 439-456.
[34] T. Copeland, PMD Applied. Centennial Books, 2005.
[35] R. Plösch, H. Gruber, A. Hentschel, Ch. Körner, G. Pomberger, S. Schiffer, M. Saft, and S. Storck, “The EMISQ Method and its Tool Support - Expert Based Evaluation of Internal Software Quality,” in Journal of Innovations in Systems and Software Engineering. Springer London, vol. 4(1), March 2008.