<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Requirements Communication in Issue Tracking Systems in Four Open-Source Projects</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Thorsten Merten</string-name>
          <email>thorsten.merten@h-brs.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bastian Mager</string-name>
          <email>bastian.mager.2010w@informatik.h-brs.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paul Hübner</string-name>
          <email>huebner@informatik.uni-heidelberg.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thomas Quirchmayr</string-name>
          <email>quirchmayr@informatik.uni-heidelberg.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Barbara Paech</string-name>
          <email>paech@informatik.uni-heidelberg.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Simone Bürsner</string-name>
          <email>simone.buersner@h-brs.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Bonn-Rhein-Sieg University of Applied Sciences, Dept. of Computer Science</institution>
          ,
          <addr-line>Sankt Augustin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Heidelberg, Institute of Computer Science</institution>
          ,
          <addr-line>Heidelberg</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <fpage>114</fpage>
      <lpage>125</lpage>
      <abstract>
        <p>[Context and motivation] Communication in distributed software development is usually supported by issue tracking systems. Within these systems, most of the communication is stored as unstructured natural language text. The natural language text, however, contains much information with respect to requirements management, e.g. discussion, clarification and prioritization of features, bugs, and refactorings. [Question] This paper investigates the information stored in the issue tracking systems of four different open-source projects. It categorizes the text and reports on the distribution of issue types and information types. [Principal ideas/results] A manual analysis of 80 issues, using a grounded approach, is conducted to derive a taxonomy of issue types and information types. Subsequently, the taxonomy is used as a codebook, to manually categorize and structure the text in another 120 issues. [Contribution] The first contribution of this paper is the taxonomy of issue and information types and the second contribution is an in-depth analysis of the natural language data and the communication. This analysis showed, for example, that information with respect to prioritization and scheduling can be found in natural language data, whether the ITS supports such tasks in a structured way or not.</p>
      </abstract>
      <kwd-group>
        <kwd>Issue Tracking Systems</kwd>
        <kwd>Requirements Communication</kwd>
        <kwd>Empirical Study</kwd>
        <kwd>Grounded Method</kwd>
        <kwd>Content Analysis</kwd>
        <kwd>Issue Types</kwd>
        <kwd>Information Types</kwd>
        <kwd>Taxonomy</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Most software projects use an issue tracking system (ITS) to support software
engineering (SE) work [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. In an ITS, development, bug fixing, and refactoring tasks are tracked and
assigned. The information is stored in different data fields such as title, description, and
comments. Most of these data fields contain unrestricted natural language (NL) text,
complemented by meta data such as user names or timestamps
(see Figure 1). In contrast to the meta data, the ITS NL data mixes all kinds of
information, from feature requests and bug reports to rationales, implementation ideas, and social
interaction.
      </p>
      <p>Copyright © 2015 by the authors. Copying permitted for private and academic purposes.
This volume is published and copyrighted by its editors.</p>
      <p>
        If requirements engineering (RE) is practiced ad hoc, software requirements
documentation and plans may exist, but this information is typically out of date [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
Therefore, the ITS is often the sole means to document which software features have been
implemented or deployed in a certain software version and which features still need to
be scheduled for implementation. However, ITSs are neither used nor designed as
documentation systems, although they contain valuable information for the software
development process and for retrospective analysis. ITS NL data is typically unstructured,
and most of the meta data does not add meaning to the NL. Additionally, the meta data
that does add semantics, like the common feature vs. bug categorization of issues, is often
unreliable. For example, Herzig et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] found that about a third of this meta data
is stored wrongly in open-source software (OSS) projects.
      </p>
      <p>This suggests that the NL data used in issues needs to be analyzed more
thoroughly. This paper analyzes ITS NL data and focuses on the following questions:
a) what kinds of issues and what information can be found in ITS NL data, b) how are
these information types related to software requirements, and c) how do aspects of the
communication differ between OSS projects?</p>
      <p>The next section describes the study design, including the research questions, the data
collection procedures, and the data analysis procedures. Section 3 presents the results
for each research question. Sections 4 and 5 describe the threats to validity and
similar studies. Finally, Section 6 concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>Study Design</title>
      <sec id="sec-2-1">
        <title>Research Questions</title>
        <p>In our research questions we use the concepts of information type and issue type. An
information type describes the information that is carried by one or more sentences in
ITS NL data (see the rightmost nodes in Figure 4). An issue type is considered a context
or frame for information types (see the leftmost nodes in Figure 4). For example, the information
type request can be used in different issue types, such as a feature request, a request for
fixing a bug, or a request for refactoring. The main study goals are broken down into the
following three research questions:</p>
        <p>RQ1: What are the issue types and information types captured in ITS NL data?</p>
        <p>RQ2: What is the distribution of different issue types and information types?</p>
        <p>RQ3: Are issue types and information types used differently in different projects?</p>
      </sec>
      <sec id="sec-2-2">
        <title>Case and Subject Selection</title>
        <p>
          We sampled from the following four OSS projects, since information from OSS projects
is easily available. Furthermore, large volumes of data and communication
activities are stored in ITSs if people work in a distributed manner [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]:
        </p>
        <p>– c:geo, an Android app to play the real-world treasure hunting game Geocaching</p>
        <p>– Lighttpd, an HTTP web server for static and dynamic content delivery</p>
        <p>– Radiant, an extensible content management system application</p>
        <p>– Redmine, an extensible issue tracking system with web and REST API interfaces</p>
        <p>To answer RQ3, we chose projects with very different characteristics (see Table 1).
These characteristics influence the NL data. For example, in lighttpd, issues are more
detailed and technical than in c:geo. We assume that this is due to the different audiences
and the technical nature of a server application.</p>
        <p>Table 1: Project characteristics (rows: software type, audience, ITS, ITS usage, ITS size in number of issues, main programming language, project size in LOC).</p>
        <p>We extracted the first 1000 issues<sup>3</sup> from each project and divided the issues into three
sets: I<sub>s</sub> includes all issues with fewer than the median number of comments per project
(on average 0–2), I<sub>m</sub> includes all issues with the median to the mean number of
comments (on average 2–5), and I<sub>l</sub> includes all issues with more than the mean number
of comments (on average 6 to about 70). Most issues fall into the classes I<sub>s</sub> and I<sub>m</sub>, as shown
in Figure 2.</p>
        <p>To develop the taxonomy, 80 issues were randomly drawn from the I<sub>m</sub> and I<sub>s</sub> sets. For
further analysis, another 120 issues were drawn by two coders, equally over each project
and each set, to make sure that the sample includes all issue sizes and all projects.
Details of the research process are presented in the next section.</p>
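        <p>The partitioning and sampling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the tooling used in the study: the function names, the dictionary of comment counts, and the fixed random seed are all hypothetical, and the boundaries follow the textual definition (fewer than the median, median up to the mean, more than the mean).</p>
        <preformat><![CDATA[
```python
import random
from statistics import mean, median


def partition_by_comment_count(comment_counts):
    """Split the issues of one project into short, medium, and long sets.

    comment_counts maps a (hypothetical) issue id to its number of comments.
    I_s: fewer comments than the median
    I_m: from the median up to and including the mean
    I_l: more comments than the mean
    """
    counts = list(comment_counts.values())
    med = median(counts)
    avg = mean(counts)
    sets = {"I_s": [], "I_m": [], "I_l": []}
    for issue_id, n in comment_counts.items():
        if n < med:
            sets["I_s"].append(issue_id)
        elif n <= avg:
            sets["I_m"].append(issue_id)
        else:
            sets["I_l"].append(issue_id)
    return sets


def draw_sample(sets, per_set, seed=0):
    """Randomly draw up to per_set issues from each set (seed is an assumption)."""
    rng = random.Random(seed)
    return {
        name: rng.sample(ids, min(per_set, len(ids)))
        for name, ids in sets.items()
    }


# Toy data: six issues with a right-skewed comment distribution.
toy_counts = {1: 0, 2: 1, 3: 2, 4: 3, 5: 4, 6: 20}
toy_sets = partition_by_comment_count(toy_counts)
toy_sample = draw_sample(toy_sets, per_set=2)
```
]]></preformat>
        <p>Because comment counts are typically right-skewed, the mean lies above the median, so I<sub>m</sub> is non-empty and I<sub>l</sub> collects the few long discussions.</p>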
        <p>
          To process the data, the General Architecture for Text Engineering (GATE) [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] was
used. GATE was used a) in the grounded approach to calculate inter-rater
agreement, b) to enforce the use of the taxonomy for deeper analysis, and c) for text analysis,
e.g. to retrieve statistical data or to add additional meta data to the text (our own plugins
were used for analysis and the Stanford Parser [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] for annotating the English language).
        </p>
        <p>
          A Taxonomy of Information Types: One of the major problems in this study was to
identify the issue types and information types used in the ITS NL data. Although
earlier studies analyzed ITS NL data, they focused on specific information types, like
discussions [
          <xref ref-type="bibr" rid="ref3 ref6">3,6</xref>
          ] or the categories of issue types [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. In contrast, this study does not
focus on a certain aspect of ITS NL data, but provides a broader taxonomy of the issue
and information types.
        </p>
        <p>
          Therefore, a grounded technique [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] was used to create a taxonomy of issue and
information types. Four of the authors each manually coded all sentences in 20 issues.
During this process, each coder developed his own taxonomy. The only requirements for
the taxonomy development were that a) issue and information types should be
distinguished, and b) each sentence should be coded. Intentionally, all coders used a
two-phase schema. The first phase represents the issue type (e.g. feature-, bug-, or
software development process-related information) and the second the information type
(e.g. functionality or quality request, clarification question, or as-is descriptions). To
consolidate the schemas, we created a large schema out of all issue and information
types from the 80 coded issues. This schema included 14 issue types from the coders c<sub>i</sub>
(5 from c<sub>1</sub>, 3 from c<sub>2</sub>, 3 from c<sub>3</sub>, and 3 from c<sub>4</sub>) and 127 information types
(45 from c<sub>1</sub>, 25 from c<sub>2</sub>, 36 from c<sub>3</sub>, and 21 from c<sub>4</sub>). First,
all synonyms were merged (e.g. one coder named a sentence
suggested solution, another solution, and a third potential solution). The remaining
information types were discussed, and during these discussions the coders found that
information types are sometimes bound to an issue type and sometimes neutral. An example is
the as-is information type. It describes the current status of the software and can be used
in feature-related issue types to describe the context of a new feature, or in bug-related
issue types to describe the problematic behavior as it is in a certain software version.
These neutral information types were simply added to all issue types. The as-is
information type is therefore present in feature- as well as bug-related issue types, as shown in the final
taxonomy in Figure 4. After the consolidation, a codebook was created. The codebook
was tested by each coder on another 4 issues from the I<sub>ml</sub> sets (medium and long issues), with an inter-rater
agreement of 0.9869 (Cohen’s Kappa = 0.4630) for bug-related and 0.8152 (Cohen’s Kappa
= 0.6786) for feature-related codes. The results of this test run, especially the differences,
were discussed again. Finally, the descriptions in the codebook were updated such that
the coders had a common understanding of each issue and information type.
        </p>
        <p><sup>3</sup> Paech et al. suggest that most requirement-related information can be found in early issues [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].</p>
        <p>Further Data Analysis: The taxonomy and codebook were implemented in GATE, and
two of the coders used the schema to code 60 issues each. To maximize the variance in
issue selection, each coder drew 5 issues from each of I<sub>s</sub>, I<sub>m</sub>, and I<sub>l</sub> for each project. Overall,
120 issues and 3167 sentences were analyzed, as summarized in Table 2.</p>
        <p>Table 2: Analyzed data per project (rows: c:geo, lighttpd, Radiant, Redmine, sum; columns: open/closed issues, extracted issues, analyzed issues, analyzed comments<sup>1</sup>, coded sentences).</p>
        <p><sup>1</sup> Each issue consists of one title, one description, and multiple comments (C). Therefore, overall C + 60
ITS data fields were analyzed.</p>
        <p>Results</p>
      </sec>
      <sec id="sec-2-3">
        <title>RQ1: Information Types and Issue Types in ITS Data</title>
        <p>The taxonomy shown in Figure 4 includes 6 issue types and 28 information types. The
information types ITS Management, Clarification, and Rationale are further split into
subtypes. The white fields Unclear/Other/Unknown are for information that could not
be classified by the coders.</p>
        <p>Issue Types: The following issue types were discovered: 1) Feature-Related:
information related to a new software feature or software requirements; 2) Bug-Related:
software failures and problems; 3) Refactoring-Related: software changes that affect
neither the functionalities nor the qualities of the software (besides maintainability);
4) SE Process-Related: discussions about the general SE process, e.g. if a developer
notices that tests should be run more frequently in the project or that documentation should
be relocated; 5) User Problem-Related: problems that are not related to software
development, e.g. a user does not understand a configuration file and asks for help; and
6) Not SE-Related: anything that is not related to software engineering activities, such
as social interaction between developers.</p>
        <p>Information Types: For the issue types Feature-Related, Bug-Related, and
Refactoring-Related, the taxonomy provides detailed information types. Information types that
recur in feature-, bug-, and refactoring-related issues are represented with the same color
in Figure 4.</p>
        <p>Issues generally start with a summary (dark blue) in the title. The summary
describes (certain aspects of) the issue and does not form a whole sentence (e.g. “more
flexible bandwidth limiting”). Some bug-related issues start with an as-is clarification
(light rose) in the title to denote what needs to be fixed, and some feature-related
issues put the request in the title (e.g. “provide an infrastructure for content-filtering”).
An obvious information type is the request (green) itself. A request can be found in all
software engineering-related issue types and is sometimes accompanied by rationale
arguments (brown), emphasizing why the feature should be implemented. Rationales,
however, are generally given later in a comment on the issue, e.g. when a user notices
that more support is needed to get a feature request implemented, or by other users
who express that the issue is important for them as well. We found rationales that gave
arguments and also ones that simply tried to up-vote an issue (+1). Especially in
features, question/answer pairs for clarification (old rose) often occur after the original
feature request. If this happens, more elicitation is needed to understand and implement
the feature request. For bug-related information, we named the clarification phase cause
diagnostics (light green), since the bug descriptions generally did not need clarification,
but the actual cause of the problem had to be found, e.g. by providing reproducibility
information. The as-is status of the software is often used to describe the problem in a
bug. Sometimes even small user stories, consisting of one or two sentences, were given
to clarify or motivate a new feature. Besides understanding the request and
performing the actual implementation, implementation proposals or solution ideas (purple) are
discussed.</p>
        <p>Another common information type is ITS management (yellow), which is used in
bug-, feature-, and refactoring-related issues. ITS management describes NL data concerning
the management of the current issue and is therefore divided into subtypes like
referencing other information, closing the issue, mentioning duplicated issues, or changing
attributes of an issue. Interestingly, this information can be handled by the ITS itself
and is therefore not really necessary in the NL data. Some users, however, prefer not
to use the ITS mechanisms and express duplicates or references in the NL data fields.
In contrast to ITS management, SE process-related information was defined as the NL data
that discusses the SE process as a whole.</p>
        <p>We did not find much information with respect to prioritization. Generally, most
scheduling (magenta) is done in a very pragmatic way. For example, developers comment: “I will look
into this tomorrow” or “We should delay this feature”. On the other hand, information
regarding the implementation status (light blue) is often communicated, e.g. “this was
fixed in update 10” or “I already implemented part X of this issue”.</p>
        <p>Since we were interested in RE aspects of ITS usage, we did not look into the issue
types SE Process-Related, User Problem-Related, and Not SE-Related in the same detail.</p>
      </sec>
      <sec id="sec-2-4">
        <title>RQ2: Distribution of Issue Types and Information Types</title>
        <p>Reporting in detail on each information type would go beyond the scope of this paper,
since many information types did not show patterns in their distribution. We did,
however, create matrices of all issue and information types. These matrices show how
issue and information types are distributed over the NL ITS data fields (title, description,
and comments c<sub>1</sub> . . . c<sub>n</sub>), combinations of the types, and relations between information
and issue types. This data is available for download<sup>4</sup>. The following paragraphs report
on some of the findings from these matrices and focus on feature-related information,
since this is most relevant for RE.</p>
        <p><sup>4</sup> http://www2.inf.fh-bonn-rhein-sieg.de/~tmerte2m</p>
        <p>Issue Types: Table 3 shows how issue types are distributed in each project. The
maximum numbers are printed in bold font. Qualitatively, as expected, most issues include
only bug-related, feature-related, or refactoring-related information. However, in some
issues, aspects of different issue types are discussed. Four issues include both bug- and
feature-related information. An example is c:geo issue #365. It first describes the as-is situation
of a bug. In this particular case, a certain color marking (similar to an icon) is missing in
the application. Then, cause diagnostics of the bug are performed, and during this
discussion, implementation ideas for new features (e.g. user-configurable color markings
and priority handling for markings) are proposed. Although the feature requests are
related to the original bug description, they go beyond the original problem and form new
feature requests. We found this combination only in longer issues (1 in I<sub>m</sub> and 3 in I<sub>l</sub>),
since the discussion starts with one topic and takes some time to drift into other topics.
In 4 issues, the SE process is discussed as a digression from a feature-related discussion.
This combination occurred when the conditions of a certain issue were general enough
to discuss the overall SE process. However, we did not find a single issue that
exclusively discusses the SE process. Another interesting combination is user problems
and feature-related information. It occurred in two ways: firstly, when a user has a
very specific problem that does not actually require software changes, and ideas for
new features then come up; secondly, when users suggest a feature that is already implemented.
The feature request then turns into a user problem, e.g. how to find a certain checkbox
(e.g. Redmine issue #638). In practice, these combinations of issue types in a single issue
suggest that ITSs should offer better refactoring possibilities, e.g. the extraction of a
related issue from one or many comments.</p>
        <p>Surprisingly, Not SE-Related information occurs most often but is at the same time
very short (in number of sentences), mostly because it consists of small acts of politeness between the
stakeholders. For example, users often express thanks for “the great project” or for a “fast
reaction” on feature requests or bugs. Although we did not explicitly analyze the sentiment
in ITS NL data, no coder found any hostile social interaction in the ITS.</p>
        <p>Information Types: Table 4a shows the 20 issue types occurring most often and their
combinations. Bug-, feature-, and refactoring-related overviews are used only in the
title. However, in bugs the as-is situation is sometimes used in the title (16) to describe
the bug, and in features and refactorings a request (e.g. “please add
functionality . . . ”) is sometimes used instead of a short overview (10 and 4).</p>
        <p>Out of the 51 issues with feature-related information, 18 include clarification
questions and 16 include clarification explanations. These information types are used to detail the
feature or the corresponding solution ideas. For about 50% of the issues, no further clarification
was needed. In only 6, we found implementation or solution-related information.
Solution ideas were mostly mentioned before any clarification information. One explanation
for this is that the solution helped to start a discussion. In contrast, almost all bugs (58)
contained cause diagnostics, and in 28 cases technical information, like stack traces or log
files, was added. For 28 bugs, explicit reproducibility information was added.</p>
        <p>Table 3: Distribution of issue types per project (rows: Bug-Related, Feature-Related, Not SE-Related, Refactoring-Related, SE Process-Related, User Problem, Unclear or Unknown, Sum).</p>
        <p>Table 4 (a): Top combinations of issue types (≥ 2 occurrences).</p>
        <p>Other Observations: Some hypotheses evolved during coding, which could partially be
confirmed: firstly, that bug-related issues contain much more technical information
(e.g. source code, stack traces, and log files) than feature-related issues (H1); secondly,
that technical information in feature-related issues is posted later than in bug-related
issues (H2). Whereas H1 seems to be true (642 sentences of technical information in
bugs vs. 58 in features), H2 only partially holds. In features, 20 technical sentences were
found in the descriptions, none in comments c<sub>1</sub> and c<sub>2</sub>, and most of the remainder in comments
c<sub>3</sub> to c<sub>11</sub> (up to c<sub>22</sub>). The situation for bugs was similar: 305 technical sentences in the
description and another 337 in comments c<sub>1</sub> to c<sub>12</sub>. This implies that a large amount of
early technical information may be used as an indication to identify bug-related issues.</p>
        <p>In terms of issue length, feature-related and bug-related issues are roughly of the
same size (up to over 30 comments). Another hypothesis was that bug-related issues
are shorter, since they need to be resolved quickly and generally do not involve as
many users (H3). Although more bug-related issues (39% in I<sub>s</sub>) than feature-related
issues (25% in I<sub>s</sub>) were very short, the same proportion of medium and long issues was found
(25% in I<sub>ml</sub>). However, late comments in feature-related issues are often longer and
include more discussion. Refactoring-related issues are mostly short (no more than 9
comments). We assume that these issues are mostly used as a reminder for the
developers and that no further discussion is necessary.</p>
        <p>
          Herzig et al. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] found that issue types are often classified wrongly. Besides the
issues that contain multiple issue types, as mentioned above, we found only 1 wrongly
classified issue in the projects using the Redmine ITS. It seems that the number of
correctly labeled issues varies between projects, since Herzig et al. studied
other projects than this paper. Furthermore, Redmine itself is an ITS, so its users may
be more disciplined. In the GitHub-based projects, most issues are not categorized at
all, since an issue does not need to be marked as bug or feature. A tag can be assigned
manually, but this is simply not done most of the time. So, besides the project, the architecture
of the ITS seems to have an influence. In practice, the ITS should be chosen and
customized according to the needed meta data, since optional meta data (e.g. tags) is
often omitted. Furthermore, defaults for meta data fields should be chosen wisely: if
an issue is categorized as a bug by default, this may never be changed. We recommend
a neutral category such as “undecided” as the default to prevent such problems.
        </p>
        <p>We also analyzed the ITS NL data for keywords (using e.g. the popular Term
Frequency-Inverse Document Frequency, TF-IDF, metric) to check whether issues can be
easily categorized. An excerpt of promising keywords is shown in Table 4b. For issue types,
some obvious keywords, e.g. “bug”, “problem”, or “feature”, can give a strong hint at
the correct issue type, but there is still a chance of false positives. For example, in Redmine the
keyword “bug” was found in two feature-related issues and only one bug-related issue. Furthermore,
the keywords only occur in a minority of the issues, so that the recall is also low. For
information types, no keywords could be identified.</p>
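        <p>To make the keyword check more concrete, the following sketch computes TF-IDF scores over tokenized texts. It is an illustration only: the weighting variant (length-normalized term frequency, idf = log(N/df)), the function names, and the toy token lists are assumptions, and treating each issue as one document is our simplification rather than a description of the plugins used in the study.</p>
        <preformat><![CDATA[
```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document.

    documents is a list of token lists. Term frequency is normalized by the
    document length; the inverse document frequency is log(N / df).
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    scores = []
    for tokens in documents:
        term_freq = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return scores


def top_keywords(documents, k=3):
    """Return the k highest-scoring terms per document (ties broken by term)."""
    result = []
    for doc_scores in tf_idf(documents):
        ranked = sorted(doc_scores.items(), key=lambda item: (-item[1], item[0]))
        result.append([term for term, _ in ranked[:k]])
    return result


# Hypothetical toy corpus: one bug-like and two feature-like issues.
toy_issues = [
    ["crash", "stack", "trace", "crash"],
    ["please", "add", "option"],
    ["add", "trace", "option", "please"],
]
toy_top = top_keywords(toy_issues, k=1)
```
]]></preformat>
        <p>A term that occurs in every document receives a score of zero under this weighting, which is one reason why generic words are poor discriminators between issue types.</p>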
      </sec>
      <sec id="sec-2-5">
        <title>RQ3: Project Differences</title>
        <p>As shown in Table 3, the 30 analyzed issues for each of the projects c:geo, lighttpd, and
Radiant contain roughly the same number of sentences (592–748). The Redmine issues
are considerably larger, with 1160 sentences. The Redmine project is also older than the
other projects, so one possible explanation is that features and bugs in Redmine are
harder to describe due to the project size. Furthermore, some issues in Redmine were
considerably older than in the other projects; therefore, another explanation is that old
issues (and especially features, see Table 3) get reactivated after some time and need to
be discussed again. An example can be found in issue #285, comment #27: “Holy cow,
this issue is [. . . ] over six years old, and we’re still asking what the feature means?”</p>
        <p>Besides different issue sizes, some information types were also used differently. In
the lighttpd project, 51% of all sentences are bug-related technical information.
Redmine and Radiant have around 15% technical information. c:geo, by contrast,
includes only 2% technical sentences, even though in this project the largest share
of sentences (25%) consisted of bug-related cause diagnostics and
reproducibility information. Our hypothesis is that this is due to the audience and project type.
Lighttpd is a server application, and bug as well as feature issues often include
configuration snippets. Bugs are also reported by technicians who run a server, and therefore
stack traces and log files are often included. c:geo, on the other hand, is mostly for
ordinary users who want to play the geocaching game. They seem to report bugs as well as
feature requests on a higher level of abstraction and do not want to deal with technical
details. In practice, this implies that the content of a (good) feature or bug report largely
depends on the project type and audience.</p>
        <p>Scheduling activities, such as prioritization, also differ between the projects. In
Radiant and c:geo, there is more talk about scheduling (about 30 sentences) than in lighttpd
(7). Redmine (normalized over all issues) lies in between. We think that the high
amount of explicit scheduling mentioned in the ITS NL data is due to the fact that the
GitHub issue tracker does not provide the same flexible mechanisms for scheduling as
the Redmine ITS does. For example, in the Redmine and lighttpd projects, the milestone and
roadmap features of the Redmine ITS are used extensively, and issues get
prioritized and scheduled by assigning them to a certain milestone or software version. In
the GitHub-based projects, this needs to be communicated in the NL. Still, it can be
observed that scheduling is mentioned in the NL in all projects, although the Redmine
ITS offers extensive scheduling features.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Threats to Validity</title>
      <p>
        Construct Validity: Overall, we used the guidelines of Runeson [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] for the analysis.
The taxonomy was created by using a grounded technique [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. To ensure the validity
of the taxonomy, all coders created and discussed a codebook that was used during the
rest of the study. However, the taxonomy is not very fine-grained and may therefore not
be appropriate for other research without modification or addition. The content
analysis was done by two coders without redundancy and thus without a measure of inter-rater agreement. We tried
to minimize this threat with a) test issues (with relatively high inter-rater agreement,
considering that every possible NL sentence was coded), b) extensive discussions on
coding, and c) a codebook. Furthermore, the coders worked in the same room, so that
they could ask each other if a sentence was unclear.
      </p>
      <p>External Validity: We sampled data
from four different OSS projects. These projects represent some characteristics of
software development projects, as discussed in Section 2.2. The results can be transferred to
similar project settings. However, we are well aware that we analyzed only 30 issues
per project due to limited resources. We do not claim that our results are statistically
significant. We think that project factors influencing the use of ITSs and ITS NL data
need to be investigated in depth to make results transferable between different projects.
      </p>
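      <p>The agreement figures from the codebook test combine a high raw agreement with a noticeably lower Cohen’s Kappa. The sketch below shows how both measures are computed for two coders and why they can diverge when one code dominates. It assumes that the reported inter-rater agreement is the observed (percent) agreement; the function names and the toy labels are hypothetical and do not reproduce the study data.</p>
      <preformat><![CDATA[
```python
from collections import Counter


def observed_agreement(labels_a, labels_b):
    """Fraction of sentences to which both coders assigned the same code."""
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa: agreement corrected for agreement expected by chance."""
    n = len(labels_a)
    p_o = observed_agreement(labels_a, labels_b)
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Chance agreement: product of the coders' marginal label proportions.
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Toy example with a dominant code: 20 sentences, one disagreement.
coder_a = ["other"] * 18 + ["request", "request"]
coder_b = ["other"] * 18 + ["request", "other"]
toy_agreement = observed_agreement(coder_a, coder_b)  # 19 of 20 sentences
toy_kappa = cohens_kappa(coder_a, coder_b)            # about 0.64
```
]]></preformat>
      <p>In the toy data the coders agree on 95% of the sentences, but because almost all sentences carry the same code, the chance agreement is already 0.86 and Kappa drops to about 0.64. A skewed code distribution is therefore one plausible reading of a high raw agreement paired with a moderate Kappa.</p>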
    </sec>
    <sec id="sec-4">
      <title>Related Work</title>
      <p>
        One of the first studies, that includes information types in SE was provided by
Kitchenham et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. They defined an ontology that includes multiple SE aspects, such as
product and process information of software maintenance. The ontology is defined on a
document level and does not dig deeper into the content of these documents. Herzig et
al. present different categories for bug-related issue types [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Their study, however,
does not include issues with multiple issue types, and they did not analyze the issues on
a sentence level. Ko et al. analyzed discussions in bug reports [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] in depth. They provide
detailed categories for the discussion elements. No study, however, has analyzed all ITS NL
data on a per-sentence basis.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion and Future Work</title>
      <p>This study presented a taxonomy of issue and information types in ITSs. It also analyzed
the ITS NL data of 120 issues in depth and presented insights from this analysis. In about
50% of the analyzed feature requests, no further clarification of the request
was needed, and a solution was described in only about 50%. This implies that often
a single sentence or a short description can be sufficient to implement a feature. The
study also found that the information types in ITS NL data are influenced at least by the
project type, the audience, and even the technical capabilities of the ITS in use. With the
analyzed data of 120 issues, it was almost impossible to identify keywords that can be
used to find information types. In addition, no clear communication patterns could be found.</p>
      <p>In future work, we will develop methods to (partially) categorize ITS NL data automatically,
e.g. using NL processing methods. Data categorization is a precondition for supporting SE
development tasks and retrospective analysis of ITS data.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work is partly funded by the Bonn-Rhein-Sieg University of Applied Sciences
Graduate Institute.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Cunningham</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maynard</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bontcheva</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Text Processing with GATE (Version 6)</article-title>
          . University of Sheffield Department (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Ernst</surname>
            ,
            <given-names>N.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murphy</surname>
            ,
            <given-names>G.C.</given-names>
          </string-name>
          :
          <article-title>Case studies in just-in-time requirements analysis</article-title>
          .
          <source>In: 2012 2nd IEEE Intl. Workshop on Empirical RE (EmpiRE)</source>
          . pp.
          <fpage>25</fpage>
          -
          <lpage>32</lpage>
          . IEEE (Sep
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Fitzgerald</surname>
            ,
            <given-names>C.E.B.</given-names>
          </string-name>
          :
          <article-title>Structured Discussion and Early Failure Prediction in Feature Requests</article-title>
          . PhD thesis, University College London (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Herzig</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Just</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zeller</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>It's Not a Bug, It's a Feature: How Misclassification Impacts Bug Prediction</article-title>
          .
          <source>In: Proceedings of the 2013 Intl. Conference on Software Engineering (ICSE)</source>
          . pp.
          <fpage>392</fpage>
          -
          <lpage>401</lpage>
          . IEEE Press (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Kitchenham</surname>
            ,
            <given-names>B.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Travassos</surname>
            ,
            <given-names>G.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>von Mayrhauser</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Towards an Ontology of Software Maintenance</article-title>
          .
          <source>J. of SW Maintenance: Research and Practice</source>
          <volume>389</volume>
          (May),
          <fpage>365</fpage>
          -
          <lpage>389</lpage>
          (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Ko</surname>
            ,
            <given-names>A.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chilana</surname>
            ,
            <given-names>P.K.</given-names>
          </string-name>
          :
          <article-title>Design, discussion, and dissent in open bug reports</article-title>
          .
          <source>In: Proceedings of the 2011 iConference</source>
          . pp.
          <fpage>106</fpage>
          -
          <lpage>113</lpage>
          . ACM Press, New York, New York, USA (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Marneffe</surname>
            ,
            <given-names>M.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>MacCartney</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Generating typed dependency parses from phrase structure parses</article-title>
          .
          <source>In: Proceedings of the Sixth Intl. Conference on Language Resources and Evaluation, LREC 2004</source>
          . Lisbon, Portugal (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Neuendorf</surname>
            ,
            <given-names>K.A.</given-names>
          </string-name>
          :
          <article-title>The Content Analysis Guidebook</article-title>
          .
          <source>SAGE Publications</source>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Paech</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hübner</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Merten</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>What are the Features of this Software?</article-title>
          <source>In: ICSEA 2014, The Ninth Intl. Conference on Software Engineering Advances</source>
          . pp.
          <fpage>97</fpage>
          -
          <lpage>106</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Runeson</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Höst</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rainer</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Regnell</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <source>Case Study Research in Software Engineering. Guidelines and Examples</source>
          . Wiley, Hoboken, NJ, USA, 1st edn. (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Skerrett</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          The Eclipse Foundation:
          <source>The Eclipse Community Survey 2011</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>