<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>What Makes a Good Empirical Software Engineering Thesis?: Some Advice</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sira Vegas</string-name>
          <email>svegas@fi.upm.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Escuela Técnica Superior de Ingenieros Informáticos, Universidad Politécnica de Madrid, Madrid</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>An empirical software engineering (ESE) PhD thesis has some special features that make it slightly different from a thesis in any other field. One of the differences is the intensive use of empirical studies in an ESE dissertation. This talk starts by giving students advice on what makes a good ESE PhD thesis in the form of a list of do's and don'ts. The keynote then discusses the different types of empirical study that can be used (surveys, case studies and experiments). Finally, it focuses on one specific type of empirical study: controlled experiments. Experimentation is a risky business, and software engineering (SE) has some special features that lead to some experimentation issues being conceived of differently than in other disciplines. Some advice is given on how to analyse SE experiments.</p>
      </abstract>
      <kwd-group>
        <kwd>empirical software engineering</kwd>
        <kwd>PhD thesis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. WHAT IS AN EMPIRICAL SOFTWARE ENGINEERING THESIS?</title>
      <p>A few years ago, PhD theses did not usually include
empirical validation. However, times are changing and,
nowadays, any PhD thesis must include at least some kind of
empirical validation.</p>
      <p>This does not mean, however, that any PhD thesis with an
empirical component is an empirical software engineering
(ESE) thesis. The key characteristic of ESE theses is that they
have a major empirical component.</p>
      <p>There are different types of ESE PhD theses. There is no
formal typology, but, if we look at the ESE PhD theses written
over the last 20 years, two main types stand out: 1) theses
gathering knowledge about a specific topic by means of
empirical studies, and 2) theses proposing methodological
advances in ESE.</p>
    </sec>
    <sec id="sec-2">
      <title>A. Theses Gathering Knowledge about a Specific Topic</title>
      <p>Examples of such theses are (in chronological order):</p>
      <p>
        Seaman [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] conducts an empirical study whose goal is
to characterize certain aspects of communication among
members of a software development organization.
      </p>
      <p>
        Shull [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] runs a series of experiments to develop a
body of knowledge on reading techniques for
inspections.
      </p>
      <p>
        Thelin [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] reports a series of experiments on a reading
technique called usage-based reading.
      </p>
      <p>Copyright © 2015 for this paper by its authors. Copying permitted for
private and academic purposes.</p>
      <p>
        Carver [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] studies the impact of an inspector’s
characteristics (background and experience) on his or
her effectiveness in a software inspection.
      </p>
      <p>
        In some cases, the thesis may also address methodological
aspects. For example, Seaman addresses the problem of how to
analyse qualitative data, and Shull tackles the problem of
synthesizing the results of the different studies run. In other
cases, techniques that have not been used in SE before are
applied. For example, Carver uses grounded theory [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>B. Theses Proposing Methodological Advances in ESE</title>
      <p>Examples of such theses are (in chronological order):</p>
      <p>
        Daly [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] proposes a multi-method approach to empirical
research, which, when integrated with the technique of
replication, outputs more reliable and generalizable
results.
      </p>
      <p>
        Ciolkowski [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] proposes an approach for the
quantitative aggregation of evidence from controlled
experiments in software engineering (SE).
      </p>
      <p>
        Jedlitschka [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] deals with the problem of reporting the
results in SE experiments so that they are useful for
software managers for decision making.
      </p>
      <p>
        Solari [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] addresses which contents a laboratory
package for running SE experiments should have.
      </p>
      <p>
        Gómez [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] proposes a taxonomy for replications in
SE. The taxonomy is used as a driver to plan the order
in which replications should be run and how their
results should be aggregated.
      </p>
      <p>In all cases, empirical studies are run (or used) as a means
to develop and validate the research performed.</p>
      <p>II. DO’S AND DON’TS</p>
      <p>Irrespective of the thesis type, some general guidelines can
be established around three key issues that a PhD thesis should
address: the problem tackled, the research method used and
publication.</p>
    </sec>
    <sec id="sec-4">
      <title>Regarding the definition of the problem:</title>
      <p>the problem.</p>
      <p>DO clearly specify what problem you are tackling.</p>
    </sec>
    <sec id="sec-5">
      <title>Regarding the scope of the problem:</title>
      <p>DON’T try to solve a huge problem.</p>
      <p>DO define a problem with a scope that is reasonable for the time frame of a PhD thesis (your advisor will help with this).</p>
    </sec>
    <sec id="sec-6">
      <title>Regarding the importance of the problem:</title>
      <p>DON’T assume that your own belief that the problem is
important is enough.</p>
      <p>DO objectively assess the importance of the problem so
that you can establish that the problem exists and that it
is important. There are several ways to do this. One way
is by citing other authors who state that the problem is
important. Another is by citing numbers taken from a
reliable source (for example, “the testing process needs
improvement as the process in place fails to detect X%
of the defects before the software is delivered to the
users”).</p>
      <p>DON’T take it for granted that people will trust you
when you say that the problem has not yet been solved.</p>
      <p>DO demonstrate that your work contributes to the
advancement of the state of the art/practice (systematic
literature reviews might be helpful here).</p>
    </sec>
    <sec id="sec-7">
      <title>B. Research Method Regarding the selection of the research method:</title>
      <p>DON’T postpone publication until you have finished
your PhD thesis. Although this was the standard
approach years ago, it does not work like that anymore.
Some universities require you to have published at least
one conference or journal paper before the defence of your
thesis.</p>
      <p>DO try to publish results as early as possible (the state
of the art could be a good choice). Of course, this does
not mean that you should publish non-conclusive
results.</p>
      <p>DON’T think that publishing will be a waste of time
that might be better spent on advancing your
research.</p>
      <p>DO consider other options for writing your thesis. Some
universities accept PhD thesis formats other than the
traditional dissertation, for example, a thesis in which each
chapter is styled as a paper. This will speed up the
process of writing your thesis and will be an incentive
for publishing.</p>
      <p>
        There are different types of empirical studies [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]: surveys,
case studies, controlled experiments and quasi-experiments.
All of them can be used in an ESE thesis:
      </p>
      <p>
        A survey is a method for collecting information from or
about people to describe, compare or explain their
knowledge, attitudes and behaviour [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        A case study is an empirical study that draws on
multiple sources of evidence to investigate one instance
(or a small number of instances) of a contemporary SE
phenomenon within its real-life context, especially
when the boundary between phenomenon and context
cannot be clearly specified [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        A controlled experiment (or simply experiment) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] is
an investigation that establishes a particular set of
circumstances (treatments), under a specified protocol
established and controlled by the investigator, to
observe and evaluate implications of the resulting
observations (dependent variables). SE works with
comparative experiments, which implies that: 1) more than
one treatment is established, and 2) the
responses resulting from the differing treatments are
compared with one another [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. The purpose of a
controlled experiment is to establish cause-effect relationships.
      </p>
      <p>
        A quasi-experiment is an experiment where the
assignment of treatments to experimental units
(subjects) has not been randomized [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Assignment is
made by means of self-selection (units choose treatment
for themselves) or administrator selection (researchers
decide which subject should get which treatment).
      </p>
      <p>DON’T start working until you have sketched your
research method. This will save you from wasting time.</p>
      <p>DO explain and properly justify the research method
that you have chosen (remember that ESE PhD theses
have a very strong empirical component; therefore, your
research method should be empirical).</p>
      <sec id="sec-7-1">
        <title>Regarding the research plan:</title>
        <p>DON’T do uncontrolled research.</p>
        <p>DO draw up a research plan. This will help you to apply your method, and keep track of possible deviations in contents and time.</p>
      </sec>
      <sec id="sec-7-3">
        <title>Regarding the evaluation/validation method:</title>
        <p>DON’T forget that you need to evaluate/validate your proposal. Different types of thesis require different types of evaluation/validation.</p>
        <p>DO choose the empirical study that best fits your
research.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>Regarding the fact that the problem has not yet been solved:</title>
      <sec id="sec-8-1">
        <title>III. TYPES OF EMPIRICAL STUDIES</title>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>C. Publication</title>
    </sec>
    <sec id="sec-10">
      <title>Regarding dissemination of results:</title>
      <p>
        According to Pfleeger [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and Wohlin et al. [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], several
factors should be taken into consideration when deciding the
type of empirical study to be used: 1) how much control the
experimenter has over the study; 2) the degree to which the
researcher can decide which measures are to be collected; 3)
the cost of the investigation; and 4) how easy it is to replicate
the investigation. TABLE I [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] shows how these factors vary
with each empirical study.
      </p>
      <sec id="sec-10-1">
        <title>IV. ISSUES WHEN ANALYZING EXPERIMENTS</title>
        <p>Controlled experiments are very common in SE today. However, analysing them is a challenging, error-prone activity. Some common pitfalls that should be avoided are discussed next.</p>
      </sec>
    </sec>
    <sec id="sec-11">
      <title>A. One- vs. Two-Tailed Tests</title>
      <p>Using one-tailed tests implies predicting the direction of the
effect. One-tailed tests are more powerful than two-tailed tests
(we need a smaller test statistic to find a significant result).
However, if the result of a one-tailed test is in the opposite
direction to what you expected, you cannot reject the null
hypothesis, and you will have to disregard the result.</p>
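      <p>The trade-off can be illustrated with a minimal Python sketch. It assumes a test statistic that follows a standard normal distribution under the null hypothesis and a predicted positive effect; the function name p_values and the example statistic 1.8 are illustrative assumptions, not values from any real experiment.</p>
      <preformat>
```python
from statistics import NormalDist


def p_values(z):
    """Return (one_tailed, two_tailed) p-values for a z statistic.

    The one-tailed test assumes the predicted direction is a positive effect.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    one_tailed = 1.0 - std_normal.cdf(z)
    two_tailed = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    return one_tailed, two_tailed


# Hypothetical statistic in the predicted direction, then in the opposite one.
for z in (1.8, -1.8):
    one, two = p_values(z)
    print(f"z={z:+.1f}  one-tailed p={one:.3f}  two-tailed p={two:.3f}")
```
      </preformat>
      <p>Under these assumptions, a statistic of 1.8 is significant at the 0.05 level with the one-tailed test but not with the two-tailed test, whereas a statistic of -1.8 gives a one-tailed p-value close to 1, so the null hypothesis cannot be rejected.</p>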
    </sec>
    <sec id="sec-12">
      <title>B. Matching Data Analysis and Experimental Design</title>
      <p>Data analysis is driven by the experimental design. Issues
such as the scale used to measure the treatments and dependent
variables, the number of factors and whether the experiment
has a between- or within-subjects design will determine the
particular data analysis technique to be applied.</p>
      <p>However, the choice of data analysis technique and/or
statistical model is sometimes not straightforward. Complex
designs may require the addition of some extra factors (and
possibly interactions) to the statistical model. Take, for
example, designs with blocking variables: the blocking
variables and their interactions with treatments have to be
added as factors to the statistical model. Another example is
crossover designs: the order in which subjects apply treatments
(sequences) and the times at which each treatment is applied
(periods) have to be added to the analysis as factors.</p>
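      <p>As a rough illustration of why the blocking variable belongs in the statistical model, the following Python sketch computes the treatment F ratio for a randomized complete block design twice: once with the block as a factor and once ignoring it. It is a simplified sketch that assumes one observation per block and treatment, so the block-by-treatment interaction cannot be separated from the error term. The function name and the data (four experience levels as blocks, two techniques as treatments) are hypothetical.</p>
      <preformat>
```python
def treatment_f_ratios(data):
    """data[b][t] is the response of block b under treatment t.

    Returns (F with the block in the model, F ignoring the block).
    """
    n_blocks, n_treat = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n_blocks * n_treat)
    treat_means = [sum(row[t] for row in data) / n_blocks for t in range(n_treat)]
    block_means = [sum(row) / n_treat for row in data]

    # Sums of squares of the additive model: response = treatment + block + error.
    ss_total = sum((y - grand) ** 2 for row in data for y in row)
    ss_treat = n_blocks * sum((m - grand) ** 2 for m in treat_means)
    ss_block = n_treat * sum((m - grand) ** 2 for m in block_means)
    ss_error = ss_total - ss_treat - ss_block

    df_treat = n_treat - 1
    df_error = (n_treat - 1) * (n_blocks - 1)
    f_blocked = (ss_treat / df_treat) / (ss_error / df_error)

    # Ignoring the block pushes its variability into the error term.
    df_error_unblocked = n_treat * (n_blocks - 1)
    f_unblocked = (ss_treat / df_treat) / ((ss_error + ss_block) / df_error_unblocked)
    return f_blocked, f_unblocked


# Hypothetical defects found: rows are blocks, columns are treatments.
data = [[10, 12], [20, 23], [30, 32], [40, 43]]
f_blocked, f_unblocked = treatment_f_ratios(data)
print(round(f_blocked, 2), round(f_unblocked, 2))
```
      </preformat>
      <p>With these made-up numbers the treatment F ratio is about 75 when the block is part of the model and below 1 when it is left out, because the large differences between blocks are then counted as error.</p>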
    </sec>
    <sec id="sec-13">
      <title>C. What to Do when Test Assumptions Are Not Met</title>
      <p>Parametric tests are more powerful than non-parametric
tests and are capable of analysing several factors and their
interactions. But the data do not always meet the assumptions
of parametric tests (typically normality and/or homogeneity of
variances). When they do not, data transformation and robust
tests are alternatives to non-parametric tests.</p>
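      <p>A small Python sketch of data transformation follows. It assumes a strictly positive, right-skewed response variable and uses a logarithmic transformation, which is only one of several possible transformations. The data and the helper sample_skewness are hypothetical and serve only to show the effect on the shape of the distribution.</p>
      <preformat>
```python
from math import log
from statistics import fmean


def sample_skewness(values):
    """Moment-based skewness: 0 for symmetric data, positive for a long right tail."""
    centre = fmean(values)
    m2 = fmean([(v - centre) ** 2 for v in values])
    m3 = fmean([(v - centre) ** 3 for v in values])
    return m3 / m2 ** 1.5


# Hypothetical task completion times (minutes) with a long right tail.
times = [1, 2, 2, 3, 3, 4, 5, 8, 15, 40]
log_times = [log(t) for t in times]

print("skewness before:", round(sample_skewness(times), 2))
print("skewness after: ", round(sample_skewness(log_times), 2))
```
      </preformat>
      <p>For this made-up sample the skewness falls from above 2 to below 1 after the transformation. The parametric test would then be run on the transformed values, and its assumptions should still be checked on them.</p>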
    </sec>
    <sec id="sec-14">
      <title>D. Effect Size</title>
      <p>Statistical significance indicates whether the observed
effect is likely to be the result of the treatments rather than of
sampling error. It gives no
indication of how big the difference between treatments is. For
relatively large sample sizes, even very small differences between
treatments may be statistically significant. If we want to know
whether the differences between treatments are large enough to
be of practical importance, we need a measure of effect size.</p>
      <p>There are different types of effect size measures.</p>
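      <p>As an illustration, one widely used measure for comparing two treatments is the standardized mean difference known as Cohen's d. The text does not single out any particular measure, so choosing Cohen's d here is an assumption, and the scores in the sketch are hypothetical.</p>
      <preformat>
```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference between two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical effectiveness scores for two treatments.
technique_a = [2, 4, 6]
technique_b = [1, 3, 5]
print(cohens_d(technique_a, technique_b))
```
      </preformat>
      <p>In this toy example the means differ by 1 and the pooled standard deviation is 2, so d is 0.5, that is, the treatments differ by half a standard deviation.</p>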
    </sec>
    <sec id="sec-15">
      <title>E. Power Analysis</title>
      <p>A priori power analysis is used before the experiment is run
to calculate the minimum sample size required for detecting an
effect of a given size. Of course, a bigger sample size will be
needed to detect small effects than medium or large effects.</p>
      <p>Post-hoc power analysis determines the power of a given
study assuming that the effect size observed in the sample is equal
to that of the population. While the utility of a priori power analysis is
universally accepted, the usefulness of post-hoc power analysis
is controversial (post-hoc power is a direct function of the p-value
obtained, so it adds no information beyond the significance test itself).</p>
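      <p>A priori power analysis can be sketched in a few lines of Python. The sketch assumes two independent groups of equal size, a two-tailed test, a standardized effect size (difference in means divided by the standard deviation) and the normal approximation to the test statistic. The function name and the default significance level and power are assumptions chosen for illustration.</p>
      <preformat>
```python
from math import ceil
from statistics import NormalDist


def required_n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate subjects per group needed to detect a standardized effect."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha / 2.0)  # two-tailed critical value
    z_power = std_normal.inv_cdf(power)
    return ceil(2.0 * ((z_alpha + z_power) / effect_size) ** 2)


# Conventional small, medium and large standardized effects.
for d in (0.2, 0.5, 0.8):
    print(d, required_n_per_group(d))
```
      </preformat>
      <p>Under this approximation, detecting a standardized effect of 0.2 requires 393 subjects per group, compared with 63 for an effect of 0.5 and 25 for an effect of 0.8. An exact calculation based on the t distribution gives slightly larger numbers.</p>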
      <sec id="sec-15-1">
        <title>ACKNOWLEDGMENT</title>
        <p>Research funded by the Spanish Ministry of Economy and
Competitiveness research grant TIN2014-60490-P.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Carver</surname>
          </string-name>
          .
          <source>The Impact of Background and Experience on Software Inspections</source>
          ,
          <source>PhD Thesis</source>
          . Department of Computer Science, University of Maryland,
          <source>Technical Report CS-TR-4476</source>
          ,
          <year>April 2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ciolkowski</surname>
          </string-name>
          .
          <article-title>An Approach for Quantitative Aggregation of Evidence from Controlled Experiments in Software Engineering</article-title>
          .
          <source>PhD Theses in Experimental Software Engineering</source>
          , Vol.
          <volume>42</volume>
          . Fraunhofer Verlag,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.W.</given-names>
            <surname>Daly</surname>
          </string-name>
          .
          <article-title>Replication and a Multi-Method Approach to Empirical Software Engineering Research</article-title>
          ,
          <source>PhD Thesis</source>
          . Department of Computer Science, University of Strathclyde,
          <year>March 1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Fink</surname>
          </string-name>
          .
          <article-title>The Survey Handbook, 2nd edition</article-title>
          . SAGE,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B.G.</given-names>
            <surname>Glaser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.L.</given-names>
            <surname>Strauss</surname>
          </string-name>
          .
          <article-title>The Discovery of Grounded Theory: Strategies for Qualitative Research</article-title>
          . Aldine de Gruyter,
          <year>1967</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Jedlitschka</surname>
          </string-name>
          .
          <article-title>An Empirical Model of Software Managers' Information Needs for Software Engineering Technology Selection</article-title>
          .
          <source>PhD Theses in Experimental Software Engineering</source>
          , Vol.
          <volume>28</volume>
          . Fraunhofer Verlag,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R.O.</given-names>
            <surname>Kuehl</surname>
          </string-name>
          .
          <article-title>Design of Experiments: Statistical Principles of Research Design and Analysis, 2nd edition</article-title>
          . Brooks/Cole Cengage Learning,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.L.</given-names>
            <surname>Pfleeger</surname>
          </string-name>
          .
          <article-title>Experimental design and analysis in software engineering part 1-5</article-title>
          . ACM Sigsoft Software Engineering Notes,
          <volume>19</volume>
          (
          <issue>4</issue>
          ):
          <fpage>16</fpage>
          -
          <lpage>20</lpage>
          ,
          <volume>20</volume>
          (
          <issue>1</issue>
          ):
          <fpage>22</fpage>
          -
          <lpage>26</lpage>
          ,
          <volume>20</volume>
          (
          <issue>2</issue>
          ):
          <fpage>14</fpage>
          -
          <lpage>16</lpage>
          ,
          <volume>20</volume>
          (
          <issue>3</issue>
          ):
          <fpage>13</fpage>
          -
          <lpage>15</lpage>
          ,
          <volume>20</volume>
          (
          <issue>5</issue>
          ):
          <fpage>14</fpage>
          -
          <lpage>17</lpage>
          .
          <year>1994-1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Runeson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Höst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.W.</given-names>
            <surname>Rainer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Regnell</surname>
          </string-name>
          .
          <source>Case Study Research in Software Engineering. Guidelines and Examples</source>
          . Wiley,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>O.S.</given-names>
            <surname>Gómez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Juristo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vegas</surname>
          </string-name>
          .
          <article-title>Understanding Replication of Experiments in Software Engineering: A Classification</article-title>
          .
          <source>Information and Software Technology</source>
          ,
          <volume>56</volume>
          (
          <issue>8</issue>
          ):
          <fpage>1033</fpage>
          -
          <lpage>1048</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>C.</given-names>
            <surname>Seaman</surname>
          </string-name>
          .
          <source>Organizational Issues in Software Development: An Empirical Study of Communication</source>
          ,
          <source>PhD Thesis</source>
          . Department of Computer Science, University of Maryland,
          <source>Technical Report CS-TR-3726</source>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>W.R.</given-names>
            <surname>Shadish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.D.</given-names>
            <surname>Cook</surname>
          </string-name>
          , D.T. Campbell.
          <article-title>Experimental and Quasiexperimental Designs for Generalized Causal Inference, 2nd edition</article-title>
          .
          <source>Cengage Learning</source>
          , Inc.
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>F.J.</given-names>
            <surname>Shull</surname>
          </string-name>
          .
          <article-title>Developing Techniques for Using Software Documents: A Series of Empirical Studies</article-title>
          .
          <source>Ph.D. Thesis</source>
          . Department of Computer Science, University of Maryland.
          <source>AAI9921012</source>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Solari</surname>
          </string-name>
          .
          <article-title>Identifying Experimental Incidents in Software Engineering Replications</article-title>
          .
          <source>In Proceedings of the International Symposium on Empirical Software Engineering and Measurement (ESEM'13)</source>
          , pp.
          <fpage>213</fpage>
          -
          <lpage>222</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>T.</given-names>
            <surname>Thelin</surname>
          </string-name>
          .
          <article-title>Empirical Evaluations of Usage-Based Reading and Fault Content Estimation for Software Inspections</article-title>
          ,
          <source>PhD Thesis</source>
          . Department of Communication Systems. Lund Institute of Technology,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>C.</given-names>
            <surname>Wohlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Runeson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Höst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.C.</given-names>
            <surname>Ohlsson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Regnell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wesslén</surname>
          </string-name>
          . Experimentation in Software Engineering. Springer,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>