<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>The Effect of Severity Ratings on Software Developers' Priority of Usability Inspection Results</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>
            <given-names>Asbjørn</given-names>
            <surname>Følstad</surname>
          </string-name>
          <aff id="aff0">
            <institution>SINTEF ICT</institution>
            ,
            <addr-line>Forskningsveien 1, 0314 Oslo</addr-line>
            ,
            <country country="NO">Norway</country>
          </aff>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>2008</year>
      </pub-date>
      <volume>24</volume>
      <abstract>
        <p>Knowledge of the factors that affect developers' priority of usability evaluation results is important in order to improve the interplay between usability evaluation and software development. In the presented study, the effect of usability inspection severity ratings on the developers' priority of evaluation results was investigated. The usability inspection results with higher severity ratings were associated with higher developer priority. This result contradicts Sawyer et al. [7], but is in line with Law's [5, 6] finding related to the impact of user test results. The findings serve as a reminder for HCI professionals to focus on high severity issues.</p>
      </abstract>
      <kwd-group>
        <kwd>Usability evaluation</kwd>
        <kwd>usability inspection</kwd>
        <kwd>developers' priority</kwd>
        <kwd>impact</kwd>
        <kwd>severity ratings</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        Sawyer et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] reported that usability inspectors’ severity ratings had no
effect on the impact of the evaluation results; reported impact
ratios were 72% (low severity issues), 71% (medium severity
issues), and 72% (high severity issues). In contrast to this
finding, Law [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ], in a study
of the impact of user tests, reported a tendency towards higher
severity results having higher impact; reported impact ratios were
26% (minor problems), 42% (moderate problems), 47% (severe
problems). Law’s findings, however, were not statistically
significant [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Hertzum [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] suggested that the effect of severity
classifications may change across the development process, e.g.
high severity evaluation results may have relatively higher impact
in later phases of development. Law’s study was conducted
relatively late in the development process, on the running
prototype of a digital library. Sawyer et al. did not report in which
development phases their usability inspections were conducted.
In order to complement the existing research on the effect of
severity ratings on the impact of evaluation results, an empirical
study of the impact of usability inspection results is presented.
The data of the present study was collected as part of a larger
study reported by Følstad [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], but the results discussed below
have not previously been presented.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. RESEARCH PROBLEM AND</title>
    </sec>
    <sec id="sec-3">
      <title>HYPOTHESIS</title>
      <p>The research problem of the present study was formulated as:
What is the effect of usability inspectors’ severity ratings on
developers’ priority of usability inspection results?
The null hypothesis of the study (no effect of severity ratings)
followed the findings of Sawyer et al., and the alternative
hypothesis (H1) was formulated in line with the findings
presented by Law:
H1: High severity issues will tend to be prioritized higher by
developers than low severity issues.</p>
    </sec>
    <sec id="sec-4">
      <title>3. METHOD</title>
      <p>
        Usability inspections were conducted as group-based expert
walkthroughs [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The objects of evaluation were three mobile
work-support systems for medical personnel at hospitals,
politicians and political advisors, and parking wardens
respectively. All systems were in late phases of development,
running prototypes close to market. The usability inspectors were
13 HCI professionals, all with &gt;1 year of work experience (Mdn=5
years). Each inspector participated in one of three evaluation
groups, one group for each object of evaluation. The
walkthroughs were conducted as two-stage processes where (1)
the individual evaluators noted down usability issues (usability
problems and change suggestions) and (2) a common set of
usability issues was agreed on in the group. All usability issues
were to be classified as either Critical (will probably stop typical
users in using the application to solve the task), Serious (will
probably cause serious delay for typical users …), or Cosmetic
(will probably cause minor delay …). The output of the usability
inspections was one report for each object of evaluation, delivered
to each of the three development teams respectively.
      </p>
      <p>Three months after the evaluation reports had been delivered,
individual interviews were conducted with development team
representatives. The representatives were requested to prioritize
all usability issues according to the following: High (change has
already been done, or will be done no later than six months after
receiving the evaluation report), Medium (change is relevant but
will not be prioritized the first six months), Low (change will not
be prioritized), Wrong (the item is perceived by the developer to
be a misjudgment). In order to align the resulting developers’
priorities with the impact ratio definitions of Law and Sawyer et
al., the priority High was recoded as “Change”, and the priorities
Medium, Low and Wrong were recoded as “No change”.</p>
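The recoding step described above can be sketched as follows. This is a minimal illustration only; the per-issue priorities in <monospace>sample</monospace> are hypothetical and are not data from the study.

```python
# Recoding of developers' priorities into the binary impact categories
# used by Law and Sawyer et al.: High -> "Change", all others -> "No change".
RECODE = {"High": "Change", "Medium": "No change",
          "Low": "No change", "Wrong": "No change"}


def impact_ratio(priorities):
    """Impact ratio = number of issues recoded as "Change" / total issues."""
    changed = sum(1 for p in priorities if RECODE[p] == "Change")
    return changed / len(priorities)


# Hypothetical priorities for eight usability issues (illustration only).
sample = ["High", "Low", "Medium", "High", "Wrong", "Low", "High", "Low"]
ratio = impact_ratio(sample)  # 3 of 8 issues recoded as "Change" -> 0.375
```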
    </sec>
    <sec id="sec-5">
      <title>4. RESULTS</title>
      <p>The evaluation groups generated a total of 167 usability issues. The
three objects of evaluation were associated with 44, 61, and 62
usability issues respectively. The total impact ratio (number of
issues associated with change/total number of issues [following
        <xref ref-type="bibr" rid="ref7">7</xref>
        and
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]) was 27%, which is relatively low. The relationship
between the developers’ priorities and the usability inspectors’
severity ratings is presented in Table 1.
Visual inspection of Table 1 shows a tendency towards higher
priority given to usability issues with the severity ratings Serious and
Critical. A Pearson chi-square test showed statistically significant
differences in priority between severity-rating groups: χ2=14.446,
df=3, p(one-sided)=.001.</p>
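The test reported above can be sketched in a few lines of Python. The contingency table below is hypothetical (Table 1 is not reproduced here), so the resulting statistic and degrees of freedom differ from the reported values; the sketch only shows how a Pearson chi-square statistic is computed from observed and expected counts.

```python
# Pearson chi-square test of independence between severity rating (rows)
# and developer priority (columns), computed from a contingency table.
def chi_square(table):
    """Return the Pearson chi-square statistic and degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical counts: rows = Cosmetic, Serious, Critical;
# columns = Change, No change (not the study's actual Table 1).
table = [[5, 40],
         [20, 50],
         [20, 32]]
chi2, df = chi_square(table)  # df = (3 - 1) * (2 - 1) = 2 for this table
```

The p-value would then be read from the chi-square distribution with <monospace>df</monospace> degrees of freedom, e.g. via <monospace>scipy.stats.chi2.sf(chi2, df)</monospace>.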
    </sec>
    <sec id="sec-6">
      <title>5. DISCUSSION</title>
      <p>The presented results indicate that severity ratings may have
a significant impact on developers’ priority of results from usability
inspections. This finding contributes to our understanding of
severity ratings as a characteristic of usability evaluation results
that may help to identify which usability evaluation results
are needed in software development. (The study reported by Følstad
also included separate evaluation groups with work-domain experts.
The results of these groups were not included in the current study,
in order to make a clear-cut comparison with the findings of Law
and Sawyer et al.)</p>
      <p>The finding is particularly interesting since it contradicts the
conclusions of Sawyer et al. and therefore may provoke necessary
rethinking regarding usability inspectors’ ability to provide
severity assessments that are useful to software engineers.
It is also interesting to note that the results are fully in line with
Law’s findings related to severity ratings of user test results. The
present study may thus serve to strengthen Law’s conclusions.
Curiously, the impact ratios of the different severity levels in
Law’s study and the present study are close to being identical.
Why, then, does the present study indicate that the severity ratings
of usability inspection results may have an effect on the
developers’ priority, when Sawyer et al. did not find a similar
effect? One reason may be the relatively high impact ratios
reported by Sawyer et al., something that may well result in a
greater proportion of low severity issues being prioritized.
Another reason may be that the present study, as the study of
Law, favored high severity evaluation results since the usability
evaluations were conducted relatively late in the development
process [cf. 3]. Sawyer et al. do not state which development
phases their usability inspections were associated with, but their
relatively high impact ratios suggest that their inspections
may have been conducted in earlier project phases.
The present study, as the study of Law, indicates that the
identification of a low severity usability issue is typically of less
value to software developers than the identification of a high
severity issue. This should serve as a reminder for HCI
professionals to spend evaluation resources on identification and
communication of higher severity usability issues.</p>
    </sec>
    <sec id="sec-7">
      <title>6. ACKNOWLEDGMENTS</title>
      <p>This paper has been written as part of the RECORD project,
supported by the VERDIKT program of the Norwegian Research
Council.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Følstad</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2007</year>
          .
          <article-title>Group-based Expert Walkthrough</article-title>
          . In: D.
          <string-name>
            <surname>Scapin</surname>
          </string-name>
          , and E.L.-C. Law, Eds.
          <source>R3UEMs: Review, Report and Refine Usability Evaluation Methods. Proceedings of the 3rd. COST294-MAUSE International Workshop</source>
          ,
          <fpage>58</fpage>
          -
          <lpage>60</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Følstad</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2007</year>
          .
          <article-title>Work-Domain Experts as Evaluators: Usability Inspection of Domain-Specific Work-Support Systems</article-title>
          .
          <source>International Journal of Human-Computer Interaction</source>
          <volume>22</volume>
          (
          <issue>3</issue>
          ),
          <fpage>217</fpage>
          -
          <lpage>245</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Hertzum</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>2007</year>
          .
          <article-title>Problem Prioritization in Usability Evaluation: From Severity Assessments Toward Impact on Design</article-title>
          .
          <source>International Journal of Human-Computer Interaction</source>
          ,
          <volume>21</volume>
          (
          <issue>2</issue>
          ),
          <fpage>125</fpage>
          -
          <lpage>146</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>John</surname>
            ,
            <given-names>B.E.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Marks</surname>
            ,
            <given-names>S.J.</given-names>
          </string-name>
          <year>1997</year>
          .
          <article-title>Tracking the effectiveness of usability evaluation methods</article-title>
          .
          <source>Behaviour &amp; Information Technology</source>
          ,
          <volume>16</volume>
          ,
          <fpage>188</fpage>
          -
          <lpage>202</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Law</surname>
            ,
            <given-names>E. L.-C.</given-names>
          </string-name>
          <year>2004</year>
          .
          <article-title>A Multi-Perspective Approach to Tracking the Effectiveness of User Tests: A Case Study</article-title>
          .
          <source>In Proceedings of the NordiCHI Workshop on Improving the Interplay Between Usability Evaluation and User Interface Design</source>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hornbaek</surname>
          </string-name>
          , and J. Stage, Eds. University of Aalborg,
          <source>HCI Lab Report no. 2004/2</source>
          ,
          <fpage>36</fpage>
          -
          <lpage>40</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Law</surname>
            ,
            <given-names>E. L.-C.</given-names>
          </string-name>
          <year>2006</year>
          .
          <article-title>Evaluating the Downstream Utility of User Tests and Examining the Developer Effect: A Case</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Sawyer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Flanders</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wixon</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <year>1996</year>
          .
          <article-title>Making a Difference - The Impact of Inspections</article-title>
          .
          <source>In Proceedings of the CHI'96 Conference on Human Factors in Computing Systems</source>
          ,
          <fpage>376</fpage>
          -
          <lpage>382</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>