The Effect of Severity Ratings on Software Developers’ Priority of Usability Inspection Results

Asbjørn Følstad
SINTEF ICT
Forskningsveien 1
0314, Oslo, Norway
+47 22067515
asf@sintef.no

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

I-USED’08, September 24, 2008, Pisa, Italy

ABSTRACT
Knowledge of the factors that affect developers’ priority of usability evaluation results is important in order to improve the interplay between usability evaluation and software development. In the presented study, the effect of usability inspection severity ratings on the developers’ priority of evaluation results was investigated. The usability inspection results with higher severity ratings were associated with higher developer priority. This result contradicts Sawyer et al. [7], but is in line with Law’s [5, 6] finding related to the impact of user test results. The findings serve as a reminder for HCI professionals to focus on high severity issues.

Categories and Subject Descriptors
H5.m. Information interfaces and presentation (e.g., HCI): Miscellaneous.

Keywords
Usability evaluation, usability inspection, developers’ priority, impact, severity ratings.

1. INTRODUCTION
One important indicator of successful interplay between usability evaluation and software development is the extent to which evaluation results are associated with subsequent changes in the system under development. This indicator, termed the “impact” [7] or “persuasive power” [4] of usability evaluation results, may reflect whether or not a usability evaluation has generated results that are needed in the development process.

Problem severity is a characteristic of usability evaluation results that has been suggested to affect the impact of usability evaluation results. There is, however, divergence in the literature regarding the actual effect of severity ratings on developers’ prioritizing of usability evaluation results. Sawyer et al.’s [7] study of the impact of usability inspection results indicated that usability inspectors’ severity ratings had no effect on the impact of the evaluation results; reported impact ratios were 72% (low severity issues), 71% (medium severity issues), 72% (high severity issues). In contrast to this finding Law [5, 6], in a study of the impact of user tests, reported a tendency towards higher severity results having higher impact; reported impact ratios were 26% (minor problems), 42% (moderate problems), 47% (severe problems). Law’s findings, however, were not statistically significant [5]. Hertzum [3] suggested that the effect of severity classifications may change across the development process, e.g. high severity evaluation results may have relatively higher impact in later phases of development. Law’s study was conducted relatively late in the development process, on the running prototype of a digital library. Sawyer et al. did not report in which development phases their usability inspections were conducted.

In order to complement the existing research on the effect of severity ratings on the impact of evaluation results, an empirical study of the impact of usability inspection results is presented. The data of the present study was collected as part of a larger study reported by Følstad [2], but the results discussed below have not previously been presented.

2. RESEARCH PROBLEM AND HYPOTHESIS
The research problem of the present study was formulated as:

What is the effect of usability inspectors’ severity ratings on developers’ priority of usability inspection results?

The null hypothesis of the study (no effect of severity ratings) followed the findings of Sawyer et al., and the alternative hypothesis (H1) was formulated in line with the findings presented by Law:

H1: High severity issues will tend to be prioritized higher by developers than low severity issues.

3. METHOD
Usability inspections were conducted as group-based expert walkthroughs [1]. The objects of evaluation were three mobile work-support systems for medical personnel at hospitals, politicians and political advisors, and parking wardens respectively. All systems were in late phases of development, running prototypes close to market. The usability inspectors were 13 HCI professionals, all with >1 year work experience (Mdn=5
years)¹. Each inspector participated in one of three evaluation groups, one group for each object of evaluation. The walkthroughs were conducted as two-stage processes where (1) the individual evaluators noted down usability issues (usability problems and change suggestions) and (2) a common set of usability issues was agreed on in the group. All usability issues were to be classified as either Critical (will probably stop typical users in using the application to solve the task), Serious (will probably cause serious delay for typical users …), or Cosmetic (will probably cause minor delay …). The output of the usability inspections was one report for each object of evaluation, delivered to each of the three development teams respectively.

¹ The study reported by Følstad also included separate evaluation groups with work-domain experts. The results of these groups were not included in the current study, in order to make a clear-cut comparison with the findings of Law and Sawyer et al.

Three months after the evaluation reports had been delivered, individual interviews were conducted with development team representatives. The representatives were requested to prioritize all usability issues according to the following: High (change has already been done, or will be done no later than six months after receiving the evaluation report), Medium (change is relevant but will not be prioritized the first six months), Low (change will not be prioritized), Wrong (the item is perceived by the developer to be a misjudgment). In order to align the resulting developers’ priorities with the impact ratio definitions of Law and Sawyer et al., the priority High was recoded as “Change”, and the priorities Medium, Low and Wrong were recoded as “No change”.

4. RESULTS
The evaluation groups generated a total of 167 usability issues. The three objects of evaluation were associated with 44, 61, and 62 usability issues respectively. The total impact ratio (number of issues associated with change/total number of issues [following 7 and 6]) was 27%, which is relatively low. The relationship between the developers’ priorities and the usability inspectors’ severity ratings is presented in Table 1.

Table 1. Usability issues distributed across developers’ priorities and usability inspectors’ severity ratings

                 Not classified   Cosmetic   Serious   Critical
Change                  6             9         18        12
No change              46            31         26        16
Impact ratio           12%           23%        41%       43%

Visual inspection of Table 1 shows a tendency towards higher priority given to usability issues with severity ratings serious and critical. A Pearson Chi-Square test showed statistically significant differences in priority between severity rating groups; χ²=14.446, df=3, p(one-sided)=.001.

5. DISCUSSION
The presented results indicate that severity ratings may have a significant impact on developers’ priority of results from usability inspections. This finding contributes to our understanding of severity ratings as a characteristic of usability evaluation results that may help to identify which usability evaluation results are needed in software development.

The finding is particularly interesting since it contradicts the conclusions of Sawyer et al. and therefore may provoke necessary rethinking regarding usability inspectors’ ability to provide severity assessments that are useful to software engineers.

It is also interesting to note that the results are fully in line with Law’s findings related to severity ratings of user test results. The present study may thus serve to strengthen Law’s conclusions. Curiously, the impact ratios of the different severity levels in Law’s study and the present study are close to being identical.

Why, then, does the present study indicate that the severity ratings of usability inspection results may have an effect on the developers’ priority, when Sawyer et al. did not find a similar effect? One reason may be the relatively high impact ratios reported by Sawyer et al., something that may well result in a greater proportion of low severity issues being prioritized. Another reason may be that the present study, as the study of Law, favored high severity evaluation results since the usability evaluations were conducted relatively late in the development process [cf. 3]. Sawyer et al. do not state which development phases their usability inspections were associated with, but their relatively high impact ratios suggest that their inspections may have been conducted in earlier project phases.

The present study, as the study of Law, indicates that the identification of a low severity usability issue typically is of less value to software developers than the identification of a high severity issue. This should serve as a reminder for HCI professionals to spend evaluation resources on identification and communication of higher severity usability issues.

6. ACKNOWLEDGMENTS
This paper has been written as part of the RECORD project, supported by the VERDIKT program of the Norwegian Research Council.

7. REFERENCES
[1] Følstad, A. 2007. Group-based Expert Walkthrough. In: D. Scapin, and E.L.-C. Law, Eds. R3UEMs: Review, Report and Refine Usability Evaluation Methods. Proceedings of the 3rd. COST294-MAUSE International Workshop, 58–60.
[2] Følstad, A. 2007. Work-Domain Experts as Evaluators: Usability Inspection of Domain-Specific Work-Support Systems. International Journal of Human-Computer Interaction, 22(3), 217–245.
[3] Hertzum, M. 2007. Problem Prioritization in Usability Evaluation: From Severity Assessments Toward Impact on Design. International Journal of Human-Computer Interaction, 21(2), 125–146.
[4] John, B.E., and Marks, S.J. 1997. Tracking the effectiveness of usability evaluation methods. Behaviour & Information Technology, 16, 188–202.
[5] Law, E. L.-C. 2004. A Multi-Perspective Approach to Tracking the Effectiveness of User Tests: A Case Study. In Proceedings of the NordiCHI Workshop on Improving the Interplay Between Usability Evaluation and User Interface Design, K. Hornbæk, and J. Stage, Eds. University of Aalborg, HCI Lab Report no. 2004/2, 36–40.
[6] Law, E. L.-C. 2006. Evaluating the Downstream Utility of User Tests and Examining the Developer Effect: A Case Study. International Journal of Human-Computer Interaction, 21(2), 147–172.
[7] Sawyer, P., Flanders, A., Wixon, D. 1996. Making a Difference - The Impact of Inspections. In Proceedings of the CHI’96 Conference on Human Factors in Computing Systems, 376–382.
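
Supplementary sketch. The impact ratios and the Pearson chi-square statistic reported in Section 4 can be recomputed from the counts printed in Table 1. The Python below is a minimal illustration, not the procedure the author used: the function and variable names are hypothetical, and the closed-form tail probability is a hand-written helper that is valid only for 3 degrees of freedom. The counts in Table 1 sum to 164 rather than the 167 issues mentioned in the text; the sketch uses the table as printed. It computes the ordinary upper-tail p-value and does not attempt to reproduce the "one-sided" convention of the reported p.

```python
import math

# Counts copied from Table 1; columns follow the inspectors' severity ratings.
SEVERITIES = ["Not classified", "Cosmetic", "Serious", "Critical"]
CHANGE = [6, 9, 18, 12]        # developer priority High, recoded as "Change"
NO_CHANGE = [46, 31, 26, 16]   # Medium, Low and Wrong, recoded as "No change"


def impact_ratios(change, no_change):
    """Impact ratio per column: issues with change / all issues in that column."""
    return [c / (c + n) for c, n in zip(change, no_change)]


def pearson_chi_square(rows):
    """Return the Pearson chi-square statistic and degrees of freedom of a table."""
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            # Expected count under independence of priority and severity.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return statistic, dof


def chi_square_sf_df3(x):
    """Upper-tail probability of a chi-square variable with exactly 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


ratios = impact_ratios(CHANGE, NO_CHANGE)
chi2, dof = pearson_chi_square([CHANGE, NO_CHANGE])
p_upper_tail = chi_square_sf_df3(chi2)

for name, ratio in zip(SEVERITIES, ratios):
    print(f"{name}: {ratio:.1%}")
print(f"chi-square = {chi2:.3f}, df = {dof}, upper-tail p = {p_upper_tail:.4f}")
```

With a statistics library available, a general-purpose contingency-table routine would replace both helper functions; the hand-written version is used here only to keep the sketch dependency-free.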