Problems of Consolidating Usability Problems

Effie Lai-Chong Law
University of Leicester / ETH Zürich
LE1 7RH Leicester, UK / Institut TIK, Switzerland
+44 116 2717302
law@tik.ee.ethz.ch

Ebba Thora Hvannberg
University of Iceland
107 Reykjavik, Iceland
+354 525 4702
ebba@hi.is

ABSTRACT
The process of consolidating usability problems (UPs) is an integral part of usability evaluation involving multiple users/analysts. However, little is known about the mechanism of this process and its effects on evaluation outcomes, which presumably influence how developers redesign the system of interest. We conducted an exploratory research study with ten novice evaluators to examine how they performed when merging UPs in individual and collaborative settings and how they reached consensus. Our findings indicate that collaborative merging causes the absolute number of UPs to deflate and, concomitantly, the frequency of certain UP types as well as their severity ratings to inflate excessively. This can be attributed to the susceptibility of novice evaluators to persuasion in a negotiation setting, which led them to aggregate UPs leniently. Such distorted UP attributes may mislead the prioritization of UPs for fixing and thus result in ineffective system redesign.

Categories and Subject Descriptors
H.5.2 [User Interfaces]: Evaluation/Methodology

General Terms
Measurement, Performance, Experimentation, Theory

Keywords
Usability problems, Merging, Filtering, Consensus building, Downstream utility, Severity, Confidence, Evaluator effect

1. INTRODUCTION
The extent to which UPs identified by different users/analysts overlap seems unpredictable, despite persistent research efforts to formalize the cumulative relation between the number of users/analysts and the number of UPs ([7], [8], [10]). The practical implication of these concerns is to recruit as many users/analysts as the project's resources allow, thereby maximizing the probability of identifying most, though never all, UPs.

One concomitant procedure of involving multiple users/analysts in usability evaluation is to consolidate the UPs identified by different users/analysts into a master list. Such a consolidation process can serve two purposes: (i) providing the design team with clean, non-redundant information to facilitate system redesign, and (ii) enhancing the validity of comparisons of the effectiveness of different (instances of) usability evaluation methods (UEMs). The process consists of two phases [1]. The first step, filtering, eliminates duplicates within a single list of UPs identified by one user performing a certain task with the system under scrutiny, or by one analyst inspecting it. The second step, merging, combines UPs across the lists produced by multiple users/analysts, retaining unique, relevant UPs and discarding unique, irrelevant ones. While such consolidation procedures are commonly practised by usability professionals and researchers, little is known about how they are actually carried out and what impact they have on final evaluation outcomes and, eventually, on system redesigns, especially when severity ratings play a non-trivial role in the prioritization strategy for UP fixing ([2], [3]).

In the HCI literature, the UP consolidation procedure is mostly described at a coarse-grained level. Nielsen [9], when addressing the issue of multiple users/analysts, highlighted the significance of merging different UP lists, but did not specify how this should be done. Connell and Hammond [1], in comparing the effectiveness of different UEMs, delineated the merging procedure at a rather abstract level. Further, Hertzum and Jacobsen [4] coined the notion of the evaluator effect, which has drawn much attention from the HCI community to the reliability and validity of usability evaluation. Nonetheless, their work focused on problem extraction on an individual basis rather than problem merging on a collaborative basis. More recently, a tool for merging and grouping UPs has been developed [5]; it supports the work of individual evaluators but neglects the collaborative aspect of usability evaluation.

In summary, the actual practice of UP consolidation is largely open, unstructured and unchecked. We conducted a research study with two major goals: to examine the impact of the UP consolidation process and to understand the mechanism underlying the consensus-building process. In this paper we summarize the main findings on the first issue; the second is left out, as those data are still being analyzed.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
I-USED'08, September 24, 2008, Pisa, Italy

2. RESEARCH METHODS
The empirical study was conducted at a university in the UK. Ten students (one female) majoring in computer science were recruited. All had acquired reasonable knowledge of HCI and
experience in user-based evaluation through lectures and projects. They were grouped into five pairs. An e-learning platform had been usability-evaluated (via think-aloud testing) with representative end-users one year earlier. Among the different types of data collected, we used for the current study the observational reports written by the experimenter, who was present throughout the testing sessions and registered the users' behaviours in very fine detail. We also developed several structured forms to register the participants' findings in the different steps of our study. All participants attended two sessions: in the first they performed Individual Problem Extraction and Individual Problem Consolidation, and about a week later they paired up to perform Collaborative Problem Consolidation.

2.1 Individual Problem Extraction
Each participant was given the narrative observational reports (printed texts) describing how the users P1 and P2 performed Task 1 (T1), "Browse the Catalogue", and Task 2 (T2), "Provide and Offer a Learning Resource". For each UP extracted, the participant was required to record five attributes in a structured analysis form:
1. Develop a UP identifier in a given format;
2. Provide a UP description as detailed as possible;
3. Select criteria from a given list to justify the UP;
4. Judge the severity level of the UP: minor, moderate, severe;
5. Rate how confident the evaluator was that the identified UP was real: 1 (lowest) to 5 (highest).
After completing the analysis form for T1, the participant was asked to apply the same procedure to P1's T2, and then to P2's T1 and T2 (Figure 1). In other words, each participant was required to analyse four sets of data (P1-T1, P1-T2, P2-T1 and P2-T2).

2.2 Individual Problem Consolidation
With the four lists of extracted UPs, the participant was required to filter out any duplicates within the lists and then merge similar UPs, resulting in two sets of UPs (P1-T1 and P2-T1 as one set; P1-T2 and P2-T2 as the other). Unique UPs identified were retained or discarded during this process. The participants were asked to record the outcomes in the same form as for problem extraction, but they needed to indicate explicitly in the UP-identifier column which UPs were combined. Severity and confidence levels could also be adjusted. No time limit was imposed.

2.3 Collaborative Problem Consolidation
After a break of several days, the two participants of a pair came together to merge their respective lists of UPs, prepared in the individual sessions, into a master list. They could access all the materials used in the earlier sessions. They were asked to track every item (i.e., a single UP or combined UPs) in their own consolidated list by recording in a structured form which of the three possible changes was made: merged (and with which UP), retained or discarded. No time limit was imposed on any of the above procedures. While individual and collaborative problem consolidation involved basically similar sub-tasks, the latter was conducted to observe how the collaborative setting influenced an individual's merging strategies.
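The filtering and merging steps above can be sketched in code. This is an illustrative reconstruction under assumed data structures, not the procedure given to participants: in the study, similarity was judged by the evaluators themselves, so the word-overlap `similar` predicate and the keep-the-higher-severity policy below are purely stand-in assumptions.

```python
# Illustrative sketch of individual UP filtering and merging.
# The UP record layout and the similarity test are assumptions;
# in the study, similarity was a human judgement.

def similar(a, b, threshold=0.5):
    """Crude textual similarity: shared-word overlap ratio (assumed)."""
    wa = set(a["description"].lower().split())
    wb = set(b["description"].lower().split())
    return len(wa & wb) / max(len(wa | wb), 1) >= threshold

def filter_duplicates(ups):
    """Phase 1 (filtering): drop duplicates within one list of UPs."""
    kept = []
    for up in ups:
        if not any(similar(up, k) for k in kept):
            kept.append(up)
    return kept

def merge_lists(list_a, list_b):
    """Phase 2 (merging): combine UPs across lists; similar UPs are
    merged (here: keeping the higher severity, one possible policy),
    unique ones are retained."""
    merged = [dict(up) for up in list_a]
    for up in list_b:
        match = next((m for m in merged if similar(up, m)), None)
        if match is not None:
            match["severity"] = max(match["severity"], up["severity"])
            match["merged_from"] = match.get("merged_from", []) + [up["id"]]
        else:
            merged.append(dict(up))
    return merged

# Hypothetical example data in the style of the study's identifiers.
p1_t1 = [{"id": "P1-T1-01", "description": "search button label unclear", "severity": 1},
         {"id": "P1-T1-02", "description": "search button label unclear wording", "severity": 1}]
p2_t1 = [{"id": "P2-T1-01", "description": "search button label unclear", "severity": 2}]

master = merge_lists(filter_duplicates(p1_t1), filter_duplicates(p2_t1))
```

Here the within-list duplicate in P1-T1 is filtered out first, and the remaining UP is then merged with P2's matching UP across lists.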



[Figure 1 (diagram): Observational reports for P1-T1, P1-T2, P2-T1 and P2-T2 are analysed by each evaluator (E1, E2) in the Problem Extraction step, yielding four UP lists per evaluator. Individual Problem Filtering and Merging then produces, per evaluator, one merged list of UPs for T1 and one for T2. Finally, Collaborative Problem Filtering and Merging combines the two evaluators' lists into consolidated lists of UPs for T1 and for T2.]

Figure 1: The workflow of the problem consolidating process
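The final, collaborative stage of this workflow can be sketched as follows: two evaluators' merged lists are combined into one master list, and each item's outcome (merged, retained, discarded) is logged, mirroring the structured tracking forms. The record layout, the `relevant` flag, the `topic`-based similarity judgement, and the averaging of ratings for merged UPs are all assumptions for illustration.

```python
# Sketch of collaborative consolidation: combine two evaluators'
# merged lists into a master list and log every item's outcome.
# All data structures here are assumed for illustration.

def consolidate(list_e1, list_e2, same):
    """same(a, b) stands in for the pair's negotiated similarity judgement."""
    master, log = [], []
    for up in list_e1 + list_e2:
        match = next((m for m in master if same(up, m)), None)
        if match is not None:
            # Average the ratings of the to-be-merged UPs (one possible policy).
            match["severity"] = (match["severity"] + up["severity"]) / 2
            match["confidence"] = (match["confidence"] + up["confidence"]) / 2
            log.append((up["id"], "merged", match["id"]))
        elif up.get("relevant", True):
            master.append(dict(up))
            log.append((up["id"], "retained", None))
        else:
            log.append((up["id"], "discarded", None))
    return master, log

# Hypothetical merged lists from two evaluators E1 and E2.
e1 = [{"id": "E1-01", "topic": "navigation", "severity": 2, "confidence": 3}]
e2 = [{"id": "E2-01", "topic": "navigation", "severity": 3, "confidence": 5},
      {"id": "E2-02", "topic": "jargon", "severity": 1, "confidence": 1, "relevant": False}]

master, log = consolidate(e1, e2, lambda a, b: a["topic"] == b["topic"])
```

The outcome log corresponds to the three possible changes the participants recorded on their forms: merged (with which UP), retained, or discarded.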
3. RESULTS

3.1 Individual Problem Consolidation
The ten participants extracted from the observational reports altogether 98 and 81 UPs for T1 and T2, respectively, over the two users (P1 and P2). They then individually consolidated their UPs. Table 1 shows the extent to which the participants merged, discarded and retained the UPs extracted.

Table 1. Distribution of outcomes in the individual filtering

      Merged    Discarded    Retained
T1    39%       13%          48%
T2    51%       10%          39%

For the merged and retained UPs, there were changes in severity ratings and/or confidence levels, or no changes at all. To simplify the results, we collapse the different degrees of increase/decrease (e.g. minor → moderate/severe, or vice versa) into INC or DEC, respectively, and denote no change with SAME. The same notation is applied to the confidence level.

Table 2. Severity/confidence changes in merged UPs (Indiv.)

          Severity                Confidence
          T1          T2          T1          T2
DEC       4 (10%)     3 (7%)      6 (15%)     4 (10%)
SAME      20 (53%)    29 (71%)    15 (40%)    18 (44%)
INC       14 (37%)    9 (22%)     17 (45%)    19 (46%)

In merging the UPs, the participants tended to increase the severity ratings by one or two degrees (37% for T1 and 22% for T2; Table 2). In contrast, they seemed not to bother adjusting the severity of the UPs retained (2% and 6% for T1 and T2, respectively). In the post-filtering interviews, most participants explained that when a UP was identified in both P1 and P2, this could indicate that the UP was more severe than originally estimated and confirmed the realness of the problem, thereby boosting their confidence. Interestingly, the correlation between the original severity ratings and confidence levels (r = 0.25, n = 179, p = 0.001) was found to be significant, implying that the participants were more confident in their judgements of severe UPs but less so when judging minor or moderate UPs. In contrast, the correlation between the changes in the two variables (r = 0.19, n = 26) was insignificant. In other words, changing the severity of a UP does not imply that the participant became more (or less) confident about the realness of that UP.

3.2 Collaborative Problem Consolidation
In comparison, the participants demonstrated an even stronger tendency to merge UPs in the collaborative setting (Table 3): the merging rates were markedly higher than those observed in the individual sessions (81% vs. 39% for T1; 77% vs. 51% for T2). The participants tended to negotiate at a higher level of abstraction, where broad problem types can accommodate a variety of problem instances, thus mitigating direct confrontation with partners over controversial similarities.

Table 3. Distribution of outcomes in the collaborative filtering

      Merged    Discarded    Retained
T1    81%       10%          9%
T2    77%       15%          8%

The participants tended to be receptive to their partners' proposals, especially when the agreement thus reached would not entail any actual economic or personal gain (or loss). When negotiating whether to merge or retain UPs, the participants adjusted the severity and confidence ratings. For each aggregate we averaged the ratings of the original set of to-be-merged UPs and compared it with the corresponding final rating. Table 4 displays the results for the merged UPs; similar patterns to Table 2 were observed.

Table 4. Severity/confidence changes in merged UPs (collab.)

          Severity                Confidence
          T1          T2          T1          T2
DEC       2 (5%)      2 (7%)      2 (5%)      3 (11%)
SAME      23 (52%)    16 (57%)    22 (50%)    13 (46%)
INC       22 (43%)    10 (36%)    19 (45%)    12 (43%)

4. DISCUSSION
The empirical findings of this study enable us to compare the individual and collaborative UP consolidation processes, both of which presumably involve the core mechanism of judging similarity among UPs. One notable distinction is the lenience towards merging in the collaborative setting, as shown by the high merging rate. Indeed, quite a number of participants merged UPs with their partners' that they had not merged in their individual sessions. This may be attributed to social pressure coercing them to reach consensus. The data indicate that, as a result of the merging process, severity ratings of UPs tend to inflate and the number of UPs tends to deflate excessively in the collaborative setting. In contrast, confidence levels, in which personal experience plays a role, do not fluctuate with the merging process. Previous research indicates that severity ratings influence how developers and project managers prioritize which UPs to fix ([3], [6]). Invalid severity ratings presumably lead to the fixing of less urgent UPs; consequently, the quality of the system may still be undermined by the more severe and more urgent UPs that remain.

The implication for future work is to look into relevant theories of similarity (an age-old issue), communication, and social interaction. Further, we aim to extend our empirical studies by systematically comparing merging through negotiation (i.e. consolidation carried out by a group of two or three usability specialists, a group of developers, or an integrated team) versus merging through authority (i.e. a single person-in-charge combines the different lists of UPs). The quality of the consolidated usability outcomes will be compared, thereby enabling us to identify valid and reliable methods for consolidating UPs and to develop objective measures of the cost-effectiveness of such methods. The findings thus obtained will also contribute to our ongoing research endeavour on downstream utility.

5. REFERENCES
[1] Connell, I., & Hammond, N. (1999). Comparing usability evaluation principles with heuristics: Problem instances vs. problem types. Proc. INTERACT 1999.
[2] Hassenzahl, M. (2000). Prioritizing usability problems: Data-driven and judgement-driven severity estimates. Behaviour & Information Technology, 19(1), 29-42.
[3] Hertzum, M. (2006). Problem prioritization in usability evaluation: From severity assessments toward impact on design. International Journal of Human-Computer Interaction (IJHCI), 21(2), 125-146.
[4] Hertzum, M., & Jacobsen, N.E. (2003). The evaluator effect: A chilling fact about usability evaluation methods. IJHCI, 15(1).
[5] Howarth, J. (2007). Supporting novice usability practitioners with usability engineering tools. PhD thesis (VT).
[6] Law, E. L.-C. (2006). Evaluating the downstream utility of user tests and examining the developer effect: A case study. International Journal of Human-Computer Interaction (IJHCI), 21(2), 147-172.
[7] Law, E. L.-C., & Hvannberg, E. T. (2004). Analysis of combinatorial user effect in international usability test. Proc. CHI 2004.
[8] Lewis, J.R. (1994). Sample sizes for usability studies: Additional considerations. Human Factors, 36(2), 368-378.
[9] Nielsen, J. (1994). Heuristic evaluation. In J. Nielsen & R.L. Mack (Eds.), Usability inspection methods. New York: Wiley.
[10] Virzi, R.A. (1992). Refining the test phase of usability evaluation: How many subjects is enough? Human Factors, 34(4), 457-468.