Evaluating the Impact of Clutter in Linear Diagrams

    Mohanad Alqadah¹, Gem Stapleton¹, John Howse¹, and Peter Chapman²
    ¹ University of Brighton, UK
      {m.alqadah1, g.e.stapleton, john.howse}@brighton.ac.uk
    ² Edinburgh Napier University, UK
      p.chapman@napier.ac.uk


       Abstract. Linear diagrams are an effective way of visualizing sets and
       their relationships. Sets are visualized by collections of straight line
       segments, and the ways in which the lines overlap indicate subset and
       disjointness relationships. As with many visualization methods, linear
       diagrams can become cluttered. In previous research, we established a
       clutter measure for linear diagrams that was empirically shown to
       correlate with perceived clutter. The aim of this paper is to determine
       the impact of linear diagram clutter on user task performance. An
       empirical study was conducted with three levels of clutter. Surprisingly,
       we found that diagrams with a medium level of clutter led to
       significantly slower task performance than diagrams with low or high
       clutter. Moreover, we found no significant performance difference
       between the low and high clutter levels. We conclude that clutter
       affects the interpretation of linear diagrams. A future research goal is
       to establish methods for controlling the level of clutter in linear
       diagrams, such as using multiple diagrams instead of a single diagram,
       when visualizing sets.

       Keywords: linear diagrams, clutter, diagram comprehension


1    Introduction
Presenting a lot of information in a diagram, and the layout choices made when
drawing it, can result in visual clutter, which is known to sometimes be a barrier
to cognition. It is therefore essential to understand clutter and how to reduce
it when needed. Ellis and Dix state that it is “important to have a clear
understanding of clutter reduction techniques in order to design visualizations
that can effectively uncover patterns and trends” [7]. Bertini et al. take a similar
standpoint, devising an approach to analyze and reduce clutter in an information
visualization context [3].
    Visualizing information using diagrams can have substantial benefits over textual
notations, provided the diagrams are effective [12]. Several methods can be used to
visualize sets, including Euler, Venn, and linear diagrams, of which the first two
are the best known. For Euler and Venn diagrams, clutter has been shown to have an
impact on user comprehension [1]. Linear diagrams were introduced by Leibniz in
1686 [6] and they are
2       M. Alqadah et al.

Fig. 1: A linear diagram (sets, top to bottom: Thriller, Comedy, Period, Drama).
Fig. 2: Altering overlap order (the same sets, with the overlaps reordered).


similar to parallel bargrams [20] and double decker plots [9]. Recently, Chapman
et al. [5] empirically established that linear diagrams support task performance
more effectively than Euler and Venn diagrams. Sato and Mineshima [18] have
also shown that linear diagrams are superior to linguistic representations of
syllogisms in the context of logical reasoning. These results motivate the need
to understand the role of clutter in linear diagrams, so that we are better placed
to exploit their cognitive advantages.
    A linear diagram consists of parallel line segments drawn horizontally. Each
set presented in the diagram is represented by the line segments that share a
y-coordinate. For example, Fig. 1 represents four sets using eight line segments;
the set Thriller is represented by two line segments. Line segments for different
sets can occupy the same vertical strip of the diagram, known as an overlap, to
represent the intersection of those sets. For example, in Fig. 1 the left-most
overlap contains line segments for the sets Thriller, Comedy, and Drama but not
Period. If two sets do not share any overlaps then they are disjoint. For instance,
in Fig. 1 the set Period is disjoint from all the other sets in the diagram. Moreover,
if all of the line segments for one set occur in overlaps with line segments of
another set, then the former is a subset of the latter.
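The definitions above can be made concrete with a small data model. The sketch below is illustrative only and is not software from the study: it assumes a diagram can be modelled as a left-to-right list of overlaps, each recorded as the set of set names whose line segments pass through it. The overlap order in `EXAMPLE` is hypothetical; it was chosen to match the properties of Fig. 1 stated in the text (the left-most overlap contains Thriller, Comedy, and Drama; Period is disjoint from the other sets), but it is not necessarily the exact order drawn in Fig. 1. The helper names `is_disjoint` and `is_subset` are likewise our own.

```python
from typing import FrozenSet, List

# A linear diagram as a left-to-right list of overlaps; each overlap records
# the sets whose line segments pass through that vertical strip.
Diagram = List[FrozenSet[str]]

# Hypothetical overlap order matching the properties of Fig. 1 stated in the
# text; not necessarily the exact order drawn in Fig. 1.
EXAMPLE: Diagram = [
    frozenset({"Thriller", "Comedy", "Drama"}),
    frozenset({"Period"}),
    frozenset({"Drama"}),
    frozenset({"Thriller"}),
    frozenset({"Comedy", "Drama"}),
]


def is_disjoint(diagram: Diagram, a: str, b: str) -> bool:
    """Sets a and b are disjoint if no overlap contains both of them."""
    return not any(a in overlap and b in overlap for overlap in diagram)


def is_subset(diagram: Diagram, a: str, b: str) -> bool:
    """Set a is a subset of set b if every overlap containing a also contains b."""
    return all(b in overlap for overlap in diagram if a in overlap)


print(is_disjoint(EXAMPLE, "Period", "Drama"))  # True
print(is_subset(EXAMPLE, "Comedy", "Drama"))    # True
print(is_subset(EXAMPLE, "Thriller", "Drama"))  # False
```

In this hypothetical diagram Comedy happens to be a subset of Drama, because both overlaps that contain Comedy also contain Drama.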
    Research has suggested that to use linear diagrams successfully, we need to
understand when they can be effectively interpreted [17]. In this paper we set
out to establish the impact of linear diagram clutter on user task performance, in
terms of time and accuracy. The rest of this paper is structured as follows.
A theoretical measure of clutter in linear diagrams is given in Section 2. We
present the experiment design in Section 3. Section 4 describes how we executed
the experiment and Section 5 presents the results of the study. In Section 6, we
address threats to validity. In Section 7, we offer conclusions and discuss
future work. The diagrams used in the study, along with the raw data collected,
can be found at https://sites.google.com/site/msapro/phdstudyfour.


2   Clutter in Linear Diagrams

In previous work we defined a clutter measure for linear diagrams, called the line
score: each set contributes n to the line score, where n is the number of line
segments used to represent that set [2]. The line score of a diagram is therefore
the total number of line segments it contains. We established that the line
score aligns with user perceptions: diagrams with a higher line score
were generally perceived as more cluttered.
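The line score can be computed directly from the order of the overlaps. The sketch below is a minimal illustration, not the authors' implementation: it assumes a diagram is a left-to-right list of overlaps (each the set of set names present), and that a set's line segment runs unbroken across adjacent overlaps that both contain it, so each maximal run of consecutive overlaps containing a set is drawn as one segment. The function names and the example diagram are hypothetical.

```python
from typing import Dict, FrozenSet, List

Diagram = List[FrozenSet[str]]


def segments_per_set(diagram: Diagram) -> Dict[str, int]:
    """Number of line segments drawn for each set.

    Assumes a segment continues across adjacent overlaps that both contain
    the set, so a new segment starts only where the previous overlap lacks it.
    """
    counts: Dict[str, int] = {}
    previous: FrozenSet[str] = frozenset()
    for overlap in diagram:
        for name in overlap:
            if name not in previous:  # a new segment starts in this overlap
                counts[name] = counts.get(name, 0) + 1
        previous = overlap
    return counts


def line_score(diagram: Diagram) -> int:
    """Line score: each set contributes its number of line segments."""
    return sum(segments_per_set(diagram).values())


# Hypothetical four-set diagram used for illustration.
EXAMPLE: Diagram = [
    frozenset({"Thriller", "Comedy", "Drama"}),
    frozenset({"Period"}),
    frozenset({"Drama"}),
    frozenset({"Thriller"}),
    frozenset({"Comedy", "Drama"}),
]

print(line_score(EXAMPLE))  # 8
```

For this hypothetical diagram the sets contribute 2 (Thriller), 2 (Comedy), 1 (Period), and 3 (Drama) segments.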
                                                      Evaluating the Impact of Clutter in Linear Diagrams                                                                                                3


Fig. 3: Low cluttered linear diagram with 7 sets (top to bottom: FASHION, PHYSICS,
HISTORY, GEOLOGY, RESEARCH, ACCOUNTING, BUSINESS; student names are written beneath
each overlap).

Fig. 4: Medium cluttered linear diagram with 7 sets (top to bottom: RESEARCH,
MARKETING, PHYSICS, ECONOMICS, MANAGEMENT, FASHION, ACCOUNTING; student names are
written beneath each overlap).


    For example, the diagram in Fig. 1 has a line score of 8; the contributions
of the sets to the score are 2, 2, 1, and 3, from top to bottom. We can change
the order of the overlaps to obtain the semantically equivalent diagram in Fig. 2.
This change in the order of overlaps reduces the line score to 5. As a more
complex example, the diagrams in Figs 3 to 5 exhibit three different line scores,
with Fig. 3 being the least cluttered and Fig. 5 the most cluttered. In general,
we can permute the order of the overlaps without altering the information being
represented, but doing so can lead to different line scores. Therefore, if we
understand the impact of clutter on task performance, we can choose between
competing overlap orders so as to best support task performance.
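Because the line score depends only on the overlap order, the least cluttered order of a small diagram can be found by exhaustive search. The sketch below is illustrative only and is not the procedure used to draw the study's diagrams: it assumes the same hypothetical overlap-list representation, and that adjacent overlaps sharing a set are drawn as one continuous segment. Exhaustive search is practical only for a handful of overlaps; the diagrams in this study have 25 overlaps (Section 3.4), giving 25! possible orders, so a heuristic would be needed there.

```python
from itertools import permutations
from typing import FrozenSet, List, Sequence, Tuple

Diagram = List[FrozenSet[str]]


def line_score(diagram: Sequence[FrozenSet[str]]) -> int:
    """Count line segments, assuming a segment runs unbroken across adjacent
    overlaps that both contain its set."""
    score = 0
    previous: FrozenSet[str] = frozenset()
    for overlap in diagram:
        score += len(overlap - previous)  # sets whose segment starts here
        previous = overlap
    return score


def least_cluttered_order(diagram: Diagram) -> Tuple[Diagram, int]:
    """Try every overlap order and keep one with the lowest line score.

    Runs in factorial time, so it is only practical for a few overlaps.
    """
    best, best_score = list(diagram), line_score(diagram)
    for order in permutations(diagram):
        score = line_score(order)
        if score < best_score:
            best, best_score = list(order), score
    return best, best_score


# Hypothetical four-set diagram (not necessarily the order drawn in Fig. 1).
EXAMPLE: Diagram = [
    frozenset({"Thriller", "Comedy", "Drama"}),
    frozenset({"Period"}),
    frozenset({"Drama"}),
    frozenset({"Thriller"}),
    frozenset({"Comedy", "Drama"}),
]

best, score = least_cluttered_order(EXAMPLE)
print(line_score(EXAMPLE), "->", score)  # 8 -> 4
```

Every reordering represents the same information; only the number of segments changes. For this hypothetical diagram the search finds an order with one segment per set, although not every diagram admits such an order.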

3         Experiment Design
To answer the research question ‘does linear diagram clutter affect user
comprehension?’ we designed a within-group empirical study, building on an
established measure of clutter for linear diagrams, the line score [2]. Each
participant was asked to answer questions about the information in a set of linear
diagrams with varying levels of clutter. Consistent with other research contributions
towards understanding user comprehension, such as [4, 11, 13, 15], we used


Fig. 5: High cluttered linear diagram with 7 sets (top to bottom: ACCOUNTING,
MARKETING, FINANCE, PHYSICS, DESIGN, HISTORY, BIOLOGY; student names are written
beneath each overlap).


research software that was designed to record user performance data while
participants performed the experiment. Two primary variables were recorded: the
time taken to answer each question and whether the question had been answered
correctly. We set a time limit of two minutes to answer each question and all
timeouts were recorded. If clutter level affects user comprehension of linear
diagrams, then we would expect to see a significant difference in the mean time
taken to answer the questions, or in the error rates, across the different levels
of clutter.

3.1       Study Details and Question Types
The participants were staff or students from the School of Computing, Engineering
and Mathematics at the University of Brighton; none of them were members
of the authors’ research group. Consistent with Rodgers et al. [16], the diagrams
were drawn with pseudo-real data, conveying information about students’ interests
in university subjects. For example, in Figs 3, 4 and 5, the line segments for
each set represent a subject, and each student’s name is written under the overlap
corresponding to the subjects in which that student is interested. This design
choice was intended to ensure that the context of the information presented was
familiar to the participants, removing any learning barrier that could arise from
participants having to learn a new context. The data presented in the diagrams
were hypothetical, to ensure that prior knowledge could not be used.
    Subject names consisted of one word, did not sound similar to each other, and
did not share the same first letter because, as we found in our previous research
[1], such similarities could confuse participants. Furthermore, each subject name
was placed on the left side of the diagram, aligned with, and drawn in the same
colour as, the associated line segments. The order of the subject names was random
and no two diagrams had the same order. This was important to make sure that
participants had to read the question and the diagram each time, reducing
learning effects that could impact performance.
    The student names presented in the diagrams were a randomized mixture of
male and female names across a variety of ethnicities, to reduce any bias that
could occur if we chose the names manually. Moreover, the names


Fig. 6: Medium cluttered linear diagram with 10 sets (top to bottom: GEOLOGY,
ARCHAEOLOGY, BUSINESS, NETWORKING, ECONOMICS, HISTORY, MUSIC, TECHNOLOGY, RESEARCH,
PHYSICS; student names are written beneath each overlap).


were first names only and of a similar length (three, four or five letters), to
reduce the risk of any name being more prominent than the others (e.g. very long
names). We avoided using a name more than once in the same diagram. Moreover,
names that appear in the same overlap sounded different and started with
different first letters. This design choice was made to limit confusion and
make the names easier to remember.
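Apart from whether names sound different, which needs human judgement, these name constraints are mechanical enough to check automatically. The sketch below is a hypothetical validator, not part of the study's materials: it assumes one diagram is described as a mapping from each overlap (a set of subject names) to the student names written under it, and the function name `name_problems` is our own.

```python
from collections import Counter
from typing import Dict, FrozenSet, List


def name_problems(assignment: Dict[FrozenSet[str], List[str]]) -> List[str]:
    """List violations of the name constraints for one diagram.

    `assignment` maps each overlap to the student names written under it.
    Whether names "sound different" is not checked.
    """
    problems: List[str] = []
    all_names = [name for names in assignment.values() for name in names]
    # First names of three, four or five letters only.
    for name in all_names:
        if not (name.isalpha() and 3 <= len(name) <= 5):
            problems.append(f"{name}: not a 3-5 letter first name")
    # No name used more than once in the same diagram.
    for name, count in Counter(all_names).items():
        if count > 1:
            problems.append(f"{name}: used {count} times")
    # Names under the same overlap must start with different letters.
    for overlap, names in assignment.items():
        for letter, count in Counter(n[0].upper() for n in names).items():
            if count > 1:
                label = "/".join(sorted(overlap))
                problems.append(f"{label}: {count} names start with {letter}")
    return problems


# Hypothetical overlaps; the student names are borrowed from the text.
ok = {
    frozenset({"PHYSICS", "HISTORY"}): ["Neil", "Will", "Tina"],
    frozenset({"GEOLOGY"}): ["Glen", "Fred", "Owen"],
}
print(name_problems(ok))  # []
```

A diagram that reused a name, included a long name, or placed two names with the same initial under one overlap would produce one message per violation.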
   Three types of multiple choice questions were used in the study, ‘Who’, ‘How
many’, and ‘Which’, consistent with previous research [1, 4, 16]³. Each question
had five answer choices, of which one was correct. Example questions are:

 1. Who of the following is taking PHYSICS, HISTORY, GEOLOGY, RESEARCH
    and nothing else?
 2. How many students are taking GEOLOGY, ARCHAEOLOGY, MUSIC,
    RESEARCH, PHYSICS and nothing else?
 3. Which one of the following modules is being taken by 23 students?

    Questions 1, 2, and 3 were asked of the diagrams in Figs 3, 6, and 7
respectively. The set names that appear in the ‘Who’ and ‘How many’ questions, as
well as in the answers, are listed in the same top-to-bottom order as in the
relevant diagram. This choice was intended to make it easier for participants to
find the sets required to answer the questions. After choosing an answer to a ‘How
many’ question, participants were asked to type the names of the students they had
counted; for the answer to be classed as correct, the typed names had to be
correct. For ‘Which’ questions, exactly one set in the diagram contained the
number of student names specified in the question.
 ³ The empirical studies on linear diagrams in [17] used tasks that required people to
   identify subset, disjointness, and non-empty intersection relationships between sets
   and are, thus, different from the tasks we use in this study.


Fig. 7: High cluttered linear diagram with 13 sets (top to bottom: GEOLOGY,
NETWORKING, COMPUTING, MARKETING, TECHNOLOGY, RESEARCH, BUSINESS, FASHION, ACCOUNTING,
HISTORY, ECONOMICS, DESIGN, PHYSICS; student names are written beneath each overlap).


3.2        Levels of Diagram Clutter

We used three different levels of clutter in the study, as measured by the line
score, to establish whether diagram clutter impacts user performance when
interpreting linear diagrams. We felt it important that the diagrams were
sufficiently complex to demand notable cognitive effort from the study’s
participants. Given this, the diagrams were chosen so that their line scores (LSs)
were 10–20 (low LS), 30–40 (medium LS), or 50–60 (high LS). For example, the three
diagrams in Figs 3, 6 and 7 have LSs of 18, 40, and 60 respectively. For the study,
27 diagrams were drawn (nine for each level of clutter). These 27 diagrams were
created from nine initial diagrams, each of which was redrawn with different
overlap orders to create the different levels of clutter.
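The three bands amount to a small lookup from line score to clutter level. The sketch below assumes the bands are inclusive at both ends (the text's examples include line scores of 40 and 60, which sit on the upper bounds of the medium and high bands); the names `CLUTTER_BANDS` and `clutter_level` are hypothetical.

```python
from typing import Optional

# Line-score bands for the three clutter levels (assumed inclusive at both ends).
CLUTTER_BANDS = {"low": (10, 20), "medium": (30, 40), "high": (50, 60)}


def clutter_level(line_score: int) -> Optional[str]:
    """Map a line score to the study's clutter level, or None if it lies
    outside every band (e.g. 25 falls between the low and medium bands)."""
    for level, (lowest, highest) in CLUTTER_BANDS.items():
        if lowest <= line_score <= highest:
            return level
    return None


# Line scores of Figs 3, 6 and 7 as given in the text.
for fig, score in (("Fig. 3", 18), ("Fig. 6", 40), ("Fig. 7", 60)):
    print(fig, score, clutter_level(score))
```

The gaps between bands mean that a redrawn diagram whose line score falls between two bands would not have been usable at any clutter level.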


3.3        Clutter by Number of Sets

To broaden the generality of our results, the number of sets visualized in the
diagrams varied: each diagram had 7, 10 or 13 sets. The distribution of diagrams
across numbers of sets and clutter levels is shown in Table 1. For example, the
diagrams in Figs 3, 6, and 7 represent 7, 10, and 13 sets respectively. All three
types of question were asked for each number of sets. For instance, as can be
seen in Table 1, the three types of question were asked of the information in the
diagrams drawn with seven sets (d1, d2, and d3).
    By including a variety of numbers of sets, we are able to determine whether
there is an interaction with the clutter score. Suppose, for example, that we have
two hypothetical diagrams, d7 and d10, each with a clutter score of 20, where d7
and d10 represent 7 and 10 sets respectively. Then d7 has, on average, approximately 2.9


                  Table 1: The characteristics of the diagrams.
Diagram reference  d1   d2        d3     d4   d5        d6     d7   d8        d9
Question type      Who  How many  Which  Who  How many  Which  Who  How many  Which
Number of sets     7    7         7      10   10        10     13   13        13
Low clutter        line score 10–20 (all of d1–d9)
Medium clutter     line score 30–40 (all of d1–d9)
High clutter       line score 50–60 (all of d1–d9)


lines per set, whereas d10 has exactly 2 lines per set. In general, when the
diagram clutter is high and the number of sets is small, the diagram has a high
number of line segments in a small vertical space, as measured by the number of
sets represented. As the number of sets increases, there is more vertical space in
which to fit the same number of line segments. Our study allows us to explore
whether clutter relative to the number of sets influences cognition.
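The worked example above is a single division: average lines per set equals line score divided by number of sets. The sketch below reproduces it and tabulates the resulting density for each combination of line-score band and number of sets used in the study; the function name `average_lines_per_set` is hypothetical.

```python
def average_lines_per_set(line_score: int, num_sets: int) -> float:
    """Average number of line segments per set: line score / number of sets."""
    if num_sets <= 0:
        raise ValueError("num_sets must be positive")
    return line_score / num_sets


# Worked example from the text: two diagrams, each with a line score of 20.
d7 = average_lines_per_set(20, 7)    # 20/7, about 2.86 (2.9 to one decimal place)
d10 = average_lines_per_set(20, 10)  # exactly 2.0

# Density ranges across the study's line-score bands and numbers of sets.
for sets in (7, 10, 13):
    for lowest, highest in ((10, 20), (30, 40), (50, 60)):
        low = average_lines_per_set(lowest, sets)
        high = average_lines_per_set(highest, sets)
        print(f"{sets} sets, LS {lowest}-{highest}: {low:.2f}-{high:.2f} lines per set")
```

For instance, the high band corresponds to roughly 7.1 to 8.6 lines per set with 7 sets, but only about 3.8 to 4.6 with 13 sets.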


3.4   Linear Diagram Layout and Characteristics

As can be seen in table 1, the diagrams were divided into nine characteristic types
(d1 to d9); the number of sets represented appears under each characteristic
type. For each characteristic type there were three diagrams: a diagram for
each diagram clutter level. The three diagrams of each characteristic type were
allocated the same type of question. So, for example, d1 represented 7 sets, and
corresponded to a ‘Who’ question, and was drawn in three diﬀerent ways, to
give low, medium and high levels of clutter. The number of data items (student
names) was chosen to be large enough so that answering the question required
cognitive eﬀort. There were one to five students names within each overlap and
all the diagrams had 60 names. We randomly distributed the data items within
the overlaps in the diagrams. However, the number of items in each overlap was
the same in the three diagrams of each characteristic type. For example, the low
clutter diagram for d1 may have an overlap with, say, three student names. In
the medium clutter diagram for d1, this overlap need not occupying the same
vertical space but will still have three names written under it.
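The text does not say how the names were distributed, so the following is only one possible procedure, sketched under stated assumptions: 60 names are spread randomly over 25 overlaps (guideline 8 below) with one to five names per overlap, and the result is keyed by overlap rather than by position, so redrawing the diagram with a different overlap order keeps the same names under each overlap. It also assumes each overlap's set combination is unique within a diagram. The function names are hypothetical.

```python
import random
from typing import Dict, FrozenSet, List


def random_overlap_sizes(n_overlaps: int, n_items: int, lo: int, hi: int,
                         rng: random.Random) -> List[int]:
    """Randomly choose how many items go under each overlap so that every
    count lies in [lo, hi] and the counts sum to n_items."""
    if not n_overlaps * lo <= n_items <= n_overlaps * hi:
        raise ValueError("no valid distribution exists")
    sizes = [lo] * n_overlaps
    remaining = n_items - n_overlaps * lo
    while remaining > 0:
        i = rng.randrange(n_overlaps)
        if sizes[i] < hi:  # skip overlaps that are already full
            sizes[i] += 1
            remaining -= 1
    return sizes


def assign_names(overlaps: List[FrozenSet[str]], names: List[str],
                 rng: random.Random) -> Dict[FrozenSet[str], List[str]]:
    """Randomly place names under overlaps, one to five per overlap.

    Keyed by overlap, so any reordering of the overlaps keeps the same names
    (and hence the same count) under each overlap.
    """
    sizes = random_overlap_sizes(len(overlaps), len(names), 1, 5, rng)
    shuffled = list(names)
    rng.shuffle(shuffled)
    assignment: Dict[FrozenSet[str], List[str]] = {}
    start = 0
    for overlap, size in zip(overlaps, sizes):
        assignment[overlap] = shuffled[start:start + size]
        start += size
    return assignment


if __name__ == "__main__":
    # Hypothetical 25 overlaps and 60 placeholder names.
    overlaps = [frozenset({f"SET{i}"}) for i in range(25)]
    names = [f"Name{i}" for i in range(60)]
    placed = assign_names(overlaps, names, random.Random(1))
    print(sorted(len(v) for v in placed.values()))
```

Fixing the counts per overlap before any reordering is what lets the three diagrams of a characteristic type differ only in overlap order, which is the property the pilot study (Section 4) found had to be enforced.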
    We were careful to control the layout features of the diagrams drawn for the
study. We therefore adopted the following six layout guidelines, which reflect
best practice in linear diagram drawing [17], to ensure consistent layout features:

 1. The diagrams were drawn in a horizontal direction.
 2. All diagrams were drawn with coloured straight line segments and no two
    sets appearing in the same diagram had the same colour.
 3. Adjacent sets in each diagram were separated by the same vertical distance.
 4. The overlaps in the diagrams had the same length, fixed at 50 pixels.
 5. Grey vertical grid lines were used to mark the beginning
    and the end of each overlap.
 6. The height of the line segments was set to 6 pixels.

   Guidelines 1–6 are from [17], but they are not sufficient to control variability
across the diagrams in our study. We therefore stipulated five further layout
guidelines:
 7. The order of the set colours was the same in all the diagrams.
 8. All the diagrams had 25 overlaps.
 9. The stroke width of the vertical grid lines was set to 2 pixels.
10. The set labels were written in upper case letters in Times New Roman,
    14 point, bold font.
11. Data items were written in lowercase letters with the first letter
    capitalized, in black, 12 point Arial font.
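Most of these guidelines translate directly into drawing parameters. The sketch below renders a linear diagram as an SVG string, taking the 50 px overlap length, 6 px segment height, 2 px grey grid lines, and label font from guidelines 4, 5, 6, 9 and 10. It is not the study's drawing software: the vertical gap between sets (`ROW_SPACING`), the label column width (`LABEL_WIDTH`), the placeholder colours, and the example diagram are all our assumptions, and data items (guideline 11) are omitted for brevity.

```python
from typing import FrozenSet, List, Sequence

OVERLAP_WIDTH = 50   # guideline 4: every overlap is 50 px long
SEGMENT_HEIGHT = 6   # guideline 6: line segments are 6 px high
GRID_STROKE = 2      # guideline 9: grid lines have a 2 px stroke
ROW_SPACING = 40     # assumption: equal vertical gap between sets (guideline 3)
LABEL_WIDTH = 150    # assumption: horizontal space reserved for set labels


def render_svg(sets: Sequence[str], overlaps: List[FrozenSet[str]],
               colours: Sequence[str]) -> str:
    """Render a linear diagram as an SVG string (data items are omitted)."""
    if len(set(colours[:len(sets)])) < len(sets):
        raise ValueError("guideline 2: each set needs its own colour")
    width = LABEL_WIDTH + OVERLAP_WIDTH * len(overlaps)
    height = ROW_SPACING * (len(sets) + 1)
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" '
             f'width="{width}" height="{height}">']
    # Guideline 5: grey vertical grid lines at the start and end of each overlap.
    for i in range(len(overlaps) + 1):
        x = LABEL_WIDTH + i * OVERLAP_WIDTH
        parts.append(f'<line x1="{x}" y1="0" x2="{x}" y2="{height}" '
                     f'stroke="grey" stroke-width="{GRID_STROKE}"/>')
    for row, (name, colour) in enumerate(zip(sets, colours)):
        y = ROW_SPACING * (row + 1)  # guidelines 1 and 3: horizontal, equally spaced
        # Guideline 10: upper-case, bold, 14 pt Times New Roman label in the set's colour.
        parts.append(f'<text x="0" y="{y + SEGMENT_HEIGHT}" fill="{colour}" '
                     f'font-family="Times New Roman" font-size="14pt" '
                     f'font-weight="bold">{name.upper()}</text>')
        # One rectangle per maximal run of consecutive overlaps containing the set.
        start = None
        for i, overlap in enumerate(list(overlaps) + [frozenset()]):
            if name in overlap and start is None:
                start = i
            elif name not in overlap and start is not None:
                parts.append(f'<rect x="{LABEL_WIDTH + start * OVERLAP_WIDTH}" '
                             f'y="{y}" width="{(i - start) * OVERLAP_WIDTH}" '
                             f'height="{SEGMENT_HEIGHT}" fill="{colour}"/>')
                start = None
    parts.append("</svg>")
    return "\n".join(parts)


# Hypothetical four-set diagram with placeholder colours (not the study's palette).
EXAMPLE = [
    frozenset({"Thriller", "Comedy", "Drama"}),
    frozenset({"Period"}),
    frozenset({"Drama"}),
    frozenset({"Thriller"}),
    frozenset({"Comedy", "Drama"}),
]
svg = render_svg(["Thriller", "Comedy", "Period", "Drama"], EXAMPLE,
                 ["#1b9e77", "#d95f02", "#7570b3", "#e7298a"])
```

Because each rectangle corresponds to one line segment, the number of `rect` elements in the output equals the diagram's line score.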
    Consistent with [5, 8], a palette of thirteen colours was generated using
colorbrewer2.org (accessed January 2016). Colour generation using the Brewer colour
palette is recognized as a valid approach for empirical studies, such as in the
context of maps [19]. So that the colours were distinguishable, but not sequential
or suggestive (e.g. increasingly vivid shades of red used to denote heat), they
were generated using the ‘qualitative’ option, based on work by Ihaka [10].


4    Experiment Execution
The initial pilot study was conducted with five participants (4 M, 1 F, ages
18–38). The pilot study showed that the two-minute time limit for each question
was sufficient for most participants: one participant had 3 timeouts and 1 error,
and another had 5 errors, while the three remaining participants answered all the
questions correctly within the time limit. However, the pilot revealed a problem
with the number of items that appear under the overlaps in the diagrams: they were
not equally distributed between the three diagrams of each characteristic type.
This unevenness may have caused bias across the clutter levels, because the speed
or accuracy of answering questions could depend on the distribution of items. We
fixed this issue by equalizing the number of items under the same overlaps across
the three diagrams of each characteristic type. For example, the left-most overlap
of the diagram in Fig. 3 contains three student names (Neil, Will, and Tina); the
corresponding overlap, in the middle of Fig. 4, also contains three names (Imani,
Rafe, and Paul), as does the corresponding overlap in Fig. 5 (Glen, Fred, and
Owen). The main experiment was carried out with 30 participants (19 M, 11 F, ages
17–34). Each participant was given £6, in the form of a canteen voucher, for
taking part.
    The experiment had three phases. In the first, paper-based introduction and
training phase, the experimental facilitator used an introduction script to
introduce the participant to linear diagrams and the types of question they would
see during the experiment. The participant then started the second, computer-based
training phase, in which they used the experiment software to answer three
questions, with two minutes allowed for each question, without interruption from
the experimental facilitator. If a training question was answered incorrectly, the
experimental facilitator went through the question with the participant. If the
participant wished to continue then we

proceeded to the third (main) study phase, in which we collected the quantitative
data for our analysis. Each participant signed a consent form to confirm that they
wished to participate in the study. The participant was not interrupted or given
any help during this phase. After answering all 27 questions, the participant was
given the debrief script, which included the contact details of the experimental
facilitator so that they could access the results of the study.

5     Experiment Results
The results are based on the data collected from the 30 participants in the main
study phase, each answering 27 questions, yielding 27 × 30 = 810 observations.

5.1     Overall Impact of Clutter
We analyzed the time and error data to determine whether significant differences
exist, overall, between the three clutter levels. This analysis allowed us to
determine whether clutter level significantly impacts user task performance.

Analysis of Time Data The observed means are 42.93 (sd: 27.65) seconds for
low diagram clutter, 46.35 (sd: 25.89) seconds for medium diagram clutter, and
42.23 (sd: 24.94) seconds for high diagram clutter. The time data are not normally
distributed and are highly skewed. To make the data suitable for an ANOVA, we
applied a log transformation (resulting skewness: -0.05). The ANOVA, conducted on
the transformed data, yielded a p-value of 0.002 when testing for differences
between the means for the three clutter levels. We therefore conclude that the
mean time taken to interpret linear diagrams differs significantly across clutter
levels.
    To reveal how the significant differences arose, we conducted a Tukey test to
compare the clutter levels pairwise. Surprisingly, this test revealed that the
difference between the mean time taken to interpret linear diagrams with low
diagram clutter versus high diagram clutter is not significant. However, the
differences between the mean time taken to interpret linear diagrams with medium
diagram clutter and those with low and high diagram clutter are significant. In
summary, medium diagram clutter led to a significantly higher mean time than both
low and high clutter: the medium clutter questions took about 3.4 and 4.1 seconds
longer, on average, to answer than the low and high clutter questions respectively.
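The arithmetic behind these comparisons can be checked against the cell means reported in Table 3 (Section 5.2). Each cell of Table 3 has 90 observations (30 participants × 3 questions, as the N column of Table 4 confirms), so each overall mean should equal the plain average of its row, up to rounding. The sketch below uses only plain Python; it does not reproduce the ANOVA or the Tukey test, which require the raw per-question times.

```python
# Overall mean times (seconds) reported in Section 5.1 for each clutter level.
overall = {"low": 42.93, "medium": 46.35, "high": 42.23}

# Cell means from Table 3, for 7, 10 and 13 sets. Every cell has 90
# observations, so each overall mean should equal the plain average of its row.
table3 = {
    "low": [49.37, 41.36, 38.06],
    "medium": [47.67, 48.20, 43.19],
    "high": [46.39, 39.81, 40.49],
}

for level, cells in table3.items():
    recomputed = sum(cells) / len(cells)
    print(f"{level}: reported {overall[level]:.2f} s, recomputed {recomputed:.2f} s")

# Extra time taken, on average, by medium clutter questions.
extra_vs_low = overall["medium"] - overall["low"]    # about 3.42 s
extra_vs_high = overall["medium"] - overall["high"]  # about 4.12 s
print(f"medium - low: {extra_vs_low:.2f} s, medium - high: {extra_vs_high:.2f} s")
```

All three recomputed means agree with the reported overall means to two decimal places.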

Analysis of Errors Of the 810 observations (30 participants × 27 questions)
there were a total of 31 errors and 18 timeouts, giving an error rate of 3.83%
and a timeout rate of 2.22%. Timeouts were counted as neither correct responses
nor errors. Table 2 shows the observed numbers of correct answers, errors and
timeouts for each clutter level. A χ² test on these counts gave a p-value of
0.287, which is not significant⁴. We conclude that diagram clutter does not
significantly affect the comprehension of linear diagrams in terms of accuracy.
⁴ Treating timeouts as errors changes the p-value to 0.266.


Table 2: The observed correct answers, errors and timeouts for each clutter level.
               Clutter Level Correct Errors Timeouts Total
               Low Clutter     249     11      10     270
               Medium Clutter  254     12      4      270
               High Clutter    258     8       4      270
               Total           761     31      18     810
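The reported χ² results can be reproduced from the counts alone. Treating Table 2 as a 3 × 3 table (clutter level by outcome, with timeouts as a separate outcome) gives the p-value of 0.287, and merging timeouts into errors gives the 0.266 in footnote 4; this suggests, although the text does not state it, that the tests were set up this way. The sketch below uses only the Python standard library. The closed-form p-value it uses is valid only for an even number of degrees of freedom, which holds here (4 for the 3 × 3 table, 2 for the merged 3 × 2 table); the function names are our own.

```python
import math
from typing import List

Table = List[List[int]]


def chi_square_statistic(table: Table) -> float:
    """Pearson chi-square statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


def chi_square_p_value(stat: float, df: int) -> float:
    """Upper-tail chi-square probability, using the closed form
    exp(-x/2) * sum_{k < df/2} (x/2)**k / k!, valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = stat / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


def chi_square_test(table: Table) -> float:
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_square_p_value(chi_square_statistic(table), df)


# Table 2: rows are low, medium and high clutter;
# columns are correct answers, errors and timeouts.
table2 = [[249, 11, 10], [254, 12, 4], [258, 8, 4]]
print(f"{chi_square_test(table2):.3f}")  # 0.287

# Footnote 4: timeouts treated as errors (columns: correct, errors + timeouts).
merged = [[correct, errors + timeouts] for correct, errors, timeouts in table2]
print(f"{chi_square_test(merged):.3f}")  # 0.266
```

With SciPy installed, `scipy.stats.chi2_contingency` should return the same statistic for these tables, since its Yates continuity correction is applied only when there is a single degree of freedom.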


5.2   Interaction between Clutter and Number of Sets
Analysis of Time As discussed earlier, by including a variety of numbers of
sets we are able to determine whether there is an interaction between the number
of sets and the clutter score. The following analysis seeks to address whether
clutter relative to the number of sets influences cognition. The mean times for
each combination of clutter level and number of sets (along with the standard
deviations) are shown in Table 3. It can be seen that, for 10 and 13 sets, the
medium clutter mean time is higher than those for the low and high clutter levels,
consistent with our overall analysis. However, in the case of 7 sets, the mean
times are all similar (46.39 to 49.37 seconds). This may indicate that, with 7
sets, the level of cognitive difficulty associated with the number of sets
represented in relation to the level of clutter is low, and perhaps suggests that
clutter has more of an effect as the number of sets rises.


            Table 3: Mean time in seconds (standard deviation) by clutter level and number of sets.
                                      Number of Sets
             Clutter Level       7            10            13
             Low           49.37 (30.24) 41.36 (25.29) 38.06 (26.22)
             Medium        47.67 (29.66) 48.20 (25.11) 43.19 (22.37)
             High          46.39 (27.73) 39.81 (25.83) 40.49 (20.42)


    Our ANOVA revealed a significant interaction between diagram clutter and
number of sets, with p = 0.001. Table 4 gives the results of the Tukey pairwise
comparisons for this interaction; conditions that do not share a grouping letter
differ significantly. This reveals that:
1. Participants took significantly longer to answer questions about the medium
   clutter 10-set diagrams than about the high clutter 10-set and low clutter
   13-set diagrams.
2. Participants took significantly longer to answer questions about the low and
   medium clutter 7-set diagrams than about the high clutter 10-set and low
   clutter 13-set diagrams.
3. Participants took significantly longer to answer questions about the high
   clutter 7-set diagrams than about the low clutter 13-set diagrams.
    The results suggest that the combination of diagram clutter and the number
of sets impacts the time taken to interpret linear diagrams.
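Table 4 follows the usual convention for Tukey groupings: two conditions differ significantly when they share no grouping letter. The sketch below applies that convention mechanically to the groupings in Table 4 and lists every significant pair; it recovers exactly the seven comparisons summarized in items 1 to 3 above. It works only from the published letters, not from the raw data, and the names `GROUPING` and `significantly_different` are our own.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Condition = Tuple[str, int]  # (clutter level, number of sets)

# Tukey grouping letters from Table 4.
GROUPING: Dict[Condition, str] = {
    ("Medium", 10): "A",
    ("Low", 7): "A",
    ("Medium", 7): "A",
    ("High", 7): "AB",
    ("Medium", 13): "ABC",
    ("High", 13): "ABC",
    ("Low", 10): "ABC",
    ("High", 10): "BC",
    ("Low", 13): "C",
}


def significantly_different(a: Condition, b: Condition) -> bool:
    """Two conditions differ significantly if they share no grouping letter."""
    return not set(GROUPING[a]) & set(GROUPING[b])


significant_pairs: List[Tuple[Condition, Condition]] = [
    (a, b) for a, b in combinations(GROUPING, 2) if significantly_different(a, b)
]
for a, b in significant_pairs:
    print(f"{a[0]} clutter, {a[1]} sets vs {b[0]} clutter, {b[1]} sets")
```

Conditions in the ‘ABC’ group (medium clutter with 13 sets, high clutter with 13 sets, and low clutter with 10 sets) share a letter with every other condition, so none of their comparisons is significant.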

Analysis of Errors We now compare the error rates across the numbers of sets
represented. This reveals whether increasing the number of sets, whilst keeping
the clutter level constant, impacts cognition. However, we note that the
following analysis should be treated with caution, as some of the observed counts
are very low. Firstly, in the case of low clutter, a χ² test on the data in
Table 5 yields a p-value of 0.819. We conclude that there is no significant
difference in performance, in terms of errors, as the number of sets varies when
the clutter level is low. However, we saw in our interaction analysis of the time
data that low cluttered diagrams with 7 sets took participants significantly
longer than low cluttered diagrams with 13 sets.
    In the case of medium clutter, a χ² test on the data in Table 6 yields a
p-value of 0.068. We again conclude that there is no significant difference in
performance, in terms of errors, as the number of sets varies. This is consistent
with our time analysis, where we found no significant performance difference
between the medium clutter diagrams as the number of sets varied (see Table 4).
    Lastly, for high clutter, a χ² test on the observed counts by number of sets
yields a p-value of 0.823. We again conclude that there is no significant
difference in performance, in terms of errors, as the number of sets varies. This
is again consistent with our time analysis, where we found no significant
performance difference between the high clutter diagrams as the number of sets
varied (see Table 4).
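As with Table 2, the p-values reported for low and medium clutter can be reproduced by treating Tables 5 and 6 as 3 × 3 tables of counts (number of sets by outcome). The sketch below, which uses only the standard library, also counts the cells whose expected count falls below 5, a common rule-of-thumb threshold for the χ² approximation. Six of the nine cells in each table fall below it, which supports the caution expressed above. The high clutter counts are not reproduced here, and the function names are our own.

```python
import math
from typing import List

Table = List[List[int]]


def expected_counts(table: Table) -> List[List[float]]:
    """Expected cell counts if outcome is independent of the number of sets."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_p(table: Table) -> float:
    """Pearson chi-square p-value (closed form, valid for even df only)."""
    expected = expected_counts(table)
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = stat / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


def small_cells(table: Table, threshold: float = 5.0) -> int:
    """Number of cells whose expected count is below the threshold."""
    return sum(e < threshold for row in expected_counts(table) for e in row)


# Rows: 7, 10 and 13 sets; columns: correct, errors, timeouts.
table5 = [[82, 4, 4], [82, 5, 3], [85, 2, 3]]  # low clutter (Table 5)
table6 = [[81, 5, 4], [87, 3, 0], [86, 4, 0]]  # medium clutter (Table 6)

for label, table in (("Table 5", table5), ("Table 6", table6)):
    print(f"{label}: p = {chi_square_p(table):.3f}, "
          f"cells with expected count < 5: {small_cells(table)} of 9")
```

With this many small expected counts, an exact test (for example, a permutation test on the same tables) would be a more robust check than the χ² approximation.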


5.3   Discussion

We now seek to explain the findings of the study. We found no significant impact
of clutter on accuracy. The low error rates could be partially attributed to our
design choices: the diagrams were drawn based on guidelines that have been
empirically established to have a positive impact on user comprehension [17].
    The only statistically significant differences found arose through our analysis
of the time data, which revealed, surprisingly, that a medium level of clutter
generally led to slower task performance. To understand why this might have


Table 4: Grouping information using the Tukey method and 95.0% Confidence.
Clutter Level No. of Sets N Mean (Log Time) Mean (Seconds) Grouping
Medium            10      90     1.627          48.20      A
Low                7      90     1.601          49.37      A
Medium             7      90     1.596          47.67      A
High               7      90     1.585          46.39      AB
Medium            13      90     1.571          43.19      ABC
High              13      90     1.559          40.49      ABC
Low               10      90     1.550          41.36      ABC
High              10      90     1.513          39.81        BC
Low               13      90     1.493          38.06         C
12      M. Alqadah et al.


        Table 5: Observed values for low clutter level by number of sets.
               Number of Sets Correct Errors Timeouts Total
               7                82      4        4     90
               10               82      5        3     90
               13               85      2        3     90
               Total            249     11      10     270


     Table 6: Observed values for medium clutter level by number of sets.
               Number of Sets Correct Errors Timeouts Total
               7                81      5       4      90
               10               87      3       0      90
               13               86      4       0      90
               Total            254     12      4      270


arisen, we begin by making some observations about the relationship between
clutter, the number of sets represented, and the lengths of the corresponding line
                                                                      clutter score
segments. Now, the average number of line segments per set is number          of sets .
Therefore, for a fixed number of sets, as the clutter score increases, the number
of line segments per set increases and their length decreases. As we used a
fixed number (25) of overlaps, diagrams with a high clutter score will exhibit
more short lines, relatively speaking, than diagrams with a low clutter score,
and diagrams with a medium level of clutter will fall somewhere in between. In
practice, in low clutter diagrams many sets are typically represented by just
one or two line segments, medium clutter diagrams represent sets by around 2 to
3 line segments, and high clutter diagrams represent sets by around 3 to 5 line
segments. We might therefore expect many long line segments in the low clutter
diagrams and many short line segments in the high clutter diagrams, whereas the
medium clutter diagrams will have more of a mix of the two and therefore perhaps
appear less uniform in layout. This can be seen in Fig. 3 (low clutter: many
long line segments), Fig. 4 (medium clutter: more variability in line segment
length) and Fig. 5 (high clutter: many short line segments).
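The relationship between clutter score, number of sets and segment length can be illustrated with a small model. The sketch below assumes, as the formula above implies, that the clutter score is the total number of line segments in the diagram, and models a diagram as a left-to-right sequence of overlaps, each listing the sets it contains; a set is drawn as one line segment per maximal run of consecutive overlaps that contain it. The helper names and the example diagram are hypothetical, and the example is much smaller than the 25-overlap diagrams used in the study.

```python
from typing import Dict, List, Set


def segments_per_set(overlaps: List[Set[str]]) -> Dict[str, int]:
    """Count line segments per set, where the diagram is a left-to-right list
    of overlaps and each maximal run of consecutive overlaps containing a set
    is drawn as one segment."""
    counts: Dict[str, int] = {}
    previous: Set[str] = set()
    for overlap in overlaps:
        # A set present here but absent from the previous overlap starts a
        # new segment.
        for name in overlap - previous:
            counts[name] = counts.get(name, 0) + 1
        previous = overlap
    return counts


def average_segments_per_set(overlaps: List[Set[str]]) -> float:
    """clutter score / number of sets, assuming the clutter score is the total
    number of line segments (hypothetical helper, not from the study)."""
    counts = segments_per_set(overlaps)
    if not counts:
        raise ValueError("diagram contains no sets")
    return sum(counts.values()) / len(counts)


# Hypothetical diagram: three sets A, B, C over five overlaps.
diagram = [{"A"}, {"A", "B"}, {"B"}, {"A", "C"}, {"C"}]
print(segments_per_set(diagram))          # segments for each set
print(average_segments_per_set(diagram))  # 4 segments / 3 sets
```

Here set A is split into two segments because the overlap containing only B separates its two runs; with the number of overlaps fixed, each extra split shortens the segments involved, which is the effect described above.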
    Through our statistical analysis, we have demonstrated that extracting
information takes significantly longer from a medium cluttered diagram than from
a low or high cluttered diagram. We can infer, then, that understanding linear
diagrams with a mixture of short and long line segments is more difficult than
understanding diagrams which predominantly use only long, or only short, line
segments. We also found no significant difference, overall, between the high
cluttered diagrams and the low cluttered diagrams. We hypothesize that
different reading strategies, which we call reading vertically and reading
horizontally, are employed for each type of diagram. The short line segments in
highly cluttered diagrams could make it easier to identify an overlap by reading
vertically, since the short line segments produce visually distinct blocks. We
can thus easily associate an overlap with its items. By contrast, when a diagram
is low cluttered it consists of long line segments, which could make it easier
to read horizontally, allowing an easy overview of the entire set. It might be
that a mixture of these strategies best helps read medium cluttered diagrams, or
that neither is ideal, increasing cognitive load. Further work is needed to
establish whether people actually employ these reading strategies generally.


Table 7: Observed values for high clutter level by number of sets.

Number of Sets   Correct   Errors   Timeouts   Total
7                87        2        1          90
10               85        4        1          90
13               86        2        2          90
Total            258       8        4          270


6   Threats to Validity

We now discuss the limitations of the study and how threats were managed to
minimize their impact on the results. We categorized the threats to validity as
internal, construct, and external, following [14]. With regard to internal
validity, the following factors were considered during the study design:

 1. Motivation effect: this could be a threat if the participants did not freely
    volunteer to take part. To manage this effect, the participants were invited
    to take part as volunteers and were given a £6 refreshment voucher for their
    time.
 2. Laboratory: it is important to expose all participants to the same
    environment in this type of study. The experiment took place in a usability
    laboratory, where the environment was free from noise and interruption, and
    all participants were treated in the same way.
 3. Fatigue effect: fatigue could be a threat if the participants were exposed to
    repeated tasks for a long time. To control this, the study was designed to
    finish in less than one hour.
 4. Question order: we presented the questions in a random order for each
    participant to reduce any effect (e.g. tiredness, learning) owing to order.
 5. Layout variation: many choices about diagram layout were made, such as the
    colour and the thickness of the lines. To manage the threat of layout
    variation, we followed the drawing guidelines of [17].

    Construct validity considers whether the identified dependent (primary) and
independent (secondary) variables are the correct choices. Consistent with other
researchers who studied user comprehension, for instance [4, 11, 13, 15], compre-
hension is measured by the time taken to answer the questions and the number of
errors made. In addition, we considered false negatives and false positives. A
false negative could arise if a participant misread a wrong answer as the
correct one and selected it. To manage this, the similarity of university
subject (i.e. set) names and student names was controlled. Furthermore, when
participants were asked a ‘How many’ question, they were required to type in
the names of the students they had counted, to reduce false positives.
    A secondary variable is the choice of the diagrams. It could be a threat
if participants did not spend enough time understanding the diagrams before
answering the questions. To manage this threat, we created diversity in the
diagrams and designed them to convey an appropriate level of information, in
order to require cognitive effort when answering questions.
    With regard to external validity, the following factors indicate the
limitations of the results and the extent to which we can generalize them. It is
a common threat to the external validity of user studies if the participants do
not represent a wider population. All participants were staff or students from
the University of Brighton with little or no previous experience of using linear
diagrams. The question styles could be a threat if there was no variety in the
questions asked of the linear diagrams. To allow more generalization, we used
three styles of question.

7    Conclusion
In this paper, we determined whether, and how, clutter affects user understanding
when interpreting linear diagrams. The results show that clutter does affect the
comprehension of linear diagrams. Surprisingly, increasing clutter does not
always negatively affect comprehension: the medium cluttered diagrams required
significantly longer to interrogate than low or high cluttered diagrams.
Moreover, we found no significant difference between the time taken to interpret
high or low cluttered diagrams. Furthermore, the error data analysis showed no
significant relationship between error rates and either the level of clutter or
the number of sets represented. In summary, we found that diagram clutter
affects the time taken to interpret information in linear diagrams, but it does
not affect the accuracy with which participants answered questions about the
data visualized in linear diagrams.
    These results inform the direction of further research into the effect of the
complexity of linear diagrams on comprehension. For instance, it would be
interesting to establish which strategies people employ for reading linear
diagrams, such as those hypothesized in our discussion above. A further fruitful step for this research
is to determine whether it is beneficial to use multiple linear diagrams to con-
vey information, rather than a single diagram, to control the amount of clutter
present in any individual diagram, informed by the results of this study.

References
 1. Alqadah, M., Stapleton, G., Howse, J., Chapman, P.: Evaluating the impact of
    clutter in Euler diagrams. In: Diagrammatic Representation and Inference, pp.
    108–122. No. 8578 in LNAI, Springer (2014)
 2. Alqadah, M., Stapleton, G., Howse, J., Chapman, P.: The perception of clutter
    in linear diagrams. In: Diagrammatic Representation and Inference. LNAI,
    Springer (2016), accepted for publication
 3. Bertini, E., Dell’Aquila, L., Santucci, G.: Reducing infovis cluttering through non
    uniform sampling, displacement, and user perception. In: Electronic Imaging 2006.
    pp. 60600L–60600L. International Society for Optics and Photonics (2006)
 4. Blake, A., Stapleton, G., Rodgers, P., Cheek, L., Howse, J.: Does the orientation of
    an Euler diagram affect user comprehension? In: The 18th International Conference
    on Distributed Multimedia Systems, International Workshop on Visual Languages
    and Computing (2012)
 5. Chapman, P., Stapleton, G., Rodgers, P., Micallef, L., Blake, A.: Visualizing sets:
    an empirical comparison of diagram types. In: Diagrammatic Representation and
    Inference, pp. 146–160. No. 8578 in LNAI, Springer (2014)
 6. Couturat, L.: Opuscules et fragments inédits de Leibniz. Felix Alcan (1903)
 7. Ellis, G., Dix, A.: A taxonomy of clutter reduction for information visualisation.
    IEEE Transactions on Visualization and Computer Graphics 13(6), 1216–1223
    (2007)
 8. Harrower, M., Brewer, C.: Colorbrewer.org: An online tool for selecting colour
    schemes for maps. Cartographic Journal 40(1), 27–37 (2003)
 9. Hofmann, H., Siebes, A., Wilhelm, A.: Visualizing Association Rules with Interac-
    tive Mosaic Plots. In: 6th ACM SIGKDD international conference on Knowledge
    discovery and data mining. pp. 227–235. ACM (2000)
10. Ihaka, R.: Colour for presentation graphics. In: 3rd International Workshop on
    Distributed Statistical Computing (2003)
11. Isenberg, P., Bezerianos, A., Dragicevic, P., Fekete, J.: A study on dual-scale data
    charts. IEEE Transactions on Visualization and Computer Graphics 17(12),
    2469–2478 (2011)
12. Larkin, J., Simon, H.: Why a diagram is (sometimes) worth ten thousand words.
    Cognitive Science 11(1), 65–99 (1987)
13. Purchase, H.: Which aesthetic has the greatest effect on human understanding? In:
    5th International Symposium on Graph Drawing. pp. 248–261. Springer (1997)
14. Purchase, H.C.: Experimental human-computer interaction: a practical guide with
    visual examples. Cambridge University Press (2012)
15. Riche, N., Dwyer, T.: Untangling Euler diagrams. IEEE Transactions on Visual-
    ization and Computer Graphics 16(6), 1090–1099 (2010)
16. Rodgers, P., Zhang, L., Purchase, H.: Wellformedness properties in Euler diagrams:
    Which should be used? IEEE Transactions on Visualization and Computer Graph-
    ics 18(7), 1089–1100 (2012)
17. Rodgers, P., Stapleton, G., Chapman, P.: Visualizing sets with linear diagrams.
    ACM Transactions on Computer-Human Interaction (TOCHI) 22(6), 27 (2015)
18. Sato, Y., Mineshima, K.: How diagrams can support syllogistic reasoning: An
    experimental study. Journal of Logic, Language and Information 24(4), 409–455
    (2015)
19. Silva, S., Madeira, J., Santos, B.: There is more to color scales than meets the eye:
    A review on the use of color in visualization. In: 11th International Conference on
    Information Visualization. pp. 943–950. IEEE Computer Society (2007)
20. Wittenburg, K., Lanning, T., Heinrichs, M., Stanton, M.: Parallel Bargrams for
    Consumer-based Information Exploration and Choice. In: 14th annual ACM sym-
    posium on User interface software and technology. pp. 51–60. ACM (2001)