<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Context in Spreadsheet Comprehension</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andrea Kohlhase</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Kohlhase</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science, Jacobs University Bremen</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Information Management, University of Applied Sciences Neu-Ulm</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Even though spreadsheet programs traditionally concentrate on exploration and computation of data (the author's view), a non-trivial proportion of spreadsheets are used for communicating data, models, and decisions to humans who assume the role of a spreadsheet reader. The communicative use of spreadsheets gives the spreadsheet context, that is, the background knowledge needed to interpret and make sense of its content, a very important role. Indeed, many 'spreadsheet errors' can be traced to misinterpretations due to context failures. In this paper we report on a set of experiments we conducted to get a deeper understanding of the context in spreadsheet comprehension, focusing on the different perspectives authors and readers take in understanding spreadsheets. The results confirm missing context information as a likely source of semantic spreadsheet errors. Moreover, they lead to an extension and refinement of already established context dimensions.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION</title>
      <p>
        In terms of adoption, spreadsheet software is extremely
successful. Spreadsheet documents have developed from being
easy-to-use, programmable interfaces into easy-to-understand,
sharable data interfaces (see, e.g., [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]).
      </p>
      <p>
        Hence, the user experience does not stop with the creation
of a spreadsheet document; it also involves others reading it
later. The notoriously high error rate in spreadsheet documents
(see [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]) has prompted a rush of studies and proposed
solutions over the last couple of years, most of which
address the avoidance of errors when authoring a
spreadsheet. Our scientific interest revolves around the
spreadsheet document as a medium of communication; therefore,
we are particularly interested in the reader’s experience of
spreadsheets as a data interface and concerned about the errors
introduced in the comprehension process.
      </p>
      <p>
        A taxonomy of the errors [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] shows that a significant portion
of errors (87%, as calculated in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]) are semantic. In this
research, a semantic error is one that is committed when users
have a wrong concept that may be correctly or incorrectly
put into practice. These arise from misunderstanding the
real world, wrong translation of the real world to the spreadsheet
representation, or a misunderstanding of the spreadsheet’s
internal logic [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. As semantic errors are made on a per-document
basis, there is hope neither for a best-practice guide
that trains users to avoid them nor for a general software update
that helps. Semantic errors pose a more serious threat for
wide-impact spreadsheets, since more and more individual
communication errors may aggregate over the span of distribution.
      </p>
      <p>
        It has been proposed [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] that a key reason for
semantic errors is a missing higher-level abstraction of the
data. Tables, with their grid framework, expose details and
allow manipulation of underlying data. Therefore,
spreadsheets, as a computer-supported realization of tables, turn
one’s attention to data on a micro-level, failing to provide
the big picture. Generally, schematic diagrams or pictures
abstract away and integrate the data, presenting it holistically.
Newer versions of spreadsheet software like “MS Excel 2013”
have thus strengthened their visualization features, e.g. with
“Power View”. In particular, Power View provides users with
report features and analytical views, which help them
comprehend the spreadsheet data and thus ultimately increase
the spreadsheet’s readability.
      </p>
      <p>The communicative aspect of spreadsheets allows us to
understand them as a knowledge-sharing tool. Osterlund and
Carlile describe the semantic issues with knowledge sharing
as follows:</p>
      <disp-quote>
        <p>“the relational core of a knowledge sharing theory easily
falters. [...] We end up instead with a perspective that
focuses on the storage and retrieval of explicit knowledge
represented in information systems. Knowledge becomes
an object shared within and across community boundaries
without consequence for the community in which it
originated”. [8, p. 18]</p>
      </disp-quote>
      <p>Note that crossing a community boundary leaves the entire
context (the circumstances and settings in which a
document is created and obtains its specific meaning) behind.
Researchers in the field of Human-Computer Interaction (HCI)
have focused in recent years on the context-of-use of software
systems: user experience issues often only arise in the concrete
context in which a product is used. Our approach for tackling
the readability issue of spreadsheets is motivated by this
insight. Therefore we ask: what is the context of a spreadsheet
document, and which role does it play for the comprehension of
spreadsheets? For an answer, consider the following distinct
contexts:</p>
      <list list-type="bullet">
        <list-item><p>the context of the data itself,</p></list-item>
        <list-item><p>the information context (implicit knowledge) of the
author or the reader,</p></list-item>
        <list-item><p>the event context of the author (the intention of the
document as communication tool) or the reader (the
expectation towards the usefulness of the document),</p></list-item>
        <list-item><p>the effect context (e.g. decision making based on the
document).</p></list-item>
      </list>
      <p>Note that the clear distinction between authors and readers is
only an analytical one. We are well aware that authors turn
into readers after a short while even for their own documents
and, vice versa, that the motivation of readers might consist in
searching for copy-able parts to author their very own
spreadsheets. Nevertheless, the context can be clearly distinguished
where wide-impact, local boundary-leaving spreadsheets are
concerned.</p>
      <p>
        Probst et al. (see [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]) posit that glyphs, data, information,
and knowledge can be seen as stages of a pipeline (shown in
Fig. 1), each being defined by its respective context. In particular,
they argue that glyphs are just a set of characters or symbols
like {0; 9; 5; ,} without any structure. A first set of rules imposed
on the glyphs (the syntax context) then yields data which
can be handled by machines. Spreadsheet values are data and
as such they can be computed by the calculation engine of a
spreadsheet software. For obtaining meaning from such data
we still need another component: the context of meaning.
Usually, we discern data from information by viewing
information as data with a meaning. Davenport and Prusak think of
information “as data that makes a difference” [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Data becomes
information if a user can interpret the data in regard to a
specific goal, that is, a locally meaningful context, such as using the
string ’0,95’ as a number in an equation concerning exchange
rates in our example. In contrast, information becomes
knowledge if a user can interpret the information in regard to a
global context of meaning, such as understanding the exchange rate
equation in the area of specific market behavior with respect
to changes of exchange rates. Therefore, the role of context for
a spreadsheet consists in turning mere values into content.
We can even say that the more context there is, the more
content we get. So far, the context which allows spreadsheets
to communicate information is contained in the table, row and
column headers and sometimes in added comments.
      </p>
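      <p>The step from glyphs via data to information can be illustrated with a minimal Python sketch. It assumes a decimal-comma convention as the syntax context and a currency conversion as the local context of meaning; the function names and the amount of 200 are hypothetical and serve only as illustration.</p>
      <preformat>
```python
from decimal import Decimal


def glyphs_to_data(glyphs, decimal_mark=","):
    """Syntax context: read a glyph string as a decimal number.

    The decimal-comma convention is an assumption of this sketch.
    """
    return Decimal(glyphs.replace(decimal_mark, "."))


def apply_exchange_rate(amount, rate):
    """Local context of meaning: treat the number as an exchange rate."""
    return amount * rate


# The glyph string from the running example in the text.
rate = glyphs_to_data("0,95")

# Hypothetical amount to convert; not taken from the study.
converted = apply_exchange_rate(Decimal("200"), rate)
print(rate, converted)
```
</preformat>
      <p>Without the syntax context the string ’0,95’ cannot be computed with at all, and without the exchange-rate context the resulting number carries no meaning for the reader.</p>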
      <p>
        Unfortunately, more often than not a spreadsheet can neither
be properly understood nor used, unless one looks beyond the
spreadsheet data itself to the broader background knowledge
within which it is embedded (see for instance [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]). This
contextual knowledge is thus a frame and provides means for
the spreadsheet’s appropriate interpretation.
      </p>
      <p>
        In this paper we are aiming at a better understanding of
what spreadsheet context is, refining and extending a previous
study presented in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Here, we compare spreadsheet authors’
understanding of context to spreadsheet readers’; they turn
out to be very different. As the complexity of a spreadsheet
document might be a relevant distinction with regard to
context, we presented readers and authors with a simple and
a complex spreadsheet. In Section II we survey previous
research concerning semantic spreadsheet errors, in Section III
we present the details, results and interpretation of our study
and we conclude in Section IV.
      </p>
    </sec>
    <sec id="sec-2">
      <title>II. STATE OF THE ART</title>
      <p>
        There is on-going research [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] on ways to reduce
spreadsheet errors. Some approaches [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] focus on
identifying best practices for error prevention, while others [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ],
[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] tackle the root problem and concentrate on empowering
spreadsheets with better ways of interaction.
      </p>
      <p>
        Regarding error prevention, the approach is to understand
common errors so that they may be avoided. Much research
has shown that errors cannot be eliminated completely, but
only reduced [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Best practices for this are often proposed
based on good software development principles. For instance,
one suggestion is having a full system development life cycle
with requirements analysis and design stages (both of which
are largely skipped). Other tactics, such as layout planning,
cell protection, and even modular design, have been suggested to
protect against some errors, yet none of these addresses
another problem: that of the semantic errors committed.
      </p>
      <p>
        Nardi and Miller noticed a relevant feature of spreadsheets
in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]: they are not “single-user applications”. In particular,
they are used in the work environment as a collaboration tool
and as a means of communication to exchange and combine
domain knowledge and programming expertise. Even though
“the visual clarity of the spreadsheet table exposes the structure
and intent of users’ models, encouraging the sharing of domain
knowledge” [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], this doesn’t mean that there is no information
loss in the sharing process. As Hendry and Green in [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]
point out, “the main resources available to spreadsheet users to
improve comprehensibility are to use titles and to arrange the layout
carefully”. These resources may be sufficient for simple
spreadsheets, but for complex ones they fail, as the high semantic
error rate in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] indicates.
      </p>
      <p>
        To address this problem, Green et al. argued in [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] that
a solution might be a “browsable description level” to provide
more context. Essentially, this is a scheme for attaching descriptions
in which attributes and relationships can be recorded and later
searched for. It is effective in understanding and reusing code,
which in this case is a large and complex body of information.
Hendry and Green [
        <xref ref-type="bibr" rid="ref15">15</xref>
          ] have imported this idea to
spreadsheet design. “CogMap”, a user assistance tool developed for
this purpose, provides capabilities for simple annotation of
spreadsheets with ’tags’, which can later be filtered in
color-coded views. Tagging and annotating regions of spreadsheet
data provide simple, off-hand taxonomies, but cannot describe
richer information structures as ontologies can.
      </p>
      <p>
        In fact, by now ontologies have found their way into
many spreadsheet user assistance tools. For example,
“RightField” [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] is an open source application that provides a
mechanism for embedding ontology annotation support in Excel
spreadsheets. Another such tool is our own “SACHS” system
[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], with which ontology-based enhancement of some context
aspects can be developed. A reader, in turn, can be informed
about these aspects later on: the system allows the author to
state his/her domain knowledge in a structured ontology, which
SACHS integrates into the spreadsheet in the form of adapted help
texts. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] shows that authors describing implicit knowledge
use a mix of the following context dimensions:
      </p>
      <list list-type="bullet">
        <list-item><p>Definition (conceptual): a thorough description of the
meaning</p></list-item>
        <list-item><p>Purpose (conceptual): the intention, i.e. why a specific
information is put there</p></list-item>
        <list-item><p>Assessment of Purpose: an interpretation of the
purpose so that it allows drawing conclusions/actions</p></list-item>
        <list-item><p>Assessment of Value: an interpretation of data so that
it allows for making judgement, e.g. “if the ratio is close
to 100% everything is fine”</p></list-item>
        <list-item><p>Formula: description of data by specifying how it was
computed (what function)</p></list-item>
        <list-item><p>Provenance: the source of the data, i.e. how it was
obtained (direct measurement, computation, import etc.)</p></list-item>
        <list-item><p>History: explanation of how the spreadsheet was
changed over time</p></list-item>
      </list>
      <p>
        On the other end of the communication pipeline stand the
spreadsheet readers. As we observe in [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], research so far has
emphasized usability problems for developers of spreadsheets,
whereas the readers are largely overlooked. [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] tries to
remedy this and studies how readers perceive information offered
in spreadsheets. The spreadsheet authors’ context dimensions
in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and the readers’ information models in [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] are the
starting point of this research.
      </p>
    </sec>
    <sec id="sec-3">
      <title>III. THE STUDY</title>
      <p>
        The study we report on is based on a Bachelor Thesis project
conducted by Ana Guseva in [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] under the other authors’
close supervision. We use the data collected there, but refine
the interpretation in this paper.
      </p>
      <p>We conducted six interviews, with three participants
representing each target group:</p>
      <list list-type="bullet">
        <list-item><p>Authors, who have developed the particular or a similar
spreadsheet and thus are experts in the spreadsheet
knowledge domain, and</p></list-item>
        <list-item><p>Readers, who are only moderately familiar with the
spreadsheet knowledge domain, but are directly affected
by the spreadsheet and therefore motivated to read it.</p></list-item>
      </list>
      <p>All participants were fluent in computer use
and had at least basic experience with spreadsheet
technology. The interviews were then transcribed into 27 pages of
interview material.</p>
      <p>Each interview covered two test cases, each of which centered
on a distinct spreadsheet. These spreadsheets were
developed in 2010 for budget planning in a research project and are
currently still in use:</p>
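      <list list-type="simple">
        <list-item>
          <p>Complex: As a complex spreadsheet we used a cluttered
spreadsheet that contains</p>
          <list list-type="bullet">
            <list-item><p>numerous data spread over multiple screens,</p></list-item>
            <list-item><p>complex formulae for calculations making use of the
more advanced functions offered by the computation
engine, and</p></list-item>
            <list-item><p>multiple nested data dependencies;</p></list-item>
          </list>
        </list-item>
        <list-item>
          <p>Simple: Our simple spreadsheet</p>
          <list list-type="bullet">
            <list-item><p>is neatly organized on one screen,</p></list-item>
            <list-item><p>uses only basic formulae drawing on basic functions
like sum and divide, and</p></list-item>
            <list-item><p>contains only direct and somewhat expected data
dependencies like the ones in a total sum.</p></list-item>
          </list>
        </list-item>
      </list>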
      <p>The choice of these different kinds of spreadsheet was
intended to investigate the influence of spreadsheet complexity
on the presence and relevance of the specific context. The
distinctions allow us to judge how successfully the authors’
context knowledge is transferred to the readers via the
spreadsheet document. The data collection procedure involved
tape-recorded interviews. The interviews were conversational in
style; they were intended to capture the users’ understandings in
their own words and communication style. A fixed set of
open-ended questions was presented to each user, yet the
questions were asked as they arose naturally in the context
of the conversation.</p>
      <p>
        In particular, the exact interview procedure used is a variant
of the Wizard of Oz (WOZ) technique [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], in which
participants are given the impression that they are interacting
with a program, when in fact the program is operated by an
invisible human – the wizard. It is popular in the fields of
experimental psychology, human factors, ergonomics,
linguistics, and interface and usability engineering [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
      </p>
      <p>
        The approach taken extends the application of the WOZ
technique by reversing it. Previously used in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], the Reverse
Wizard of Oz experiment puts the participant in the position
of an ideal spreadsheet system, while the investigator interacts
with it, asking for help. It enables the investigator to act as if
she is unfamiliar with the test spreadsheet and thus allows her
to ask the participant to explain the spreadsheet thoroughly and
in detail. This, in turn, elicits the participant’s contextual
knowledge.
      </p>
      <sec id="sec-3-1">
        <title>A. Data Analysis Method: Card Sorting</title>
        <p>
          The next step in the investigation procedure was the card
sorting. As [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] puts it, “Card sorting is a technique to understand
relationships between items, to group items into dimensions, and to
understand users’ mental models of item organization”.
        </p>
        <p>Traditionally, card sorting is used in designing information
architecture, workflows, menu structure and web-site
navigation paths. It involves many participants sorting a given set of
cards according to their best understanding. In this research
only the investigator sorted the cards. Effectively, here the card
sorting reflects the investigator’s mental model of what the
participants had in mind.</p>
        <p>To obtain cards, the interview data was split up into units
called knowledge items, i.e., the smallest still meaningful
parts of sentences extracted from the transcribed interview.
Here are some examples:</p>
        <list list-type="bullet">
          <list-item><p>“All the costs [G11-23] are summed up in [G24]”.</p></list-item>
          <list-item><p>“I think that’s a summary table of the above.”</p></list-item>
          <list-item><p>“You can do it and insert another column (if you need to add
more partners).”</p></list-item>
        </list>
        <p>Note that each knowledge item carries one complete piece
of information, so we created a card for each knowledge
item. A total of 319 cards were generated from the interview
material.</p>
        <p>
          As is the practice in open coding [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ], [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ], [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ], the cards
were compared with others for similarities and differences.
In this way, conceptually similar cards were grouped together
to form dimensions and subdimensions. Making use of
constant comparisons guarded against the researcher’s bias and
achieved both greater precision (the grouping of like and only
like phenomena) and consistency (always grouping like with
like) [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]. The dimensions<sup>1</sup> were then given labels,
depending on the conceptual notion they denoted. Once identified,
dimensions and their properties became the basis for sorting
the next cards. In sorting the next card, notice was taken
of similar cards already present in other dimensions. In fact,
during this process comparative questions were used, such as:
1) What is the essence of this card?, 2) How does it compare
to the previous cards?, 3) How does it differ from the previous
cards?, 4) Does it fit in any previous dimension?
        </p>
        <p>Note that a single card was, if necessary, placed
under more than one dimension. Additionally, if during the
sorting a previously sorted card noticeably no longer fit
a dimension, it was moved to another dimension where
it belonged. An additional closed card sorting (which verifies
dimensions) followed the open card sorting (which creates new
dimensions) to further consolidate the results.</p>
      </sec>
      <sec id="sec-3-2">
        <title>B. Data Analysis: Context Dimensions</title>
        <p>
          The intermediate result of a card sorting process consists
of a set of card piles. These piles represent the distinct
dimensions of the context space of spreadsheets. Here, all 319
cards (knowledge items) were sorted into 14 piles.
<sup>1</sup> Note that we use the term “dimension” to be consistent with [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
        </p>
        <p>In the next step each pile is labeled, so that we get a set
of names for distinct context dimensions. Table I shows these
together with the defining question(s) for each.</p>
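        <table-wrap>
          <label>Table I</label>
          <table>
            <thead>
              <tr><th>Dimension</th><th>Question</th></tr>
            </thead>
            <tbody>
              <tr><td>STATEMENT</td><td>What is it? (read keyword)</td></tr>
              <tr><td>REPHRASING</td><td>What is it? (rephrase keyword)</td></tr>
              <tr><td>DEFINITION</td><td>What is it? (formal definition)</td></tr>
              <tr><td>BY-EXAMPLE</td><td>Example?</td></tr>
              <tr><td>EVALUATION</td><td>Is it good?</td></tr>
              <tr><td>FORMULA</td><td>How is it calculated? (function)</td></tr>
              <tr><td>PROVENANCE</td><td>From where? / How is it calculated? (dependency)</td></tr>
              <tr><td>REASON</td><td>How come? / Why is it so?</td></tr>
              <tr><td>HISTORY</td><td>Has it changed?</td></tr>
              <tr><td>ORGANIZATION</td><td>Where is it?</td></tr>
              <tr><td>USAGE</td><td>What should we do with it?</td></tr>
              <tr><td>PURPOSE</td><td>For what do we need it?</td></tr>
              <tr><td>SIGNIFICANCE</td><td>Is it important?</td></tr>
              <tr><td>OTHER</td><td>n/a</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>The dimensions REASON and USAGE were so small that we dropped
them in the following to prevent confusion.</p>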
        <p>Table II lists the occurrence rate c<sub>DIMENSION</sub>/c of
cards belonging to the respective context dimension,
differentiated by user role and by the complexity level of the spreadsheet,
where c<sub>DIMENSION</sub> is the number of cards in the respective context dimension
and c is the total number of cards. Note that some cards were
sorted into multiple dimensions, so that the percentages don’t
add up to 100%.</p>
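        <p>The occurrence rate can be computed as in the following minimal Python sketch. The representation of a card as a set of dimension labels and the four example cards are assumptions made for illustration; they are not data from the study.</p>
        <preformat>
```python
from collections import Counter


def occurrence_rates(cards):
    """Return the occurrence rate (in percent) of each dimension.

    `cards` is a list of sets; each set holds the dimension labels
    that one card (knowledge item) was sorted into.
    """
    total = len(cards)
    if total == 0:
        return {}
    # A card sorted into several dimensions counts once for each of them.
    counts = Counter(label for labels in cards for label in labels)
    return {label: 100.0 * n / total for label, n in counts.items()}


# Hypothetical cards, not taken from the interview material.
cards = [
    {"STATEMENT"},
    {"STATEMENT", "REPHRASING"},
    {"PROVENANCE", "FORMULA"},
    {"EVALUATION"},
]
rates = occurrence_rates(cards)
print(rates)
print(sum(rates.values()))
```
</preformat>
        <p>Because two of the four hypothetical cards belong to two dimensions each, the rates in this sketch sum to 150%, which shows why the columns of the table need not add up to 100%.</p>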
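        <table-wrap>
          <label>Table II</label>
          <table>
            <thead>
              <tr><th rowspan="2">Dimension</th><th colspan="2">Complex</th><th colspan="2">Simple</th></tr>
              <tr><th>Authors</th><th>Readers</th><th>Authors</th><th>Readers</th></tr>
            </thead>
            <tbody>
              <tr><td>STATEMENT</td><td>13.8%</td><td>27.6%</td><td>14.1%</td><td>30.3%</td></tr>
              <tr><td>REPHRASING</td><td>35.9%</td><td>25.9%</td><td>22.5%</td><td>30.3%</td></tr>
              <tr><td>DEFINITION</td><td>13.1%</td><td>6.9%</td><td>9.9%</td><td>6.1%</td></tr>
              <tr><td>BY-EXAMPLE</td><td>36.6%</td><td>15.5%</td><td>39.4%</td><td>24.2%</td></tr>
              <tr><td>EVALUATION</td><td>28.3%</td><td>1.7%</td><td>31.0%</td><td>3.0%</td></tr>
              <tr><td>FORMULA</td><td>8.3%</td><td>10.3%</td><td>8.5%</td><td>9.1%</td></tr>
              <tr><td>PROVENANCE</td><td>29.0%</td><td>24.1%</td><td>40.8%</td><td>27.3%</td></tr>
              <tr><td>HISTORY</td><td>2.8%</td><td>0.0%</td><td>0.0%</td><td>0.0%</td></tr>
              <tr><td>ORGANIZATION</td><td>20.0%</td><td>13.8%</td><td>8.5%</td><td>3.0%</td></tr>
              <tr><td>PURPOSE</td><td>15.2%</td><td>0.0%</td><td>11.3%</td><td>3.0%</td></tr>
              <tr><td>SIGNIFICANCE</td><td>13.8%</td><td>6.9%</td><td>12.7%</td><td>0.0%</td></tr>
              <tr><td>OTHER</td><td>0.7%</td><td>37.9%</td><td>2.8%</td><td>27.3%</td></tr>
            </tbody>
          </table>
        </table-wrap>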
      </sec>
      <sec id="sec-3-3">
        <title>C. Data Interpretation: Context for Comprehension</title>
        <p>In this section we interpret the context dimensions found
in the card sorting with respect to their occurrence rate for
authors vs. readers (see Fig. 2) or relative to the complexity of
the spreadsheet (see Fig. 3). Additionally, for each dimension
we provide an exemplary knowledge item.</p>
        <p>[Figure: panels (a) Readers and (b) Authors]</p>
        <p>1) STATEMENT: This category contains knowledge items
that repeated facts contained in the spreadsheet that were
obvious to the user. An example is the following item, for which
the corresponding column header read “RTD”<sup>2</sup> and the row
header “Project”:</p>
        <p>“In this project 90% of the workload is for RTD.”</p>
        <p>STATEMENT is used by readers twice as often as by
authors, independently of the complexity of the spreadsheet.
This is not surprising, as the spreadsheet in question was
unfamiliar to the readers, whereas authors still remembered at
least part of it and so did not want to state the obvious.
Note that such a statement can be a mere reading-aloud process
without any comprehension of what is communicated.</p>
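        <p>The “twice as often” observation can be checked against the STATEMENT row of Table II with a short Python sketch. The percentages are copied from the table; the function name and the rounding to two decimals are choices made for this illustration.</p>
        <preformat>
```python
# STATEMENT occurrence rates in percent, copied from Table II.
statement = {
    "complex": {"authors": 13.8, "readers": 27.6},
    "simple": {"authors": 14.1, "readers": 30.3},
}


def reader_to_author_ratio(rates):
    """Return how many times more often readers used a dimension than authors."""
    return round(rates["readers"] / rates["authors"], 2)


ratios = {level: reader_to_author_ratio(r) for level, r in statement.items()}
print(ratios)
```
</preformat>
        <p>The ratio is 2.0 for the complex and about 2.15 for the simple spreadsheet, so the factor of two holds at both complexity levels.</p>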
        <p>2) REPHRASING: All knowledge items in this dimension
state, in the participants’ own words, facts contained in the
spreadsheet that were obvious to the user. This kind of action
presumes that the respective fact has been understood. An
example is the following, where the first part is a STATEMENT
and the second part a REPHRASING:</p>
        <p>“So 90 is the total person-months for Jacobs, it is what the
EU would have paid us if the project had been granted.”</p>
      </sec>
      <sec id="sec-3-4">
        <p>Interestingly, REPHRASING was used more by authors than
readers for the complex spreadsheet (35.9% vs. 25.9%),
whereas it was the other way round for the simple one (22.5%
vs. 30.3%). We suspect that authors thought REPHRASING to
be an added value for complex content, whereas the simple
content doesn’t require this explanation type, being so simple.
The readers viewed it differently, choosing REPHRASING as a
valuable explanation type. Here, we observe that the authors’
and the readers’ contexts differ and thus their assessments do
as well.</p>
        <p>3) DEFINITION: This dimension contains all knowledge
items that specify or define terminology used in the
spreadsheet headers, e.g.,</p>
        <p>“FET is about Future Emergent Technologies, so some
visions about future emergent technologies.”</p>
        <p><sup>2</sup> In EU projects “RTD” abbreviates “Research and Technological
Development”.</p>
        <p>DEFINITIONs were given rather rarely (max = 13.1%), and
when they were, then more often by the authors than by the readers, especially
for the complex spreadsheet. This indicates that this context
dimension also belongs to the ones that would assist readers
when interpreting a spreadsheet. Moreover, the inclusion of
definitions in a spreadsheet might also be appreciated as
added value even by authors.</p>
        <p>4) BY-EXAMPLE: This dimension comprises the knowledge items that
included examples of specific concepts contained in the given
spreadsheets, for instance,</p>
        <p>“Demonstration is, for example, to go on fairs.”</p>
        <p>Authors provided examples considerably more often
than readers. We can derive that examples constitute another
context dimension that is missing in spreadsheets. Another
interesting point with respect to BY-EXAMPLE is that authors used
examples to elaborate the content almost as often for the
complex as for the simple spreadsheet. We suspect therefore
that they deem examples an adequate means for explaining
spreadsheets in general.</p>
        <p>5) EVALUATION: All knowledge items that contained
judgements were collected into this context dimension. Users
drew consequences based on their interpretation of the
spreadsheet content – for example, they assessed values or did
plausibility checks on numbers or said</p>
        <p>“[P21] - if that’s high then we are getting a lot!”</p>
        <p>Such judgements helped them to make sense of the data, particularly for
decision-making based on it.</p>
        <p>Readers rarely assessed values (1.7% for the complex and 3.0%
for the simple spreadsheet), while authors did so quite often
(28.3% and 31.0%, respectively). This strongly
suggests that readers need help when interacting with
spreadsheets, since they are often unable to make decisions based
on the given values. As we believe that enabling
decision-making is often a motivation to distribute spreadsheets, the
addition of support for this context dimension would be a
highly appreciated assistance in spreadsheet use.</p>
        <p>[Figure: panels (a) Complex and (b) Simple]</p>
        <p>6) FORMULA: This context dimension encompasses all
explanations concerning the computational aspects of formulae,
for example,</p>
        <p>“It [N22] is just multiplied by the person-months, but
again it’s the same formula as [L22].”</p>
        <p>The precise formula to calculate a value was relatively
infrequently selected to explain spreadsheet content (max =
10.3%). This is astonishing, as it is supposed to be the most
prominent feature of spreadsheets. We suspect that users
consider this a service feature of the spreadsheet software
which doesn’t carry much weight in terms of comprehending
a spreadsheet.</p>
        <p>7) PROVENANCE: Knowledge items were sorted into this
context dimension if they referred to the origin of the
spreadsheet content, including cell dependencies in formulae, for
instance,</p>
        <p>“It is EU guidelines with their cost model, they have
invented for the different entities . . . .”</p>
        <p>It is striking that authors and readers selected
PROVENANCE as a context dimension to explain the complex
spreadsheet at similar rates (29.0% vs. 24.1%). On the one hand, both kinds of users
are able to explain the provenance. On the other hand, it
is worth noting that they found the information about the
data’s provenance either present in the spreadsheet or available
in shared background knowledge. Obviously, a formula is
relevant for the computation of a certain value (see context
dimension FORMULA). But the presentation of the formula in
a spreadsheet is also relevant for understanding the coherence
of the data. In particular, a print-out of a spreadsheet
document (i.e., a ledger sheet) is much less valuable, as it loses
provenance context. Authors especially value this information
facet in formulae.</p>
        <p>8) HISTORY: Knowledge items addressing the historical
context of a spreadsheet, i.e., reflections about the creation
process of a spreadsheet over time, are gathered here, for
example this one:</p>
        <p>“(It’s) just some formulas . . . it was inserted
afterwards . . . later in the table . . . ”</p>
        <p>Since spreadsheets are communication tools, they are often
updated and modified by different users, which may result in
changes in layout or data, sometimes leaving inconsistent or
even superfluous information.</p>
        <p>Naturally, this information is not reported in the spreadsheet
itself and is seldom of great interest. Our spreadsheets didn’t
have such a long history, so the score with respect to
HISTORY was really low (max = 2.8%).</p>
        <p>9) ORGANIZATION: Here, we collected all knowledge
items that concern surface qualities of the spreadsheet, such
as layout, data format, and data arrangement. A concrete
example:
“So the spreadsheet starts with the lower table with all
the details, and the upper table is main summary, the key
figures, and the table on the right is a transition especially
made for the EU.”
This dimension affects the navigation and organizational
usability of the spreadsheet, for example by coloring
cells where input is expected green and
cells whose values are out of limits red.</p>
        <p>Authors note and point out more organizational cues than
readers do. As the authors are the creators of the spreadsheet’s
organization, they might want to point out the fine details of
their design. Another interesting fact is the much
higher rate of explanations in the ORGANIZATION context
dimension by authors and readers alike for the complex
spreadsheet. This suggests that organization also
represents a valuable context dimension if the content is
difficult to understand at once.</p>
        <p>10) PURPOSE: We sorted all knowledge items that included
(correct) intentional aspects of spreadsheet content into this
context dimension, e.g.</p>
        <p>“For each partner we need to set up a budget.”</p>
        <p>The PURPOSE context appears to be either hidden or
missing in the spreadsheets: readers did not select this kind
of explanation for the complex spreadsheet and only seldom
for the simple one (max = 3.0%).</p>
        <p>11) SIGNIFICANCE: Here, knowledge items were gathered
that bear information about what the significant aspects of the
spreadsheet data are, for example,</p>
        <p>“It’s just a remark, not to forget . . . ”</p>
        <p>SIGNIFICANCE was much more pronounced with the
authors than with the readers. This is probably because
readers could rarely discern the important aspects of the
spreadsheet content. For authors, SIGNIFICANCE was as important for
the complex as it was for the simple spreadsheet, indicating
that they thought of the spreadsheet as a communication
medium with a message when designing it. Readers did not
notice any differences in relevance for the simple spreadsheet.</p>
        <p>12) OTHER: This final context dimension contains all the
bits and pieces that couldn’t be sorted further. Additionally, it
also includes utterances of wrong assumptions and definitions,
doubts, guesses, and indications of ignorance like
“’Special Transition Flat Rate’ - no idea what it is!”</p>
        <p>This card pile should not be considered a context
dimension, but it shows that readers frequently make
semantic errors, even with a simple spreadsheet.</p>
      </sec>
      <sec id="sec-3-5">
        <title>D. Refinement of Context Dimensions</title>
        <p>
          In Table III we compare the context dimensions obtained
from the card sorting exercise reported above with those
from [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. Our new dimensions constitute an extension and
refinement: Definition is refined into STATEMENT,
REPHRASING, and DEFINITION. Similarly, we now
differentiate Assessment of Value into BY-EXAMPLE and
EVALUATION. The OTHER pile cannot be considered a context
dimension, so we do not count it as an extension.
        </p>
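        <table-wrap>
          <label>Table III</label>
          <table>
            <thead>
              <tr>
                <th>Context Dimensions</th>
                <th>Dimensions in [7]</th>
              </tr>
            </thead>
            <tbody>
              <tr><td>STATEMENT</td><td>Definition</td></tr>
              <tr><td>REPHRASING</td><td>Definition</td></tr>
              <tr><td>DEFINITION</td><td>Definition</td></tr>
              <tr><td>BY-EXAMPLE</td><td>Assessment of Value</td></tr>
              <tr><td>EVALUATION</td><td>Assessment of Value</td></tr>
              <tr><td>SIGNIFICANCE</td><td>Assessment of Purpose</td></tr>
              <tr><td>PURPOSE</td><td>Purpose</td></tr>
              <tr><td>ORGANIZATION</td><td>Purpose, Provenance</td></tr>
              <tr><td>PROVENANCE</td><td>Provenance</td></tr>
              <tr><td>FORMULA</td><td>Formula</td></tr>
              <tr><td>HISTORY</td><td>History</td></tr>
              <tr><td>OTHER</td><td>n/a</td></tr>
            </tbody>
          </table>
        </table-wrap>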
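        <p>As an illustration only, the correspondence in Table III can be written down as a small lookup structure. The following Python sketch is not part of the study; the names REFINEMENT and refinements_by_earlier_dimension are hypothetical and were chosen for this example. It assumes that the rows of Table III pair each new dimension with the dimension(s) from [7] in the order listed, and it inverts that mapping to show which earlier dimensions were split up.</p>
        <preformat>
```python
from collections import defaultdict

# Assumed reading of Table III: refined dimension -> dimension(s) in [7].
# An empty list marks the OTHER pile, which has no counterpart ("n/a").
REFINEMENT = {
    "STATEMENT": ["Definition"],
    "REPHRASING": ["Definition"],
    "DEFINITION": ["Definition"],
    "BY-EXAMPLE": ["Assessment of Value"],
    "EVALUATION": ["Assessment of Value"],
    "SIGNIFICANCE": ["Assessment of Purpose"],
    "PURPOSE": ["Purpose"],
    "ORGANIZATION": ["Purpose", "Provenance"],
    "PROVENANCE": ["Provenance"],
    "FORMULA": ["Formula"],
    "HISTORY": ["History"],
    "OTHER": [],
}


def refinements_by_earlier_dimension(mapping):
    """Invert the mapping: earlier dimension -> refined dimensions (table order)."""
    grouped = defaultdict(list)
    for refined, earlier_dims in mapping.items():
        for earlier in earlier_dims:
            grouped[earlier].append(refined)
    return dict(grouped)


if __name__ == "__main__":
    grouped = refinements_by_earlier_dimension(REFINEMENT)
    for earlier, refined in grouped.items():
        # Earlier dimensions with more than one entry were refined or shared.
        print(earlier + ": " + ", ".join(refined))
```
        </preformat>
        <p>Under this reading, Definition maps to three refined dimensions and Assessment of Value to two, which matches the refinement described in the text, while OTHER does not appear in the inverted mapping at all.</p>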
      </sec>
    </sec>
    <sec id="sec-4">
      <title>IV. CONCLUSION</title>
      <p>
        In this paper we have reported on a study we conducted
to deepen our understanding of spreadsheet context. By
interviewing authors and readers of spreadsheets and
differentiating between a simple and a complex spreadsheet, we could
observe clear differences between these two user roles. In
general, the readers missed many context
dimensions, which makes the case for assistance systems for
spreadsheet comprehension. Moreover, we could refine the set
of context dimensions given in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
    </sec>
    <sec id="sec-5">
      <title>ACKNOWLEDGMENT</title>
      <p>This work has partially been supported by the German
Research Council under Grant KO 2428/13-1.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B. A.</given-names>
            <surname>Nardi</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Miller</surname>
          </string-name>
          , “
          <article-title>An ethnographic study of distributed problem solving in spreadsheet development</article-title>
          ,” in
          <source>Proceedings of the 1990 ACM conference on Computer-supported cooperative work</source>
          . ACM Press,
          <year>1990</year>
          , pp.
          <fpage>197</fpage>
          -
          <lpage>208</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T. R. G.</given-names>
            <surname>Green</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Petre</surname>
          </string-name>
          , “
          <article-title>Usability analysis of visual programming environments: a 'cognitive dimensions' framework</article-title>
          ,”
          <source>Journal of Visual Languages and Computing</source>
          , vol.
          <volume>7</volume>
          , pp.
          <fpage>131</fpage>
          -
          <lpage>174</lpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Panko</surname>
          </string-name>
          , “
          <article-title>What we know about spreadsheet errors</article-title>
          ,”
          <source>Journal of Organizational and End User Computing (JOEUC)</source>
          , vol.
          <volume>10</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>15</fpage>
          -
          <lpage>21</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Clermont</surname>
          </string-name>
          , “
          <article-title>A scalable approach to spreadsheet visualization</article-title>
          ,” Klagenfurt University,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>K. M. Consulting</surname>
          </string-name>
          , “
          <article-title>Executive summary: Financial model review survey</article-title>
          ,”
          <source>KPMG, London, Tech. Rep.</source>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>K.</given-names>
            <surname>Rajalingham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Chadwick</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Knight</surname>
          </string-name>
          , “
          <article-title>Classification of spreadsheet errors</article-title>
          ,”
          <source>arXiv preprint arXiv:0805.4224</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kohlhase</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Kohlhase</surname>
          </string-name>
          , “
          <article-title>Semantic transformation of spreadsheets</article-title>
          ,”
          <source>Electronic Communications of the EASST</source>
          , vol. X,
          <year>2010</year>
          . [Online]. Available: http://www.easst.org/eceasst/
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C.</given-names>
            <surname>Osterlund</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Carlile</surname>
          </string-name>
          , “
          <article-title>How practice matters: A relational view of knowledge sharing</article-title>
          ,” in
          <source>Communities and Technologies</source>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Huysmann</surname>
          </string-name>
          , E. Wenger, and V. Wulf, Eds. Kluwer Academic Publishers,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>G.</given-names>
            <surname>Probst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Raub</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Romhardt</surname>
          </string-name>
          , Wissen managen, 4th ed. Gabler Verlag,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>T. H.</given-names>
            <surname>Davenport</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Prusak</surname>
          </string-name>
          , Working Knowledge, 2000th ed. Harvard Business School Press,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>B. A.</given-names>
            <surname>Nardi</surname>
          </string-name>
          ,
          <article-title>Studying context: a comparison of activity theory, situated action models, and distributed cognition</article-title>
          . Cambridge, MA, USA: Massachusetts Institute of Technology,
          <year>1995</year>
          , pp.
          <fpage>69</fpage>
          -
          <lpage>102</lpage>
          . [Online]. Available: http://dl.acm.org/citation.cfm?id=223826.223830
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kohlhase</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Kohlhase</surname>
          </string-name>
          , “
          <article-title>Compensating the computational bias of spreadsheets with MKM techniques</article-title>
          ,” in Intelligent Computer Mathematics. Springer,
          <year>2009</year>
          , pp.
          <fpage>357</fpage>
          -
          <lpage>372</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>R.</given-names>
            <surname>Mittermeir</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Clermont</surname>
          </string-name>
          , “
          <article-title>Finding high-level structures in spreadsheet programs</article-title>
          ,” in
          <source>Reverse Engineering, 2002. Proceedings. Ninth Working Conference on</source>
          . IEEE,
          <year>2002</year>
          , pp.
          <fpage>221</fpage>
          -
          <lpage>232</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>B. A.</given-names>
            <surname>Nardi</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Miller</surname>
          </string-name>
          , “
          <article-title>Twinkling lights and nested loops: Distributed problem solving and spreadsheet development</article-title>
          ,”
          <source>International Journal of Man-Machine Studies</source>
          , vol.
          <volume>34</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>161</fpage>
          -
          <lpage>184</lpage>
          ,
          <year>1991</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>D. G.</given-names>
            <surname>Hendry</surname>
          </string-name>
          and
          <string-name>
            <given-names>T. R. G.</given-names>
            <surname>Green</surname>
          </string-name>
          , “
          <article-title>Cogmap: a visual description language for spreadsheets</article-title>
          ,”
          <source>Journal of Visual Languages &amp; Computing</source>
          , vol.
          <volume>4</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>35</fpage>
          -
          <lpage>54</lpage>
          ,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>T. R.</given-names>
            <surname>Green</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Gilmore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Blumenthal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Davies</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Winder</surname>
          </string-name>
          , “
          <article-title>Towards a cognitive browser for oops</article-title>
          ,”
          <source>International Journal of Human-Computer Interaction</source>
          , vol.
          <volume>4</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>34</lpage>
          ,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>K.</given-names>
            <surname>Wolstencroft</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Owen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Horridge</surname>
          </string-name>
          ,
          <string-name>
            <surname>O. K. O</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Mueller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Snoep</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>du Preez</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Goble</surname>
          </string-name>
          , “
          <article-title>Rightfield: Embedding ontology annotation in spreadsheets</article-title>
          ,”
          <source>Bioinformatics</source>
          , vol.
          <volume>24</volume>
          , no.
          <issue>14</issue>
          , pp.
          <fpage>2021</fpage>
          -
          <lpage>2022</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kohlhase</surname>
          </string-name>
          , “
          <article-title>Human-spreadsheet interaction</article-title>
          ,” in
          <source>Human-Computer Interaction-INTERACT 2013</source>
          . Springer,
          <year>2013</year>
          , pp.
          <fpage>571</fpage>
          -
          <lpage>578</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A.</given-names>
            <surname>Guseva</surname>
          </string-name>
          , “
          <article-title>Towards understanding context dimensions of spreadsheet knowledge</article-title>
          ,”
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
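          <string-name>
            <given-names>N.</given-names>
            <surname>Dahlbäck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jönsson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Ahrenberg</surname>
          </string-name>
          , “
          <article-title>Wizard of oz studies: Why and how</article-title>
          ,”
          <source>Knowledge-based systems</source>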
          , vol.
          <volume>6</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>258</fpage>
          -
          <lpage>266</lpage>
          ,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>D.</given-names>
            <surname>Spencer</surname>
          </string-name>
          ,
          <article-title>Card sorting: Designing usable categories</article-title>
          .
          <source>Rosenfeld Media</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Capra</surname>
          </string-name>
          , “
          <article-title>Factor analysis of card sort data: an alternative to hierarchical cluster analysis</article-title>
          ,” in
          <source>Proceedings of the Human Factors and Ergonomics Society Annual Meeting</source>
          , vol.
          <volume>49</volume>
          , no.
          <issue>5</issue>
          . SAGE Publications
          ,
          <year>2005</year>
          , pp.
          <fpage>691</fpage>
          -
          <lpage>695</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Corbin</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Strauss</surname>
          </string-name>
          , “
          <article-title>Grounded theory research: Procedures, canons, and evaluative criteria</article-title>
          ,”
          <source>Qualitative sociology</source>
          , vol.
          <volume>13</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>3</fpage>
          -
          <lpage>21</lpage>
          ,
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>A.</given-names>
            <surname>Salmoni</surname>
          </string-name>
          . (
          <year>2012</year>
          )
          <article-title>Open card sort analysis 101</article-title>
          . [Online]. Available: http://www.uxbooth.com/articles/open-card-sort-analysis-101/
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>