<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>What is Lost in Translation from Visual Graphics to Text for Accessibility</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Peter Coppin (pcoppin@faculty.ocadu.ca)</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Adapted from “Web Accessibility Best Practices: Graphs” by Campus Information Technologies and Educational Services (CITES) and Disability Resources and Educational Services (DRES), University of Illinois at Urbana/Champaign. Copyright 2005 by University of Illinois at Urbana/Champaign</institution>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dept. of Industrial Design, Faculty of Design, OCAD University</institution>
          ,
          <addr-line>Toronto</addr-line>
          ,
          <institution>ON M5T 1W1 CANADA Dept. of Mechanical and Industrial Engineering, University of Toronto</institution>
          ,
          <addr-line>Toronto, ON M5S 3G8</addr-line>
          <country country="CA">CANADA</country>
        </aff>
      </contrib-group>
      <fpage>276</fpage>
      <lpage>281</lpage>
      <abstract>
        <p>Many blind and low-vision individuals are unable to access digital graphics visually. Currently, the solution to this accessibility problem is to produce text descriptions of visual graphics, which are then translated via text-to-speech screen reader technology. However, if a text description can accurately convey the meaning intended by an author of a visualization, then why did the author create the visualization in the first place? This essay critically examines this problem by comparing the so-called graphic–linguistic distinction to similar distinctions between the properties of sound and speech. It also presents a provisional model for identifying visual properties of graphics that are not conveyed via text-to-speech translations, with the goal of informing the design of more effective sonic translations of visual graphics.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Consider the experience of a blind or low-vision individual
who uses a screen reader to access pictures, diagrams,
charts, and graphs. Unlike a user who accesses graphical
media through visual perception, the screen reader user
usually accesses these graphics via text-to-speech
“descriptions,” essentially interpretations of the author’s
intended meaning that reflect what the person who produced
the text description deemed most relevant. For example,
Figure 1a presents a financial chart with rising and falling
stock prices over time, where time is shown on the
horizontal axis and monetary value is shown on the vertical
axis. Figure 1d presents a text description of the chart
compliant with the Web Content Accessibility Guidelines
(WCAG), using text to describe the rising and falling
monetary values over time. The next sections compare and
contrast how these presentations are experienced.</p>
      <p>In a text description of a visual graphic (Figure 1d), all of
the information is conveyed via text (or text-to-speech,
when conveyed via screen reader technology). But in the
original chart (Figure 1a), only some of the information is
conveyed via text, predominantly numerical values and
labels (Figure 1c); the shape of the shaded contour
(Figure 1b) is not conveyed via text, because visually
perceived shapes are picked up “more directly.” When the
features of those shapes are instead translated to a text
description (Figure 1e), important properties of the visually
perceived shape information are lost in translation. Yet this
shape information is needed to provide the unique
affordances that are often associated with “visual”
representations relative to text.</p>
      <p>
        Many scholars have explored the differences between
graphics and text, often referred to as the so-called
“graphic–linguistic distinction”
        <xref ref-type="bibr" rid="ref19">(Shimojima, 1999)</xref>
        . In
addition, researchers have investigated how so-called
“nonlinguistic sonification” can be employed to make charts and
graphs more accessible
        <xref ref-type="bibr" rid="ref11">(e.g., Edwards, 2010)</xref>
        . This essay
examines the graphic–linguistic distinction in order to better
understand how it corresponds to a similar distinction
between the properties of non-linguistic sonification and
speech, thereby providing a means to identify what is lost
when graphics are translated to text-to-speech. An increased
understanding could inform the design of new approaches
for conveying properties of graphically represented shapes
via sound.
      </p>
    </sec>
    <sec id="sec-2">
      <title>The Graphic–Linguistic Distinction:</title>
    </sec>
    <sec id="sec-3">
      <title>Implications for Sonic Interface Design</title>
      <p>
        The graphic–linguistic distinction has been described in
various ways: analogical versus Fregean; analog versus
propositional; graphical versus sentential; and
diagrammatical versus linguistic
        <xref ref-type="bibr" rid="ref19">(Shimojima, 1999)</xref>
        .
According to
        <xref ref-type="bibr" rid="ref13">Larkin and Simon (1987)</xref>
        , a diagrammatic
representation can be defined as a “data structure in which
information is indexed by two-dimensional location”
whereas a sentential representation can be defined as “a data
structure in which elements appear in a single sequence”.
An advantage of diagrams is that they “preserve explicitly the
information about the topographical and geometric relations
among the components of the problem.” For the purposes of
this essay, the text description in Figure 1e is classified as
sentential because the text is composed of marks arranged in
a linear sequence and the marks are taken to refer to words
with linguistic meanings (linguistically conveyed elements).
In contrast, Figure 1a is classified as a diagram because the
financial values are indicated via (textually) labeled points
or lines (elements) that are indexed to a graphical grid. The
visually processed spatial relations among these labeled
marks yield powerful affordances, because by processing
the contours of lines or the relative positions of marks
scattered across the two-dimensional graphical surface, the
viewer can infer values and trends that are not explicitly
conveyed via labels
        <xref ref-type="bibr" rid="ref5">(cf. Barwise &amp; Etchemendy, 1990)</xref>
        .
      </p>
      <sec id="sec-3-1">
        <title>Implications for sonic charts and graphs</title>
        <p>Sonic sentential properties. Text-to-speech (the current
standard for WCAG accessibility) would seem to be the
obvious candidate for the sonic version of what Larkin and
Simon referred to as a sentential structure, where elements
are arranged in a linear sequence. In the case of visually
processed written sentences composed of word forms
printed on a page, the sequential properties result from the
linear arrangement of characters and word forms on the
printed surface. In the case of sonic sentential structures, the
sequential properties are temporal, presented as a sequence
of sounds that are perceptually processed as words that refer
to intended meanings. Larkin and Simon did not define what
the elements (that are arranged in sequence) are composed
of. For the purpose of this subsection, let us assume that the
elements are some combination of properties that, when
sequentially processed as words, refer to intended items.</p>
        <p>
          Sonic diagrammatic properties. To present diagrammatic
properties in a way that can be perceived aurally, designers
would need to exploit properties of sound that can convey
topological and geometric relations. People use stereo, echo,
and the Doppler effect to determine the spatial locations of
sound-producing objects in physical environments
          <xref ref-type="bibr" rid="ref15">(cf. Nasir
&amp; Roberts, 2007)</xref>
          . Designers could exploit these cues to
convey geometric and topological relations among elements
that are indexed to a 2D plane
          <xref ref-type="bibr" rid="ref6">(cf. Brown, Ramloll, Burton,
&amp; Riedel, 2003; Hermann, Hunt, &amp; Neuhoff, 2011)</xref>
          .
Figure 2 shows how left and right arrow keys could move
an “audio cursor” to different positions on an x-axis of a
computationally generated 2D space. The position of the
sonically conveyed cursor on the x-axis could be indicated
via stereo
          <xref ref-type="bibr" rid="ref22">(cf. Zhao, Plaisant, Shneiderman, &amp; Lazar, 2008)</xref>
          .
For a simple sparkline graph, the sonic cursor can alter the
pitch of the sound if “scrubbed” to different points on the
x-axis, so that higher pitches correspond to points that
intersect with the cursor at higher elevations (Figure 2,
right) and lower pitches correspond to points that intersect
with the cursor at lower elevations, thereby allowing blind
or low-vision users to perceive the contours of the graph
          <xref ref-type="bibr" rid="ref6">(cf.
Brown, Ramloll, Burton, &amp; Riedel, 2003)</xref>
          .
        </p>
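        <p>To make this mapping concrete, the following minimal Python sketch (an illustration added here, not an implementation from the systems cited above) computes the stereo pan and pitch for an “audio cursor” position; the frequency range and pan convention are assumptions chosen for the example.</p>
        <preformat>
# Minimal sketch: map a sparkline's points to stereo pan and pitch.
# The frequency range (220-880 Hz) and the linear mappings are
# illustrative assumptions, not parameters from the cited systems.

def value_to_pitch(y, y_min, y_max, f_low=220.0, f_high=880.0):
    """Map a data value to a frequency in Hz: higher values, higher pitch."""
    t = (y - y_min) / (y_max - y_min)
    return f_low + t * (f_high - f_low)

def position_to_pan(i, n):
    """Map an x-axis index to a stereo pan in [-1.0 (left), +1.0 (right)]."""
    return -1.0 + 2.0 * i / (n - 1)

def scrub(data, cursor_index):
    """Return the (pan, frequency) pair for the audio cursor's position."""
    pan = position_to_pan(cursor_index, len(data))
    freq = value_to_pitch(data[cursor_index], min(data), max(data))
    return pan, freq

# Left/right arrow keys would decrement or increment cursor_index.
prices = [12.0, 14.5, 13.2, 16.8, 15.1]
print(scrub(prices, 3))  # (0.5, 880.0): right of center, highest pitch
        </preformat>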
      </sec>
      <sec id="sec-3-2">
        <title>Relation Symbols and Object Symbols</title>
        <p>
          According to
          <xref ref-type="bibr" rid="ref18">Russell (1923)</xref>
          , in sentences “words which
mean relations are not themselves relations,” whereas in
graphical representations like maps, “a relation is
represented by a relation.” An example of the latter is the
financial chart (e.g., Figure 1a), where higher monetary
values are conveyed via marks at higher elevations of the
graphic, whereas lower monetary values are conveyed via
marks at lower elevations. This convention allows the
visually perceived spatial relationships among the marks to
represent relationships among monetary values over time.
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>Implications for sonic charts and graphs</title>
        <p>Graphical relations could be conveyed sonically. Consider
two tones with different pitches: Tone A and Tone B
(Figure 2, right). If Tone A is at a lower frequency than
Tone B, then the sonic relation between the two tones is the
perceptible difference in pitch between the tones. For
example, if Tone A refers to a stock price at an earlier point
in time, and Tone B refers to a stock price at a later point in
time, then the perceptible difference between the pitches of
the tones can convey the difference in price over time.
Moving the sonic cursor from left to right would correspond
to a change (increase) in pitch, conveying the change in
stock price over time via a sonic relation.</p>
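        <p>A brief sketch of this two-tone relation follows; the price values, the value range, and the value-to-pitch mapping are invented for illustration, mirroring the earlier sketch.</p>
        <preformat>
# Sketch: convey the relation between two prices as a pitch interval,
# using the same illustrative value-to-pitch mapping as the sketch above.

def value_to_pitch(y, y_min, y_max, f_low=220.0, f_high=880.0):
    return f_low + (y - y_min) / (y_max - y_min) * (f_high - f_low)

tone_a = value_to_pitch(12.0, 10.0, 20.0)  # earlier price: 352 Hz
tone_b = value_to_pitch(17.5, 10.0, 20.0)  # later price: 715 Hz

# The sonic relation is the perceptible interval between the tones:
# a ratio above 1.0 means the pitch rises, conveying a price increase.
print(tone_b / tone_a)  # about 2.03, roughly an octave up
        </preformat>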
        <p>[Figure 2. Left: elements A–E arranged on a 2D plane (“A is lower than B and B is to the right of A”). Right: a sonic version in which stereo conveys x-axis position and pitch conveys y-axis position; as the cursor moves to the right, pitch increases.]</p>
      </sec>
      <sec id="sec-3-4">
        <title>Analog Versus Digital</title>
        <p>
          The classic distinction between analog versus digital, where
analog refers to visual properties of a graphic and digital
refers to linguistic properties, is most commonly associated
with
          <xref ref-type="bibr" rid="ref10">Goodman (1968)</xref>
          .
          <xref ref-type="bibr" rid="ref19">Shimojima (1999)</xref>
          illustrated this
distinction using the example of a speedometer dial. The
analog aspect of the dial is the perceived orientation of the
speedometer needle relative to the numerically labeled
marks on the dial. The digital aspect is the numerical
magnitude (speed) that the user extrapolates by perceptually
processing the orientation of the needle relative to the marks
representing numerical values.
        </p>
      </sec>
      <sec id="sec-3-5">
        <title>Implications for sonic charts and graphs</title>
        <p>
          The analog versus digital distinction appears to involve two
interrelated capabilities: lower-level perceptual capabilities
to process geometric and topological properties (e.g., those
shown on the speedometer dial); and higher-level
capabilities to process, filter, and interpret how those
perceptually processed features fall into conceptual
categories (e.g., the numerically represented velocity)
          <xref ref-type="bibr" rid="ref14">(Mandler, 2006; Figure 3)</xref>
          . For instance, to discern the
values shown on a visual financial chart, a user must
perceptually process the light reflected from the surface of
the chart, observing lines in relation to dots that are labeled
using textually conveyed numerical values and/or company
names. To discern topological and geometric features using
sound perception, a user would need the same set of
interrelated capabilities: lower-level capabilities to process
varying sound frequencies, timbre, etc., as well as
higher-level capabilities to identify the linguistic meanings of the
sounds. The current text-to-speech approach only exploits
the digital properties of language – but designers could
produce more effective translations by recruiting
“precategorized” analog properties of sound such as pitch, echo,
stereo, and timbre to convey geometric and topological
properties.
        </p>
      </sec>
      <sec id="sec-3-6">
        <title>Intrinsic Versus Extrinsic Constraints</title>
        <p>
          For brevity, the following discussion will use the classic
characterization provided by
          <xref ref-type="bibr" rid="ref5">Barwise and Etchemendy
(1990)</xref>
          because it is compact and intuitive:
        </p>
        <p>Diagrams are physical situations. They must be, since we
can see them. As such, they obey their own set of constraints
. . . By choosing a representational scheme appropriately,
so that the constraints on the diagrams have a good match
with the constraints on the described situation, the diagram
can generate a lot of information that the user never need
infer. Rather, the user can simply read off facts from the
diagram as needed. This situation is in stark contrast to
sentential inference, where even the most trivial
consequence needs to be inferred explicitly.</p>
        <p>To illustrate how “diagrams are physical situations,”
consider the illustration shown in Figure 2 (left). A text (or
text-to-speech) description might go as follows: “A is below
B and both A and B are to the left of C.” Another textual
description might read: “B is between A and C and is above
both A and C.” Each text description conveys a different
interpretation of what is shown visually and therefore
affords different inferences. In contrast, a diagram can
convey many other relationships because of how it conveys
topological and geometric information through visual
perception: Barwise and Etchemendy referred to this as a
diagram’s ability to present “countless facts.”</p>
      </sec>
      <sec id="sec-3-7">
        <title>Implications for sonic charts and graphs</title>
        <p>
          When
          <xref ref-type="bibr" rid="ref5">Barwise and Etchemendy (1990)</xref>
          referred to diagrams
as “physical situations,” they were referring to the properties
(and affordances) of diagrams that emerge through
interaction via a human visual perception system. The
challenge for designers who seek to extend the affordances
of visual diagrams to the sonic domain is to identify
properties or dimensions of sound that similarly (i.e., using
human perceptual processing of sound) make use of
“physical situations” to present “countless facts.”
        </p>
        <p>Thus, a hybrid stereo–varying frequency interface (see
Figure 3) should enable a user to “hear the shape” of a
contour. Indexing text-to-speech labels to contours should
allow users to form multiple sentences (countless facts)
about the geometric and/or topological relations among the
labeled elements.</p>
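        <p>A sketch of how such a hybrid interface might dispatch input follows; speak() and play_tone() are hypothetical stand-ins for a text-to-speech and audio backend, not a specific library’s API.</p>
        <preformat>
# Hybrid sketch: analog sound conveys the contour while text-to-speech
# conveys labels. speak() and play_tone() are placeholder stand-ins.

def speak(text):
    print("[TTS] " + text)  # stand-in for a real text-to-speech engine

def play_tone(frequency, pan, duration):
    # Stand-in for a real audio backend (stereo pan in [-1, +1]).
    print("[tone] %.0f Hz, pan %+.2f, %.2fs" % (frequency, pan, duration))

def on_key(key, data, labels, state):
    """Arrow keys scrub the contour; SPACE speaks the indexed label."""
    if key == "RIGHT":
        state["i"] = min(state["i"] + 1, len(data) - 1)
    elif key == "LEFT":
        state["i"] = max(state["i"] - 1, 0)
    elif key == "SPACE":
        speak(labels[state["i"]])  # conceptual channel: the indexed label
        return
    # Perceptual channel: pitch conveys the value, stereo the position
    # (same illustrative mappings as the earlier sketch).
    i, n = state["i"], len(data)
    pan = -1.0 + 2.0 * i / (n - 1)
    t = (data[i] - min(data)) / (max(data) - min(data))
    play_tone(220.0 + t * 660.0, pan, 0.15)

state = {"i": 0}
on_key("RIGHT", [12.0, 14.5, 13.2], ["Jan", "Feb", "Mar"], state)
        </preformat>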
      </sec>
      <sec id="sec-3-8">
        <title>Extending the Graphic–Linguistic Distinction into the Sonic Domain</title>
        <p>Let us now extend the various graphic–linguistic
distinctions to consider sonic versions of visual charts and
graphs.</p>
        <p>1. Extending the diagrammatic versus sentential
distinction, text-to-speech can be considered a sonic version
of what Larkin and Simon referred to as a sentential
structure and is the current WCAG approach to web
accessibility. In contrast, spatial sound can be exploited to
convey 2D sonic diagrammatic external representations.</p>
        <p>2. Extending the analog versus digital distinction,
text-to-speech uses language to convey digital properties
sonically. The analog properties of sound, such as tone,
timbre, stereo, and echo, could afford the communication of
spatial, geometric, or topological information.</p>
        <p>3. Extending the distinction between relation symbols
and object symbols, the current text-to-speech approach
uses words to convey relations. Because relations among
elements represented by analog and spatial properties of
sound are themselves relations, analog and spatial properties
of sound could be recruited to map numerical values to
perceptual dimensions.</p>
        <p>4. Extending the distinction between intrinsic and
extrinsic constraints, producing sonic versions of visual
graphics would require identifying “physical situations” that
naturally emerge during human perceptual processing of
sound to present “countless facts.”</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Perceptual and Conceptual Graphic Relations</title>
      <p>This section integrates these extensions and proposes how
the graphic–linguistic distinction could be extended to sonic
external representations. First, let us recruit and expand on
the distinction between lower-level perceptually processed
topological and geometric features of an environment versus
the recognition, categorization, and linguistic
communication of those features.</p>
      <p>Visual and aural sentential structures and relations are
detected and perceptually processed via lower-level sensory
receptors and perceptual categories (Figure 3, left). In
written text or text-to-speech, what is most relevant is the
higher-level conceptual category (Figure 3, right) that a
given feature (such as perceptually processed printed text on
a page or text-to-speech) is taken to fall under. What is
needed is a way to convey topological and geometric
relations among elements by exploiting lower-level
perceptually processed features of a visual graphic or sonic
structure (Figure 3, left). Let us refer to these perceptually
processed features as perceptual properties. Let us refer to
these perceptually processed relations among elements as
perceptual relations. Let us refer to relations that are
communicated via text as text-described relations.</p>
      <sec id="sec-4-1">
        <title>Perceptual Relations vs. Text-Described Relations</title>
        <p>
          We are now ready to build on previous work by
          <xref ref-type="bibr" rid="ref7">Coppin
(2014)</xref>
          to provide a theoretical foundation for distinguishing
perceptual relations versus text-described relations.
        </p>
        <p>
          The model is based on the idea that an individual’s
perception–reaction loop
          <xref ref-type="bibr" rid="ref9">(cf. Gibson, 1986)</xref>
          enables survival
and prosperity within a dynamic environment composed of
change and variation. This requires capabilities to predict,
anticipate, and simulate
          <xref ref-type="bibr" rid="ref1">(Barsalou, 1999)</xref>
          dynamic change
and variation. For example, reaching for and grasping an
item such as a cup requires capabilities to perceptually
process features from the proximal surface of the item and
also to predict, anticipate, and simulate features of the distal
surface of the item.
        </p>
        <p>
          These simulations are constructed from the memory
traces of past perception–reactions (conjunctive neurons), so
simulation involves many of the same neural systems used
during perception
          <xref ref-type="bibr" rid="ref12">(Kosslyn, Ganis, &amp; Thompson, 2001)</xref>
          .
For example, as I perceive the cup, I am also informing
potential action (reaching for and grasping the proximal and
distal sides of the cup). Thus, perception and simulation are
integrated aspects of perception–reaction within a physical
environment, and each act of perception–reaction leaves
memory traces in the form of conjunctive neurons across
lower-level association areas (Figure 3).
        </p>
        <p>
          At lower-level association areas, which are more tightly
coupled with sensory receptors, simulated prototypes fall
under perceptual categories. At higher-level association
areas (see Figure 3, right), conjunctive neurons converge in
zones across multiple sensory modes. These “convergence
zones”
          <xref ref-type="bibr" rid="ref2 ref20 ref8">(Damasio, 1989; Simmons &amp; Barsalou, 2003)</xref>
          enable
simulated prototypes of possible perception–reactions that
are not as easily described in terms of a specific perceptual
mode or a reenactment of a specific prior perception–action.
Instead, these simulated prototypes fall under more general
categories of possible perception–actions
          <xref ref-type="bibr" rid="ref2 ref20">(Barsalou, 2003)</xref>
          .
These are not only more amodal, but have been described as
more filtered, interpreted
          <xref ref-type="bibr" rid="ref17">(Pylyshyn, 1973)</xref>
          , conceptual
          <xref ref-type="bibr" rid="ref2 ref20 ref3">(Barsalou, 2003, 2005)</xref>
          , or abstract
          <xref ref-type="bibr" rid="ref2">(Barsalou, 2003)</xref>
          . For
example, a child who takes a bite out of what turns out to be
a rotten apple might later reenact this experience when she
perceives another rotten apple with common properties.
Over time, she will develop an understanding of ‘rotten’ as a
category that can include apples, as well as many other
objects and experiences.
        </p>
        <p>Similarly, a child can learn to associate sounds with
certain intended meanings (learning a language), or to
associate marks with intended meaning (learning to read).
The abstract concept of ‘square’ can apply to a shape on a
raised surface that is touched but not seen, as well as to a
drawing on a piece of paper that is seen and not touched.
These “less modally specific” simulations have been
described as more “interpreted” or “conceptual,” while more
perceptually based simulations are considered to be more
“concrete.” The next section applies this interpretation to
external graphic representation.</p>
        <p>Back to charts and graphs. In a financial chart (and
many other kinds of diagrams), relations are conveyed via
lower-level perceptual processing of the geometrical and
topological properties of the marked physical surface
(Table 1). In contrast, in text descriptions (sentential
structures), relations are conceptual (and conveyed
linguistically; see Table 2); although visual properties of
printed text or aural properties of text-to-speech are also
picked up by sensory receptors, what is meaningful about
them is the conceptual relation that is conveyed
linguistically.</p>
        <p>The idea of “specificity” is central to understanding what is
lost in translation, so let us begin by clarifying what is
lost in translation, so let us begin by clarifying what is
meant by “more or less specific” in this context. Consider
the line shown in Figure 4b. Relative to the line of Figure
4c, we have more knowledge about the location of a point in
a one-dimensional space, due to the shaded red marker. This
means we have more certainty (or more information) about
the specified location of the point in Figure 4b than we do
about the location of the point in Figure 4c.</p>
        <p>Extending the line example to discuss perceptual
relations, Figure 4b refers to marks or sounds intentionally
configured by an author to cause intended audience
percepts (the diagram in Figure 4a). However, the
perceptual relations of Figure 4a can be processed, filtered,
and interpreted to fall under a range of possible relational
categories (that can be text-described), indicated by the
highlighted segment of the right line in Figure 4c (as shown
in Figure 4d: “A is below B and both A and B are to the left
of C” or “B is between A and C and is above both A and
C”). In other words, although perceptual specificity is high,
conceptual specificity of the intended relation is low
because the perceptual relations can fall under numerous
conceptual categories. However, the reverse is also true and
this reversal exposes the heart of what is lost during the
translation process.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Conceptual Specificity is Perceptually Ambiguous</title>
        <p>Extending the line example to discuss the perceptual
ambiguity of text-described (conceptual) relations, the right
highlighted line in Figure 5c refers to a specific (sentential)
text description authored to convey intended conceptual
relations (Figure 5d). However, numerous perceptual
relations (Figure 5a) can fall under the text-described
conceptual relations, indicated by the highlighted segment
of the left line in Figure 5b. In other words, although
conceptual specificity is high, perceptual specificity of the
intended relations is low, because numerous perceptual
relations can fall under the text-described conceptual
relations.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Application to an Example Design Problem</title>
        <p>Let us now return to the WCAG text description example
from Figure 1 in order to demonstrate what is lost in
translation and how what is lost could be conveyed via
nonlinguistic sound. In the text description (Figure 1d), the
problem is that all content is conveyed conceptually (via
text-to-speech) whereas the original visual graphic that the
text description is based on conveys much of the content
(the contour of the shape) perceptually: Perceptual relations
are lost and replaced by conceptual relations, generating
perceptual ambiguity. If the objective is to present Figure 1a
sonically, how can a designer decide which aspects should
be conveyed via conceptual properties (text-to-speech) and
which aspects should be conveyed via perceptual sonic
properties (such as spatial sound)?</p>
        <p>Recall the perceptual distinction, where perceptual
properties are predicted to afford the communication of
concrete structures more effectively compared with
conceptual properties, and an aspect of a graphic can be
identified as “more concrete” if it produces a perceptual
structure that corresponds to what could be picked up and
perceptually processed from a physical environment. In this
account, the graphically represented shape contour
(Figure 1b) is primarily perceptual, and is therefore more
appropriate for translation to sonic properties that can use
spatial sound to convey geometric and topological relations
among conceptually conveyed objects.</p>
        <p>To determine which aspects of a graphic should be
conveyed via text-to-speech, recall the conceptual
distinction: text is predicted to afford the communication of
abstract conceptual categories more effectively compared
with perceptual properties, and a concept can be identified
as more abstract if it is more amodal. In other words, it is
less easily mapped back to a structure that could be picked
up and perceptually processed from a physical environment.
Under this account, the numbers that label increments on the
x and y axes (Figure 1a) are more conceptual because they
cannot be mapped back to a perceptual structure that could
be picked up from a physical environment.</p>
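        <p>This decision rule can be summarized as a small routing sketch; the aspect names and classifications below are illustrative, drawn from the Figure 1 example rather than from a prescribed taxonomy.</p>
        <preformat>
# Sketch: route aspects of a chart to presentation channels.
# Classification follows the essay's rule of thumb: aspects that map
# back to a perceivable physical structure are "perceptual"; amodal
# aspects such as numeric labels are "conceptual".

CHART_ASPECTS = {
    "shape_contour": "perceptual",   # the shaded contour (Figure 1b)
    "axis_numbers":  "conceptual",   # increment labels on the x/y axes
    "company_names": "conceptual",   # textual labels on data points
}

def channel_for(aspect):
    """Perceptual aspects get spatial sound; conceptual ones get TTS."""
    if CHART_ASPECTS[aspect] == "perceptual":
        return "spatial sound (pitch, stereo)"
    return "text-to-speech"

for aspect in CHART_ASPECTS:
    print(aspect, "->", channel_for(aspect))
        </preformat>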
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>This essay proposes a provisional model to underpin the
various accounts of the graphic–linguistic distinction
described in the literature as a means to extend the graphic–
linguistic distinction into aural domains. The model makes
the distinction in terms of lower level perceptual capabilities
that enable perceivers to perceptually process concrete
structures (e.g., geometric and topological features) on the
one hand, and higher level capabilities that enable
perceivers to process and interpret how those perceptually
processed structures fall under more abstract conceptual
categories on the other.</p>
      <p>Due to these distinctions, the model predicts that
perceptual relations (conveyed via graphics or non-linguistic
sonification) afford the communication of concrete relations
more effectively than conceptual relations conveyed via text
or text-to-speech. In addition, the model predicts that
conceptual relations (conveyed via text or text-to-speech)
afford the communication of abstract relations more
effectively than perceptual relations conveyed via graphics
or non-linguistic sonification. This could be tested, for
example, by observing whether perceivers can identify
visual data sets more accurately using sonification or text
descriptions.</p>
      <p>In addition, the model streamlines accounts that
distinguish diagrammatic from sentential structures to
(1) characterize sentential structures as composed of
conceptual relations among conceptual objects, and
(2) characterize diagrammatic structures as perceptually
represented relations among conceptual objects. Under this
account, (3) a sonic diagram is conceptualized as sonically
conveyed relations among linguistically conveyed (via
text-to-speech) objects.</p>
      <p>This model is useful within a design context because
designers lack clear models or guidelines for converting
visual graphics into non-visual perceptual modes. This can
be seen in the WCAG text description example, which
ignores the pictorial properties of graphics.</p>
      <p>By reverse engineering the classic graphic–linguistic
distinction to more fundamental perceptual principles, this
model provides a way to understand how the distinction
applies to sonic representations. This approach can also be
applied to haptic representations, but this paper focused on
sound because of its ubiquity in the consumer market.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>This research was supported in part by grants from the
Centre for Innovation in Data-Driven Design and the
Graphics Animation and New Media Centre for Excellence.
I would like to thank Research Assistant Ambrose Li for his
assistance in the preparation of this essay and Dr. David
Steinman for the many fruitful conversations that helped
inform the ideas explored in the work described here.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Barsalou</surname>
            ,
            <given-names>L. W.</given-names>
          </string-name>
          <year>1999</year>
          .
          <article-title>Perceptual symbol systems</article-title>
          .
          <source>Behavioral &amp; Brain Sciences</source>
          ,
          <volume>22</volume>
          ,
          <fpage>577</fpage>
          -
          <lpage>660</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Barsalou</surname>
            ,
            <given-names>L. W.</given-names>
          </string-name>
          <year>2003</year>
          .
          <article-title>Abstraction in perceptual symbol systems</article-title>
          .
          <source>Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences</source>
          ,
          <volume>358</volume>
          (
          <issue>1435</issue>
          ),
          <fpage>1177</fpage>
          -
          <lpage>1187</lpage>
          . doi:10.1098/rstb.2003.1319
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Barsalou</surname>
            ,
            <given-names>L. W.</given-names>
          </string-name>
          <year>2005</year>
          .
          <article-title>Abstraction as dynamic interpretation in perceptual symbol systems</article-title>
          . In L.
          <string-name>
            <surname>Gershkoff-Stowe</surname>
          </string-name>
          &amp; D. Rakison (Eds.), Carnegie Symposium Series: Building object categories (pp.
          <fpage>389</fpage>
          -
          <lpage>431</lpage>
          ). Mahwah, NJ: Erlbaum.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Barsalou</surname>
            ,
            <given-names>L. W.</given-names>
          </string-name>
          <year>2009</year>
          .
          <article-title>Simulation, situated conceptualization, and prediction</article-title>
          .
          <source>Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences</source>
          ,
          <volume>364</volume>
          (
          <issue>1521</issue>
          ):
          <fpage>1281</fpage>
          -
          <lpage>1289</lpage>
          . doi:10.1098/rstb.2008.0319
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Barwise</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Etchemendy</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>1990</year>
          .
          <article-title>Visual information and valid reasoning</article-title>
          . In W. Zimmerman (Ed.), Visualization in mathematics (pp.
          <fpage>8</fpage>
          -
          <lpage>23</lpage>
          ). Washington, DC: Mathematical Association of America.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Brown</surname>
            ,
            <given-names>L. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brewster</surname>
            ,
            <given-names>S. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramloll</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Burton</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Riedel</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <year>2003</year>
          .
          <article-title>Design guidelines for audio presentation of graphs and tables</article-title>
          .
          <source>International Conference on Auditory Display.</source>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Coppin</surname>
            ,
            <given-names>P. W.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Perceptual-cognitive properties of pictures, diagrams, and sentences: Toward a science of visual information design</article-title>
          (Doctoral dissertation, University of Toronto, Toronto, Canada). Retrieved from https://tspace.library.utoronto.ca/handle/1807/44108
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Damasio</surname>
            ,
            <given-names>A. R.</given-names>
          </string-name>
          <year>1989</year>
          .
          <article-title>The brain binds entities and events by multiregional activation from convergence zones</article-title>
          .
          <source>Neural Computation</source>
          ,
          <volume>1</volume>
          (
          <issue>1</issue>
          ),
          <fpage>123</fpage>
          -
          <lpage>132</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Gibson</surname>
            ,
            <given-names>J. J.</given-names>
          </string-name>
          <year>1986</year>
          .
          <article-title>The ecological approach to visual perception</article-title>
          . Hillsdale, NJ: Lawrence Erlbaum.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Goodman</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <year>1968</year>
          .
          <article-title>Languages of art: An approach to a theory of symbols</article-title>
          . Indianapolis, IN: Bobbs-Merrill Company.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Edwards</surname>
            ,
            <given-names>A. D. N.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>Auditory display in assistive technology</article-title>
          .
          In T. Hermann &amp; A. Hunt (Eds.),
          <source>The Sonification Handbook</source>
          (pp.
          <fpage>431</fpage>
          -
          <lpage>453</lpage>
          ). Berlin: Logos Verlag.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Kosslyn</surname>
            ,
            <given-names>S. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ganis</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Thompson</surname>
            ,
            <given-names>W. L.</given-names>
          </string-name>
          <year>2001</year>
          .
          <article-title>Neural foundations of imagery</article-title>
          .
          <source>Nature Reviews Neuroscience</source>
          ,
          <volume>2</volume>
          (
          <issue>9</issue>
          ),
          <fpage>635</fpage>
          -
          <lpage>642</lpage>
          . doi:10.1038/35090055
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Larkin</surname>
            ,
            <given-names>J. H.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Simon</surname>
            ,
            <given-names>H. A.</given-names>
          </string-name>
          <year>1987</year>
          .
          <article-title>Why a diagram is (sometimes) worth ten thousand words</article-title>
          .
          <source>Cognitive Science</source>
          ,
          <volume>11</volume>
          ,
          <fpage>65</fpage>
          -
          <lpage>99</lpage>
          . doi:10.1111/j.1551-6708.1987.tb00863.x
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Mandler</surname>
            ,
            <given-names>J. M.</given-names>
          </string-name>
          <year>2006</year>
          .
          <article-title>Categorization, development of</article-title>
          .
          In
          <source>Encyclopedia of Cognitive Science</source>
          . doi:10.1002/0470018860.s00516
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>Nasir</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Roberts</surname>
            ,
            <given-names>J. C.</given-names>
          </string-name>
          <year>2007</year>
          .
          <article-title>Sonification of spatial data</article-title>
          .
          In
          <source>13th International Conference on Auditory Display (ICAD 2007)</source>
          (pp.
          <fpage>112</fpage>
          -
          <lpage>119</lpage>
          ). ICAD.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <surname>Palmer</surname>
            ,
            <given-names>S. E.</given-names>
          </string-name>
          <year>1978</year>
          .
          <article-title>Fundamental aspects of cognitive representation</article-title>
          . In E. Rosch &amp; B. B. Lloyd (Eds.), Cognition and Categorization (pp.
          <fpage>259</fpage>
          -
          <lpage>303</lpage>
          ). Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>Pylyshyn</surname>
            ,
            <given-names>Z. W.</given-names>
          </string-name>
          <year>1973</year>
          .
          <article-title>What the mind's eye tells the mind's brain: A critique of mental imagery</article-title>
          .
          <source>Psychological Bulletin</source>
          ,
          <volume>80</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <surname>Russell</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <year>1923</year>
          . <article-title>Vagueness</article-title>.
          <source>Australasian Journal of Psychology and Philosophy</source>
          ,
          <volume>1</volume>
          (
          <issue>2</issue>
          ),
          <fpage>84</fpage>
          -
          <lpage>92</lpage>
          . doi:10.1080/00048402308540623
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <surname>Shimojima</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>1999</year>
          .
          <article-title>The graphic-linguistic distinction: Exploring alternatives</article-title>
          .
          <source>Artificial Intelligence Review</source>
          ,
          <volume>13</volume>
          (
          <issue>4</issue>
          ),
          <fpage>313</fpage>
          -
          <lpage>335</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <surname>Simmons</surname>
            ,
            <given-names>W. K.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Barsalou</surname>
            ,
            <given-names>L. W.</given-names>
          </string-name>
          <year>2003</year>
          .
          <article-title>The similarity-in-topography principle: Reconciling theories of conceptual deficits</article-title>
          .
          <source>Cognitive Neuropsychology</source>
          ,
          <volume>20</volume>
          ,
          <fpage>451</fpage>
          -
          <lpage>486</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <surname>Spence</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <year>2011</year>
          .
          <article-title>Crossmodal correspondences: A tutorial review</article-title>
          .
          <source>Attention, Perception, &amp; Psychophysics</source>
          ,
          <volume>73</volume>
          (
          <issue>4</issue>
          ),
          <fpage>971</fpage>
          -
          <lpage>995</lpage>
          . doi:10.3758/s13414-010-0073-7
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Plaisant</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shneiderman</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Lazar</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2008</year>
          .
          <article-title>Data sonification for users with visual impairment: a case study with georeferenced data</article-title>
          .
          <source>ACM Transactions on Computer-Human Interaction (TOCHI)</source>
          ,
          <volume>15</volume>
          (
          <issue>1</issue>
          ),
          <fpage>4</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>