<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Occurrence and Distribution of Spatial Reference Relative to Discourse Relations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Blake Stephen Howald</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Georgetown University, Department of Linguistics</institution>
          ,
          <addr-line>ICC 479, 37th and O Streets, NW Washington, DC 20057-1051</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>I present a descriptive analysis of reference to physical space in the Penn Discourse TreeBank (PDTB). In particular, I analyze the occurrence of spatial prepositional phrases relative to the discourse relations and semantic senses that hold between two adjacent clauses. The purpose of this investigation is twofold: (1) to better understand how often spatial reference occurs in discourse and (2) to investigate possible relationships between spatial reference and discourse semantics. Overall, the distributions of spatial prepositional phrases and of relation-sense pairs are similar. However, statistical evidence suggests that the inclusion of spatial reference in a given clause is independent of the relation-sense of that clause and of adjacent clauses. While these results, as applied to the PDTB, indicate the absence of a default pattern of occurrence and discourse semantic function of spatial information, they can nonetheless be extrapolated to provide insights into models of spatial representation and interpretation in discourse generally.</p>
      </abstract>
      <kwd-group>
        <kwd>Spatial Reference</kwd>
        <kwd>Discourse Relations</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        The semantic and pragmatic functions of discourse relations, which hold between two
clauses, contribute to a text’s coherence [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. For example, in the two-line discourse
(a) Lucy is not hungry (b) Cati fed her, (b) is an EXPLANATION for (a) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The
inclusion of spatial reference, while accounted for in definitions of discourse relations
(e.g., BACKGROUND), is not strictly necessary. However, recent research, grounded in
spatial cognitive psychology (e.g., cognitive maps), has suggested that space plays a
larger role in discourse structure; in particular, spatial reference organizes narrative
discourse into spatially defined groups of events that are temporally linked [
        <xref ref-type="bibr" rid="ref3 ref4">3-4</xref>
        ].
While this research presents a new analytical perspective, before it can be fully
exploited, it is first necessary to better understand what relationships may exist
between spatial reference and discourse relations generally.
      </p>
      <p>
        Footnote 1: I would like to thank two anonymous reviewers, my dissertation advisor E. Graham Katz,
James Pustejovsky and David Herman for beneficial insights and discussion.
      </p>
      <p>
        This paper presents the results of a descriptive analysis that evaluates the interface
of spatial information and discourse. The particular research question addressed is:
Does the occurrence of spatial reference in discourse pattern relative to discourse
relations? A negative answer, which is suggested by existing definitions of discourse
relations, indicates that spatial reference is independent of discourse relations. An
affirmative answer indicates that spatial reference is dependent on (certain) discourse
relations. This paper is arranged as follows: Section 2 discusses spatial information
(as defined by The Preposition Project [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]), discourse relations (as defined by the Penn
Discourse TreeBank (PDTB) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]) and the methodology employed. Section 3 presents
the distribution of spatial prepositional phrases relative to discourse relations. Section
4 concludes.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2 Background, Data and Methodology</title>
      <p>
        In this paper, “spatial reference” refers to physical relationships between a figure
and a ground. For example, the cup is on the table locates the figure (the cup)
relative to the ground (the table) (see footnote 2). A search algorithm was developed to
automatically extract the 334 different prepositions defined in the Preposition Project [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
(based on a hierarchical network of dictionary entries). Of the 334 prepositions, 107
have a distinct “spatial” sense. Because prepositions are highly ambiguous (e.g.,
many also have numerous non-spatial senses), the prepositions extracted from the PDTB were
disambiguated by hand.
      </p>
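      <p>The extraction step described above can be illustrated with a minimal sketch. It is not the search algorithm used in the study, which is not reproduced in this paper. The preposition list is a small hypothetical subset of the 334 Preposition Project entries, and the names CANDIDATE_PREPOSITIONS, Candidate and extract_candidates are assumptions introduced for illustration. Each match is left unlabelled so that the spatial or non-spatial sense can be assigned by hand.</p>
      <preformat><![CDATA[
```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical subset for illustration only; the study searched for the
# 334 prepositions defined in the Preposition Project.
CANDIDATE_PREPOSITIONS = ["in front of", "on", "in", "at", "to", "from", "under"]


@dataclass
class Candidate:
    preposition: str
    start: int                          # character offset where the match begins
    end: int                            # character offset just past the match
    is_spatial: Optional[bool] = None   # left empty for manual disambiguation


def build_pattern(prepositions):
    # Try longer entries first so multiword prepositions win over their parts.
    ordered = sorted(prepositions, key=len, reverse=True)
    alternation = "|".join(re.escape(p) for p in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def extract_candidates(text, prepositions=CANDIDATE_PREPOSITIONS):
    pattern = build_pattern(prepositions)
    return [Candidate(m.group(0).lower(), m.start(), m.end())
            for m in pattern.finditer(text)]


candidates = extract_candidates("The cup is on the table in front of the window.")
for candidate in candidates:
    print(candidate)
```
]]></preformat>
      <p>Matching on word boundaries keeps substrings such as the "in" inside "window" from being counted. The is_spatial field is deliberately not filled automatically, which mirrors the manual disambiguation reported above.</p>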
      <p>
        The PDTB includes annotations of discourse relations in the Penn Treebank II
version of the Wall Street Journal (WSJ) corpus [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Discourse relations in the PDTB hold between pairs of syntactically classified
arguments from Penn TreeBank II (“ArgPairs”) and are a confluence of connective words,
the content of the ArgPairs and semantic senses. ArgPairs fall into five types:
Explicit – a syntactically classified connective word exists in the text (but, and);
Implicit – a connective word does not exist in the text but can be inferred;
EntRel – no relation holds, but the second clause in the ArgPair includes more
information about the first clause; AltLex – there is no connective word, but a
non-connective expression can capture an inferred relation; and NoRel – no relation
holds. Explicit, Implicit and AltLex ArgPairs co-occur with one of four senses:
Temporal, Contingency, Comparison and Expansion. The PDTB
includes 2159 annotated documents, 40,600 relations and 34,877 senses in total. The
overall distribution of the relations and senses in the PDTB provides a baseline of
relation-senses. The occurrence or non-occurrence of spatial reference overall, and
relative to particular relation-senses and pairs of relation-senses, can then be
compared to this baseline to determine relevant (statistically significant) differences
and potential patterns.
      </p>
      <p>
        Footnote 2: For the sake of brevity, I restrict the discussion to figure and ground relationships indexed
by spatial prepositions [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Other sources include motion verbs (run, follow), deictic verbs
(come, go) and deictic adverbs (here, there).
      </p>
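      <p>The relation and sense inventory described above, and the idea of a baseline distribution of relation-senses, can be sketched as follows. This is an illustrative sketch only: the ArgPair class, the SENSE_BEARING set and the relation_sense_distribution function are hypothetical names, the toy data are invented, and none of this reflects the actual PDTB file format or tooling.</p>
      <preformat><![CDATA[
```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Relation types that co-occur with a sense, per the description in the text.
SENSE_BEARING = {"Explicit", "Implicit", "AltLex"}


@dataclass(frozen=True)
class ArgPair:
    relation: str           # Explicit, Implicit, AltLex, EntRel or NoRel
    sense: Optional[str]    # Temporal, Contingency, Comparison, Expansion, or None

    def label(self):
        # Combine relation and sense into one relation-sense label.
        if self.relation in SENSE_BEARING and self.sense is not None:
            return f"{self.relation}-{self.sense}"
        return self.relation


def relation_sense_distribution(pairs):
    # Share of each relation-sense label among all ArgPairs.
    counts = Counter(pair.label() for pair in pairs)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Invented toy data, not drawn from the PDTB.
toy_pairs = [
    ArgPair("Explicit", "Expansion"),
    ArgPair("Explicit", "Expansion"),
    ArgPair("Implicit", "Contingency"),
    ArgPair("EntRel", None),
]
baseline = relation_sense_distribution(toy_pairs)
print(baseline)
```
]]></preformat>
      <p>Computing the same shares over a subset of ArgPairs, for example only those containing spatial reference, gives a distribution that can be set against this baseline.</p>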
      <p>A sample of 200 documents (approximately 10% of the total PDTB), comprising 5000
relations and 4388 senses, was selected for analysis. An ArgPair in which one or both
arguments contain one or more spatial prepositions is referred to as a Spatial ArgPair
(see footnote 3). Within Spatial ArgPairs, spatial reference is roughly equally distributed
between the two arguments (Arg1 – 54.15%; Arg2 – 45.84%). The average percentage of
Spatial ArgPairs per document is 28.90%. The sample selected for analysis conforms
to the general relation and sense distributions in the PDTB (Table 1).</p>
      <p>The Spatial ArgPairs do not appear to exhibit a pattern distinct from that of the
Non-Spatial ArgPairs. This is supported by a χ² test. H0 is that the occurrence
or non-occurrence of spatial reference is independent of a given relation-sense. Of
the top six relation-senses occurring in the sample, H0 cannot be rejected for four
(Explicit-Expansion (EE), Explicit-Comparison (EP), ENT, and Implicit-Contingency (IC)),
as the p-value is greater than .05, and is rejected for the Implicit-Expansion (IE) and
Explicit-Temporal (ET) relation-senses, as the p-value is less than .05 (Table 2).</p>
      <p>Footnote 3: 40 of the 107 Preposition Project prepositions are represented in the analyzed sample (N =
2214), with common prepositions making up the majority (82.92%): in – 880 (39.74%); at –
335 (15.13%); to – 250 (11.29%); on – 142 (6.41%); from – 130 (5.07%); of – 117 (5.28%).</p>
      <p>The remaining 36 prepositions account for the 17.08% complement.</p>
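      <p>The independence test referred to above can be sketched for a single relation-sense as a 2x2 contingency table: the target relation-sense versus all other ArgPairs, crossed with Spatial versus Non-Spatial. The sketch assumes a Pearson χ² statistic without continuity correction; the paper does not state which variant or software was used. The function name chi_square_2x2 is hypothetical, and the counts in the example are invented and are not taken from Table 2.</p>
      <preformat><![CDATA[
```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table
    [[a, b], [c, d]]. Returns (statistic, p_value) for 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Invented counts for illustration (NOT from the paper):
# rows    = target relation-sense, all other ArgPairs
# columns = Spatial, Non-Spatial
stat, p = chi_square_2x2(30, 70, 270, 630)
alpha = 0.05
decision = "reject H0" if alpha > p else "fail to reject H0"
print(stat, p, decision)
```
]]></preformat>
      <p>In this invented table the proportion of Spatial ArgPairs is identical in both rows, so the statistic is zero and H0 (independence) is not rejected at the .05 level.</p>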
      <p>However, the effect exhibited by the IE and ET relation-senses arguably
has more to do with the occurrence of Non-Spatial ArgPairs, given the
comparative numbers (1073 Spatial vs. 384 Non-Spatial for IE and 796 vs. 197 for
EE). For pairs of relation-senses, H0 cannot be rejected in any case (the top six pairs of
relation-senses in Table 2), as the p-value is greater than .05. This indicates that, even
in greater local context, the occurrence or non-occurrence of spatial information is
independent of a given pair of relation-senses.</p>
    </sec>
    <sec id="sec-3">
      <title>4 Conclusions and Limitations</title>
      <p>
        In sum, as applied to the PDTB, for the studied sample, there is statistical evidence
to support a negative answer to the posed research question: whether or not a figure
and ground relationship occurs, indexed by a spatial preposition, is independent of the
type of discourse relation. This insight may prove useful in interpreting the results of
computational tasks that interpret, represent and analyze spatial information in
discourse. The main limitations of this study are the amount of data and its scope. Future
research will focus on additional linguistic spatial phenomena and on larger corpora with
varied genres (the WSJ corpus consists of Essays, Summaries, Letters and News, the
last of which accounts for roughly 90% of all text in the corpus [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]). Nonetheless,
the present results facilitate a more complete understanding of spatial reference in
discourse structure. The occurrence of spatial reference does not appear to be biased
by inherent discourse patterning.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Hobbs</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>On the Coherence and Structure of Discourse</article-title>
          .
          <source>Technical Report CSLI-85-37</source>
          , Center for the Study of Language and Information, Stanford University (
          <year>1985</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Asher</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lascarides</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <source>Logics of Conversation</source>
          . Cambridge University Press, Cambridge, UK (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Herman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Spatial Reference in Narrative Domains</article-title>
          .
          <source>Text</source>
          <volume>21</volume>
          (
          <issue>4</issue>
          ),
          <fpage>515</fpage>
          --
          <lpage>541</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Howald</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Granularity Contours and Event Domain Classifications in Spatially Rich Narratives of Crime</article-title>
          .
          <source>COSIT 2009 Workshop on Presenting Spatial Information: Granularity, Relevance, and Integration</source>
          , Aber Wrac'h, France, http://repository.unimelb.edu.au/10187/5516 (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Litkowski</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Digraph analysis of dictionary preposition definitions</article-title>
          .
          In:
          <source>Proceedings of the SIGLEX/SENSEVAL Workshop on Word Sense Disambiguation: Recent Successes and Future Directions</source>
          , pp.
          <fpage>9</fpage>
          --
          <lpage>16</lpage>
          . Association for Computational Linguistics (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Prasad</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miltsakaki</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dinesh</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joshi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Robaldo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Webber</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>The Penn Discourse Treebank 2.0 Annotation Manual</article-title>
          . The PDTB Research Group (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Asbury</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gehrke</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Riemsdijk</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zwarts</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Introduction: Syntax and Semantics of Spatial P</article-title>
          . In:
          <string-name>
            <surname>Asbury</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dotlačil</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gehrke</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nouwen</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (eds.)
          <source>Syntax and Semantics of Spatial P</source>
          , pp.
          <fpage>1</fpage>
          --
          <lpage>32</lpage>
          . John Benjamins, Amsterdam &amp; Philadelphia (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Marcus</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Santorini</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marcinkiewicz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Building a Large Annotated Corpus of English: The Penn Treebank</article-title>
          .
          <source>Computational Linguistics</source>
          <volume>19</volume>
          (
          <issue>2</issue>
          ),
          <fpage>313</fpage>
          --
          <lpage>330</lpage>
          (
          <year>1993</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Pitler</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raghupathy</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mehta</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nenkova</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joshi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Easily Identifiable Discourse Relations</article-title>
          .
          <source>COLING 2008</source>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>