<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Syntactic Disambiguation for the Semantic Web</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jonathan Pool</string-name>
          <xref ref-type="aff" rid="aff2">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>S. M. Colowick</string-name>
          <xref ref-type="aff" rid="aff3">2</xref>
        </contrib>
        <aff id="aff2">
          <label>1</label>
          <institution>Turing Center, University of Washington</institution>
          ,
          <addr-line>Seattle, Washington</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff3">
          <label>2</label>
          <institution>Utilika Foundation</institution>
          ,
          <addr-line>Seattle, Washington</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Are people willing and able to disambiguate content for the Semantic Web? We asked subjects to use two methods (paraphrasal and truth-conditional selection) to disambiguate sentences from the Web. Native speakers did better with the paraphrasal method, and non-native speakers with the truth-conditional method. Unpaid volunteers performed better than paid subjects. Subjects' average disambiguation time was about 20 seconds per sentence.</p>
      </abstract>
      <kwd-group>
        <kwd>Ambiguity</kwd>
        <kwd>Annotation</kwd>
        <kwd>Disambiguation</kwd>
        <kwd>Distributed Human Computation</kwd>
        <kwd>Metadata</kwd>
        <kwd>Semantic Web</kwd>
      </kwd-group>
      <kwd-group kwd-group-type="ACM-categories">
        <kwd>Knowledge Representation Formalisms and Methods - Semantic networks</kwd>
        <kwd>Learning - Knowledge acquisition</kwd>
        <kwd>Humanities - Linguistics</kwd>
        <kwd>I.7.2 Document Preparation - Markup languages</kwd>
      </kwd-group>
      <kwd-group kwd-group-type="ACM-general-terms">
        <kwd>Economics</kwd>
        <kwd>Experimentation</kwd>
        <kwd>Human Factors</kwd>
        <kwd>Languages</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Ambiguity and vagueness pervade the unstructured Web.
The Semantic Web initiative proposes to rely on humans to
create unambiguous content, metadata, and queries, but
people have limited ability to recognize and prevent
ambiguity in what they express [
        <xref ref-type="bibr" rid="ref2 ref6">2, 6</xref>
        ]. While machine
understanding of unannotated text may become feasible [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ],
researchers are working to develop practical interfaces for
human disambiguation of Web content [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. To investigate
methods of resolving one of the more difficult kinds of
ambiguity, we conducted an experiment in which subjects
disambiguated English sentences that contained
syntactically ambiguous quantification [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>METHOD</title>
      <p>We selected 25 sentences from the Web (a small sample
designed to encourage completion in an online,
unmonitored testing environment). For each sentence, we
identified two possible meanings and wrote a pair of paraphrases
and an equivalent pair of truth conditions (situation
descriptions) for them. For example, “Drinking almost always
followed a dinner-party” had these restatements.
Paraphrases: (1) “Almost all drinking followed
dinner-parties.” (2) “Drinking followed almost all dinner-parties.”
Truth conditions: (1) “In the activity diaries, 900 episodes
of drinking were reported, and 875 of them followed
dinner-parties.” (2) “In the activity diaries, 900 dinner-parties
were reported, and drinking followed 875 of them.”
We asked some subjects (for method comparison) to
choose either between the paraphrases or between the truth
conditions, and others (for consistency measurement) to choose
both a paraphrase and a truth condition for each sentence.
Depending on the trial, these two-task subjects saw the equivalent
restatements in the same order or in the opposite order.</p>
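      <p>The two-task design, and the consistency measure reported under Results, can be illustrated with a minimal Python sketch. It is not the software used in the experiment: the names <monospace>Item</monospace>, <monospace>truth_condition_meanings</monospace>, <monospace>is_consistent</monospace>, and <monospace>consistency_rate</monospace> are hypothetical, and the sketch assumes that paraphrases were always displayed in meaning order while an opposite-order trial reversed only the truth conditions, so that the first paraphrase was equivalent to the second truth condition. A pair of choices counts as consistent when both picks express the same meaning.</p>
      <preformat preformat-type="code" xml:space="preserve">
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One test sentence with two restatements for each method.

    Index 0 of each pair expresses meaning (1) and index 1 expresses
    meaning (2), so paraphrases[i] is equivalent to truth_conditions[i].
    """

    sentence: str
    paraphrases: tuple[str, str]
    truth_conditions: tuple[str, str]


def truth_condition_meanings(opposite: bool) -> tuple[int, int]:
    """Meaning index shown at truth-condition display positions 0 and 1.

    Assumption (hypothetical): paraphrases are always displayed in meaning
    order, and an opposite-order trial reverses only the truth conditions.
    """
    return (1, 0) if opposite else (0, 1)


def displayed_truth_conditions(item: Item, opposite: bool) -> list[str]:
    """Truth conditions in the order a two-task subject would see them."""
    return [item.truth_conditions[m] for m in truth_condition_meanings(opposite)]


def is_consistent(paraphrase_pos: int, truth_condition_pos: int, opposite: bool) -> bool:
    """True if the chosen paraphrase and truth condition express the same meaning."""
    return paraphrase_pos == truth_condition_meanings(opposite)[truth_condition_pos]


def consistency_rate(trials: list[tuple[int, int, bool]]) -> float:
    """Share of (paraphrase_pos, truth_condition_pos, opposite) trials that are consistent."""
    if not trials:
        raise ValueError("no trials")
    return sum(is_consistent(p, t, o) for p, t, o in trials) / len(trials)


item = Item(
    sentence="Drinking almost always followed a dinner-party",
    paraphrases=(
        "Almost all drinking followed dinner-parties.",
        "Drinking followed almost all dinner-parties.",
    ),
    truth_conditions=(
        "In the activity diaries, 900 episodes of drinking were reported, "
        "and 875 of them followed dinner-parties.",
        "In the activity diaries, 900 dinner-parties were reported, "
        "and drinking followed 875 of them.",
    ),
)

# In an opposite-order trial, the first paraphrase matches the second
# displayed truth condition.
print(displayed_truth_conditions(item, opposite=True)[1] == item.truth_conditions[0])
# Four hypothetical trials: (paraphrase position, truth-condition position, opposite?)
print(consistency_rate([(0, 0, False), (1, 1, False), (0, 0, True), (0, 1, True)]))
```
      </preformat>
      <p>Run as written, the script prints <monospace>True</monospace> and then <monospace>0.75</monospace>: three of the four hypothetical trials pair a paraphrase with its equivalent truth condition. Representing each restatement by its meaning index, rather than by its on-screen position, keeps the consistency check independent of display order.</p>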
      <p>
        We recruited 386 subjects: 208 through a Web contracting
service [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], who were paid $0.75 each, and 178 unpaid volunteers through
Internet discussion groups on language and writing.
The ability to read and write English was the only
participation requirement; 88% of the subjects had English as a
native language. Subjects had opportunities to give us
comments after each trial, after each block of 5 trials, and
at the end of the experiment.
      </p>
    </sec>
    <sec id="sec-3">
      <title>RESULTS</title>
    </sec>
    <sec id="sec-4">
      <title>Satisfaction</title>
      <p>We measured satisfaction in two ways: by questionnaire
responses on three dimensions (ease, interest, and usefulness),
which indicated moderate satisfaction among all subjects, and by
completion rate. Satisfaction differed slightly, favoring
paraphrasal over truth-conditional disambiguation and one-task
over two-task conditions. For example, 90% of one-task subjects,
compared with only 83% of two-task subjects, completed the
experiment (p &lt; 0.04).</p>
    </sec>
    <sec id="sec-5">
      <title>Consistency, Speed, and Agreement</title>
      <p>The choices made by a two-task subject in a trial were
consistent if the chosen truth condition was equivalent to the
chosen paraphrase. Choices were consistent in 82% of the
trials, regardless of whether the paraphrasal or the
truth-conditional task appeared first. However, opposite-order trials
(with the first paraphrase equivalent to the second truth
condition and vice versa) showed less consistency (76%)
than same-order trials (86%). Of the 159 subjects whose
consistency rates differed between same- and opposite-order trials,
69% (109) were less consistent on opposite-order trials
(two-tailed p &lt; 0.00001).</p>
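      <p>The paper does not name the test behind this p-value. One natural reading is an exact two-tailed sign test on the 159 subjects whose rates differed, treating “less consistent on opposite-order trials” and “more consistent” as equally likely under the null hypothesis. The following Python sketch assumes that test; <monospace>sign_test_two_tailed</monospace> is a hypothetical helper, not part of any library.</p>
      <preformat preformat-type="code" xml:space="preserve">
```python
import math


def sign_test_two_tailed(k: int, n: int) -> float:
    """Exact two-tailed sign test for k of n non-tied subjects.

    Under the null hypothesis each subject is equally likely to fall in
    either direction, so the count follows Binomial(n, 1/2).
    """
    if not n >= k >= 0:
        raise ValueError("k must lie between 0 and n")
    extreme = max(k, n - k)
    # Probability of a split at least this lopsided in one direction.
    tail = sum(math.comb(n, i) for i in range(extreme, n + 1)) / 2**n
    # Double for two tails; cap at 1, since near-even splits can exceed it.
    return min(1.0, 2 * tail)


# 109 of the 159 subjects whose rates differed were less consistent on
# opposite-order trials.
p = sign_test_two_tailed(109, 159)
print(f"{109 / 159:.0%} less consistent on opposite-order trials; p = {p:.1e}")
```
      </preformat>
      <p>For 109 of 159 the computed p-value is on the order of 10<sup>-6</sup>, consistent with the reported p &lt; 0.00001. Doubling the larger tail is the usual convention for a symmetric null distribution; the cap at 1 matters only for near-even splits.</p>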
      <p>The median time to perform a disambiguation was 20
seconds on one-task trials and 31 seconds on two-task trials.
Truth-conditional selection typically took 23% longer
than paraphrasal selection, perhaps because the truth conditions
are longer and more complex than the paraphrases. Overall,
subjects disambiguated faster as they gained experience.
The fastest subject to achieve 100% consistency finished in
a total of 709 seconds. Other subjects achieved 90% consistency in
about 500 seconds, or 20 seconds per trial (see Figure 1).</p>
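      <p>The per-trial figure follows from dividing a subject's total time by the 25 trials. The same division applied to the fastest fully consistent subject, which the text does not state explicitly, gives about 28 seconds per trial:</p>
      <disp-formula>
        <tex-math>\bar{t} = \frac{T_{\text{total}}}{25}, \qquad \frac{500\ \text{s}}{25} = 20\ \text{s}, \qquad \frac{709\ \text{s}}{25} \approx 28.4\ \text{s}</tex-math>
      </disp-formula>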
    </sec>
    <sec id="sec-6">
      <title>Subsample Analysis</title>
      <p>By most measures, the unpaid volunteers performed better
than the paid subjects. Of 79 two-task volunteers, 42 were
more consistent than the overall median, vs. 37 of 95 paid
subjects (two-tailed p = 0.0608). Of 178 volunteers, 87 made
more than one comment, vs. 45 of 208 paid subjects
(two-tailed p &lt; 0.0002). However, volunteers took longer: 84 of
178 volunteers took more than the overall median time to
finish, vs. 52 of 208 paid subjects (two-tailed p &lt; 0.0002).</p>
      <p>Native and non-native speakers of English differed most
strikingly in the disambiguation method that worked better
for them. Most native speakers (202 of 340) agreed more
often with the majority when using the paraphrasal method,
but most non-native speakers (25 of 45) did so when using
the truth-conditional method (two-tailed p = 0.0561). The
truth conditions’ emphasis on numerical rather than verbal
reasoning may explain some of this difference.</p>
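      <p>The two-tailed p-values above are reported without naming the test. A pooled two-proportion z-test, whose squared statistic equals the Pearson chi-square statistic for the corresponding 2×2 table without continuity correction, is one standard choice for comparing such counts. The Python sketch below applies it to the four comparisons; the helper <monospace>two_proportion_z_test</monospace> is hypothetical, and for the native/non-native comparison it assumes that the 20 non-native speakers who did not favor the truth-conditional method favored the paraphrasal one (no ties).</p>
      <preformat preformat-type="code" xml:space="preserve">
```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test of x1/n1 vs. x2/n2.

    z**2 equals the Pearson chi-square statistic for the same 2x2 table
    (no continuity correction). Returns (z, two-tailed p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-tailed normal tail area: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    return z, math.erfc(abs(z) / math.sqrt(2))


# (x1, n1, x2, n2): volunteers (or native speakers) first, paid subjects
# (or non-native speakers) second, using counts from the text.
comparisons = {
    "more consistent than median": (42, 79, 37, 95),
    "more than one comment": (87, 178, 45, 208),
    "slower than median": (84, 178, 52, 208),
    # Assumption: 45 - 25 = 20 non-native speakers favored the
    # paraphrasal method, i.e., no ties.
    "paraphrasal method better": (202, 340, 20, 45),
}

for label, counts in comparisons.items():
    z, p = two_proportion_z_test(*counts)
    print(f"{label}: z = {z:.2f}, two-tailed p = {p:.2g}")
```
      </preformat>
      <p>With these counts, the first and last comparisons give p of roughly 0.061 and 0.056, close to the reported 0.0608 and 0.0561, and the other two give p far below 0.0002. That agreement suggests, but does not establish, that a test of this kind was used.</p>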
    </sec>
    <sec id="sec-7">
      <title>DISCUSSION</title>
      <p>Subjects resolved ambiguities in 15–25 seconds on one-task
trials and achieved approximately 80% inter-method consistency and 80%
majority agreement. Volunteers performed even better than
paid subjects, reaching 99% agreement on the most
consensual sentence. Many subjects, particularly in the volunteer
subsample, described the disambiguation tasks as both
challenging and enjoyable.</p>
      <p>Our subjects guessed others’ intended meanings, with no
context but with the opportunity to choose between
carefully crafted restatements. In future experiments, we intend
to study disambiguation by authors, rather than readers,
with more scalable methods of interactive disambiguation.
We surmise that authors will be motivated to limit their
ambiguity, just as our volunteers demonstrated their
enthusiasm for disambiguation. Thus, we anticipate that the
barriers to author disambiguation will be more technical than
motivational. Our focus will be on developing methods that
help motivated authors to recognize and reduce ambiguity.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <collab>Amazon.com</collab>
          , “
          <source>Amazon Mechanical Turk</source>
          ” (Web site)
          ,
          <year>2007</year>
          ; http://www.mturk.com/mturk/welcome.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Arnold</surname>
            ,
            <given-names>J. E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wasow</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Asudeh</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Alrenga</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , “
          <article-title>Avoiding Attachment Ambiguities: The Role of Constituent Ordering</article-title>
          ”
          ,
          <source>Journal of Memory and Language</source>
          ,
          <volume>51</volume>
          ,
          <year>2004</year>
          ,
          <fpage>55</fpage>
          -
          <lpage>70</lpage>
          ; http://www-csli.stanford.edu/~wasow/AWAA_final.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Etzioni</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Banko</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Cafarella</surname>
            ,
            <given-names>M. J.</given-names>
          </string-name>
          , “
          <article-title>Machine Reading</article-title>
          ”,
          <source>2007 AAAI Spring Symposium on Machine Reading</source>
          ,
          <year>2007</year>
          ; http://turing.cs.washington.edu/papers/SS06EtzioniO.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Kaufmann</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Bernstein</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , “
          <article-title>How Useful are Natural Language Interfaces to the Semantic Web for Casual End-Users?</article-title>
          ”,
          <source>6th International Semantic Web Conference (ISWC 2007)</source>
          ,
          <year>2007</year>
          ; http://www.ifi.uzh.ch/ddis/staff/goehring/btw/files/Kaufmann_Bernstein_ISWC2007.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Pool</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Colowick</surname>
            ,
            <given-names>S. M.</given-names>
          </string-name>
          (in press), “
          <article-title>Disambiguating for the Web: A Test of Two Methods</article-title>
          ,”
          <source>Proc. 4th Intl. Conf. on Knowledge Capture</source>
          (ACM Press,
          <year>2007</year>
          ); http://turing.cs.washington.edu/papers/disambweb.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Wasow</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perfors</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Beaver</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , “
          <article-title>The Puzzle of Ambiguity</article-title>
          ”, in
          <source>Morphology and the Web of Grammar: Essays in Memory of Steven G. Lapointe</source>
          , eds. O. Orgun and P. Sells (Stanford:
          <publisher-name>CSLI Publications</publisher-name>
          ,
          <year>2005</year>
          ); http://montague.stanford.edu/~dib/Publications/lapointe_paper_9-4.pdf.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>