Syntactic Disambiguation for the Semantic Web
Jonathan Pool
Turing Center, University of Washington
Seattle, Washington, USA
pool@cs.washington.edu

S. M. Colowick
Utilika Foundation
Seattle, Washington, USA
smc@utilika.org


ABSTRACT
Are people willing and able to disambiguate content for the Semantic Web? We asked subjects to use two methods (paraphrasal and truth-conditional selection) to disambiguate sentences from the Web. Native speakers did better with the paraphrasal method, and non-native speakers with the truth-conditional method. Unpaid volunteers performed better than paid subjects. Subjects’ average disambiguation time was about 20 seconds per sentence.

Categories and Subject Descriptors
H.1.2 User/Machine Systems – Human factors, human information processing
H.5.2 User Interfaces – Natural language
I.2.4 Knowledge Representation Formalisms and Methods – Semantic networks
I.2.6 Learning – Knowledge acquisition
I.7.2 Documentation Preparation – Markup languages
J.5 Arts and Humanities – Linguistics

General Terms
Economics, Experimentation, Human Factors, Languages

Keywords
Ambiguity, Annotation, Disambiguation, Distributed Human Computation, Metadata, Semantic Web

INTRODUCTION
Ambiguity and vagueness pervade the unstructured Web. The Semantic Web initiative proposes to rely on humans to create unambiguous content, metadata, and queries, but people have limited ability to recognize and prevent ambiguity in what they express [2, 6]. While machine understanding of unannotated text may become feasible [3], researchers are working to develop practical interfaces for human disambiguation of Web content [4]. To investigate methods of resolving one of the more difficult kinds of ambiguity, we conducted an experiment in which subjects disambiguated English sentences that contained syntactically ambiguous quantification [5].

METHOD
We selected 25 sentences from the Web (a small sample designed to encourage completion in an online, unmonitored testing environment). For each sentence, we identified two possible meanings and wrote a pair of paraphrases and an equivalent pair of truth conditions (situation descriptions) for them. For example, “Drinking almost always followed a dinner-party” had these restatements:
Paraphrases: (1) “Almost all drinking followed dinner-parties.” (2) “Drinking followed almost all dinner-parties.”
Truth conditions: (1) “In the activity diaries, 900 episodes of drinking were reported, and 875 of them followed dinner-parties.” (2) “In the activity diaries, 900 dinner-parties were reported, and drinking followed 875 of them.”
We asked some subjects (for method comparison) to choose between the paraphrases or between the truth conditions, and others (for consistency measurement) to choose both a paraphrase and a truth condition for each sentence. These two-task subjects might see the equivalent restatements in the same or in the opposite order.
We recruited 386 subjects: 208 through a Web contracting service [1], paid $0.75 each; and 178 through Internet discussion groups on language and writing, unpaid.
The ability to read and write English was the only participation requirement; 88% of the subjects had English as a native language. Subjects had opportunities to give us comments after each trial, after each block of 5 trials, and at the end of the experiment.

RESULTS
Satisfaction
Satisfaction was measured both by questionnaire responses, which indicated moderate satisfaction for all subjects (on three dimensions: ease, interest, and usefulness), and by completion rate. There were slight differences in satisfaction favoring paraphrasal over truth-conditional disambiguation and one-task over two-task conditions. For example, 90% of one-task subjects, compared with only 83% of two-task subjects, completed the experiment (p < 0.04).
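
In the two-task condition described under METHOD, the equivalent restatements could appear in the same or in the opposite order, so scoring a trial means mapping each displayed position back to the meaning it expresses. The following is a minimal sketch of that bookkeeping, treating a trial as consistent when the chosen truth condition is equivalent to the chosen paraphrase. The names `TwoTaskTrial`, `is_consistent`, and `consistency_rate` are hypothetical and are not taken from the study's software.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TwoTaskTrial:
    """One two-task trial (hypothetical record layout, assumed for illustration)."""
    paraphrase_choice: int       # 1 or 2: position of the chosen paraphrase
    truth_condition_choice: int  # 1 or 2: position of the chosen truth condition
    opposite_order: bool         # True if paraphrase 1 matches truth condition 2


def is_consistent(trial: TwoTaskTrial) -> bool:
    """Return True if the chosen truth condition is equivalent to the chosen paraphrase."""
    if trial.opposite_order:
        # Positions are swapped: paraphrase 1 <-> truth condition 2, and vice versa.
        equivalent = 3 - trial.paraphrase_choice
    else:
        equivalent = trial.paraphrase_choice
    return trial.truth_condition_choice == equivalent


def consistency_rate(trials: Iterable[TwoTaskTrial]) -> float:
    """Fraction of trials whose two choices are equivalent."""
    trials = list(trials)
    if not trials:
        raise ValueError("at least one trial is required")
    return sum(is_consistent(t) for t in trials) / len(trials)


# Invented example data, for illustration only (not results from the study).
example_trials = [
    TwoTaskTrial(1, 1, opposite_order=False),  # consistent
    TwoTaskTrial(1, 2, opposite_order=True),   # consistent
    TwoTaskTrial(2, 2, opposite_order=True),   # inconsistent
    TwoTaskTrial(2, 1, opposite_order=False),  # inconsistent
]
example_rate = consistency_rate(example_trials)
```

Keeping the display order on the trial record, rather than normalizing choices in advance, makes it straightforward to compare same-order and opposite-order trials separately.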

Consistency, Speed, and Agreement
The choices made by a two-task subject in a trial were consistent if the chosen truth condition was equivalent to the chosen paraphrase. Choices were consistent in 82% of the trials, regardless of whether the paraphrasal or the truth-conditional task appeared first. But opposite-order trials (with the first paraphrase equivalent to the second truth condition and vice versa) showed less consistency (76%) than same-order trials (86%). Of 159 subjects whose consistency rates differed between same- and opposite-order trials, 69% (109) were less consistent on opposite-order trials (two-tailed p < 0.00001).
The median time to perform a disambiguation was 20 seconds on one-task trials and 31 seconds on two-task trials. Truth-conditional selection typically took 23 percent longer than paraphrasal selection, perhaps because of the greater length and complexity of the truth conditions. Overall, the speed of disambiguation increased with experience.
The fastest subject to achieve 100% consistency finished in a total of 709 seconds. Others achieved 90% consistency in about 500 seconds, or 20 seconds per trial (see Figure 1).

Figure 1. Consistency by Duration

Insofar as the majority correctly guesses intended meanings, the size of the majority is a measure of the subjects’ collective success. We define a method-majority choice as the choice made by the majority of subjects (in all treatment groups) who disambiguated the same sentence with the same method in any trial. Of 13,859 choices made by all subjects, 77% were method-majority choices. This proportion was larger for paraphrasal selection (79%) than for truth-conditional selection (75%). Paraphrasing was the better method (it had higher method-majority rates) for 223 subjects, while truth-conditional selection was better for only 116 subjects (p < 0.00000001).

Subsample Analysis
By most measures, the unpaid volunteers performed better than the paid subjects. Of 79 two-task volunteers, 42 were more consistent than the overall median, vs. 37 of 95 paid subjects (two-tailed p = 0.0608). Of 178 volunteers, 87 made more than 1 comment, vs. 45 out of 208 paid subjects (two-tailed p < 0.0002). However, volunteers took longer: 84 of 178 volunteers took more than the overall median time to finish, vs. 52 of 208 paid subjects (two-tailed p < 0.0002).
Native and non-native speakers of English differed most strikingly in the disambiguation method that worked better for them. Most native speakers (202 of 340) agreed more often with the majority when using the paraphrasal method, but most (25 of 45) non-native speakers did so when using the truth-conditional method (two-tailed p = 0.0561). The truth conditions’ emphasis on numerical rather than verbal reasoning may explain some of this difference.

DISCUSSION
One-task subjects resolved ambiguities in 15-25 seconds, with approximately 80% inter-method consistency and 80% majority agreement. Volunteers performed even better than paid subjects, reaching 99% agreement on the most consensual sentence. Many subjects, particularly in the volunteer subsample, described the disambiguation tasks as both challenging and enjoyable.
Our subjects guessed others’ intended meanings, with no context but with the opportunity to choose between carefully crafted restatements. In future experiments, we intend to study disambiguation by authors, rather than readers, with more scalable methods of interactive disambiguation. We surmise that authors will be motivated to limit their ambiguity, just as our volunteers demonstrated their enthusiasm for disambiguation. Thus, we anticipate that the barriers to author disambiguation will be more technical than motivational. Our focus will be on developing methods that help motivated authors to recognize and reduce ambiguity.

REFERENCES
[1] Amazon.com, “Amazon Mechanical Turk” (Web site), 2007; http://www.mturk.com/mturk/welcome.
[2] Arnold, J. E., Wasow, T., Asudeh, A., and Alrenga, P., “Avoiding Attachment Ambiguities: The Role of Constituent Ordering”, Journal of Memory and Language, 51, 2004, 55-70; http://www-csli.stanford.edu/~wasow/AWAA_final.pdf.
[3] Etzioni, O., Banko, M., and Cafarella, M. J., “Machine Reading”, 2007 AAAI Spring Symposium on Machine Reading, 2007; http://turing.cs.washington.edu/papers/SS06EtzioniO.pdf.
[4] Kaufmann, E., and Bernstein, A., “How Useful are Natural Language Interfaces to the Semantic Web for Casual End-Users?”, 6th International Semantic Web Conference (ISWC 2007), 2007; http://www.ifi.uzh.ch/ddis/staff/goehring/btw/files/Kaufmann_Bernstein_ISWC2007.pdf.
[5] Pool, J., and Colowick, S. M. (in press), “Disambiguating for the Web: A Test of Two Methods”, Proc. 4th Intl. Conf. on Knowledge Capture (ACM Press, 2007); http://turing.cs.washington.edu/papers/disambweb.pdf.
[6] Wasow, T., Perfors, A., and Beaver, D., “The Puzzle of Ambiguity”, in Morphology and the Web of Grammar: Essays in Memory of Steven G. Lapointe, ed. O. Orgun and P. Sells (Stanford: CSLI Publications, 2005); http://montague.stanford.edu/~dib/Publications/lapointe_paper_9-4.pdf.
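
As a rough check on the paired comparisons reported under RESULTS (109 of 159 subjects less consistent on opposite-order trials; 223 versus 116 subjects favoring one method), the sketch below computes an exact two-tailed sign test. The text does not say which test produced the reported p-values, so the choice of a sign test is an assumption, and the function name `sign_test_two_tailed` is hypothetical.

```python
from math import comb


def sign_test_two_tailed(k: int, n: int) -> float:
    """Exact two-tailed sign-test p-value for k 'wins' among n untied pairs.

    Under the null hypothesis each pair is equally likely to favor either
    side, so the number of wins follows Binomial(n, 0.5).
    """
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    extreme = max(k, n - k)
    # The null distribution is symmetric, so double one tail and cap at 1.
    one_tail = sum(comb(n, i) for i in range(extreme, n + 1)) / 2 ** n
    return min(1.0, 2 * one_tail)


# Counts taken from the text; the test itself is an assumed reconstruction.
p_order = sign_test_two_tailed(109, 159)         # opposite- vs. same-order consistency
p_method = sign_test_two_tailed(223, 223 + 116)  # paraphrasal vs. truth-conditional

print(f"order effect:  p = {p_order:.2e}")
print(f"method effect: p = {p_method:.2e}")
```

Subjects with equal rates under both conditions are excluded before the test, which matches the restriction in the text to the 159 subjects whose rates differed.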