Syntactic Disambiguation for the Semantic Web

Jonathan Pool
Turing Center, University of Washington
Seattle, Washington, USA
pool@cs.washington.edu

S. M. Colowick
Utilika Foundation
Seattle, Washington, USA
smc@utilika.org

ABSTRACT
Are people willing and able to disambiguate content for the Semantic Web? We asked subjects to use two methods (paraphrasal and truth-conditional selection) to disambiguate sentences from the Web. Native speakers did better with the paraphrasal method, and non-native speakers with the truth-conditional method. Unpaid volunteers performed better than paid subjects. Subjects' average disambiguation time was about 20 seconds per sentence.

Categories and Subject Descriptors
H.1.2 User/Machine Systems – Human factors, human information processing
H.5.2 User Interfaces – Natural language
I.2.4 Knowledge Representation Formalisms and Methods – Semantic networks
I.2.6 Learning – Knowledge acquisition
I.7.2 Document Preparation – Markup languages
J.5 Arts and Humanities – Linguistics

General Terms
Economics, Experimentation, Human Factors, Languages

Keywords
Ambiguity, Annotation, Disambiguation, Distributed Human Computation, Metadata, Semantic Web

INTRODUCTION
Ambiguity and vagueness pervade the unstructured Web. The Semantic Web initiative proposes to rely on humans to create unambiguous content, metadata, and queries, but people have limited ability to recognize and prevent ambiguity in what they express [2, 6]. Although machine understanding of unannotated text may become feasible [3], researchers are also working to develop practical interfaces for human disambiguation of Web content [4]. To investigate methods of resolving one of the more difficult kinds of ambiguity, we conducted an experiment in which subjects disambiguated English sentences that contained syntactically ambiguous quantification [5].

METHOD
We selected 25 sentences from the Web. We kept the sample small to encourage completion in an online, unmonitored testing environment. For each sentence, we identified two possible meanings and wrote two restatements of each: a pair of paraphrases and an equivalent pair of truth conditions (situation descriptions). For example, "Drinking almost always followed a dinner-party" had these restatements:

Paraphrases: (1) "Almost all drinking followed dinner-parties." (2) "Drinking followed almost all dinner-parties."

Truth conditions: (1) "In the activity diaries, 900 episodes of drinking were reported, and 875 of them followed dinner-parties." (2) "In the activity diaries, 900 dinner-parties were reported, and drinking followed 875 of them."

We asked some subjects (for method comparison) to choose between the paraphrases or between the truth conditions. We asked others (for consistency measurement) to choose both a paraphrase and a truth condition for each sentence. These two-task subjects could see the equivalent restatements either in the same order or in the opposite order.

We recruited 386 subjects: 208 through a Web contracting service [1], paid $0.75 each, and 178 through Internet discussion groups on language and writing, unpaid. The ability to read and write English was the only participation requirement; 88% of the subjects had English as a native language. Subjects had opportunities to give us comments after each trial, after each block of 5 trials, and at the end of the experiment.

RESULTS

Satisfaction
We measured satisfaction in two ways. Questionnaire responses indicated moderate satisfaction for all subjects on three dimensions: ease, interest, and usefulness. We also measured completion rate. There were slight differences in satisfaction favoring paraphrasal over truth-conditional disambiguation and one-task over two-task conditions. For example, 90% of one-task subjects, compared with only 83% of two-task subjects, completed the experiment (p < 0.04).
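Several results in the following subsections compare how many subjects favored each of two conditions. For example, 109 of the 159 subjects whose consistency differed between trial orders were less consistent on opposite-order trials. Likewise, paraphrasing worked better for 223 subjects and truth-conditional selection for 116.

The paper does not name the test behind these p-values. As an illustration only, the sketch below assumes an exact two-tailed sign test: a binomial test with success probability 1/2, with tied subjects excluded. The helper name `sign_test_p` is hypothetical.

```python
from math import comb


def sign_test_p(k: int, n: int) -> float:
    """Two-tailed exact sign test (hypothetical helper, assumed procedure).

    k: number of subjects favoring one of the two conditions
    n: number of subjects with a nonzero difference (ties excluded)
    Under the null hypothesis each subject favors either condition with
    probability 1/2, so k follows Binomial(n, 1/2).
    """
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    # Probability of an outcome at least as extreme in the smaller tail,
    # summed with exact integer arithmetic before the final division.
    smaller = min(k, n - k)
    tail = sum(comb(n, i) for i in range(smaller + 1)) / 2 ** n
    # Double for a two-tailed test; cap at 1 when k is close to n/2.
    return min(1.0, 2 * tail)


# Counts taken from the results reported in this paper.
print(sign_test_p(109, 159))        # opposite- vs. same-order consistency
print(sign_test_p(223, 223 + 116))  # paraphrasal vs. truth-conditional
```

The function sums only the smaller tail and then doubles it, so the result is symmetric in k and n - k. Under the sign-test assumption, both calls return values below the bounds reported in the text (p < 0.00001 and p < 0.00000001). That agreement is suggestive but does not confirm the authors' procedure.

Comparisons between two groups of different sizes, such as volunteers versus paid subjects, would call for a different test. This sketch does not cover them.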
Consistency, Speed, and Agreement
The choices made by a two-task subject in a trial were consistent if the chosen truth condition was equivalent to the chosen paraphrase. Choices were consistent in 82% of the trials, regardless of whether the paraphrasal or the truth-conditional task appeared first. However, consistency depended on the order of the restatements. In opposite-order trials, the first paraphrase was equivalent to the second truth condition and vice versa; these trials showed less consistency (76%) than same-order trials (86%). Of 159 subjects whose consistency rates differed between same- and opposite-order trials, 69% (109) were less consistent on opposite-order trials (two-tailed p < 0.00001).

The median time to perform a disambiguation was 20 seconds on one-task trials and 31 seconds on two-task trials. Truth-conditional selection typically took 23% longer than paraphrasal selection, perhaps because the truth conditions are longer and more complex. Overall, the speed of disambiguation increased with experience.

The fastest subject to achieve 100% consistency finished in a total of 709 seconds. Others achieved 90% consistency in about 500 seconds, or 20 seconds per trial (see Figure 1).

Figure 1. Consistency by Duration

Insofar as the majority correctly guesses intended meanings, the size of the majority is a measure of the subjects' collective success. We define a method-majority choice as follows: the choice made by the majority of subjects (in all treatment groups) who disambiguated the same sentence with the same method in any trial. Of 13,859 choices made by all subjects, 77% were method-majority choices. This proportion was larger for paraphrasal selection (79%) than for truth-conditional selection (75%). Paraphrasing was the better method (it had higher method-majority rates) for 223 subjects, while truth-conditional selection was better for only 116 subjects (p < 0.00000001).

Subsample Analysis
By most measures, the unpaid volunteers performed better than the paid subjects. Of 79 two-task volunteers, 42 were more consistent than the overall median, vs. 37 of 95 paid subjects (two-tailed p = 0.0608). Of 178 volunteers, 87 made more than one comment, vs. 45 of 208 paid subjects (two-tailed p < 0.0002). However, volunteers took longer: 84 of 178 volunteers took more than the overall median time to finish, vs. 52 of 208 paid subjects (two-tailed p < 0.0002).

Native and non-native speakers of English differed most strikingly in which disambiguation method worked better for them. Most native speakers (202 of 340) agreed more often with the majority when using the paraphrasal method. In contrast, most non-native speakers (25 of 45) did so when using the truth-conditional method (two-tailed p = 0.0561). The truth conditions' emphasis on numerical rather than verbal reasoning may explain some of this difference.

DISCUSSION
One-task subjects resolved ambiguities in 15-25 seconds. Subjects showed approximately 80% inter-method consistency and 80% majority agreement. Volunteers performed even better than paid subjects, reaching 99% agreement on the most consensual sentence. Many subjects, particularly in the volunteer subsample, described the disambiguation tasks as both challenging and enjoyable.

Our subjects guessed others' intended meanings with no context, but with the opportunity to choose between carefully crafted restatements. In future experiments, we intend to study disambiguation by authors, rather than readers, with more scalable methods of interactive disambiguation. We surmise that authors will be motivated to limit their ambiguity, just as our volunteers demonstrated their enthusiasm for disambiguation. Thus, we anticipate that the barriers to author disambiguation will be more technical than motivational. Our focus will be on developing methods that help motivated authors to recognize and reduce ambiguity.

REFERENCES
[1] Amazon.com, "Amazon Mechanical Turk" (Web site), 2007; http://www.mturk.com/mturk/welcome.
[2] Arnold, J. E., Wasow, T., Asudeh, A., and Alrenga, P., "Avoiding Attachment Ambiguities: The Role of Constituent Ordering," Journal of Memory and Language, 51, 2004, 55-70; http://www-csli.stanford.edu/~wasow/AWAA_final.pdf.
[3] Etzioni, O., Banko, M., and Cafarella, M. J., "Machine Reading," 2007 AAAI Spring Symposium on Machine Reading, 2007; http://turing.cs.washington.edu/papers/SS06EtzioniO.pdf.
[4] Kaufmann, E., and Bernstein, A., "How Useful are Natural Language Interfaces to the Semantic Web for Casual End-Users?", 6th International Semantic Web Conference (ISWC 2007), 2007; http://www.ifi.uzh.ch/ddis/staff/goehring/btw/files/Kaufmann_Bernstein_ISWC2007.pdf.
[5] Pool, J., and Colowick, S. M. (in press), "Disambiguating for the Web: A Test of Two Methods," Proc. 4th Intl. Conf. on Knowledge Capture (ACM Press, 2007); http://turing.cs.washington.edu/papers/disambweb.pdf.
[6] Wasow, T., Perfors, A., and Beaver, D., "The Puzzle of Ambiguity," in Morphology and the Web of Grammar: Essays in Memory of Steven G. Lapointe, ed. O. Orgun and P. Sells (Stanford: CSLI Publications, 2005); http://montague.stanford.edu/~dib/Publications/lapointe_paper_9-4.pdf.