<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Syntactic Disambiguation for the Semantic Web</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Jonathan</forename><surname>Pool</surname></persName>
							<email>pool@cs.washington.edu</email>
							<affiliation key="aff0">
								<orgName type="department">Turing Center</orgName>
								<orgName type="institution">University of Washington</orgName>
								<address>
									<settlement>Seattle</settlement>
									<region>Washington</region>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Colowick</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">Utilika Foundation</orgName>
								<address>
									<settlement>Seattle</settlement>
									<region>Washington</region>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Syntactic Disambiguation for the Semantic Web</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">FB7BE0152F3BE3524F70EE83ECB61BBF</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-23T20:49+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>H.1.2 User/Machine Systems - Human factors, human information processing</term>
					<term>H.5.2 User Interfaces - Natural language</term>
					<term>I.2.4 Knowledge Representation Formalisms and Methods - Semantic networks</term>
					<term>I.2.6 Learning - Knowledge acquisition</term>
					<term>I.7.2 Document Preparation - Markup languages</term>
					<term>J.5 Arts and Humanities - Linguistics</term>
					<term>Economics</term>
					<term>Experimentation</term>
					<term>Human Factors</term>
					<term>Languages</term>
					<term>Ambiguity</term>
					<term>Annotation</term>
					<term>Disambiguation</term>
					<term>Distributed Human Computation</term>
					<term>Metadata</term>
					<term>Semantic Web</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Are people willing and able to disambiguate content for the Semantic Web? We asked subjects to use two methods (paraphrasal and truth-conditional selection) to disambiguate sentences from the Web. Native speakers did better with the paraphrasal method, and non-native speakers with the truth-conditional method. Unpaid volunteers performed better than paid subjects. Subjects' average disambiguation time was about 20 seconds per sentence.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>INTRODUCTION</head><p>Ambiguity and vagueness pervade the unstructured Web. The Semantic Web initiative proposes to rely on humans to create unambiguous content, metadata, and queries, but people have limited ability to recognize and prevent ambiguity in what they express <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b5">6]</ref>. While machine understanding of unannotated text may become feasible <ref type="bibr" target="#b2">[3]</ref>, researchers are working to develop practical interfaces for human disambiguation of Web content <ref type="bibr" target="#b3">[4]</ref>. To investigate methods of resolving one of the more difficult kinds of ambiguity, we conducted an experiment in which subjects disambiguated English sentences that contained syntactically ambiguous quantification <ref type="bibr" target="#b4">[5]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>METHOD</head><p>We selected 25 sentences from the Web (a small sample designed to encourage completion in an online, unmonitored testing environment). For each sentence, we identified two possible meanings and wrote a pair of paraphrases and an equivalent pair of truth conditions (situation descriptions) for them.</p><p>For example, "Drinking almost always followed a dinner-party" had these restatements: Paraphrases: (1) "Almost all drinking followed dinner-parties." (2) "Drinking followed almost all dinner-parties." Truth conditions: (1) "In the activity diaries, 900 episodes of drinking were reported, and 875 of them followed dinner-parties." (2) "In the activity diaries, 900 dinner-parties were reported, and drinking followed 875 of them."</p><p>We asked some subjects (for method comparison) to choose between the paraphrases or between the truth conditions, and others (for consistency measurement) to choose both a paraphrase and a truth condition for each sentence. These two-task subjects might see the equivalent restatements in the same or in the opposite order.</p><p>We recruited 386 subjects: 208 through a Web contracting service <ref type="bibr" target="#b0">[1]</ref>, paid $0.75 each; and 178 through Internet discussion groups on language and writing, unpaid. The ability to read and write English was the only participation requirement; 88% of the subjects had English as a native language. Subjects had opportunities to give us comments after each trial, after each block of 5 trials, and at the end of the experiment.</p></div>
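<div xmlns="http://www.tei-c.org/ns/1.0"><p>The trial design described above can be sketched as a small data model. This is only an illustration, not the software used in the experiment: the names Item and is_consistent, and the convention of numbering displayed positions 0 and 1, are assumptions introduced here. The sketch shows how a two-task trial can be scored for consistency in both same-order and opposite-order presentations.</p>
<p><code lang="python">
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One ambiguous sentence with restatements of its two readings (hypothetical model)."""
    sentence: str
    paraphrases: tuple        # (reading 1, reading 2)
    truth_conditions: tuple   # (reading 1, reading 2)


def is_consistent(paraphrase_pos: int, truth_pos: int, opposite_order: bool) -> bool:
    """Return True when the two choices pick the same reading.

    Positions are 0 or 1 as displayed. In an opposite-order trial the
    truth conditions are shown reversed relative to the paraphrases,
    so the displayed position is flipped before comparing.
    """
    truth_reading = 1 - truth_pos if opposite_order else truth_pos
    return paraphrase_pos == truth_reading


# The example sentence from the text, stored in canonical reading order.
item = Item(
    sentence="Drinking almost always followed a dinner-party",
    paraphrases=(
        "Almost all drinking followed dinner-parties.",
        "Drinking followed almost all dinner-parties.",
    ),
    truth_conditions=(
        "In the activity diaries, 900 episodes of drinking were reported, "
        "and 875 of them followed dinner-parties.",
        "In the activity diaries, 900 dinner-parties were reported, "
        "and drinking followed 875 of them.",
    ),
)

# Same-order trial: first paraphrase and first truth condition agree.
print(is_consistent(0, 0, opposite_order=False))
# Opposite-order trial: first paraphrase matches the second displayed truth condition.
print(is_consistent(0, 1, opposite_order=True))
```
</code></p><p>Keeping the readings in one canonical order and treating display order as a per-trial flag keeps the equivalence check independent of how the restatements were presented.</p></div>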
<div xmlns="http://www.tei-c.org/ns/1.0"><head>RESULTS</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Satisfaction</head><p>Satisfaction was measured both by questionnaire responses, which indicated moderate satisfaction for all subjects (on three dimensions: ease, interest, and usefulness), and by completion rate. There were slight differences in satisfaction favoring paraphrasal over truth-conditional disambiguation and one-task over two-task conditions. For example, 90% of one-task subjects, compared with only 83% of two-task subjects, completed the experiment (p &lt; 0.04).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Consistency, Speed, and Agreement</head><p>The choices made by a two-task subject in a trial were consistent if the chosen truth condition was equivalent to the chosen paraphrase. Choices were consistent in 82% of the trials, regardless of whether the paraphrasal or the truth-conditional task appeared first. But opposite-order trials (with the first paraphrase equivalent to the second truth condition and vice versa) showed less consistency (76%) than same-order trials (86%). Of 159 subjects whose consistency rates differed between same- and opposite-order trials, 69% (109) were less consistent on opposite-order trials (two-tailed p &lt; 0.00001).</p><p>The median time to perform a disambiguation was 20 seconds on one-task trials and 31 seconds on two-task trials. Truth-conditional selection typically took 23 percent longer than paraphrasal selection, perhaps because of the greater length and complexity of the truth conditions. Overall, the speed of disambiguation increased with experience. The fastest subject to achieve 100% consistency finished in a total of 709 seconds. Others achieved 90% consistency in about 500 seconds, or 20 seconds per trial (see Figure <ref type="figure">1</ref>, "Consistency by Duration").</p><p>Insofar as the majority correctly guesses intended meanings, the size of the majority is a measure of the subjects' collective success. We define a method-majority choice as the choice made by the majority of subjects (in all treatment groups) who disambiguated the same sentence with the same method in any trial. Of 13,859 choices made by all subjects, 77% were method-majority choices. This proportion was larger for paraphrasal selection (79%) than for truth-conditional selection (75%). Paraphrasing was the better method (it had higher method-majority rates) for 223 subjects, while truth-conditional selection was better for only 116 subjects (p &lt; 0.00000001).</p></div>
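<div xmlns="http://www.tei-c.org/ns/1.0"><p>The within-subject comparisons above (109 of 159 subjects less consistent on opposite-order trials; 223 versus 116 subjects favoring one method) have the shape of a sign test. The text does not name the test that was used, so the following is a minimal sketch under the assumption of an exact two-sided binomial test against a chance rate of 0.5. The function name sign_test_two_sided is hypothetical.</p>
<p><code lang="python">
```python
from math import comb


def sign_test_two_sided(successes: int, n: int) -> float:
    """Exact two-sided binomial test of `successes` out of `n` against p = 0.5.

    Because the null distribution is symmetric, the two-sided p-value is
    twice the upper-tail probability of the larger of the two counts,
    capped at 1.
    """
    k = max(successes, n - successes)
    tail = sum(comb(n, i) for i in range(k, n + 1))
    return min(1.0, 2 * tail / 2 ** n)


# Assumed inputs taken from the counts reported in the text.
p_order = sign_test_two_sided(109, 159)         # opposite-order vs. same-order
p_method = sign_test_two_sided(223, 223 + 116)  # paraphrasal vs. truth-conditional

print(f"order effect:  p = {p_order:.3g}")
print(f"method effect: p = {p_method:.3g}")
```
</code></p><p>Subjects with no difference between conditions are excluded from the counts, as is usual for a sign test; the text reports only the subjects whose rates differed.</p></div>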
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Subsample Analysis</head><p>By most measures, the unpaid volunteers performed better than the paid subjects. Of 79 two-task volunteers, 42 were more consistent than the overall median, vs. 37 of 95 paid subjects (2-tailed p = 0.0608). Of 178 volunteers, 87 made more than 1 comment, vs. 45 out of 208 paid subjects (2-tailed p &lt; 0.0002). However, volunteers took longer: 84 of 178 volunteers took more than the overall median time to finish, vs. 52 of 208 paid subjects (2-tailed p &lt; 0.0002).</p><p>Native and non-native speakers of English differed most strikingly in the disambiguation method that worked better for them. Most native speakers (202 of 340) agreed more often with the majority when using the paraphrasal method, but most (25 of 45) non-native speakers did so when using the truth-conditional method (2-tailed p = 0.0561). The truth conditions' emphasis on numerical rather than verbal reasoning may explain some of this difference.</p></div>
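<div xmlns="http://www.tei-c.org/ns/1.0"><p>The subsample comparisons above each contrast two independent proportions. The text does not say which test produced the reported p-values; the sketch below assumes a pooled two-proportion z-test (equivalent to a Pearson chi-square test without continuity correction), which should land close to the reported p = 0.0608 and p = 0.0561. The function name two_proportion_p is hypothetical, and the figure of 20 of 45 non-native speakers favoring the paraphrasal method is an assumption obtained by subtracting the reported 25 from 45, ignoring possible ties.</p>
<p><code lang="python">
```python
from math import erfc, sqrt


def two_proportion_p(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test.

    x1 of n1 and x2 of n2 are the success counts in the two groups.
    """
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return erfc(abs(z) / sqrt(2))


# Counts as reported in the text (volunteers vs. paid, native vs. non-native).
comparisons = {
    "consistency above median": (42, 79, 37, 95),
    "more than 1 comment": (87, 178, 45, 208),
    "time above median": (84, 178, 52, 208),
    # 20 of 45 is an assumption: 45 minus the 25 reported, ignoring ties.
    "paraphrasal method better": (202, 340, 20, 45),
}

for label, counts in comparisons.items():
    print(f"{label}: p = {two_proportion_p(*counts):.4f}")
```
</code></p><p>For samples as small as the 45 non-native speakers, an exact test could be a more cautious choice; the z-test is used here only because it is simple to state.</p></div>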
<div xmlns="http://www.tei-c.org/ns/1.0"><head>DISCUSSION</head><p>One-task subjects resolved ambiguities in 15-25 seconds, with approximately 80% inter-method consistency and 80% majority agreement. Volunteers performed even better than paid subjects, reaching 99% agreement on the most consensual sentence. Many subjects, particularly in the volunteer subsample, described the disambiguation tasks as both challenging and enjoyable. Our subjects guessed others' intended meanings, with no context but with the opportunity to choose between carefully crafted restatements. In future experiments, we intend to study disambiguation by authors, rather than readers, with more scalable methods of interactive disambiguation. We surmise that authors will be motivated to limit their ambiguity, just as our volunteers demonstrated their enthusiasm for disambiguation. Thus, we anticipate that the barriers to author disambiguation will be more technical than motivational. Our focus will be on developing methods that help motivated authors to recognize and reduce ambiguity.</p></div>		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">Amazon Mechanical Turk</title>
		<author>
			<orgName>Amazon.com</orgName>
		</author>
		<ptr target="http://www.mturk.com/mturk/welcome" />
		<imprint>
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
	<note>Web site</note>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Avoiding Attachment Ambiguities: The Role of Constituent Ordering</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">E</forename><surname>Arnold</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Wasow</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Asudeh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Alrenga</surname></persName>
		</author>
		<ptr target="http://www-csli.stanford.edu/~wasow/AWAA_final.pdf" />
	</analytic>
	<monogr>
		<title level="j">Journal of Memory and Language</title>
		<imprint>
			<biblScope unit="volume">51</biblScope>
			<biblScope unit="page" from="55" to="70" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Machine Reading</title>
		<author>
			<persName><forename type="first">O</forename><surname>Etzioni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Banko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Cafarella</surname></persName>
		</author>
		<ptr target="http://turing.cs.washington.edu/papers/SS06EtzioniO.pdf" />
	</analytic>
	<monogr>
		<title level="m">AAAI Spring Symposium on Machine Reading</title>
				<imprint>
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">How Useful are Natural Language Interfaces to the Semantic Web for Casual End-Users?</title>
		<author>
			<persName><forename type="first">E</forename><surname>Kaufmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bernstein</surname></persName>
		</author>
		<ptr target="http://www.ifi.uzh.ch/ddis/staff/goehring/btw/files/Kaufmann_Bernstein_ISWC2007.pdf" />
	</analytic>
	<monogr>
		<title level="m">6th International Semantic Web Conference (ISWC 2007)</title>
				<imprint>
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Disambiguating for the Web: A Test of Two Methods</title>
		<author>
			<persName><forename type="first">J</forename><surname>Pool</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Colowick</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. 4th Intl. Conf. on Knowledge Capture</title>
				<meeting>4th Intl. Conf. on Knowledge Capture</meeting>
		<imprint>
			<publisher>ACM Press</publisher>
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
	<note>in press</note>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">The Puzzle of Ambiguity</title>
		<author>
			<persName><forename type="first">T</forename><surname>Wasow</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Perfors</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Beaver</surname></persName>
		</author>
		<ptr target="http://montague.stanford.edu/~dib/Publications/lapointe_paper_9-4.pdf" />
	</analytic>
	<monogr>
		<title level="m">Morphology and the Web of Grammar: Essays in Memory of</title>
				<editor>
			<persName><forename type="first">O</forename><surname>Orgun</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Sells</surname></persName>
		</editor>
		<imprint>
			<pubPlace>Stanford</pubPlace>
			<publisher>CSLI Publications</publisher>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
