<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Confirming the Generalizability of a Chain-Based Animacy Detector</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Labiba</forename><surname>Jahan</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">School of Computing and Information Sciences</orgName>
								<orgName type="institution">Florida International University</orgName>
								<address>
									<postCode>33199</postCode>
									<settlement>Miami</settlement>
									<region>FL</region>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">W</forename><forename type="middle">Victor H</forename><surname>Yarlott</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">School of Computing and Information Sciences</orgName>
								<orgName type="institution">Florida International University</orgName>
								<address>
									<postCode>33199</postCode>
									<settlement>Miami</settlement>
									<region>FL</region>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Rahul</forename><surname>Mittal</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">School of Computing and Information Sciences</orgName>
								<orgName type="institution">Florida International University</orgName>
								<address>
									<postCode>33199</postCode>
									<settlement>Miami</settlement>
									<region>FL</region>
								</address>
							</affiliation>
						</author>
						<author role="corresp">
							<persName><forename type="first">Mark</forename><forename type="middle">A</forename><surname>Finlayson</surname></persName>
							<email>markaf@fiu.edu</email>
							<affiliation key="aff0">
								<orgName type="department">School of Computing and Information Sciences</orgName>
								<orgName type="institution">Florida International University</orgName>
								<address>
									<postCode>33199</postCode>
									<settlement>Miami</settlement>
									<region>FL</region>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Confirming the Generalizability of a Chain-Based Animacy Detector</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">DABA09BE54E3F0E51FBED02F41156B1F</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-19T15:28+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Animacy is the characteristic of a referent being able to independently carry out actions in a story world (e.g., movement, communication). It is a necessary property of characters in stories, and so detecting animacy is an important step in automatic story understanding; it is also potentially useful for many other natural language processing tasks such as word sense disambiguation, coreference resolution, character identification, and semantic role labeling. Recent work by Jahan et al. [2018] demonstrated a new approach to detecting animacy in which animacy is considered a direct property of coreference chains (and referring expressions) rather than words. Jahan et al. combined hand-built rules and machine learning (ML) to identify the animacy of referring expressions, used majority voting to assign the animacy of coreference chains, and reported performance of up to 0.90 F1. In this short report we verify that the approach generalizes to two different corpora (OntoNotes and the Corpus of English Novels) and confirm that the hybrid model performs best, with the rule-based model in second place. Our tests apply the animacy classifier to almost twice as much data as Jahan et al.'s initial study. Our results also strongly suggest, as would be expected, that the models depend on coreference chain quality. We release our data and code to enable reproducibility.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Animacy is the characteristic of a referent being able to independently carry out actions in a story world (e.g., movement, communication). For example, human beings are animate because they can move or communicate in a realistic story world, but a chair or a table cannot accomplish those actions independently, so they are considered inanimate. Because animacy is a necessary quality of characters in stories (that is, all characters, traditionally conceived, must be animate), animacy is useful for story understanding. Further, animacy is potentially useful in many natural language processing tasks, including word sense disambiguation, semantic role labeling, coreference resolution, and character identification.</p><p>Most prior approaches assigned animacy as a property of individual words; by contrast, Jahan et al. <ref type="bibr">[2018]</ref> introduced a new approach to animacy detection that reconceived animacy as a property of referring expressions and coreference chains. Jahan et al. demonstrated their approach on 142 stories, comprising 156,154 words, that included Russian folktales and Islamist Extremist stories. That work left some questions as to the generalizability of the detector to other story forms. Here we test the generalizability of Jahan et al.'s detector on two new corpora: a news subset of OntoNotes <ref type="bibr" target="#b7">[Weischedel et al., 2013]</ref> and a subset of the Corpus of English Novels (CEN) <ref type="bibr" target="#b3">[De Smet, 2008]</ref>. We test all three of Jahan et al.'s models, specifically an SVM-based ML model, a rule-based model, and a hybrid model combining both. We show, in agreement with Jahan et al.'s results, that the hybrid model performs best, followed by the rule-based model. Our results also suggest that the animacy models depend strongly on the quality of the coreference chains; in particular, the performance of the models on the CEN data (with automatically computed chains) is much poorer than on OntoNotes and the ProppLearner corpus (with manually corrected chains).</p><p>In this paper, we first discuss our corpora ( §2), followed by the models ( §3) created by Jahan et al. <ref type="bibr">[2018]</ref>. We then outline the experimental setup ( §4) and describe our results ( §5). We briefly discuss related work ( §6) before finishing with a discussion of the contributions of the paper ( §7).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Data</head><p>We annotated animacy on two new corpora: first, 94 news texts drawn from the OntoNotes Corpus <ref type="bibr" target="#b7">[Weischedel et al., 2013]</ref>; and second, 30 chapters from 30 novels drawn from CEN. We performed this manual annotation following the same guidelines described by Jahan et al. <ref type="bibr">[2018]</ref>. In accordance with their procedure, we annotated the coreference chains of these two corpora as to whether each coreference chain head acted as an animate being in the text.</p><p>OntoNotes <ref type="bibr" target="#b7">[Weischedel et al., 2013]</ref> is a large corpus containing a variety of genres (e.g., news, conversational telephone speech, broadcast, and talk show transcripts) in English, Chinese, and Arabic. We extracted 94 English broadcast news texts that had coreference chain annotations. The first author annotated the animacy of the coreference chains.</p><p>The Corpus of English Novels (CEN) <ref type="bibr" target="#b3">[De Smet, 2008]</ref> contains 292 English novels written between 1881 and 1922, comprising various genres including drama, romance, and fantasy. We selected 30 novels and listed the characters of each from online resources. We then extracted from each novel a single chapter containing a significant number of characters. We computed coreference chains using Stanford CoreNLP <ref type="bibr" target="#b7">[Manning et al., 2014]</ref>, and the first author annotated those chains for animacy.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Models</head><p>SVM Model The first approach is a supervised SVM classifier <ref type="bibr" target="#b1">[Chang and Lin, 2011]</ref> for assigning animacy to referring expressions, using a radial basis function kernel with SVM parameters γ = 1, C = 0.5, and p = 1. The features of the best-performing model are boolean values indicating whether a given referring expression contained a noun, a grammatical subject, or a semantic subject. Jahan et al. chose these features because animate references tend to appear as nouns, grammatical subjects, or semantic subjects. When training and testing on the same dataset, we used ten-fold cross-validation and reported the micro-averages across the performance on test folds.</p><p>Rule-Based Model The second approach is a rule-based classifier that marks a referring expression as animate if its last word is: (a) a gendered personal, reflexive, or possessive pronoun (i.e., excluding it, its, itself, etc.); (b) the semantic subject of a verb; (c) a proper noun (excluding the named-entity types LOCATION, ORGANIZATION, and MONEY); or (d) a descendant of LIVING BEING in WordNet. If the last word of a referring expression is a descendant of ENTITY but not of LIVING BEING in WordNet, the model considers it inanimate.</p><p>Hybrid Model The third approach applies the hand-built rules first, then applies the ML classifier to those referring expressions not covered by the rules.</p><p>Majority Vote Model The coreference model applies majority voting over the results of the referring expression animacy model to obtain a coreference chain animacy prediction. For ties, the chain is marked inanimate.</p></div>
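As a concrete illustration, the referring-expression models and the majority-vote chain model described above can be sketched as follows. This is a minimal sketch, not Jahan et al.'s released code: the WordNet lookup is mocked with a tiny hand-coded hypernym table, the training examples are invented, scikit-learn's SVC stands in for LIBSVM, and gamma = 1 is our reading of the garbled parameter listing.

```python
# Sketch of the rule-based, SVM, hybrid, and majority-vote models.
from sklearn.svm import SVC

GENDERED_PRONOUNS = {"he", "him", "his", "himself",
                     "she", "her", "hers", "herself"}

# Toy stand-in for WordNet hypernym edges (word -> hypernym).
HYPERNYMS = {"farmer": "person", "person": "living_being",
             "table": "furniture", "furniture": "entity",
             "living_being": "entity"}

def descends_from(word, ancestor):
    """Walk the hypernym chain upward (rule (d) and the ENTITY check)."""
    while word in HYPERNYMS:
        word = HYPERNYMS[word]
        if word == ancestor:
            return True
    return False

def rule_based(last_word, semantic_subject=False, person_proper_noun=False):
    """Rules (a)-(d); returns True/False, or None when no rule fires."""
    w = last_word.lower()
    if w in GENDERED_PRONOUNS or semantic_subject or person_proper_noun:
        return True                                      # rules (a)-(c)
    if descends_from(w, "living_being"):
        return True                                      # rule (d)
    if descends_from(w, "entity"):
        return False          # ENTITY but not LIVING BEING -> inanimate
    return None               # uncovered: fall through to the SVM

# SVM over boolean features [has noun, grammatical subj., semantic subj.];
# RBF kernel with C = 0.5 as in the text (gamma = 1 is our assumption).
svm = SVC(kernel="rbf", C=0.5, gamma=1.0)
svm.fit([[1, 1, 1], [1, 0, 0], [1, 1, 1], [1, 0, 0], [0, 0, 0], [1, 1, 0]],
        [1, 0, 1, 0, 0, 1])

def hybrid(last_word, features, **rule_kwargs):
    """Hybrid model: rules first, SVM for uncovered expressions."""
    ruled = rule_based(last_word, **rule_kwargs)
    if ruled is not None:
        return ruled
    return bool(svm.predict([features])[0])

def chain_animacy(votes):
    """Majority vote over referring-expression labels; ties -> inanimate."""
    return votes.count(True) > votes.count(False)

votes = [hybrid("herself", [1, 1, 1]),
         hybrid("farmer", [1, 1, 0], semantic_subject=True),
         hybrid("table", [1, 0, 0])]
print(chain_animacy(votes))
```

Note the tie-breaking direction in `chain_animacy`: a tied chain is marked inanimate, matching the majority vote model described above.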
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Experiments</head><p>We investigated four training setups for the SVM and hybrid referring expression models: training the models on each of the three data sets individually, and training on all three data sets together. For all models (SVM, hybrid, rule-based) we also varied the test corpus. Where the test data was a subset of the training data, we applied ten-fold cross-validation. In all setups, we used the majority vote classifier to identify the animacy of the coreference chains. These experiments allow us to compare the performance of Jahan et al.'s referring expression models on our new corpora, as well as to determine their performance for coreference chain animacy.</p></div>
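The cross-validation bookkeeping described above can be sketched as follows: predictions are pooled across the ten test folds and the score is computed once over the pool (micro-averaging). The toy data, labels, and choice of scikit-learn utilities are ours, not the original experimental scripts.

```python
# Ten-fold cross-validation with micro-averaged F1 over pooled folds.
from sklearn.model_selection import KFold
from sklearn.svm import SVC
from sklearn.metrics import f1_score

# Toy boolean feature vectors and animacy labels (invented for the sketch).
X = [[i % 2, (i // 2) % 2, int(i % 3 == 0)] for i in range(40)]
y = [int(x[0] and x[1]) for x in X]

pooled_true, pooled_pred = [], []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True,
                                 random_state=0).split(X):
    clf = SVC(kernel="rbf", C=0.5, gamma=1.0)
    clf.fit([X[i] for i in train_idx], [y[i] for i in train_idx])
    # Pool predictions rather than averaging per-fold scores.
    pooled_pred.extend(clf.predict([X[i] for i in test_idx]))
    pooled_true.extend(y[i] for i in test_idx)

score = f1_score(pooled_true, pooled_pred)
print(round(score, 3))
```

Pooling before scoring (rather than averaging per-fold F1 values) is what makes the result a micro-average: every test instance contributes equally regardless of fold size.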
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Results &amp; Discussion</head><p>The results in Table <ref type="table">2</ref> show that the hybrid model outperformed all of the other models in detecting referring expression animacy, which is the same result reported in Jahan et al. <ref type="bibr">[2018]</ref>. It performed best on Jahan et al.'s original data, achieving an F1 of 0.88, and is the most useful model when applied as input to the majority vote model to identify the animacy of coreference chains, achieving an F1 of 0.77.</p><p>The rule-based model performs second-best. It performed best on Jahan et al.'s original data for referring expressions, achieving an F1 of 0.88. However, the majority vote model achieved its best result (an F1 of 0.76) on OntoNotes when the rule-based results are used to detect chain animacy. We developed a baseline for chain animacy that considers only the first referring expression instead of the majority vote; it achieved F1 scores of 0.69 and 0.43 on OntoNotes and CEN, respectively.</p><p>The SVM model performed worst in most cases, especially when its outputs are used for the majority vote model. It performed worst when trained on the Corpus of English Novels and tested on Jahan et al.'s original data, achieving an F1 of only 0.56 for referring expressions and an F1 of 0.37 when those results are used for the majority vote model.</p><p>The majority vote model performed best when tested on OntoNotes and worst when tested on the Corpus of English Novels (CEN). Besides the text genre, the major difference between these corpora is the quality of the coreference chains: those for OntoNotes were manually corrected, while those for CEN were computed automatically. This strongly suggests that the quality of the coreference chains is a major factor in the performance of the animacy classifier.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>Jahan et al.'s animacy model first classifies the animacy of referring expressions, and second classifies each coreference chain as animate or not by taking the majority vote of its constituent referring expressions. In our experiments we ran Jahan et al.'s three referring expression animacy detection models and the single coreference chain animacy detection model (majority vote backed by the different referring expression models, which was determined to be the best coreference model). Jahan et al. released their code, so the models are identical to their work. The SVM model is a simple supervised SVM classifier <ref type="bibr" target="#b1">[Chang and Lin, 2011]</ref>.</figDesc></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">The data and code may be downloaded from https://doi.org/10.34703/gzx1-9v95/FCYIPW</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgements</head><p>This work was supported by NSF CAREER Award IIS-1749917 and DARPA Contract FA8650-19-C-6017. We would also like to thank the members of the FIU Cognac Lab for their discussions and assistance.</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Referring Expression Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Coreference Results</head><p>We measured inter-annotator agreement using Cohen's kappa <ref type="bibr" target="#b2">[Cohen, 1960]</ref>, a statistical measure that takes into account the possibility of the agreement occurring by chance <ref type="bibr" target="#b5">[Glasser, 2008]</ref>.</p><p>Finally, the results on the combined corpus are reasonable for the referring expression models but poor for the majority vote coreference chain model. This is perhaps to be expected, because CEN is the largest corpus of the three and its coreference chains are of poor quality.</p><p>Overall, these results strongly suggest that the features used in Jahan et al. <ref type="bibr">[2018]</ref> generalize to domains outside the Russian folklore corpus, so long as high-quality coreference chains are available.</p></div>
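Cohen's kappa can be computed directly from two annotators' label sequences, as in the following sketch (the labels below are toy examples, not our annotation data):

```python
# Cohen's kappa: observed agreement corrected for chance agreement.
from collections import Counter

def cohens_kappa(a, b):
    assert len(a) == len(b)
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    ca, cb = Counter(a), Counter(b)
    # Chance agreement: both annotators pick the same label independently.
    p_e = sum(ca[label] * cb[label] for label in set(a) | set(b)) / n ** 2
    return (p_o - p_e) / (1 - p_e)

ann1 = ["animate", "inanimate", "animate", "animate", "inanimate", "inanimate"]
ann2 = ["animate", "inanimate", "animate", "inanimate", "inanimate", "inanimate"]
print(round(cohens_kappa(ann1, ann2), 3))  # observed 5/6, chance 1/2 -> 2/3
```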
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Related Work</head><p>Most prior work classifies animacy as a word- or noun-level property using various supervised and unsupervised approaches. For example, <ref type="bibr" target="#b7">Orasan and Evans [2007]</ref> performed animacy classification of senses and nouns and achieved their best performance with a supervised ML method (F1 of 0.94). Similarly, <ref type="bibr">Bowman and Chopra [2012]</ref> used a maximum entropy classifier to classify noun phrases into a most probable class (human, animal, place, etc.), which was used to mark animacy, achieving 94% accuracy. <ref type="bibr" target="#b7">Karsdorp et al. [2015]</ref> likewise employed a maximum entropy classifier to label the animacy of Dutch words using different combinations of lemmas, POS tags, dependency tags, and word embeddings; their best result was an F1 of 0.93. However, that work is language-bound and has not been tested on other natural languages. <ref type="bibr" target="#b6">Ji and Lin [2009]</ref> leveraged gender and animacy properties to detect person mentions with an unsupervised learning model. They reported an F1 of 0.85, which is marginally lower than a supervised learning approach but has higher coverage of low-frequency mentions. More recently, <ref type="bibr" target="#b0">Ardanuy et al. [2020]</ref> proposed an unsupervised approach to atypical animacy detection using contextualized word embeddings. Using a masking approach with context, they achieved a best performance of 0.78 F1 on one dataset, while reporting an F1 of 0.94 on another dataset using a simple BERT classifier on the target expressions in a sentence. <ref type="bibr" target="#b8">Zhu et al. [2019]</ref> proposed an animacy detector based on a bi-directional Long Short-Term Memory (bi-LSTM) network with a conditional random field (CRF) layer to mark words in a text sequence with the animate attribute. The work was done in Chinese, and they reported an F1 of 0.38.</p><p>There is also some work based on ontologies or other external resources. For example, <ref type="bibr" target="#b4">Declerck et al. [2012]</ref> augmented an existing ontology using nominal phrases found in folktales, reporting an F1 of 0.80 with 79% accuracy. <ref type="bibr" target="#b7">Moore et al. [2013]</ref> assigned animacy to words, where multiple models (including WordNet- and WordSim-based models) vote among Animal, Person, and Inanimate, or abstain, and the results are combined using various interpretable voting models. They reported an accuracy of 89% under majority voting and 95% under an SVM scheme.</p><p>Compared to all of this prior work, however, only Jahan et al. <ref type="bibr">[2018]</ref> demonstrated an approach where animacy is considered a direct property of coreference chains (and referring expressions) rather than words or nouns.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">Contributions</head><p>This paper makes two contributions. First, we have demonstrated the generalizability of a previously reported approach to animacy detection <ref type="bibr" target="#b6">[Jahan et al., 2018]</ref> by testing the approach on nearly twice as much data, comprising two additional story genres (news and novels). Second, we release this data for use by the community 1 . These results confirm the best performing models, and also strongly suggest the dependence of the models on the quality of the coreference chain annotations.</p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Living machines: A study of atypical animacy</title>
		<author>
			<persName><surname>Ardanuy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Automatic animacy classification</title>
				<editor>
			<persName><forename type="first">R</forename><surname>Samuel</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Harshit</forename><surname>Bowman</surname></persName>
		</editor>
		<editor>
			<persName><surname>Chopra</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2012">2020. 2020. 2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">LIBSVM: A library for support vector machines</title>
		<author>
			<persName><forename type="first">Chih-Chung</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Chih-Jen</forename><surname>Lin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Intelligent Systems and Technology</title>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page">27</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">A coefficient of agreement for nominal scales</title>
		<author>
			<persName><forename type="first">Jacob</forename><surname>Cohen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Educational and Psychological Measurement</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="37" to="46" />
			<date type="published" when="1960">1960</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Corpus of English novels</title>
		<author>
			<persName><forename type="first">Hendrik</forename><surname>De Smet</surname></persName>
		</author>
		<ptr target="https://perswww.kuleuven.be/~u0044428/" />
		<imprint>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Ontology-based incremental annotation of characters in folktales</title>
		<author>
			<persName><surname>Declerck</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 6th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities</title>
				<meeting>the 6th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities<address><addrLine>Avignon, France</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2012">2012. 2012</date>
			<biblScope unit="page" from="30" to="34" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">Research Methodology for Studies of Diagnostic Tests</title>
		<author>
			<persName><forename type="first">Stephen</forename><surname>Glasser</surname></persName>
		</author>
		<meeting><address><addrLine>Dordrecht</addrLine></address></meeting>
		<imprint>
			<publisher>Springer Netherlands</publisher>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="245" to="257" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Gender and Animacy knowledge discovery from web-scale n-grams for unsupervised person mention detection</title>
		<author>
			<persName><forename type="first">Jahan</forename></persName>
		</author>
		<ptr target="https://dspace.mit.edu/handle/1721.1/116172" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation</title>
				<meeting>the 23rd Pacific Asia Conference on Language, Information and Computation<address><addrLine>Santa Fe, NM; Hong Kong</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2009">2018. 2018. 2009. 2009</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="220" to="229" />
		</imprint>
	</monogr>
	<note>Proceedings of the 27th International Conference on Computational Linguistics (COLING)</note>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">The Stanford CoreNLP natural language processing toolkit</title>
		<author>
			<persName><surname>Karsdorp</surname></persName>
		</author>
		<idno>No. LDC2013T19</idno>
		<ptr target="https://catalog.ldc.upenn.edu/LDC2013T19" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL): System Demonstrations</title>
				<meeting>the 52nd Annual Meeting of the Association for Computational Linguistics (ACL): System Demonstrations<address><addrLine>Atlanta, GA; Baltimore, MD; Seattle, Washington, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2007">2015. 2015. 2014. 2014. 2013. 2013. 2007. 2007. 2013. 2013</date>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="page" from="79" to="103" />
		</imprint>
	</monogr>
	<note type="report_type">LDC Catalog</note>
	<note>OntoNotes Release 5</note>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Improving anaphora resolution by animacy identification</title>
		<author>
			<persName><surname>Zhu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA)</title>
				<meeting>the 2019 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA)<address><addrLine>Dalian, China</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019. 2019</date>
			<biblScope unit="page" from="48" to="51" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
