<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Linking Entities in #Microposts</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Romil</forename><surname>Bansal</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">International Institute of Information Technology</orgName>
								<address>
									<settlement>Hyderabad</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Sandeep</forename><surname>Panem</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">International Institute of Information Technology</orgName>
								<address>
									<settlement>Hyderabad</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Priya</forename><surname>Radhakrishnan</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">International Institute of Information Technology</orgName>
								<address>
									<settlement>Hyderabad</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Manish</forename><surname>Gupta</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">International Institute of Information Technology</orgName>
								<address>
									<settlement>Hyderabad</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Vasudeva</forename><surname>Varma</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">International Institute of Information Technology</orgName>
								<address>
									<settlement>Hyderabad</settlement>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Linking Entities in #Microposts</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">43C3711E0DB1160AE552D300134D8787</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T08:29+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval</term>
					<term>Algorithms</term>
					<term>Experimentation</term>
					<term>Named Entity Extraction and Linking (NEEL) Challenge</term>
					<term>Entity Linking</term>
					<term>Entity Disambiguation</term>
					<term>Social Media</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Social media has emerged as an important source of information. Entity linking in social media provides an effective way to extract useful information from the microposts that users share. Entity linking in microposts is difficult because they lack sufficient context to disambiguate entity mentions. In this paper, we perform entity linking by first identifying entity mentions and then disambiguating them using three features: (1) similarity between the mention and the corresponding Wikipedia entity pages; (2) similarity between the mention together with the tweet text and the anchor text strings across multiple webpages; and (3) popularity of the entity on Twitter at the time of disambiguation. The system is tested on the manually annotated dataset provided by the Named Entity Extraction and Linking (NEEL) Challenge 2014, and the results are on par with state-of-the-art methods.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">INTRODUCTION</head><p>Social media networks like Twitter have emerged as major platforms for sharing information in the form of short messages (tweets). Analysis of tweets can be useful for various applications such as e-commerce, entertainment, and recommendations. Entity linking is one such analysis task: it finds the correct referent entities in a knowledge base for the mentions in a tweet. Entity linking in social media is important because it helps in detecting, understanding, and tracking information about an entity shared across social media.</p><p>Entity linking consists of two tasks: mention detection and entity disambiguation. Entity linking in general text is a well-explored problem. Existing entity linking tools are intended for news corpora and similar document-based corpora of relatively long texts. Because microposts lack sufficient context, these context-based approaches perform poorly on them.</p><p>In this paper we describe the system we proposed for the NEEL Challenge 2014 <ref type="bibr" target="#b0">[1]</ref>. The system disambiguates the entity mentions in tweets using three measures: (1) Wikipedia's context based measure (§2.2.1); (2) anchor text based measure (§2.2.2); and (3) Twitter popularity based measure (§2.2.3).</p><p>Mention detection is done using existing Twitter part-of-speech (POS) taggers <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b4">5]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">OUR APPROACH</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Mention Detection</head><p>Mention detection is the task of finding entity mentions in the given text. We treat the named entities present in a tweet as its mentions. Several approaches to named entity recognition in tweets have been proposed recently <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b4">5]</ref>. These include spotting continuous sequences of proper nouns as named entities in the tweet. However, named entities such as 'Statue of Liberty' and 'Game of Thrones' also include tokens other than nouns. To detect such mentions, Ritter et al. <ref type="bibr" target="#b4">[5]</ref> proposed a machine learning based system for named entity detection in tweets. Gimpel et al. <ref type="bibr" target="#b1">[2]</ref> present another approach for POS tagging of tweets. We tried both of these POS taggers to extract proper noun sequences. In our experiments, Ritter et al.'s tagger gave an accuracy of 77% while Gimpel et al.'s tagger gave an accuracy of 92%, so we merged the results from both, as shown in Fig. <ref type="figure">1</ref>. The tweet text is fed to the system, and the longest continuous sequences of proper noun tokens detected by the two taggers are extracted as the entity mentions of the tweet. The merged system achieved an accuracy of 98% in predicting mentions.</p><p>[Figure 1. Labels recoverable from the figure: "ARK POS Tagger" (Gimpel et al. <ref type="bibr" target="#b1">[2]</ref>), "T-NER POS Tagger" (Ritter et al. <ref type="bibr" target="#b4">[5]</ref>), "Merge Mentions", "Wikipedia based measure", "Anchor text based measure", "Twitter".]</p></div>
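<div xmlns="http://www.tei-c.org/ns/1.0"><p>The merging of the two taggers' outputs can be illustrated with a minimal Python sketch. The text does not state how the results were merged, so the sketch assumes a token-level union: a token counts as a proper noun if either tagger marks it as one, and each maximal run of such tokens becomes a mention. The tag set, the shared tokenization, and the function name merged_mentions are assumptions made for illustration, not the system's actual implementation.</p>
<p><![CDATA[
```python
from typing import List

# Assumption: proper nouns are tagged "^" in the ARK tagset and
# "NNP"/"NNPS" in a Penn-Treebank-style tagset.
PROPER_NOUN_TAGS = {"^", "NNP", "NNPS"}


def merged_mentions(tokens: List[str], tags_a: List[str], tags_b: List[str]) -> List[str]:
    """Return maximal runs of tokens that either tagger marks as a proper noun.

    Assumes both taggers share one tokenization (hypothetical simplification).
    """
    if not (len(tokens) == len(tags_a) == len(tags_b)):
        raise ValueError("tokens and both tag sequences must have equal length")

    mentions: List[str] = []
    current: List[str] = []
    for token, tag_a, tag_b in zip(tokens, tags_a, tags_b):
        if tag_a in PROPER_NOUN_TAGS or tag_b in PROPER_NOUN_TAGS:
            current.append(token)        # extend the current run
        elif current:
            mentions.append(" ".join(current))  # close the run
            current = []
    if current:                          # run that reaches the end of the tweet
        mentions.append(" ".join(current))
    return mentions
```
]]></p>
<p>Under this assumption, a tagger that labels only 'Game' and 'Thrones' as proper nouns and another that also labels 'of' would together yield the single mention 'Game of Thrones'.</p></div>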
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Entity Disambiguation</head><p>Entity disambiguation is the task of assigning the correct referent entity from the knowledge base to the given mention. We disambiguate each entity mention using the three measures described below. The scores from these three measures are combined using a LambdaMART <ref type="bibr" target="#b6">[7]</ref> model to arrive at the final disambiguated entity.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.1">Wikipedia's Context based Measure (M1)</head><p>This measure disambiguates a mention by calculating the frequency of occurrence of the mention in the Wikipedia corpus. Wikipedia's context based measure has been used in various approaches for disambiguating mentions in tweets <ref type="bibr" target="#b3">[4]</ref>. We query the MediaWiki API<ref type="foot" target="#foot_0">1</ref> with the entity mention. The API returns the candidate entities in ranked order. Each candidate entity is assigned its reciprocal rank as its score. Thus, M1 produces a ranked list of candidate entities with their scores.</p></div>
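<div xmlns="http://www.tei-c.org/ns/1.0"><p>The reciprocal-rank scoring used by M1 can be sketched in a few lines of Python. The sketch takes an already ranked list of candidate titles as input; the API call itself is not shown, and the function name reciprocal_rank_scores and the example titles are hypothetical.</p>
<p><![CDATA[
```python
from typing import Dict, List


def reciprocal_rank_scores(ranked_candidates: List[str]) -> Dict[str, float]:
    """Map each candidate to 1/rank, where the top result has rank 1.

    If a candidate appears more than once, its best (first) rank is kept.
    """
    scores: Dict[str, float] = {}
    for rank, candidate in enumerate(ranked_candidates, start=1):
        scores.setdefault(candidate, 1.0 / rank)
    return scores


# Hypothetical ranked search results for the mention "Paris".
example = reciprocal_rank_scores(["Paris", "Paris, Texas", "Paris Hilton"])
```
]]></p>
<p>Here the first candidate receives a score of 1.0, the second 0.5, and the third one third, so the scores decay quickly with rank.</p></div>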
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.2">Anchor Text based Measure (M2)</head><p>The Google Cross-Wiki Dictionary (GCD) <ref type="bibr" target="#b5">[6]</ref> is a string-to-concept mapping created using anchor text from various web pages. A concept is an individual Wikipedia article, identified by its URL. The text strings are the anchor hypertexts that refer to these concepts. Thus, anchor text strings represent a concept. We query the GCD with a mention along with the tweet text. Based on the similarity to the query string, a ranked list of probable candidate entities is created (this is the ranked list for M2). The ranking criterion is the Jaccard similarity between the anchor text and the query. So if the mention is highly similar to the anchor text, the corresponding concept receives a high score.</p></div>
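<div xmlns="http://www.tei-c.org/ns/1.0"><p>A minimal Python sketch of Jaccard-based ranking follows. It assumes that similarity is computed over lowercased, whitespace-separated token sets and that a concept takes the best score among its anchor strings; neither detail is stated in the text. The dictionary contents and the function names jaccard and rank_concepts are hypothetical, and the sketch does not read the actual GCD data.</p>
<p><![CDATA[
```python
from typing import Dict, List, Tuple


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity |A n B| / |A u B| over lowercased token sets."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a.union(set_b)
    if not union:
        return 0.0
    return len(set_a.intersection(set_b)) / len(union)


def rank_concepts(query: str, anchors: Dict[str, List[str]]) -> List[Tuple[str, float]]:
    """Rank concepts by the best Jaccard score of any anchor pointing to them.

    `anchors` maps an anchor string to the concepts it refers to
    (a toy stand-in for a string-to-concept dictionary).
    """
    best: Dict[str, float] = {}
    for anchor, concepts in anchors.items():
        score = jaccard(query, anchor)
        for concept in concepts:
            best[concept] = max(best.get(concept, 0.0), score)
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical dictionary entries.
toy_anchors = {
    "big apple": ["New_York_City"],
    "the big apple": ["New_York_City"],
    "apple": ["Apple_Inc.", "Apple"],
}
ranked = rank_concepts("big apple", toy_anchors)
```
]]></p>
<p>In this toy example the exact anchor match puts New_York_City first, while the concepts reachable only through the anchor 'apple' share half of the query tokens and rank lower.</p></div>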
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.3">Twitter Popularity based Measure (M3)</head><p>Tweets about entities follow a bursty pattern: a burst of tweets appears after an event relating to an entity happens. We exploit this fact by measuring how often the given mention has recently referred to a particular entity on Twitter. The mention is queried on the Twitter API<ref type="foot" target="#foot_1">2</ref> and the resulting tweets are analyzed. Each of these tweets, along with the mention, is then queried on the GCD and the candidate entities are collected. The candidate entities are ranked by the scores returned by the GCD (this is the ranked list for M3). Because the Twitter popularity based measure captures people's interests at a particular time, it works well for entity disambiguation on recent tweets. In essence, M2 and M3 are similar but take different inputs. Both use the GCD and produce candidate entities and scores as output. However, M2 takes the mention and a single tweet text as input, whereas M3 takes the mention and multiple tweets as input.</p><p>We now have three rankings, from M1, M2 and M3. The remaining task is to arrive at the final ranking of the candidate entities by combining them so that the overall F1 score is maximized. For this, we use LambdaMART, which combines the LambdaRank and MART models. LambdaMART builds boosted regression trees for combining the rankings of the three measures.</p></div>
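<div xmlns="http://www.tei-c.org/ns/1.0"><p>Before a learning-to-rank model such as LambdaMART can combine the three measures, the per-measure scores have to be arranged as one feature vector per candidate entity. The Python sketch below shows one plausible way to do this. Assigning 0.0 to a candidate that a measure did not return is an assumption, the function name feature_vectors and the example scores are hypothetical, and training the ranking model itself is not shown.</p>
<p><![CDATA[
```python
from typing import Dict, List


def feature_vectors(
    m1: Dict[str, float], m2: Dict[str, float], m3: Dict[str, float]
) -> Dict[str, List[float]]:
    """Build [M1, M2, M3] score vectors for every candidate seen by any measure.

    Assumption: a candidate missing from a measure's list gets score 0.0.
    """
    candidates = sorted(set(m1).union(m2).union(m3))
    return {c: [m1.get(c, 0.0), m2.get(c, 0.0), m3.get(c, 0.0)] for c in candidates}


# Hypothetical scores for one mention.
vectors = feature_vectors(
    {"Paris": 1.0, "Paris_Hilton": 0.5},
    {"Paris": 0.4},
    {"Paris_Hilton": 0.7, "Paris,_Texas": 0.1},
)
```
]]></p>
<p>Each vector, together with a relevance label for the correct entity, would then form one training row for the ranking model, grouped by mention.</p></div>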
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">RESULTS AND EVALUATION</head><p>The dataset comprises 2.3K tweets, each annotated with its entity mentions and their corresponding DBpedia URLs. We divided the dataset in a 7:3 (train:test) ratio. Table <ref type="table" target="#tab_1">1</ref> shows the results obtained using the NEEL Challenge evaluation framework. The best results are obtained when a combination of all three measures is used for disambiguation. A 5-fold cross validation on the dataset gave an average F1 of 0.52 for M1+M2+M3.</p></div>
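<div xmlns="http://www.tei-c.org/ns/1.0"><p>For reference, the F1 measure is the harmonic mean of precision and recall. The Python sketch below computes it over sets of (tweet id, mention, entity URI) triples using exact matching. This is a simplified illustration: the exact matching rules of the NEEL Challenge evaluation framework are not reproduced here, and the function name and example annotations are hypothetical.</p>
<p><![CDATA[
```python
from typing import Iterable, Tuple

Annotation = Tuple[int, str, str]  # (tweet id, mention, entity URI)


def precision_recall_f1(
    predicted: Iterable[Annotation], gold: Iterable[Annotation]
) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 between two annotation sets."""
    predicted_set, gold_set = set(predicted), set(gold)
    true_positives = len(predicted_set.intersection(gold_set))
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations: one of two predictions matches one of three gold links.
gold_links = [
    (1, "Obama", "dbpedia:Barack_Obama"),
    (1, "Hawaii", "dbpedia:Hawaii"),
    (2, "Paris", "dbpedia:Paris"),
]
predicted_links = [
    (1, "Obama", "dbpedia:Barack_Obama"),
    (2, "Paris", "dbpedia:Paris_Hilton"),
]
p, r, f1 = precision_recall_f1(predicted_links, gold_links)
```
]]></p>
<p>In this example precision is 1/2 and recall is 1/3, giving an F1 of 0.4.</p></div>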
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">CONCLUSION</head><p>For effective entity linking, mention detection in tweets is important. We improve the accuracy of detecting mentions by combining two Twitter POS taggers. We resolve multiple mentions, abbreviations and spelling variations of a named entity using the Google Cross-Wiki Dictionary. We also use the popularity of an entity on Twitter to improve the disambiguation. Our system performed well, with an F1 score of 0.512 on the given dataset.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1:</head><label>1</label><figDesc>Results: M1 represents Wikipedia's Context based Measure (§2.2.1), M2 represents Anchor Text based Measure (§2.2.2) and M3 represents Twitter Popularity based Measure (§2.2.3)</figDesc><table><row><cell>Measure</cell><cell>F1-measure</cell></row><row><cell>M1</cell><cell>0.355</cell></row><row><cell>M2</cell><cell>0.100</cell></row><row><cell>M3</cell><cell>0.194</cell></row><row><cell>M1+M2</cell><cell>0.355</cell></row><row><cell>M2+M3</cell><cell>0.244</cell></row><row><cell>M1+M3</cell><cell>0.405</cell></row><row><cell>M1+M2+M3</cell><cell>0.512</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://www.mediawiki.org/wiki/API:Search</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://dev.twitter.com/docs/api/1.1/get/search/tweets</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Making Sense of Microposts (#Microposts2014) Named Entity Extraction &amp; Linking Challenge</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">E</forename><surname>Cano Basave</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Rizzo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Varga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rowe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Stankovic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A.-S</forename><surname>Dadzie</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc., 4th Workshop on Making Sense of Microposts (#Microposts2014)</title>
				<meeting>4th Workshop on Making Sense of Microposts (#Microposts2014)</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="54" to="60" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Part-of-speech Tagging for Twitter: Annotation, Features, and Experiments</title>
		<author>
			<persName><forename type="first">K</forename><surname>Gimpel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Schneider</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>O'Connor</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Das</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Mills</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Eisenstein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Heilman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Yogatama</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Flanigan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">A</forename><surname>Smith</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Short Papers - Volume 2 (NAACL-HLT)</title>
				<meeting>of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Short Papers - Volume 2 (NAACL-HLT)</meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="42" to="47" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">To Link or Not to Link? A Study on End-to-End Tweet Entity Linking</title>
		<author>
			<persName><forename type="first">S</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Kıcıman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT)</title>
				<meeting>of the Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT)</meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="1020" to="1030" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Entity Linking for Tweets</title>
		<author>
			<persName><forename type="first">X</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Lu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 51st Annual Meeting of the Association for Computational Linguistics (ACL)</title>
				<meeting>of the 51st Annual Meeting of the Association for Computational Linguistics (ACL)</meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="1304" to="1311" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Named Entity Recognition in Tweets: An Experimental Study</title>
		<author>
			<persName><forename type="first">A</forename><surname>Ritter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Clark</surname></persName>
		</author>
		<author>
			<persName><surname>Mausam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Etzioni</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP)</title>
				<meeting>of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP)</meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">A Cross-Lingual Dictionary for English Wikipedia Concepts</title>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">I</forename><surname>Spitkovsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">X</forename><surname>Chang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 8th Intl. Conf. on Language Resources and Evaluation (LREC)</title>
				<meeting>of the 8th Intl. Conf. on Language Resources and Evaluation (LREC)</meeting>
		<imprint>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Adapting Boosting for Information Retrieval Measures</title>
		<author>
			<persName><forename type="first">Q</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">J</forename><surname>Burges</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">M</forename><surname>Svore</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Information Retrieval</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="254" to="270" />
			<date type="published" when="2010-06">Jun 2010</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
