<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Microblog Retrieval for Disaster Relief: How To Create Ground Truths?</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Ribhav</forename><surname>Soni</surname></persName>
							<email>ribhav.soni.cse13@iitbhu.ac.in</email>
							<affiliation key="aff0">
								<orgName type="department">IIT(BHU)</orgName>
								<address>
									<settlement>Varanasi</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Sukomal</forename><surname>Pal</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">IIT(BHU)</orgName>
								<address>
									<settlement>Varanasi</settlement>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Microblog Retrieval for Disaster Relief: How To Create Ground Truths?</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">99FAA2E3B79806E6019AE4437C3734A6</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T08:26+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Crisis Informatics</term>
					<term>Disaster</term>
					<term>Emergency</term>
					<term>Hazards</term>
					<term>Microblog Retrieval</term>
					<term>Social Media</term>
					<term>Text Categorization</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Microblogging services like Twitter are an important source of real-time information during disasters and can be utilized to aid rescue, relief and rehabilitation efforts. The focus of this work is on the creation of gold standard data for automatic retrieval of helpful tweets. Using various experiments on the gold standard data prepared in the FIRE 2016 Microblog Track [3], we show that the gold standard data prepared in [3] missed many relevant tweets. We also demonstrate that using a machine learning model can help in retrieving the remaining relevant tweets by training an SVM model on a subset of the data and using it to get the most useful tweets in the entire dataset. We obtain high precision and recall even with very little training data, which makes such a model suitable for use in a real-time disaster situation.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Social media is a very useful resource for obtaining real-time information during disasters. Traditional media such as television and newspapers are of limited use for disaster relief because they update slowly, and they may even be unavailable because of the disaster itself. In such situations, social media provides valuable information to aid disaster relief and rehabilitation with very little time overhead <ref type="bibr">[1]</ref>.</p><p>Twitter in particular is well suited for extracting details and first-hand accounts within moments of an event, anywhere in the world <ref type="bibr" target="#b4">[6]</ref>, and can thus be exploited to help in relief work. However, it also poses the challenge of filtering out information about the crisis that is not useful for relief efforts, such as tweets expressing shock, condolences, or opinion. Some tweets that are not useful for disaster relief efforts are shown in Table <ref type="table">1</ref>.</p><figure type="table"><head>Table 1.</head><label>1</label><figDesc>Some examples of tweets that are not useful for disaster relief efforts</figDesc><table><row><cell>Tweet Text</cell></row><row><cell>RT @tarsem insan:,@Gurmeetramrahim Guru ji #MSGHelpEarthquakeVictims I m also Shocked!!!,hearing #earthquake #MSGHelpEarthquakeVictims</cell></row><row><cell>RT @vrinda 90:,really sad to hear about d earthquake. praying for all the ppl who suffered,&amp; lost their loved ones. hope they get all the h</cell></row><row><cell>The Government is,so quick to help earthquake victims but why are they so reluctant to our own,farmers needs?</cell></row><row><cell>Haven't studied anything coz of earthquake and have to go for exam.</cell></row><row><cell>RT @guthali2:,Imagine Kejriwal were the PM in Nepal Earthquake situation, " Hum kuch,nai kar sakte hai jee, army president ke neeche hai".</cell></row></table></figure><p>The FIRE 2016 Microblog Track <ref type="bibr" target="#b1">[3]</ref> focused on comparing different IR methodologies for retrieval in such scenarios, and led to the creation of a benchmark collection of ground truth data for such tasks. However, based on our experiments, we argue that the ground truth annotation exercise missed up to four times as many tweets as were found. This represents a significant loss of information that could potentially be very useful in a disaster situation. Also, since the accuracy of gold standard data is crucial for evaluation and comparison of retrieval systems, such omissions may lead to weaker systems being ranked above better systems.</p><p>First, we manually labeled a small, random subset of the data and found that many relevant tweets were missing from the gold standard in <ref type="bibr" target="#b1">[3]</ref>. We then trained an SVM model on a subset of the data and used it to retrieve the 100 tweets with the highest confidence scores. We found that, averaged across all topics, fewer than half of the relevant tweets among those were identified in the gold standard in <ref type="bibr" target="#b1">[3]</ref>.</p><p>We also performed bootstrapping on the labeled random subset to estimate the number of relevant tweets in the entire collection, and obtained about five times as many relevant tweets as in the gold standard in <ref type="bibr" target="#b1">[3]</ref>. Finally, we trained our SVM model on small fractions of the training data and obtained high precision and recall even with very little training data, which shows that such a model can be used effectively in disaster situations with very low time overhead.</p><p>The rest of this paper is organized as follows. Section 2 describes the data, Section 3 presents our experiments and results, and Section 4 contains discussion and future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Data</head><p>We used the dataset provided by the organizers of the FIRE 2016 Microblog Track <ref type="bibr" target="#b1">[3]</ref>. The data is a collection of 50,068 tweets posted during the earthquake in Nepal in 2015<ref type="foot" target="#foot_0">1</ref>.</p><p>Organizations involved in relief work during disasters need specific, actionable information to help in the relief efforts. Thus, a set of seven specific information needs was identified by the authors in <ref type="bibr" target="#b1">[3]</ref> after consulting members of such organizations.</p><p>The task in <ref type="bibr" target="#b1">[3]</ref> involved retrieving tweets relevant to each of these seven information needs, expressed as topics in TREC format. The seven topics are listed in Table <ref type="table">2</ref>.</p><figure type="table"><head>Table 2.</head><label>2</label><figDesc>The seven topics (information needs) used in FIRE 2016 Microblog Track <ref type="bibr" target="#b1">[3]</ref></figDesc><table><row><cell>&lt;num&gt; Number: FMT1
&lt;title&gt; What resources were available
&lt;desc&gt; Identify the messages which describe the availability of some resources.
&lt;narr&gt; A relevant message must mention the availability of some resource like food, drinking water, shelter, clothes, blankets, human resources like volunteers, resources to build or support infrastructure, like tents, water filter, power supply and so on. Messages informing the availability of transport vehicles for assisting the resource distribution process would also be relevant. However, generalized statements without reference to any resource or messages asking for donation of money would not be relevant.</cell></row><row><cell>&lt;num&gt; Number: FMT2
&lt;title&gt; What resources were required
&lt;desc&gt; Identify the messages which describe the requirement or need of some resources.
&lt;narr&gt; A relevant message must mention the requirement / need of some resource like food, water, shelter, clothes, blankets, human resources like volunteers, resources to build or support infrastructure like tents, water filter, power supply, and so on. A message informing the requirement of transport vehicles assisting resource distribution process would also be relevant. However, generalized statements without reference to any particular resource, or messages asking for donation of money would not be relevant.</cell></row><row><cell>&lt;num&gt; Number: FMT3
&lt;title&gt; What medical resources were available
&lt;desc&gt; Identify the messages which give some information about availability of medicines and other medical resources.
&lt;narr&gt; A relevant message must mention the availability of some medical resource like medicines, medical equipments, blood, supplementary food items (e.g., milk for infants), human resources like doctors/staff and resources to build or support medical infrastructure like tents, water filter, power supply, ambulance, etc. Generalized statements without reference to medical resources would not be relevant.</cell></row><row><cell>&lt;num&gt; Number: FMT4
&lt;title&gt; What medical resources were required
&lt;desc&gt; Identify the messages which describe the requirement of some medicine or other medical resources.
&lt;narr&gt; A relevant message must mention the requirement of some medical resource like medicines, medical equipments, supplementary food items, blood, human resources like doctors/staff and resources to build or support medical infrastructure like tents, water filter, power supply, ambulance, etc. Generalized statements without reference to medical resources would not be relevant.</cell></row><row><cell>&lt;num&gt; Number: FMT5
&lt;title&gt; What were the requirements / availability of resources at specific locations
&lt;desc&gt; Identify the messages which describe the requirement or availability of resources at some particular geographical location.
&lt;narr&gt; A relevant message must mention both the requirement or availability of some resource, (e.g., human resources like volunteers/medical staff, food, water, shelter, medical resources, tents, power supply) as well as a particular geographical location. Messages containing only the requirement / availability of some resource, without mentioning a geographical location would not be relevant.</cell></row><row><cell>&lt;num&gt; Number: FMT6
&lt;title&gt; What were the activities of various NGOs / Government organizations
&lt;desc&gt; Identify the messages which describe on-ground activities of different NGOs and Government organizations.
&lt;narr&gt; A relevant message must contain information about relief-related activities of different NGOs and Government organizations in rescue and relief operation. Messages that contain information about the volunteers visiting different geographical locations would also be relevant. However, messages that do not contain the name of any NGO / Government organization would not be relevant.</cell></row><row><cell>&lt;num&gt; Number: FMT7
&lt;title&gt; What infrastructure damage and restoration were being reported
&lt;desc&gt; Identify the messages which contain information related to infrastructure damage or restoration.
&lt;narr&gt; A relevant message must mention the damage or restoration of some specific infrastructure resources, such as structures (e.g., dams, houses, mobile tower), communication infrastructure (e.g., roads, runways, railway), electricity, mobile or Internet connectivity, etc. Generalized statements without reference to infrastructure resources would not be relevant.</cell></row></table></figure><p>The gold standard preparation in <ref type="bibr" target="#b1">[3]</ref> involved three phases, which can be briefly summarized as follows.</p><p>1. Three annotators independently searched for relevant tweets using intuitive keywords, after all tweets were indexed using Indri.</p><p>2. All tweets identified by at least one of the three annotators in Phase 1 were considered, and their relevance annotation was finalized by mutual discussion among the annotators.</p><p>3. Standard pooling was employed, taking the top 30 results from each run and deciding on their relevance.</p><p>The initial collection by the authors of <ref type="bibr" target="#b1">[3]</ref> consisted of about 100,000 tweets, and the final dataset of 50,068 tweets was obtained by removing duplicate tweets (tweets with similarity greater than a threshold). The collection still included many tweets that were not duplicates but expressed almost the same information. All such instances were classified as relevant in the annotation exercise.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Experiments and Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Exhaustive labeling on a small, random subset</head><p>A set of 700 tweets was randomly chosen, and relevance was judged for each tweet in the set separately for each of the seven topics. Within the random sample, the number of relevant tweets identified in the gold standard in <ref type="bibr" target="#b1">[3]</ref> and the number identified by exhaustive labeling are given in Table <ref type="table" target="#tab_0">3</ref>. Within the random sample, our exhaustive annotation identified about five times as many relevant tweets as the gold standard in <ref type="bibr" target="#b1">[3]</ref> (105 versus 22 tweets relevant to at least one topic).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Bootstrapping to estimate the number of relevant documents in the entire collection</head><p>After exhaustively labeling the random sample of 700 tweets, we used bootstrapping <ref type="bibr" target="#b0">[2]</ref> to estimate the number of relevant tweets in the whole collection. Bootstrapping is a resampling method based on random sampling with replacement: we generated 1000 resamples, each of 700 tweets, by sampling with replacement from our labeled sample of 700 tweets. The number of relevant tweets in each resample was computed and then averaged across all 1000 resamples. The resulting average, divided by the sample size, was taken as an estimate of the fraction of relevant tweets in the entire collection. We thus estimated the number of relevant tweets in the collection of 50,068 tweets to be about 7,520 (i.e., 15.02% of the tweets).</p><p>In contrast, only 1,565 relevant tweets (3.13% of the tweets) were identified in the gold standard in <ref type="bibr" target="#b1">[3]</ref>. This means that about 6,000 useful tweets were missed by the annotators in <ref type="bibr" target="#b1">[3]</ref>.</p></div>
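<div xmlns="http://www.tei-c.org/ns/1.0"><p>The bootstrap estimate described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: it uses only the count from Table 3 (105 of the 700 sampled tweets relevant to at least one topic), and the function name, the 0/1 label encoding, and the random seed are hypothetical choices.</p><p>
```python
import random


def bootstrap_relevant_fraction(labels, n_resamples=1000, seed=0):
    """Mean fraction of relevant items over bootstrap resamples of `labels`."""
    rng = random.Random(seed)
    n = len(labels)
    fractions = []
    for _ in range(n_resamples):
        # Draw n labels with replacement from the labeled sample.
        resample = rng.choices(labels, k=n)
        fractions.append(sum(resample) / n)
    return sum(fractions) / n_resamples


# 105 of the 700 sampled tweets were relevant to at least one topic (Table 3).
sample_labels = [1] * 105 + [0] * 595
fraction = bootstrap_relevant_fraction(sample_labels)

# Scale the estimated fraction to the full collection of 50,068 tweets.
estimated_relevant = round(fraction * 50068)
print(f"{fraction:.4f}", estimated_relevant)
```
</p><p>Because the bootstrap mean of a proportion concentrates around the sample proportion (105/700 = 15%), the result should land close to 15% of 50,068 tweets, which is in line with the reported estimate of about 7,520; the exact value depends on the random draws.</p></div>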
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Machine Learning for automatic filtering of tweets</head><p>We trained machine learning models for automatic classification of tweets into topics, with the aim of automatically retrieving the most useful tweets that may have been missed in the annotation exercise in <ref type="bibr" target="#b1">[3]</ref>. As one tweet can be relevant to multiple topics, we applied supervised machine learning models separately for each topic, thus training a total of seven binary classifiers.</p><p>We used Support Vector Machines (SVM) for our classification task, as they have been found to be among the best models for text classification <ref type="bibr" target="#b2">[4]</ref>, <ref type="bibr" target="#b3">[5]</ref>. We used the LinearSVC implementation (SVM with a linear kernel) in the scikit-learn machine-learning library <ref type="bibr" target="#b5">[7]</ref>.</p><p>Training data. As seen in Table <ref type="table" target="#tab_0">3</ref>, we could identify at most 53 relevant tweets for any one topic in a sample of 700 tweets. Thus, the classification task is highly skewed, with non-relevant tweets forming a large majority.</p><p>To overcome the problems associated with such skewed classification, we used undersampling, i.e., we balanced the training data by taking only as many non-relevant tweets as we had relevant tweets.</p><p>Besides the tweets that we labeled as relevant in our sample of 700 tweets, we also had the set of relevant gold standard tweets from <ref type="bibr" target="#b1">[3]</ref> to use for our machine learning task. Table <ref type="table" target="#tab_1">4</ref> lists the final number of labeled tweets that we used for each of the topics. (Our number of gold standard tweets is slightly lower than in the original gold standard because we could not download about 500 tweets of the original collection from Twitter, as those tweets had been deleted in the meantime. Also, the numbers of relevant tweets from the two sources, our manual labeling of the sample of 700 tweets and the gold standard in <ref type="bibr" target="#b1">[3]</ref>, do not add up to the total, because some tweets are common to both.)</p><p>We applied minimal preprocessing to the tweets. The only operation that we applied was the removal of hashtag symbols (retaining the attached text).</p><p>We randomly divided the available labeled data into 70% for training and 30% for testing, for each topic.</p><p>Feature extraction. Scikit-learn's CountVectorizer was used to extract token counts with a bag-of-words model. We experimented with (1) unigram features only, and (2) both unigram and bigram features, and got better results using unigram features only. We thus used only unigram features for all our remaining experiments. No stemming or stopword removal was done, and tweets were tokenized by extracting words of at least 2 letters.</p><p>Then, TfidfTransformer was used to convert the raw counts to tf-idf weights. Thus, a bag-of-words model with unigram features and tf-idf weights was used.</p><p>Each experiment was carried out for 100 iterations, with a new random partition of the data into training (70%) and test (30%) sets in each iteration, and each performance metric was averaged over the 100 iterations.</p></div>
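<div xmlns="http://www.tei-c.org/ns/1.0"><p>The data preparation steps described above (hashtag-symbol removal, tokenization into words of at least two characters, undersampling, and a random 70/30 split) can be illustrated with a standard-library sketch. It is not the authors' implementation, which relied on scikit-learn: the helper names and the toy tweets are hypothetical, the regular expression mirrors what we understand to be CountVectorizer's default token pattern, and lowercasing is an assumption the paper does not state.</p><p>
```python
import random
import re

# Tokens of two or more word characters (assumed tokenization rule).
TOKEN_RE = re.compile(r"\b\w\w+\b")


def preprocess(tweet):
    """Drop hashtag symbols but keep the attached text."""
    return tweet.replace("#", "")


def tokenize(tweet):
    """Lowercased unigram tokens of at least two characters."""
    return TOKEN_RE.findall(preprocess(tweet).lower())


def undersample(relevant, non_relevant, rng):
    """Balance classes by keeping as many non-relevant items as relevant ones."""
    kept = rng.sample(non_relevant, len(relevant))
    return [(t, 1) for t in relevant] + [(t, -1) for t in kept]


def split_70_30(examples, rng):
    """Random partition into 70% training and 30% test examples."""
    shuffled = examples[:]
    rng.shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


rng = random.Random(0)
# Toy tweets, invented for illustration only.
relevant = [f"#water available at camp {i}" for i in range(10)]
non_relevant = [f"praying for #Nepal {i}" for i in range(50)]

balanced = undersample(relevant, non_relevant, rng)
train_set, test_set = split_70_30(balanced, rng)

print(tokenize("#NepalEarthquake: 4 tents at Gorkha"))
print(len(balanced), len(train_set), len(test_set))
```
</p><p>In the setup the paper describes, the split would be redrawn in each of the 100 iterations, and the token counts would then be converted to tf-idf weights before training a linear SVM for each topic.</p></div>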
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Results</head><p>The performance of the classifiers on various metrics is shown in Table <ref type="table" target="#tab_2">5</ref>. The precision-recall curve of the classifier for topic FMT1 is also shown.</p><figure><figDesc>Precision-recall curve for the SVM classifier for topic FMT1</figDesc></figure></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Classification performance with number of examples</head><p>We tested the performance of our classifiers when using only a fraction of the available data. For each classifier and each fraction, we drew a random subset of the usable data of that size in each of 100 iterations, and averaged the classifier's performance scores over the 100 iterations. The F1 scores of the classifiers with varying fractions of the data are shown in Table <ref type="table" target="#tab_3">6</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5">Retrieving most relevant tweets in the entire collection</head><p>We used the trained classifiers to retrieve the 100 most relevant tweets for each topic in the entire dataset, by taking the 100 tweets with the highest confidence scores from each classifier. We manually checked the sets of 100 tweets for the seven topics to determine how many of them were actually relevant, and how many of the relevant ones were identified in the gold standard in <ref type="bibr" target="#b1">[3]</ref>. The results of this exercise are shown in Table <ref type="table" target="#tab_4">7</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Discussion and Future Work</head><p>We showed that the gold standard annotation exercise in <ref type="bibr" target="#b1">[3]</ref> missed many relevant tweets, even with a three-phase approach. Some major reasons why this happened may be:</p><p>1. Tweets are very short and noisy, and relevant tweets often do not contain the terms or keywords that one might intuitively expect for a given topic. Thus, the annotators could not find all relevant tweets using keyword searches in Phase 1.</p><p>2. Pooling works only when the number of participating systems is large and the systems are diverse. Unlike in TREC tracks, the number of participants in <ref type="bibr" target="#b1">[3]</ref> was not large, and so the standard pooling employed in Phase 3 also failed to find all relevant tweets. (Reference <ref type="bibr" target="#b7">[9]</ref> studies the reliability of pooling and concludes that it is reliable if the pool is deep enough, i.e., if many of the top results from all systems are taken into account. This holds for TREC, with a depth of the top 100 documents from each participating system, but taking only the top 30 documents, as was done in <ref type="bibr" target="#b1">[3]</ref>, may not have been enough.)</p><p>Since exhaustive annotation is not possible for the complete collection, a machine learning model as presented in this paper can be trained and applied to the remaining data to retrieve the tweets with the highest confidence scores, and manual confirmation of relevance can then be carried out for as many tweets as annotator time permits.</p><p>Another approach could be to exhaustively annotate a small random subset of the data, and then use keywords from the tweets marked relevant to query the entire collection and retrieve relevant tweets from the remaining collection. This is one future possibility for us to experiment with. Some of the relevant tweets that were missed in the creation of the gold standard in <ref type="bibr" target="#b1">[3]</ref> are listed in Table <ref type="table" target="#tab_5">8</ref>.</p></div>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table"><head>Table 8 (continued).</head><table><row><cell>Topic</cell><cell>Tweet Text</cell></row><row><cell>FMT4</cell><cell>RT @FocusNewsIndia: #NepalEarthquake -#Nepal PM Sushil Koirala requests for urgent blood donation for victims rescued from #earthquake htt</cell></row><row><cell></cell><cell>#Nepal #Earthquake: Death toll could reach 10,000, says PM Sushil Koirala -Appeals for foreign supplies of tents and medicines.</cell></row><row><cell>FMT5</cell><cell>Tomorrow, We are moving to Hansapur VDC of Gorkha District to provide relief materials to the earthquake... http://t.co/GYZiT3eyip</cell></row><row><cell></cell><cell>At Shanupalati village, Barabise district. Please retweet. Free Clinic Nepal earthquake Relief.</cell></row><row><cell>FMT6</cell><cell>RT @HDLindiaOrg: #RSS sends 20k swamsewaks to Nepal. GOI sent 4 Tonnes relief material, Team of doctors, NDRF, JCBs, food, water, medicines</cell></row><row><cell></cell><cell>#ArtofLiving Nepal Centre providing shelter to 100's of ppl. Volunteers providing food &amp; water #NepalEarthquakeRelief http://t.co/15RmABe2vO</cell></row><row><cell>FMT7</cell><cell>RT @PDChina: The rubble of Hanumndhoka Durbar Square, a @UNESCO world #heritage site, was badly damaged by earthquake in Kathmandu http://t</cell></row><row><cell></cell><cell>Historic Dharahara tower collapses in Kathmandu after earthquake http://t.co/ZeovAnQESi</cell></row></table></figure>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>We were able to achieve reasonably high F1 scores for our classifiers even with a training size of a few hundred examples (Table <ref type="table" target="#tab_3">6</ref>). This shows that automatic text classification is a viable approach to extract useful information from tweets during disasters, since a few hundred examples can easily be annotated in a short amount of time. It may also be fruitful to train supervised machine learning models in advance for different types of disaster situations, and use them in times of disaster until newly annotated data is obtained.</p><p>To improve on the machine learning model, some avenues to explore are:</p><p>- using more features, including word embeddings, spatio-temporal features, linguistic features (as used in <ref type="bibr" target="#b6">[8]</ref>), etc.;</p><p>- employing better preprocessing techniques, such as Twitter-specific spelling correction, expansion of common Twitter abbreviations, better data cleaning, etc.</p></div>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 3 .</head><label>3</label><figDesc>Number of tweets in the sample of 700 tweets identified in the gold standard in <ref type="bibr" target="#b1">[3]</ref> and in manual labeling by us</figDesc><table><row><cell>Topic</cell><cell>Gold Standard</cell><cell>Manual Labeling</cell></row><row><cell>FMT1</cell><cell>7</cell><cell>43</cell></row><row><cell>FMT2</cell><cell>4</cell><cell>12</cell></row><row><cell>FMT3</cell><cell>5</cell><cell>10</cell></row><row><cell>FMT4</cell><cell>1</cell><cell>4</cell></row><row><cell>FMT5</cell><cell>4</cell><cell>9</cell></row><row><cell>FMT6</cell><cell>5</cell><cell>53</cell></row><row><cell>FMT7</cell><cell>3</cell><cell>28</cell></row><row><cell>Any of the topics</cell><cell>22</cell><cell>105</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 4 .</head><label>4</label><figDesc>Number of relevant tweets for each topic (from a combination of manual labeling and the gold standard), with the same number of non-relevant tweets added to make the data balanced</figDesc><table><row><cell>Topic</cell><cell>Manual Labeling</cell><cell>Gold Standard</cell><cell>Total relevant</cell><cell>Non-relevant tweets added</cell><cell>Total labeled examples used</cell></row><row><cell>FMT1</cell><cell>43</cell><cell>579</cell><cell>615</cell><cell>615</cell><cell>1230</cell></row><row><cell>FMT2</cell><cell>12</cell><cell>290</cell><cell>298</cell><cell>298</cell><cell>596</cell></row><row><cell>FMT3</cell><cell>10</cell><cell>334</cell><cell>336</cell><cell>336</cell><cell>672</cell></row><row><cell>FMT4</cell><cell>4</cell><cell>110</cell><cell>113</cell><cell>113</cell><cell>226</cell></row><row><cell>FMT5</cell><cell>9</cell><cell>187</cell><cell>192</cell><cell>192</cell><cell>384</cell></row><row><cell>FMT6</cell><cell>53</cell><cell>373</cell><cell>421</cell><cell>421</cell><cell>842</cell></row><row><cell>FMT7</cell><cell>28</cell><cell>253</cell><cell>278</cell><cell>278</cell><cell>556</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 5 .</head><label>5</label><figDesc>Performance of the seven binary classifiers based on various metrics (all in percentage)</figDesc><table><row><cell>Classifier for</cell><cell>Accuracy</cell><cell>Accuracy for +1</cell><cell>Accuracy for -1</cell><cell>Precision</cell><cell>Recall</cell><cell>F1 score</cell></row><row><cell>FMT1</cell><cell>92.72</cell><cell>92.83</cell><cell>92.64</cell><cell>92.56</cell><cell>92.83</cell><cell>92.67</cell></row><row><cell>FMT2</cell><cell>93.15</cell><cell>92.81</cell><cell>93.55</cell><cell>93.45</cell><cell>92.81</cell><cell>93.09</cell></row><row><cell>FMT3</cell><cell>95.23</cell><cell>93.99</cell><cell>96.48</cell><cell>96.35</cell><cell>93.99</cell><cell>95.14</cell></row><row><cell>FMT4</cell><cell>91.94</cell><cell>90.68</cell><cell>93.32</cell><cell>93.06</cell><cell>90.68</cell><cell>91.74</cell></row><row><cell>FMT5</cell><cell>90.91</cell><cell>88.47</cell><cell>93.46</cell><cell>92.95</cell><cell>88.47</cell><cell>90.57</cell></row><row><cell>FMT6</cell><cell>90.01</cell><cell>89.06</cell><cell>91.05</cell><cell>90.88</cell><cell>89.06</cell><cell>89.91</cell></row><row><cell>FMT7</cell><cell>91.22</cell><cell>90.49</cell><cell>92.04</cell><cell>91.89</cell><cell>90.49</cell><cell>91.13</cell></row></table></figure>
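<div xmlns="http://www.tei-c.org/ns/1.0"><p>In Table 5, the "Accuracy for +1" column coincides with Recall for every topic, which is what one would expect if per-class accuracy means the share of examples of that class that were labeled correctly. The sketch below computes the Table 5 metrics under that reading. The reading itself is an assumption (the paper does not define the columns), and the function name and the confusion-matrix counts are hypothetical.</p><p>
```python
def binary_metrics(tp, fp, tn, fn):
    """Metrics in percent for a binary classifier with labels +1 and -1."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": 100 * (tp + tn) / (tp + fp + tn + fn),
        # Share of +1 examples predicted as +1 (identical to recall).
        "accuracy_pos": 100 * recall,
        # Share of -1 examples predicted as -1.
        "accuracy_neg": 100 * tn / (tn + fp),
        "precision": 100 * precision,
        "recall": 100 * recall,
        # Harmonic mean of precision and recall.
        "f1": 100 * 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts for one balanced test split (not taken from the paper).
metrics = binary_metrics(tp=93, fp=7, tn=93, fn=7)
print(metrics)
```
</p><p>Since the reported figures are averages over 100 random splits, an F1 score recomputed from the averaged precision and recall in Table 5 need not match the reported F1 score exactly.</p></div>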
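<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 6 .</head><label>6</label><figDesc>F1 scores of the classifiers with varying percentage of available labeled examples used (all in percentage)</figDesc><table><row><cell>Percentage of labeled examples used</cell><cell>FMT1</cell><cell>FMT2</cell><cell>FMT3</cell><cell>FMT4</cell><cell>FMT5</cell><cell>FMT6</cell><cell>FMT7</cell></row><row><cell>10</cell><cell>81.58</cell><cell>76.18</cell><cell>79.7</cell><cell>69.96</cell><cell>64.24</cell><cell>73.35</cell><cell>66.12</cell></row><row><cell>20</cell><cell>85.89</cell><cell>84.34</cell><cell>87.9</cell><cell>70.33</cell><cell>74.47</cell><cell>82.37</cell><cell>77.54</cell></row><row><cell>30</cell><cell>88.03</cell><cell>86.78</cell><cell>90.64</cell><cell>78.71</cell><cell>80.72</cell><cell>85.72</cell><cell>82.48</cell></row><row><cell>40</cell><cell>89.15</cell><cell>88.67</cell><cell>92.08</cell><cell>82.79</cell><cell>83.28</cell><cell>86.59</cell><cell>84.32</cell></row><row><cell>50</cell><cell>90.19</cell><cell>90.51</cell><cell>93.07</cell><cell>85.89</cell><cell>86.26</cell><cell>87.92</cell><cell>86.83</cell></row><row><cell>60</cell><cell>90.64</cell><cell>90.66</cell><cell>93.56</cell><cell>87.99</cell><cell>87.32</cell><cell>88.08</cell><cell>87.72</cell></row><row><cell>70</cell><cell>91.52</cell><cell>91.42</cell><cell>94.26</cell><cell>88.84</cell><cell>88.64</cell><cell>89.13</cell><cell>88.32</cell></row><row><cell>80</cell><cell>91.74</cell><cell>92.22</cell><cell>94.43</cell><cell>90.49</cell><cell>89.52</cell><cell>89.71</cell><cell>89.67</cell></row><row><cell>90</cell><cell>92.36</cell><cell>92.68</cell><cell>94.77</cell><cell>91.32</cell><cell>89.96</cell><cell>89.54</cell><cell>90.65</cell></row><row><cell>100</cell><cell>92.67</cell><cell>93.09</cell><cell>95.14</cell><cell>91.74</cell><cell>90.57</cell><cell>89.91</cell><cell>91.13</cell></row></table></figure>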
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 7 .</head><label>7</label><figDesc>Number of tweets out of 100 that were actually relevant, and among them the number of tweets that were identified in the gold standard in <ref type="bibr" target="#b1">[3]</ref></figDesc><table><row><cell>Topic</cell><cell>Actually relevant</cell><cell>Marked in Gold Standard</cell><cell>Percentage of relevant tweets marked in Gold Standard</cell></row><row><cell>FMT1</cell><cell>80</cell><cell>43</cell><cell>53.75</cell></row><row><cell>FMT2</cell><cell>73</cell><cell>48</cell><cell>65.75</cell></row><row><cell>FMT3</cell><cell>92</cell><cell>57</cell><cell>61.96</cell></row><row><cell>FMT4</cell><cell>62</cell><cell>33</cell><cell>53.23</cell></row><row><cell>FMT5</cell><cell>65</cell><cell>22</cell><cell>33.85</cell></row><row><cell>FMT6</cell><cell>84</cell><cell>23</cell><cell>27.38</cell></row><row><cell>FMT7</cell><cell>94</cell><cell>32</cell><cell>34.04</cell></row><row><cell>Average</cell><cell>78.57%</cell><cell></cell><cell>47.14%</cell></row></table></figure>
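<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two figures in the bottom row of Table 7 can be reproduced from the per-topic rows. The short check below uses only the numbers in the table; the variable names are illustrative. It treats both figures as unweighted means across the seven topics, which is an inference from the arithmetic rather than something the paper states.</p><p>
```python
# (actually relevant, marked in gold standard) out of the top 100, from Table 7.
table7 = {
    "FMT1": (80, 43), "FMT2": (73, 48), "FMT3": (92, 57), "FMT4": (62, 33),
    "FMT5": (65, 22), "FMT6": (84, 23), "FMT7": (94, 32),
}

# Per-topic share of the relevant retrieved tweets that the gold standard marked.
per_topic_pct = {t: 100 * gold / rel for t, (rel, gold) in table7.items()}

# Unweighted means across the seven topics.
mean_relevant = sum(rel for rel, _ in table7.values()) / len(table7)
mean_pct_marked = sum(per_topic_pct.values()) / len(per_topic_pct)

print(round(mean_relevant, 2), round(mean_pct_marked, 2))  # 78.57 47.14
```
</p><p>Since each list contains 100 tweets, the mean count of 78.57 relevant tweets is also the mean precision in percent. Pooling the counts instead (258 marked out of 550 relevant) would give about 46.9%, so the reported 47.14% corresponds to the mean of the per-topic percentages.</p></div>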
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 8 .</head><label>8</label><figDesc>Some tweets that were missed in the gold standard in <ref type="bibr" target="#b1">[3]</ref> but were found by our ML models</figDesc><table><row><cell>Topic</cell><cell>Tweet Text</cell></row><row><cell>FMT1</cell><cell>Earthquake Relief Distribution: Distributed Relief materials to the earthquake victims of Tukcha-1 (Pandy-Rai... http://t.co/0VlGHeFF4p</cell></row><row><cell></cell><cell>Delhi Govt has decided to send 25,000 packets of food and 25,000 pouches of drinking water as immediate relief for the people in Nepal</cell></row><row><cell>FMT2</cell><cell>RT @worldtoiletday: Nepal earthquake: Urgent need for water, #sanitation and food: http://t.co/uOb6Hq81pY #NepalEarthquake @UNICEF @UN Water</cell></row><row><cell></cell><cell>UN agency stresses urgent funding needs to get food to earthquake victims http://t.co/xkn26ab08h</cell></row><row><cell>FMT3</cell><cell>RT Bloodbanks #Nepal Hospital and Research Centre 4476225 Norvic Hospital 4258554 #NepalEarthquake #MNTL #India</cell></row><row><cell></cell><cell>WOREC and NAWHRD team are mobilized to Kavre and Bhaktapur districts to provide relief to the earthquake victims and survivors.</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://en.wikipedia.org/wiki/April_2015_Nepal_earthquake</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Acknowledgements</head><p>We thank the anonymous reviewers for their thorough comments.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">An introduction to the bootstrap</title>
		<author>
			<persName><forename type="first">B</forename><surname>Efron</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">J</forename><surname>Tibshirani</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1994">1994</date>
			<publisher>CRC Press</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Overview of the FIRE 2016 microblog track: Information extraction from microblogs posted during disasters</title>
		<author>
			<persName><forename type="first">S</forename><surname>Ghosh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Ghosh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Working notes of FIRE</title>
		<imprint>
			<biblScope unit="page" from="7" to="10" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Text categorization with support vector machines: Learning with many relevant features</title>
		<author>
			<persName><forename type="first">T</forename><surname>Joachims</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">European Conference on Machine Learning</title>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="1998">1998</date>
			<biblScope unit="page" from="137" to="142" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">A review of machine learning algorithms for text-documents classification</title>
		<author>
			<persName><forename type="first">A</forename><surname>Khan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Baharudin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">H</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Khan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Advances in Information Technology</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="4" to="20" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Web 2.0 emergency applications: How useful can Twitter be for emergency response?</title>
		<author>
			<persName><forename type="first">A</forename><surname>Mills</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">Raghav</forename><surname>Rao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Information Privacy and Security</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="3" to="26" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Scikit-learn: Machine learning in Python</title>
		<author>
			<persName><forename type="first">F</forename><surname>Pedregosa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Varoquaux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gramfort</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Michel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Thirion</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Grisel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Blondel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Prettenhofer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Weiss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Dubourg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Vanderplas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Passos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Cournapeau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Brucher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Perrot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Duchesnay</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page" from="2825" to="2830" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Extracting situational information from microblogs during disaster events: a classification-summarization approach</title>
		<author>
			<persName><forename type="first">K</forename><surname>Rudra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ghosh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Ganguly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ghosh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 24th ACM International on Conference on Information and Knowledge Management</title>
				<meeting>the 24th ACM International on Conference on Information and Knowledge Management</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="583" to="592" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">How reliable are the results of large-scale information retrieval experiments?</title>
		<author>
			<persName><forename type="first">J</forename><surname>Zobel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval</title>
		<meeting>the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="1998">1998</date>
			<biblScope unit="page" from="307" to="314" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
