<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>August</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Attention Realignment and Pseudo-Labelling for Interpretable Cross-Lingual Classification of Crisis Tweets</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jitin Krishnan</string-name>
          <email>jkrishn2@gmu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hemant Purohit</string-name>
          <email>hpurohit@gmu.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Huzefa Rangwala</string-name>
          <email>rangwala@gmu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, George Mason University</institution>
          ,
          <addr-line>Fairfax, VA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Information Sciences &amp; Technology, George Mason University</institution>
          ,
          <addr-line>Fairfax, VA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <volume>24</volume>
      <issue>2020</issue>
      <abstract>
        <p>State-of-the-art models for cross-lingual language understanding such as XLM-R [7] perform strongly on benchmark data sets. However, they typically require fine-tuning or customization to adapt to downstream NLP tasks in a specific domain. In this work, we study unsupervised cross-lingual text classification in the crisis domain, where rapidly filtering relevant data regardless of language is critical to improving the situational awareness of emergency services. Specifically, we address two research questions: a) Can a custom neural network model built over XLM-R and trained only on English data for such a classification task transfer knowledge to multilingual data, and vice versa? b) By employing an attention mechanism, does the model attend to words relevant to the task regardless of the language? To this end, we present an attention realignment mechanism that uses a parallel language classifier to minimize linguistic differences between the source and target languages. Additionally, we pseudo-label tweets from the target language and add them to the source-language tweets to retrain the model. We conduct experiments using Twitter posts (tweets) labelled as a 'request' in the open-source data set by Appen, which consists of multilingual tweets for crisis response. Experimental results show that attention realignment and pseudo-labelling improve the performance of unsupervised cross-lingual classification. We also present an interpretability analysis by evaluating the performance of attention layers on original versus translated messages.</p>
      </abstract>
      <kwd-group>
        <kwd>Social Media</kwd>
        <kwd>Crisis Management</kwd>
        <kwd>Text Classification</kwd>
        <kwd>Unsupervised Cross-Lingual Adaptation</kwd>
        <kwd>Interpretability</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Social media platforms such as Twitter provide valuable information
to aid emergency response organizations in gaining real-time
situational awareness during the sudden onset of crisis situations [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
Extracting critical information about affected individuals,
infrastructure damage, medical emergencies, or food and shelter needs
can help emergency managers make time-critical decisions and
allocate resources efficiently [
        <xref ref-type="bibr" rid="ref15 ref21 ref22 ref30 ref31 ref36">15, 21, 22, 30, 31, 36</xref>
        ]. Researchers
have designed numerous classification models to help towards this
humanitarian goal of converting real-time social media streams into
actionable knowledge [
        <xref ref-type="bibr" rid="ref1 ref22 ref26 ref28 ref29">1, 22, 26, 28, 29</xref>
        ]. Recently, with the advent
of multilingual models such as multilingual BERT [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and XLM
[
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], researchers have started adapting them to multilingual
disaster tweets [
        <xref ref-type="bibr" rid="ref25 ref6">6, 25</xref>
        ]. Since XLM-R [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] has been shown to outperform
these models in cross-lingual language understanding, we
restrict our work to this model to explore cross-lingual
transfer of knowledge and interpretability.
      </p>
      <p>In this work, we address two questions. The first is whether
XLM-R is effective in capturing multilingual knowledge: we
construct a custom model over it and analyze whether a model trained
on English-only tweets generalizes to multilingual data, and
vice versa. Social media streams generally differ from other
text because the content is user generated. For example, tweets are
usually short and may contain errors and ambiguity in their behavioral
expressions. These properties make classification and
representation extraction more challenging. The second question
is whether word translations are attended to equally
by the attention layers. For instance, the words with higher
attention weights in a Haitian Creole sentence such as “Tanpri nou
bezwen tant avek dlo nou zon silo mesi” should align with the words
in its English translation, “Please, we need
tents and water. We are in Silo, Thank you!”. Our core idea is that if
‘dlo’ in the Haitian Creole tweet has a higher weight, so should its English
translation ‘water’. This word-level language-agnostic property can
make machine learning models more interpretable. It
also benefits downstream tasks such as knowledge
graph construction using keywords extracted from tweets. In
situations where data is available in only one language, this similarity in
attention would still allow us to extract relevant phrases in
cross-lingual settings. To the best of our knowledge, aligning
attention in a cross-lingual setting has not been attempted
before in the crisis analytics domain. In this work, we restrict our
classification experiments to tweets containing ‘request’ intent; we plan to
expand to other behaviors, tasks, and datasets in the future.</p>
      <p>Contributions: We propose a novel attention realignment method
that makes the task classifier more language agnostic,
which in turn tests how effectively the XLM-R model captures
multilingual knowledge for crisis tweets, and a pseudo-labelling
procedure that further enhances the model’s generalizability. Further,
incorporating the attention-based mechanism allows us to perform
an interpretability analysis on the model by comparing how words
are attended to in the original versus translated tweets.</p>
    </sec>
    <sec id="sec-2">
      <title>RELATED WORK AND BACKGROUND</title>
      <p>
        There are numerous prior works (cf. surveys [
        <xref ref-type="bibr" rid="ref14 ref4">4, 14</xref>
        ]) that focus
specifically on disaster-related data to perform classification and
other rapid assessments during the onset of a new disaster event.
A crisis period is an important but challenging setting, because
collecting labeled data during an ongoing event is very expensive. This
problem led to several works on domain adaptation techniques in
which machine learning models can learn and generalize to unseen
crisis events [
        <xref ref-type="bibr" rid="ref10 ref18 ref23 ref3">3, 10, 18, 23</xref>
        ]. In the context of crisis data, Nguyen et al.
[
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] designed a convolutional neural network model which does not
require any feature engineering and Alam et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] designed a CNN
architecture with adversarial training on graph embeddings.
Krishnan et al. [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] showed that sharing a common layer for multiple
tasks can improve performance of tasks with limited labels.
      </p>
      <p>
        In the multilingual or cross-lingual direction, many works [
        <xref ref-type="bibr" rid="ref17 ref8">8, 17</xref>
        ]
tried to align word embeddings (such as fastText [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ]) from different
languages into the same space so that a word and its translations
have the same vector. These models have been superseded by models such
as multilingual BERT [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and XLM-R [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] that produce contextual
embeddings which can be pretrained using several languages
together to achieve impressive performance gains on multilingual
use-cases.
      </p>
      <p>
        The attention mechanism [
        <xref ref-type="bibr" rid="ref2 ref24">2, 24</xref>
        ] is one of the most widely used
methods in deep learning. It constructs a context vector by
weighting the entire input sequence, which improves over previous
sequence-to-sequence models [
        <xref ref-type="bibr" rid="ref13 ref34 ref35">13, 34, 35</xref>
        ]. As the model produces
weights associated with each word in a sentence, this allows for
evaluating interpretability by comparing the words that are given
priority in original versus translated tweets.
      </p>
      <p>
        With more and more machine learning systems being adopted
by diverse application domains, transparency in decision-making
inevitably becomes an essential criterion, especially in high-risk
scenarios [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] where trust is of utmost importance. With deep
neural networks, including natural language systems, shown to
be easily fooled [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], there have been many promising ideas that
empower machine learning systems with the ability to explain
their predictions [
        <xref ref-type="bibr" rid="ref32 ref5">5, 32</xref>
        ]. Gilpin et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] present a survey of
interpretability in machine learning, which provides a taxonomy of
research that addresses various aspects of this problem. Similar to
the work by Ross et al. [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ], we employ an attention-based approach
to evaluate model interpretability in the crisis domain.
      </p>
    </sec>
    <sec id="sec-3">
      <title>METHODOLOGY</title>
    </sec>
    <sec id="sec-5">
      <title>3.1 Problem Statement: Unsupervised Cross-Lingual Crisis Tweet Classification</title>
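      <p>Consider tweets in language A and their corresponding translated
tweets in language B. The task of unsupervised cross-lingual
classification is to train a classifier using only the data from the source
language and to predict the labels for the data in the target language.
This experimental setup is usually written as <inline-formula><tex-math>A \rightarrow B</tex-math></inline-formula> for
training a model on A and testing on B, or <inline-formula><tex-math>B \rightarrow A</tex-math></inline-formula> for training a
model on B and testing on A. <inline-formula><tex-math>X</tex-math></inline-formula> refers to the data and <inline-formula><tex-math>Y</tex-math></inline-formula> refers
to the ground truth labels. The multilingual dataset used in our
experiments consists of original multilingual (<inline-formula><tex-math>ml</tex-math></inline-formula>) tweets and their
translated (<inline-formula><tex-math>en</tex-math></inline-formula>) tweets in English. To summarize:</p>
      <p>Experiment <inline-formula><tex-math>en \rightarrow ml</tex-math></inline-formula>. Input: <inline-formula><tex-math>X_{en},\ Y_{en},\ X_{ml}</tex-math></inline-formula>. Output: <inline-formula><tex-math>\hat{Y}_{ml} \leftarrow \mathrm{predict}(X_{ml})</tex-math></inline-formula>.</p>
      <p>Experiment <inline-formula><tex-math>ml \rightarrow en</tex-math></inline-formula>. Input: <inline-formula><tex-math>X_{ml},\ Y_{ml},\ X_{en}</tex-math></inline-formula>. Output: <inline-formula><tex-math>\hat{Y}_{en} \leftarrow \mathrm{predict}(X_{en})</tex-math></inline-formula>.</p>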
    </sec>
    <sec id="sec-6">
      <title>3.2 Overview</title>
      <p>In the following sections, we propose two methodologies to
enhance cross-lingual classification: 1) Attention Realignment and 2)
Pseudo-Labelling. Attention realignment uses a language
classifier, trained in parallel, to realign the attention layer of
the task classifier so that the weights are geared more towards
task-specific words regardless of the language. Pseudo-labelling
further enhances the classifier by adding high-quality seeds from
the target language that are pseudo-labelled by the task classifier.</p>
    </sec>
    <sec id="sec-8">
      <title>3.3 Attention Realignment by Parallel Language Classifier</title>
      <p>As depicted in Figure 2, the model on the left is the task classifier and
the model on the right is a language classifier that is trained in
parallel. The purpose of the language classifier is to pick up aspects
that are missed by the XLM-R model. These could be tweet-specific,
crisis-specific, or other linguistic nuances that separate original
tweets from translated tweets. Note that, semantically, translated
words are expected to have similar XLM-R representations.</p>
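      <p>Attention realignment is a mechanism we introduce to make
the task classifier more language independent. The main idea
is that words that receive higher attention in the language
classifier should be less important in the task classifier. For example,
‘dlo’ in Haitian Creole and ‘water’ in English should have the same vector
representation. Let <inline-formula><tex-math>\vec{a}_{T}</tex-math></inline-formula> and <inline-formula><tex-math>\vec{a}_{L}</tex-math></inline-formula> denote the attention weights
that the task classifier and the language classifier produce for the same input tweet.
The two classifiers realign each other in two ways:</p>
      <list list-type="order">
        <list-item>
          <p>Attention Difference: The attention weights of the language classifier
are subtracted from those of the task classifier,
            <disp-formula id="eq-1"><label>(1)</label><tex-math>\vec{a}'_{T} = \mathrm{clip}\left(\vec{a}_{T} - \lambda\,\vec{a}_{L},\ 0,\ 1\right)</tex-math></disp-formula>
where <inline-formula><tex-math>\lambda</tex-math></inline-formula> is a hyperparameter to tune the amount of
subtraction needed for the task classifier. Similarly, for the language
classifier,
            <disp-formula id="eq-2"><label>(2)</label><tex-math>\vec{a}'_{L} = \mathrm{clip}\left(\vec{a}_{L} - \lambda\,\vec{a}_{T},\ 0,\ 1\right)</tex-math></disp-formula>
          </p>
        </list-item>
        <list-item>
          <p>Attention Loss: Along with the attention difference, the model
can also be trained by inserting an additional loss function
term that penalizes the similarity between the attention
weights from the two classifiers. We use the Frobenius norm:
            <disp-formula id="eq-3"><label>(3)</label><tex-math>\mathcal{L}^{T}_{att} = \left\lVert \vec{a}_{T}^{\top}\,\vec{a}'_{L} \right\rVert^{2}</tex-math></disp-formula>
            <disp-formula id="eq-4"><label>(4)</label><tex-math>\mathcal{L}^{L}_{att} = \left\lVert \vec{a}_{L}^{\top}\,\vec{a}'_{T} \right\rVert^{2}</tex-math></disp-formula>
for task and language respectively. The resulting final loss
function of joint training is
            <disp-formula id="eq-5"><label>(5)</label><tex-math>\mathcal{L}(\theta) = \mathcal{L}^{T}_{BCE} + \beta\,\mathcal{L}^{L}_{BCE} + \alpha\left(\mathcal{L}^{T}_{att} + \mathcal{L}^{L}_{att}\right)</tex-math></disp-formula>
where <inline-formula><tex-math>\alpha</tex-math></inline-formula> is the hyperparameter to tune the attention loss
weight, <inline-formula><tex-math>\beta</tex-math></inline-formula> is the hyperparameter to tune the joint training
loss, and <inline-formula><tex-math>\mathcal{L}_{BCE}</tex-math></inline-formula> denotes the binary cross-entropy loss,
            <disp-formula id="eq-6"><label>(6)</label><tex-math>\mathcal{L}_{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\left[\,y_{i}\log\hat{y}_{i} + (1 - y_{i})\log(1 - \hat{y}_{i})\,\right]</tex-math></disp-formula>
          </p>
        </list-item>
      </list>
      <p>It is important to note that the Frobenius norm is not simply
between the attention weights of the two models but rather
between the attention weights produced by the two models
on the same input tweet. For example, for a given tweet, the
task classifier attends more to task-specific words and the
language classifier attends to language-specific words. The
loss term keeps the two sets of attention weights distinct.</p>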
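      <p>As an illustration of the two realignment steps described above, the following minimal Python sketch computes an attention difference, an attention loss, and the binary cross-entropy for a single toy tweet. The function names, the example weights, the value 0.5 for the subtraction hyperparameter, the clipping to [0, 1], and the reading of the attention loss as the squared inner product of the two attention vectors are all assumptions made for illustration; this is not the released implementation.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def clip(value, low=0.0, high=1.0):
    """Clamp a scalar to the closed interval [low, high]."""
    return max(low, min(high, value))


def attention_difference(own, other, lam):
    """Subtract lam * other from own element-wise, then clip to [0, 1]."""
    return [clip(a - lam * b) for a, b in zip(own, other)]


def attention_loss(a, b):
    """Squared inner product of two attention vectors for the same tweet."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot ** 2


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy; predictions are clamped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = clip(p, eps, 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


# Hypothetical attention weights over a three-word tweet.
a_task = [0.1, 0.6, 0.3]
a_lang = [0.5, 0.1, 0.4]
lam = 0.5  # assumed value of the subtraction hyperparameter

a_task_new = attention_difference(a_task, a_lang, lam)
a_lang_new = attention_difference(a_lang, a_task, lam)
loss_task = attention_loss(a_task, a_lang_new)
loss_lang = attention_loss(a_lang, a_task_new)
bce = binary_cross_entropy([1, 0], [0.9, 0.2])
```
]]></preformat>
      <p>In this toy case the first word, which the language classifier attends to most, has its task attention clipped to zero, while the second word keeps most of its weight.</p>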
    </sec>
    <sec id="sec-9">
      <title>3.4 Pseudo-Labelling</title>
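      <p>To enhance the model further, we pseudo-label the data in the
target language. For example, if we train a model on the
English tweets, we use the original tweets before translation for
pseudo-labelling. The idea is simply to gather high-quality seeds
from the target language to retrain the model. Note that we still do not use
any target labels here, in keeping with the unsupervised goal. Thus,
for retraining model M1 for <inline-formula><tex-math>en \rightarrow ml</tex-math></inline-formula>, the new dataset consists
of <inline-formula><tex-math>X^{+}_{en}</tex-math></inline-formula> and <inline-formula><tex-math>\hat{X}^{+}_{ml}</tex-math></inline-formula> as positive examples and <inline-formula><tex-math>X^{-}_{en}</tex-math></inline-formula> and <inline-formula><tex-math>\hat{X}^{-}_{ml}</tex-math></inline-formula>
as negative examples, where <inline-formula><tex-math>\hat{X}_{ml}</tex-math></inline-formula> denotes the pseudo-labelled target tweets.</p>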
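      <p>The seed-selection step can be sketched in a few lines of Python. The 0.7 cut-off for positive seeds is the one used in our experimental setup; the symmetric cut-off for negative seeds (scores below 0.3), the function name <monospace>select_seeds</monospace>, the placeholder tweets, and the example scores are assumptions made for illustration only.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def select_seeds(tweets, probabilities, threshold=0.7):
    """Keep only confidently scored tweets and attach a pseudo-label.

    A tweet becomes a positive seed if its score is above `threshold`
    and a negative seed if its score is below 1 - threshold (assumed rule).
    """
    seeds = []
    for tweet, p in zip(tweets, probabilities):
        if p > threshold:
            seeds.append((tweet, 1))
        elif p < 1.0 - threshold:
            seeds.append((tweet, 0))
    return seeds


# Unlabelled target-language tweets and hypothetical classifier scores.
target_tweets = [
    "Tanpri nou bezwen tant avek dlo nou zon silo mesi",
    "target tweet with an uncertain score",
    "target tweet unrelated to any request",
]
scores = [0.92, 0.55, 0.08]

# Labelled source-language training data (toy example).
source_train = [("Please, we need tents and water.", 1)]

seeds = select_seeds(target_tweets, scores)
augmented_train = source_train + seeds  # data used to retrain the classifier
```
]]></preformat>
      <p>The middle tweet is dropped because its score is neither confidently positive nor confidently negative, so only two seeds are added to the training data.</p>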
    </sec>
    <sec id="sec-10">
      <title>3.5 XLM-R Usage</title>
      <p>The recommended ways to use XLM-R features are either fine-tuning
to the task or aggregating features from all 25 layers. We
employ the latter to extract the multilingual embeddings for the
tweets.</p>
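      <p>One simple way to aggregate per-layer features is to average them, as in the sketch below. Averaging is an assumption here (other aggregations such as summation or concatenation are equally possible), and the nested-list input, the function name <monospace>aggregate_layers</monospace>, and the toy values stand in for real XLM-R hidden states.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def aggregate_layers(hidden_states):
    """Average token vectors across layers.

    hidden_states[l][t] is the vector of token t at layer l; the result
    has one averaged vector per token.
    """
    num_layers = len(hidden_states)
    num_tokens = len(hidden_states[0])
    dim = len(hidden_states[0][0])
    return [
        [
            sum(hidden_states[l][t][d] for l in range(num_layers)) / num_layers
            for d in range(dim)
        ]
        for t in range(num_tokens)
    ]


# Toy stand-in for 25 layers, 2 tokens, and 3-dimensional vectors.
toy_states = [
    [[float(l), float(t), 1.0] for t in range(2)]
    for l in range(25)
]
embeddings = aggregate_layers(toy_states)
```
]]></preformat>
      <p>The output keeps one vector per token, so it can feed the attention layer of the task classifier directly.</p>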
    </sec>
    <sec id="sec-11">
      <title>DATASET &amp; EXPERIMENTAL SETUP</title>
      <table-wrap id="tab-dataset">
        <table>
          <thead>
            <tr><th>Class</th><th>Train</th><th>Validation</th><th>Test</th></tr>
          </thead>
          <tbody>
            <tr><td>Positive</td><td>3554</td><td>418</td><td>496</td></tr>
            <tr><td>Negative</td><td>17473</td><td>2152</td><td>2128</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Hyperparameter table (values not recoverable): Deep Learning Library, Optimizer, Maximum Epoch, Dropout, Early Stopping Patience, Batch Size.</p>
      <sec id="sec-11-3">
        <p>The attention weights of the task and language classifiers
are adjusted by each other during training through
subtraction (attention difference) as well as a loss component
(attention loss). See Section 3.3.</p>
        <p>(3) Model M2: Adding the pseudo-labelling procedure to model
M1 produces model M2. Model M1, which is trained
to be language agnostic, is used to pseudo-label tweets from the
target language. High-quality seeds are selected (those for which
Model M1 outputs a score &gt; 0.7) and added to the original training
dataset to retrain the task classifier.</p>
        <p>Results show that, for cross-lingual evaluation on <inline-formula><tex-math>en \rightarrow ml</tex-math></inline-formula>,
model M1 outperforms the baseline by +4.3% and model M2
by +11.4%. On <inline-formula><tex-math>ml \rightarrow en</tex-math></inline-formula>, model M1 outperforms the baseline
by +7.8% and model M2 by +16.5%. This shows that
both models are effective for cross-lingual crisis tweet classification.
An interesting observation is that attention
realignment alone decreased classification performance within the same
language, which pseudo-labelling then restored. These
scores are shown in brackets in Table 4. A deeper investigation
of other tasks could shed more light on the
impact of the realignment mechanism.</p>
      </sec>
    </sec>
    <sec id="sec-12">
      <title>5.1 Interpretability: Attention Visualization</title>
      <p>
        We follow an attention architecture similar to the one shown in [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. The
context vector is constructed as the dot product between the
attention weights and the word activations. This is the
interpretable layer in our architecture. The attention weights represent
the importance of each word in the classification process. Two
examples are shown in Figure 3. In the first example, both <inline-formula><tex-math>en \rightarrow en</tex-math></inline-formula>
and <inline-formula><tex-math>ml \rightarrow ml</tex-math></inline-formula> give attention to the word ‘hungry’ (i.e., ‘grangou’ in
Haitian Creole). Note that these two results come from models
that are trained in the same language in which they are tested, so
we expect ideal performance. For the baseline model in the
cross-lingual setup <inline-formula><tex-math>en \rightarrow ml</tex-math></inline-formula>, although it correctly predicts the label, the
attention weights are more spread out. In model M2, with
attention realignment and pseudo-labelling, the attention weights
are still somewhat spread out but shift more toward ‘grangou’. Similarly,
in example 2, the attention weights in the baseline model are more
spread out, and the cross-lingual attention of model M2 aligns more
closely with <inline-formula><tex-math>en \rightarrow en</tex-math></inline-formula> and <inline-formula><tex-math>ml \rightarrow ml</tex-math></inline-formula>. These examples show the importance
of interpretability as a key criterion in cross-lingual crisis
tweet classification problems; the attention weights can also be used for
downstream tasks such as extracting relevant keywords for knowledge
graph construction.
      </p>
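      <p>The interpretable layer can be illustrated with a short Python sketch: the context vector is the attention-weighted sum of the word activations, and the most attended word can be read directly from the weights. The attention weights, the two-dimensional activations, and the function names below are hypothetical values chosen for illustration; only the tokens come from the example tweet in the introduction.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def context_vector(attention, activations):
    """Attention-weighted sum of word activations (one value per dimension)."""
    dim = len(activations[0])
    return [
        sum(w * h[d] for w, h in zip(attention, activations))
        for d in range(dim)
    ]


def top_word(words, attention):
    """Return the word that receives the largest attention weight."""
    return max(zip(words, attention), key=lambda pair: pair[1])[0]


# Tokens from the Haitian Creole example; weights and activations are made up.
words = ["Tanpri", "nou", "bezwen", "tant", "avek", "dlo"]
attention = [0.05, 0.05, 0.2, 0.25, 0.05, 0.4]
activations = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
    [2.0, 0.0],
    [0.0, 0.0],
    [0.0, 2.0],
]

context = context_vector(attention, activations)
most_attended = top_word(words, attention)
```
]]></preformat>
      <p>Comparing the most attended words for an original tweet and for its translation is the kind of check that the visualizations in this section perform by eye.</p>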
    </sec>
    <sec id="sec-13">
      <title>CONCLUSION</title>
      <p>We presented a novel approach to unsupervised cross-lingual
crisis tweet classification that combines an attention
realignment mechanism with a pseudo-labelling procedure (over
the state-of-the-art multilingual model XLM-R) to make the task
classifier more language agnostic. Performance evaluation
showed that models M1 and M2 outperformed the baseline by
+4.3% and +11.4% respectively for cross-lingual text classification
from English to Multilingual. We also presented an
interpretability analysis by comparing the attention layers of the models. It
shows the importance of incorporating a word-level
language-agnostic characteristic in the learning process when training data
is available in only one language. Extensive
hyperparameter tuning and extending the idea to other tasks (including
cross-task/multi-task settings) are left as future work. Another
direction for future work is to incorporate human-engineered
knowledge from multilingual knowledge graphs such as
BabelNet into our model architecture, which could improve the learning
of similar concepts across languages that are critical to crisis response
agencies.</p>
      <p>Reproducibility: Source code is available at
https://github.com/jitinkrishnan/Cross-Lingual-Crisis-Tweet-Classification</p>
    </sec>
    <sec id="sec-14">
      <title>ACKNOWLEDGEMENT</title>
      <p>The authors thank the U.S. National Science Foundation
for partially supporting this research through grants
IIS-1815459 and IIS-1657379.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Firoj</given-names>
            <surname>Alam</surname>
          </string-name>
          , Shafiq Joty, and
          <string-name>
            <given-names>Muhammad</given-names>
            <surname>Imran</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Domain adaptation with adversarial training and graph embeddings</article-title>
          .
          <source>arXiv preprint arXiv:1805.05151</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Dzmitry</given-names>
            <surname>Bahdanau</surname>
          </string-name>
          , Kyunghyun Cho, and
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Neural machine translation by jointly learning to align and translate</article-title>
          .
          <source>arXiv preprint arXiv:1409.0473</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>John</given-names>
            <surname>Blitzer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ryan</given-names>
            <surname>McDonald</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Fernando</given-names>
            <surname>Pereira</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>Domain adaptation with structural correspondence learning</article-title>
          .
          <source>In Proceedings of the 2006 conference on empirical methods in natural language processing</source>
          .
          <fpage>120</fpage>
          -
          <lpage>128</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Carlos</given-names>
            <surname>Castillo</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Big crisis data: social media in disasters and time-critical situations</article-title>
          . Cambridge University Press.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Xi</given-names>
            <surname>Chen</surname>
          </string-name>
          , Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and
          <string-name>
            <given-names>Pieter</given-names>
            <surname>Abbeel</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Infogan: Interpretable representation learning by information maximizing generative adversarial nets</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          .
          <fpage>2172</fpage>
          -
          <lpage>2180</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Jishnu</given-names>
            <surname>Ray Chowdhury</surname>
          </string-name>
          , Cornelia Caragea, and
          <string-name>
            <given-names>Doina</given-names>
            <surname>Caragea</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>CrossLingual Disaster-related Multi-label Tweet Classification with Manifold Mixup</article-title>
          .
          <source>In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop</source>
          .
          <fpage>292</fpage>
          -
          <lpage>298</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Alexis</given-names>
            <surname>Conneau</surname>
          </string-name>
          , Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, and
          <string-name>
            <given-names>Veselin</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Unsupervised cross-lingual representation learning at scale</article-title>
          .
          <source>arXiv preprint arXiv:1911.02116</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Alexis</given-names>
            <surname>Conneau</surname>
          </string-name>
          , Guillaume Lample,
          <string-name>
            <given-names>Marc'Aurelio</given-names>
            <surname>Ranzato</surname>
          </string-name>
          , Ludovic Denoyer, and
          <string-name>
            <given-names>Hervé</given-names>
            <surname>Jégou</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Word Translation Without Parallel Data</article-title>
          .
          <source>arXiv preprint arXiv:1710.04087</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Jacob</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ming-Wei</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kenton</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kristina</given-names>
            <surname>Toutanova</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Bert: Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>arXiv preprint arXiv:1810.04805</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Yaroslav</given-names>
            <surname>Ganin</surname>
          </string-name>
          and
          <string-name>
            <given-names>Victor</given-names>
            <surname>Lempitsky</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Unsupervised domain adaptation by backpropagation</article-title>
          .
          <source>arXiv preprint arXiv:1409.7495</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Leilani H</given-names>
            <surname>Gilpin</surname>
          </string-name>
          , David Bau, Ben Z Yuan, Ayesha Bajwa,
          <string-name>
            <given-names>Michael</given-names>
            <surname>Specter</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Lalana</given-names>
            <surname>Kagal</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Explaining explanations: An overview of interpretability of machine learning</article-title>
          .
          <source>In 2018 IEEE 5th International Conference on data science and advanced analytics (DSAA)</source>
          . IEEE,
          <fpage>80</fpage>
          -
          <lpage>89</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>David</given-names>
            <surname>Gunning</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Explainable artificial intelligence (xai)</article-title>
          .
          <source>Defense Advanced Research Projects Agency (DARPA)</source>
          ,
          <source>nd Web 2</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Sepp</given-names>
            <surname>Hochreiter</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jürgen</given-names>
            <surname>Schmidhuber</surname>
          </string-name>
          .
          <year>1997</year>
          .
          <article-title>Long short-term memory</article-title>
          .
          <source>Neural computation 9</source>
          ,
          <issue>8</issue>
          (
          <year>1997</year>
          ),
          <fpage>1735</fpage>
          -
          <lpage>1780</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Muhammad</given-names>
            <surname>Imran</surname>
          </string-name>
          , Carlos Castillo, Fernando Diaz, and
          <string-name>
            <given-names>Sarah</given-names>
            <surname>Vieweg</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Processing social media messages in mass emergency: A survey</article-title>
          .
          <source>ACM Computing Surveys (CSUR) 47</source>
          ,
          <issue>4</issue>
          (
          <year>2015</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>38</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Muhammad</given-names>
            <surname>Imran</surname>
          </string-name>
          , Prasenjit Mitra, and
          <string-name>
            <given-names>Carlos</given-names>
            <surname>Castillo</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Twitter as a lifeline: Human-annotated twitter corpora for NLP of crisis-related messages</article-title>
          .
          <source>arXiv preprint arXiv:1605.05894</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Robin</given-names>
            <surname>Jia</surname>
          </string-name>
          and
          <string-name>
            <given-names>Percy</given-names>
            <surname>Liang</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Adversarial examples for evaluating reading comprehension systems</article-title>
          .
          <source>arXiv preprint arXiv:1707.07328</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Armand</given-names>
            <surname>Joulin</surname>
          </string-name>
          , Piotr Bojanowski, Tomas Mikolov, Hervé Jégou, and
          <string-name>
            <given-names>Edouard</given-names>
            <surname>Grave</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Loss in translation: Learning bilingual word mapping with a retrieval criterion</article-title>
          .
          <source>arXiv preprint arXiv:1804.07745</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Jitin</given-names>
            <surname>Krishnan</surname>
          </string-name>
          , Hemant Purohit, and
          <string-name>
            <given-names>Huzefa</given-names>
            <surname>Rangwala</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Diversity-Based Generalization for Neural Unsupervised Text Classification under Domain Shift</article-title>
          . https://arxiv.org/pdf/2002.10937.pdf
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Jitin</given-names>
            <surname>Krishnan</surname>
          </string-name>
          , Hemant Purohit, and
          <string-name>
            <given-names>Huzefa</given-names>
            <surname>Rangwala</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Unsupervised and Interpretable Domain Adaptation to Rapidly Filter Social Web Data for Emergency Services</article-title>
          .
          <source>arXiv preprint arXiv:2003.04991</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Guillaume</given-names>
            <surname>Lample</surname>
          </string-name>
          and
          <string-name>
            <given-names>Alexis</given-names>
            <surname>Conneau</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Cross-lingual language model pretraining</article-title>
          .
          <source>arXiv preprint arXiv:1901.07291</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Kathy</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ankit</given-names>
            <surname>Agrawal</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Alok</given-names>
            <surname>Choudhary</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Real-time disease surveillance using twitter data: demonstration on flu and cancer</article-title>
          .
          <source>In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          .
          <fpage>1474</fpage>
          -
          <lpage>1477</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Hongmin</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Doina</given-names>
            <surname>Caragea</surname>
          </string-name>
          , Cornelia Caragea, and
          <string-name>
            <given-names>Nic</given-names>
            <surname>Herndon</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Disaster response aided by tweet classification with a domain adaptation approach</article-title>
          .
          <source>Journal of Contingencies and Crisis Management</source>
          <volume>26</volume>
          ,
          <issue>1</issue>
          (
          <year>2018</year>
          ),
          <fpage>16</fpage>
          -
          <lpage>27</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Zheng</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ying</given-names>
            <surname>Wei</surname>
          </string-name>
          , Yu Zhang, and
          <string-name>
            <given-names>Qiang</given-names>
            <surname>Yang</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Hierarchical attention transfer network for cross-domain sentiment classification</article-title>
          .
          <source>In Thirty-Second AAAI Conference on Artificial Intelligence</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Minh-Thang</given-names>
            <surname>Luong</surname>
          </string-name>
          , Hieu Pham, and
          <string-name>
            <given-names>Christopher D</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Effective approaches to attention-based neural machine translation</article-title>
          .
          <source>arXiv preprint arXiv:1508.04025</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Guoqin</given-names>
            <surname>Ma</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Tweets Classification with BERT in the Field of Disaster Management</article-title>
          . https://pdfs.semanticscholar.org/d226/185fa1e14118d746cf0b04dc5be8f545ec24.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Reza</given-names>
            <surname>Mazloom</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Hongmin</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Doina</given-names>
            <surname>Caragea</surname>
          </string-name>
          , Cornelia Caragea, and
          <string-name>
            <given-names>Muhammad</given-names>
            <surname>Imran</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>A Hybrid Domain Adaptation Approach for Identifying Crisis-Relevant Tweets</article-title>
          .
          <source>International Journal of Information Systems for Crisis Response and Management (IJISCRAM)</source>
          <volume>11</volume>
          ,
          <issue>2</issue>
          (
          <year>2019</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>19</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , Edouard Grave, Piotr Bojanowski,
          <string-name>
            <given-names>Christian</given-names>
            <surname>Puhrsch</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Armand</given-names>
            <surname>Joulin</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Advances in Pre-Training Distributed Word Representations</article-title>
          .
          <source>In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018)</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Dat Tien</given-names>
            <surname>Nguyen</surname>
          </string-name>
          , Kamela Ali Al Mannai,
          <string-name>
            <given-names>Shafiq</given-names>
            <surname>Joty</surname>
          </string-name>
          , Hassan Sajjad, Muhammad Imran, and
          <string-name>
            <given-names>Prasenjit</given-names>
            <surname>Mitra</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Rapid classification of crisis-related data on social networks using convolutional neural networks</article-title>
          .
          <source>arXiv preprint arXiv:1608.03902</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>Ferda</given-names>
            <surname>Ofli</surname>
          </string-name>
          , Patrick Meier, Muhammad Imran, Carlos Castillo, Devis Tuia, Nicolas Rey, Julien Briant, Pauline Millet, Friedrich Reinhard,
          <string-name>
            <given-names>Matthew</given-names>
            <surname>Parkan</surname>
          </string-name>
          , et al.
          <year>2016</year>
          .
          <article-title>Combining human computing and machine learning to make sense of big (aerial) data for disaster response</article-title>
          .
          <source>Big Data</source>
          <volume>4</volume>
          ,
          <issue>1</issue>
          (
          <year>2016</year>
          ),
          <fpage>47</fpage>
          -
          <lpage>59</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>Bahman</given-names>
            <surname>Pedrood</surname>
          </string-name>
          and
          <string-name>
            <given-names>Hemant</given-names>
            <surname>Purohit</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Mining help intent on twitter during disasters via transfer learning with sparse coding</article-title>
          .
          <source>In International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation</source>
          . Springer,
          <fpage>141</fpage>
          -
          <lpage>153</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>Hemant</given-names>
            <surname>Purohit</surname>
          </string-name>
          , Carlos Castillo, Fernando Diaz, Amit Sheth, and
          <string-name>
            <given-names>Patrick</given-names>
            <surname>Meier</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Emergency-relief coordination on social media: Automatically matching resource requests and offers</article-title>
          .
          <source>First Monday</source>
          <volume>19</volume>
          ,
          <issue>1</issue>
          (Dec.
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>Marco Tulio</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Sameer</given-names>
            <surname>Singh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Carlos</given-names>
            <surname>Guestrin</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>"Why should I trust you?" Explaining the predictions of any classifier</article-title>
          .
          <source>In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining</source>
          .
          <fpage>1135</fpage>
          -
          <lpage>1144</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>Andrew Slavin</given-names>
            <surname>Ross</surname>
          </string-name>
          , Michael C Hughes, and
          <string-name>
            <given-names>Finale</given-names>
            <surname>Doshi-Velez</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Right for the right reasons: Training differentiable models by constraining their explanations</article-title>
          .
          <source>arXiv preprint arXiv:1703.03717</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>Mike</given-names>
            <surname>Schuster</surname>
          </string-name>
          and
          <string-name>
            <given-names>Kuldip K</given-names>
            <surname>Paliwal</surname>
          </string-name>
          .
          <year>1997</year>
          .
          <article-title>Bidirectional recurrent neural networks</article-title>
          .
          <source>IEEE Transactions on Signal Processing</source>
          <volume>45</volume>
          ,
          <issue>11</issue>
          (
          <year>1997</year>
          ),
          <fpage>2673</fpage>
          -
          <lpage>2681</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>Ilya</given-names>
            <surname>Sutskever</surname>
          </string-name>
          , Oriol Vinyals, and
          <string-name>
            <given-names>Quoc V</given-names>
            <surname>Le</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Sequence to sequence learning with neural networks</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          .
          <fpage>3104</fpage>
          -
          <lpage>3112</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>István</given-names>
            <surname>Varga</surname>
          </string-name>
          , Motoki Sano, Kentaro Torisawa, Chikara Hashimoto, Kiyonori Ohtake, Takao Kawai,
          <string-name>
            <given-names>Jong-Hoon</given-names>
            <surname>Oh</surname>
          </string-name>
          , and Stijn De Saeger.
          <year>2013</year>
          .
          <article-title>Aid is out there: Looking for help from tweets during a large scale disaster</article-title>
          .
          <source>In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          .
          <fpage>1619</fpage>
          -
          <lpage>1629</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>