<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Automatic recognition of figurative language in biomedical articles</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dina Demner-Fushman</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Willie Rogers</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>James Mork</string-name>
          <email>jmorkg@mail.nih.gov</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Library of Medicine</institution>
          <addr-line>8600 Rockville Pike Bethesda, MD, 20894</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Figurative language plays an important role in thought processes and science. Automatic detection of figurative language is gaining momentum in open-domain natural language processing research, but in the biomedical domain it is hindered by the absence of document collections for developing and testing approaches. Reliable approaches to detection of figurative language could potentially improve automatic indexing of the literature and support clinical applications. We have developed a collection of documents annotated for literal or non-literal use of seven terms that are known to cause errors in automatic indexing of biomedical abstracts. Using the collection, we explore detection of figurative language with CNN-RNN, logistic regression and transformer models. We establish baselines for each of the seven terms, achieving results at the level of the state of the art reported in open-domain evaluations.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Figurative language plays an important role in science, with
metaphors and idiomatic expressions viewed as foundations
for thought processes
        <xref ref-type="bibr" rid="ref24 ref4">(Taylor and Dewsbury 2018; Cork,
Kaiser, and White 2019)</xref>
        . Wide use of figurative language
in the biomedical literature presents a significant challenge
in automatic text understanding. Consider the term falls in
the following sentences:
      </p>
      <list list-type="simple">
        <list-item>
          <p>A patient who suffered a fall from a wagon.</p>
        </list-item>
        <list-item>
          <p>Falling off the care wagon.</p>
        </list-item>
        <list-item>
          <p>Falling off the dopamine wagon.</p>
        </list-item>
        <list-item>
          <p>Fall from a train wagon.</p>
        </list-item>
        <list-item>
          <p>Fall from horse-drawn wagon.</p>
        </list-item>
      </list>
      <sec id="sec-1-5">
        <p>
          Whereas it is relatively easy for people to discern which of
these phrases refer to physical falls, biomedical named
entity recognition (NER) approaches often treat figurative
language as literal and, as a result, link the word to inappropriate
ontology terms. This is a particular problem for
automated indexing, which aims to summarize the main points of a
publication by assigning terms from Medical Subject
Headings (MeSH), a controlled vocabulary
created to index the biomedical literature
          <xref ref-type="bibr" rid="ref14">(NLM 2020a)</xref>
          .
        </p>
        <p>In biomedical publications, the problem of
recognizing non-literal utterances is intertwined with word sense
disambiguation (WSD) and compounded by the importance of
the term to the article. The WSD aspects can be illustrated
by the following:</p>
        <p>The head of each fish, including the brain and
pituitary, was sampled for double-colored FISH analysis.</p>
        <p>To many NER approaches, the first occurrence of fish is
indistinguishable from FISH, which stands for fluorescent in
situ hybridization. The confusion continues in:</p>
        <p>Is being a small fish in a big pond bad for students'
psychosomatic health?</p>
        <p>
          Moreover, for food products manufactured from fish, such
as fish oil, linking to Fishes also violates indexing rules.
To summarize, to label a biomedical publication with
terms from a terminology, we need to determine whether the terms
are used literally, whether the sense in the context corresponds to
the sense in the terminology, and whether the term is important
enough to be indexed for the article in the MEDLINE/PubMed
database, which comprises more than 30 million biomedical
abstracts
          <xref ref-type="bibr" rid="ref15">(NLM 2020b)</xref>
          . The
importance of a term plays a bigger role when we use the
existing manual indexing of biomedical abstracts for training and
testing: The correct sense of a term could be used literally in
the abstract, but the term might not be central enough to the
publication to be assigned by the indexer.
        </p>
        <p>
          Whereas there continues to be steady research in
biomedical WSD
          <xref ref-type="bibr" rid="ref18">(Pesaranghader et al. 2019)</xref>
          and in the use of
figurative language in biomedicine
          <xref ref-type="bibr" rid="ref4">(Cork, Kaiser, and White 2019)</xref>
          , automated understanding of biomedical figurative
language is still an under-explored area. Our objectives
therefore are:
          <list list-type="order">
            <list-item>
              <p>to determine which non-literal expressions are prevalent
in the biomedical literature and present difficulties to
automated understanding,</p>
            </list-item>
            <list-item>
              <p>to create training and test collections for these terms, and</p>
            </list-item>
            <list-item>
              <p>to explore approaches to automated detection of non-literal
language.</p>
            </list-item>
          </list>
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        The body of work on detection of figurative language in the
open domain is significant, and interest in the topic is
growing, as evidenced by the workshops and shared tasks
on figurative language processing
        <xref ref-type="bibr" rid="ref10">(Klebanov et al. 2020)</xref>
        .
        <xref ref-type="bibr" rid="ref25">Veale et al. (2016)</xref>
        provide an overview of the types of
figurative language and of the computational approaches to
detection and understanding of figurative language. The
approaches are mostly formulated as a binary classification
task on a limited set of triples, and sometimes as
prediction of the class of a token in a sentence
        <xref ref-type="bibr" rid="ref6 ref7">(Feldman and Peng
2013; Gao et al. 2018)</xref>
        . Taking into account the immediate
lexico–syntactic context of the utterance and incorporating
discourse features improves recognition of figurative
language
        <xref ref-type="bibr" rid="ref13">(Mu, Yannakoudakis, and Shutova 2019)</xref>
        . In an
end-to-end RNN-based system,
        <xref ref-type="bibr" rid="ref12">Mao et al. (2019)</xref>
        emulated two
human approaches to identification of figurative language:
1) noticing a semantic contrast between a target word and
its context – Selectional Preference Violation, and 2)
identifying if the literal meaning of a word contrasts with the
meaning that word takes in the context – Metaphor
Identification Procedure.
      </p>
      <p>To the best of our knowledge, our work is the first to
explore the difficulties figurative language poses for
automated indexing of the biomedical literature. We also
provide the first publicly available biomedical literature dataset
annotated for figurative language at the token and sentence
level. In addition, leveraging the state-of-the-art approaches
explored in the open domain, we establish baselines for
detection of figurative language in biomedical abstracts using
sentence or token level classification.</p>
    </sec>
    <sec id="sec-3">
      <title>Data Sources and Collections</title>
      <p>
        We analyzed 870 American English idioms
        <xref ref-type="bibr" rid="ref1">(Bulkes and Tanner 2017)</xref>
        and 464 metaphors
        <xref ref-type="bibr" rid="ref9 ref2">(Katz et al. 1988; Campbell and Raney 2016)</xref>
        . We searched the Free Dictionary
Idioms dictionary
        <xref ref-type="bibr" rid="ref5">(FARLEX 2020)</xref>
        for additional examples of figurative phrases. We then
submitted the figurative expressions to MeSH on Demand
        <xref ref-type="bibr" rid="ref16">(NLM 2020c)</xref>
        to identify
potential triggers for false-positive linking to MeSH; e.g., cat
and mouse in “the game of cat and mouse” could be mapped
to Cats and Mice, respectively. We then searched PubMed
with these trigger terms to get the frequency of their use in
publications. We identified the seven most frequent false-positive
triggers, which are shown in Table 1 along with the sizes
of the training and test sets for each term.
      </p>
      <p>We then searched PubMed for the exact figurative
expressions, and for the abstracts containing trigger terms that were
either indexed or not with the corresponding MeSH
headings. Abstracts with trigger terms and MeSH headings serve
as examples of literal use in the training set, and abstracts
without MeSH headings serve as examples of non-literal
use. For the test sets, we randomly sampled files from both
distributions and manually annotated the sentences
containing the terms at the token level. We annotated fine-grained
senses corresponding to:</p>
      <list list-type="order">
        <list-item>
          <p>Full MH: the literal MeSH Heading-appropriate sense,
e.g., “a healthy baby at 34 weeks of gestation.”</p>
        </list-item>
        <list-item>
          <p>Partial Literal: MH-appropriate sense, but being a part
of an expression, which should not trigger mapping to
MeSH, e.g., shaken baby syndrome.</p>
        </list-item>
        <list-item>
          <p>Literal Other: Literal senses other than MH, e.g., baby
hamster is still a baby, but it should not be indexed with
Infant, which applies only to human babies.</p>
        </list-item>
        <list-item>
          <p>Figurative: Non-literal use of the term, e.g., in “There’s
a Baby in this Bath Water!”</p>
        </list-item>
      </list>
      <p>Each document was annotated by two annotators and the
differences were reconciled. The labels
assigned by the indexers were not shown to the annotators
to avoid bias.</p>
      <table-wrap>
        <label>Table 1</label>
        <table>
          <thead>
            <tr><th>Term (MH)</th><th>Check Tag</th></tr>
          </thead>
          <tbody>
            <tr><td>fall (Accidental Falls)</td><td>no</td></tr>
            <tr><td>fish (Fishes)</td><td>no</td></tr>
            <tr><td>juvenile (Adolescent)</td><td>yes</td></tr>
            <tr><td>baby (Infant)</td><td>yes</td></tr>
            <tr><td>bull (Cattle)</td><td>yes</td></tr>
            <tr><td>cat (Cats)</td><td>yes</td></tr>
            <tr><td>dog (Dogs)</td><td>yes</td></tr>
          </tbody>
        </table>
      </table-wrap>
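      <p>The rule used to derive training labels from MeSH indexing can be sketched in Python as follows. This is a minimal illustration rather than the pipeline used in this work: the record layout (a dictionary with hypothetical <monospace>abstract</monospace> and <monospace>mesh_headings</monospace> keys), the function name <monospace>weak_label</monospace>, and the simple substring match are all assumptions.</p>
      <preformat>
```python
def weak_label(record, trigger, heading):
    """Return 'literal', 'non-literal', or None for one abstract record.

    Hypothetical record layout: {"abstract": str, "mesh_headings": [str, ...]}.
    """
    # Skip abstracts that do not mention the trigger term at all.
    if trigger.lower() not in record["abstract"].lower():
        return None
    # Indexed with the corresponding heading: treat as literal use.
    if heading in record["mesh_headings"]:
        return "literal"
    # Trigger present but heading absent: treat as non-literal use.
    return "non-literal"


# Toy records invented for illustration only.
records = [
    {"abstract": "Risk factors for a fall in older adults.",
     "mesh_headings": ["Accidental Falls", "Aged"]},
    {"abstract": "The rise and fall of a drug target.",
     "mesh_headings": ["Drug Discovery"]},
]
labels = [weak_label(r, "fall", "Accidental Falls") for r in records]
```
      </preformat>
      <p>A label obtained this way reflects the indexer's decision about the whole article, so it is only an approximation of how the term is used in any particular sentence.</p>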
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <p>
        We explored CNN-RNN
        <xref ref-type="bibr" rid="ref22">(Svoboda 2020)</xref>
        , logistic regression
        <xref ref-type="bibr" rid="ref17">(Pedregosa et al. 2011)</xref>
        , and BERT-based
        <xref ref-type="bibr" rid="ref8">(Kaiyinzhou 2020)</xref>
        approaches with various embeddings and the Universal
Sentence Encoder
        <xref ref-type="bibr" rid="ref3">(Cer et al. 2018)</xref>
        . We used sentences from
PubMed abstracts containing the trigger terms and the
expressions from the above collections of idioms for
training these models. Because the annotations were sparse and
there were not enough examples for training and for
judging the results, we collapsed the annotations into two
classes: figurative or literal MH-appropriate. Any terms that
were labeled LiteralOther or PartialLiteral were relabeled
as Figurative. For example, in an article about dog owners,
dog was considered non-literal. Terms labeled as
Figurative or FullMH remained unchanged.
      </p>
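      <p>The collapsing of the four fine-grained labels into two classes can be written as a small lookup. The sketch below uses the label names from the text; the dictionary and function names are hypothetical and chosen only for illustration.</p>
      <preformat>
```python
# Fine-grained labels named in the text, mapped to the two classes.
# FullMH and Figurative stay unchanged; the other two become Figurative.
FINE_TO_BINARY = {
    "FullMH": "FullMH",
    "PartialLiteral": "Figurative",
    "LiteralOther": "Figurative",
    "Figurative": "Figurative",
}


def collapse(label):
    """Map a fine-grained sense label to one of the two classes."""
    return FINE_TO_BINARY[label]


fine = ["FullMH", "LiteralOther", "PartialLiteral", "Figurative"]
collapsed = [collapse(label) for label in fine]
```
      </preformat>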
      <p>We then approached the task as binary classification at the
sentence or token level.</p>
      <p>
        To train the CNN-RNN and Logistic Regression
models, sentences containing the target trigger terms were
extracted from a set of retrieved documents that were labeled
using MeSH indexing information as described above. Each
extracted sentence was assigned the label of the document
from which it was derived. Sentence embeddings were
generated using a Doc2Vec
        <xref ref-type="bibr" rid="ref19">(Rehurek and Sojka 2010)</xref>
        model
pre-trained on the documents retrieved for the trigger terms.
      </p>
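      <p>The sentence extraction and label propagation described above can be sketched as follows. This is only an illustration under stated assumptions: the function name <monospace>labeled_sentences</monospace>, the naive punctuation-based sentence splitter, and the case-insensitive whole-word match are hypothetical choices, not the tooling used in this work.</p>
      <preformat>
```python
import re


def labeled_sentences(text, trigger, doc_label):
    """Return (sentence, doc_label) pairs for sentences mentioning the trigger."""
    # Naive splitter (an assumption): a sentence ends at '.', '!' or '?'.
    sentences = [s.strip() for s in re.findall(r"[^.!?]+[.!?]?", text)]
    # Whole-word, case-insensitive match of the trigger term.
    pattern = re.compile(r"\b" + re.escape(trigger) + r"\b", re.IGNORECASE)
    # Every matching sentence inherits the label of its document.
    return [(s, doc_label) for s in sentences if pattern.search(s)]


# Toy abstract invented for illustration only.
abstract = "The fish were sampled. FISH analysis followed. Results are shown."
pairs = labeled_sentences(abstract, "fish", "literal")
```
      </preformat>
      <p>In this toy example the second sentence, which mentions FISH rather than fish, inherits the document label as well, showing how assigning one label per document can introduce noisy training examples.</p>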
      <p>In the CNN-RNN approach, the embeddings and
associated labels served as input to a neural network
containing four groups of four layers: convolutional layer, dropout,
max-pooling, and dropout, followed by an LSTM layer.</p>
      <p>The model uses a sigmoid activation function, binary
cross-entropy loss, and the Adam optimizer.</p>
      <p>We used the scikit-learn logistic regression classifier,
with the Doc2Vec embeddings as input.</p>
      <p>The Universal Sentence Encoder was also applied in the
sentence-level classification task. Unlike the Doc2Vec
models, the Universal Sentence Encoder was trained on a very
large corpus using a variety of sources. In our approach, each
sentence vector representation was generated using the
Universal Sentence Encoder during training. The vector
representation and the sentence label were then passed to a
two-layer neural network consisting of a ReLU and a softmax
layer. A categorical cross-entropy loss and the Adam
optimizer were used when building the model.</p>
      <p>
        We used a BERT encoder extended with a CRF layer
for named entity recognition
        <xref ref-type="bibr" rid="ref8">(Kaiyinzhou 2020)</xref>
        for the token-level classification of
literal and figurative use of the tokens. We used BIO-style
(Beginning-Inside-Outside) features. To train BERT, we
tagged the trigger terms with the label of the sentence and
all other terms in the sentence as outside.
      </p>
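      <p>The tagging scheme used to prepare token-level training data can be illustrated with the short sketch below. It assumes pre-tokenized input and single-token triggers, so no Inside tags are produced; the function name <monospace>bio_tags</monospace>, the <monospace>B-</monospace> prefix convention, and the explicit set of trigger word forms are hypothetical choices made for the example.</p>
      <preformat>
```python
def bio_tags(tokens, trigger_forms, sentence_label):
    """Tag trigger tokens with B-label and every other token with O."""
    forms = {form.lower() for form in trigger_forms}
    tags = []
    for token in tokens:
        if token.lower() in forms:
            # The trigger term carries the label of its sentence.
            tags.append("B-" + sentence_label)
        else:
            # All other tokens are outside any labeled span.
            tags.append("O")
    return tags


tokens = ["Falling", "off", "the", "care", "wagon"]
tags = bio_tags(tokens, {"fall", "falls", "falling"}, "Figurative")
```
      </preformat>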
    </sec>
    <sec id="sec-5">
      <title>Results</title>
    </sec>
    <sec id="sec-6">
      <title>Discussion</title>
      <p>
        We created a collection of PubMed abstracts automatically
annotated for literal and non-literal use of seven terms that
proved to be a rich source of false positive linking to
terminologies and have sufficient amounts of training documents
in PubMed. Interestingly, one of these terms, fall, was also
found to be difficult to classify as figurative in the open
domain tasks
        <xref ref-type="bibr" rid="ref20">(Stowe et al. 2019)</xref>
        .
      </p>
      <p>
        We explored several state-of-the-art approaches, casting
the task as binary classification at the sentence and
token level. We hoped to identify one best approach for the
task and achieve state-of-the-art performance for all trigger
terms. The best results reported in the literature for
open-domain figurative language detection and in the shared task
on metaphor detection
        <xref ref-type="bibr" rid="ref10">(Klebanov et al. 2020)</xref>
        are around
70% F-1 score, sometimes reaching 80% and above.
Although we have obtained F-1 scores above 80%
for five of the seven terms, we cannot identify a single
approach that will achieve good scores on all trigger terms.
The F-1 score for fish is only 56%. This score could
probably be explained by the fact that this term often violates
the widely used WSD assumption of “one sense per
document”
        <xref ref-type="bibr" rid="ref26">(Yarowsky 1995)</xref>
        , which we used to create the
training set. As the following example shows, two senses of fish can be
used in the same sentence:
      </p>
      <p>These preliminary results provide the basis for the
further development of a non-GMO approach to
modulate fish allergenicity and improve safety of
aquaculture fish. (PMID: 31622806)</p>
      <p>The indexers labeled this article with both Fishes and
Seafood. When the contexts for these occurrences of fish are
used in the models as positive examples, they might be too
close to the contexts of the articles that present fish only in
the context of food and thus serve as negative examples.</p>
      <p>
        With respect to identifying one approach that would work
best for all of the trigger terms, we can see that
casting the task as sentence-level classification and using the
CNN-RNN model produces the majority of best results.
        <xref ref-type="bibr" rid="ref20">Stowe et al. (2019)</xref>
        observe that fall is difficult to classify
because the distribution of the literal and metaphoric uses of
this word in the open domain is almost even. In our
annotations, we also observed frequent use of fall in
personification, which might explain why the Universal Sentence
Encoder pre-trained on a variety of sources performs much
better for falls.
      </p>
      <p>Another interesting observation is that if we want to select
a method for automated indexing, we will have to decide
whether recall or precision is more important when suggesting
the terms. For cat, dog, fish and juvenile, the differences in
these two metrics achieved by different approaches are
relatively large, although the F-scores are mostly close,
showing a typical trade-off between the two metrics. In selecting
approaches to support automated indexing, precision often
plays an important role, as currently the consensus is that it
is better to miss a term than to assign an inappropriate term
that will mislead the search engines that rely on MeSH
indexing. For that reason, we do not consider accuracy when
selecting an approach for supporting automated indexing.</p>
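      <p>For reference, the three metrics discussed here can be computed as in the sketch below. The choice of the literal class as the positive class (the case in which a MeSH term would be suggested), the class names, and the toy labels are assumptions made for illustration; they are not results from this work.</p>
      <preformat>
```python
def precision_recall_f1(gold, predicted, positive="Literal"):
    """Compute precision, recall and F1 for the chosen positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy labels invented for illustration only.
gold = ["Literal", "Literal", "Figurative", "Figurative"]
pred = ["Literal", "Figurative", "Figurative", "Literal"]
p, r, f = precision_recall_f1(gold, pred)
```
      </preformat>
      <p>A system tuned for indexing support would favor a high value of the first number, since a false positive here corresponds to an inappropriate term being suggested.</p>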
      <p>Our work has some limitations that we hope to address in
the future. First, we addressed only seven of the hundreds
of terms used figuratively in the biomedical literature.
Although the seven terms provided enough information to see
that no single approach is a winning strategy, additional
annotations will be needed for testing approaches to figurative
language detection on PubMed scale. We also found that
for many remaining terms figurative use in PubMed is
infrequent and additional sources of figurative language will
be needed for training. For example, butterflies in my
stomach is used in PubMed only two times, and butterflies AND
stomach 20 times. More data will be needed to train a
classifier to distinguish between these two titles:</p>
      <p>Butterflies in My Stomach: Insects in Human Nutrition</p>
      <p>Neurotic butterflies in my stomach: the role of anxiety,
anxiety sensitivity and depression in functional
gastrointestinal disorders</p>
    </sec>
    <sec id="sec-7">
      <title>Conclusions</title>
      <p>This work presents an initial exploration of the use and
detection of figurative language in biomedical publications. On
the one hand, figurative language is known to play an
important role in thought processes and in science, and is
therefore widely used in biomedical publications; on the
other hand, automated detection of figurative language in
biomedical publications has not yet attracted research attention.
To explore feasibility of automated detection of figurative
language, we created a collection of documents annotated
for literal or non-literal use of seven terms that are known to
cause errors in automatic indexing of biomedical abstracts
with MeSH terms. We then explored sentence- and
token-level classification approaches to detection of figurative
language using CNN-RNN, logistic regression and transformer
models. With the exception of one term, fish, our
performance is on par with the state-of-the-art achieved in the
open domain evaluations. We hope that the interesting
problem of detection of figurative language in biomedical text,
the dataset, and the automated approach to creation of the
training sets outlined in this work will bring about further
research in this area.</p>
      <p>Data &amp; code: https://ii.nlm.nih.gov/DataSets/index.shtml</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgements</title>
      <p>This work was supported by the intramural research program
at the U.S. National Library of Medicine, National Institutes
of Health.</p>
      <p>We thank Alan Aronson, Francois Lang, Laritza
Rodriguez and Sonya Shooshan for judging parts of the
collections. We thank Anna Ripple for constructing PubMed
searches.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Bulkes</surname>
            ,
            <given-names>N. Z.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Tanner</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <year>2017</year>
          . “
          <article-title>Going to town”: Largescale norming and statistical analysis of 870 American English idioms</article-title>
          .
          <source>Behavior research methods 49</source>
          <volume>(2)</volume>
          :
          <fpage>772</fpage>
          -
          <lpage>783</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Campbell</surname>
            ,
            <given-names>S. J.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Raney</surname>
            ,
            <given-names>G. E.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>A 25-year replication of Katz et al.'s (1988) metaphor norms</article-title>
          .
          <source>Behavior research methods 48</source>
          <volume>(1)</volume>
          :
          <fpage>330</fpage>
          -
          <lpage>340</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Cer</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Kong</surname>
            ,
            <given-names>S.-y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Hua</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Limtiaco</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>John</surname>
            ,
            <given-names>R. S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Constant</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Guajardo-Cespedes</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Yuan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Tar</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ; et al.
          <year>2018</year>
          .
          <article-title>Universal Sentence Encoder for English</article-title>
          .
          <source>In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations</source>
          ,
          <fpage>169</fpage>
          -
          <lpage>174</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Cork</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Kaiser</surname>
            ,
            <given-names>B. N.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>White</surname>
            ,
            <given-names>R. G.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>The integration of idioms of distress into mental health assessments and interventions: a systematic review</article-title>
          .
          <source>Global Mental Health 6.</source>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>FARLEX</surname>
          </string-name>
          .
          <year>2020</year>
          (accessed November, 2020).
          <article-title>The Free Dictionary by FARLEX. Idioms and phrases</article-title>
          . URL https://idioms.thefreedictionary.com/.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Feldman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Peng</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2013</year>
          .
          <article-title>Automatic detection of idiomatic clauses</article-title>
          .
          <source>In International Conference on Intelligent Text Processing and Computational Linguistics</source>
          ,
          <fpage>435</fpage>
          -
          <lpage>446</lpage>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Choi</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Choi</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Zettlemoyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Neural Metaphor Detection in Context</article-title>
          .
          <source>In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing</source>
          ,
          <fpage>607</fpage>
          -
          <lpage>613</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Kaiyinzhou</surname>
          </string-name>
          .
          <year>2020</year>
          (accessed November, 2020).
          <article-title>BERT-NER</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
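          <string-name>
            <surname>Katz</surname>
            ,
            <given-names>A. N.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Paivio</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Marschark</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>J. M.</given-names>
          </string-name>
          <year>1988</year>
          .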
          <article-title>Norms for 204 literary and 260 nonliterary metaphors on 10 psychological dimensions</article-title>
          .
          <source>Metaphor and Symbol</source>
          <volume>3</volume>
          (
          <issue>4</issue>
          ):
          <fpage>191</fpage>
          -
          <lpage>214</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Klebanov</surname>
            ,
            <given-names>B. B.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Shutova</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Lichtenstein</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Muresan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Wee</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Feldman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Ghosh</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , eds.
          <year>2020</year>
          .
          <source>Proceedings of the Second Workshop on Figurative Language Processing</source>
          . Online:
          <publisher-name>Association for Computational Linguistics</publisher-name>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          URL https://www.aclweb.org/anthology/2020.figlang-1.0.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Mao</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Guerin</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>End-to-end sequential metaphor identification inspired by linguistic theories</article-title>
          .
          <source>In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</source>
          ,
          <fpage>3888</fpage>
          -
          <lpage>3898</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Mu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Yannakoudakis</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Shutova</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>Learning Outside the Box: Discourse-level Features Improve Metaphor Identification</article-title>
          .
          <source>In Proceedings of NAACL-HLT</source>
          ,
          <fpage>596</fpage>
          -
          <lpage>601</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>NLM</surname>
          </string-name>
          .
          <year>2020</year>a
          (accessed November, 2020).
          <article-title>Medical Subject Headings</article-title>
          . URL https://www.nlm.nih.gov/mesh/meshhome.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>NLM</surname>
          </string-name>
          .
          <year>2020</year>b
          (accessed November, 2020).
          <article-title>MEDLINE and PubMed</article-title>
          . URL https://pubmed.ncbi.nlm.nih.gov/.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <surname>NLM</surname>
          </string-name>
          .
          <year>2020</year>c
          (accessed November, 2020).
          <article-title>MeSH on Demand</article-title>
          . URL https://meshb.nlm.nih.gov/MeSHonDemand.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>Pedregosa</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Varoquaux</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Gramfort</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Michel</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Thirion</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Grisel</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Blondel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Prettenhofer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Weiss</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Dubourg</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Vanderplas</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Passos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Cournapeau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Brucher</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Perrot</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Duchesnay</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <year>2011</year>
          .
          <article-title>Scikit-learn: Machine Learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          :
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <surname>Pesaranghader</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Matwin</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sokolova</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Pesaranghader</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>deepBioWSD: effective deep neural word sense disambiguation of biomedical text data</article-title>
          .
          <source>Journal of the American Medical Informatics Association</source>
          <volume>26</volume>
          (
          <issue>5</issue>
          ):
          <fpage>438</fpage>
          -
          <lpage>446</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <surname>Rehurek</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Sojka</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>Software framework for topic modelling with large corpora</article-title>
          .
          <source>In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks</source>
          . Citeseer.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <surname>Stowe</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Moeller</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Michaelis</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Palmer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <source>In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)</source>
          ,
          <fpage>362</fpage>
          -
          <lpage>371</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <surname>Svoboda</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <year>2020</year>
          (accessed November,
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          Doc2VecCNNRNN: URL.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <surname>Taylor</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Dewsbury</surname>
            ,
            <given-names>B. M.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>On the problem and promise of metaphor use in science and science communication</article-title>
          .
          <source>Journal of Microbiology &amp; Biology Education</source>
          <volume>19</volume>
          (
          <issue>1</issue>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <surname>Veale</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Shutova</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Klebanov</surname>
            ,
            <given-names>B. B.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Metaphor: A computational perspective</article-title>
          .
          <source>Synthesis Lectures on Human Language Technologies</source>
          <volume>9</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>160</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <string-name>
            <surname>Yarowsky</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <year>1995</year>
          .
          <article-title>Unsupervised word sense disambiguation rivaling supervised methods</article-title>
          .
          <source>In 33rd Annual Meeting of the Association for Computational Linguistics</source>
          ,
          <fpage>189</fpage>
          -
          <lpage>196</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>