<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>"In God we trust. All others must bring data." (W. Edwards Deming) Using word embeddings to recognize idioms</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jing Peng</string-name>
          <email>pengj@mail.montclair.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anna Feldman</string-name>
          <email>feldmana@mail.montclair.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science Department of Linguistics Montclair State University USA</institution>
        </aff>
      </contrib-group>
      <fpage>96</fpage>
      <lpage>102</lpage>
      <abstract>
        <p>Expressions such as add fuel to the fire can be interpreted literally or idiomatically depending on the context in which they occur. Many Natural Language Processing applications could improve their performance if idiom recognition were improved. Our approach is based on the idea that idioms violate cohesive ties in local contexts, while literal expressions do not. We propose two approaches: (1) compute the inner product of context word vectors with the vector representing a target expression; since literal vectors predict local contexts well, their inner products with contexts should be larger than those of idiomatic ones, thereby telling literals apart from idioms; and (2) compute literal and idiomatic scatter (covariance) matrices from local contexts in word vector space; since the scatter matrices represent context distributions, we can then measure the difference between the distributions using the Frobenius norm. For comparison, we implement Fazly et al. (2009)'s, Sporleder and Li (2009)'s, and Li and Sporleder (2010b)'s methods and apply them to our data. We provide experimental results validating the proposed techniques.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Natural language is filled with emotion and implied intent, which are often not trivial to detect. One specific challenge is idioms. Figurative language draws on prior references and is unique to each culture, and sometimes what we do not say is even more important than what we do. This, naturally, presents a significant problem for many Natural Language Processing (NLP) applications as well as for big data analytics.</p>
      <p>
Idioms are conventionalized expressions whose figurative meanings cannot be derived from the literal meanings of their parts. There is no single agreed-upon definition of idioms that covers all members of this class
        <xref ref-type="bibr" rid="ref13 ref15 ref18 ref3 ref6">(Glucksberg, 1993; Cacciari, 1993;
Nunberg et al., 1994; Sag et al., 2002;
Villavicencio et al., 2004; Fellbaum et al., 2006)</xref>
        . At the same time, idioms do not form a homogeneous class that can be easily defined. Some examples of idioms are I'll eat my hat (I'm confident), cut it out (stop talking/doing something), a blessing in disguise (some bad luck or misfortune results in something positive), kick the bucket (die), ring a bell (sound familiar), keep your chin up (remain cheerful), piece of cake (easy task), miss the boat (miss out on something), (to be) on the ball (be attentive/competent), put one's foot in one's mouth (say something one regrets), rake someone over the coals (reprimand someone severely), under the weather (sick), a hot potato (controversial issue), an arm and a leg (expensive), at the drop of a hat (without any hesitation), barking up the wrong tree (looking in the wrong place), and beat around the bush (avoid the main topic).
      </p>
      <p>It turns out that expressions are often ambiguous between an idiomatic and a literal interpretation, as one can see in the examples below, extracted from the Corpus of Contemporary American English (COCA, http://corpus.byu.edu/coca/):</p>
      <p>(A) After the last page was sent to the printer, an editor would ring a bell, walk toward the door, and holler "Good night!" (Literal)
(B) His name never fails to ring a bell among local voters. Nearly 40 years ago, Carthan was elected mayor of Tchula. . . (Idiomatic)</p>
      <p>(C) . . . that caused the reactor to literally blow its top. About 50 tons of nuclear fuel evaporated in the explosion. . . (Literal)
(D) . . . He didn't pound the table, he didn't blow his top. He always kept his composure. (Idiomatic)</p>
      <p>(E) . . . coming out of the fourth turn, slid down the track, hit the inside wall and then hit the attenuator at the start of pit road. (Literal)
(F) . . . job training, research and more have hit a Republican wall. (Idiomatic)</p>
      <p>Fazly et al. (2009)’s analysis of 60 idioms from
the British National Corpus (BNC) has shown
that close to half of these also have a clear
literal meaning; and of those with a literal
meaning, on average around 40% of their usages are
literal. Therefore, idioms present great challenges
for many Natural Language Processing (NLP)
applications. Most current translation systems rely
on large repositories of idioms. Unfortunately, more often than not, MT systems are unable to translate idiomatic expressions correctly.</p>
      <p>In this paper we describe an algorithm for automatic classification of idiomatic and literal expressions. Similarly to Peng et al. (2014), we treat idioms as semantic outliers. Our assumption is that the context word distribution for a literal expression will be different from the distribution for an idiomatic one. We capture the distribution in terms of a covariance matrix in vector space.</p>
    </sec>
    <sec id="sec-2">
      <title>Previous Work</title>
      <p>
        Previous approaches to idiom detection can be
classified into two groups: 1) type-based
extraction, i.e., detecting idioms at the type level; 2)
token-based detection, i.e., detecting idioms in
context. Type-based extraction is based on the
idea that idiomatic expressions exhibit certain
linguistic properties such as non-compositionality
that can distinguish them from literal expressions
        <xref ref-type="bibr" rid="ref15 ref2">(Sag et al., 2002; Fazly et al., 2009)</xref>
        . While
many idioms do have these properties, many
idioms fall on the continuum from being
compositional to being partly unanalyzable to completely
non-compositional (Cook et al., 2007). Katz and
Giesbrecht (2006), Birke and Sarkar (2006),
Fazly et al. (2009),
        <xref ref-type="bibr" rid="ref8">Li and Sporleder (2009)</xref>
        , Li and
Sporleder (2010a), Sporleder and
        <xref ref-type="bibr" rid="ref8">Li (2009)</xref>
        , and Li and Sporleder (2010b), among others, notice that type-based approaches do not work on expressions that can be interpreted idiomatically or literally depending on the context; thus, an approach that considers tokens in context is more appropriate for idiom recognition. To address these problems, Peng et al. (2014) investigate the bag-of-words topic representation and incorporate an additional hypothesis: contexts in which idioms occur are more affective. Still, they treat idioms as semantic outliers.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Our Approach</title>
      <p>
        We hypothesize that words in a given text
segment that are representatives of the local context
are likely to associate strongly with a literal
expression in the segment, in terms of projection (or
inner product) of word vectors onto the vector
representing the literal expression. We also
hypothesize that the context word distribution for a
literal expression in word vector space will be
different from the distribution for an idiomatic one.
This hypothesis also underlies the distributional
approach to meaning
        <xref ref-type="bibr" rid="ref1 ref4 ref7">(Firth, 1957; Katz and
Giesbrecht, 2006)</xref>
        .
      </p>
      <sec id="sec-3-1">
        <title>Projection Based On Local Context</title>
      </sec>
      <sec id="sec-3-2">
        <title>Representation</title>
        <p>
          The local context of a literal target verb-noun
construction (VNC) must be different from that of an
idiomatic one. We propose to exploit recent
advances in vector space representation to capture
the difference between local contexts
          <xref ref-type="bibr" rid="ref11 ref11 ref12 ref12">(Mikolov et
al., 2013a; Mikolov et al., 2013b)</xref>
          .
        </p>
        <p>
          A word can be represented by a vector of fixed dimensionality $q$ that best predicts its surrounding words in a sentence or a document
          <xref ref-type="bibr" rid="ref11 ref12">(Mikolov et al., 2013a; Mikolov et al., 2013b)</xref>
          . Given such a vector representation, our first proposal is the following. Let $v$ and $n$ be the vectors corresponding to the verb and noun in a target verb-noun construction, as in blow whistle, where $v \in \mathbb{R}^q$ represents blow and $n \in \mathbb{R}^q$ represents whistle. Let $vn = v + n \in \mathbb{R}^q$. Thus, $vn$ is the word vector that represents the composition of verb $v$ and noun $n$, and in our example, the composition of blow and whistle. As indicated in Mikolov et al. (2013b), word vectors obtained from deep learning neural net models exhibit linguistic regularities, such as additive compositionality. Therefore, $vn$ is justified in predicting the surrounding words of the composition of, say, blow and whistle. Our hypothesis is that, on average, the inner product $vn \cdot u$, where the $u$ are context word vectors in a literal usage, should be greater than the inner product $vn \cdot u$, where the $u$ are context word vectors in an idiomatic usage.
        </p>
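        <p>As a minimal sketch of this composition step (not part of the original implementation), assume a dictionary vectors mapping words to unit-normalized numpy arrays obtained from a word2vec model; the helper names here are illustrative placeholders:</p>
        <preformat>
import numpy as np

def compose(vectors, verb, noun):
    """Additive composition: vn = v + n, normalized to unit length."""
    vn = vectors[verb] + vectors[noun]
    return vn / np.linalg.norm(vn)

def mean_association(vectors, vn, context_words):
    """Average inner product between vn and the context word vectors."""
    return np.mean([vectors[w] @ vn for w in context_words if w in vectors])

# The hypothesis: for vn = compose(vectors, "blow", "whistle"),
# mean_association over a literal context should exceed the value
# over an idiomatic context.
</preformat>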
        <p>For a given vocabulary of $m$ words, represented by the matrix $V = [v_1, v_2, \cdots, v_m] \in \mathbb{R}^{q \times m}$, we calculate the projection of each word $v_i$ in the vocabulary onto $vn$:
$$P = V^t vn, \quad (1)$$
where $P \in \mathbb{R}^m$ and $t$ denotes transpose. Here we assume that $vn$ is normalized to have unit length. Thus, $P_i = v_i^t vn$ indicates how strongly word vector $v_i$ is associated with $vn$. This projection, or inner product, forms the basis for our proposed technique.</p>
        <p>
          Let $D = \{d_1, d_2, \cdots, d_l\}$ be a set of $l$ text segments (local contexts), each containing a target VNC (i.e., $vn$). Instead of generating a term-by-document matrix where each entry is tf-idf (the product of term frequency and inverse document frequency), we compute a term-by-document matrix $MD \in \mathbb{R}^{m \times l}$, where each entry is
$$p \cdot idf, \quad (2)$$
the product of the projection of a word onto a target VNC and its inverse document frequency. That is, the term frequency (tf) of a word is replaced by the projection (inner product) of the word onto $vn$ (1). Note that if segment $d_j$ does not contain word $v_i$, then $MD(i, j) = 0$, as in tf-idf estimation. The motivation is that topical words are more likely to be well predicted by a literal VNC than by an idiomatic one. The assumption is that a word vector is learned in such a way that it best predicts its surrounding words in a sentence or a document
          <xref ref-type="bibr" rid="ref11 ref12">(Mikolov et al., 2013a; Mikolov et al., 2013b)</xref>
          . As a result, the words associated with a literal target will have larger projections onto the target $vn$, while the projections of words associated with an idiomatic target VNC onto $vn$ should be smaller.
        </p>
        <p>We also propose a variant of the $p \cdot idf$ representation, in which each entry is the product of $p$ and the typical tf-idf weight. That is,
$$p \cdot tf \cdot idf. \quad (3)$$</p>
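        <p>The following sketch, under the same assumptions, builds the term-by-document matrix with either the p·idf weighting of (2) or the p·tf·idf variant of (3); V, segments, and vocab_index are assumed inputs, not names from our code:</p>
        <preformat>
import numpy as np

def build_matrix(V, vn, segments, vocab_index, use_tf=False):
    """Term-by-document matrix MD (m x l): p*idf per eq. (2), or
    p*tf*idf per eq. (3) when use_tf=True. V is the q x m matrix of
    unit word vectors, vn the unit-length target vector, segments a
    list of token lists, vocab_index a word-to-row map."""
    m, l = V.shape[1], len(segments)
    P = V.T @ vn                           # projections, eq. (1)
    tf = np.zeros((m, l))
    for j, seg in enumerate(segments):
        for w in seg:
            if w in vocab_index:
                tf[vocab_index[w], j] += 1.0
    df = np.count_nonzero(tf, axis=1)      # document frequencies
    idf = np.log(l / np.maximum(df, 1))
    # entries are zero wherever word i does not occur in segment j
    return P[:, None] * idf[:, None] * (tf if use_tf else (tf > 0))
</preformat>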
        <sec id="sec-3-2-1">
          <title>Local Context Distributions</title>
          <p>
            Our second hypothesis states that words in a local
context of a literal expression will have a
different distribution from those in the context of an
idiomatic one. We propose to capture local context
distributions in terms of scatter matrices in a space
spanned by word vectors
            <xref ref-type="bibr" rid="ref11 ref11 ref12 ref12">(Mikolov et al., 2013a;
Mikolov et al., 2013b)</xref>
            .
          </p>
          <p>Let $d = (w_1, w_2, \cdots, w_k) \in \mathbb{R}^{q \times k}$ be a segment (document) of $k$ words, where the $w_i \in \mathbb{R}^q$ are the vectors representing the words in the segment. We compute the scatter matrix
$$\Sigma = \sum_{i=1}^{k} (w_i - \bar{w})(w_i - \bar{w})^t, \quad (4)$$
where $\bar{w}$ denotes the mean of the word vectors in the segment, and $\Sigma$ represents the local context distribution for a given target VNC.</p>
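          <p>A direct rendering of (4) in numpy (word_vecs is an assumed k × q array holding the segment's word vectors):</p>
          <preformat>
import numpy as np

def scatter_matrix(word_vecs):
    """Eq. (4): sum of outer products of mean-centered word vectors.
    word_vecs is a k x q array; the result is q x q."""
    centered = word_vecs - word_vecs.mean(axis=0)
    return centered.T @ centered
</preformat>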
          <p>
            Given two distributions represented by two scatter matrices $\Sigma_1$ and $\Sigma_2$, a number of measures can be used to compute the distance between $\Sigma_1$ and $\Sigma_2$, such as the Chernoff and Bhattacharyya distances
            <xref ref-type="bibr" rid="ref5">(Fukunaga, 1990)</xref>
            . Both measures require the matrix determinant. In our case this can be problematic, because $\Sigma$ (4) is most likely singular, which would make the determinant zero.
          </p>
          <p>We propose to measure the difference between $\Sigma_1$ and $\Sigma_2$ using matrix norms. We have experimented with the Frobenius norm and the spectral norm. The Frobenius norm evaluates the difference between $\Sigma_1$ and $\Sigma_2$ when they act on a standard basis. The spectral norm, on the other hand, evaluates the difference when they act on the direction of maximal variance over the whole space.</p>
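          <p>Both norms are available off the shelf; for instance, in numpy:</p>
          <preformat>
import numpy as np

def matrix_distance(S1, S2, norm="fro"):
    """Frobenius norm: difference over all basis directions.
    Spectral norm (ord=2): difference along the direction of maximal
    variance, i.e., the largest singular value of S1 - S2."""
    return np.linalg.norm(S1 - S2, ord="fro" if norm == "fro" else 2)
</preformat>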
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <p>We have carried out an empirical study evaluating the performance of the proposed techniques. The goal is to predict the idiomatic usage of VNCs.</p>
        <sec id="sec-3-2-2">
          <title>Methods</title>
          <p>For comparison, the following methods are
evaluated.</p>
        <p>1. $tf \cdot idf$: compute a term-by-document matrix from the training data with $tf \cdot idf$ weighting.
2. $p \cdot idf$: compute a term-by-document matrix from the training data with the proposed $p \cdot idf$ weighting (2).
3. $p \cdot tf \cdot idf$: compute a term-by-document matrix from the training data with the proposed $p \cdot tf \cdot idf$ weighting (3).
4. CoVARFro: the proposed technique (4) described in Section 3.2; the distance between two matrices is computed using the Frobenius norm.
5. CoVARSp: the proposed technique, similar to CoVARFro, except that the distance between two matrices is determined using the spectral norm.
6. Context+ (CTX+): a supervised version of the CONTEXT technique described in Fazly et al. (2009) (see below).</p>
        <p>For methods 1 to 3, we compute a latent space, capturing 80% of the variance, from the term-by-document matrix obtained from the training data. To classify a test example, we compute the cosine similarity between the test example and the training data in the latent space to make a decision.</p>
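        <p>A sketch of this classification step, assuming documents are columns of the weighted term-by-document matrix MD; the function names are illustrative, and the 80% threshold follows the text:</p>
        <preformat>
import numpy as np

def fit_latent_space(MD, variance=0.8):
    """SVD of the term-by-document matrix; keep enough left singular
    vectors to capture the requested fraction of variance."""
    U, s, _ = np.linalg.svd(MD, full_matrices=False)
    k = int(np.searchsorted(np.cumsum(s**2) / np.sum(s**2), variance)) + 1
    return U[:, :k]

def classify(U, train_docs, train_labels, test_doc):
    """Label a test document by its most cosine-similar training
    document in the latent space (documents are columns)."""
    Z = U.T @ train_docs
    z = U.T @ test_doc
    sims = (Z.T @ z) / (np.linalg.norm(Z, axis=0) * np.linalg.norm(z) + 1e-12)
    return train_labels[int(np.argmax(sims))]
</preformat>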
        <p>For methods 4 and 5, we compute literal and idiomatic scatter matrices from the training data (4). For a test example, we compute a scatter matrix according to (4) and calculate the distance between the test scatter matrix and the training scatter matrices, using the Frobenius norm for method 4 and the spectral norm for method 5.</p>
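        <p>A corresponding sketch for methods 4 and 5; lit_scatter and idiom_scatter are assumed to be training scatter matrices computed with (4), and norm_ord is "fro" for method 4 and 2 for method 5:</p>
        <preformat>
import numpy as np

def classify_by_scatter(test_vecs, lit_scatter, idiom_scatter, norm_ord="fro"):
    """Assign the label of the nearer training scatter matrix under
    the chosen matrix norm. test_vecs is the k x q array of word
    vectors for the test segment."""
    centered = test_vecs - test_vecs.mean(axis=0)
    sigma = centered.T @ centered          # scatter matrix, eq. (4)
    d_lit = np.linalg.norm(sigma - lit_scatter, ord=norm_ord)
    d_idiom = np.linalg.norm(sigma - idiom_scatter, ord=norm_ord)
    return "L" if d_lit &lt; d_idiom else "I"
</preformat>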
        <p>
          Method 6 corresponds to a supervised version of CONTEXT, described in Fazly et al. (2009). CONTEXT is unsupervised because it does not rely on manually annotated training data; rather, it uses knowledge about automatically acquired canonical forms (C-forms). C-forms are fixed forms corresponding to the syntactic patterns in which the idiom normally occurs. Thus, the gold standard is "noisy" in CONTEXT. Here we provide manually annotated training data; that is, the gold standard is "clean." Therefore, CONTEXT+ is a supervised version of CONTEXT. We implemented this approach from scratch, since we had no access to the code and tools used in the original article, applied the method to our dataset, and report the performance results in Table 2.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>Data Preprocessing</title>
        <p>
          We use the VNC-Tokens dataset (Cook et al., 2008), which contains verb-noun constructions extracted from the BNC and labeled as L (Literal), I (Idioms), or Q (Unknown). The list contains only those VNCs whose frequency was greater than 20 and that occurred in at least one of two idiom dictionaries
          <xref ref-type="bibr" rid="ref23 ref16">(Cowie et al., 1983; Seaton and Macaulay, 2002)</xref>
          . The dataset consists of 2,984 VNC tokens. For our experiments we only use VNCs that are annotated as I or L. We only experimented with idioms that can have both literal and idiomatic interpretations. We should mention that our approach can be applied to any syntactic construction. We decided to use VNCs only because this dataset was available and for fair comparison; most work on idiom recognition relies on this dataset.
        </p>
          <p>We use the original SGML annotation to extract
paragraphs from BNC. Each document contains
three paragraphs: a paragraph with a target VNC,
the preceding paragraph and following one.</p>
        <p>Since the BNC did not contain enough examples, we extracted additional ones from COCA, COHA, and GloWbE (http://corpus.byu.edu/). Two human annotators labeled this new dataset for idioms and literals. The inter-annotator agreement was relatively low (Cohen's kappa = .58); therefore, we merged the results, keeping only those entries on which the two annotators agreed.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Word Vectors</title>
        <p>
          For the experiments reported here, we obtained word vectors using the word2vec tool
          <xref ref-type="bibr" rid="ref11 ref12">(Mikolov et al., 2013a; Mikolov et al., 2013b)</xref>
          and the text8 corpus. The text8 corpus has more than 17 million words and can be obtained from mattmahoney.net/dc/text8.zip. The resulting vocabulary has 71,290 words, each of which is represented by a vector of dimension $q = 200$. Thus, this 200-dimensional vector space provides the basis for our experiments.
        </p>
      </sec>
      <sec id="sec-4-4">
        <title>Datasets</title>
        <p>Table 1 describes the datasets we used to evaluate the performance of the proposed technique. All these verb-noun constructions are ambiguous between literal and idiomatic interpretations. The examples below (from the corpora we used) show how these expressions can be used literally.
BlowWhistle: we can immediately turn towards a high-pitched sound such as a whistle being blown. The ability to accurately locate a noise · · ·
LoseHead: This looks as eye-like to the predator as the real eye and gives the prey a fifty-fifty chance of losing its head. That was a very nice bull I shot, but I lost his head.
MakeScene: · · · in which the many episodes of life were originally isolated and there was no relationship between the parts, but at last we must make a unified scene of our whole life.
TakeHeart: · · · cutting off one of the forelegs at the shoulder so the heart can be taken out still pumping and offered to the god on a plate.
BlowTop: Yellowstone has no large sources of water to create the amount of steam to blow its top as in previous eruptions.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Results</title>
      <p>Table 2 shows the average precision, recall, and accuracy of the competing methods on 12 datasets over 20 runs. The best performance is in bold face. The best model is identified by considering precision, recall, and accuracy together for each model. We calculate accuracy by adding true positives (idioms) and true negatives (literals) and normalizing the sum by the number of examples.</p>
      <p>Interestingly, the Frobenius norm outperforms the spectral norm. One possible explanation is that the spectral norm evaluates the difference when the two matrices act on the direction of maximal variance, while the Frobenius norm evaluates it on a standard basis; that is, the Frobenius norm measures the difference along all basis vectors, whereas the spectral norm evaluates changes in a particular direction. When the difference is spread across all basis directions, the Frobenius norm potentially provides a better measurement. The projection methods ($p \cdot idf$ and $p \cdot tf \cdot idf$) outperform $tf \cdot idf$ overall, though not as markedly as CoVAR.</p>
      <p>CTX+ demonstrates very competitive performance. Since CTX+ is a supervised version of CONTEXT, we expect our proposed algorithms to outperform Fazly's original CONTEXT method as well.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>
        In this paper we described an original algorithm for automatic classification of idiomatic and literal expressions. We also compared our algorithm against several competing idiom detection algorithms discussed in the literature. The performance results show that our algorithm generally outperforms Fazly et al. (2009)'s model. Note that CTX+ is a supervised version of Fazly et al. (2009)'s, in that the training data here is a true "gold standard," while in
        <xref ref-type="bibr" rid="ref2">(Fazly et al., 2009)</xref>
        it is noisy. A research direction is to incorporate affect into our model. Idioms are typically used to imply a certain evaluation or affective stance toward the things they denote
        <xref ref-type="bibr" rid="ref13 ref15">(Nunberg et al., 1994; Sag et al., 2002)</xref>
        . We usually do not use idioms to describe neutral situations, such as buying tickets or reading a book. Similarly to Peng et al. (2014), we are exploring ways to incorporate affect into our idiom detection algorithm. Even though our method was tested on verb-noun constructions, it is independent of syntactic structure and can be applied to any idiom type. Unlike Fazly et al. (2009)'s approach, for example, our algorithm is language-independent and does not rely on POS taggers and syntactic parsers, which are often unavailable for resource-poor languages. Our next step is to expand this method and use it for idiom discovery. This will imply an extra step: extracting multiword expressions first and then determining their status as literal or idiomatic.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Julia</given-names>
            <surname>Birke</surname>
          </string-name>
          and
          <string-name>
            <given-names>Anoop</given-names>
            <surname>Sarkar</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>A clustering approach to the nearly unsupervised recognition of nonliteral language</article-title>
          .
          <source>In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL'06)</source>
          , pages
          <fpage>329</fpage>
          -
          <lpage>226</lpage>
          , Trento, Italy.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Afsaneh</given-names>
            <surname>Fazly</surname>
          </string-name>
          , Paul Cook, and
          <string-name>
            <given-names>Suzanne</given-names>
            <surname>Stevenson</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Unsupervised Type and Token Identification of Idiomatic Expressions</article-title>
          .
          <source>Computational Linguistics</source>
          ,
          <volume>35</volume>
          (
          <issue>1</issue>
          ):
          <fpage>61</fpage>
          -
          <lpage>103</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Christiane</given-names>
            <surname>Fellbaum</surname>
          </string-name>
          , Alexander Geyken, Axel Herold, Fabian Koerner, and
          <string-name>
            <given-names>Gerald</given-names>
            <surname>Neumann</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>Corpus-based Studies of German Idioms and Light Verbs</article-title>
          .
          <source>International Journal of Lexicography</source>
          ,
          <volume>19</volume>
          (
          <issue>4</issue>
          ):
          <fpage>349</fpage>
          -
          <lpage>360</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>John R Firth.</surname>
          </string-name>
          <year>1957</year>
          .
          <article-title>A synopsis of linguistic theory, 1930-1955</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>K.</given-names>
            <surname>Fukunaga</surname>
          </string-name>
          .
          <year>1990</year>
          .
          <article-title>Introduction to statistical pattern recognition</article-title>
          . Academic Press.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Sam</given-names>
            <surname>Glucksberg</surname>
          </string-name>
          .
          <year>1993</year>
          .
          <article-title>Idiom Meanings and Allusional Content</article-title>
          . In Cristina Cacciari and Patrizia Tabossi, editors,
          <source>Idioms: Processing, Structure, and Interpretation</source>
          , pages
          <fpage>3</fpage>
          -
          <lpage>26</lpage>
          . Lawrence Erlbaum Associates.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Graham</given-names>
            <surname>Katz</surname>
          </string-name>
          and
          <string-name>
            <given-names>Eugenie</given-names>
            <surname>Giesbrecht</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>Automatic Identification of Non-compositional Multiword Expressions using Latent Semantic Analysis</article-title>
          .
          <source>In Proceedings of the ACL/COLING-06 Workshop on Multiword Expressions: Identifying and Exploiting Underlying Properties</source>
          , pages
          <fpage>12</fpage>
          -
          <lpage>19</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Linlin</given-names>
            <surname>Li</surname>
          </string-name>
          and
          <string-name>
            <given-names>Caroline</given-names>
            <surname>Sporleder</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>A cohesion graph based approach for unsupervised recognition of literal and non-literal use of multiword expresssions</article-title>
          .
          <source>In Proceedings of the 2009 Workshop on Graph-based Methods for Natural Language Processing (ACL-IJCNLP)</source>
          , pages
          <fpage>75</fpage>
          -
          <lpage>83</lpage>
          , Singapore.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Linlin</given-names>
            <surname>Li</surname>
          </string-name>
          and
          <string-name>
            <given-names>Caroline</given-names>
            <surname>Sporleder</surname>
          </string-name>
          . 2010a.
          <article-title>Linguistic cues for distinguishing literal and non-literal usages</article-title>
          .
          <source>In COLING (Posters)</source>
          , pages
          <fpage>683</fpage>
          -
          <lpage>691</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Linlin</given-names>
            <surname>Li</surname>
          </string-name>
          and
          <string-name>
            <given-names>Caroline</given-names>
            <surname>Sporleder</surname>
          </string-name>
          . 2010b.
          <article-title>Using gaussian mixture models to detect figurative language in context</article-title>
          .
          <source>In Proceedings of NAACL/HLT</source>
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , Kai Chen, Greg Corrado, and
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <year>2013a</year>
          .
          <article-title>Efficient estimation of word representations in vector space</article-title>
          .
          <source>In Proceedings of Workshop</source>
          at ICLR.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , Ilya Sutskever, Kai Chen, Greg Corrado, and
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <year>2013b</year>
          .
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          .
          <source>In Proceedings of NIPS.</source>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Geoffrey</given-names>
            <surname>Nunberg</surname>
          </string-name>
          , Ivan A. Sag, and Thomas Wasow
          .
          <year>1994</year>
          . Idioms. Language,
          <volume>70</volume>
          (
          <issue>3</issue>
          ):
          <fpage>491</fpage>
          -
          <lpage>538</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Jing</given-names>
            <surname>Peng</surname>
          </string-name>
          , Anna Feldman, and
          <string-name>
            <given-names>Ekaterina</given-names>
            <surname>Vylomova</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Classifying idiomatic and literal expressions using topic models and intensity of emotions</article-title>
          .
          <source>In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          , pages
          <fpage>2019</fpage>
          -
          <lpage>2027</lpage>
          , Doha, Qatar, October. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Ivan A.</given-names>
            <surname>Sag</surname>
          </string-name>
          , Timothy Baldwin, Francis Bond, Ann Copestake, and
          <string-name>
            <given-names>Dan</given-names>
            <surname>Flickinger</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>Multiword expressions: A Pain in the Neck for NLP</article-title>
          .
          <source>In Proceedings of the 3rd International Conference on Intelligence Text Processing and Computational Linguistics (CICLing</source>
          <year>2002</year>
          ), pages
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          , Mexico City, Mexico.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Maggie</given-names>
            <surname>Seaton</surname>
          </string-name>
          and Alison Macaulay, editors.
          <year>2002</year>
          .
          <article-title>Collins COBUILD Idioms Dictionary</article-title>
          . HarperCollins Publishers, second edition.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Caroline</given-names>
            <surname>Sporleder</surname>
          </string-name>
          and
          <string-name>
            <given-names>Linlin</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Unsupervised Recognition of Literal and Non-literal Use of Idiomatic Expressions</article-title>
          .
          <source>In EACL '09: Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics</source>
          , pages
          <fpage>754</fpage>
          -
          <lpage>762</lpage>
          , Morristown, NJ, USA. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>Aline</given-names>
            <surname>Villavicencio</surname>
          </string-name>
          , Ann Copestake, Benjamin Waldron, and
          <string-name>
            <given-names>Fabre</given-names>
            <surname>Lambeau</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Lexical Encoding of MWEs</article-title>
          .
          <source>In Proceedings of the Second ACL Workshop on Multiword Expressions: Integrating Processing</source>
          , pages
          <fpage>80</fpage>
          -
          <lpage>87</lpage>
          , Barcelona, Spain.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>