        Deep-learning the Ropes: Modeling Idiomaticity with Neural Networks

Yuri Bizzoni (University of Gothenburg, Sweden), Marco S. G. Senaldi (Scuola Normale Superiore, Italy), Alessandro Lenci (University of Pisa, Italy)
yuri.bizzoni@gu.se, marco.senaldi@sns.it, alessandro.lenci@unipi.it




Abstract

English. In this work we explore the possibility of training a neural network to classify and rank idiomatic expressions under constraints of data scarcity. We discuss our results, comparing them both to other unsupervised models designed to perform idiom detection and to similar supervised classifiers trained to detect metaphoric bigrams.

Italiano. In questo lavoro esploriamo la possibilità di addestrare una rete neurale per classificare ed ordinare espressioni idiomatiche in condizioni di scarsità di dati. I nostri risultati sono discussi in comparazione sia con altri algoritmi non supervisionati ideati per l'identificazione di espressioni idiomatiche sia con classificatori supervisionati dello stesso tipo addestrati per identificare bigrammi metaforici.

1   Introduction

Figurative expressions like idioms (e.g. to learn the ropes ‘to learn how to do a job’, to cut the mustard ‘to perform up to expectations’, etc.) and metaphors (e.g. clean performance, that lawyer is a shark, etc.) are pervasive in language use. Important differences have been stressed between the two types of expressions from a theoretical (Gibbs, 1993; Torre, 2014), neurocognitive (Bohrn et al., 2012) and corpus linguistic (Liu, 2003) perspective. On the one hand, as stated by Lakoff and Johnson (2008), linguistic metaphors reflect an instantiation of conceptual metaphors, whereby abstract concepts in a target domain (e.g. the ruthlessness of a lawyer) are described by a rather transparent mapping to concrete examples taken from a source domain (e.g. the aggressiveness of a shark). On the other hand, although most idioms originate as metaphors (Cruse, 1986), they have undergone a crystallization process in diachrony, whereby they now appear as fixed and non-compositional word combinations that belong to the wider class of Multiword Expressions (MWEs) (Sag et al., 2002) and always exhibit lexical and morphosyntactic rigidity to some extent (Cacciari and Glucksberg, 1991; Nunberg et al., 1994). It is nonetheless crucial to underline that idiomaticity itself is a multidimensional and gradient phenomenon (Nunberg et al., 1994; Wulff, 2010), with different idioms showing varying degrees of semantic transparency, formal versatility, proverbiality and affective valence.

The aim of this work is to explore the fuzzy boundary between idiomatic and metaphorical expressions by applying a method designed to discriminate figurative vs. literal usages to the task of distinguishing idiomatic from compositional expressions. Our starting point is the work of Bizzoni et al. (2017). The authors managed to classify adjective-noun pairs where the same adjectives were used both in a metaphorical and a literal sense (e.g. clean performance vs. clean floor) using a neural classifier trained on a composition of the words' embeddings (Mikolov et al., 2013). In essence, the neural network was able to detect the abstract/concrete semantic shift of nouns when used with the same adjective in figurative and literal compositions respectively, basically treating the noun as the "context" to discriminate the metaphoricity of the adjective. In the present work, we use a relatively similar approach to classify idiomatic expressions by training a three-layered neural network on a set of idiomatic and non-idiomatic expressions, and we compare the performance of the network when trained on different syntactic patterns (Adjective-Noun and Verb-Noun expressions, AN and VN henceforth).

Importantly, the abstract/concrete polarity the network was able to learn in Bizzoni et al. (2017) will not be available this time, since none of the idiom constituents ever appears in its literal sense inside the expressions, whatever their concreteness may be. What we want to find out is whether the sole information captured by the distributional vector of a single expression is sufficient to learn its potential idiomaticity. Differently from Bizzoni et al. (2017), for each idiom we collect a count-based vector (Turney and Pantel, 2010) of the expression as a whole, taken as a single token. We compare this approach with a model trained on the composition of the individual words of an expression, showing that the latter is less effective for idioms than for metaphors. In both cases we operate on scarce training sets (26 AN and 90 VN constructions). Traditional ways to deal with data scarcity in computational linguistics resort to a wide number of different features to annotate the training set (see for example Tanguy et al. (2012)) or rely on artificial bootstrapping of the training set (He and Liu, 2017). In our case we test the performance of our classifier on scarce data without bootstrapping the dataset, relying only on the information provided by the distributional semantic space, and we show that the distribution of an expression in large corpora can provide enough information to learn idiomaticity from few examples with a satisfactory degree of accuracy.

2   Related Work

Previous computational research has exploited different methods to perform idiom type detection (i.e., automatically telling apart potential idioms like to get the sack from only literal combinations like to kill a man). For example, Lin (1999) and Fazly et al. (2009) label a given word combination as idiomatic if the Pointwise Mutual Information (PMI) (Church and Hanks, 1991) between its constituents is higher than the PMIs between the components of a set of lexical variants of this combination, obtained by replacing the component words of the original expression with semantically related words. Other studies have resorted to Distributional Semantics (Lenci, 2008; Turney and Pantel, 2010) by measuring the cosine between the vector of a given phrase and the single vectors of its components (Fazly and Stevenson, 2008) or between the phrase vector and the sum or product vector of its components (Mitchell and Lapata, 2010; Krčmář et al., 2013). Senaldi et al. (2016b) and Senaldi et al. (2016a) have combined insights from both these approaches by observing that the vectors of VN and AN idioms are less similar to the vectors of lexical variants of these expressions than the vectors of compositional constructions are. To the best of our knowledge, neural networks have previously been adopted to perform MWE detection in general (Legrand and Collobert, 2016; Klyueva et al., 2017), but not idiom identification specifically. In Bizzoni et al. (2017), pre-trained noun and adjective vector embeddings are fed to a single-layered neural network to disambiguate metaphorical and literal AN combinations. Several combination algorithms are experimented with to concatenate adjective and noun embeddings. All in all, the method is shown to outperform the state of the art, presumably leveraging the abstractness degree of the noun as a clue to metaphoricity.
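As a concrete illustration of the distributional measures just mentioned, the sketch below compares the vector of a phrase with the sum of its component vectors via cosine similarity, taking a low similarity as a cue of non-compositionality. It is a minimal illustrative reconstruction of the general idea, not code from the cited studies; the toy vectors and variable names are purely hypothetical.

    import numpy as np

    def cosine(u, v):
        # Cosine similarity between two co-occurrence vectors
        return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

    def compositionality_score(phrase_vec, component_vecs):
        # Cosine between the phrase vector and the sum of its component
        # vectors (one of the measures used by Mitchell and Lapata, 2010);
        # lower values suggest a less compositional, more idiom-like phrase
        composed = np.sum(component_vecs, axis=0)
        return cosine(phrase_vec, composed)

    # Toy 5-dimensional count vectors (made-up values)
    tagliare = np.array([3.0, 0.0, 1.0, 2.0, 0.0])
    corda = np.array([0.0, 2.0, 1.0, 0.0, 3.0])
    tagliare_la_corda = np.array([0.0, 1.0, 0.0, 4.0, 2.0])  # phrase as a single token

    print(compositionality_score(tagliare_la_corda, [tagliare, corda]))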
3   Dataset

3.1   Target expressions extraction

The two idiom datasets we employ in the current study come from Senaldi et al. (2016b) and Senaldi et al. (2016a). The first one is composed of 45 idiomatic and 45 non-idiomatic Italian V-NP and V-PP constructions (e.g. tagliare la corda ‘to flee’, lit. ‘to cut the rope’, and leggere un libro ‘to read a book’) that were selected from an Italian idiom dictionary (Quartu, 1993) and extracted from the itWaC corpus (Baroni et al., 2009), composed of about 1,909M tokens. Their frequency spanned from 364 (ingannare il tempo ‘to while away the time’) to 8,294 (andare in giro ‘to get about’). The second one comprises 13 idiomatic and 13 non-idiomatic AN constructions (e.g. punto debole ‘weak point’ and nuova legge ‘new law’) that were also extracted from itWaC and whose frequency varied from 21 (alte sfere ‘high places’, lit. ‘high spheres’) to 194 (punto debole).

3.2   Building target vectors

Count-based Distributional Semantic Models (DSMs) (Turney and Pantel, 2010) allow for representing words and expressions as high-dimensional vectors, where the vector dimensions register the co-occurrence of the target words or expressions with some contextual features, e.g. the content words that linearly precede and follow the target element within a fixed contextual window. We built two DSMs on itWaC, where our target AN and VN idioms and non-idioms were represented as target vectors and the co-occurrence statistics counted how many times each target construction occurred in the same sentence as each of the 30,000 top content words in the corpus. Differently from Bizzoni et al. (2017), we did not opt for prediction-based vector representations (Mikolov et al., 2013). Although some studies have pointed out that context-predicting models fare better than count-based ones on a variety of semantic tasks (Baroni et al., 2014), including compositionality modeling (Rimell et al., 2016), others (Blacoe and Lapata, 2012; Cordeiro et al., 2016) have shown them to perform comparably. Moreover, Levy et al. (2015) highlight that much of the superiority in performance exhibited by word embeddings is actually due to hyperparameter optimizations which, if applied to traditional models as well, can lead to equivalent outcomes. Therefore, we felt confident in resorting to count-based vectors as an equally reliable representation for the task at hand.
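The following sketch shows how sentence-level co-occurrence vectors of this kind could be collected from a tokenized corpus in which each target expression has already been merged into a single token. It is a simplified sketch under these assumptions, not the exact pipeline used to build our DSMs, and the function and variable names are illustrative.

    from collections import Counter
    import numpy as np

    def build_target_vectors(sentences, targets, context_words):
        # One vector per target expression, one dimension per context word;
        # cell (t, w) counts how often t and w occur in the same sentence
        col = {w: j for j, w in enumerate(context_words)}
        vectors = {t: np.zeros(len(context_words)) for t in targets}
        for sentence in sentences:          # each sentence is a list of tokens
            counts = Counter(w for w in sentence if w in col)
            for t in targets:
                if t in sentence:
                    for w, c in counts.items():
                        vectors[t][col[w]] += c
        return vectors

    # Toy usage: target expressions appear as single merged tokens
    sents = [["il", "ladro", "tagliare_la_corda", "subito"],
             ["leggere_un_libro", "in", "giardino"]]
    vecs = build_target_vectors(sents,
                                ["tagliare_la_corda", "leggere_un_libro"],
                                ["ladro", "subito", "giardino"])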
3.3   Gold standard idiomaticity judgments

In Senaldi et al. (2016b) and Senaldi et al. (2016a), we collected gold standard idiomaticity judgments for our target AN and VN constructions. Nine Linguistics students were presented with a list of our 26 AN constructions and were asked to evaluate how idiomatic each expression was from 1 to 7, with 1 standing for ‘totally compositional’ and 7 standing for ‘totally idiomatic’. Inter-coder agreement, measured with Krippendorff's α (Krippendorff, 2012), was equal to 0.76. The same procedure was repeated for our 90 VN constructions, but in this case the initial list was split into 3 sublists of 30 expressions, each one to be rated by 3 subjects. Krippendorff's α was 0.83 for the first sublist and 0.75 for the other two.

4   Classifier

We built a neural network composed of three "dense" or fully connected layers[1] of dimensionality 12, 8 and 1 respectively. Our network takes as input a single vector at a time, which can be a word embedding, a count-based distributional vector or a composition of several word vectors. For the core part of our experiment we used as input single distributional vectors of two-word expressions. Due to our input's magnitude, the most important reduction of data dimensionality is carried out by the first layer of our model. The last layer applies a sigmoid activation function to the output in order to produce a binary judgment. While binary scores are necessary to compute the model's classification accuracy and will be evaluated in terms of F1, our model's continuous scores can also be retrieved and will be used to perform an ordering task on the test set, which we will evaluate in terms of Interpolated Average Precision (IAP)[2] and against the human idiomaticity judgments with Spearman's ρ.

[1] We used Keras, a library running on TensorFlow (Abadi et al., 2016).
[2] Following Fazly et al. (2009), IAP was computed at recall levels of 20%, 50% and 80%.
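A minimal Keras sketch of a classifier with this architecture is given below. The layer sizes and the sigmoid output follow the description above; the hidden-layer activations, loss and optimizer are not specified in the text, so the choices here are assumptions.

    from keras.models import Sequential
    from keras.layers import Dense

    def build_classifier(input_dim):
        # Three dense layers of dimensionality 12, 8 and 1; the last layer
        # applies a sigmoid and outputs a continuous score in [0, 1]
        model = Sequential()
        model.add(Dense(12, activation="relu", input_dim=input_dim))  # activation assumed
        model.add(Dense(8, activation="relu"))                        # activation assumed
        model.add(Dense(1, activation="sigmoid"))
        model.compile(loss="binary_crossentropy",                     # loss/optimizer assumed
                      optimizer="adam", metrics=["accuracy"])
        return model

    # X_train: one 30,000-dimensional count vector per expression (or a
    # concatenation of word vectors); y_train: 1 = idiom, 0 = non-idiom
    model = build_classifier(input_dim=30000)
    # model.fit(X_train, y_train, epochs=20)
    # scores = model.predict(X_test)         # continuous scores, used for ranking
    # labels = (scores > 0.5).astype(int)    # binary judgments, used for F1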
5   Evaluation

We trained our model on the 30,000-dimensional distributional vectors of VN and AN expressions, as well as on the composition of their individual words' vectors. We also experimented with different semantic spaces: when trained on PPMI- (Church and Hanks, 1991) and SVD-transformed (Deerwester et al., 1990) vectors of 150, 200, 250 and 300 dimensions, our models performed comparably or even worse, so results for these settings are not presented here. Details of both the classification and the ordering task are shown in Table 1.

    Vector         Training    Test             IAP     rho        F1
    VN             15+15       30+30            0.82    0.50***    0.8
    VN             20+20       15+15            0.82    0.76***    0.87
    Concat (VN)    15+15       14+14            0.7     0.47*      0.69
    AN             8+8         6+4              1.0     0.93***    0.9
    VN+AN          23+23       14+14 (VN)       0.9     0.76***    0.82
    VN+AN          23+23       18+20 (joint)    0.8     0.64***    0.76
    VN+AN          23+23       5+5 (AN)         0.57    -0.31      0.58

Table 1: Interpolated Average Precision, Spearman's correlation with the speaker judgments and F-measure for Verb-Noun training (VN), Adjective-Noun training (AN), joint training and training through vector concatenation (** = p < .01, *** = p < .001). Training and test sets are expressed as the sum of positive and negative examples.

5.1   Verb-Noun

We ran our model on the VN dataset, composed of 90 elements: 45 idioms and 45 non-idiomatic expressions. This is the larger of the two datasets. We trained our model on both 30 and 40 elements for 20 epochs and tested on the remaining 60 and 50 elements respectively, reaching a maximum IAP of 0.87 and a Spearman's ρ of 0.76. In general we found the model's performance, both in accuracy and in correlation, comparable to the results reported in Senaldi et al. (2016b), who reached a maximum IAP of 0.91 and a maximum Spearman's ρ of -0.67.

5.2   Adjective-Noun

We ran our model on the AN dataset, composed of 26 elements: 13 idioms and 13 non-idiomatic expressions. We empirically found that our model was able to perform some generalization on the data when the training set contained at least 14 elements, evenly balanced between positive and negative examples. We trained our model on 16 elements for 30 epochs and tested on the remaining 10 elements. While the exact accuracy value can undergo some fluctuations when a model is trained on very small sets, we always registered accuracies higher than 80%, with 4 out of 5 idioms correctly labeled in every trial. We reached an IAP of 1.0 and a ρ of 0.93, although it is important to keep in mind that such scores are computed on a very restricted test set. Senaldi et al. (2016b) reached a maximum IAP of 0.85 and a maximum ρ of -0.68. When the training size was under the critical threshold, accuracy dropped significantly. With training sets of 10 or 12 elements, our model naturally overfitted, quickly reaching 100% accuracy on the training set and failing to correctly classify unseen expressions. In these cases a partial learning was still visible in the ordering task, where most idioms, even if labeled incorrectly, received higher scores than non-idioms.
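To make the evaluation protocol concrete, the sketch below shows one way the continuous scores can be scored: interpolated average precision computed at the recall levels given in footnote 2, and Spearman's ρ against the human idiomaticity ratings. It is a reconstruction with toy data, not the exact evaluation script used for the results above.

    import numpy as np
    from scipy.stats import spearmanr

    def interpolated_average_precision(scores, gold, recall_levels=(0.2, 0.5, 0.8)):
        # Rank expressions by decreasing score; for each recall level take the
        # highest precision reached at or beyond it, then average the three
        # values (cf. Fazly et al., 2009)
        order = np.argsort(scores)[::-1]
        gold = np.asarray(gold)[order]
        tp = np.cumsum(gold)
        precision = tp / np.arange(1, len(gold) + 1)
        recall = tp / gold.sum()
        return float(np.mean([precision[recall >= r].max() for r in recall_levels]))

    # scores: network outputs; gold: 1 = idiom, 0 = non-idiom;
    # ratings: mean human idiomaticity judgments (1-7) for the same items
    scores = np.array([0.9, 0.2, 0.7, 0.4])
    gold = [1, 0, 1, 0]
    ratings = [6.5, 2.0, 5.8, 3.1]
    print(interpolated_average_precision(scores, gold))
    print(spearmanr(scores, ratings).correlation)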


5.3   Joint training

We also tried to train our model on both datasets together, to check to what extent it would be able to recognize the same underlying semantic phenomenon across different syntactic constructions. We used two different approaches for this experiment. Training our model first on one dataset, e.g. the AN pairs, and then on the other required more epochs overall (more than 100) to stabilize and resulted in a poorer performance (66% F-measure on both test sets). When our model was trained on a mixed dataset containing the elements of both training sets, it needed only 12 epochs to reach an F-measure of 76% on the mixed test set. However, we also noticed that VN expressions were learned better than AN expressions. In short, our model was able to generalize over the two datasets, but this involved a loss in accuracy.

5.4   Vector composition

In addition to using the vector of an expression as a whole, we tried to feed our model with the concatenation of the vectors of the single words in an expression, as in Bizzoni et al. (2017). For example, instead of using the 30,000-dimensional vector of the expression cambiare musica, we used the 60,000-dimensional vector resulting from the concatenation of cambiare and musica. We ran this experiment only on the VN dataset, as it is the largest and the one that yielded the best results in the previous settings. We used 30 elements in training and 26 in testing and trained our model for 80 epochs overall. Predictably enough, vector composition resulted in the worst performance, differently from what happened with metaphors (Bizzoni et al., 2017); nonetheless, the results are not completely random: with an F1 of 69%, the model seems able to learn idiomaticity to a lower, but not null, degree. These findings would be in line with the claim that the meaning of the subparts of several idioms, while less important than in metaphors, is not completely obliterated (McGlone et al., 1994).
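For clarity, the sketch below contrasts the two input representations compared in this paper: the holistic vector of the expression versus the concatenation of its component word vectors. The lookup tables and helper names are hypothetical; only the dimensionalities follow the text.

    import numpy as np

    # Hypothetical lookup tables of 30,000-dimensional count vectors
    word_vectors = {"cambiare": np.zeros(30000), "musica": np.zeros(30000)}
    expression_vectors = {"cambiare musica": np.zeros(30000)}

    def holistic_input(expression):
        # Main setting: the expression's own vector, taken as a single token
        return expression_vectors[expression]

    def concatenated_input(expression):
        # Vector-composition setting (Section 5.4): concatenate the component
        # words' vectors into a 60,000-dimensional input
        return np.concatenate([word_vectors[w] for w in expression.split()])

    assert holistic_input("cambiare musica").shape == (30000,)
    assert concatenated_input("cambiare musica").shape == (60000,)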
6   Error Analysis

Two frequent false positives are tagliare il traguardo and abbassare la guardia. While we labeled them as non-idioms in our dataset, since they are rather compositional, they can nonetheless very often be used figuratively, and that is probably why our algorithm identified them as idioms. A frequent false negative was vedere la luce, which probably occurs more often in its literal sense in the corpus we used.

7   Discussion and Conclusions

It seems that the distribution of idiomatic and compositional expressions in large corpora can suffice for a supervised classifier to learn the difference between the two linguistic elements from small training sets and with a good level of accuracy. Unlike with metaphors (Bizzoni et al., 2017), feeding the classifier with a composition of the individual words' vectors of such expressions performs rather poorly and can be used to detect only some idioms. This takes us back to the core difference that while metaphors are more compositional and preserve a transparent mapping from source domain to target domain, idioms are by and large non-compositional. Since our classifiers rely only on contextual features, their classification ability must stem from a difference in distribution between idioms and non-idioms. A possible explanation is that while the literal expressions we selected, like vedere un film or ascoltare un discorso, tend to be used with animate subjects and thus to appear in more concrete contexts, most of our idioms (e.g. cadere dal cielo or lasciare il segno) allow for varying degrees of animacy or concreteness of the subject, and thus their contexts can easily become more diverse. At the same time, the drop in performance we observe in the joint models seems to indicate that the different parts of speech composing our elements entail a significant contextual difference between the two groups, which introduces a considerable amount of uncertainty in our model. It is also possible that other contextual elements we did not consider have played a role in the learning process of our models. We intend to investigate this aspect further in future work.

References

Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G. S., Davis, A., Dean, J., Devin, M., et al. (2016). TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467.

Baroni, M., Bernardini, S., Ferraresi, A., and Zanchetta, E. (2009). The WaCky wide web: a collection of very large linguistically processed web-crawled corpora. Language Resources and Evaluation, 43(3):209–226.

Baroni, M., Dinu, G., and Kruszewski, G. (2014). Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pages 238–247.

Bizzoni, Y., Chatzikyriakidis, S., and Ghanimifard, M. (2017). "Deep" learning: Detecting metaphoricity in adjective-noun pairs. In Proceedings of the Workshop on Stylistic Variation, pages 43–52.

Blacoe, W. and Lapata, M. (2012). A comparison of vector-based representations for semantic composition. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 546–556. Association for Computational Linguistics.

Bohrn, I. C., Altmann, U., and Jacobs, A. M. (2012). Looking at the brains behind figurative language: A quantitative meta-analysis of neuroimaging studies on metaphor, idiom, and irony processing. Neuropsychologia, 50(11):2669–2683.

Cacciari, C. and Glucksberg, S. (1991). Understanding idiomatic expressions: The contribution of word meanings. Advances in Psychology, 77:217–240.

Church, K. W. and Hanks, P. (1991). Word association norms, mutual information, and lexicography. Computational Linguistics, 16(1):22–29.

Cordeiro, S., Ramisch, C., Idiart, M., and Villavicencio, A. (2016). Predicting the compositionality of nominal compounds: Giving word embeddings a hard time. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, volume 1, pages 1986–1997.

Cruse, D. A. (1986). Lexical semantics. Cambridge University Press.

Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., and Harshman, R. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391–407.

Fazly, A., Cook, P., and Stevenson, S. (2009). Unsupervised type and token identification of idiomatic expressions. Computational Linguistics, 35(1):61–103.

Fazly, A. and Stevenson, S. (2008). A distributional account of the semantics of multiword expressions. Italian Journal of Linguistics, 20(1):157–179.
Gibbs, R. W. (1993). Why idioms are not dead metaphors. In Idioms: Processing, Structure, and Interpretation, pages 57–77.

He, X. and Liu, Y. (2017). Not enough data? Joint inferring multiple diffusion networks via network generation priors. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pages 465–474.

Klyueva, N., Doucet, A., and Straka, M. (2017). Neural networks for multi-word expression detection. In Proceedings of the 13th Workshop on Multiword Expressions, pages 60–65.

Krippendorff, K. (2012). Content analysis: An introduction to its methodology. Sage.

Krčmář, L., Ježek, K., and Pecina, P. (2013). Determining compositionality of expressions using various word space models and measures. In Proceedings of the Workshop on Continuous Vector Space Models and their Compositionality, pages 64–73.

Lakoff, G. and Johnson, M. (2008). Metaphors we live by. University of Chicago Press.

Legrand, J. and Collobert, R. (2016). Phrase representations for multiword expressions. In Proceedings of the 12th Workshop on Multiword Expressions, pages 67–71.

Lenci, A. (2008). Distributional semantics in linguistic and cognitive research. Italian Journal of Linguistics, 20(1):1–31.

Levy, O., Goldberg, Y., and Dagan, I. (2015). Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics, 3:211–225.

Lin, D. (1999). Automatic identification of non-compositional phrases. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, pages 317–324.

Liu, D. (2003). The most frequently used spoken American English idioms: A corpus analysis and its implications. TESOL Quarterly, 37(4):671–700.

McGlone, M. S., Glucksberg, S., and Cacciari, C. (1994). Semantic productivity and idiom comprehension. Discourse Processes, 17(2):167–190.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems 26, pages 3111–3119.

Mitchell, J. and Lapata, M. (2010). Composition in distributional models of semantics. Cognitive Science, 34(8):1388–1429.

Nunberg, G., Sag, I., and Wasow, T. (1994). Idioms. Language, 70(3):491–538.

Quartu, M. B. (1993). Dizionario dei modi di dire della lingua italiana. RCS Libri.

Rimell, L., Maillard, J., Polajnar, T., and Clark, S. (2016). RELPRON: A relative clause evaluation data set for compositional distributional semantics. Computational Linguistics, 42(4):661–701.

Sag, I. A., Baldwin, T., Bond, F., Copestake, A., and Flickinger, D. (2002). Multiword expressions: A pain in the neck for NLP. In Proceedings of the 3rd International Conference on Intelligent Text Processing and Computational Linguistics, pages 1–15.

Senaldi, M. S. G., Lebani, G. E., and Lenci, A. (2016a). Determining the compositionality of noun-adjective pairs with lexical variants and distributional semantics. In Proceedings of the Third Italian Conference on Computational Linguistics (CLiC-it 2016), pages 268–273.

Senaldi, M. S. G., Lebani, G. E., and Lenci, A. (2016b). Lexical variability and compositionality: Investigating idiomaticity with distributional semantic models. In Proceedings of the 12th Workshop on Multiword Expressions, pages 21–31.

Tanguy, L., Sajous, F., Calderone, B., and Hathout, N. (2012). Authorship attribution: Using rich linguistic features when training data is scarce. In PAN Lab at CLEF.

Torre, E. (2014). The emergent patterns of Italian idioms: A dynamic-systems approach. PhD thesis, Lancaster University.

Turney, P. D. and Pantel, P. (2010). From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37:141–188.

Wulff, S. (2010). Rethinking Idiomaticity: A Usage-based Approach. A&C Black.