Affect Enriched Word Embeddings for News Information Retrieval

Tommaso Teofili                     Niyati Chhaya
Adobe                               Adobe
teofili@adobe.com                   nchhaya@adobe.com

Copyright © 2019 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors. In: A. Aker, D. Albakour, A. Barrón-Cedeño, S. Dori-Hacohen, M. Martinez, J. Stray, S. Tippmann (eds.): Proceedings of the NewsIR'19 Workshop at SIGIR, Paris, France, 25-July-2019, published at http://ceur-ws.org

Abstract

Distributed representations of words have been shown to be useful for improving the effectiveness of IR systems in many sub-tasks such as query expansion, retrieval and ranking. Algorithms like word2vec, GloVe and others are also key factors in many improvements in different NLP tasks. One common issue with such embedding models is that words like happy and sad appear in similar contexts and hence are wrongly clustered close together in the embedding space. In this paper we leverage Aff2Vec, a set of word embedding models which include affect information, in order to better capture the affect aspect of news text and achieve better results in information retrieval tasks; such embeddings are also less affected by the synonym/antonym issue. We evaluate their effectiveness on two IR-related tasks (query expansion and ranking) over the New York Times dataset (TREC Core '17), comparing them against other word-embedding-based models and classic ranking models.
1    Introduction

Distributed representations of words, also known as word embeddings, have played a key role in various downstream NLP tasks. Such vector representations place vectors of semantically similar words close together in the embedding space, allowing for efficient and effective estimation of word similarity. Word2vec [MCCD13] and GloVe [PSM14] are among the most widely adopted word embedding models because of their effectiveness in capturing word semantics. One of the advantages of using word embeddings in information retrieval is that they are more effective in capturing query intent and document topics than the local vector representations traditionally used in IR (like TF-iDF vectors). Text tokens in IR do not always overlap with exact words; tokens often coincide with subwords (e.g. generated by stemmers), ngrams, shingles, etc. Therefore word embeddings are also often referred to as term embeddings in the context of IR. Term embeddings can be used to rank queries and documents: in such a context a dense vector representation for the query is derived and scored against corresponding dense vector representations for the documents in the IR system. Query and document vector representations are generated by aggregating the term or word embeddings associated with their respective text terms from the query and document texts. Word embeddings can also be used in the query expansion task: term embeddings are used in this context to find good expansion candidates from a global vocabulary of terms (by comparing word vectors), and such enriched queries are used to retrieve documents.

Most recent well-performing word embedding models are generated in an unsupervised manner by learning word representations from their surrounding contexts. However, one issue with word embeddings is that words with roughly opposite meanings can have very similar contexts, so that, for example, 'happy' and 'sad' may lie closer than they should in the embedding space; see related efforts in [CLC+15] and [NWV16]. In order to mitigate this semantic understanding issue, we propose to use affect-enriched word embedding models (also known as Aff2Vec [KCC18]) for IR tasks, as they outperform baseline word embedding models on word-similarity tasks and sentiment analysis. Our contribution is the usage of Aff2Vec models as term embeddings for information retrieval in the news domain. Beyond the synonym-antonym issue, we expect Aff2Vec models to work well for news IR because of their capability of better capturing writers' affective attitude towards articles' text (see Section 1.1). We present experiments against standard IR datasets, empirically establishing the utility of the proposed approach.
1.1   Affect scores in news datasets

In order to assess the potential applicability of Aff2Vec embeddings in the context of information retrieval, we run a preliminary evaluation of the amount of formality, politeness and frustration contained in common text collections used in information retrieval experiments. For this purpose we leverage the affect scoring algorithm that is used for building Aff2Vec embeddings, and extract mean affect scores for formality, politeness and frustration on each dataset. The evaluation involves two collections of news: the dataset from the TREC Core 2018 track (Washington Post articles) and the dataset from the TREC Core 2017 track (New York Times articles). We also extract affect scores from the ClueWeb09 dataset [CHYZ09], containing the text of HTML pages crawled from the Web, and from the CACM dataset, a collection of titles and abstracts from the CACM journal. Results are reported in Table 1.
               Dataset affect scoring
 Dataset       formality   politeness   frustration
 NYT           0.7087      0.6291       0.6248
 WP            0.7788      0.7456       0.6510
 CACM          0.3619      0.1229       0.3511
 ClueWeb09     0.4319      0.2708       0.6216

Table 1: Mean affect scores on some common IR datasets
The scores for formality, politeness and frustration extracted on the New York Times and Washington Post articles are generally higher than the ones extracted for the CACM and ClueWeb09 datasets, except for the frustration score reported for ClueWeb09, which is very close to the frustration score extracted for NYT articles. These results suggest that Aff2Vec embeddings should work well on the news domain, as they are built to appropriately capture such affective aspects of information.
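We reuse the Aff2Vec scoring algorithm as-is and do not reproduce it here; purely as an illustration of the kind of lexicon-based averaging involved, a mean affect score for a collection could be sketched as follows. The lexicon entries and values below are placeholders, not the actual resource used for Table 1.

    from collections import defaultdict

    # Hypothetical lexicon: word -> affect dimension -> score in [0, 1].
    # Placeholder entries only; the real scores come from the lexicon used by
    # the Aff2Vec scoring algorithm, which is not reproduced here.
    AFFECT_LEXICON = {
        "please": {"formality": 0.8, "politeness": 0.9, "frustration": 0.1},
        "broken": {"formality": 0.4, "politeness": 0.3, "frustration": 0.8},
    }

    def mean_affect_scores(documents, lexicon=AFFECT_LEXICON):
        """Average each affect dimension over all lexicon words found in a collection."""
        totals, counts = defaultdict(float), defaultdict(int)
        for doc in documents:
            for token in doc.lower().split():
                for dim, value in lexicon.get(token, {}).items():
                    totals[dim] += value
                    counts[dim] += 1
        return {dim: totals[dim] / counts[dim] for dim in totals}

    docs = ["please review the attached report", "the build is broken again, please fix it"]
    print(mean_affect_scores(docs))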
2     Related work

Dict2vec [TGH17b] builds word embeddings using online dictionaries and optimizes an objective function where each word embedding is built via positive sampling of strongly correlated words and negative sampling of weakly correlated ones [TGH17a]. In [ZC17], embeddings are optimized using different objective functions in a supervised manner, based on lists of queries and related relevant and non-relevant results. In [FFJ+16], word vectors in combination with bilingual dictionaries are used to extract synonyms so that they can be used to expand queries. Documents are represented as bags of vectors generated as mixtures of distributions in [RPMG16]. Efforts like [CLC+15] and [NWV16] are related to our work in that they can be incorporated into the usage of term embeddings in IR tasks. For our ranking scenario, [RGMJ16] is relevant, as documents and queries are represented by mixtures of Gaussians over word embeddings, each Gaussian centered around a centroid learned via e.g. a k-means algorithm. The likelihood of a query with respect to a document is measured by the distance of the query vector from each centroid the document belongs to, using centroid similarity or average inter-similarity.

2.1   Aff2Vec: Affect-enriched embeddings [KCC18]

Word representations have historically only captured semantic or contextual information, ignoring other subtle word relationships such as differences in sentiment. Affect refers to the experience of an emotion or feeling [Pic97]. Words such as 'glad', 'awesome', 'happy', 'disgust' or 'sad' can be referred to as affective words. Aff2Vec introduces a post-training approach that injects 'emotion'-sensitivity, or affect information, into word embeddings. Aff2Vec leverages existing affect lexicons such as Warriner's lexicon [WKB13], which has a list of nearly 14,000 English words tagged with valence (V), arousal (A), and dominance (D) scores. The affect-enriched embeddings introduced by Aff2Vec are either built on top of vanilla word embeddings, i.e. word2vec, GloVe, or paragram, or introduced along with counterfitting [MOT+16] or retrofitting [FDJ+15]. In this work, we also leverage these enriched vector spaces in order to evaluate their performance for standard IR tasks, namely query expansion and ranking.
3     Word embeddings for query expansion

We leverage word embeddings to perform query expansion in a way similar to [RPMG16]. For each query term q contained in the query text Q, the word embedding model is used to fetch the nearest neighbour w_e of w_q in the embedding space such that cos(w_e, w_q) > t, where t is the minimum allowed cosine similarity between two embeddings to consider the word e associated with the vector w_e a good expansion for the word q associated with the query term vector w_q. Upon successful retrieval of an expansion for at least one term q in a query, a new "alternative" query A, where q is substituted by e, is created. Consequently, the query executed on the IR system becomes a boolean query of the form Q OR A. If more than one query term has a valid expansion fetched from the embedding model, all possible combinations of query terms and their expansion terms are generated. For example, given the query "recent research about AI", if the term embeddings output nearest(recent) = latest with cos(recent, latest) = 0.8, greater than the threshold of 0.75, the output query will be composed of two optional clauses: "recent research about AI" OR "latest research about AI".
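A minimal sketch of this expansion step is given below, assuming an in-memory dictionary of word vectors (in our experiments the vectors come from the pretrained models listed in Section 5); the helper names and the toy vectors are illustrative only and not part of any toolkit we used.

    import itertools
    import numpy as np

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def nearest_expansion(term, vectors, t=0.75):
        """Return the nearest neighbour of `term` whose cosine similarity exceeds t, or None."""
        if term not in vectors:
            return None
        best_word, best_sim = None, t
        for word, vec in vectors.items():
            if word == term:
                continue
            sim = cosine(vectors[term], vec)
            if sim > best_sim:
                best_word, best_sim = word, sim
        return best_word

    def expand_query(query, vectors, t=0.75):
        """Build a boolean OR query out of the original terms and their expansions."""
        options = []
        for term in query.split():
            expansion = nearest_expansion(term, vectors, t)
            options.append([term, expansion] if expansion else [term])
        # All combinations of original and expanded terms; the original query comes first.
        alternatives = (" ".join(combo) for combo in itertools.product(*options))
        return " OR ".join('"{}"'.format(alt) for alt in alternatives)

    rng = np.random.default_rng(0)
    vectors = {w: rng.normal(size=50) for w in ["recent", "research", "about", "ai"]}
    vectors["latest"] = vectors["recent"] + 0.05 * rng.normal(size=50)  # a close neighbour
    print(expand_query("recent research about ai", vectors))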
4   Word embeddings for ranking

In order to use word embedding models for ranking, we chose the averaging word embeddings approach (also known as AWE). Each document and query vector is calculated by averaging the word vectors related to each word in the document and query texts. The query/document score is measured by the cosine similarity between their respective averaged vectors, as in other research works like [MNCC16, RGMJ16, RMLH17, GSS17]. In our experiments we used each word's TF-iDF vector to normalize (divide) the averaged word embedding for query and document vectors. We observed that using this technique to smooth the sum of the word vectors, instead of just dividing it by the number of words (mean), resulted in better ranking results. This seems in line with the findings from [SLMJ15], which indicate that cosine similarity may be polluted by term frequencies when comparing word embeddings.
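A minimal sketch of plain AWE scoring (mean of the word vectors followed by cosine similarity) is given below; the TF-iDF based normalization described above is omitted, and the vector dictionary is assumed to be loaded from one of the pretrained models.

    import numpy as np

    def awe_vector(text, vectors, dim=300):
        """Average of the word vectors for the tokens present in the embedding model."""
        found = [vectors[tok] for tok in text.lower().split() if tok in vectors]
        return np.mean(found, axis=0) if found else np.zeros(dim)

    def awe_score(query, document, vectors, dim=300):
        """Cosine similarity between the averaged query and document vectors."""
        q, d = awe_vector(query, vectors, dim), awe_vector(document, vectors, dim)
        denom = np.linalg.norm(q) * np.linalg.norm(d)
        return float(np.dot(q, d) / denom) if denom > 0 else 0.0

    def rank(query, documents, vectors, dim=300):
        """Documents sorted by decreasing AWE score against the query."""
        return sorted(documents, key=lambda doc: awe_score(query, doc, vectors, dim), reverse=True)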
                                                           GloVe-retrofitted            0.4216  0.1861
5   Experiments                                            paragram-counterfit          0.3840  0.1580
                                                           paragram-74627               0.4337  0.1937
We compare the usage of Aff2Vec word embeddings            paragram-retrofitted         0.3969  0.1703
in the ranking and query expansion task against both       paragram-retrofitted-74627 0.3963    0.1698
vanilla embedding models (like word2vec and GloVe)         w2v-76427                    0.4328  0.1969
and enriched models like Dict2vec models [TGH17a].         w2v-counterfit-header        0.3972  0.1721
We also present experiments with variants in Aff2Vec:      w2v-retrofitted              0.4341  0.1914
counterfitted and retrofitted models with enriched af-            AFFECT ENRICHED MODELS
fect information. All the models used in our exper-        counterfit-GloVe-affect      0.4311  0.1753
iments are pretrained. To setup our evaluations we         GloVe-affect                 0.4594  0.1926
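The tables below report NDCG and mean average precision (MAP). For reference, a simplified sketch of how the two measures can be computed for a single ranked list is given here, with made-up relevance judgments; the numbers in the tables were produced by the standard evaluation tooling, not by this snippet.

    import math

    def dcg(relevances):
        """Discounted cumulative gain of a ranked list of relevance grades."""
        return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

    def ndcg(relevances):
        """DCG normalized by the DCG of the ideally reordered list."""
        ideal = dcg(sorted(relevances, reverse=True))
        return dcg(relevances) / ideal if ideal > 0 else 0.0

    def average_precision(relevances, total_relevant=None):
        """Average precision for binary relevance; total_relevant defaults to the
        number of relevant documents observed in the ranked list."""
        hits, precision_sum = 0, 0.0
        for rank, rel in enumerate(relevances, start=1):
            if rel:
                hits += 1
                precision_sum += hits / rank
        denom = total_relevant if total_relevant else hits
        return precision_sum / denom if denom else 0.0

    # Relevance of the top five retrieved documents for one hypothetical query.
    print(ndcg([1, 0, 1, 0, 0]), average_precision([1, 0, 1, 0, 0]))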
5.1   Results

Table 2 shows the performance of ranking experiments on the NYT dataset using different embeddings. We observe that the usage of term embeddings does not bring benefits in many cases: the classic BM25 and query likelihood retrieval models provide better NDCG than almost all the models except the affect-enriched ones. A GloVe retrofitted affect-enriched embedding model is the top performing one on the NDCG measure. On the other hand, none of the term embedding rankings could outperform BM25 on the mean average precision measure.

          Ranking experiments on NYT
 Model                        NDCG     MAP
 BM25                         0.4334   0.1977
 QL                           0.4325   0.1913
           NON ENRICHED MODELS
 GloVe                        0.4292   0.1883
 GloVe.42B.300d               0.4003   0.1690
 GloVe.6B.100d                0.4291   0.1911
 GloVe.6B.200d                0.4314   0.1964
 GloVe.6B.300d                0.4316   0.1946
 GloVe.6B.50d                 0.4078   0.1760
 GloVe-Twitter-100            0.4212   0.1849
 GloVe-Twitter-200            0.4242   0.1873
 GloVe-Twitter-50             0.4128   0.1798
 GloVe-Twitter-25             0.3541   0.1377
 w2v-GoogleNews-300           0.4294   0.1922
 dict2vec-dim100              0.4101   0.1885
 dict2vec-dim200              0.4155   0.1891
 dict2vec-dim300              0.4151   0.1899
             ENRICHED MODELS
 counterfit-GloVe             0.3980   0.1720
 GloVe-retrofitted            0.4216   0.1861
 paragram-counterfit          0.3840   0.1580
 paragram-74627               0.4337   0.1937
 paragram-retrofitted         0.3969   0.1703
 paragram-retrofitted-74627   0.3963   0.1698
 w2v-76427                    0.4328   0.1969
 w2v-counterfit-header        0.3972   0.1721
 w2v-retrofitted              0.4341   0.1914
        AFFECT ENRICHED MODELS
 counterfit-GloVe-affect      0.4311   0.1753
 GloVe-affect                 0.4594   0.1926
 GloVe-retrofitted-affect-555 0.4693   0.1948
 paragram-affect              0.4619   0.1969
 paragram-counterfit-affect   0.4339   0.1788
 w2v-affect                   0.4592   0.1926
 w2v-counterfit-affect        0.4309   0.1766
 w2v-retrofitted-affect       0.4601   0.1911

Table 2: Ranking experiments on TREC Core '17
Table 3 shows the performance of query expansion experiments on the NYT dataset using different embeddings. We observe that the classic BM25 and query likelihood retrieval models provide better NDCG than almost all the models except some of the affect-enriched ones. This is in line with what we observed for the ranking task on the same dataset. A GloVe retrofitted affect-enriched embedding model is the top performing one for both NDCG and MAP.

      Query expansion experiments on NYT
 Model                        MAP      NDCG
 BM25                         0.1977   0.4334
 QL                           0.1913   0.4325
           NON ENRICHED MODELS
 GloVe                        0.1951   0.4337
 GloVe.42B.300d               0.1947   0.4308
 GloVe.6B.100d                0.1903   0.4291
 GloVe.6B.200d                0.1947   0.4308
 GloVe.6B.300d                0.1947   0.4308
 GloVe.6B.50d                 0.1799   0.4119
 GloVe-Twitter-100            0.1863   0.4218
 GloVe-Twitter-200            0.1863   0.4218
 GloVe-Twitter-25             0.1391   0.3488
 GloVe-Twitter-50             0.1812   0.4147
 w2v-GoogleNews-300           0.1947   0.4308
 dict2vec-dim100              0.1995   0.4335
 dict2vec-dim200              0.1959   0.4315
 dict2vec-dim300              0.1957   0.4315
 WordNet                      0.1977   0.4334
             ENRICHED MODELS
 counterfit-GloVe             0.1801   0.4027
 GloVe-retrofitted            0.1940   0.4264
 paragram-counterfit          0.1663   0.3906
 paragram-74627               0.2005   0.4365
 paragram-retrofitted         0.1798   0.4012
 paragram-retrofitted-74627   0.1798   0.4012
 w2v-76427                    0.1964   0.4318
 w2v-counterfit-header        0.1734   0.3991
 w2v-retrofitted              0.1967   0.4368
        AFFECT ENRICHED MODELS
 GloVe-affect                 0.1947   0.4308
 counterfit-GloVe-affect      0.1810   0.4044
 GloVe-retrofitted-affect-555 0.2021   0.4421
 paragram-affect              0.1977   0.4309
 paragram-counterfit-affect   0.1844   0.4094
 w2v-affect                   0.1940   0.4305
 w2v-counterfit-affect        0.1762   0.4029
 w2v-retrofitted-affect       0.1971   0.4345

Table 3: Query expansion experiments on TREC Core '17

Table 4 shows the performance of ranking experiments on the CACM dataset using different embeddings. We observe that the usage of term embeddings generally leads to steadily higher NDCG and MAP. In particular, the paragram embedding models report the best results, with affect-enriched paragram embeddings reporting both the best NDCG and MAP, 0.02 better than the non affect-enriched paragram embedding results in both NDCG and MAP.

       Ranking experiments on CACM
 Model                        NDCG     MAP
 BM25                         0.3805   0.1947
 QL                           0.3621   0.2056
           NON ENRICHED MODELS
 GloVe.42B.300d               0.3638   0.2007
 GloVe.6B.100d                0.4440   0.2722
 GloVe.6B.200d                0.4452   0.2732
 GloVe.6B.300d                0.4450   0.2730
 GloVe.6B.50d                 0.4437   0.2720
 GloVe-Twitter-100            0.5109   0.3260
 GloVe-Twitter-200            0.5138   0.3292
 GloVe-Twitter-25             0.5309   0.3217
 GloVe-Twitter-50             0.4682   0.2715
 w2v-GoogleNews-300           0.3697   0.1960
 GloVe                        0.4483   0.2760
             ENRICHED MODELS
 counterfit-GloVe             0.4563   0.2680
 GloVe-retrofitted            0.4507   0.2787
 w2v-76427                    0.4920   0.3033
 w2v-counterfit-header        0.4085   0.2225
 w2v-retrofitted              0.3993   0.2350
 paragram-counterfit          0.5675   0.3722
 paragram-74627               0.5539   0.3541
 paragram-retrofitted         0.5263   0.3467
 paragram-retrofitted-74627   0.5380   0.3633
        AFFECT ENRICHED MODELS
 counterfit-GloVe-affect      0.4247   0.2383
 GloVe-affect                 0.4326   0.2553
 w2v-affect                   0.3900   0.2080
 w2v-counterfit-affect        0.3791   0.2006
 w2v-retrofitted-affect       0.3555   0.1986
 paragram-affect              0.5848   0.3986
 paragram-counterfit-affect   0.5860   0.3996

Table 4: Ranking experiments on CACM
Table 5 shows the performance of query expansion experiments on the CACM dataset using different embeddings. We observe that the usage of term embeddings generally leads to steadily higher NDCG and MAP. While we expected the best results from the Aff2Vec models, it turned out that the "vanilla" word2vec model trained on the Google News corpus outperformed all the others in NDCG and MAP. On the other hand, the best performing enriched model is a retrofitted word2vec model, whereas among the affect-enriched models the GloVe retrofitted one provides the best results.
    Query expansion experiments on CACM
 Model                        NDCG     MAP
 BM25                         0.3805   0.1947
 QL                           0.3621   0.2056
           NON ENRICHED MODELS
 WordNet                      0.4014   0.2146
 GloVe.42B.300d               0.4657   0.2701
 GloVe.6B.100d                0.4646   0.2635
 GloVe.6B.200d                0.4633   0.2631
 GloVe.6B.300d                0.4724   0.2707
 GloVe.6B.50d                 0.4575   0.2588
 GloVe-Twitter-100            0.4500   0.2576
 GloVe-Twitter-200            0.4454   0.2524
 GloVe-Twitter-25             0.4215   0.2373
 GloVe-Twitter-50             0.4422   0.2528
 w2v-GoogleNews-300           0.4824   0.2828
 GloVe                        0.4635   0.2685
             ENRICHED MODELS
 counterfit-GloVe             0.4622   0.2661
 GloVe-retrofitted            0.4676   0.2723
 w2v-76427                    0.4366   0.2518
 w2v-counterfit-header        0.4557   0.2629
 w2v-retrofitted              0.4738   0.2816
 paragram-counterfit          0.4661   0.2716
 paragram-74627               0.4626   0.2712
 paragram-retrofitted         0.4470   0.2636
 paragram-retrofitted-74627   0.4486   0.2646
        AFFECT ENRICHED MODELS
 counterfit-GloVe-affect      0.4622   0.2673
 GloVe-affect                 0.4694   0.2734
 GloVe-retrofitted-affect-555 0.4722   0.2799
 w2v-affect                   0.4609   0.2643
 w2v-counterfit-affect        0.4579   0.2674
 w2v-retrofitted-affect       0.4667   0.2744
 paragram-affect              0.4426   0.2586
 paragram-counterfit-affect   0.4634   0.2723

Table 5: Query expansion experiments on CACM
6     Conclusions

We present extensive experiments to evaluate the impact of affect-enriched word embeddings for information retrieval over a news corpus, namely for ranking and query expansion implemented using open-source toolkits. We show that using affect-enriched models yields a significant improvement for ranking against baseline/vanilla embeddings (~20%) as well as against other enriched embeddings (~2-10%). In the case of query expansion, an improvement is observed for the NYT dataset, but vanilla embeddings (word2vec trained on Google News) report the highest values for the CACM dataset. We believe the semantic structure and vocabulary distribution of the CACM dataset results in this behavior. We plan to extend this work first towards understanding the role of semantic information in expansion tasks, and then towards building fusion approaches leveraging enriched word vectors with standard IR baselines.
References

[AHK+17] James Allan, Donna Harman, Evangelos Kanoulas, Dan Li, Christophe Van Gysel, and Ellen Voorhees. TREC 2017 common core track overview. In Proc. TREC, 2017.

[AMH+17] Leif Azzopardi, Yashar Moshfeghi, Martin Halvey, Rami S Alkhawaldeh, Krisztian Balog, Emanuele Di Buccio, Diego Ceccarelli, Juan M Fernández-Luna, Charlie Hull, Jake Mannix, et al. Lucene4IR: Developing information retrieval evaluation resources using Lucene. In ACM SIGIR Forum, volume 50, pages 58–75. ACM, 2017.

[BMII12] Andrzej Bialecki, Robert Muir, Grant Ingersoll, and Lucid Imagination. Apache Lucene 4. In SIGIR 2012 Workshop on Open Source Information Retrieval, page 17, 2012.

[CHYZ09] Jamie Callan, Mark Hoy, Changkuk Yoo, and Le Zhao. ClueWeb09 data set, 2009.

[CLC+15] Zhigang Chen, Wei Lin, Qian Chen, Xiaoping Chen, Si Wei, Hui Jiang, and Xiaodan Zhu. Revisiting word embedding for contrasting meaning. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), volume 1, pages 106–115, 2015.

[FDJ+15] Manaal Faruqui, Jesse Dodge, Sujay Kumar Jauhar, Chris Dyer, Eduard Hovy, and Noah A Smith. Retrofitting word vectors to semantic lexicons. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1606–1615, 2015.

[FFJ+16] Linnea Fornander, Marc Friberg, Vida Johansson, V Lindh-Håård, Pontus Ohlsson, and Ida Palm. Generating synonyms using word vectors and an easy-to-read corpus. 2016.

[Fox83] Edward A Fox. Characterization of two new experimental collections in computer and information science containing textual and bibliographic concepts. Technical report, Cornell University, 1983.

[GSS17] Lukas Galke, Ahmed Saleh, and Ansgar Scherp. Word embeddings for practical information retrieval. In 47. Jahrestagung der Gesellschaft für Informatik, Informatik 2017, Chemnitz, Germany, September 25-29, 2017, pages 2155–2167, 2017.

[KCC18] Sopan Khosla, Niyati Chhaya, and Kushal Chawla. Aff2Vec: Affect-enriched distributional word representations. In Proceedings of the 27th International Conference on Computational Linguistics, pages 2204–2218, 2018.

[MCCD13] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.

[Mil95] George A Miller. WordNet: a lexical database for English. Communications of the ACM, 38(11):39–41, 1995.

[MNCC16] Bhaskar Mitra, Eric Nalisnick, Nick Craswell, and Rich Caruana. A dual embedding space model for document ranking. arXiv preprint arXiv:1602.01137, 2016.

[MOT+16] Nikola Mrkšić, Diarmuid Ó Séaghdha, Blaise Thomson, Milica Gašić, Lina Rojas-Barahona, Pei-Hao Su, David Vandyke, Tsung-Hsien Wen, and Steve Young. Counter-fitting word vectors to linguistic constraints. In Proceedings of NAACL-HLT, pages 142–148, 2016.

[NWV16] Kim Anh Nguyen, Sabine Schulte im Walde, and Ngoc Thang Vu. Integrating distributional lexical contrast into word embeddings for antonym-synonym distinction. arXiv preprint arXiv:1605.07766, 2016.

[Pic97] Rosalind W. Picard. Affective Computing. MIT Press, Cambridge, MA, USA, 1997.

[PSM14] Jeffrey Pennington, Richard Socher, and Christopher Manning. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, 2014.

[RGMJ16] Dwaipayan Roy, Debasis Ganguly, Mandar Mitra, and Gareth JF Jones. Representing documents and queries as sets of word embedded vectors for information retrieval. arXiv preprint arXiv:1606.07869, 2016.

[RMLH17] Navid Rekabsaz, Bhaskar Mitra, Mihai Lupu, and Allan Hanbury. Toward incorporation of relevant documents in word2vec. arXiv preprint arXiv:1707.06598, 2017.

[RPMG16] Dwaipayan Roy, Debjyoti Paul, Mandar Mitra, and Utpal Garain. Using word embeddings for automatic query expansion. arXiv preprint arXiv:1606.07608, 2016.

[SLMJ15] Tobias Schnabel, Igor Labutov, David Mimno, and Thorsten Joachims. Evaluation methods for unsupervised word embeddings. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 298–307, 2015.

[TGH17a] Julien Tissier, Christophe Gravier, and Amaury Habrard. Dict2vec: Learning word embeddings using lexical dictionaries. In Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), pages 254–263, 2017.

[TGH17b] Julien Tissier, Christophe Gravier, and Amaury Habrard. Dict2vec: Learning word embeddings using lexical dictionaries. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 254–263. Association for Computational Linguistics, 2017.

[WKB13] Amy Beth Warriner, Victor Kuperman, and Marc Brysbaert. Norms of valence, arousal, and dominance for 13,915 English lemmas. Behavior Research Methods, 45(4):1191–1207, 2013.

[YFL17] Peilin Yang, Hui Fang, and Jimmy Lin. Anserini: Enabling the use of Lucene for information retrieval research. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 1253–1256. ACM, 2017.

[ZC17] Hamed Zamani and W Bruce Croft. Relevance-based word embedding. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 505–514. ACM, 2017.