<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Generalizing Representations of Lexical Semantic Relations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anupama Chingacham</string-name>
          <email>anu.vgopal2009@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Denis Paperno</string-name>
          <email>denis.paperno@loria.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>CNRS, LORIA, UMR 7503, Vandœuvre-lès-Nancy</institution>
          ,
          <addr-line>F-54500</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>SFB 1102, Saarland University</institution>
          ,
<addr-line>Saarbrücken, 66123</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
<p>We propose a new method for unsupervised learning of embeddings for lexical relations in word pairs. The model is trained on predicting the contexts in which a word pair appears together in corpora, then generalized to account for new and unseen word pairs. This allows us to overcome the data sparsity issues inherent in existing relation embedding learning setups without the need to go back to the corpora to collect additional data for new pairs.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Italiano. Proponiamo un nuovo metodo
per l’apprendimento non
supervisionato delle rappresentazioni delle relazioni
lessicali fra coppie di parole (word pair
embeddings). Il modello viene allenato
a prevedere i contesti in cui compare uns
coppia di parole, e successivamente viene
generalizzato a coppie di parole nuove o
non attestate. Questo ci consente di
superare i problemi dovuti alla scarsita` di
dati tipica dei sistemi di apprendimento
di rappresentazioni, senza la necessita` di
tornare ai corpora per raccogliere dati per
nuove coppie di parole.</p>
    </sec>
    <sec id="sec-2">
      <title>1 Introduction</title>
      <p>In this paper we address the problem of
unsupervised learning of lexical relations between any two
words. We take the approach of unsupervised
representation learning from distributional data in corpora,
familiar from word embedding methods, and
enhance it with an additional technique to
overcome data sparsity.</p>
      <p>
Word embedding models promise to learn word meaning from easily available text
data in an unsupervised fashion, and indeed the
resulting vectors contain a great deal of information about
the semantic properties of words and the objects they
refer to; cf. for instance Herbelot and Vecchi
(2015). Based on the distributional hypothesis
formulated by Z. S. Harris (1954), word embedding
models, which construct word meaning
representations as numeric vectors from the
co-occurrence statistics of a word's contexts, have
been gaining ground due to their quality and
simplicity. Produced by efficient and robust
implementations such as word2vec
        <xref ref-type="bibr" rid="ref21">(Mikolov et al.,
2013)</xref>
        and GloVe
        <xref ref-type="bibr" rid="ref27">(Pennington et al., 2014)</xref>
        ,
modern word vector models are able to predict whether
two words are related in meaning, reaching human
performance on benchmarks like WordSim353
        <xref ref-type="bibr" rid="ref1">(Agirre et al., 2009)</xref>
        and MEN
        <xref ref-type="bibr" rid="ref8">(Bruni et al., 2014)</xref>
        .
      </p>
      <p>
        On the other hand, lexical knowledge includes
not only properties of individual words but also
relations between words. To some extent, lexical
semantic relations can be recovered from the word
representations via the vector offset method, as
evidenced by various applications including analogy
solving, but already on this task it has multiple
drawbacks
        <xref ref-type="bibr" rid="ref20">(Linzen, 2016)</xref>
        and has a better
unsupervised alternative
        <xref ref-type="bibr" rid="ref19 ref27 ref31 ref6 ref8">(Levy and Goldberg, 2014)</xref>
        .
      </p>
      <p>Just like a word representation is inferred from
the contexts in which the word occurs,
information about the relation in a given word pair can be
extracted from the statistics of contexts in which
the two words of the pair appear together. In our
model, we use this principle to learn high-quality
pair embeddings from frequent noun pairs, and on
their basis, build a way to construct a relation
representation for an arbitrary pair.</p>
      <p>Note that we approach the problem from the
viewpoint of learning general-purpose semantic
knowledge. Our goal is to provide a vector
representation for an arbitrary pair of words w1, w2.
This is a more general task than relation
extraction, which aims at identifying the semantic
relation between the two words in a particular
context. Modeling such general relational knowledge
is crucial for natural language understanding in
realistic settings. It may be especially useful for
recovering the notoriously difficult bridging
relations in discourse since they involve understanding
implicit links between words in the text.</p>
      <p>
        Representations of word relations have
applications in many NLP tasks. For example, they could
be extremely useful for resolving bridging,
especially of the lexical type
        <xref ref-type="bibr" rid="ref30">(Ro¨siger et al., 2018)</xref>
        .
But in order to be useful in practice, word relation
models must generalize to rare or unseen cases.
      </p>
    </sec>
    <sec id="sec-3">
<title>2 Related Work</title>
      <p>
        Our project is related to the task of relation
extraction that has been the focus of various
complex models
        <xref ref-type="bibr" rid="ref22 ref36">(Mintz et al., 2009; Zelenko et al.,
2003)</xref>
        including recurrent
        <xref ref-type="bibr" rid="ref34">(Takase et al., 2016)</xref>
        and
convolutional neural network architectures
        <xref ref-type="bibr" rid="ref10 ref14 ref16 ref18 ref23 ref26 ref32 ref35 ref35">(Xu et
al., 2015; Nguyen and Grishman, 2015; Zeng et
al., 2014)</xref>
        , although the simple averaging or
summation of the context word vectors seems to
produce good results for the task
        <xref ref-type="bibr" rid="ref10 ref14">(Fan et al., 2015;
Hashimoto et al., 2015)</xref>
        . The latter work by
Hashimoto et al. bears the greatest resemblance
to the approach to learning semantic relation
representations that we utilize here. Hashimoto et
al. train noun embeddings on the task of
predicting words occurring in between the two nouns in
text corpora and use these embeddings along with
averaging-based context embeddings as input to
relation classification.
      </p>
      <p>
        There are numerous studies dedicated to
characterizing relations in word pairs abstracted away
from the specific context in which the word pair
appears. Much of this literature focuses on one
specific lexical semantic relation at a time. Among
these, lexical entailment (hypernymy) has
probably been the most popular since Hearst (1992)
with various representation learning approaches
specifically targeting lexical entailment
        <xref ref-type="bibr" rid="ref11 ref18 ref2 ref29 ref33 ref34 ref7">(Fu et al.,
2014; Anh et al., 2016; Roller and Erk, 2016;
Bowman, 2016; Kruszewski et al., 2015)</xref>
        and the
antonymy relation has also received considerable
attention
        <xref ref-type="bibr" rid="ref26 ref28 ref31 ref33">(Ono et al., 2015; Pham et al., 2015;
Shwartz et al., 2016; Santus et al., 2014)</xref>
        .
Another approach to semantic relation representations
comes from work on the compositionality of
word meaning in syntactic structures
such as adjective-noun pairs
        <xref ref-type="bibr" rid="ref12 ref4">(Baroni and Zamparelli, 2010; Guevara, 2010)</xref>
        .
      </p>
      <p>
        The kind of relation representations we aim at
learning are meant to encode general relational
knowledge and are produced in an unsupervised
way, even though they can be useful for
identification of specific relations like hypernymy and for
relation extraction from text occurrences
        <xref ref-type="bibr" rid="ref17">(Jameel
et al., 2018)</xref>
        . The latter paper documents a model
that produces word pair embeddings by
concatenating Glove-based word vectors with relation
embeddings trained to predict the contexts in which
the two words of the pair co-occur. The main issue
with Jameel et al.’s models is scalability: as the
authors admit, it is prohibitively expensive to collect
all the data needed to train all the relation
embeddings. Instead, their implementation requires, for
each individual word pair, going back to the
training corpus via an inverse index and collecting the
data needed to estimate the embedding of the pair.
This strategy might not be efficient for practical
applications.
      </p>
    </sec>
    <sec id="sec-4">
<title>3 Proposed Model</title>
      <p>We propose a simple solution to the
scalability problem inherent in word relation embedding
learning from joint cooccurrence data, which also
allows the model to generalize to word pairs that
never occur together in the corpus, or occur too
rarely to accumulate significant relational cues.
The model is trained in two steps.</p>
<p>First, we apply the skip-gram with negative sampling algorithm to learn relation vectors for pairs of nouns n1, n2 with high individual and joint occurrence frequencies. In our experiments, all word pairs with a pair frequency above 100 whose individual word frequencies exceed 500 are considered frequent pairs. To estimate the SkipRel vector of a pair, we adapted the learning objective of skip-gram with negative sampling, maximizing

\log \sigma\left( v'^{\top}_{c}\, u_{n_1:n_2} \right) + \sum_{i=1}^{k} \mathbb{E}_{c_i \sim P_n(c)} \left[ \log \sigma\left( -v'^{\top}_{c_i}\, u_{n_1:n_2} \right) \right] \quad (1)

where u_{n_1:n_2} is the SkipRel embedding of a word pair, v'_c is the embedding of a context word occurring between n1 and n2, and k is the number of negative samples.</p>
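      <p>For illustration, the per-example loss corresponding to objective (1) can be sketched in PyTorch as follows; this is an illustrative sketch with placeholder variable names, not the exact code in our repository:</p>
      <preformat>
import torch
import torch.nn.functional as F

def skiprel_loss(u_pair, v_ctx, v_neg):
    """Negative of objective (1) for one (pair, context) observation.

    u_pair: (d,)   SkipRel embedding u_{n1:n2} of the noun pair
    v_ctx:  (d,)   output embedding v'_c of the observed context word
    v_neg:  (k, d) output embeddings of k negative samples drawn from P_n(c)
    """
    pos = F.logsigmoid(v_ctx @ u_pair)           # log sigma(v'_c^T u)
    neg = F.logsigmoid(-(v_neg @ u_pair)).sum()  # sum_i log sigma(-v'_ci^T u)
    return -(pos + neg)                          # SGD minimizes the negative objective
      </preformat>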
<p>High-quality SkipRel embeddings can only be obtained for noun pairs that co-occur frequently. To allow the model to generalize to noun pairs that do not co-occur in our corpus, we estimated an interpolation \tilde{u}_{n_1:n_2} of the word pair embedding:

\tilde{u}_{n_1:n_2} = \mathrm{ReLU}\left( A v_{n_1} + B v_{n_2} \right) \quad (2)

where v_{n_1}, v_{n_2} are pretrained word embeddings for the two nouns, and the matrices A, B encode systematic correspondences between the embeddings of a word and the relations it participates in. The matrices A, B were estimated using stochastic gradient descent, minimizing the squared error with respect to the SkipRel vectors of frequent noun pairs n1, n2.</p>
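      <p>A minimal sketch of the mapping (2) and one step of its squared-error fit, assuming the pretrained word vectors and target SkipRel vectors are already available as tensors (the names and the learning rate below are illustrative placeholders):</p>
      <preformat>
import torch
import torch.nn as nn

class GSkipRel(nn.Module):
    """u~_{n1:n2} = ReLU(A v_{n1} + B v_{n2}), Eq. (2)."""
    def __init__(self, d_word, d_rel):
        super().__init__()
        self.A = nn.Linear(d_word, d_rel, bias=False)  # matrix A
        self.B = nn.Linear(d_word, d_rel, bias=False)  # matrix B

    def forward(self, v_n1, v_n2):
        return torch.relu(self.A(v_n1) + self.B(v_n2))

model = GSkipRel(400, 400)                          # embedding size 400, as in Section 4
opt = torch.optim.SGD(model.parameters(), lr=0.01)  # placeholder learning rate
v_n1, v_n2 = torch.randn(400), torch.randn(400)     # stand-ins for pretrained word vectors
u_target = torch.randn(400)                         # stand-in for a frequent pair's SkipRel vector
opt.zero_grad()
loss = ((model(v_n1, v_n2) - u_target) ** 2).sum()  # squared error of Section 3
loss.backward()
opt.step()
      </preformat>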
<p>We call \tilde{u}_{n_1:n_2} the generalized SkipRel embedding (g-SkipRel) for the noun pair n1, n2. RelWord, the proposed relation embedding, is the concatenation of the g-SkipRel vector \tilde{u}_{n_1:n_2} and the Diff vector v_{n_1} - v_{n_2}.</p>
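      <p>Given the GSkipRel module sketched above, assembling a RelWord vector for an arbitrary noun pair reduces to a concatenation (again an illustrative sketch):</p>
      <preformat>
def relword(model, v_n1, v_n2):
    """RelWord = [g-SkipRel ; Diff] for a noun pair (n1, n2)."""
    return torch.cat([model(v_n1, v_n2), v_n1 - v_n2])
      </preformat>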
    </sec>
    <sec id="sec-5">
<title>4 Experimental setup</title>
      <p>
        We trained relation vectors on the ukWAC corpus
        <xref ref-type="bibr" rid="ref5">(Baroni et al., 2009)</xref>
        containing 2 bln tokens of
web-crawled English text. SkipRel is trained on
noun pair instances separated by at most 10
context tokens with embedding size of 400 and
minibatch size of 32. Frequency filtering is performed
to control the size of the pair vocabulary (|P|).
Frequent pairs are pre-selected using pair and word
frequency thresholds. For pretrained word
embeddings we used the best model from Baroni et
al. (2014).
      </p>
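      <p>The pre-selection of frequent pairs can be sketched as follows, assuming hypothetical count tables accumulated in a pass over the corpus (thresholds as in Section 3):</p>
      <preformat>
from collections import Counter

pair_freq = Counter()  # (n1, n2) co-occurrence counts within a 10-token window
word_freq = Counter()  # individual noun frequencies

# ... fill both counters by scanning the corpus ...

P = [pair for pair, f in pair_freq.items()
     if f > 100 and word_freq[pair[0]] > 500 and word_freq[pair[1]] > 500]
print(len(P))  # |P|, the size of the pair vocabulary
      </preformat>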
      <p>
        The experimental setup is built and
maintained on GPU clusters provided by GRID5000
        <xref ref-type="bibr" rid="ref9">(Cappello et al., 2005)</xref>
. The code for model implementation and evaluation is publicly available at https://github.com/Chingcham/SemRelationExtraction.
      </p>
    </sec>
    <sec id="sec-6">
<title>5 Evaluation</title>
      <p>
        If our relation representations are rich enough in
the information they encode, they will prove
useful for any relation classification task regardless
of the nature of the classes involved. We evaluate
the model with a supervised softmax classifier on
two labeled multiclass datasets, BLESS
        <xref ref-type="bibr" rid="ref3">(Baroni and
Lenci, 2011)</xref>
        and EVALuation1.0
        <xref ref-type="bibr" rid="ref32">(Santus et al.,
2015)</xref>
        , as well as the binary classification EACL
antonym-synonym dataset
        <xref ref-type="bibr" rid="ref24">(Nguyen et al., 2017)</xref>
        .
BLESS consists of 26k triples of concept and relata spanning 8 classes of semantic relation, and EVALuation1.0 contains 7.5k pairs spanning 9 unique relation types. From the EACL 2017 dataset, we used a list of 4062 noun pairs.</p>
      <table-wrap id="tab1">
        <caption>
          <p>Table 1: Relation classification accuracy (%). Only the BLESS column could be recovered from the flattened source table.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Model</th><th>BLESS</th></tr>
          </thead>
          <tbody>
            <tr><td>Diff</td><td>81.15</td></tr>
            <tr><td>g-SkipRel</td><td>59.07</td></tr>
            <tr><td>RelWord</td><td>80.94</td></tr>
            <tr><td>Random</td><td>12.5</td></tr>
            <tr><td>Majority</td><td>24.71</td></tr>
          </tbody>
        </table>
      </table-wrap>
        <p>Since we aim at recognizing whether the
information relevant for relation identification is
present in the representations in an easily
accessible form, we choose to employ a simple, one-layer
SoftMax classifier. The classifier was trained for 100 epochs, with the learning rate determined through cross-validation. L2 regularization is employed to avoid over-fitting, with the L2 factor chosen empirically. The classifier is trained with mini-batches of size 16 for BLESS &amp; EVALuation1.0 and of size 8 for EACL 2017. SGD is used to optimize the model weights.</p>
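      <p>A minimal PyTorch sketch of this classifier follows; the input dimensionality, number of classes, learning rate, and L2 factor are placeholders rather than the values used in our experiments:</p>
      <preformat>
import torch
import torch.nn as nn

n_classes, d_rel = 8, 800          # e.g. 8 BLESS relation classes; RelWord dimensionality
clf = nn.Linear(d_rel, n_classes)  # one-layer softmax classifier
opt = torch.optim.SGD(clf.parameters(), lr=0.1, weight_decay=1e-4)  # weight_decay = L2 factor
loss_fn = nn.CrossEntropyLoss()    # softmax + negative log-likelihood

x = torch.randn(16, d_rel)              # a mini-batch of 16 relation vectors
y = torch.randint(0, n_classes, (16,))  # gold relation labels
for epoch in range(100):
    opt.zero_grad()
    loss = loss_fn(clf(x), y)
    loss.backward()
    opt.step()
      </preformat>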
<p>To demonstrate the effectiveness of RelWord vectors, we contrast them with the simpler representations (g-)SkipRel and with Diff, the difference of the two word vectors in a pair, which is a commonly used simple method. We also include two simple baselines: random choice between the classes and a constant classifier that always predicts the majority class.</p>
    </sec>
    <sec id="sec-7">
<title>6 Results</title>
      <p>All models outperform the baselines by a wide
margin (Table 1). The RelWord model compares favorably
with the other options, outperforming them
on the EVAL and EACL datasets and being on par
with the vector difference model on BLESS. This
result signifies a success of our generalization
strategy: in each dataset, only a minority of
examples had pair representations directly trained
from corpora; most RelWord vectors were
interpolated from word embeddings.</p>
      <p>Now let us restrict our attention to word pairs
that frequently co-occur (Table 2). Note that the
composition of classes, and by consequence the
majority baseline, is different from Table 1, so
the accuracy figures in the two tables are not directly comparable.</p>
      <table-wrap id="tab2">
        <caption>
          <p>Table 2: Relation classification accuracy (%) restricted to frequently co-occurring pairs. The column labels are inferred from the random baselines (1/8 for BLESS, 1/9 for EVAL); the EACL column of the original table could not be recovered.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Model</th><th>BLESS</th><th>EVAL</th></tr>
          </thead>
          <tbody>
            <tr><td>Diff</td><td>77.13</td><td>44.61</td></tr>
            <tr><td>SkipRel</td><td>73.37</td><td>48.40</td></tr>
            <tr><td>RelWord</td><td>83.27</td><td>54.47</td></tr>
            <tr><td>Random</td><td>12.5</td><td>11.11</td></tr>
            <tr><td>Majority</td><td>33.22</td><td>26.37</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>For these frequent pairs we can
rely on SkipRel relation vectors that have been
estimated directly from corpora and have a higher
quality; we also use SkipRel vectors instead of
g-SkipRel as a component of RelWord. We note that
for these pairs the performance of the Diff method
dropped uniformly. This presumably happened in
part because the classifier could no longer rely on
the information on relative frequencies of the two
words which is implicitly present in Diff
representations; for example, it is possible that antonyms
have more similar frequencies than synonyms in
the EACL dataset. For BLESS and EVAL, the
drop in the performance of Diff could have
happened in part because the classes that include more
frequent pairs such as isA, antonyms and
synonyms are inherently harder to distinguish than
classes that tend to contain rare pairs. In contrast,
the comparative effectiveness of RelWord is more
pronounced after frequency filtering. The
usefulness of relation embeddings is especially
impressive for the EACL dataset. In this case, vanilla
SkipRel emerges as the best model, confirming
that word embeddings per se are not particularly
useful for detecting the synonymy-antonymy
distinction for this subset of EACL, getting an
accuracy just above the majority baseline, while pair
embeddings go a long way.</p>
        <p>Finally, quantitative evaluation in terms of
classification accuracy or other measures does not
fully characterize the relative performance of the
models; among other things, certain types of
misclassification might be worse than others. For
example, a human annotator would rarely confuse
synonyms with antonyms, while mistaking has a
for has property could be a common point of
disagreement between annotators. To do a
qualitative analysis of errors made by different models,
we selected the elements of the EVAL test partition
where Diff and RelWord make distinct predictions
that are both different from the gold standard label.
For each of the 53 examples of this kind, we manually
annotated which model's prediction is more acceptable
according to human judgment. In a majority
of cases (28), the RelWord model makes a
prediction that is more human-like than that of Diff. For
example, RelWord predicts that shade is part of
shadow rather than its synonym (gold label);
indeed, any part of a shadow can be called shade.
The Diff model in this case and in many other
examples bets on the antonym class, which does
not make any sense semantically; the reason why
antonym is a common false label is probably that
it is simply the second biggest class in the dataset.
The examples where Diff makes a more
meaningful error than RelWord are less numerous (6 out
of 53). There are also 15 examples where both
systems' predictions are equally bad (for example,
for (Nice, France) Diff predicts the isa label and
RelWord predicts synonym) and 4 examples where
the two predictions are equally reasonable. For
more examples, see Table 3. We note that
sometimes our model’s prediction seems more correct
than the gold standard, for example in assigning
hasproperty rather than isa as the label for the pair (human, male).</p>
    </sec>
    <sec id="sec-8">
<title>7 Conclusion</title>
      <p>
        The proposed model is simple in design and
training, learning word relation vectors based on
cooccurrence with unigram contexts and extending
to rare or unseen words via a non-linear
mapping. Despite its simplicity, the model is
capable of capturing lexical relation patterns in vector
representations. Most importantly, RelWord
extends straightforwardly to novel word pairs in a
manner that does not require recomputing
cooccurrence counts from the corpus as in related
approaches
        <xref ref-type="bibr" rid="ref17">(Jameel et al., 2018)</xref>
        . This allows for an
easy integration of the pretrained model into
various downstream applications.
      </p>
      <p>In our evaluation, we observed that learning
word pair relation embeddings improves on the
semantic information already present in word
embeddings. With respect to certain semantic
relations like synonyms, the performance of
relation embedding is comparable to that of word
embeddings, but at the additional cost of training a
representation for a significant number of word
pairs. For other relation types like antonyms or
hypernyms, in which words differ semantically but
share similar contexts, learned word pair relation
embeddings have an edge over those derived from
word embeddings via simple subtraction. While in
practice one has to make a choice based on the task
requirements, it is generally beneficial to combine
both types of relation embeddings for best results
in a model like RelWord.</p>
      <p>
        Our current model employs pretrained word
embeddings and learns the word pair embeddings
and a word-to-relation embedding mapping
separately. In the future, we plan to train a version
of the model end-to-end, with word embeddings
and the mapping trained simultaneously. As
the literature suggests
        <xref ref-type="bibr" rid="ref14 ref34">(Hashimoto et al., 2015; Takase et
al., 2016)</xref>
        , such joint training might not only
benefit the model but also improve the performance of
the resulting word embeddings on other tasks.
      </p>
    </sec>
    <sec id="sec-9">
      <title>Acknowledgments</title>
      <p>This research is supported by CNRS PEPS grant
ReSeRVe. We thank Roberto Zamparelli, Germán
Kruszewski, Luca Ducceschi and anonymous
reviewers who gave feedback on previous versions
of this work.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Eneko</given-names>
            <surname>Agirre</surname>
          </string-name>
, Enrique Alfonseca, Keith Hall, Jana Kravalova, Marius Paşca, and Aitor Soroa.
          <year>2009</year>
          .
          <article-title>A study on similarity and relatedness using distributional and wordnet-based approaches</article-title>
          .
          <source>In Proceedings of Human Language Technologies</source>
          :
          <article-title>The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Tuan</given-names>
            <surname>Luu</surname>
          </string-name>
          <string-name>
            <surname>Anh</surname>
          </string-name>
          , Yi Tay, Siu Cheung Hui, and See Kiong Ng.
          <year>2016</year>
          .
          <article-title>Learning term embeddings for taxonomic relation identification using dynamic weighting neural network</article-title>
          .
          <source>In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing</source>
          , pages
          <fpage>403</fpage>
          -
          <lpage>413</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Marco</given-names>
            <surname>Baroni</surname>
          </string-name>
          and
          <string-name>
            <given-names>Alessandro</given-names>
            <surname>Lenci</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>How we blessed distributional semantic evaluation</article-title>
          .
          <source>In Proceedings of the GEMS 2011 Workshop on GEometrical Models of Natural Language Semantics</source>
          , GEMS '
          <volume>11</volume>
          , pages
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          , Stroudsburg, PA, USA. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Marco</given-names>
            <surname>Baroni</surname>
          </string-name>
          and
          <string-name>
            <given-names>Roberto</given-names>
            <surname>Zamparelli</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Nouns are vectors, adjectives are matrices: Representing adjective-noun constructions in semantic space</article-title>
          .
          <source>In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, EMNLP '10</source>
          , pages
          <fpage>1183</fpage>
          -
          <lpage>1193</lpage>
          , Stroudsburg, PA, USA. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Marco</given-names>
            <surname>Baroni</surname>
          </string-name>
          , Silvia Bernardini, Adriano Ferraresi, and
          <string-name>
            <given-names>Eros</given-names>
            <surname>Zanchetta</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>The wacky wide web: a collection of very large linguistically processed web-crawled corpora</article-title>
          .
          <source>Language resources and evaluation</source>
          ,
          <volume>43</volume>
          (
          <issue>3</issue>
          ):
          <fpage>209</fpage>
          -
          <lpage>226</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Marco</given-names>
            <surname>Baroni</surname>
          </string-name>
          , Georgiana Dinu, and
          <string-name>
<given-names>Germán</given-names>
            <surname>Kruszewski</surname>
          </string-name>
          .
          <year>2014</year>
          .
<article-title>Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors</article-title>
          .
          <source>In 52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014 - Proceedings of the Conference</source>
          , volume
          <volume>1</volume>
          , pages
          <fpage>238</fpage>
          -
          <lpage>247</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
<given-names>Samuel R.</given-names>
            <surname>Bowman</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Modeling natural language semantics in learned representations</article-title>
          .
<source>Ph.D. thesis</source>
          , Stanford University.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Elia</given-names>
            <surname>Bruni</surname>
          </string-name>
,
          <string-name>
            <given-names>Nam-Khanh</given-names>
            <surname>Tran</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Marco</given-names>
            <surname>Baroni</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Multimodal distributional semantics</article-title>
          .
          <source>Journal of Artificial Intelligence Research</source>
          ,
          <volume>49</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>47</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Franck</given-names>
            <surname>Cappello</surname>
          </string-name>
          , Eddy Caron,
          <string-name>
            <given-names>Michel J.</given-names>
<surname>Daydé</surname>
          </string-name>
, Frédéric Desprez, Yvon Jégou, Pascale
          <string-name>
            <surname>Vicat-Blanc</surname>
            <given-names>Primet</given-names>
          </string-name>
, Emmanuel Jeannot, Stéphane Lanteri, Julien Leduc, Nouredine Melab, Guillaume Mornet, Raymond Namyst, Benjamin Quétier, and Olivier Richard.
          <year>2005</year>
.
          <article-title>Grid'5000: a large scale and highly reconfigurable grid experimental testbed</article-title>
          .
          <source>In GRID</source>
          , pages
          <fpage>99</fpage>
          -
          <lpage>106</lpage>
          . IEEE Computer Society.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Miao</given-names>
            <surname>Fan</surname>
          </string-name>
          , Kai Cao, Yifan He, and
          <string-name>
            <given-names>Ralph</given-names>
            <surname>Grishman</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Jointly embedding relations and mentions for knowledge population</article-title>
          .
<source>arXiv preprint arXiv:1504.01683</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Ruiji</given-names>
            <surname>Fu</surname>
          </string-name>
          , Jiang Guo, Bing Qin, Wanxiang Che,
          <string-name>
            <given-names>Haifeng</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ting</given-names>
            <surname>Liu</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Learning semantic hierarchies via word embeddings</article-title>
          .
          <source>In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume</source>
          <volume>1</volume>
: Long Papers)
          , volume
          <volume>1</volume>
          , pages
          <fpage>1199</fpage>
          -
          <lpage>1209</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Emiliano</given-names>
            <surname>Guevara</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>A regression model of adjective-noun compositionality in distributional semantics</article-title>
          .
          <source>In Proceedings of the 2010 Workshop on GEometrical Models of Natural Language Semantics</source>
          , GEMS '
          <volume>10</volume>
          , pages
          <fpage>33</fpage>
          -
          <lpage>37</lpage>
          , Stroudsburg, PA, USA. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Zellig</given-names>
            <surname>Harris</surname>
          </string-name>
          .
          <year>1954</year>
          .
          <article-title>Distributional structure</article-title>
          .
          <source>Word</source>
          ,
          <volume>10</volume>
          (
          <issue>23</issue>
          ):
          <fpage>146</fpage>
          -
          <lpage>162</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Kazuma</given-names>
            <surname>Hashimoto</surname>
          </string-name>
          , Pontus Stenetorp, Makoto Miwa, and
          <string-name>
            <given-names>Yoshimasa</given-names>
            <surname>Tsuruoka</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Task-oriented learning of word embeddings for semantic relation classification</article-title>
          .
<source>arXiv preprint arXiv:1503.00095</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Marti A.</given-names>
            <surname>Hearst</surname>
          </string-name>
          .
          <year>1992</year>
          .
          <article-title>Automatic acquisition of hyponyms from large text corpora</article-title>
          .
          <source>Technical Report S2K-92-09.</source>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
<string-name>
            <given-names>Aurélie</given-names>
            <surname>Herbelot</surname>
          </string-name>
          and
          <string-name>
            <given-names>Eva Maria</given-names>
            <surname>Vecchi</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Building a shared world: Mapping distributional to model-theoretic semantic spaces</article-title>
          .
          <source>In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing</source>
          , pages
          <fpage>22</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Shoaib</given-names>
            <surname>Jameel</surname>
          </string-name>
          , Zied Bouraoui, and
          <string-name>
            <given-names>Steven</given-names>
            <surname>Schockaert</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Unsupervised learning of distributional relation vectors</article-title>
          .
          <source>In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          , pages
          <fpage>23</fpage>
          -
          <lpage>33</lpage>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>German</given-names>
            <surname>Kruszewski</surname>
          </string-name>
          , Denis Paperno, and
          <string-name>
            <given-names>Marco</given-names>
            <surname>Baroni</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Deriving boolean structures from distributional vectors</article-title>
          .
          <source>Transactions of the Association for Computational Linguistics</source>
          ,
          <volume>3</volume>
          :
          <fpage>375</fpage>
          -
          <lpage>388</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>Omer</given-names>
            <surname>Levy</surname>
          </string-name>
          and
          <string-name>
            <given-names>Yoav</given-names>
            <surname>Goldberg</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Linguistic regularities in sparse and explicit word representations</article-title>
          .
          <source>In Proceedings of the eighteenth conference on computational natural language learning</source>
          , pages
          <fpage>171</fpage>
          -
          <lpage>180</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <given-names>Tal</given-names>
            <surname>Linzen</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Issues in evaluating semantic spaces using word analogies</article-title>
          .
<source>arXiv preprint arXiv:1606.07736</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , Kai Chen, Greg Corrado, and
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Efficient estimation of word representations in vector space</article-title>
          .
<source>arXiv preprint arXiv:1301.3781</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <given-names>Mike</given-names>
            <surname>Mintz</surname>
          </string-name>
          , Steven Bills, Rion Snow, and
          <string-name>
            <given-names>Dan</given-names>
            <surname>Jurafsky</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Distant supervision for relation extraction without labeled data</article-title>
          .
          <source>In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 -</source>
          Volume 2, ACL '
          <volume>09</volume>
          , pages
          <fpage>1003</fpage>
          -
          <lpage>1011</lpage>
          , Stroudsburg, PA, USA. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
<given-names>Thien Huu</given-names>
            <surname>Nguyen</surname>
          </string-name>
          and
          <string-name>
            <given-names>Ralph</given-names>
            <surname>Grishman</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Relation extraction: Perspective from convolutional neural networks</article-title>
          .
In Phil Blunsom, Shay B. Cohen, Paramveer S. Dhillon, and Percy Liang, editors,
          <source>VS@HLT-NAACL</source>
          , pages
          <fpage>39</fpage>
          -
          <lpage>48</lpage>
          .
The Association for Computational Linguistics
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <given-names>Kim</given-names>
            <surname>Anh</surname>
          </string-name>
          <string-name>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <source>Sabine Schulte im Walde, and Ngoc Thang Vu</source>
          .
          <year>2017</year>
          .
          <article-title>Distinguishing Antonyms and Synonyms in a Pattern-based Neural Network</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <source>In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics</source>
          , pages
          <fpage>76</fpage>
          -
          <lpage>85</lpage>
          , Valencia, Spain.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <string-name>
            <given-names>Masataka</given-names>
            <surname>Ono</surname>
          </string-name>
          , Makoto Miwa, and
          <string-name>
            <given-names>Yutaka</given-names>
            <surname>Sasaki</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Word embedding-based antonym detection using thesauri and distributional information</article-title>
          .
          <source>In HLT-NAACL</source>
          , pages
          <fpage>984</fpage>
          -
          <lpage>989</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
<given-names>Jeffrey</given-names>
            <surname>Pennington</surname>
          </string-name>
          , Richard Socher, and
          <string-name>
            <given-names>Christopher</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Glove: Global vectors for word representation</article-title>
          .
          <source>In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP)</source>
          , pages
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <string-name>
<given-names>Nghia The</given-names>
            <surname>Pham</surname>
          </string-name>
, Angeliki Lazaridou, and
          <string-name>
            <given-names>Marco</given-names>
            <surname>Baroni</surname>
          </string-name>
.
          <year>2015</year>
          .
          <article-title>A multitask objective to inject lexical contrast into distributional semantics</article-title>
          .
          <source>In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing</source>
          (Volume
          <volume>2</volume>
: Short Papers)
          , volume
          <volume>2</volume>
          , pages
          <fpage>21</fpage>
          -
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <string-name>
            <given-names>Stephen</given-names>
            <surname>Roller</surname>
          </string-name>
          and
          <string-name>
            <given-names>Katrin</given-names>
            <surname>Erk</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Relations such as hypernymy: Identifying and exploiting hearst patterns in distributional vectors for lexical entailment</article-title>
          .
<source>CoRR, abs/1605.05433</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <string-name>
            <given-names>Ina</given-names>
<surname>Rösiger</surname>
          </string-name>
          , Arndt Riester, and
          <string-name>
            <given-names>Jonas</given-names>
            <surname>Kuhn</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Bridging resolution: Task definition, corpus resources and rule-based experiments</article-title>
          .
          <source>In Proceedings of the 27th International Conference on Computational Linguistics</source>
          , pages
          <fpage>3516</fpage>
          -
          <lpage>3528</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          <string-name>
            <given-names>Enrico</given-names>
            <surname>Santus</surname>
          </string-name>
          , Qin Lu, Alessandro Lenci, and
          <string-name>
            <given-names>Churen</given-names>
            <surname>Huang</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Unsupervised antonym-synonym discrimination in vector space</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          <string-name>
            <given-names>Enrico</given-names>
            <surname>Santus</surname>
          </string-name>
          , Frances Yung, Alessandro Lenci, and
          <string-name>
            <surname>Chu-Ren Huang</surname>
          </string-name>
          .
          <year>2015</year>
          .
<article-title>EVALution 1.0: an evolving semantic dataset for training and evaluation of distributional semantic models</article-title>
          .
          <source>In Proceedings of the 4th Workshop on Linked Data in Linguistics: Resources and Applications</source>
          , pages
          <fpage>64</fpage>
          -
          <lpage>69</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          <string-name>
            <given-names>Vered</given-names>
            <surname>Shwartz</surname>
          </string-name>
          , Enrico Santus, and
          <string-name>
            <given-names>Dominik</given-names>
            <surname>Schlechtweg</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Hypernyms under siege: Linguistically-motivated artillery for hypernymy detection</article-title>
          .
<source>arXiv preprint arXiv:1612.04460</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          <string-name>
            <given-names>Sho</given-names>
            <surname>Takase</surname>
          </string-name>
          , Naoaki Okazaki, and
          <string-name>
            <given-names>Kentaro</given-names>
            <surname>Inui</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Modeling semantic compositionality of relational patterns</article-title>
          .
          <source>Engineering Applications of Artificial Intelligence</source>
          ,
          <volume>50</volume>
          :
          <fpage>256</fpage>
          -
          <lpage>264</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          <string-name>
            <given-names>Kun</given-names>
            <surname>Xu</surname>
          </string-name>
          , Yansong Feng, Songfang Huang, and
          <string-name>
            <given-names>Dongyan</given-names>
            <surname>Zhao</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Semantic relation classification via convolutional neural networks with simple negative sampling</article-title>
          .
<source>CoRR, abs/1506.07650</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          <string-name>
            <given-names>Dmitry</given-names>
            <surname>Zelenko</surname>
          </string-name>
          , Chinatsu Aone, and
          <string-name>
            <given-names>Anthony</given-names>
            <surname>Richardella</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>Kernel methods for relation extraction</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>3</volume>
          :
          <fpage>1083</fpage>
          -
          <lpage>1106</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          <string-name>
            <given-names>Daojian</given-names>
            <surname>Zeng</surname>
          </string-name>
          , Kang Liu, Siwei Lai, Guangyou Zhou, and Jun Zhao.
          <year>2014</year>
          .
          <article-title>Relation classification via convolutional deep neural network</article-title>
          .
          <source>In COLING</source>
          , pages
          <fpage>2335</fpage>
          -
          <lpage>2344</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>