<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Optimal Transport Methods for Aligning Knowledge Graph Triples with Natural Language in Unsupervised Settings</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alexander Kalinowski</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Drexel University</institution>
          ,
          <addr-line>Philadelphia, PA 19104</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <fpage>33</fpage>
      <lpage>40</lpage>
      <abstract>
        <p>Frameworks for aligning embeddings of text and embeddings of knowledge graphs (KG) have been used for generating mappings for text-to-text alignment and KG-to-KG alignment, but little has been done for alignment between these two domains. In this dissertation proposal, I aim to create a framework for KG-to-text alignment that utilizes little to no training data to learn these correspondences. Additionally, motivated by the semantic geometries of these embedding spaces, I propose a new line of research into generating explicit embeddings of triples from a knowledge graph.</p>
      </abstract>
      <kwd-group>
        <kwd>Knowledge representation</kwd>
        <kwd>Automated metadata generation</kwd>
        <kwd>Embeddings</kwd>
        <kwd>Optimal transport</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Importance</title>
      <p>Knowledge graphs (KGs) and ontologies form the computational backbone of the
modern Semantic Web, curated by taxonomists and ontologists in conjunction
with domain subject matter experts. Collaboration between these parties is a
bottleneck in large-scale organizations due to coordination of people and sourcing
of relevant information and terminologies for ontologists to massage into an
enterprise-wide standard. This bottleneck is felt both in developing ontological
terminologies and populating the knowledge graph with facts and assertions
about the domain being described.</p>
      <p>Industrial terminologies are buried deep in policy documents or technical
white papers, making the job of the ontologist one of synthesizing these
documents into a concise, machine-readable set of interlinked terminologies (T-Box).
For large-scale knowledge graphs, such as those leveraged in popular search
engines, the information scales past terminologies and additionally covers facts
or assertions (A-Box) about the objects described in the graph. Validating the
accuracy of assertions in the knowledge graph is critical for auditing the
trustworthiness of those claims, which is especially relevant in highly regulated industries
such as banking or pharmaceuticals. As facts (in the form of $\langle s, p, o \rangle$ triples)
are added to a knowledge graph, either through automated methods such as
link prediction techniques or human-generated annotations, there is an
additional opportunity to enrich the knowledge graph with metadata about these
triples, such as a source of evidence from a text document. Developing
methodologies for linking the T-Box and A-Box to textual evidence will provide a set
of tools to allow ontologists and knowledge graph developers to expedite their
work while ensuring the highest degree of accuracy and auditability, and thus is
the focus of this work.</p>
      <p>For example, an ontologist working in the financial services domain may wish
to develop a standardized definition of a fixed-float interest rate swap. They may
begin by inheriting the structure of previously defined terminologies, namely
those related to interest rates and swaps, deriving the triples</p>
      <p>$\langle$fixed float interest rate swap, has type, swap contract$\rangle$,
$\langle$fixed float interest rate swap, has leg, fixed interest rate$\rangle$, and
$\langle$fixed float interest rate swap, has leg, floating interest rate$\rangle$.
However, without a background in financial terminology, they may miss the
fact that a fixed-float interest rate swap is more commonly referred to as a
vanilla interest rate swap. Given the set of seed triples the ontologist has
developed, this fact can be inferred from a variety of textual sources, such as
the sentence “A vanilla interest rate swap allows two counterparties to hedge
against interest rate volatility by trading a floating rate for a fixed rate.”
Linking such evidence to the seed triples helps to expand the coverage of the knowledge graph.</p>
      <p>Of critical importance in the development of a text-to-KG methodology is its
ability to rapidly adapt to new source ontologies and data domains.
Additionally, such a system should not be bottlenecked by reliance on abundant training
data, a point of failure for many ML projects. This motivates the use of
unsupervised techniques for knowledge representation, and I propose an exploration
of cross-domain optimal transport between embeddings of natural language and
embeddings of a knowledge graph. Given this motivation, I present the following
formalism.</p>
    </sec>
    <sec id="sec-2">
      <title>Problem Statement</title>
      <p>Suppose $S = \{s_1, s_2, \ldots, s_N\}$ is a set of sentences and $T = \{t_1, t_2, \ldots, t_M\}$ is a set
of knowledge graph triples, each triple of the form $t_i = \langle s, p, o \rangle$. Next, define two
functions $f$ and $g$ that create low-dimensional (latent) representations of each
sentence and triple, respectively, such that $f(s_i) = e_{s_i} \in \mathbb{R}^n$ (the source space)
and $g(t_j) = e_{t_j} \in \mathbb{R}^m$ (the target space), where $n \ll |S|$ and $m \ll |T|$. In order to
link triples to their most relevant sentences, both for validating terminology and
assertions, we seek a mapping function $\phi : \mathbb{R}^m \to \mathbb{R}^n$ that transports one set to
the other with minimal loss, i.e. $\phi(T) \approx S$, such that for new triples $t_i \notin T$ we
can find a supporting sentence representation $\phi(t_i) \approx s_j \notin S$ in a large-scale
text corpus $C$ that reflects the same semantic information. Learning such a $\phi$
should not be taxed by reliance on an abundance of paired labeled data points,
i.e. a set of labels $L = \{(t_k, s_k), \ldots, (t_l, s_l)\}$. Instead, I assume no such set exists
at training time, framing the learning of $\phi$ as an unsupervised task.</p>
      <p>
        The desire to limit reliance on paired labeled data points helps inform
the potential choices of $\phi$. Specifically, methods from optimal transport (OT)
theory lend themselves well to this problem by exploiting the structure of each
embedding space through pairwise distance metrics rather than relying on data with
paired labels. Couplings of objects from the two spaces are instead inferred
through probabilistic transport maps, shifting the focus from gathering labeled
sentence-triple pairs for supervised learning to refining the representations of
these objects in their respective latent spaces. Optimal transport also provides
a probabilistic framework for mapping assignments: the Kantorovich relaxation
of Monge's original formulation admits a solution where the mass at any point in
the source space can be dispatched to several locations in the target space [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ],
fitting the problem setting, as a single KG triple may admit infinitely many
sentence representations. The probabilistic and mathematically rigorous approach
of OT also makes its results easier to interpret than those of black-box models
such as generative adversarial networks (GANs).
      </p>
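      <p>The mass-splitting behaviour of the Kantorovich relaxation can be illustrated with a short sketch. It uses entropic regularization solved by Sinkhorn iterations, a swapped-in solver chosen for brevity rather than a method this proposal commits to; the marginals, costs and the regularization weight eps are hypothetical toy values. One triple is equally close to two sentences, and the resulting coupling splits its mass between them.</p>

```python
import math


def sinkhorn(a, b, cost, eps=0.1, n_iters=500):
    """Entropic-regularized OT coupling computed with Sinkhorn iterations.

    a: source marginal (e.g. weights over triples), sums to 1
    b: target marginal (e.g. weights over sentences), sums to 1
    cost: len(a) x len(b) nested list of transport costs
    Returns a coupling P whose row sums approach a and column sums approach b.
    """
    n, m = len(a), len(b)
    # Gibbs kernel K[i][j] = exp(-cost[i][j] / eps)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # alternately rescale rows and columns to match the two marginals
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


if __name__ == "__main__":
    # toy setting: triple 0 is equally close to sentences 0 and 1,
    # triple 1 is close only to sentence 2
    a = [0.5, 0.5]
    b = [0.25, 0.25, 0.5]
    cost = [[0.0, 0.0, 1.0],
            [1.0, 1.0, 0.0]]
    P = sinkhorn(a, b, cost, eps=0.05)
    for row in P:
        print([round(x, 3) for x in row])
    # first row is approximately [0.25, 0.25, 0.0]: triple 0's mass is split
```

      <p>Smaller values of eps bring the coupling closer to the unregularized Kantorovich solution but make the kernel entries numerically smaller, which is the usual trade-off when choosing this parameter.</p>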
      <p>One drawback of OT techniques is the requirement of a cost matrix defined
between the two embedding spaces. Defining such a cost matrix requires labeled data
points between the two spaces, although a weaker assumption can be used to avoid
this by defining two intra-domain distance matrices $D \in \mathbb{R}^{|T| \times |T|}$ and $D' \in \mathbb{R}^{|S| \times |S|}$.
Such matrices can then be aligned through the following formulation of the
Gromov-Wasserstein problem:</p>
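      <p>$$\mathrm{GW}\big((a, D), (b, D')\big)^2 = \min_{P \in \mathcal{U}(a, b)} \mathcal{E}_{D,D'}(P), \qquad \mathcal{E}_{D,D'}(P) = \sum_{i,j,i',j'} \left| D_{i,i'} - D'_{j,j'} \right|^2 P_{i,j} P_{i',j'},$$
where $a$ and $b$ are probability weight vectors over the triples and the sentences, respectively, and $\mathcal{U}(a, b) = \{ P \in \mathbb{R}_{+}^{|T| \times |S|} : P \mathbf{1} = a, \; P^{\top} \mathbf{1} = b \}$ is the set of couplings with those marginals.</p>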
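      <p>To make the objective concrete, the following sketch evaluates the energy $\mathcal{E}_{D,D'}(P)$ by brute force for a given coupling; it does not solve the minimization, which in practice would be handed to a dedicated Gromov-Wasserstein solver. The two distance matrices are hypothetical: three triples and three sentences that sit at the same positions on a line but are listed in different orders. The coupling that matches equal positions has zero energy, while the identity coupling, which ignores that correspondence, does not.</p>

```python
def gw_energy(D, D_prime, P):
    """Brute-force E_{D,D'}(P) = sum |D[i][i2] - D'[j][j2]|^2 * P[i][j] * P[i2][j2].

    D: |T| x |T| intra-domain distances (triples)
    D_prime: |S| x |S| intra-domain distances (sentences)
    P: |T| x |S| coupling. Cost is O(|T|^2 |S|^2), so this is only for toy inputs.
    """
    n, m = len(D), len(D_prime)
    total = 0.0
    for i in range(n):
        for j in range(m):
            if P[i][j] == 0.0:
                continue  # pairs without mass contribute nothing
            for i2 in range(n):
                for j2 in range(m):
                    diff = D[i][i2] - D_prime[j][j2]
                    total += diff * diff * P[i][j] * P[i2][j2]
    return total


if __name__ == "__main__":
    # triples at positions 0, 1, 3 on a line
    D = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
    # sentences at positions 3, 0, 1: same geometry, different order
    D_prime = [[0, 3, 2], [3, 0, 1], [2, 1, 0]]
    third = 1.0 / 3.0
    matched = [[0, third, 0], [0, 0, third], [third, 0, 0]]
    identity = [[third, 0, 0], [0, third, 0], [0, 0, third]]
    print(gw_energy(D, D_prime, matched))   # 0.0
    print(gw_energy(D, D_prime, identity))  # about 1.333
```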
      <p>Using this formalism, the problem of interest can then be framed as follows.
Problem Statement: What are the optimal choices of embedding
functions $f$ and $g$ to establish distance matrices $D$ and $D'$ such that
1. for pairs of triples $(t_i, t_{i'}) \in T \times T$ that share some notion of semantic
similarity, $D_{i,i'} = d(g(t_i), g(t_{i'}))$ is minimized,
2. while simultaneously minimizing $D'_{j,j'} = d(f(s_j), f(s_{j'}))$ for
semantically similar sentences $(s_j, s_{j'}) \in S \times S$,
for some distance metric $d$, thus allowing optimal couplings between $t_i$
and $s_j$ to be established via the Gromov-Wasserstein distance?</p>
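      <p>The first step of the problem statement, building $D$ and $D'$ from the two embedding functions, can be sketched as follows. Cosine distance is one assumed choice for the metric $d$, and the vectors are hypothetical toy stand-ins for $g(t_i)$ and $f(s_j)$. The two spaces may have different dimensionality, since the Gromov-Wasserstein formulation only compares distances within each space.</p>

```python
import math


def cosine_distance(x, y):
    """d(x, y) = 1 - cos(x, y); one possible choice for the metric d."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


def distance_matrix(embeddings, d=cosine_distance):
    """Intra-domain matrix whose entry [i][k] is d(e_i, e_k)."""
    return [[d(x, y) for y in embeddings] for x in embeddings]


if __name__ == "__main__":
    # hypothetical toy embeddings standing in for g(t_i) (2-d) and f(s_j) (3-d)
    triple_vecs = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0]]
    sentence_vecs = [[0.0, 3.0, 0.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
    D = distance_matrix(triple_vecs)          # |T| x |T|
    D_prime = distance_matrix(sentence_vecs)  # |S| x |S|
    print(D)
    print(D_prime)
```

      <p>In this toy case the two matrices coincide even though the embeddings live in spaces of different dimension, which is the situation in which a Gromov-Wasserstein coupling can align the two sets without any cross-domain cost.</p>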
    </sec>
    <sec id="sec-3">
      <title>Research Questions, Hypotheses and Research Plan</title>
      <p>
        Prior research in this area shows that optimal transport of embedding spaces
has been successful within a given data domain (e.g., unsupervised alignment of
word embeddings [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], unsupervised alignment of knowledge graph entities [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]).
My research questions seek to extend these methodologies to the cross-domain
task in order to align embeddings of knowledge graph triples with embeddings
of semantically related sentences.
      </p>
      <p>RQ-I: Is there an accurate, unsupervised technique for aligning a set of
knowledge graph triples to a set of semantically similar sentences?</p>
      <p>H-I: Optimal transport methods provide a mathematical framework for
unsupervised alignment based on intra-domain pairwise similarities. Successful
application of OT is dependent on how similarities of like-objects are represented
in each respective space.</p>
      <p>
        To accomplish this, properties of each embedding space that make them
useful for such alignment must first be established. In my prior work [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], these
properties are explored for sentence embeddings while keeping the knowledge
graph embeddings fixed as the concatenation of head, tail and relation vectors
generated by TransE. It remains to be seen how changing the structure of
knowledge graph embeddings, via a different choice of algorithm (such as using more
expressive models like ComplEx [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] or ConvE [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]), incorporating additional
information such as literals [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] or changing the way entity and relation vectors are
combined to represent a triple, helps or hurts the ability to generate high-quality
alignments, motivating the next research question.
      </p>
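      <p>As a reference point for the alternatives discussed above, the baseline triple representation can be sketched in a few lines. The entity and relation vectors are hypothetical two-dimensional toy values chosen so that the TransE translation h + r = t holds exactly; in practice they would come from a trained TransE model. The concatenation order (head, tail, relation) follows the description in the text.</p>

```python
import math


def transe_distance(h, r, t):
    """TransE dissimilarity ||h + r - t||_2; lower means the triple is more plausible."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def concat_triple_embedding(h, r, t):
    """Triple vector built as the concatenation [head; tail; relation]."""
    return list(h) + list(t) + list(r)


if __name__ == "__main__":
    # hypothetical 2-d vectors standing in for trained TransE parameters
    swap = [1.0, 0.0]
    has_leg = [0.0, 1.0]
    fixed_rate = [1.0, 1.0]
    print(transe_distance(swap, has_leg, fixed_rate))          # 0.0: h + r equals t
    print(concat_triple_embedding(swap, has_leg, fixed_rate))  # [1.0, 0.0, 1.0, 1.0, 0.0, 1.0]
```

      <p>The resulting triple vector simply stacks independently learned entity and relation vectors, so any semantics of the triple as a whole must already be implied by its parts.</p>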
      <p>RQ-II: How can current knowledge graph embedding methods be extended
past representing entities and relations as separate objects and instead focus on
embedding triples as the target objects?</p>
      <p>As the majority of current methods focus almost exclusively on the link
prediction task, these methods may not be well-suited for establishing embeddings
of triples, leading to the following hypothesis.</p>
      <p>H-II: Triple embeddings built from aggregations of entity and relation
embeddings do not sufficiently encode the underlying semantics of such triples.</p>
      <p>
        Building upon the work of [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], treating triples as walks on the knowledge
graph and weighting the strength of each relationship may help to create a
semantic embedding space that will assist in alignment. The following section
details how I will approach measuring the amount of semantic information
captured by these methods.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Approach and Evaluation</title>
      <p>
        Motivated by work on word embedding regularities [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], I wish to probe both
sentence and KG embedding spaces generated by a variety of embedding algorithms
and measure the degree to which they exhibit an underlying structure that can
be leveraged for aligning these resources. Lurking beneath the above research
questions is the loosely defined notion of “semantic similarity,” but metrics exist
to make this quantification concrete. These metrics define how well
semantic similarity is encoded in the latent representations of both triples and
sentences, and they are important to capture with the goal of defining optimal
pairwise distance matrices $D$ and $D'$ in mind. To formalize the notion of
structure, I introduce a definition of clusterability, following the work of [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. For some
dataset $X \subset \mathbb{R}^n$, a description of the clusterability of $X$ is a function $c : X \mapsto v$,
where $v \in \mathbb{R}$ is a real value. Here, $v$ measures how strong a clustering
presence is in the underlying set $X$.
      </p>
      <p>
        To test the clusterability hypothesis, I use the spatial histogram (SpatHist)
approach to measure the clusterability of each space [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The SpatHist approach
compares the data binned in all $d$ dimensions to samples randomly generated
in the same $d$-dimensional space. As many of these bins may end up empty
in high-dimensional embedding spaces, I perform principal component analysis
(PCA) to project down to the two most informative dimensions, split the data
into $n$ equal-width bins, and compute the empirical joint probability mass
function (EPMF). The same is then done for 500 sets of uniformly generated points
with the same feature dimensionality, and the differences are compared using the
Kullback-Leibler (KL) divergence; higher KL divergence indicates more
clusterability. I report the mean and standard deviation over these experiments
as my final estimates of clusterability. Additionally, I apply the Hopkins test
of uniformity [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. As the Hopkins test statistic tends to zero, the underlying
data exhibits less uniformity, indicating that clustering may be a good way of
exploring the data in an unsupervised way. As the test statistic increases, the
data tends to be more uniformly distributed, exhibiting less of the structure I
seek to exploit.
      </p>
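      <p>Both clusterability measures can be sketched in plain Python under several stated assumptions. The sketch skips the PCA step and takes points that are already projected to two dimensions; it compares the data EPMF against each uniform sample with KL(data || uniform), adding a small constant eps so that empty bins do not produce infinite values; and it orients the Hopkins statistic so that values near 0 indicate clustering and values near 0.5 indicate uniformity, matching the description above (other references use the reversed convention). The Hopkins sample size (10% of the points) is likewise an assumption.</p>

```python
import math
import random
import statistics


def _bounds(points):
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), max(xs)), (min(ys), max(ys))


def _epmf_2d(points, n_bins, bounds):
    """Empirical joint PMF over an n_bins x n_bins equal-width grid."""
    (x0, x1), (y0, y1) = bounds
    counts = [[0] * n_bins for _ in range(n_bins)]
    for x, y in points:
        # clamp points on the right/top edge into the last bin
        i = min(int((x - x0) / (x1 - x0) * n_bins), n_bins - 1)
        j = min(int((y - y0) / (y1 - y0) * n_bins), n_bins - 1)
        counts[i][j] += 1
    return [c / len(points) for row in counts for c in row]


def _kl(p, q, eps=1e-10):
    """KL(p || q); eps keeps bins that are empty in q from producing infinity."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def spathist(points, n_bins=5, n_reps=500, seed=0):
    """Mean and std of KL(data EPMF || uniform EPMF) over n_reps uniform samples."""
    rng = random.Random(seed)
    bounds = _bounds(points)
    (x0, x1), (y0, y1) = bounds
    p = _epmf_2d(points, n_bins, bounds)
    kls = []
    for _ in range(n_reps):
        uniform = [(rng.uniform(x0, x1), rng.uniform(y0, y1)) for _ in points]
        kls.append(_kl(p, _epmf_2d(uniform, n_bins, bounds)))
    return statistics.mean(kls), statistics.stdev(kls)


def hopkins(points, m=None, seed=0):
    """Hopkins statistic, oriented so ~0 means clustered and ~0.5 means uniform."""
    rng = random.Random(seed)
    m = m or max(1, len(points) // 10)
    (x0, x1), (y0, y1) = _bounds(points)

    def nn_dist(q, exclude=None):
        return min(math.dist(q, p) for k, p in enumerate(points) if k != exclude)

    # w: distances from sampled data points to their nearest other data point
    w = sum(nn_dist(points[k], exclude=k) for k in rng.sample(range(len(points)), m))
    # u: distances from uniform random points to their nearest data point
    u = sum(nn_dist((rng.uniform(x0, x1), rng.uniform(y0, y1))) for _ in range(m))
    return w / (u + w)


if __name__ == "__main__":
    rng = random.Random(1)
    clustered = [(rng.gauss(c, 0.1), rng.gauss(c, 0.1)) for c in (0.0, 10.0) for _ in range(100)]
    uniform = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(200)]
    print(spathist(clustered, n_reps=50), spathist(uniform, n_reps=50))
    print(hopkins(clustered), hopkins(uniform))
```

      <p>On two tight Gaussian blobs the SpatHist mean is large and the Hopkins statistic is close to 0, while on uniformly scattered points the SpatHist mean is small and the Hopkins statistic sits near 0.5.</p>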
      <p>
        For evaluating the quality of learned alignments, I follow in the tradition
of knowledge graph embedding literature [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] and evaluate these results for the
Hits@5 and Hits@10 metrics. Analyzing the top 5 and top 10
closest matches allows for a nearest-neighbor analysis of each aligned
embedding instance, helping to pinpoint areas for future improvement, such as
mitigating the influence of hubs [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
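      <p>The Hits@k evaluation can be sketched as below. Each row of the score matrix holds the scores of one query triple against all candidate sentences (for example, a row of an OT coupling, where more mass means a stronger match), and gold gives the index of the correct sentence; both the toy scores and this gold-label layout are hypothetical. The example uses k = 1 and k = 2 because only four candidates exist, but the same function computes Hits@5 and Hits@10.</p>

```python
def hits_at_k(scores, gold, k):
    """Fraction of queries whose gold candidate ranks among the top k scores.

    scores[i][j]: score of candidate j for query i (higher is better)
    gold[i]: index of the correct candidate for query i
    """
    hits = 0
    for row, target in zip(scores, gold):
        # candidate indices ordered from highest to lowest score
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(gold)


if __name__ == "__main__":
    scores = [[0.9, 0.1, 0.0, 0.0],
              [0.2, 0.3, 0.4, 0.1],
              [0.0, 0.1, 0.2, 0.7]]
    gold = [0, 1, 2]
    print(hits_at_k(scores, gold, 1))  # 0.333... (only query 0 ranks its gold first)
    print(hits_at_k(scores, gold, 2))  # 1.0
```

      <p>Because Python's sort is stable, tied scores are ranked by candidate index; a different tie-breaking rule can change Hits@k when many scores are equal.</p>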
    </sec>
    <sec id="sec-5">
      <title>Preliminary Results</title>
      <p>
        To establish a baseline for this task, prior work [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] tested the ability of a
low-capacity linear model to learn a mapping between sentence and knowledge graph
representations. The purpose of that work was to evaluate sentence
representations, measuring the extent to which they create structure in
low-dimensional embedding spaces by evaluating how well they cluster
around their semantic content, in this case the expression of a particular
relationship. Findings on the clustering capacity of a selection of sentence embedding
methods are reproduced below.
      </p>
      <p>
        The results demonstrate dramatic differences in the efficacy of sentence
embedding methodologies. In particular, the geometrically motivated GEM
algorithm vastly outperforms all others in terms of semantic clusterability,
especially those using more complex deep neural models. In addition, GEM
achieves the highest Hits@5 and Hits@10 when performing a simple linear map
for alignment (Linear@5,10), and every embedding method other than the random
baseline improves when the linear alignment is replaced with optimal transport
techniques (OT@5,10). These results suggest how to build
knowledge graph triple embeddings: by focusing on the novelty of each
predicate and the entities involved, triples can be “pushed” into respective areas of
the low-dimensional embedding space, leading to increased cluster cohesion and
higher within-cluster semantic similarity. Additional insights and recommended
improvements are presented in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
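      <table-wrap id="tab-1">
        <caption>
          <p>Clusterability and alignment results for a selection of sentence embedding methods, reproduced from [<xref ref-type="bibr" rid="ref9">9</xref>]. Linear@k and OT@k give Hits@k when aligning with a linear map and with optimal transport, respectively; SpatHist gives the spatial histogram clusterability estimate (higher indicates more cluster structure).</p>
        </caption>
        <table>
          <thead>
            <tr><th>Model</th><th>Dim.</th><th>Linear@5</th><th>Linear@10</th><th>OT@5</th><th>OT@10</th><th>SpatHist</th></tr>
          </thead>
          <tbody>
            <tr><td>Random</td><td>300</td><td>0.0762</td><td>0.0943</td><td>0.008</td><td>0.021</td><td>0.0018</td></tr>
            <tr><td>GloVe-mean</td><td>300</td><td>0.1175</td><td>0.1509</td><td>0.202</td><td>0.236</td><td>2.1680</td></tr>
            <tr><td>GloVe-DCT</td><td>300</td><td>0.0249</td><td>0.0345</td><td>0.105</td><td>0.150</td><td>1.2412</td></tr>
            <tr><td>GEM</td><td>300</td><td>0.2417</td><td>0.3111</td><td>0.360</td><td>0.397</td><td>4.9716</td></tr>
            <tr><td>SkipThought</td><td>4800</td><td>0.0538</td><td>0.0665</td><td>0.073</td><td>0.101</td><td>2.9001</td></tr>
            <tr><td>QuickThought</td><td>2048</td><td>0.0319</td><td>0.0418</td><td>0.067</td><td>0.204</td><td>1.2082</td></tr>
            <tr><td>LASER</td><td>1024</td><td>0.0956</td><td>0.1290</td><td>0.115</td><td>0.159</td><td>2.0528</td></tr>
            <tr><td>InferSentV1</td><td>4096</td><td>0.0560</td><td>0.0824</td><td>0.104</td><td>0.395</td><td>2.0010</td></tr>
            <tr><td>InferSentV2</td><td>4096</td><td>0.0598</td><td>0.0859</td><td>0.118</td><td>0.252</td><td>2.3067</td></tr>
            <tr><td>SentBERT</td><td>768</td><td>0.1038</td><td>0.1307</td><td>0.132</td><td>0.220</td><td>1.2141</td></tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-related">
      <title>Related Work</title>
      <p>I detail related work in the following two areas: word alignment methods and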
graph alignment methods. A complete survey of word, sentence and knowledge
graph embeddings as well as methods for aligning each domain can be found
in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].</p>
      <p>Regression models for word-to-word alignment were first proposed by [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] as a
means of capturing geometric patterns between embeddings across embedding
spaces. Inconsistencies in this approach were noted by [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], who in turn modified
the regression process to add unit-length normalization and to constrain the map to
be orthogonal. Applications of pre-processing and orthogonal constraints spurred
further research into ways to manipulate the source and target embedding spaces
to further express their geometric structures in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Building
“pseudo-dictionaries” as a means of reducing the amount of necessary training data is
suggested in [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. The work of [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] further explores iterative learning,
alternating between supervised alignment and unsupervised distribution matching, as
well as introducing novel metrics to assess the orthogonality assumptions used
in supervised approaches. A key approach for unsupervised learning is described
in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] where the authors propose leveraging an adversarial learning paradigm.
While the adversarial method directly leverages word frequencies, an alternative
unsupervised method in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] captures these patterns by analyzing the similarity
distributions of the word vectors themselves. Using the Gromov-Wasserstein
distance, [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] transform the alignment problem into one of finding an optimal transport
from source X to target Z.
      </p>
      <sec id="sec-related-graph">
        <title>Graph Representation Alignment</title>
        <p>
          The majority of research in the area of knowledge graph embeddings focuses on
one specific task, namely knowledge graph completion, which seeks to make
predictions of the following form: given $\langle s, p, ? \rangle$, make the best prediction for an
object $o$ such that the triple is a valid one in the context of the greater graph. The
state-of-the-art methods in this space leverage knowledge graph embeddings,
low-dimensional vector representations of the entities and the relations between them.
Most of this work focuses on representing the nodes,
or entities, of the graph, with considerably less emphasis on how the relations are
represented, and representations of the entire $\langle s, p, o \rangle$ triple are rarely considered.
Recent research into representing the entire triple is presented in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], yet the
results are limited to explorations in clustering and recommendation systems.
This work can be extended to further use cases and eventually tied in with
alignment methods to link knowledge graphs to text documents.
        </p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Reflection and Future Work</title>
      <p>Based on the initial results presented herein, there is clearly room for
improvement in methodologies for representation learning of triples in a
knowledge graph. Additionally, the alignment of cross-domain representations (those
spanning both text and knowledge graphs) is not currently well explored.
Exploration in this area can provide a bridge between the two respective data
realms and provide tooling for unsupervised automation of ontology and
knowledge graph development. Future work will involve measuring the clusterability
of multiple existing knowledge graph embedding algorithms, evaluating their
efficacy in alignments with sentence embeddings, and proposing new, semantically
grounded approaches to embedding triples as singular objects.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Adolfsson</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ackerman</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brownstein</surname>
            ,
            <given-names>N.C.</given-names>
          </string-name>
          :
          <article-title>To cluster, or not to cluster: An analysis of clusterability methods</article-title>
          .
          <source>Pattern Recognition</source>
          <volume>88</volume>
          ,
          <fpage>13</fpage>
          –
          <lpage>26</lpage>
          (Apr
          <year>2019</year>
          ). https://doi.org/10.1016/j.patcog.2018.10.026
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Alvarez-Melis</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jaakkola</surname>
            ,
            <given-names>T.S.</given-names>
          </string-name>
          :
          <article-title>Gromov-Wasserstein alignment of word embedding spaces</article-title>
          .
          <source>CoRR</source>
          abs/1809.00013 (
          <year>2018</year>
          ), http://arxiv.org/abs/1809.00013
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Artetxe</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Labaka</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agirre</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Learning principled bilingual mappings of word embeddings while preserving monolingual invariance</article-title>
          . pp.
          <fpage>2289</fpage>
          –
          <lpage>2294</lpage>
          (
          <year>2016</year>
          ). https://doi.org/10.18653/v1/D16-1250
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Artetxe</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Labaka</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agirre</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Generalizing and improving bilingual word embedding mappings with a multi-step framework of linear transformations</article-title>
          .
          <source>In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence</source>
          . pp.
          <fpage>5012</fpage>
          –
          <lpage>5019</lpage>
          (February
          <year>2018</year>
          ), https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16935
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Artetxe</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Labaka</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agirre</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings</article-title>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Conneau</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lample</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ranzato</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Denoyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jegou</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Word translation without parallel data</article-title>
          .
          <source>CoRR</source>
          abs/1710.04087 (
          <year>2017</year>
          ), http://arxiv.org/abs/1710.04087
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Dettmers</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Minervini</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stenetorp</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riedel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Convolutional 2D knowledge graph embeddings</article-title>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Fionda</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pirro</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Triple2vec: Learning triple embeddings from knowledge graphs</article-title>
          .
          <source>CoRR</source>
          abs/1905.11691 (
          <year>2019</year>
          ), http://arxiv.org/abs/1905.11691
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Kalinowski</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>An</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>A comparative study on structural and semantic properties of sentence embeddings</article-title>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Kalinowski</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>An</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>A survey of embedding space alignment methods for language and knowledge graphs</article-title>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Kristiadi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khan</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lukovnikov</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fischer</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Incorporating literals into knowledge graph embeddings</article-title>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Lawson</surname>
            ,
            <given-names>R.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jurs</surname>
            ,
            <given-names>P.C.</given-names>
          </string-name>
          :
          <article-title>New index for clustering tendency and its application to chemical problems</article-title>
          .
          <source>Journal of Chemical Information and Computer Sciences</source>
          <volume>30</volume>
          (
          <issue>1</issue>
          ),
          <fpage>36</fpage>
          –
          <lpage>41</lpage>
          (
          <year>1990</year>
          ). https://doi.org/10.1021/ci00065a010, https://pubs.acs.org/doi/abs/10.1021/ci00065a010
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Lazaridou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dinu</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baroni</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Hubness and pollution: Delving into cross-space mapping for zero-shot learning</article-title>
          .
          <source>In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics</source>
          . pp.
          <fpage>270</fpage>
          –
          <lpage>280</lpage>
          . Association for Computational Linguistics, Beijing, China (
          <year>2015</year>
          ). https://doi.org/10.3115/v1/P15-1027, https://www.aclweb.org/anthology/P15-1027
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Exploiting similarities among languages for machine translation</article-title>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Exploiting similarities among languages for machine translation</article-title>
          .
          <source>CoRR</source>
          abs/1309.4168 (
          <year>2013</year>
          ), http://arxiv.org/abs/1309.4168
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Patra</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moniz</surname>
            ,
            <given-names>J.R.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garg</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gormley</surname>
            ,
            <given-names>M.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neubig</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Bilingual lexicon induction with semi-supervision in non-isometric embedding spaces</article-title>
          .
          <source>In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</source>
          . pp.
          <fpage>184</fpage>
          –
          <lpage>193</lpage>
          . Association for Computational Linguistics, Florence, Italy (
          <year>2019</year>
          ), https://www.aclweb.org/anthology/P19-1018
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Pei</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>Improving cross-lingual entity alignment via optimal transport</article-title>
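          .
          <source>In: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19)</source>
          . pp.
          <fpage>3231</fpage>
          –
          <lpage>3237</lpage>
          (08
          <year>2019</year>
          ). https://doi.org/10.24963/ijcai.2019/448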
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Peyre</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cuturi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Computational optimal transport</article-title>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>S.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Turban</surname>
            ,
            <given-names>D.H.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hamblin</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hammerla</surname>
            ,
            <given-names>N.Y.</given-names>
          </string-name>
          :
          <article-title>Offline bilingual word vectors, orthogonal transformations and the inverted softmax</article-title>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Trouillon</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Welbl</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riedel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gaussier</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bouchard</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Complex embeddings for simple link prediction</article-title>
          .
          <source>In: International Conference on Machine Learning (ICML)</source>
          . vol.
          <volume>48</volume>
          , pp.
          <fpage>2071</fpage>
          –
          <lpage>2080</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mao</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Knowledge graph embedding: A survey of approaches and applications</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>PP</volume>
          ,
          <fpage>1</fpage>
          –
          <lpage>1</lpage>
          (09
          <year>2017</year>
          ). https://doi.org/10.1109/TKDE.2017.2754499
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Xing</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Normalized word embedding and orthogonal transform for bilingual word translation</article-title>
          .
          <source>In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics</source>
          . pp.
          <fpage>1006</fpage>
          –
          <lpage>1011</lpage>
          . Association for Computational Linguistics, Denver, Colorado (
          <year>2015</year>
          ), https://www.aclweb.org/anthology/N15-1104
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>