<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>DTransX: A Distributed Framework for Knowledge Graph Representation Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jun Ma</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guozheng Rao</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xiaowang Zhang</string-name>
          <email>xiaowangzhang@tju.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhiyong Feng</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>College of Intelligence and Computing, Tianjin University</institution>
          ,
          <addr-line>Tianjin</addr-line>
          ,
          <country country="CN">China</country>
          <institution>Tianjin Key Laboratory of Cognitive Computing and Application</institution>
          ,
          <addr-line>Tianjin</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we present a distributed framework named DTransX for knowledge graph representation learning in a hybrid parallel way. Firstly, we introduce a hybrid parallel computing model, in which both the embedding model and the data are distributed, in order to efficiently process large-scale knowledge graphs. Moreover, we propose a distributed model merging method based on the word frequency weight of each entity or relation to avoid semantic loss during parallel processing. Finally, we develop a decentralized architecture for parallel embedding to enhance the stability of our embedding system. The experiments show that our proposal avoids the loss of semantics, can even overcome the overfitting problem of single-machine training, and exhibits a significant speedup in training.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In recent years, translation models have achieved great advances in
knowledge graph completion based on knowledge graph representation learning.
When we build a representation model to explore richer semantic associations
over a large-scale knowledge graph, we face two challenges: (1) big data: the
computing power of a single machine is not enough to train large-scale data in
a reasonable time; and (2) a large model: the size of the knowledge representation
model is limited by stand-alone memory. Existing translation models, such as TransE [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], TransH [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], and TransR [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], are all based on the assumption of a stand-alone machine. At present, the
parameter server [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] is the mainstream distributed machine learning framework. This architecture
relies heavily on the central parameter server to maintain and update all global
parameters [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
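      <p>
        For reference, the translation models above all follow the same scoring principle: TransE [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] embeds a triple $(h, r, t)$ as vectors $\mathbf{h}$, $\mathbf{r}$, $\mathbf{t}$ and scores it by $f(h, r, t) = \lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert$, so that valid triples obtain low scores; TransH and TransR apply the same translation idea in relation-specific hyperplanes and spaces, respectively. DTransX distributes the training of such models rather than changing their scoring functions.
      </p>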
      <p>
        In this poster, we propose a distributed framework for knowledge graph
embedding. The large model and big data are trained with a hybrid parallel
computing model, and we use a distributed model merging method to build a more
complete and accurate embedding model. From an architectural point of view,
DTransX is based on a flexible decentralized architecture. Experiments on the
data sets FB15K [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and WN18 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] show that our system performs well on link prediction, and it observably
speeds up training on a larger data set, Wikidata.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Framework of DTransX</title>
      <p>The approach proposed in this section has been implemented in DTransX.
The framework of DTransX contains three modules, namely, the Hybrid-Parallel
Configurator, the Decentralized Trainer, and the Distributed Merging Executor,
shown in the following figure.</p>
      <p>Hybrid-Parallel Configurator The Hybrid-Parallel Configurator distributes
the data and the model to the computing groups and their subordinate nodes. The role
of Data Processing is to shuffle the RDF triples according to the computing power
of each computing group. We use Model Processing to logically partition the model in
each computing group, and each node constructs a local submodel according to the
logical submodel. Suppose there are $T_j$ computing nodes in the computing group $G_j$
and the model $M$ contains $\mathrm{COUNT}(M)$ parameters. The position of the parameter
$p_i^{G_j}$ is expressed as $(\mathrm{RANK}(p_i^{G_j}), \mathrm{OFFSET}(p_i^{G_j}))$. The node tag $\mathrm{RANK}(p_i^{G_j})$
and the offset $\mathrm{OFFSET}(p_i^{G_j})$ are expressed as follows:</p>
      <p>$\mathrm{RANK}(p_i^{G_j}) = \min\!\left(\dfrac{i}{\mathrm{COUNT}(M)/T_j},\; T_j - 1\right), \qquad
\mathrm{OFFSET}(p_i^{G_j}) = i - \dfrac{\mathrm{COUNT}(M)}{T_j}\,\mathrm{RANK}(p_i^{G_j}),$
where $i \in \{0, 1, \ldots, \mathrm{COUNT}(M) - 1\}$.</p>
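      <p>
        Read as a block partition of the parameter vector: each of the $T_j$ nodes in group $G_j$ holds one contiguous block of roughly $\mathrm{COUNT}(M)/T_j$ parameters. The following is a minimal C++ sketch of this mapping (an illustration only, not the DTransX source; the function name locate and the ceiling-based block size are our own assumptions):
      </p>
      <preformat><![CDATA[
// Minimal sketch (not the DTransX source): map a global parameter index i to
// (rank, offset) inside a computing group with t_j nodes, assuming the block
// size is the ceiling of count_m / t_j and the last node absorbs the remainder.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <utility>

std::pair<int64_t, int64_t> locate(int64_t i, int64_t count_m, int64_t t_j) {
    const int64_t block  = (count_m + t_j - 1) / t_j;     // parameters per node
    const int64_t rank   = std::min(i / block, t_j - 1);  // node tag RANK(p_i)
    const int64_t offset = i - block * rank;               // OFFSET(p_i) within that node
    return {rank, offset};
}

int main() {
    // Example: 10 parameters spread over 3 nodes of one computing group.
    for (int64_t i = 0; i < 10; ++i) {
        auto [rank, offset] = locate(i, 10, 3);
        std::printf("p_%lld -> node %lld, offset %lld\n",
                    static_cast<long long>(i), static_cast<long long>(rank),
                    static_cast<long long>(offset));
    }
    return 0;
}
]]></preformat>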
      <p>Decentralized Trainer The architecture of our framework is decentralized.
In each computing group, the nodes train synchronously, and the parameters are
directly updated in the submodel maintained by the corresponding node, without
central-node scheduling. In a system with $K$ computing groups, this information
is expressed as $\{(G_j, T_j) \mid j \in \{0, 1, \ldots, K\}\}$ in the public
information block (PIB) of each group.</p>
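      <p>
        As a rough illustration of what a node keeps instead of contacting a central server, the public information block can be pictured as follows (a sketch under our own assumptions; the field names are hypothetical and not taken from the paper):
      </p>
      <preformat><![CDATA[
// Illustrative sketch only: one possible layout of the public information
// block (PIB) kept on every node, listing (G_j, T_j) for all computing groups
// so that parameter addresses in other groups can be computed locally.
// Field names are hypothetical.
#include <cstdint>
#include <vector>

struct GroupInfo {
    int32_t group_id;   // identifier of group G_j
    int32_t num_nodes;  // T_j, number of nodes in group G_j
};

struct PublicInformationBlock {
    std::vector<GroupInfo> groups;  // one entry per computing group
    int32_t my_group;               // group this node belongs to
    int32_t my_rank;                // this node's rank within its group
};
]]></preformat>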
      <p>Distributed Merging Executor After each round of training, we unify the
replicas of the model, which are trained in each group, by the Distributed
Merging Executor. In the first stage, each node computes the address information
$(\mathrm{RANK}(p_i^{G_j}), \mathrm{OFFSET}(p_i^{G_j}))$ from the information in the PIB. Then, each
node independently pulls every $p_i$ from the nodes in the other groups.
Finally, the nodes merge the models by combining the word frequency weights. The
merging function is defined as:</p>
      <p>$p_i^{\mathrm{new}} = \displaystyle\sum_{j=0}^{K} w_i^{G_j}\, p_i^{G_j},$
where $w_i^{G_j}$ is the word frequency weight of the entity or relation
corresponding to $p_i^{G_j}$ in the data set on group $G_j$.</p>
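      <p>
        To make the merge step concrete, the sketch below computes $p_i^{\mathrm{new}}$ with an MPI all-reduce over one representative process per group. It is only an illustration under our own assumptions (the paper states that the system is built with C++ and MPI, but the function merge_slice, the one-process-per-group communicator, and the normalization by the summed weights are ours), not the DTransX implementation:
      </p>
      <preformat><![CDATA[
// Illustrative sketch of the distributed merging step, not the DTransX source.
// Assumption: the communicator `comm` contains exactly one process per
// computing group, each holding its group's copy of a parameter slice together
// with the word-frequency weights of the corresponding entities/relations.
#include <mpi.h>
#include <cstddef>
#include <vector>

void merge_slice(const std::vector<double>& local_params,   // p_i^{G_j} of this group
                 const std::vector<double>& local_weights,  // word-frequency weights w_i^{G_j}
                 std::vector<double>& merged,               // output: p_i^{new}
                 MPI_Comm comm) {
    const std::size_t n = local_params.size();
    std::vector<double> weighted(n), weight_sum(n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        weighted[i] = local_weights[i] * local_params[i];

    merged.assign(n, 0.0);
    // Sum w_i^{G_j} * p_i^{G_j} over all groups without a central server.
    MPI_Allreduce(weighted.data(), merged.data(), static_cast<int>(n),
                  MPI_DOUBLE, MPI_SUM, comm);
    // Also sum the weights so the merged value can be normalized
    // (normalization is our assumption; the paper gives the weighted sum only).
    MPI_Allreduce(local_weights.data(), weight_sum.data(), static_cast<int>(n),
                  MPI_DOUBLE, MPI_SUM, comm);
    for (std::size_t i = 0; i < n; ++i)
        if (weight_sum[i] > 0.0) merged[i] /= weight_sum[i];
}
]]></preformat>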
    </sec>
    <sec id="sec-3">
      <title>Experiments and Evaluations</title>
      <p>Experiments were performed on a cluster running Linux, and the system is
built with C++ and MPI.</p>
      <p>We evaluated link prediction performance on FB15K and WN18, and used Mean
Rank and Hits@10 as evaluation metrics under the "Raw" and "Filter" settings.
All experiments were performed with the same hyperparameters: the dimension
d=100, the margin m=1, and the learning rate 0.001. The baselines include TransE,
TransH, and TransR. These classical translation models can be trained in the
DTransX framework, denoted by DTransE, DTransH, and DTransR in Table 1.</p>
      <p>In the process of training, the unbalanced word frequency of entities and
relations leads to different training degrees of the corresponding model vectors,
so we use the word frequency as the weight to coordinate the training effect of
the model vectors. In the experiments, link prediction performance in DTransX
is as good as that of the corresponding translation model on the standard data sets,
and in some cases even outperforms it. The explanation is that, in our framework,
each computing group uses a different data set to train a different model, so
merging multiple models can reduce the risk of a single model falling into local
minima, thus improving the generalization of the whole model.</p>
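      <p>
        For clarity, both metrics can be computed from the rank assigned to each correct entity; the short sketch below is a generic illustration of the metrics, not the paper's evaluation code. Under the "Filter" setting the ranks are computed after removing other correct triples from the candidate list, while "Raw" keeps them.
      </p>
      <preformat><![CDATA[
// Generic illustration of the link prediction metrics used above
// (not the paper's evaluation code). `ranks` holds, for every test triple,
// the rank of the correct entity among all candidate replacements.
#include <cstdio>
#include <vector>

void report(const std::vector<int>& ranks) {
    double mean_rank = 0.0, hits_at_10 = 0.0;
    for (int r : ranks) {
        mean_rank += r;
        if (r <= 10) hits_at_10 += 1.0;  // counted as a hit if ranked in the top 10
    }
    mean_rank /= ranks.size();
    hits_at_10 = 100.0 * hits_at_10 / ranks.size();
    std::printf("Mean Rank: %.1f  Hits@10: %.1f%%\n", mean_rank, hits_at_10);
}
]]></preformat>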
      <p>We tested training efficiency on the Wikidata data set (about 68,902,801 lines)
with different numbers of groups and nodes. The results are shown in Table 2.
We can observe that the training time continues to decrease as the number of
groups or nodes increases.</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>In this poster, we present a hybrid parallel representation learning framework
for knowledge graphs that improves training performance without loss of semantics
during distributed training. As an advantage, our approach is independent of the
data and of the embedding model, as shown by our experiments comparing three
classical embedding models (TransE, TransH, and TransR). Due to its strong
expansibility, we believe it is helpful for representation learning on large-scale
data. In future work, we will further extend DTransX to support other types of
knowledge representation learning models, such as ConvE and R-GCN.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work is supported by the National Key Research and Development Program
of China (2017YFC0908401) and the National Natural Science Foundation of
China (61672377, 61972455). Xiaowang Zhang is supported by the Peiyang Young
Scholars in Tianjin University (2019XRX-0032).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bordes</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Usunier</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcia-Duran</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weston</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yakhnenko</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Translating embeddings for modeling multi-relational data</article-title>
          .
          <source>In: Proc. of NIPS</source>
          , pp.
          <fpage>2787</fpage>
          -
          <lpage>2795</lpage>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Feng</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Knowledge graph embedding by translating on hyperplanes</article-title>
          .
          <source>In: Proc. of AAAI</source>
          , pp.
          <fpage>1112</fpage>
          -
          <lpage>1119</lpage>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>Learning entity and relation embeddings for knowledge graph completion</article-title>
          .
          <source>In: Proc. of AAAI</source>
          , pp.
          <fpage>2181</fpage>
          -
          <lpage>2187</lpage>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Andersen</surname>
            ,
            <given-names>D.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Park</surname>
            ,
            <given-names>J.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smola</surname>
            ,
            <given-names>A.J.</given-names>
          </string-name>
          , et al.:
          <article-title>Scaling distributed machine learning with the parameter server</article-title>
          .
          <source>In: Proc. of OSDI</source>
          , pp.
          <fpage>583</fpage>
          -
          <lpage>598</lpage>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cai</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          , et al.:
          <article-title>Flexps: Flexible parallelism control in parameter server architecture</article-title>
          .
          <source>In: Proc. of PVLDB</source>
          , pp.
          <fpage>566</fpage>
          -
          <lpage>579</lpage>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Lian</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hsieh</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Can decentralized algorithms outperform centralized algorithms? A case study for decentralized parallel stochastic gradient descent</article-title>
          .
          <source>In: Proc. of NIPS</source>
          , pp.
          <fpage>5330</fpage>
          -
          <lpage>5340</lpage>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>