Iterative Entity Alignment with Improved Neural Attribute Embedding

Ning Pang1, Weixin Zeng1, Jiuyang Tang1,2, Zhen Tan1, and Xiang Zhao1,2

1 Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, Changsha, China
2 Collaborative Innovation Center of Geospatial Technology, Wuhan, China
xiangzhao@nudt.edu.cn

Abstract. Entity alignment (EA) aims to detect equivalent entities in different knowledge graphs (KGs), which can facilitate the integration of knowledge from multiple sources. Current EA methods usually harness KG embeddings to project entities in various KGs into the same low-dimensional space, where equivalent entities are placed close to each other. Nevertheless, most methods fail to take full advantage of other sources of information, e.g., attribute information, and overlook the negative impact brought by the lack of labelled data. To overcome these deficiencies, in this paper, we propose to generate neural attribute representations by considering both local and global signals. Besides, entity representations are refined via an iterative training process on the neural network. We evaluate our proposal on real-life datasets against state-of-the-art methods, and the results demonstrate the effectiveness of our solution.

Keywords: Entity alignment · Attribute information · Iterative training.

1 Introduction

Knowledge graphs (KGs) are becoming increasingly important for many downstream applications such as question answering [1] and sentence generation [2]. A large number of KGs, e.g., YAGO and DBpedia, have been constructed. However, in reality, these KGs are far from complete. To tackle this problem, various methods have been proposed, among which KG alignment attracts growing attention since it can incorporate complementary knowledge from multiple external KGs. Unfortunately, KGs are usually built in different natural languages or with various ontology systems, which hinders the integration of knowledge from external KGs into the target KG. Consequently, many research works have been devoted to improving the performance of KG alignment.

Current KG alignment approaches lay emphasis on entity alignment (EA), as entities are the pivots connecting different KGs. The task of EA aims to identify equivalent entities in different KGs. State-of-the-art methods [3, 4] normally harness translation-based KG embeddings to project entities and relations into a low-dimensional embedding space. The separate embedding spaces are then unified by harnessing seed entity pairs. Eventually, given a target entity, its counterparts in other KGs can be determined according to the distance in the unified embedding space. Nevertheless, Wang et al. [5] argued that KG embedding might fail to fully mine the structural information, and they instead utilized a graph convolutional network (GCN) [6] to generate entity embeddings. Additionally, they proposed to incorporate attribute information to serve as additional signals for EA. Due to the limitations of the datasets, attribute names are considered instead of attribute values. Their method has also achieved superior results on existing EA benchmarks.

In [5], attribute names are represented as one-hot embeddings of the most frequent attributes. However, the most frequent attributes appear with the majority of entities and are not able to help identify a specific entity. Additionally, the neighbourhood attribute information is completely ignored. For instance, to determine the equivalent entity of Michael Jordan, the optional attribute hasSpouse would be more useful than the obligatory attribute birthDate, since every person has a birthday but not necessarily a spouse. Besides, the attributes of Michael Jordan's neighbouring entities, e.g., hasNBAChampionship for Chicago Bulls, can also be harnessed for representing Michael Jordan.

Also, the shortage of labelled data (seed entity pairs) is largely overlooked by previous works, which restrains the quality of entity embeddings and, hence, the performance of EA.

In this paper, to handle these drawbacks, we devise an iterative entity alignment method with improved neural attribute embedding, Inga, which enhances EA performance by harnessing a neural network, i.e., GCN, an iterative training strategy, and refined attribute information to generate entity representations. Specifically, by incorporating the neighbouring attributes of an entity (local attribute information) and the frequency of an attribute (global attribute information) to form the improved attribute feature vector, more comprehensive signals can be captured in comparison with the one-hot representation [5]. To deal with the second limitation, an iterative training strategy is utilized to train the GCN, which keeps labelling unlabelled instances and selects high-quality ones to retrain itself, so as to generate better entity embeddings.

The main contributions of this work are:

– Attribute representation is improved by considering both local and global information.
– We apply an iterative training mechanism on GCN to generate more accurate structure and attribute representations.
– We evaluate Inga against state-of-the-art methods on three cross-lingual EA datasets, and the results demonstrate the effectiveness of our proposal.

Related Works. The task of KG alignment can be traced back to the traditional ontology matching task [7]. With the emergence and prevalence of embedding techniques, most KG alignment solutions resort to KG embedding for determining equivalent elements in different KGs. Chen et al. [3] (MTransE) were the first to utilize TransE to embed the entities of each KG into separate embedding spaces, which are then unified by different alignment models using seed entity pairs. The distance in the unified embedding space is used to determine entity pairs. JAPE [4] introduces attribute type information to refine the structure representation captured by KG embedding. GCN [5], on the other hand, harnesses a graph convolutional network, instead of KG embedding, to generate entity representations. Attribute information, represented as one-hot vectors of the most frequent attributes, is also utilized to complement structure information.

2 Methodology

Task Definition. A KG is usually represented as $G = (E, R, A, V)$, where $E$, $R$, $A$, $V$ denote the sets of entities, relations, attributes, and attribute values, respectively. Given two KGs, $G_1$ and $G_2$, EA aims to automatically mine new aligned entity pairs based on existing seed entity pairs $S = \{(e_{i_1}, e_{i_2}) \mid e_{i_1} \in E_1, e_{i_2} \in E_2\}_{i=1}^{m}$.
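To make the setting concrete, the following minimal Python sketch shows the inputs EA operates on; the tuple layout and the toy data are purely illustrative, not part of any released implementation.

    from typing import NamedTuple

    class KG(NamedTuple):
        entities: set      # E: entity identifiers
        relations: set     # R: relation names
        attributes: set    # A: attribute names
        triples: set       # relational triples (head, relation, tail)
        attr_triples: set  # attribute triples (entity, attribute, value)

    # Two toy KGs and the seed alignment S between them.
    g1 = KG({"Michael_Jordan", "Chicago_Bulls"}, {"playsFor"}, {"birthDate"},
            {("Michael_Jordan", "playsFor", "Chicago_Bulls")},
            {("Michael_Jordan", "birthDate", "1963-02-17")})
    g2 = KG({"MJ", "Bulls"}, {"memberOf"}, {"dateDeNaissance"},
            {("MJ", "memberOf", "Bulls")},
            {("MJ", "dateDeNaissance", "1963-02-17")})
    seeds = {("Michael_Jordan", "MJ")}  # S; EA should infer ("Chicago_Bulls", "Bulls")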
[Fig. 1 appears here: two KGs connected by seed pairs, whose structure information and attribute information are fed into GCNs that embed all entities into one embedding space.]

Fig. 1. The framework of our model. The dashed-line rectangle represents the structure information of e1. The solid-line rectangle represents the attribute information of e1 with respect to attribute 4. GCN embeds entities into a unified embedding space.

Structure and Attribute Embedding. Equivalent entities in multiple KGs are assumed to have similar neighbours (structure information) and attribute names (attribute information). To capture this information, a GCN is utilized to operate on the KGs and produce node-level embeddings for all entities. An entity's structure information can be represented by $x_s$; thus, the matrix encoding the structure information of all entities is denoted by $X_s$, which is randomly initialized and updated during model training in our setting. Similarly, the attribute feature of an entity can be represented by a vector $x_a$, and the corresponding attribute feature matrix for all entities is $X_a$. The initial attribute matrix is pre-computed, as detailed in the following. Note that, following previous works, we focus on attribute names instead of attribute values.

In [5], attribute information is converted into a $k$-dimensional one-hot vector encoding the $k$ most frequent attributes. Nonetheless, this setting fails to differentiate entities or consider the neighbouring information. In our model, the most frequent attributes (which we define as attributes appearing with more than 80% of entities) are discarded for representing an entity, since they appear with many entities and are not discriminative. Among the remaining attributes, we select the $k$ most frequent ones, as they can better distinguish entities and are not too long-tail (which might result in a very sparse attribute matrix). For an entity, its attribute feature vector can be denoted by $x_a = [x_a^1, x_a^2, \ldots, x_a^k]$, where

$$x_a^i = \frac{n_i}{\sum_{j=1}^{k} n_j}, \qquad (1)$$

and $n_i$ is the total number of times the $i$-th attribute appears among the attributes of the entity and its one-hop neighbours, which is harnessed to capture local attribute information. In this way, both local and global attribute information can be encoded.
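The sketch below shows one way to realize this attribute selection and Eq. (1) in Python. The helper mappings entity_attrs (entity to its list of attribute names) and neighbours (entity to its one-hop neighbours) are illustrative assumptions about the data layout, not the exact implementation.

    from collections import Counter

    def select_attributes(entity_attrs, k=1000, cutoff=0.8):
        """Global information: drop attributes appearing with more than
        `cutoff` of all entities, then keep the k most frequent of the rest."""
        n_entities = len(entity_attrs)
        freq = Counter(a for attrs in entity_attrs.values() for a in set(attrs))
        kept = [a for a, c in freq.most_common() if c / n_entities <= cutoff]
        return kept[:k]

    def attribute_vector(e, entity_attrs, neighbours, top_attrs):
        """Eq. (1): count each selected attribute over e and its one-hop
        neighbours (local information), then normalise the counts."""
        counts = Counter(entity_attrs.get(e, []))
        for nb in neighbours.get(e, []):
            counts.update(entity_attrs.get(nb, []))
        n = [counts[a] for a in top_attrs]
        total = sum(n)
        return [n_i / total if total else 0.0 for n_i in n]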
The inputs of the GCN model include $X_s$, $X_a$, and the adjacency matrix $A$. By feeding the inputs into the GCN model, the output entity embedding matrix is

$$[C_s; C_a] = \mathrm{GCN}(A, [X_s; X_a]), \qquad (2)$$

where $[;]$ denotes the concatenation of two matrices, $C_s \in \mathbb{R}^{N \times d_s}$ is the final structure embedding matrix, and $C_a \in \mathbb{R}^{N \times d_a}$ is the final attribute embedding matrix. In our model, we harness two 2-layer GCNs to generate embeddings for the entities in the two KGs, respectively. The dimensionalities of the structure and attribute feature vectors are set to $d_s$ and $d_a$ for all layers in the respective models.

Distance Function. A weighted distance function, which combines structure embedding and attribute embedding, is designed for entity alignment prediction. Concretely, for $e_{i_1} \in G_1$ and $e_{i_2} \in G_2$, the distance can be calculated by

$$Dis(e_{i_1}, e_{i_2}) = \theta \, Dis_s(e_{i_1}, e_{i_2}) + (1 - \theta) \, Dis_a(e_{i_1}, e_{i_2}), \qquad (3)$$

where $\theta$ is a hyper-parameter balancing the importance of the structure and attribute embedding distances. The structure (attribute) embedding distance is defined as the vector norm of $c_s^{i_1} - c_s^{i_2}$ ($c_a^{i_1} - c_a^{i_2}$) divided by the dimensionality $d_s$ ($d_a$). The distance between equivalent entities is expected to be as small as possible. Thus, the entity in $G_2$ with the smallest distance from a specific entity $e_{i_1} \in G_1$ can be regarded as the counterpart of $e_{i_1}$.

Loss Function. We use the pre-aligned entity pairs $S$ to train the GCN models. The training objectives for learning structure and attribute embeddings are to minimize the following margin-based ranking loss functions:

$$J_s = \sum_{(e_1,e_2) \in S} \sum_{(v_1,v_2) \in S^-} d_s \cdot [Dis_s(e_1, e_2) - Dis_s(v_1, v_2) + \gamma_s]_+, \qquad (4)$$

$$J_a = \sum_{(e_1,e_2) \in S} \sum_{(v_1,v_2) \in S^-} d_a \cdot [Dis_a(e_1, e_2) - Dis_a(v_1, v_2) + \gamma_a]_+, \qquad (5)$$

where $[x]_+ = \max\{0, x\}$, $S^-$ denotes the set of negative aligned entity pairs, and $\gamma_s$ and $\gamma_a$ are two positive margins separating positive and negative aligned entity pairs. The loss functions $J_s$ and $J_a$ are optimized by stochastic gradient descent (SGD) separately.
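As a sanity check of Eqs. (3)-(5), here is a small NumPy sketch of the per-pair distance and one loss term. Eq. (3) does not fix the choice of norm, so the L1 norm below is an assumption, and the embedding dictionaries are illustrative stand-ins.

    import numpy as np

    def dis(c1, c2, d):
        # Norm of the embedding difference divided by the dimensionality d;
        # L1 is assumed here, as the text only says "vector norm".
        return np.linalg.norm(c1 - c2, ord=1) / d

    def ranking_loss(pos_pairs, neg_pairs, emb1, emb2, d, gamma):
        # Eq. (4)/(5): d * [Dis(pos) - Dis(neg) + gamma]_+ summed over S x S^-.
        loss = 0.0
        for e1, e2 in pos_pairs:
            for v1, v2 in neg_pairs:
                margin = dis(emb1[e1], emb2[e2], d) - dis(emb1[v1], emb2[v2], d) + gamma
                loss += d * max(0.0, margin)
        return loss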
Iterative Training. Considering the lack of labelled data and inspired by [8], we adopt a semi-supervised training strategy that enlarges the training set iteratively by including aligned pairs of high confidence during the training process. Once newly aligned entity pairs are added into $S$, they are treated as valid training data. However, some false positive pairs may be included, which would hurt the subsequent training process. Consequently, the key challenge is how to choose highly confident samples from the newly aligned entity pairs to enlarge $S$. We therefore consider candidate entity pairs $\{(e_{i_1}, e_{j_2}) \mid e_{i_1} \in G_1 \setminus S_1, e_{j_2} \in G_2 \setminus S_2\}$ that satisfy $e_{i_1} = \arg\min_{e} Dis(e, e_{j_2})$ and $e_{j_2} = \arg\min_{e} Dis(e_{i_1}, e)$ as reliable aligned pairs for iterative training, where $S_1$ and $S_2$ are the sets of pre-aligned entities in $G_1$ and $G_2$, respectively.
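This mutual nearest-neighbour test is straightforward to state in code. The sketch below assumes a precomputed distance matrix over the not-yet-aligned entities (row i for an entity in G1\S1, column j for an entity in G2\S2); the function name is illustrative.

    import numpy as np

    def confident_pairs(dist):
        """Select (i, j) such that j is the nearest candidate for i AND
        i is the nearest candidate for j, i.e., mutual nearest neighbours."""
        nearest_tgt = dist.argmin(axis=1)  # best j for every i
        nearest_src = dist.argmin(axis=0)  # best i for every j
        return [(i, j) for i, j in enumerate(nearest_tgt) if nearest_src[j] == i]

The selected pairs are appended to $S$, and training repeats until fewer new pairs than a threshold are found in an iteration (the stopping criterion with $\alpha = 100$ given in Section 3).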
3 Experiments

Datasets. We adopt the widely used DBP15K datasets in the experiments, which were developed by [4]. The datasets were constructed from subsets of DBpedia, which has multiple versions in different languages. DBP15K consists of three datasets: Chinese-English (Zh-En), Japanese-English (Ja-En), and French-English (Fr-En). Each dataset contains 15 thousand already-known equivalent entity pairs, 30% of which are used for training and 70% for testing.

Parameter Settings. In our GCN models, the dimensionalities of the structure and attribute embeddings in all layers were set to $d_s = 300$ and $d_a = 600$, respectively. The number of top attributes $k$ is set to 1,000. The iterative training process stops once the size of the newly included set $|C|$ falls below a threshold $\alpha = 100$. The margins $\gamma_s$ and $\gamma_a$ are set to 3. The hyper-parameter $\theta$ in the weighted distance function is set to 0.9.

Competing Approaches and Evaluation Metric. Three approaches are used for comparison: MTransE [3], JAPE [4], and GCN [5]. The evaluation metric, Hits@k, measures the proportion of correctly aligned entities in the top-k ranked candidates. We report Hits@1 (accuracy), Hits@10, and Hits@50 in the experiments.

Table 1. Experimental results (Hits@k, %)

              Zh-En                       En-Zh
          Hits@1  Hits@10  Hits@50    Hits@1  Hits@10  Hits@50
MTransE    30.83    61.41    79.12     24.78    52.42    70.45
JAPE       41.18    74.46    88.90     40.15    71.05    86.18
GCN        41.25    74.38    86.23     36.49    69.94    82.45
Inga       50.45    79.42    89.79     49.36    76.05    86.38

              Ja-En                       En-Ja
          Hits@1  Hits@10  Hits@50    Hits@1  Hits@10  Hits@50
MTransE    27.86    57.45    75.94     23.72    49.92    67.93
JAPE       36.25    68.50    85.35     38.37    67.27    82.65
GCN        39.91    74.46    86.10     38.42    71.81    83.72
Inga       51.46    79.46    88.25     51.05    77.04    86.27

              Fr-En                       En-Fr
          Hits@1  Hits@10  Hits@50    Hits@1  Hits@10  Hits@50
MTransE    24.41    55.55    74.41     21.26    50.60    69.93
JAPE       32.39    66.68    83.19     32.97    65.91    82.38
GCN        37.29    74.49    86.73     36.77    73.06    86.39
Inga       50.45    79.42    87.79     49.36    76.05    86.48

Experiment Results. The experimental results of Inga and the three competitors on the DBP15K datasets are shown in Table 1. It can be observed that Inga achieves the best performance in most settings on the three bi-directional datasets.

Among the four approaches, MTransE achieves relatively worse results. The Hits@1 values of MTransE on all datasets range from roughly 20% to 30%, indicating that translation-based KG embeddings can capture structure information and serve as useful signals for EA. Another KG embedding based method, JAPE, outperforms MTransE significantly, by over 10% in most cases, owing to its ability to incorporate attribute information for refining entity structure embeddings. GCN attains slightly better results than JAPE on the Ja-En and Fr-En language pairs, indicating the effectiveness of the GCN model for generating structure representations. Inga is built on the architecture of GCN, yet it improves the results by a large margin. In both alignment directions, Inga outperforms GCN and JAPE by about 3%-12% on all Hits@k metrics. This demonstrates the usefulness of the improved attribute feature representation and the iterative training strategy. Notably, the gap between Inga and the other approaches is much larger on Hits@1 (accuracy) than on the other metrics. This reveals that Inga aligns entity pairs more accurately, which is critical to the EA task.

4 Conclusion

In this paper, we propose a GCN-based model to align entities in different KGs by projecting entities into a unified embedding space, where equivalent entities are placed close to each other. Attribute representation is improved by capturing more informative attribute features. Furthermore, we devise an iterative training strategy to enlarge the training set and generate better entity embeddings via the neural network. Our proposal is evaluated on real-life datasets, and the results demonstrate that our model outperforms three state-of-the-art competitors by a large margin. In future work, we plan to exploit more information, especially attribute values, as guidance for our model.

Acknowledgements. This work was partially supported by NSFC under grants Nos. 61872446, 61876193 and 71690233.

References

1. J. Yin, X. Jiang, Z. Lu, L. Shang, H. Li, and X. Li. Neural generative question answering. In Proceedings of IJCAI, pages 2972-2978, 2016.
2. B. D. Trisedya, J. Qi, R. Zhang, and W. Wang. GTR-LSTM: A triple encoder for sentence generation from RDF data. In Proceedings of ACL, pages 1627-1637, 2018.
3. M. Chen, Y. Tian, M. Yang, and C. Zaniolo. Multilingual knowledge graph embeddings for cross-lingual knowledge alignment. In Proceedings of IJCAI, pages 1511-1517, 2017.
4. Z. Sun, W. Hu, and C. Li. Cross-lingual entity alignment via joint attribute-preserving embedding. In Proceedings of ISWC, Part I, pages 628-644, 2017.
5. Z. Wang, Q. Lv, X. Lan, and Y. Zhang. Cross-lingual knowledge graph alignment via graph convolutional networks. In Proceedings of EMNLP, pages 349-357, 2018.
6. T. N. Kipf and M. Welling. Semi-supervised classification with graph convolutional networks. CoRR, abs/1609.02907, 2016.
7. F. M. Suchanek, S. Abiteboul, and P. Senellart. PARIS: Probabilistic alignment of relations, instances, and schema. PVLDB, 5(3):157-168, 2011.
8. H. Zhu, R. Xie, Z. Liu, and M. Sun. Iterative entity alignment via joint knowledge embeddings. In Proceedings of IJCAI, pages 4258-4264, 2017.