Embedding Metadata-Enriched Graphs*

Stefan Bachhofner1[0000-0001-7785-2090], Peb Ruswono Aryan2[0000-0002-1698-1064], Bernhard Krabina1,4[0000-0002-6871-3037], and Robert David3

1 Vienna University of Economics and Business, Institute for Data, Process and Knowledge Management, Welthandelsplatz 1, 1020 Vienna, Austria
{forename.surname}@wu.ac.at
2 Vienna University of Technology, Vienna, Austria
peb.aryan@tuwien.ac.at
3 Semantic Web Company, Vienna, Austria
4 KDZ – Centre for Public Administration Research, Vienna, Austria

* This research has received funding from the Teaming.AI project, which is part of the European Union's Horizon 2020 research and innovation program under grant agreement No 957402.

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract. This paper presents ongoing research in which we study the problem of embedding metadata-enriched graphs, with a focus on knowledge graphs, in a vector space with transformer-based deep neural networks. Experimentally, we compare, ceteris paribus, the performance of a transformer-based model with non-transformer approaches. Due to their recent success in natural language processing, we hypothesize that the former is superior in performance. We test this hypothesis by comparing the performance of transformer embeddings with non-transformer embeddings on different downstream tasks. Our research might contribute to a better understanding of how random walks influence the learning of features, which might be useful in the design of deep learning architectures for graphs when the input is generated with random walks.

Keywords: Graph Embedding · Knowledge Graph Embedding · Deep Learning · Metadata · Random Walks

1 Introduction

Deep Learning (DL) has drastically improved the state of the art on many tasks in Natural Language Processing (NLP) and Computer Vision (CV) since its breakthrough in 2012 [2]. For the former, [5] claim that DL is able to learn word embeddings which capture materials science concepts without any supervision, and that these embeddings can be used to predict materials years before their discovery. This success has been largely attributed to DL's ability to learn features of a concept in an unsupervised manner, thereby eliminating most, if not all, of the need for feature engineering. Unsurprisingly, this success and the prospect of next to no feature engineering led to an interest in this machine learning technique from both the graph and the semantic web research communities. Specifically, a stream of research emerged which dedicates itself to learning representations of nodes, edges, sub-graphs, or whole graphs, and any combination of these.

Two such embedding approaches are DeepWalk, whose authors claim it to be the first to introduce DL to network analytics, and RDF2Vec; the former is from the literature on graph embeddings, while the latter is from the specialized community on Knowledge Graph (KG) embeddings. Both first use random walks to generate sequences, which are then fed into a technique that originated in NLP. They hence treat the result of a random walk as being equivalent to a sentence.

In our research, we are interested in enriching these random walks with metadata present in the graph or KG, and in the effect this has on different DL models. In particular, we study the ability of transformer-based DL models to learn embeddings from random walks enriched with metadata. We hypothesize that the former are superior, ceteris paribus, to non-transformer methods in learning representations, evaluated by their performance on downstream tasks. A minimal sketch of the walks-as-sentences idea underlying this embedding family is given below.
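To make the walks-as-sentences analogy concrete, the following Python sketch generates uniform random walks over a toy graph and trains a skip-gram model on them, in the spirit of DeepWalk and RDF2Vec. The graph, walk parameters, and library choices (networkx, gensim) are illustrative assumptions, not the setups used in the original papers.

```python
# Minimal sketch of the DeepWalk/RDF2Vec idea: random walks over a graph are
# treated as sentences and fed to a word-embedding model. Uses networkx and
# gensim 4.x; graph and hyperparameters are illustrative only.
import random
import networkx as nx
from gensim.models import Word2Vec

random.seed(42)

G = nx.karate_club_graph()  # toy stand-in for a (knowledge) graph

def random_walk(graph, start, length):
    """Uniform random walk of at most `length` steps, returned as a 'sentence' of node tokens."""
    walk = [start]
    for _ in range(length):
        neighbors = list(graph.neighbors(walk[-1]))
        if not neighbors:
            break
        walk.append(random.choice(neighbors))
    return [str(node) for node in walk]

# Generate a walk corpus, e.g. 200 walks of length 8 per node.
walks = [random_walk(G, node, length=8) for node in G.nodes() for _ in range(200)]

# Skip-gram (sg=1) over the walk corpus; the learned word vectors are the node embeddings.
model = Word2Vec(sentences=walks, vector_size=64, window=5, sg=1, min_count=1, seed=42)
print(model.wv[str(0)][:5])  # first dimensions of the embedding of node 0
```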
2 Background on Graph Embedding Approaches with Deep Learning and Random Walks

At the core of graph embedding approaches with DL and random walks is the idea of representing the graph as a set of random walks, which form the input to the DL model [1]. The random walk is hence a feature-engineering pre-processing step that enables the use of existing DL embedding approaches, which usually come from NLP. Two frequently used training strategies are continuous bag-of-words (CBOW) and skip-gram, which are explained in more detail below. DeepWalk and RDF2Vec are examples of approaches that use them, where the former is from the literature on graph embeddings, while the latter is specialized for knowledge graphs serialized with the Resource Description Framework (RDF).

For this embedding family, the random walks are of paramount importance, as they are the input to the DL model. The model hence relies on the properties of the paths created by the random walk, which implies that the DL model is constrained by (i) the degree to which these paths preserve the graph properties, and (ii) the expressiveness of these paths. We hypothesize that adding metadata leads to an increase in performance for a given task, ceteris paribus, provided the DL model is capable of learning the structure of metadata-enriched paths. This is the motivation for our research.

As an example of this embedding family, we describe RDF2Vec. RDF2Vec is a KG embedding approach specifically designed for RDF-serialized KGs. It first uses random walks to generate sequences of a fixed length d, which are then fed into a 3-layer multi-layer perceptron for training [3,4]. The vector representation can then be obtained from the hidden layer. In the original paper, the authors set d to either 4 or 8 and use 500 or 200 walks per entity, depending on the data set. The sequences can be generated with either random graph walks or the Weisfeiler-Lehman algorithm. As a training strategy, one can use either CBOW or skip-gram, where one is the inverse of the other: in the former, the network attempts to predict one missing word in a sequence from its surrounding words, while in the latter it attempts to predict the surrounding words of a given word. One possible way of enriching such walk sequences with statement-level metadata is sketched below.
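Our core idea is to add metadata tokens to the walk sequences before training. Since we have not yet fixed a concrete injection scheme, the following Python sketch shows one plausible option, reusing the provenance predicates (dct:creator, dct:source) from Fig. 1; all identifiers and the interleaving strategy are hypothetical. A CBOW, skip-gram, or transformer model would then be trained on the enriched sequences exactly as on plain ones.

```python
# One possible way to enrich a walk with metadata, sketched for illustration;
# the injection scheme and all identifiers below are assumptions.
# A plain RDF2Vec-style walk alternates entities and predicates:
plain_walk = [":bob", "foaf:knows", ":alice", "foaf:based_near", ":vienna"]

# Statement-level metadata (e.g. provenance) attached to individual triples:
metadata = {
    (":bob", "foaf:knows", ":alice"): ["dct:creator", ":curator_1"],
    (":alice", "foaf:based_near", ":vienna"): ["dct:source", ":census_2020"],
}

def enrich(walk, metadata):
    """Interleave metadata tokens after each (subject, predicate, object) hop."""
    enriched = [walk[0]]
    for i in range(0, len(walk) - 2, 2):
        triple = (walk[i], walk[i + 1], walk[i + 2])
        enriched += [walk[i + 1], walk[i + 2]] + metadata.get(triple, [])
    return enriched

print(enrich(plain_walk, metadata))
# [':bob', 'foaf:knows', ':alice', 'dct:creator', ':curator_1',
#  'foaf:based_near', ':vienna', 'dct:source', ':census_2020']
```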
3 Related Work

In our research, we focus on embedding graphs with random walks and DL. Naturally, this implies similarities with the approaches above. However, we differentiate ourselves in two important ways. First, our graph walks can in principle contain a massive amount of metadata which needs to be processed by the DL model. This leads to the second distinction, which is the use of a transformer-based model, as transformers have started to outperform recurrent models in NLP [6]. This might indicate an increased capability to learn structure from sequences, which is precisely the research question we contribute to.

Due to their success in NLP, the semantic web community has also started to investigate transformers for graph embeddings. For example, [7] use them to generate embeddings for context-aware and temporal KGs. Their initial empirical results may provide evidence for an increased capability to learn structure from sequences. In particular, they report improvements by a factor of up to 15 on Hit@3 compared to their baseline models TransE, SimplE, and HolE. However, they also report decreases in performance, and no increase at all for Hit@1. In addition, they acknowledge that the baseline models have performance issues which, as they argue, may be due to a skewed distribution in the data set. Our research is similar to theirs in that we also study the problem of designing DL architectures that are best suited for learning structure from sequences with additional information. We, however, focus on metadata, while they focus on time and context. This is not a sharp distinction, as context may be added to the KG via metadata. Finally, our research might shed light on why the transformer models were not able to improve the performance on the above-mentioned Hit@1.

In our research, we do not intend to introduce a new DL architecture or propose a pre-processing method, as the authors above do. Instead, we contribute to a better understanding of how DL learns the graph structure when it only has access to a set of paths generated by random walks. This means that we will not alter the architectures and their hyperparameters (for example, the number of filters or the kernel size), except where we want to keep the number of parameters approximately equal among all architectures. More experimental details follow in the next section.

4 Experiment Details

We have a strong focus on reproducibility and comparability in our research (Fig. 1). To ensure both, we take the following steps. First, we will control all involved random number generators with seeds and report these seeds in the paper. Second, we will separate the random walk generation from the training loop: we first generate the sequences and save them, and these then serve as the input to all approaches. Third, we will record and save the order in which these sequences are presented to the DL models. The sequences and their order will be made publicly available. Fourth, we will perform preliminary short experiments that are specifically designed to test reproducibility. Fifth, we will make sure that the numbers of model parameters are approximately equal, given the respective approach and architecture restrictions, e.g. some approaches may have different parameter scaling factors. A minimal sketch of these measures is given after Fig. 1. Evaluation data sets we are considering are, inter alia, American Association of University Professors (AAUP), Angewandte Informatik und Formale Beschreibungsverfahren (AIFB), and British Geological Survey (BGS).

Fig. 1: Embedding Metadata-Enriched Graphs with Random Walks and DL. (Pipeline diagram: add metadata such as dct:creator and dct:source to the graph, generate random walks, encode the sequences with DeepWalk, RDF2Vec, or a transformer model, e.g. RETRA, and compare the performance of the representations on downstream tasks.)
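The following Python sketch illustrates the reproducibility measures listed above: fix all involved random number generators, persist the generated walk corpus outside the training loop, and record the exact presentation order. The file name, seed value, and toy corpus are illustrative assumptions.

```python
# Minimal sketch of the reproducibility measures: seed all RNGs, generate the
# walk corpus once outside the training loop, and persist corpus + order so
# every approach trains on identical input in an identical order.
import json
import random

SEED = 42  # step 1: all seeds are fixed and will be reported
random.seed(SEED)
# DL frameworks need their own seeding as well, e.g. torch.manual_seed(SEED)

# Step 2: walks are generated once, separately from training (toy corpus here).
walks = [["A", "p", "B"], ["B", "q", "C"], ["C", "r", "A"]]

# Step 3: fix and record the order in which sequences are presented.
order = list(range(len(walks)))
random.shuffle(order)

# Persist corpus, order, and seed; this file would be published alongside the paper.
with open("walks.json", "w") as f:
    json.dump({"seed": SEED, "walks": walks, "order": order}, f)

# Training later iterates deterministically: for i in order: feed walks[i]
```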
5 Discussion and Outlook

In this paper, we report on ongoing research in which we study the problem of embedding graphs represented as sequences of metadata-enriched random walks with DL. We are in particular interested in the embedding capabilities of transformer-based models. In our experiment design, we put sizeable effort into ensuring reproducibility and comparability. Since the quality of embeddings has a huge influence on downstream tasks (e.g. node prediction and link prediction), our research might have broad implications for many streams of research. Among others, the results of our research might influence the quality of knowledge graph completion and fact checking technologies, e.g. detecting fake news and tracing a news story back to its origins. Further, our research might aid the design of DL architectures for graphs when the input is the result of random walks.

References

1. Cai, H., Zheng, V.W., Chang, K.C.C.: A comprehensive survey of graph embedding: Problems, techniques, and applications. IEEE Transactions on Knowledge and Data Engineering 30(9), 1616–1637 (2018)
2. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
3. Ristoski, P., Paulheim, H.: RDF2Vec: RDF graph embeddings for data mining. In: The Semantic Web – ISWC 2016. pp. 498–514. Springer International Publishing, Kobe, Japan (October 2016)
4. Ristoski, P., Rosati, J., Di Noia, T., De Leone, R., Paulheim, H.: RDF2Vec: RDF graph embeddings and their applications. Semantic Web 10(4), 721–752 (2019)
5. Tshitoyan, V., Dagdelen, J., Weston, L., Dunn, A., Rong, Z., Kononova, O., Persson, K.A., Ceder, G., Jain, A.: Unsupervised word embeddings capture latent knowledge from materials science literature. Nature 571(7763), 95–98 (2019)
6. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. pp. 6000–6010. NIPS'17, Curran Associates Inc., Red Hook, NY, USA (2017)
7. Werner, S., Rettinger, A., Halilaj, L., Lüttin, J.: RETRA: Recurrent transformers for learning temporally contextualized knowledge graph embeddings. In: The Semantic Web. pp. 425–440. Springer International Publishing, Cham (2021)