=Paper=
{{Paper
|id=Vol-3160/short12
|storemode=property
|title=Towards Unsupervised Machine Learning Approaches for Knowledge Graphs
|pdfUrl=https://ceur-ws.org/Vol-3160/short12.pdf
|volume=Vol-3160
|authors=Filippo Minutella,Fabrizio Falchi,Paolo Manghi,Michele De Bonis,Nicola Messina
|dblpUrl=https://dblp.org/rec/conf/ircdl/MinutellaFMBM22
}}
==Towards Unsupervised Machine Learning Approaches for Knowledge Graphs==
Filippo Minutella¹, Fabrizio Falchi², Paolo Manghi², Michele De Bonis² and Nicola Messina²

¹ Larus Business Automation, Mestre (VE) 30174, Italy
² Consiglio Nazionale delle Ricerche, Istituto di Scienza e Tecnologie dell’Informazione “A. Faedo”, Pisa (PI) 56124, Italy

Abstract

Nowadays, a lot of data is in the form of Knowledge Graphs aiming at representing information as a set of nodes and relationships between them. This paper proposes an efficient framework to create informative embeddings for node classification on large knowledge graphs. Such embeddings capture how a particular node of the graph interacts with its neighborhood and indicate whether it is isolated or part of a larger clique. Since a homogeneous graph is necessary to perform this kind of analysis, the framework exploits the metapath approach to split the heterogeneous graph into multiple homogeneous graphs. The proposed pipeline includes an unsupervised attentive neural network to merge different metapaths and produce node embeddings suitable for classification. Preliminary experiments on the IMDb dataset demonstrate the validity of the proposed approach, which outperforms current state-of-the-art unsupervised methods.

Keywords: Knowledge Graphs, Unsupervised Machine Learning, Neural Networks

1. Introduction

Today, graphs are widely used to represent data in many applications over the Internet. Social networks, transaction networks, collaboration networks, and all those cases in which data is composed of entities and relations between them take advantage of the graph structure. One of the main fields in which this kind of structure is deeply used is scholarly communication, where research products are organized in graphs, such as the OpenAIRE Research Graph [1][2][3]. Algorithms operating on such graphs need to exploit the links among nodes to understand the whole spectrum of relationships among the different entities. With the advent of deep learning, many architectures were proposed to explicitly deal with relationships, for example in the context of information retrieval [4] or multimodal matching [5].

Over the past decade, many algorithms were proposed to operate with heterogeneous graphs, i.e., graphs that contain different types of nodes and edges. An algorithm over a heterogeneous graph works by extracting its homogeneous forms using the metapath approach, which consists in replacing the link chain between two entities of the same type with a direct link. For example, in a graph with actors and movies in which a relation between the two entities indicates that the actor played a role in the movie, the actor-movie-actor metapath extracts the homogeneous form that contains only actor nodes, with edges encoding the "played a role in the same movie" relationship.
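To make the metapath construction concrete, the following is a minimal sketch (not taken from the paper) of how the actor-movie-actor metapath can be materialized from an actor-movie biadjacency matrix; the toy data and variable names are illustrative assumptions.

```python
# Minimal sketch (illustrative, not from the paper): materializing the
# actor-movie-actor metapath from an actor-movie biadjacency matrix.
import numpy as np
import scipy.sparse as sp

# actor_movie[i, j] = 1 if actor i played a role in movie j (toy data).
actor_movie = sp.csr_matrix(np.array([
    [1, 1, 0],   # actor 0 appears in movies 0 and 1
    [0, 1, 1],   # actor 1 appears in movies 1 and 2
    [0, 0, 1],   # actor 2 appears in movie 2
]))

# Composing the relation with its transpose replaces the actor-movie-actor
# chain with a direct actor-actor edge: entry (i, k) counts shared movies.
# On a real graph the result would of course be kept sparse.
shared = (actor_movie @ actor_movie.T).toarray()
np.fill_diagonal(shared, 0)                  # remove self-loops
actor_actor = (shared > 0).astype(np.int8)   # edge iff at least one shared movie

print(actor_actor)
# [[0 1 0]
#  [1 0 1]
#  [0 1 0]]
```

Longer chains (e.g. movie-actor-movie or movie-keyword-movie) are handled analogously by multiplying the corresponding biadjacency matrices in sequence.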
Although Graph Neural Networks (GNNs) are very prominent in this field, their applicability to large knowledge graphs is limited: subgraph sampling may be required when the graph is dense, while the addition of virtual nodes may be necessary when the graph is too sparse.

In light of these observations, we propose an efficient and scalable pipeline to process very large heterogeneous knowledge graphs. Our objective consists in classifying the nodes in the graph given the node attributes and the node neighborhood. We target the IMDb dataset, built from one of the most popular databases of movie, TV, and celebrity content, where the target movie classes to infer are Action, Drama, or Comedy. The proposed approach leverages the metapath approach to obtain multiple but simpler homogeneous graphs and constructs node embeddings using FastRP, a widely used random projection algorithm. Then, an attentive neural network is trained in an unsupervised manner to aggregate information from different metapaths and produce embeddings suitable for effective node classification. We aim to train the neural network in an unsupervised way to emulate the scarcity of annotated data, a widespread scenario in large knowledge graphs scraped from the Internet. Furthermore, forging informative node embeddings without direct supervision enables the creation of features suitable for multiple downstream tasks. We show that this simple approach can obtain state-of-the-art results on node classification in the unsupervised regime on IMDb.

2. Related Work

Deep learning on heterogeneous and homogeneous graphs has been extensively studied in the literature from many points of view. Many of the approaches take advantage of Graph Neural Networks (GNNs) [6]. GNNs are a class of deep learning methods designed to perform inference on data described by graphs. They provide an easy way to perform node-level, edge-level, and graph-level prediction tasks. The advantage of GNNs is that they can use features and attributes of the nodes in the neighborhood to create an embedding that captures the graph's topology.

Differently from GNNs, other approaches exploit explicit mathematical formulations to aggregate information from the neighborhood. The simplest approach consists of extracting features from the nodes' observable properties in the graph, such as degree, centrality, or betweenness. Other approaches take advantage of the adjacency matrix, using dimensionality reduction techniques to extract dense vectors for each node; an example in this category is the FastRP algorithm [7]. Finally, the last class of approaches uses random walks, i.e., random traversals of the graph that extract sequences of nodes, in a way very similar to the word2vec algorithm applied to texts. Some of the methods included in this category are DeepWalk [8], Node2Vec [9], and LINE [10].
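As a rough illustration of the random-projection family mentioned above (not the exact FastRP formulation of [7]), the sketch below projects a degree-normalized adjacency matrix with a very sparse random matrix and mixes a few propagation orders; the dimensions, weights, and function name are illustrative assumptions.

```python
# Rough, illustrative sketch of a FastRP-style embedding (not the exact
# algorithm of Chen et al. [7]): project a degree-normalized adjacency
# matrix with a very sparse random matrix and mix a few propagation orders.
import numpy as np
import scipy.sparse as sp

def fastrp_like(adj: sp.csr_matrix, dim: int = 128,
                weights=(0.0, 1.0, 1.0), seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    n = adj.shape[0]

    # Very sparse random projection matrix with entries in {-1, 0, +1}.
    s = 3.0
    probs = [1 / (2 * s), 1 - 1 / s, 1 / (2 * s)]
    R = rng.choice([-1.0, 0.0, 1.0], size=(n, dim), p=probs) * np.sqrt(s)

    # Row-normalize the adjacency matrix (random-walk transition matrix).
    deg = np.asarray(adj.sum(axis=1)).ravel()
    deg[deg == 0] = 1.0
    A = sp.diags(1.0 / deg) @ adj

    # Mix different propagation orders: weights[k] scales A^k R.
    emb = np.zeros((n, dim))
    cur = R
    for k, w in enumerate(weights):
        if k > 0:
            cur = A @ cur          # propagate one more hop
        emb += w * cur
    return emb

# Hypothetical usage: one embedding matrix per homogeneous metapath graph.
# embeddings = fastrp_like(sp.csr_matrix(actor_actor), dim=16)
```

In the proposed pipeline, one such embedding is computed for each homogeneous graph derived from a metapath.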
3. Architecture

The proposed methodology is based on a three-step pipeline, consisting of (i) the definition of metapaths, (ii) the extraction of the embeddings using FastRP [7], and (iii) the training of the neural network to intelligently aggregate information from different metapaths. An overview of the approach is shown in Figure 1.

Figure 1: Overall pipeline: metapaths split the heterogeneous graph into homogeneous graphs, FastRP computes random-projection embeddings, which are then aggregated and classified.

Steps (i) and (ii) can be considered as pre-processing steps, while step (iii) is the core of the unsupervised node embedding learning for node classification.

3.1. Pre-processing

In this work, we use the IMDb knowledge graph. We extract three different metapaths to obtain three homogeneous graphs: the movies linked by the same actors, the movies linked by the same directors, and the movies linked by the same plot keywords, using the movie-actor-movie, movie-director-movie, and movie-keyword-movie metapaths, respectively.

To account for node attributes (for example, the genre, the duration, or the year of a movie), virtual nodes and virtual edges are used. These virtual elements define additional metapaths that capture the topological information from the point of view of node attributes. A feature can be categorical (for example, when the value is taken from a list that encodes the genre) or numeric. A categorical feature can be represented in the graph by adding a virtual node for each value that the feature can assume. A numeric feature, instead, can be represented in the graph as a single node: the value that an actual node assumes for that feature is represented as a weighted link, with the weight indicating the numeric value of the feature. The newly added virtual nodes define new metapaths that are treated like the standard metapaths.

At this point, dense vectors computed for each node can be propagated through the graph links to neighboring nodes using a message-passing algorithm. In this work, we use FastRP [7], a very fast node embedding algorithm based on random projections.

3.2. Unsupervised Metapaths Aggregation

At the end of the pre-processing procedure, we have a set of dense vectors encoding neighborhood information for each target node. Specifically, for each node we have a number of dense vectors equal to the number of metapaths plus the number of features of the target nodes. The node embeddings obtained from different metapaths are aggregated through an attentive neural network that creates a very informative representation of each node, suitable for node classification. We aim at training this neural network in an unsupervised way, emulating the scarcity of annotated data, a very common scenario in large knowledge graphs. The unsupervised training is performed using an approach very similar to masked language model pre-training, like the one employed in BERT [11]: one of the input vectors is randomly masked by setting it to zero, and the neural network is forced to predict the values of all the vectors, including the masked one.
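A minimal sketch of this masked-reconstruction objective is given below, assuming the per-node metapath vectors are already stacked into a tensor. The aggregator here is a generic placeholder MLP (the paper's gating/attention blocks, described next, would take its place), and all names, shapes, and hyperparameters are illustrative assumptions.

```python
# Minimal sketch of the BERT-like masked reconstruction objective described
# above. The aggregator is a placeholder MLP; the gating/attention blocks of
# Section 3.2 would take its place. Shapes and hyperparameters are assumptions.
import torch
import torch.nn as nn

n_nodes, n_metapaths, dim = 1000, 5, 128
x = torch.randn(n_nodes, n_metapaths, dim)   # stand-in for the FastRP vectors

aggregator = nn.Sequential(                  # placeholder for the attentive network
    nn.Flatten(),                            # (B, n_metapaths * dim)
    nn.Linear(n_metapaths * dim, dim),
    nn.ReLU(),
    nn.Linear(dim, n_metapaths * dim),       # reconstruct all input vectors
)
optimizer = torch.optim.Adam(aggregator.parameters(), lr=1e-3)

for step in range(100):
    idx = torch.randint(0, n_nodes, (256,))          # mini-batch of nodes
    batch = x[idx]                                   # (B, n_metapaths, dim)

    # Randomly mask one metapath vector per node by zeroing it out.
    masked = batch.clone()
    mask_pos = torch.randint(0, n_metapaths, (batch.size(0),))
    masked[torch.arange(batch.size(0)), mask_pos] = 0.0

    # The network must predict all vectors, including the masked one.
    pred = aggregator(masked).view_as(batch)
    loss = nn.functional.mse_loss(pred, batch)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

After training, the (unmasked) inputs would presumably be passed through the trained aggregator to obtain the node representation used downstream.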
The neural network designed in this research is inspired by TabNet [12] and is detailed in Figure 2.

Figure 2: Neural network architecture, composed of K blocks, each with a metapath gating submodule (GLU units) and a metapath attention submodule (fully connected layers followed by a softmax). Thicker lines carry data for multiple metapaths.

The network is composed of K blocks. Each block is fed with the input vectors, aggregates them using an attentive aggregation, and outputs the aggregated vector. Specifically, each block is composed of two submodules, called metapath gating and metapath attention. The metapath gating submodule is composed of a GLU (Gated Linear Unit) [13] component, which internally performs an attentive gating of the input vectors. The metapath attention submodule is composed of a series of dense layers that return an attention value for each of the examined metapaths; these scores are normalized to sum to 1 using a softmax output layer. The output of the entire block is the weighted average of the vectors produced by the gating submodule, using the weights computed by the attention submodule. Finally, the K vectors computed by the blocks are summed together to obtain the final node embedding used for the masked node reconstruction. The general idea of this neural network is to pass the input through simple transformations (hence the choice of the GLU), so that the attention weights computed in the attention path of each block can be used to inspect which metapath contributes most during the reconstruction phase.

4. Preliminary Experiments

We used the IMDb dataset to train and evaluate our architecture. IMDb (an acronym for Internet Movie Database) is an online database of information related to films, television programs, home videos, video games, and streaming content. For the purpose of this research, we used the subset containing movies, actors, directors, and keywords of the movie plots. Each movie in the dataset has exactly one director, its three main actors, and a variable number of keywords. The goal is to infer the movie genre (Action, Drama, or Comedy), so the task is framed as a node classification problem.

We compared our approach with other unsupervised methods from the literature, namely Node2Vec [9], LINE [10], ESim [14], metapath2vec [15], and HERec [16]. The standard evaluation protocol consists in inferring the node embeddings on the test set and training a linear support vector machine (SVM) in a supervised way with varying training proportions. We report the average Macro-F1 and Micro-F1 over 10 runs of each embedding model in Table 1. Since each movie has exactly one label, the Micro-F1 corresponds to the accuracy, while the Macro-F1 is the average of the per-class F1 scores.

Table 1: Results for node classification on the IMDb dataset (unsupervised methods).

| Metric   | Train % | LINE  | node2vec | ESim  | metapath2vec | HERec | Ours  |
|----------|---------|-------|----------|-------|--------------|-------|-------|
| Macro-F1 | 20%     | 44.04 | 49.00    | 48.37 | 46.05        | 45.61 | 50.88 |
| Macro-F1 | 80%     | 47.49 | 51.49    | 51.37 | 49.99        | 47.73 | 53.94 |
| Micro-F1 | 20%     | 45.21 | 49.94    | 49.32 | 47.22        | 46.23 | 50.69 |
| Micro-F1 | 80%     | 48.98 | 52.72    | 52.54 | 50.50        | 49.11 | 53.76 |

As can be noticed, our approach outperforms current unsupervised node embedding approaches, obtaining a relative performance increase of around 4.7% and 2.0% on Macro-F1 and Micro-F1, respectively, over the previous best performing model (node2vec).
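The evaluation protocol above can be reproduced along the following lines; the sketch assumes the node embeddings and genre labels are already available as NumPy arrays, and the split sizes, seeds, and stand-in data are illustrative assumptions.

```python
# Illustrative sketch of the evaluation protocol: train a linear SVM on a
# fraction of the frozen node embeddings and report Macro-/Micro-F1 on the
# rest, averaged over several runs. Data, splits and seeds are assumptions.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.metrics import f1_score

def evaluate(embeddings: np.ndarray, labels: np.ndarray,
             train_fraction: float = 0.2, runs: int = 10):
    macro, micro = [], []
    for seed in range(runs):
        X_tr, X_te, y_tr, y_te = train_test_split(
            embeddings, labels,
            train_size=train_fraction, stratify=labels, random_state=seed)
        clf = LinearSVC().fit(X_tr, y_tr)
        pred = clf.predict(X_te)
        macro.append(f1_score(y_te, pred, average="macro"))
        micro.append(f1_score(y_te, pred, average="micro"))
    return float(np.mean(macro)), float(np.mean(micro))

# Example with random stand-in data (3 genres):
emb = np.random.randn(500, 64)
genres = np.random.randint(0, 3, size=500)
print(evaluate(emb, genres, train_fraction=0.2))
```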
5. Conclusions

In this paper, we developed a framework to perform node classification on large heterogeneous knowledge graphs. The proposed approach employs the metapath approach to transform a heterogeneous graph into a set of homogeneous graphs, which are then analyzed using a node embedding algorithm. Inspired by neural networks working on tabular data, we developed an attentive neural network that can smartly aggregate node embeddings from different metapaths. This network does not require direct supervision with the node labels; instead, it is trained in an unsupervised way by performing masked node embedding reconstruction. The final classes are learned by training a simple SVM on a slice of the test set. We compared our approach with other unsupervised methods that use the same training and evaluation protocol on the IMDb dataset, and we obtained the best results on both the Macro-F1 and Micro-F1 metrics.

Acknowledgments

This work was supported by “Intelligenza Artificiale per il Monitoraggio Visuale dei Siti Culturali” (AI4CHSites), CNR4C program, CUP B15J19001040004, and by the OpenAIRE-Nexus project, funded by the EC (H2020, grant agreement No 101017452).

References

[1] P. Manghi, N. Houssos, M. Mikulicic, B. Jörg, The data model of the OpenAIRE scientific communication e-infrastructure, in: Research Conference on Metadata and Semantic Research, Springer, 2012, pp. 168–180.
[2] P. Manghi, A. Bardi, C. Atzori, M. Baglioni, N. Manola, J. Schirrwagen, P. Principe, M. Artini, A. Becker, M. De Bonis, et al., The OpenAIRE Research Graph data model, Zenodo (2019).
[3] P. Manghi, C. Atzori, A. Bardi, M. Baglioni, J. Schirrwagen, H. Dimitropoulos, S. La Bruzzo, I. Foufoulas, A. Löhden, A. Bäcker, A. Mannocci, M. Horst, P. Jacewicz, A. Czerniak, K. Kiatropoulou, A. Kokogiannaki, M. De Bonis, M. Artini, E. Ottonello, A. Lempesis, A. Ioannidis, N. Manola, P. Principe, OpenAIRE Research Graph dump, 2020. URL: https://doi.org/10.5281/zenodo.4201546. doi:10.5281/zenodo.4201546.
[4] N. Messina, G. Amato, F. Carrara, F. Falchi, C. Gennaro, Learning visual features for relational CBIR, International Journal of Multimedia Information Retrieval (2019) 1–12.
[5] N. Messina, G. Amato, A. Esuli, F. Falchi, C. Gennaro, S. Marchand-Maillet, Fine-grained visual textual alignment for cross-modal retrieval using transformer encoders, ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 17 (2021) 1–23.
[6] J. Zhou, G. Cui, S. Hu, Z. Zhang, C. Yang, Z. Liu, L. Wang, C. Li, M. Sun, Graph neural networks: A review of methods and applications, AI Open 1 (2020) 57–81.
[7] H. Chen, S. F. Sultan, Y. Tian, M. Chen, S. Skiena, Fast and accurate network embeddings via very sparse random projection, in: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019, pp. 399–408.
[8] B. Perozzi, R. Al-Rfou, S. Skiena, DeepWalk: Online learning of social representations, in: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014, pp. 701–710.
[9] A. Grover, J. Leskovec, node2vec: Scalable feature learning for networks, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 855–864.
[10] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, Q. Mei, LINE: Large-scale information network embedding, in: Proceedings of the 24th International Conference on World Wide Web, 2015, pp. 1067–1077.
[11] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, in: NAACL-HLT (1), 2019.
[12] S. Ö. Arik, T. Pfister, TabNet: Attentive interpretable tabular learning, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, 2021, pp. 6679–6687.
[13] Y. N. Dauphin, A. Fan, M. Auli, D. Grangier, Language modeling with gated convolutional networks, in: International Conference on Machine Learning, PMLR, 2017, pp. 933–941.
[14] J. Shang, M. Qu, J. Liu, L. M. Kaplan, J. Han, J. Peng, Meta-path guided embedding for similarity search in large-scale heterogeneous information networks, CoRR abs/1610.09769 (2016). URL: http://arxiv.org/abs/1610.09769. arXiv:1610.09769.
[15] Y. Dong, N. V. Chawla, A. Swami, metapath2vec: Scalable representation learning for heterogeneous networks (2017) 135–144. URL: https://doi.org/10.1145/3097983.3098036. doi:10.1145/3097983.3098036.
[16] C. Shi, B. Hu, W. X. Zhao, P. S. Yu, Heterogeneous information network embedding for recommendation, IEEE Transactions on Knowledge and Data Engineering 31 (2019) 357–370. doi:10.1109/TKDE.2018.2833443.