<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Unsupervised Machine Learning Approaches for Knowledge Graphs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Filippo Minutella</string-name>
          <email>filippo.minutella@larus-ba.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabrizio Falchi</string-name>
          <email>fabrizio.falchi@isti.cnr.it</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paolo Manghi</string-name>
          <email>paolo.manghi@isti.cnr.it</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michele De Bonis</string-name>
          <email>michele.debonis@isti.cnr.it</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicola Messina</string-name>
          <email>nicola.messina@isti.cnr.it</email>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Larus Business Automation</institution>
          ,
          <addr-line>Mestre (VE) 30174</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <issue>101017452</issue>
      <abstract>
        <p>Much of today's data takes the form of knowledge graphs, which represent information as a set of nodes and the relationships between them. This paper proposes an efficient framework to create informative embeddings for node classification on large knowledge graphs. Such embeddings capture how a particular node of the graph interacts with its neighborhood and indicate whether it is isolated or part of a bigger clique. Since a homogeneous graph is necessary to perform this kind of analysis, the framework exploits the metapath approach to split the heterogeneous graph into multiple homogeneous graphs. The proposed pipeline includes an unsupervised attentive neural network to merge different metapaths and produce node embeddings suitable for classification. Preliminary experiments on the IMDb dataset demonstrate the validity of the proposed approach, which can outperform current state-of-the-art unsupervised methods.</p>
      </abstract>
      <kwd-group>
        <kwd>Knowledge Graphs</kwd>
        <kwd>Unsupervised Machine Learning</kwd>
        <kwd>Neural Networks</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Today, graphs are widely used to represent data in many applications over the Internet. Social
networks, transaction networks, collaboration networks, and all those cases in which data
is composed of entities and relations between them take advantage of the graph structure.
One of the main fields in which this kind of structure is heavily used is scholarly
communication, where research products are organized in graphs, such as the OpenAIRE Research
Graph [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ][
        <xref ref-type="bibr" rid="ref2">2</xref>
        ][
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Algorithms operating on such graphs need to exploit the links among nodes to
understand the whole spectrum of relationships among the different entities. With the advent
of deep learning, many architectures were proposed to explicitly deal with relationships, for
example in the context of information retrieval [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] or multimodal matching [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Over the past
decade, many algorithms were proposed to operate on heterogeneous graphs, i.e., graphs that
contain different types of nodes and edges. An algorithm over a heterogeneous graph works
by extracting its homogeneous forms using the metapath approach. This approach consists in
replacing the chain of links between two entities of the same type with a direct link. For example,
in a graph with actors and movies in which a relation between the entities indicates that the
actor played a role in the movie, the actor-movie-actor metapath extracts the homogeneous
form that contains only actor nodes, with edges encoding the "played a role in the same movie"
relationship. Although Graph Neural Networks (GNNs) are very prominent in this field, their
applicability to large knowledge graphs is limited. In many cases, in fact, subgraph sampling
may be required when the graph is dense, while the addition of virtual nodes may be necessary
when the graph is too sparse.
      </p>
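      <p>As a minimal sketch of the actor-movie-actor example, the projection can be written in a few lines of Python. The function name, the toy data, and the choice of weighting each new edge by the number of shared movies are illustrative assumptions and are not taken from the pipeline used in this work.</p>
      <preformat>
```python
from collections import defaultdict
from itertools import combinations


def project_metapath(edges):
    """Collapse source-via-source chains into direct source-source links.

    `edges` holds (source, via) pairs, e.g. (actor, movie). Two sources are
    linked when they share at least one `via` node; as an assumption of this
    sketch, the edge weight counts how many `via` nodes they share.
    """
    sources_by_via = defaultdict(set)
    for source, via in edges:
        sources_by_via[via].add(source)

    weights = defaultdict(int)
    for sources in sources_by_via.values():
        # Every unordered pair of sources sharing this `via` node gets a link.
        for a, b in combinations(sorted(sources), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Toy "played a role in" relation (hypothetical names).
played_in = [
    ("alice", "movie_1"), ("bob", "movie_1"),
    ("bob", "movie_2"), ("carol", "movie_2"),
    ("alice", "movie_3"), ("bob", "movie_3"),
]
actor_graph = project_metapath(played_in)
print(actor_graph)
```
      </preformat>
      <p>Swapping the order of each pair, so that movies are the sources and actors the intermediate nodes, yields the movie-actor-movie projection with the same function.</p>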
      <p>In light of these observations, we propose an efficient and scalable pipeline to process
very large heterogeneous knowledge graphs. Our objective is to classify the nodes in
the graph given the node attributes and the node neighborhood. We target the IMDb dataset,
the world’s most popular and authoritative database for movie, TV, and celebrity content,
where the target movie classes to infer are Action, Drama or Comedy. The proposed approach
leverages metapaths to obtain multiple but simpler homogeneous graphs and
constructs node embeddings using FastRP, a widely used random projection algorithm. Then,
an attentive neural network is trained in an unsupervised manner to aggregate information
from different metapaths and produce embeddings suitable for effective node classification. We
train the neural network in an unsupervised way to emulate the scarcity of annotated
data, a widespread scenario in large knowledge graphs scraped from the Internet. Furthermore,
forging informative node embeddings without direct supervision enables the creation of features
suitable for multiple downstream tasks.</p>
      <p>We show that this simple approach can obtain state-of-the-art results on node classification
in the unsupervised regime on IMDb.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Deep learning on heterogeneous and homogeneous graphs has been extensively studied in the literature
from many points of view. Many of the approaches take advantage of Graph Neural Networks
(GNNs) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. GNNs are a class of deep learning methods designed to perform inference on
data described by graphs. They provide an easy way to perform node-level, edge-level, and
graph-level prediction tasks. The advantage of GNNs is that they can use features and attributes
of nodes in the neighborhood to create an embedding that captures the graph’s topology.
      </p>
      <p>
        Unlike GNNs, other approaches exploit explicit mathematical formulations
to aggregate information from the neighborhood. The simplest approach consists of extracting
features from the nodes’ observable properties in the graph, such as degree, centrality, or
betweenness. Other approaches take advantage of the adjacency matrix, using
dimensionality reduction techniques to extract dense vectors for each node. An example in this
category is the FastRP algorithm [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Finally, the last class of approaches uses random walks,
i.e., random traversals of the graph that extract sequences of nodes. This approach is
very similar to the word2vec algorithm on texts. Some of the methods included in this category are
DeepWalk [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], Node2Vec [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], and LINE [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>Figure 1: Overview of the approach. Heterogeneous graph, metapaths, homogeneous graphs, random projections (FastRP), metapaths aggregation, classification.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Architecture</title>
      <p>
        The proposed methodology is based on a three-step pipeline, consisting of (i) the definition of
metapaths, (ii) the extraction of the embeddings using FastRP [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], and (iii) the training of the
neural network to intelligently aggregate information from different metapaths. An overview
of the approach is shown in Figure 1. Steps (i) and (ii) can be considered pre-processing
steps, while step (iii) is the core of the unsupervised node embedding learning for node
classification.
      </p>
      <sec id="sec-3-1">
        <title>3.1. Pre-processing</title>
        <p>In this work, we use the IMDb knowledge graph. We extract three different metapaths to obtain
three homogeneous graphs: the movies linked by the same actors, the movies linked by the
same directors, and the movies linked by the same plot keywords, using the movie-actor-movie,
movie-director-movie, and movie-keyword-movie metapaths, respectively.</p>
        <p>In order to account for the attributes of nodes (the genre, the duration or the year of a
movie, for example), virtual nodes and virtual edges are used. These virtual elements define
additional metapaths that capture the topological information from the point of view of node
attributes. A feature can be categorical (for example, when the value is taken from a list that
encodes the genre) or numeric. A categorical feature can be represented in the graph by
adding a virtual node for each value that the feature can assume. A numeric feature, instead,
can be represented in the graph as a single node. The value that an actual node assumes for
that feature is represented as a weighted link, with the weight indicating the numeric value
for that feature. The newly added virtual nodes define new metapaths that are treated like the
standard metapaths.</p>
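        <p>The virtual-node encoding can be sketched as follows. The function name, the node-naming scheme (<monospace>feature=value</monospace>), and the unit weight on categorical links are assumptions made for illustration; only the distinction between one virtual node per categorical value and a single weighted virtual node per numeric feature comes from the description above.</p>
        <preformat>
```python
def add_virtual_nodes(movies, categorical, numeric):
    """Return weighted (movie, virtual_node, weight) edges for node attributes.

    Categorical features get one virtual node per distinct value (weight 1.0,
    an assumption of this sketch); numeric features get a single virtual node
    per feature, with the attribute value stored as the edge weight.
    """
    edges = []
    for movie_id, attrs in movies.items():
        for feature in categorical:
            if feature in attrs:
                edges.append((movie_id, f"{feature}={attrs[feature]}", 1.0))
        for feature in numeric:
            if feature in attrs:
                edges.append((movie_id, feature, float(attrs[feature])))
    return edges


# Toy attribute table (hypothetical values).
movies = {
    "m1": {"genre": "Drama", "duration": 120},
    "m2": {"genre": "Drama", "duration": 95},
    "m3": {"genre": "Comedy", "duration": 101},
}
edges = add_virtual_nodes(movies, categorical=["genre"], numeric=["duration"])
virtual_nodes = sorted({node for _, node, _ in edges})
print(virtual_nodes)
```
        </preformat>
        <p>In this toy example, m1 and m2 become neighbors through the shared virtual node for the Drama value, which is what the corresponding movie-genre-movie metapath would capture.</p>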
        <p>
          At this point, dense vectors computed for each node can be propagated through the graph
links to neighboring nodes using a message-passing algorithm. In this work, we use FastRP [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ],
a very fast node embedding algorithm based on random projections.
        </p>
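        <p>To give an intuition of how random projections produce node embeddings, the following NumPy sketch implements a simplified FastRP-style procedure: a very sparse random matrix is propagated through a row-normalised adjacency matrix, and the normalised results of each hop are combined with fixed weights. The function name, the default parameters, and the per-step normalisation are assumptions of this sketch and may differ from the reference algorithm and from the configuration used in this work.</p>
        <preformat>
```python
import numpy as np


def fastrp_sketch(adj, dim=8, weights=(0.0, 1.0, 1.0), sparsity=3.0, seed=0):
    """Simplified FastRP-style embedding of a homogeneous graph.

    adj: dense (n, n) adjacency matrix; weights: one coefficient per
    propagation step; sparsity: the `s` of the very sparse projection.
    """
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    # Very sparse random projection: entries are +sqrt(s), 0, -sqrt(s)
    # with probabilities 1/(2s), 1 - 1/s, 1/(2s).
    probs = [1 / (2 * sparsity), 1 - 1 / sparsity, 1 / (2 * sparsity)]
    values = np.sqrt(sparsity) * np.array([1.0, 0.0, -1.0])
    projection = rng.choice(values, size=(n, dim), p=probs)

    # Row-normalised transition matrix (isolated nodes keep a zero row).
    degree = adj.sum(axis=1, keepdims=True)
    transition = np.divide(adj, degree, out=np.zeros_like(adj, dtype=float),
                           where=degree > 0)

    current = projection
    embedding = np.zeros((n, dim))
    for w in weights:
        current = transition @ current           # propagate one hop further
        norms = np.linalg.norm(current, axis=1, keepdims=True)
        current = np.divide(current, norms, out=np.zeros_like(current),
                            where=norms > 0)     # L2-normalise each row
        embedding += w * current                 # weighted sum over hops
    return embedding


# Toy homogeneous graph with four nodes.
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
emb = fastrp_sketch(adj, dim=8)
print(emb.shape)
```
        </preformat>
        <p>Running the sketch once per homogeneous graph gives one dense vector per node and per metapath.</p>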
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Unsupervised Metapaths Aggregation</title>
        <p>At the end of the pre-processing procedure, we have a number of dense vectors encoding
neighboring information for each target node. Specifically, the number of dense vectors
for each node is equal to the number of metapaths plus the number of features of the target nodes.</p>
        <p>Figure 2: Architecture of the aggregation network. The metapath vectors (metapath 1, metapath 2, metapath 3) are fed to K blocks (Block 1, …, Block K), each composed of GLU and FC layers followed by a softmax.</p>
        <p>
          The node embeddings obtained from different metapaths are aggregated through an attentive
neural network that creates an informative representation of each node suitable for node
classification. We train this neural network in an unsupervised way, emulating
the scarcity of annotated data, a very common scenario in large knowledge graphs. The
unsupervised training is performed using an approach very similar to masked language model
pre-training, like the one employed in BERT [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. Specifically, one of the input vectors is
randomly masked by setting it to zero, and the neural network is forced to predict the values of
all the vectors, including the masked one.
        </p>
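        <p>The masking step can be illustrated with a short NumPy sketch. The array layout (nodes, metapaths, dimensions), the function names, and the use of a mean squared error as reconstruction loss are assumptions; the text above only specifies that one input vector is zeroed at random and that all vectors, including the masked one, must be predicted.</p>
        <preformat>
```python
import numpy as np


def mask_one_vector(inputs, rng):
    """Zero out one randomly chosen input vector per node.

    inputs: array of shape (n_nodes, n_metapaths, dim).
    Returns the masked copy and the index of the masked vector per node.
    """
    n_nodes, n_metapaths, _ = inputs.shape
    masked_idx = rng.integers(0, n_metapaths, size=n_nodes)
    masked = inputs.copy()
    masked[np.arange(n_nodes), masked_idx] = 0.0
    return masked, masked_idx


def reconstruction_loss(predicted, target):
    """Mean squared error over all vectors, masked or not (assumed loss)."""
    return float(np.mean((predicted - target) ** 2))


rng = np.random.default_rng(0)
inputs = rng.normal(size=(5, 3, 4))    # 5 nodes, 3 metapaths, 4-dim vectors
masked, masked_idx = mask_one_vector(inputs, rng)

# A model that simply copies its (masked) input is penalised on the
# zeroed vectors, so it must use the other metapaths to reconstruct them.
loss_if_identity = reconstruction_loss(masked, inputs)
print(loss_if_identity)
```
        </preformat>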
        <p>
          The neural network designed in this research is inspired by TabNet [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], and it is detailed
in Figure 2. The network is composed of K blocks. Each block is fed with the input vectors,
aggregates them using an attentive aggregation, and outputs the aggregated vector. Specifically,
each block is composed of two submodules, called metapath gating and metapath attention.
The metapath gating submodule is composed of a Gated Linear Unit (GLU) [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] component,
which internally performs an attentive gating of the input vectors. The metapath attention submodule is
composed of a series of dense layers that return an attention score for each of the examined
metapaths. These scores are normalized to sum to 1 using a softmax output layer. The output
of the entire block is the weighted average of the vectors from the gating submodule, using
the weights computed by the attention submodule. Finally, the K vectors computed by the
blocks are summed together to obtain the final node embedding used for the masked node
reconstruction.
        </p>
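          <p>A possible forward pass of such a network is sketched below in NumPy. Layer sizes, the tanh activation, the two-layer scoring network, the random initialisation, and all names are assumptions chosen for illustration; the sketch only mirrors the structure described above (GLU gating, softmax attention over metapaths, weighted average per block, sum over the K blocks) and omits training.</p>
          <preformat>
```python
import numpy as np


def softmax(x, axis=-1):
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def glu(x, w, b):
    """Gated Linear Unit: one half of a linear map gates the other half."""
    h = x @ w + b
    value, gate = np.split(h, 2, axis=-1)
    return value * (1.0 / (1.0 + np.exp(-gate)))


def block_forward(inputs, params):
    """One block: gate every metapath vector, score it, and average.

    inputs: (n_nodes, n_metapaths, dim).
    Returns the aggregated vectors (n_nodes, dim) and the attention
    weights (n_nodes, n_metapaths).
    """
    gated = glu(inputs, params["w_glu"], params["b_glu"])       # (n, m, dim)
    hidden = np.tanh(inputs @ params["w_fc1"] + params["b_fc1"])
    scores = (hidden @ params["w_fc2"]).squeeze(-1)             # (n, m)
    attention = softmax(scores, axis=-1)                        # sums to 1
    aggregated = (attention[..., None] * gated).sum(axis=1)     # (n, dim)
    return aggregated, attention


def init_block(dim, hidden, rng):
    return {
        "w_glu": rng.normal(scale=0.1, size=(dim, 2 * dim)),
        "b_glu": np.zeros(2 * dim),
        "w_fc1": rng.normal(scale=0.1, size=(dim, hidden)),
        "b_fc1": np.zeros(hidden),
        "w_fc2": rng.normal(scale=0.1, size=(hidden, 1)),
    }


rng = np.random.default_rng(0)
n_nodes, n_metapaths, dim, n_blocks = 5, 3, 4, 2
inputs = rng.normal(size=(n_nodes, n_metapaths, dim))
blocks = [init_block(dim, hidden=8, rng=rng) for _ in range(n_blocks)]

outputs = [block_forward(inputs, p) for p in blocks]
embedding = sum(agg for agg, _ in outputs)        # sum of the K block outputs
attention_per_block = [att for _, att in outputs]
print(embedding.shape, attention_per_block[0][0])
```
          </preformat>
          <p>Each row of an attention matrix sums to 1, so its entries can be read as the relative contribution of each metapath to the embedding of that node.</p>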
        <p>The general idea behind this neural network is to pass the input through simple transformations,
which motivates the choice of the GLU. In this way, the attention weights produced by the second
submodule of each block can be used to inspect which metapath contributes most during the
reconstruction phase.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Preliminary Experiments</title>
      <p>We used the IMDb dataset to train and evaluate our architecture. IMDb (an acronym for Internet
Movie Database) is an online database of information related to films, television programs, home
videos, video games, and streaming content online. For the purpose of this research, we used
the subset containing movies, actors, directors, and keywords of the movie plot. Each movie of
the dataset has only one director, the three main actors, and a variable number of keywords.
The goal is to infer the movie genre (Action, Drama or Comedy), so this task is framed as a
node classification problem.</p>
      <sec id="sec-4-4">
        <p>Table 1: Macro-F1 and Micro-F1 at varying training percentages (Train %) for the compared methods and ours. Recovered values: 45.61, 47.73, 46.23, 49.11.</p>
        <p>
          We compared our approach with other unsupervised methods from the literature, namely
Node2Vec [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], LINE [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], ESim [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], metapath2vec [15], and HERec [16]. The standard evaluation
protocol consists in inferring the node embeddings on the test set and training a linear
support vector machine (SVM) on them in a supervised way, with varying training proportions. We report
the average Macro-F1 and Micro-F1 over 10 runs of each embedding model in Table 1. Since
each movie can have only one label, the Micro-F1 corresponds to the accuracy, while the Macro-F1
is the average of the F1 over each class. Our approach outperforms current
unsupervised node embedding approaches, obtaining a performance increase of around 4.7%
and 2.0% on Macro-F1 and Micro-F1, respectively, relative to the previous best-performing model
(Node2Vec).
        </p>
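        <p>The relation between the two metrics can be checked with a small, dependency-free Python sketch. The labels below are toy data and the function name is hypothetical; the example only illustrates that, in the single-label setting, Micro-F1 coincides with accuracy while Macro-F1 averages the per-class F1.</p>
        <preformat>
```python
def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label multi-class predictions."""
    classes = sorted(set(y_true) | set(y_pred))
    per_class_f1 = []
    total_tp = total_fp = total_fn = 0
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class_f1.append(2 * tp / denom if denom else 0.0)
        total_tp, total_fp, total_fn = total_tp + tp, total_fp + fp, total_fn + fn
    macro = sum(per_class_f1) / len(per_class_f1)
    micro = 2 * total_tp / (2 * total_tp + total_fp + total_fn)
    return macro, micro


# Toy predictions over the three genres (made-up labels).
y_true = ["Action", "Drama", "Drama", "Comedy", "Action", "Drama"]
y_pred = ["Action", "Drama", "Drama", "Comedy", "Drama", "Drama"]

macro_f1, micro_f1 = f1_scores(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(round(macro_f1, 4), round(micro_f1, 4), round(accuracy, 4))
```
        </preformat>
        <p>Here one Action movie is misclassified as Drama, which lowers the F1 of both classes; the Macro-F1 therefore differs from the accuracy, while the Micro-F1 matches it.</p>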
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>In this paper, we developed a framework to perform node classification on large heterogeneous
knowledge graphs. The proposed approach employs metapaths to transform a
heterogeneous graph into a set of homogeneous graphs that are then analyzed using a node
embedding algorithm. Inspired by neural networks working on tabular data, we developed an
attentive neural network that aggregates node embeddings from different metapaths.
This network does not require direct supervision using the node labels; instead, it is trained in
an unsupervised way by performing masked node embedding reconstruction. The final classes
are learned by training a simple SVM on a slice of the test set. We compared our approach with
other unsupervised methods that use the same training and evaluation protocols on the IMDb
dataset, and we obtained the best results on both Macro-F1 and Micro-F1 metrics.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was supported by “Intelligenza Artificiale per il Monitoraggio Visuale dei Siti
Culturali” (AI4CHSites) CNR4C program, CUP B15J19001040004, and by the OpenAIRE-Nexus project,</p>
      <p>[15] Y. Dong, N. V. Chawla, A. Swami, Metapath2vec: Scalable representation learning for
heterogeneous networks (2017) 135–144. URL: https://doi.org/10.1145/3097983.3098036.
doi:10.1145/3097983.3098036.</p>
      <p>[16] C. Shi, B. Hu, W. X. Zhao, P. S. Yu, Heterogeneous information network embedding
for recommendation, IEEE Transactions on Knowledge and Data Engineering 31 (2019)
357–370. doi:10.1109/TKDE.2018.2833443.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Manghi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Houssos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mikulicic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Jörg</surname>
          </string-name>
          ,
          <article-title>The data model of the openaire scientific communication e-infrastructure</article-title>
          ,
          <source>in: Research Conference on Metadata and Semantic Research</source>
          , Springer,
          <year>2012</year>
          , pp.
          <fpage>168</fpage>
          -
          <lpage>180</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Manghi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Atzori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Baglioni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Manola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schirrwagen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Principe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Artini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Becker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>De Bonis</surname>
          </string-name>
          , et al.,
          <article-title>The openaire research graph data model</article-title>
          ,
          <source>Zenodo</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Manghi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Atzori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Baglioni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schirrwagen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Dimitropoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>La Bruzzo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Foufoulas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Löhden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bäcker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mannocci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Horst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Jacewicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Czerniak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kiatropoulou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kokogiannaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>De Bonis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Artini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ottonello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lempesis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ioannidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Manola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Principe</surname>
          </string-name>
          ,
          <article-title>OpenAIRE research graph dump</article-title>
          ,
          <year>2020</year>
          . URL: https://doi.org/10.5281/zenodo.4201546. doi:10.5281/zenodo.4201546.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Messina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Amato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Carrara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Falchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gennaro</surname>
          </string-name>
          ,
          <article-title>Learning visual features for relational cbir</article-title>
          ,
          <source>International Journal of Multimedia Information Retrieval</source>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.</given-names>
            <surname>Messina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Amato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Esuli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Falchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gennaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Marchand-Maillet</surname>
          </string-name>
          ,
          <article-title>Fine-grained visual textual alignment for cross-modal retrieval using transformer encoders</article-title>
          ,
          <source>ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)</source>
          <volume>17</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>23</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Graph neural networks: A review of methods and applications</article-title>
          ,
          <source>AI Open</source>
          <volume>1</volume>
          (
          <year>2020</year>
          )
          <fpage>57</fpage>
          -
          <lpage>81</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. F.</given-names>
            <surname>Sultan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Skiena</surname>
          </string-name>
          ,
          <article-title>Fast and accurate network embeddings via very sparse random projection</article-title>
          ,
          <source>in: Proceedings of the 28th ACM International Conference on Information and Knowledge Management</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>399</fpage>
          -
          <lpage>408</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>B.</given-names>
            <surname>Perozzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Al-Rfou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Skiena</surname>
          </string-name>
          ,
          <article-title>DeepWalk: Online learning of social representations</article-title>
          ,
          <source>in: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>701</fpage>
          -
          <lpage>710</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Grover</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          ,
          <article-title>node2vec: Scalable feature learning for networks</article-title>
          ,
          <source>in: Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>855</fpage>
          -
          <lpage>864</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Qu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Mei</surname>
          </string-name>
          ,
          <article-title>LINE: Large-scale information network embedding</article-title>
          ,
          <source>in: Proceedings of the 24th international conference on world wide web</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>1067</fpage>
          -
          <lpage>1077</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>in: NAACL-HLT (1)</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S. Ö.</given-names>
            <surname>Arik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Pfister</surname>
          </string-name>
          ,
          <article-title>TabNet: Attentive interpretable tabular learning</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence</source>
          , volume
          <volume>35</volume>
          ,
          <year>2021</year>
          , pp.
          <fpage>6679</fpage>
          -
          <lpage>6687</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Y. N.</given-names>
            <surname>Dauphin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Auli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Grangier</surname>
          </string-name>
          ,
          <article-title>Language modeling with gated convolutional networks</article-title>
          ,
          <source>in: International conference on machine learning, PMLR</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>933</fpage>
          -
          <lpage>941</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>Shang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Qu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Kaplan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <article-title>Meta-path guided embedding for similarity search in large-scale heterogeneous information networks</article-title>
          ,
          <source>CoRR</source>
          abs/1610.09769 (
          <year>2016</year>
          ). URL: http://arxiv.org/abs/1610.09769. arXiv:1610.09769.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>