<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Automatic Domain Classification of LOV Vocabularies</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>MONDECA, boulevard de Strasbourg, Paris, France &lt;firstname.lastname@mondeca.com&gt;</string-name>
        </contrib>
      </contrib-group>
      <abstract>
        <p>Assigning a topic or a domain to a vocabulary in a catalog is not always a trivial task. Fortunately, ontology experts can use their previous experience to achieve this task easily. In the case of Linked Open Vocabularies (LOV), the small number of curators (only 4 people) and the high number of submissions lead to the need for automatic solutions that suggest to curators a domain in which to attach a newly submitted vocabulary. This paper proposes a machine learning approach to automatically classify newly submitted vocabularies in LOV, using statistical models that take as input any textual description found in a vocabulary. The results show that the Support Vector Machine (SVM) model gives the best micro F1-score of 0.36. An evaluation with twelve vocabularies used for testing the classifier sheds light on a possible integration of the results to assist curators in assigning domains to vocabularies in the future.</p>
      </abstract>
      <kwd-group>
        <kwd>Ontologies</kwd>
        <kwd>Classification</kwd>
        <kwd>Machine Learning</kwd>
        <kwd>Linked Open Vocabularies</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Linked Open Data (LOD) refers to the ecosystem of openly available
structured data that follows standard web technologies such as RDF, URIs and
HTTP. As the amount of available data grows over time, new datasets following
these principles appear. Linked Open Vocabularies (LOV)1 is an initiative that
aims to reference all the vocabularies published on the Web
following best practices guided by the FAIR (Findable, Accessible, Interoperable,
Reusable) principles. Each vocabulary can be seen as a knowledge graph,
describing the properties and the purpose of the vocabulary, which can be
connected to other vocabularies by different types of links. Therefore, LOV can
be seen as a knowledge graph of interlinked vocabularies [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] accessible on the
Web of data.
      </p>
      <p>When a new ontology is submitted for integration into LOV, a curator needs
to assign at least one tag representing a domain or a category to the vocabulary
Copyright © 2019 for this paper by its authors. Use permitted under Creative
Commons Attribution 4.0 International License (CC BY 4.0).</p>
    </sec>
    <sec id="sec-2">
      <title>1 https://lov.linkeddata.es/dataset/lov/</title>
      <p>among the existing 43 categories, such as "Environment", "Music" or "W3C REC".
A category aims at grouping ontologies according to a domain. For example, the
tag "W3C REC" represents ontologies recommended by the W3 Consortium,
such as rdf or owl. As the number of domains increases and some vocabularies2
can be relatively small, the tagging process can be biased. Figure 1 depicts
the list of the tags available in LOV at the time of writing this paper, while
Figure 2 depicts their distribution. One of the benefits of assigning a tag to a
vocabulary is to index it according to a domain and make it easy to access
from the interface. For example, to access vocabularies in the IoT domain,
the direct URL in LOV is https://lov.linkeddata.es/dataset/lov/vocabs?
tag=IoT. Additionally, any newly added vocabulary should belong to at least
one domain.</p>
      <p>We propose a machine learning approach to automatically classify newly
submitted vocabularies with statistical models that take as input texts describing
the subjects of the vocabularies. Indeed, the majority of the graphs contain
a lot of text describing the subjects and the properties of the vocabularies, in
the form of string literals. For example, a URI in a given ontology (Class or
Property) is often described by the predicate rdfs:comment with a text
commenting on the given resource. Other predicates are often linked to texts
containing information, such as rdfs:label or dct:description. We used all
this textual information to train several machine learning models for the purpose of
classifying the vocabularies into different categories. This paper is structured as
follows: Section 2 describes related work in graph classification, followed by the
machine learning approach to build the classifier in Section 3. Section 5 provides
an evaluation of our approach, and Section 6 gives a brief conclusion.
2 In this paper, the terms ontology and vocabulary are used interchangeably.</p>
      <sec id="sec-2-1">
        <title>Related Work</title>
        <p>
          Graph classification is a problem well studied in the literature. Several
strategies have been developed to tackle it, such as kernel methods or, more
recently, graph neural networks [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. However, far less work has been done on
knowledge graph classification. The closest problems are entity or triple
classification, which consist in the categorization of a very small subset of
a knowledge graph [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. This is because these types of graphs are mainly described
by their entities and relations, so it would be very difficult to find similarities or
dissimilarities between knowledge graphs that share few or no
common entities or relations, as is often the case. This is why we used an
approach different from traditional graph classification methods for our problem,
based on a text mining strategy. Indeed, much work has been done on document
classification [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Various processing methods have been elaborated, such as Bag of
Words or Latent Semantic Analysis (LSA), whose output can be easily exploited
by machine learning algorithms.
        </p>
        <p>
          Classifying datasets created with semantic technologies has also been studied
in the literature. The closest work is described in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]
and [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. Meusel et al. present a methodology to automatically classify LOD
datasets into the different categories of the LOD cloud diagram.
The paper uses eight feature sets from the LOD datasets, among them text
from rdfs:label. One of the main conclusions of the paper is that
vocabulary-level features are a good indicator of the topical domain.
        </p>
      <p>While the mentioned approach uses supervised learning, we apply two more
steps in preparing the corpus for input to the classifier, using a Bag-of-Words and
a truncated SVD transformation. Additionally, we have a very small
corpus, inherent to the size of vocabularies compared to entire LOD datasets,
and a higher number of available tags (43 in LOV compared to 8 for the LOD
cloud).</p>
      </sec>
      <sec id="sec-2-2">
        <title>Data Preparation and Machine Learning Models</title>
        <sec id="sec-2-2-1">
          <title>Data Preparation</title>
          <p>Our approach has been to use the texts contained in the vocabularies to classify
them into categories. Indeed, the subject of an RDF graph and the purpose
of its entities are usually described in string literals attached to some specific predicates.
We first extract this relevant textual information (strings or literals) inside each
graph (a dump representing the latest version of the vocabulary in N3), and
concatenate it into one paragraph describing their subjects. To this end, we first
download the most recent version of each vocabulary from the LOV SPARQL endpoint
(taking the most recent version tracked by LOV) and import them into graph
objects with RDFLib3. Listing 1.1 depicts the SPARQL query used to retrieve
the latest version of each vocabulary, alongside their domains and unique
prefix.</p>
          <p>SELECT DISTINCT ?vocabPrefix ?domain ?versionUri {
  GRAPH &lt;https://lov.linkeddata.es/dataset/lov&gt; {
    ?vocab a voaf:Vocabulary .
    ?vocab vann:preferredNamespacePrefix ?vocabPrefix .
    ?vocab dcterms:modified ?modified .
    ?vocab dcat:keyword ?domain .
    ?vocab dcat:distribution ?versionUri .
    BIND (STRAFTER(STR(?versionUri), "/versions/") as ?v)
    BIND (STRBEFORE(STR(?v), ".") as ?v1)
    BIND (STR(?modified) as ?date)
    FILTER (?date = ?v1)
  }
}
GROUP BY ?vocabPrefix ?domain ?versionUri
ORDER BY ?vocabPrefix ?domain ?versionUri</p>
          <p>Listing 1.1: SPARQL query to retrieve the latest versions of vocabularies
stored in LOV</p>
          <p>We then concatenate all the strings following predicates that have one of
these suffixes: comment, description, label and definition. The predicate
rdfs:label is often used to give the name of a URI in natural language, while
the suffixes comment, description and definition are used to give insight into
the meaning and purpose of a given ontology or entity. The result of this step has
been the generation of a paragraph for each vocabulary. As the texts describe the</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 https://github.com/RDFLib/rdflib</title>
      <p>RDF properties of the graphs, they often contain the su xes of these properties
formed of several words not separated by spaces, in camel case format. For
example, if an extracted text mentions the property \UnitPriceSpeci cation", this
expression will remains as a single unit in the nal text. However, it can imply
a bias on the statistical model to be applied on this data. Consequently, we
separate all these types of expression with spaces, when a uppercase occurs in the
middle of a word. Therefore, by using this method, the expression
\UnitPriceSpeci cation" will be transformed to "Unit Price Speci cation" in the nal text.
After this transformation, the whole corpus' vocabulary is formed of 21; 435
different words. The mean word count for the paragraphs is 1168:5, the maximum
is 86208 and the minimum 0. Two paragraphs were empty and 25 of them have
less than 20 words. The text describing the rooms vocabulary 4 obtained with
the pre-processing step described in this section is presented in Listing 1.2. This
ontology describes the rooms one can nd in a building and has the following
assigned tags in LOV: Geography and Environment.</p>
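      <p>The camel-case separation described above can be sketched with a single regular expression (an illustrative implementation; the paper does not specify its exact code):</p>
      <p>
```python
import re

def split_camel_case(text: str) -> str:
    """Insert a space wherever a lowercase letter is directly followed
    by an uppercase letter, so property names become separate tokens."""
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)

print(split_camel_case("UnitPriceSpecification"))  # Unit Price Specification
```
      </p>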
      <p>Floor Section. Contains. Desk. Building. Floor. A space inside a
structure, typically separated from the outside by exterior walls and
from other rooms in the same structure by internal walls. A
human-made structure used for sheltering or continuous occupancy. Site. A
simple vocabulary for describing the rooms in a building. An agent that
generally occupies the physical area of the subject resource. Having this
property implies being a spatial object. Being the object of this property
implies being an agent. Intended for use with buildings, rooms, desks,
etc. Room. The object resource is physically and spatially contained in
the subject resource. Being the subject or object of this property implies
being a spatial object. Intended for use in the context of buildings,
rooms, etc. A table used in a work or office setting, typically for reading,
writing, or computer use. A named part of a floor of a building. Typically
used to denote several rooms that are grouped together based on spatial
arrangement or use. A level part of a building that has a permanent roof.
A storey of a building. Occupant. An area of land with a designated
purpose, such as a university campus, a housing estate, or a building
site.</p>
      <p>Listing 1.2: Paragraph describing the rooms vocabulary, obtained with the
preprocessing pipeline described in Section 3.</p>
    </sec>
    <sec id="sec-4">
      <title>4 https://lov.linkeddata.es/dataset/lov/vocabs/rooms</title>
      <sec id="sec-4-1">
        <title>Machine Learning Models</title>
        <p>
          As we cannot feed text paragraphs directly to the machine learning models, we
applied a processing pipeline transforming the texts into fixed-size vectors
of attributes. For this purpose, we used several techniques described in [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]: we
first apply a Bag-of-Words (BoW) transformation, mapping the texts to vectors
containing the frequencies of each word and of n-grams of 2 and 3 words in the
documents, keeping those with a document frequency between 0.025 and 0.25. Then, a Term
Frequency-Inverse Document Frequency (TF-IDF) weighting is applied to normalize the
frequencies of the words and n-grams by the length of each document. Finally, we
apply Latent Semantic Analysis (LSA) [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], a dimensionality reduction
technique using a linear algebra method called truncated SVD, to map the space
of word frequencies to a smaller space of concepts. Indeed, the dimension of
the TF-IDF vectors is large, as it corresponds to the number of words used in
the whole corpus plus the frequent n-grams (21,435). It is well known in the
literature that a high number of attributes often negatively impacts a machine
learning approach [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. We tried different values of n, the dimension
of the vector space: 50, 150 and 300. These vectors of attributes are then used
as input for the machine learning classifiers. The entire processing pipeline is
summarized in Figure 3.
        </p>
        <p>Fig. 3: Schematic view of the processing pipeline. From left to right, the
diagram depicts the different steps: 1) text extraction from the vocabulary dump;
2) BoW transformation; 3) normalization with TF-IDF; 4) vector dimension
reduction; and finally the classifiers.</p>
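        <p>The pipeline of Figure 3 can be sketched with scikit-learn (the toy corpus and the component count are illustrative; the document-frequency bounds of Section 3.2 are omitted for brevity):</p>
        <p>
```python
# Sketch of the BoW -> TF-IDF -> truncated-SVD pipeline on a toy corpus.
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.decomposition import TruncatedSVD
from sklearn.pipeline import Pipeline

corpus = [
    "room building floor desk occupant",
    "sensor actuator device observation",
    "song artist album track genre",
    "street city postcode address country",
]

pipeline = Pipeline([
    ("bow", CountVectorizer(ngram_range=(1, 3))),  # 1: Bag-of-Words + n-grams
    ("tfidf", TfidfTransformer()),                 # 2: TF-IDF normalization
    ("lsa", TruncatedSVD(n_components=3)),         # 3: LSA via truncated SVD
])

vectors = pipeline.fit_transform(corpus)
print(vectors.shape)  # one fixed-size vector per document: (4, 3)
```
        </p>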
        <p>We then separated the data into two subsets: a training set (80% of
the vocabularies) and a test set (the remaining 20%). In this paper, the
version of the LOV dataset used for the experiment is the snapshot of May 7th,
20195, containing 666 vocabularies. We claim that the approach described in this
paper can be replicated for any type of machine learning multi-label task with a
knowledge graph as input.</p>
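        <p>The 80/20 split can be reproduced with scikit-learn's train_test_split (indices stand in for the 666 feature vectors; the random seed is an arbitrary assumption):</p>
        <p>
```python
# Illustrative 80/20 split of the 666 LOV vocabularies.
from sklearn.model_selection import train_test_split

vocab_ids = list(range(666))
train_ids, test_ids = train_test_split(vocab_ids, test_size=0.2,
                                       random_state=42)
print(len(train_ids), len(test_ids))  # 532 134
```
        </p>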
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5 https://tinyurl.com/lovdataset</title>
      <p>
        As each vocabulary can have one to many tags, we tackle the problem as
a multi-label classification task. A machine learning model is trained on the
training set, trying to find relations between the attributes describing the graphs
and their labels. The trained model is then applied to the test set. The predicted
labels are finally compared to the ones assigned by human curators, and the micro
precision, recall and F1-measure are computed, which are standard supervised
learning metrics [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. We have tested several machine learning models with the
Python library scikit-learn [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], with an emphasis on the Support Vector Machine
(SVM) and the Multi-Layer Perceptron (MLP), which are ranked among the
best classifiers for text classification tasks, mainly because they can handle large
feature spaces [
        <xref ref-type="bibr" rid="ref12 ref4">4, 12</xref>
        ]. The K-Nearest-Neighbors (KNN) and Random Forest
(RF) classifiers have been tested as well, because they natively support
multi-label classification, as does the MLP.
      </p>
      <p>
        However, we had to apply a One-vs-Rest strategy for the SVM [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], which
consists in training a separate binary classifier for each label. The MLP had one
hidden layer of size 100 with a Rectified Linear Unit (ReLU)6 activation function.
Similarly, we set the parameters C = 10 and gamma = 1 for the SVM, with a radial
basis function (RBF) kernel7 and uniform class weights. We
chose k = 7 for the KNN model.
      </p>
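      <p>The classifier settings given above translate to scikit-learn as follows (a sketch only: the random matrices merely illustrate the multi-label input shape and are not the LOV feature vectors):</p>
      <p>
```python
import numpy as np
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 150))        # stand-in for 150-dim LSA vectors
Y = rng.integers(0, 2, size=(40, 3))  # 3 binary label columns (multi-label)

# One-vs-Rest SVM with an RBF kernel, C=10, gamma=1, uniform class weights.
svm = OneVsRestClassifier(SVC(C=10, gamma=1, kernel="rbf"))
# MLP with one hidden layer of 100 units and a ReLU activation.
mlp = MLPClassifier(hidden_layer_sizes=(100,), activation="relu")
# KNN with k=7; handles multi-label targets natively.
knn = KNeighborsClassifier(n_neighbors=7)

svm.fit(X, Y)
knn.fit(X, Y)
pred = knn.predict(X)  # one binary vector of 3 labels per sample
```
      </p>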
      <sec id="sec-5-1">
        <title>Results</title>
        <p>The results of the classification for the 4 machine learning models, using n =
50, 150, 300 for the truncated SVD, are presented in Table 1. The MLP and the
SVM give the best micro F1-scores of 0.34 and 0.36 respectively, with n = 150.
In this section, we describe the evaluation of the classifier on newly submitted
ontologies in LOV, and we discuss the results obtained in comparison with manual
assignment by two curators.
6 The ReLU is the most used activation function in neural networks: f(z) is zero when
z is less than zero and f(z) is equal to z when z is greater than or equal to zero.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>7 https://en.wikipedia.org/wiki/Radial_basis_function_kernel</title>
      <sec id="sec-6-1">
        <title>Evaluation</title>
        <p>For evaluating our model, we took a list of 12 vocabularies in the back-end of
LOV and asked two curators to assign domains to each vocabulary. Then,
we passed the same vocabularies to the SVM classifier. The classifier's results are
then compared with the human-assigned tags, as presented in Table 2.</p>
        <p>As the main goal of the system is to suggest recommendations to a curator,
we compute a soft accuracy metric, corresponding to the number of graphs with
at least one match between the curator tags and the classifier suggestions,
divided by the total number of tested vocabularies.</p>
        <p>For a vocabulary i, its associated tags y<sub>i</sub> = {y<sub>i1</sub>, y<sub>i2</sub>, ..., y<sub>il</sub>} and the predictions
of the classifier y<sub>i</sub><sup>pred</sup> = {y<sub>i1</sub><sup>pred</sup>, y<sub>i2</sub><sup>pred</sup>, ..., y<sub>im</sub><sup>pred</sup>}, we say that the classifier is softly
accurate for the vocabulary i if there exists y<sub>ik</sub><sup>pred</sup> ∈ y<sub>i</sub><sup>pred</sup> such that y<sub>ik</sub><sup>pred</sup> ∈ y<sub>i</sub>. The soft
accuracy is then computed as the ratio of the number p of softly accurate outputs
to the total number n of inputs. We get a result of 0.33 for this evaluation.</p>
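        <p>The soft accuracy defined above can be sketched as follows (the tag sets are invented examples, not the actual evaluation data):</p>
        <p>
```python
def soft_accuracy(true_tags, predicted_tags):
    """Fraction of vocabularies whose predicted tags share at least one
    tag with the curator-assigned tags."""
    hits = sum(
        1 for truth, pred in zip(true_tags, predicted_tags)
        if set(truth) & set(pred)
    )
    return hits / len(true_tags)

truth = [{"Geography", "Environment"}, {"Music"}, {"IoT"}]
preds = [{"Geography"}, {"Society"}, {"Metadata"}]
print(round(soft_accuracy(truth, preds), 2))  # one of three matched: 0.33
```
        </p>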
      </sec>
      <sec id="sec-6-2">
        <title>Discussion</title>
        <p>The results seem average regarding the precision of the detections of the
classifier, compared to the curators. There could be several explanations, like the
disparity between the tags in the dataset (13 labels are used in fewer than 10
vocabularies), or the difference of subjects among vocabularies tagged with the same
label. For example, the "Geography" tag is used for both the rooms and the Postcode8
ontologies, whereas they describe completely different things, so we can
expect different word usage and very different feature vectors.</p>
        <p>
          Furthermore, multi-label classification for tag recommendation is a hard
task, especially when the number of possible tags is high (43) and the number of
examples is low (666) [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], as in this particular setting. It has been demonstrated
that SVM classifiers work well for text classification problems; however, their
performance decreases strongly as the number of labels increases [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. The list
of domains grows depending on the need, and some tags have a more organizational
function. For example, LOV curators introduced the IoT tag to group all the
vocabularies related to the IoT domain. Historically, some of the tags are related
to W3C vocabulary recommendations (W3C REC).
        </p>
        <sec id="sec-6-2-1">
          <title>Conclusion and Future Work</title>
          <p>This paper addresses one main issue: building and evaluating a classifier based on the
content of the LOV catalog using machine learning techniques. The final goal of this
work is to give the human curator of vocabularies a list of
recommendations for a new ontology submitted in the back-end. The classifier implemented
gives a micro F1-score of 36%. Although this score seems low, the system will
not be used without a human who validates or rejects the suggested tags. We do
not intend to compare the system with the human curator. Instead, we want
a system that reduces the possible risk of bias when assigning domains to
vocabularies and suggests tags to the curator. Future work includes ingesting the
feedback from the curators into the classifier to learn from newly added
vocabularies in a continuous learning workflow, and testing deep learning models with a
transfer learning strategy to overcome the low number of training examples.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>8 https://lov.linkeddata.es/dataset/lov/vocabs/postcode</title>
      <p>
        Indeed, deep learning approaches can perform well on multi-label classification,
but they need a lot of training examples [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>M.</given-names>
            <surname>Allahyari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pouriyeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Asse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Safaei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. D.</given-names>
            <surname>Trippe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. B.</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Kochut</surname>
          </string-name>
          .
          <article-title>A brief survey of text mining: Classification, clustering and extraction techniques</article-title>
          .
          <source>arXiv preprint arXiv:1707.02919</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>P. F.</given-names>
            <surname>Evangelista</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Embrechts</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B. K.</given-names>
            <surname>Szymanski</surname>
          </string-name>
          .
          <article-title>Taming the curse of dimensionality in kernels and novelty detection</article-title>
          .
          <source>In Applied soft computing technologies: The challenge of complexity</source>
          , pages 425-438. Springer,
          <year>2006</year>.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>N.</given-names>
            <surname>Halko</surname>
          </string-name>
          , P.-G. Martinsson, and
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Tropp</surname>
          </string-name>
          .
          <article-title>Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions</article-title>
          .
          <source>SIAM review</source>
          ,
          <volume>53</volume>(<issue>2</issue>):217-288,
          <year>2011</year>.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>T.</given-names>
            <surname>Joachims</surname>
          </string-name>
          .
          <article-title>Text categorization with support vector machines: Learning with many relevant features</article-title>
          .
          <source>In European conference on machine learning</source>
          , pages 137-142. Springer,
          <year>1998</year>.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>I.</given-names>
            <surname>Katakis</surname>
          </string-name>
          ,
          <string-name><given-names>G.</given-names> <surname>Tsoumakas</surname></string-name>, and
          <string-name><given-names>I.</given-names> <surname>Vlahavas</surname></string-name>.
          <article-title>Multilabel text classification for automated tag suggestion</article-title>
          .
          <source>In Proceedings of the ECML/PKDD</source>
          , volume
          <volume>18</volume>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name><given-names>T.-Y.</given-names> <surname>Liu</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Yang</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Wan</surname></string-name>,
          <string-name><given-names>H.-J.</given-names> <surname>Zeng</surname></string-name>,
          <string-name><given-names>Z.</given-names> <surname>Chen</surname></string-name>, and
          <string-name><given-names>W.-Y.</given-names> <surname>Ma</surname></string-name>.
          <article-title>Support vector machines classification with a very large-scale taxonomy</article-title>.
          <source>ACM SIGKDD Explorations Newsletter</source>,
          <volume>7</volume>(<issue>1</issue>):36-43,
          <year>2005</year>.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>R.</given-names>
            <surname>Meusel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Spahiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Paulheim</surname>
          </string-name>
          .
          <article-title>Towards automatic topical classification of LOD datasets</article-title>
          .
          <source>In Workshop on Linked Data on the Web (LDOW), co-located with the 24th International World Wide Web Conference (WWW 2015)</source>
          , volume
          <volume>1409</volume>,
          <year>2015</year>.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name><given-names>J.</given-names> <surname>Nam</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Kim</surname></string-name>,
          <string-name><given-names>E. L.</given-names> <surname>Mencía</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Gurevych</surname></string-name>, and
          <string-name><given-names>J.</given-names> <surname>Fürnkranz</surname></string-name>.
          <article-title>Large-scale multi-label text classification: revisiting neural networks</article-title>.
          <source>In Joint European Conference on Machine Learning and Knowledge Discovery in Databases</source>, pages 437-452. Springer,
          <year>2014</year>.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name><given-names>M.-E.</given-names> <surname>Nilsback</surname></string-name> and
          <string-name><given-names>A.</given-names> <surname>Zisserman</surname></string-name>.
          <article-title>Automated flower classification over a large number of classes</article-title>.
          <source>In 2008 Sixth Indian Conference on Computer Vision, Graphics &amp; Image Processing</source>, pages 722-729. IEEE,
          <year>2008</year>.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>F.</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Varoquaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gramfort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Michel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thirion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Grisel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blondel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Prettenhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dubourg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vanderplas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Passos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cournapeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Perrot</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Duchesnay</surname>
          </string-name>
          .
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>12</volume>
          :
          <fpage>2825</fpage>
          –
          <lpage>2830</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Powers</surname>
          </string-name>
          .
          <article-title>Evaluation: from precision, recall and f-measure to roc, informedness, markedness and correlation</article-title>
          .
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>M. E.</given-names>
            <surname>Ruiz</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Srinivasan</surname>
          </string-name>
          .
          <article-title>Automatic text categorization using neural networks</article-title>
          .
          <source>In Proceedings of the 8th ASIS SIG/CR Workshop on Classification Research</source>
          , pages
          <fpage>59</fpage>
          –
          <lpage>72</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>F.</given-names>
            <surname>Scarselli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Tsoi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hagenbuchner</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Monfardini</surname>
          </string-name>
          .
          <article-title>The graph neural network model</article-title>
          .
          <source>IEEE Transactions on Neural Networks</source>
          ,
          <volume>20</volume>
          (
          <issue>1</issue>
          ):
          <fpage>61</fpage>
          –
          <lpage>80</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>F.</given-names>
            <surname>Sebastiani</surname>
          </string-name>
          .
          <article-title>Machine learning in automated text categorization</article-title>
          .
          <source>ACM computing surveys (CSUR)</source>
          ,
          <volume>34</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          –
          <lpage>47</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>B.</given-names>
            <surname>Spahiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Maurino</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Meusel</surname>
          </string-name>
          .
          <article-title>Topic profiling benchmarks in the linked open data cloud: Issues and lessons learned</article-title>
          .
          <source>Semantic Web</source>
          , (Preprint):
          <fpage>1</fpage>
          –
          <lpage>20</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>P.-Y.</given-names>
            <surname>Vandenbussche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. A.</given-names>
            <surname>Atemezing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Poveda-Villalon</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Vatant</surname>
          </string-name>
          .
          <article-title>Linked open vocabularies (LOV): a gateway to reusable semantic vocabularies on the web</article-title>
          .
          <source>Semantic Web</source>
          ,
          <volume>8</volume>
          (
          <issue>3</issue>
          ):
          <fpage>437</fpage>
          –
          <lpage>452</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Feng</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <article-title>Knowledge graph embedding by translating on hyperplanes</article-title>
          .
          <source>In Twenty-Eighth AAAI Conference on Artificial Intelligence</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>