<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>On Discovering Deterministic Relationships in Multi-Label Learning via Linked Open Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Eirini Papagiannopoulou</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Grigorios Tsoumakas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nick Bassiliades</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Informatics, Aristotle University of Thessaloniki</institution>
          ,
          <addr-line>54124 Thessaloniki</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In multi-label learning, each instance can be related to one or more binary target variables. Multi-label learning problems are commonly found in many applications, e.g. in text classification, where a news article can be about both politics and finance. The main motivation of multi-label learning algorithms is the exploitation of label dependencies in order to improve prediction accuracy. In this paper, we present ongoing work on a method that uses the linked open data cloud to detect relationships between labels, enriches the set of labels with new concepts which are super classes of two or more labels, trains a model on the enhanced training set and, finally, makes predictions on the enhanced test set in order to improve the prediction accuracy on the initial labels.</p>
      </abstract>
      <kwd-group>
        <kwd>multi-label learning</kwd>
        <kwd>linked open data</kwd>
        <kwd>semantics</kwd>
        <kwd>WordNet</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In multi-label data, instances are related to one or more binary target variables,
commonly called labels. Learning from such data has received a lot of attention
from the machine learning and data mining communities in recent years. This
is due to the multitude of practical applications it arises in, and the interesting
research challenges it presents, such as exploiting label dependencies, learning
from rare labels and scaling up to large numbers of labels [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        In several multi-label learning problems, the labels are organized as a tree or
a directed acyclic graph, and there exist approaches that exploit such structure
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. However, in most multi-label learning problems, only flat labels are provided,
without any accompanying structure. Yet, it is often the case that implicit
deterministic relationships exist among the labels. For example, in the ImageCLEF
2011 photo annotation task [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], the learning problem involved 99 labels without
any accompanying semantic meta-data, among which certain deterministic
relationships did exist. In particular, there were several groups of mutually exclusive
labels, such as the four seasons autumn, winter, spring, summer and the
person-related labels single person, small group, big group, no persons. There were also
several positive entailment (consequence) relationships, such as river → water
and car → vehicle.
      </p>
      <p>
        These observations motivated us to consider the automated discovery of such
deterministic relationships as potentially interesting and useful knowledge, and
the exploitation of this explicit knowledge for improving the accuracy of
multi-label learning algorithms. In our previous work in this direction [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], we
discovered such relationships from the data using techniques based on association rule
mining and exploited them via a deterministic Bayesian network representation.
Here, we investigate the possibility of exploiting Linked Open Data (LOD) for
discovering such relationships. In particular, we aim at finding common
ancestor concepts among the labels. We then introduce these ancestors as additional
labels, thus expanding the label space, inspired by approaches that similarly
expand the feature space for data mining problems based on LOD [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Finally,
we apply traditional multi-label learning algorithms that indirectly exploit
relationships and notice that accuracy gains are often achieved this way.
      </p>
      <p>The rest of this paper is organized as follows. Section 2 discusses related work
on discovering and exploiting deterministic relationships in multi-label learning.
In Section 3, we present our method in detail, illustrating it with examples.
In Section 4, we present the evaluation approach we follow and describe
the results of our experiments. Finally, in the last section, we discuss open research
issues and some future extensions of this work.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        The idea of discovering and exploiting deterministic label relationships from
multi-label data was first discussed in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], where relationships were referred to as
constraints. An interesting general point of [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] was that label constraints can be
exploited either at the learning phase or at a post-processing phase. In addition,
it presented four basic types of constraints and noted that more complex types
of constraints can be represented by combining these basic constraints with
logical connectors. For discovering constraints, it proposed association rule mining,
followed by removal of redundant rules that were more general than others. For
exploiting constraints, it proposed two post-processing approaches for the label
ranking task in multi-label learning. These approaches correct a predicted
ranking when it violates the constraints by searching for the nearest ranking that is
consistent with the constraints. They differ only in the function used to evaluate
the distance between the invalid and a valid ranking. Results on synthetic data
with known constraints showed that constraint exploitation can be helpful, but
results on real-world data and automatically discovered constraints did not lead
to predictive performance improvements.
      </p>
      <p>
        A rule-learning approach for discovering and exploiting deterministic label
relationships in multi-label learning was recently proposed in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. In particular,
for each label a separate rule model is constructed, but this model uses the rest
of the labels as additional features. This can lead to rules that include labels
with/without ordinary input features in the preconditions of a rule deriving a
particular label. Such rules are a natural representation of entailment
relationships among labels (and input features).
      </p>
    </sec>
    <sec id="sec-3">
      <title>Our Approach</title>
      <p>Our general goal is to discover deterministic relationships among labels via the
LOD cloud. In this first step towards our goal, we focused specifically on
discovering common ancestors of existing labels, with which we extend the existing
label space of a learning problem. We expect standard multi-label learning
algorithms to benefit from this explicit representation of latent relationships among
labels. We also focused on WordNet (http://wordnet.princeton.edu/) as the LOD
resource to search for these relationships, and leave the exploitation of DBPedia
and other resources as future work.</p>
      <p>
        Names of label attributes sometimes contain an auxiliary keyword that
emphasizes their role as target variables. This typically arises in textual datasets,
where the name of label attributes may coincide with names of input attributes
(words). For example, all labels of the bibtex and bookmarks datasets [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], the
delicious dataset [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and the EUR-Lex datasets [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] start with "tag". To deal with
this issue automatically, we first tokenize label names with a standard set of
delimiters, such as dash, underscore, space, comma and period. Tokens that appear
in all label names are then removed, and the remaining tokens are concatenated
again with a space separator between them.
      </p>
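The pre-processing step described above can be sketched as follows. This is a minimal illustration; the example label names and the exact delimiter set are hypothetical, not taken from the actual datasets:

```python
import re

def clean_label_names(labels):
    """Tokenize label names on common delimiters, drop tokens that
    appear in every label name (auxiliary keywords such as 'tag'),
    and rejoin the remaining tokens with a space separator."""
    # Split each name on dash, underscore, space, comma and period.
    tokenized = [re.split(r"[-_ ,.]+", name) for name in labels]
    # Tokens occurring in all label names carry no information.
    common = set(tokenized[0]).intersection(*tokenized[1:])
    return [" ".join(t for t in tokens if t not in common)
            for tokens in tokenized]

print(clean_label_names(["tag_winter", "tag_summer", "tag_big_group"]))
# -> ['winter', 'summer', 'big group']
```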
      <p>We then look up each label name in WordNet. If a label name has multiple
senses, we assume that the correct sense is the most frequent one according
to WordNet, which is based on semantically tagged corpora. Next, we obtain
recursively the hypernym-synsets of the determined sense of the label (i.e. the
broader concepts) up to the root of WordNet. We then examine all pairs of
labels that we managed to find in WordNet, and their common ancestors are
used for expanding the original label space of the learning problem under
examination. In Section 4, we describe two different approaches that ignore some
very general senses typically arising as common parents of labels that seem to
have no semantic relationship.</p>
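In code, this ancestor-discovery step might look as follows. This is an illustrative sketch over a hand-written toy hypernym map; the actual method queries WordNet, and the stop-set shown here is a hypothetical stand-in for the filtered general senses described in Section 4:

```python
# Toy hypernym map standing in for WordNet (child sense -> parent sense).
HYPERNYMS = {
    "winter": "season", "summer": "season",
    "season": "period", "period": "measure",
    "measure": "abstraction", "abstraction": "entity",
}

# Overly general senses to ignore (hypothetical stand-in for the
# stop-lists used by the LOD1/LOD2 variants in the experiments).
STOP_SENSES = {"abstraction", "entity"}

def ancestors(label):
    """Follow hypernym links recursively up to the root, returning the chain."""
    chain = []
    while label in HYPERNYMS:
        label = HYPERNYMS[label]
        chain.append(label)
    return chain

def common_ancestors(label_a, label_b):
    """Common ancestors of a label pair, minus overly general senses."""
    shared = set(ancestors(label_a)) & set(ancestors(label_b))
    return shared - STOP_SENSES

print(sorted(common_ancestors("winter", "summer")))
# -> ['measure', 'period', 'season']
```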
      <p>
        To exemplify our approach, suppose we are examining the pair of labels
(Winter, Summer) from the ImageCLEF 2011 photo annotation task [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. No
auxiliary keywords are used inside these label names, so the first pre-processing
step leaves them unaltered. We then examine the senses that are related to these
labels (the number in parentheses at the end of each sense description denotes
the sense's frequency):
- Winter
  1. the coldest season of the year; in the northern hemisphere it extends
     from the winter solstice to the vernal equinox (24)
  2. spend the winter (2)
- Summer
  1. the warmest season of the year; in the northern hemisphere it extends
     from the summer solstice to the autumnal equinox (58)
  2. the period of finest development, happiness, or beauty (0)
  3. spend the summer (0)
      </p>
      <p>Based on their frequency, we assume that the 1st sense of each label is the
correct one and continue to obtain the following branches of hypernym-synsets
of these senses up to the root of WordNet:
- Winter: wintertime → season → period → measure/quantity/amount →
abstraction → entity
- Summer: summertime → season → period → measure/quantity/amount →
abstraction → entity</p>
      <p>This will lead to the addition of the following labels: season, period,
measure/quantity/amount, abstraction and entity. Recall that our method ignores
some common parents that appear at the top of the WordNet hierarchy, because
they would be added for all label pairs (or for most of them) and thus would not
bring any new information. In this example, we would not add abstraction and
entity as labels.</p>
      <p>After the label-addition process, we fill in the values of the new labels:
for every new label of each instance, if at least one of its children is true, then
the new label will be true; in all other cases the new label will be false for
the specific instance. In our running example, if label Winter or label Summer
is true, then Season will be true, too. Otherwise, Season will be false.</p>
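This value-filling rule amounts to a logical OR over the children of each new label. A minimal sketch, where the instance representation and the parent-to-children mapping are illustrative assumptions:

```python
def fill_new_labels(instances, parents):
    """Set each new (ancestor) label to True iff at least one of its
    child labels is True for the instance; False otherwise."""
    for inst in instances:
        for parent, children in parents.items():
            inst[parent] = any(inst.get(c, False) for c in children)
    return instances

data = [{"Winter": True, "Summer": False},
        {"Winter": False, "Summer": False}]
parents = {"Season": ["Winter", "Summer"]}
filled = fill_new_labels(data, parents)
print(filled[0]["Season"], filled[1]["Season"])
# -> True False
```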
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <p>
        We use the Calibrated Label Ranking (CLR) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] problem transformation method
for learning multi-label models, which we have mentioned above (Section 2). For
the binary classification problems we employ linear Support Vector Machines.
We use the implementations from Mulan [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and Weka [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] respectively. We
experiment on the 6 multi-label datasets shown in Table 1. These datasets
were selected because their labels do not have obscure names (e.g. label1, label2
or class1, class2). We split each dataset into a train set and a test set
(70% and 30%, respectively). We discuss the results in terms of Mean Average
Precision (MAP) and Logarithmic Loss (LL).
      </p>
      <p>Table 1 presents the above measures for CLR applied to: (i) the original
data, and (ii) the expanded label space via two versions of our approach. In the
1st version, called LOD1, we take into consideration the hypernym-synsets of
the determined sense of the label, for up to two layers, i.e. we obtain parent
and grandparent senses of the determined sense and we ignore the following 32
very general senses that typically arise as common parents of labels that seem to
have no semantic relationship: substance, content, message, theme, topic, subject,
whole, unit, object, entity, abstraction, domain, activity, individual, someone,
somebody, mortal, soul, organism, being, cause, go, locomote, formation, alter,
modify, change, alteration, modification, happening, occurrence, occurrent. These
were selected based on early experiments. In the 2nd version of our approach,
called LOD2, we take into consideration the hypernym-synsets of the determined
sense of the label for all layers up to the root of WordNet and we only ignore
the following 6 general senses: whole, unit, object, entity, abstraction, tag.</p>
      <p>In Table 1, we notice that in all datasets LOD1 leads to improvements in the
Log-Loss measure (smaller is better). In addition, there are 4 datasets (bibtex,
delicious, bookmarks, IMDB-F) where our method improves MAP, but there are
also 2 (IC2011, corel5k) where we observe a reduction in this measure.
Furthermore, we notice that in almost all datasets LOD2 also leads to improvements
in the Log-Loss measure (the exception is ImageCLEF2011, where it deteriorates).
Moreover, there are 3 datasets (bibtex, delicious, bookmarks) where our
method improves MAP, but in the other 3 (IC2011, corel5k, IMDB-F) we observe
a reduction in this measure.</p>
      <p>In Table 1, we also give the number of additional labels for the LOD1 and
LOD2 approaches per dataset, in order to give an impression of the increase in
the number of labels. We also give the proportion of labels that are found in
WordNet for each dataset.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions and Future Work</title>
      <p>
        This work has introduced a method that detects relationships among labels
in multi-label datasets using WordNet (as a LOD resource) and exploits them
by enriching the label set with the labels' common parents, in order to make
the learning process more accurate. It is in our immediate plans to conduct
experiments where we will add the lowest common subsumer (LCS) instead of
all common parents for each pair of labels. It would be useful to introduce a criterion
that determines whether or not to reject the LCS, depending on how far it is from
the root or from the initial label of the dataset (its child). We believe that our approach
can be further extended and improved by exploiting additional resources of the
LOD cloud, such as DBPedia, LinkedGeoData, the Geospecies knowledge base and
Bio2RDF. We also plan to include a general first check of all
labels in order to detect the domain they refer to. In this way, we will
select the appropriate sense of a label based on the domain of the dataset's labels,
rather than on the assumption that the correct sense is the most common among
all the senses. Another important direction is the generalization of our approach
so as to be able to discover additional types of relationships among the existing
labels (e.g. mutual exclusion, spouse). In this work, we simply extended the
original label space with the ancestor labels and relied on standard multi-label
learning algorithms to exploit the discovered knowledge. In the future, we can
also investigate direct exploitation of this knowledge via techniques such as [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
Finally, we intend to apply our approach to additional datasets, to employ
additional evaluation measures using cross-validation, and to include
statistical tests for the significance of our method's improvements.
      </p>
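The lowest-common-subsumer variant mentioned above could, for instance, pick only the most specific shared ancestor of the two hypernym chains instead of all shared ancestors. A sketch under the assumption that each chain is ordered from the label upward to the root (the example chains are the Winter/Summer branches from Section 3):

```python
def lowest_common_subsumer(chain_a, chain_b):
    """Return the first (i.e. most specific) ancestor shared by two
    hypernym chains ordered label-to-root, or None if none exists."""
    seen = set(chain_b)
    for concept in chain_a:  # chain_a is ordered most-specific first
        if concept in seen:
            return concept
    return None

winter = ["season", "period", "measure", "abstraction", "entity"]
summer = ["season", "period", "measure", "abstraction", "entity"]
print(lowest_common_subsumer(winter, summer))
# -> season
```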
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Tsoumakas</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>M.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>Z.H.</given-names>
          </string-name>
          :
          <article-title>Introduction to the special issue on learning from multi-label data</article-title>
          .
          <source>Machine Learning</source>
          <volume>88</volume>
          (
          <issue>1-2</issue>
          ) (
          <year>2012</year>
          )
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Vens</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Struyf</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schietgat</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dzeroski</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blockeel</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Decision trees for hierarchical multi-label classification</article-title>
          .
          <source>Machine Learning 73(2)</source>
          (
          <year>2008</year>
          )
          <fpage>185</fpage>
          -
          <lpage>214</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Nowak</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nagel</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liebetrau</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>The CLEF 2011 photo annotation and concept-based retrieval tasks</article-title>
          . In: CLEF (Notebook Papers/Labs/Workshop). (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Papagiannopoulou</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsoumakas</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsamardinos</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Discovering and exploiting entailment relationships in multi-label learning</article-title>
          .
          <source>arXiv preprint arXiv:1404.4038 [cs.LG]</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Paulheim</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fürnkranz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Unsupervised generation of data mining features from linked open data</article-title>
          .
          <source>In: 2nd International Conference on Web Intelligence, Mining and Semantics, WIMS '12</source>
          , Craiova, Romania, June 6-8, 2012. (
          <year>2012</year>
          )
          <fpage>31</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Park</surname>
            ,
            <given-names>S.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fürnkranz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Multi-label classification with label constraints</article-title>
          .
          <source>In: ECML PKDD 2008 Workshop on Preference Learning</source>
          . (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Loza Mencía</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Janssen</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Stacking label features for learning multilabel rules</article-title>
          .
          <source>In: Discovery Science - 17th International Conference, DS 2014, Bled, Slovenia, October 8-10, 2014, Proceedings. Volume 8777 of Lecture Notes in Computer Science</source>
          . Springer (
          <year>2014</year>
          )
          <fpage>192</fpage>
          -
          <lpage>203</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Katakis</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsoumakas</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vlahavas</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Multilabel text classification for automated tag suggestion</article-title>
          .
          <source>In: Proceedings of the ECML/PKDD 2008 Discovery Challenge</source>
          , Antwerp, Belgium (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Tsoumakas</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Katakis</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vlahavas</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Effective and efficient multilabel classification in domains with large number of labels</article-title>
          .
          <source>In: Proc. ECML/PKDD 2008 Workshop on Mining Multidimensional Data (MMD'08)</source>
          . (
          <year>2008</year>
          )
          <fpage>30</fpage>
          -
          <lpage>44</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Loza Mencía</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fürnkranz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Efficient pairwise multilabel classification for large scale problems in the legal domain</article-title>
          .
          <source>In: 12th European Conference on Principles and Practice of Knowledge Discovery in Databases, PKDD 2008</source>
          , Antwerp, Belgium (
          <year>2008</year>
          )
          <fpage>50</fpage>
          -
          <lpage>65</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11. Furnkranz, J., Hullermeier, E.,
          <string-name>
            <surname>Mencia</surname>
            ,
            <given-names>E.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brinker</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Multilabel classi cation via calibrated label ranking</article-title>
          .
          <source>Machine Learning 73(2)</source>
          (
          <year>2008</year>
          )
          <volume>133</volume>
          {
          <fpage>153</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Tsoumakas</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spyromitros-Xioufis</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vilcek</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vlahavas</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Mulan: A Java library for multi-label learning</article-title>
          .
          <source>Journal of Machine Learning Research (JMLR) 12</source>
          (July
          <year>2011</year>
          )
          <fpage>2411</fpage>
          -
          <lpage>2414</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Hall</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frank</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Holmes</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pfahringer</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reutemann</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Witten</surname>
            ,
            <given-names>I.H.</given-names>
          </string-name>
          :
          <article-title>The WEKA data mining software: An update</article-title>
          .
          <source>SIGKDD Explorations</source>
          <volume>11</volume>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>