<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Exploiting Linked Open Data as Background Knowledge in Data Mining</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Heiko Paulheim</string-name>
          <email>heiko@informatik.uni-mannheim.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Mannheim, Germany Research Group Data and Web Science</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Many data mining problems can be solved better if they are augmented with additional background knowledge. This paper discusses a framework for adding background knowledge from Linked Open Data to a given data mining problem in a fully automatic, unsupervised manner. It introduces the FeGeLOD framework and its latest implementation, the RapidMiner Linked Open Data extension. We show the use of the approach in different problem domains and discuss current research directions.</p>
      </abstract>
      <kwd-group>
        <kwd>Data Mining</kwd>
        <kwd>Background Knowledge</kwd>
        <kwd>Linked Open Data</kwd>
        <kwd>RapidMiner</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Data Mining is the process of identifying novel, valid, and interesting patterns
in data [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Using background knowledge can help in discovering those patterns,
as well as in finding completely new patterns that originate from combining the
original data with additional data from different sources.
      </p>
      <p>
        In the recent years, Linked Open Data [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] has grown into a large collection
of open datasets following well-defined standards such as RDF. Using explicit
semantics, that data is made interpretable by machines. This paper discusses
a framework for using that large data collection as background knowledge in
data mining, and illustrates it with different use cases.
      </p>
      <p>
        There are two principal strategies for using Linked Open Data for data
mining:
1. Developing specialized mining methods for Linked Open Data. Examples
include operators for rule learning algorithms, e.g., DL-Learner [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], or
specialized kernel functions for support vector machines [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
2. Pre-processing Linked Open Data so that it can be accessed with traditional
(e.g., propositional) data mining methods (e.g., [
        <xref ref-type="bibr" rid="ref15 ref8">8,15</xref>
        ]).
      </p>
      <p>This paper introduces a method that follows the second, i.e., the
preprocessing strategy. The rationale is that such a strategy allows for re-using
many existing data mining algorithms and tools, and is thus considered more
versatile.</p>
      <p>The rest of this paper is structured as follows. Section 2 introduces our
theoretical framework, for which the current implementation is discussed in section 3.
Section 4 discusses different example applications which use the framework
introduced in this paper, i.e., text classification and interpreting statistics. We
conclude with a review of current challenges in section 5, and a short summary.</p>
    </sec>
    <sec id="sec-2">
      <title>Theoretical Framework</title>
      <p>
        To augment a dataset with Linked Open Data, we propose a general pipeline
comprising three steps:
1. First, entities in Linked Open Data have to be recognized that correspond to
entities in the original dataset. For example, in a dataset about cities,
DBpedia1 [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] or Geonames2 URIs identifying those cities are added to the
dataset.
2. Second, data about the entities in question is extracted using the previously
identified URIs. This results in generating additional features and adding
them to the original dataset.
3. Since the previous step may, depending on the dataset and the strategies
used, create an abundance of features, employing feature selection is often
necessary before further processing the data with data mining algorithms.
      </p>
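      <p>The three steps above can be sketched on a toy example as follows; the mini knowledge base, entity names, and feature names are purely illustrative and stand in for a live SPARQL endpoint.

```python
# A minimal, illustrative sketch of the three-step pipeline, using a tiny
# in-memory "Linked Open Data" graph instead of a live SPARQL endpoint.

# Toy background knowledge: URI -> {property: value}
KB = {
    "http://dbpedia.org/resource/Mannheim": {"population": 309000, "type": "City"},
    "http://dbpedia.org/resource/Berlin": {"population": 3645000, "type": "City"},
}

def link_entities(rows, attribute):
    """Step 1: attach a URI to each row via a simple URI pattern."""
    for row in rows:
        row["uri"] = "http://dbpedia.org/resource/" + row[attribute]
    return rows

def generate_features(rows):
    """Step 2: copy datatype properties of the linked entity into the row."""
    for row in rows:
        for prop, value in KB.get(row["uri"], {}).items():
            row["lod_" + prop] = value
    return rows

def select_features(rows, keep):
    """Step 3: naive feature selection -- keep only a whitelist of columns."""
    return [{k: v for k, v in row.items() if k in keep} for row in rows]

rows = link_entities([{"city": "Mannheim"}, {"city": "Berlin"}], "city")
rows = generate_features(rows)
rows = select_features(rows, keep={"city", "lod_population"})
```

In practice, step 3 would of course use a statistical relevance criterion rather than a fixed whitelist.
</p>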
    </sec>
    <sec id="sec-3">
      <title>The RapidMiner Linked Open Data Extension</title>
      <p>We have implemented the essential steps of our theoretical framework as an
extension to RapidMiner 3, an open data mining platform4 (except for feature
selection, which is already covered by RapidMiner and other extensions).</p>
      <p>For the entity linking step, different strategies are currently supported by
our implementation:
- Creating links based on custom URI patterns, e.g., appending the string
value of an attribute containing city names to a constant, such as
http://dbpedia.org/resource/.</p>
      <sec id="sec-3-1">
        <title>1 http://dbpedia.org</title>
      </sec>
      <sec id="sec-3-2">
        <title>2 http://www.geonames.org</title>
      </sec>
      <sec id="sec-3-3">
        <title>3 http://dws.informatik.uni-mannheim.de/en/research/</title>
        <p>rapidminer-lod-extension/</p>
      </sec>
      <sec id="sec-3-4">
        <title>4 http://rapid-i.com/content/view/181/</title>
        <p>(Figure residue: example dataset table with columns ISBN and City; row: 3-2347-3427-1, Darmstadt)</p>
        <p>
          - Discovering links by full text search with SPARQL statements.
- Using the DBpedia lookup service5, optionally with type restrictions, and
different disambiguation strategies.
- Using the DBpedia Spotlight service6 [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] for text processing.
        </p>
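        <p>The first strategy, linking by URI pattern, can be sketched as follows; the underscore convention follows DBpedia's resource naming, while the helper name itself is ours and the service-based strategies are omitted since they require live endpoints.

```python
from urllib.parse import quote

def pattern_link(value, prefix="http://dbpedia.org/resource/"):
    """Link an attribute value by appending it to a constant URI prefix.

    Spaces become underscores, mimicking DBpedia's resource naming, and
    remaining non-ASCII characters are percent-encoded.
    """
    return prefix + quote(value.strip().replace(" ", "_"))
```

A lookup- or Spotlight-based linker would replace this single function while leaving the rest of the pipeline untouched.
</p>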
        <p>With respect to feature generation, different generators are supported by our
implementation:
- Adding datatype properties as features. This generator includes heuristic
guessing of appropriate attribute types (e.g., recognizing numerics and dates).
- Adding direct types as boolean features.
- Adding boolean or numeric features for incoming and outgoing relations.
- Adding boolean or numeric features for incoming and outgoing relations plus
their type, i.e., using qualified relations.</p>
        <p>- Adding features using custom SPARQL queries.</p>
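        <p>Three of these generators can be sketched over a hand-made triple list as follows; the prefixes are abbreviated, the data is illustrative, and a real implementation would issue SPARQL queries instead of scanning a list.

```python
# Illustrative triple store: (subject, predicate, object).
TRIPLES = [
    ("dbr:Mannheim", "rdf:type", "dbo:City"),
    ("dbr:Mannheim", "dbo:populationTotal", "309000"),
    ("dbr:Mannheim", "dbo:federalState", "dbr:Baden-Wuerttemberg"),
    ("dbr:SAP", "dbo:locationCity", "dbr:Mannheim"),
]

def datatype_features(uri, triples):
    """Datatype properties as features, with naive numeric type guessing."""
    feats = {}
    for s, p, o in triples:
        if s == uri and p != "rdf:type" and not o.startswith("dbr:"):
            feats[p] = int(o) if o.isdigit() else o
    return feats

def type_features(uri, triples):
    """Direct rdf:type statements as boolean features."""
    return {o: True for s, p, o in triples if s == uri and p == "rdf:type"}

def relation_features(uri, triples):
    """Numeric counts of outgoing and incoming object relations."""
    out_n = sum(1 for s, p, o in triples if s == uri and o.startswith("dbr:"))
    in_n = sum(1 for s, p, o in triples if o == uri)
    return {"out_relations": out_n, "in_relations": in_n}
```

The generator for qualified relations would additionally key the counts by the relation name and the object's type.
</p>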
        <p>
          Details on those generators can be found in [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. Fig. 2 shows an example
RapidMiner process using operators of the Linked Open Data extension.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Example Applications</title>
      <p>Background knowledge from Linked Open Data can be helpful in different tasks.
In the following, we show examples from different domains, which have been
built using either the RapidMiner Linked Open Data extension or one of its
predecessors.</p>
      <sec id="sec-4-1">
        <title>5 http://lookup.dbpedia.org</title>
      </sec>
      <sec id="sec-4-2">
        <title>6 http://spotlight.dbpedia.org/</title>
        <p>
In [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], a dataset of events extracted from Wikipedia has been introduced. The
dataset contains events that are harvested from Wikipedia pages for years and
months in different languages. Each event is represented by a time, a short text,
and links to entities involved in the event (which are DBpedia links for the
Wikipedia pages linked from the event text).
        </p>
        <p>In order to further enrich the dataset, we have assigned classes to the events
automatically. Since some of the events already have classes (since they are
harvested from pages with a topical structure), we had a training and testing
set for English language events available.</p>
        <p>As features for the classification, we have used direct types and categories
of the resources involved in an event. The rationale is that, for example, sports
events can be identified by athletes and/or stadiums being involved, while politics
events can be identified by politicians being involved. Using only such binary
features, we were able to achieve an accuracy of 80% for a problem comprising
more than ten different categories and a training and test set of 1,000 instances.</p>
        <p>
          Furthermore, since we did not use any textual features, but only structured
content from DBpedia, we were able to apply a model trained on the English
dataset on event datasets in other languages as well. We have shown that the
classification accuracy is still the same when applying the model trained on the
English dataset to a set of events in German language [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
        <p>
          A similar use case is the classification of social media texts, such as Tweets,
for example for the use in emergency management applications [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. In [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], we
have used different textual features for identifying Tweets that talk about car
accidents, being able to achieve a classification accuracy of 90%.
        </p>
        <p>In that experiment, we had gathered training data using Tweets from a
particular city. However, when using the trained model on data from another
city, the classification accuracy dropped to 85%. The reason is that an overfitting
effect occurs, e.g., the names of major streets are used as indicators for identifying
Tweets about car accidents.</p>
        <p>To avoid the overfitting, we used features from DBpedia, first preprocessing
the Tweets with DBpedia Spotlight, and then adding additional types and
categories for the identified concepts, just like in the event classification experiment
above. Using those abstract concepts (e.g., dbpedia-owl:Road instead of the
name of a particular road) remedies the overfitting effect and keeps the accuracy
on the same level when applying a model learned on data from one city to
data from a different one.</p>
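        <p>This generalization step can be sketched as follows; the entity-to-type mapping stands in for the combination of DBpedia Spotlight and a type lookup, and the street names are illustrative.

```python
# Sketch: generalize city-specific tokens to abstract DBpedia concepts, so
# that a model trained on one city transfers to another.
ENTITY_TYPES = {
    "Kaiserring": "dbpedia-owl:Road",        # a street (illustrative example)
    "Unter den Linden": "dbpedia-owl:Road",  # a street in another city
}

def abstract_features(tokens):
    """Map each recognized entity to its type; keep other tokens as-is."""
    return [ENTITY_TYPES.get(t, t) for t in tokens]

tweet_a = abstract_features(["crash", "on", "Kaiserring"])
tweet_b = abstract_features(["crash", "on", "Unter den Linden"])
```

After abstraction, the two Tweets yield identical feature vectors even though they mention different streets, which is exactly why the overfitting effect disappears.
</p>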
        <sec id="sec-4-2-1">
          <title>Explaining Statistics</title>
          <p>In most data mining scenarios, there is already some data available. However,
there are also cases where the amount of data is scarce. One typical example is
statistics, where usually only one or a few target variables are produced, e.g.,
conducting a survey on the quality of living in different cities, or gathering data
on drug abuse in different countries.</p>
          <p>
            In most cases, people working with these statistics, such as journalists, are
interested in finding reasons for the effects reported in those statistics. In [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ],
we have introduced the prototype tool Explain-a-LOD, which uses the pipeline
discussed above for enriching statistics files. For example, for a statistical dataset
on cities, Linked Open Data sources can provide relevant background knowledge
such as population, climate, major companies, etc.
          </p>
          <p>
            Having enriched the statistics at hand with background information, we
analyze the generated features for correlation with the target variables, as well as
perform rule learning to find more complex patterns. Fig. 3 depicts a screenshot
of the tool. Further details and examples can be found in [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ].
          </p>
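          <p>The correlation analysis can be sketched with a plain Pearson coefficient over a toy statistic; the numbers are purely illustrative.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between a generated feature and the target."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

quality_of_living = [1.0, 2.0, 3.0, 4.0]  # target variable (illustrative)
population        = [100, 210, 290, 410]  # one generated LOD feature
corr = pearson(population, quality_of_living)
```

Features ranked high by such a score are then presented as candidate explanations, with rule learning covering patterns a single coefficient cannot express.
</p>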
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Challenges</title>
      <p>The examples above have shown how Linked Open Data provides additional value
in different data mining problems. However, there are also some challenges to be
addressed.</p>
      <sec id="sec-5-1">
        <title>Dealing with the Variety of Linked Open Data</title>
        <p>Despite Linked Open Data being built on well-defined standards, there are
different ways to provide Linked Open Data. In its current version, the RapidMiner
Linked Open Data extension exploits SPARQL endpoints. However, there are
datasets which do not have SPARQL endpoints, but which could provide
interesting background knowledge in many cases, e.g., Freebase7 or OpenCyc8.
For such datasets, different implementations of the generation algorithms are
required.</p>
        <p>
          Furthermore, there are non-standard SPARQL constructs, such as COUNT
or the asterisk operator for computing the transitive closure [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] which are
supported by some, but not all endpoints. Such constructs may help computing
        </p>
        <sec id="sec-5-1-1">
          <title>7 http://www.freebase.com/</title>
        </sec>
        <sec id="sec-5-1-2">
          <title>8 http://www.cyc.com/platform/opencyc</title>
          <p>
            certain features in a more performant way. However, it is difficult to determine
the features supported by a SPARQL endpoint automatically, in particular since
vocabularies such as VoID do not provide means to describe technical details of
SPARQL endpoints [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ].
          </p>
          <p>Besides technical variations, there are also different ways to represent data.
For example, the current generator versions rely on data values being direct
properties of the entities at hand. However, there are different cases, as this
example from CORDIS9 shows:
EU18931 a Funding .
EU18931 has-grant-value [
  has-amount 1300000 ;
  has-unit-of-measure EUR
] .</p>
          <p>
            A similar challenge exists for data using the data cube vocabulary [
            <xref ref-type="bibr" rid="ref21">21</xref>
            ], or time-indexed
data, which can exist in various fashions [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ].
          </p>
        </sec>
      </sec>
      <sec id="sec-5-2">
        <title>Discovering Datasets and Exploiting Links</title>
        <p>While our approach is unsupervised in that the user does not need to know
many details about the datasets to use, this does not hold for selecting the</p>
        <sec id="sec-5-2-1">
          <title>9 http://cordis.rkbexplorer.com/</title>
          <p>datasets themselves. Since there are no universal means to nd datasets that
hold information on certain entities, this is di cult to circumvent, nevertheless,
it would be a desirable feature of a fully automatic approach.</p>
          <p>
            A possible way to deal with this problem could be to start with one dataset,
e.g., DBpedia, for which entity recognition can be performed in high quality.
From that dataset, links to other datasets (directly via owl:sameAs, or using
link repositories such as sameas.org10) can be followed to successively add further
datasets. In that case, SPARQL endpoints should be added automatically and
dynamically, which, given only a URI, has been shown to be a difficult issue [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ].
          </p>
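          <p>The link-following idea can be sketched as a simple traversal over a toy owl:sameAs link set; the identifiers below are illustrative (the Freebase one is made up), and no endpoint discovery is attempted.

```python
# Sketch: starting from one hub dataset, collect further URIs for the same
# entity by transitively following owl:sameAs links (toy link set, no network).
SAME_AS = {
    "dbr:Mannheim": {"geonames:2873891"},          # illustrative link
    "geonames:2873891": {"freebase:m.0example"},   # made-up identifier
}

def same_as_closure(uri):
    """Breadth-first traversal of owl:sameAs links from a seed URI."""
    seen, frontier = {uri}, [uri]
    while frontier:
        current = frontier.pop()
        for other in SAME_AS.get(current, ()):
            if other not in seen:
                seen.add(other)
                frontier.append(other)
    return seen
```

Each URI in the closure would then still have to be resolved to a queryable endpoint, which is the difficult part discussed above.
</p>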
        </sec>
      </sec>
      <sec id="sec-5-3">
        <title>Improving Entity Linking</title>
        <p>Entity linking is the first step in our pipeline. It should therefore work with
high quality, since errors made at this step are carried over to later steps (e.g.,
extracting features from entities that do not correspond to the entity meant in the
original dataset, such as population figures for a different city).</p>
        <p>
          While linking is not difficult for entities such as countries or major cities, it
can become more difficult for other classes, such as universities or animals, as
discussed in [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. In particular when not including additional knowledge from
the user (which would contradict the paradigm of an unsupervised approach),
there are cases that are barely distinguishable, e.g., a set of hurricanes and a
set of persons (both of which have person names as attributes). Particularly in
the absence of additional attributes and meaningful column headers, it is hardly
possible to reliably find correct links.
        </p>
      </sec>
      <sec id="sec-5-4">
        <title>Exploiting Semantics</title>
        <p>Most datasets in Linked Open Data come with at least light-weight explicit
semantics, i.e., they use a vocabulary that contains semantic information in
the form of ontology statements. These can provide valuable information to an
approach for automatically enriching a dataset.</p>
        <p>
          Consider the case where direct types of entities are added as boolean
features. Using the schema information, the features form a hierarchy, e.g.,
AfricanIsland ⊑ Island. This hierarchy can be exploited, e.g., for improving feature
selection, similar to the approach described in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], which helps in discovering only
meaningful patterns and avoiding overfitting: if both AfricanIsland and Island
have the same characteristics (e.g., they are highly correlated, or they have the same
information gain w.r.t. a target variable), we can prune the more specific variable
without losing classification accuracy.
        </p>
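        <p>A minimal sketch of such hierarchy-based pruning follows, using the crude criterion that a child type's boolean column is identical to its parent's; a real implementation would test correlation or information gain instead.

```python
# Child -> parent subsumptions taken from the ontology (illustrative).
HIERARCHY = {"AfricanIsland": "Island"}

def prune_redundant(features):
    """Drop a child feature whose column is identical to its parent's."""
    kept = dict(features)
    for child, parent in HIERARCHY.items():
        if child in kept and parent in kept and kept[child] == kept[parent]:
            del kept[child]
    return kept

cols = {
    "AfricanIsland": [True, False, True],
    "Island":        [True, False, True],
    "City":          [False, True, False],
}
pruned = prune_redundant(cols)
```

Keeping the more general feature yields patterns over Island rather than AfricanIsland, which are both more meaningful and less prone to overfitting.
</p>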
      </sec>
      <sec id="sec-5-5">
        <title>Combining Feature Creation and Selection</title>
        <p>The approach discussed above proposes a pipeline of three strictly sequential
steps. In particular, it first generates the whole set of possible features, which
are then filtered in a subsequent step.</p>
        <p>
          This kind of approach has certain limitations with respect to the amount
of features that can be generated. In particular, for reasons of scalability, we have
not included a generator which generates features for all individuals linked to a
resource. However, it would turn out in a later step that most of those features
are not useful, and they would be removed again.
An extreme case is the generator for qualified relations, which, when
combined with a deep class hierarchy such as YAGO [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], creates features such as
∃location⁻¹.ArtSchoolsInParis, which is true for only one instance (i.e., Paris),
and false for all other cities.
        </p>
        <p>An even more complex strategy for feature generation that we have not pursued
so far is the construction of new features from those already generated, e.g.,
the number of cinemas per inhabitant, where both the number of cinemas and the
population are features from the Linked Open Data set.</p>
        <p>To remedy those problems and arrive at more scalable approaches, it would
be necessary to develop algorithms that combine feature selection and creation
in a joint process. A straightforward approach could be to create features for a
sample of the data first, determine the relevant features, and then create only
those features for the rest of the dataset. Exploiting semantics, as discussed
above, could lead to more sophisticated approaches.</p>
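        <p>The sample-first idea can be sketched as follows; both the generator and the relevance test (keep only features that vary on the sample) are deliberately simplistic stand-ins.

```python
# Sketch: generate all features only on a small sample, keep the ones that
# pass a cheap relevance test, then generate exactly those for the rest.
def generate_all(entity):
    """Expensive generator (illustrative): every feature for one entity."""
    return {
        "population": len(entity) * 1000,            # fake numeric feature
        "in:ArtSchoolsInParis": entity == "Paris",   # near-constant feature
    }

def relevant_on_sample(sample):
    """Keep features whose values vary across the sample."""
    rows = [generate_all(e) for e in sample]
    return {f for f in rows[0] if len({r[f] for r in rows}) > 1}

keep = relevant_on_sample(["Berlin", "Oslo"])
rest = {e: {f: generate_all(e)[f] for f in keep} for e in ["Paris", "Rome"]}
```

On the sample, the near-constant qualified-relation feature never varies and is dropped, so it is never materialized for the remaining instances.
</p>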
      </sec>
      <sec id="sec-5-6">
        <title>Dealing with Dataset Coverage and Biases</title>
        <p>
          Linked Open Data may be incomplete by design, i.e., following the open world
assumption. In many cases, we ignore the open world assumption for creating
data mining features. For example, a generator adding types to an instance adds
the value false if the type is not present, which is not in line with the semantics
of RDF using the open world assumption [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. Similarly, adding a feature like
number of companies in the city, which counts the corresponding entities in the
Linked Open Dataset, neglects the open world assumption.
        </p>
        <p>
          Some of these problems may be remedied by trying to detect and fill in
missing information in a preprocessing step. For example, in [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], we have
introduced an approach which has been shown to complete missing type information
in DBpedia at very high quality.
        </p>
        <p>A more subtle problem is the presence of biases in datasets. In the statistics
use case discussed above, we have frequently observed explanations such as The
quality of living is high in cities where many music recordings were produced, in
particular when using DBpedia as a source of background knowledge. This is
mainly an effect of a skewed distribution of data in DBpedia (and Wikipedia),
which contains more information on popular culture in the western world, so the
bottom line of this explanation is that cities in the western world have a high
quality of living. However, such biases are difficult to detect automatically.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>In this paper, we have introduced both a theoretical framework as well as an
implementation for using Linked Open Data as background knowledge in data
mining. The implementation uses the data mining toolkit RapidMiner, which
allows for combining the Linked Open Data specific operators with various other
operators for data processing and mining.</p>
      <p>We have shown different use cases where background knowledge from Linked
Open Data can improve the results and provide new insights that could not have
been gained from the mere data without background knowledge.</p>
      <p>While Linked Open Data provides a lot of opportunities, there are also a
number of challenges to address, ranging from coping with the large variety
of data representations within Linked Open Data to dealing with complex
features in a scalable way. With this paper, we have sketched a research agenda by
enumerating some of the most prevalent challenges. We are confident that this
research agenda will lead to a set of approaches and tools that allow for novel
data mining tools which have access to a large amount of knowledge and provide
deeper insights into data.</p>
      <sec id="sec-6-1">
        <title>Acknowledgements</title>
        <p>In the past two years, several people have contributed to the conception and/or
implementation of our framework and the RapidMiner Linked Open Data
extension. The author would like to thank Johannes Fürnkranz, Raad Bahmani,
Alexander Gabriel, and Simon Holthausen for their valuable contributions. The
current team developing the extension includes Christian Bizer, Petar Ristoski,
and Evgeny Mitichkin.</p>
        <p>The author would further like to thank all the people developing the
example applications discussed in this paper, including Daniel Hienert and Dennis
Wegener at GESIS, and Axel Schulz at SAP Research.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Keith</given-names>
            <surname>Alexander</surname>
          </string-name>
          , Richard Cyganiak,
          <string-name>
            <given-names>Michael</given-names>
            <surname>Hausenblas</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jun</given-names>
            <surname>Zhao</surname>
          </string-name>
          .
          <article-title>Describing linked datasets</article-title>
          .
          <source>In Linked Data on the Web (LDOW2009)</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Axel</given-names>
            <surname>Schulz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Petar</given-names>
            <surname>Ristoski</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>Heiko</given-names>
            <surname>Paulheim</surname>
          </string-name>
          .
          <article-title>I see a car crash: Real-time detection of small scale incidents in microblogs</article-title>
          .
          <source>In ESWC 2013 Satellite Events: Revised Selected Papers</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Christian</given-names>
            <surname>Bizer</surname>
          </string-name>
          , Tom Heath, and
          <string-name>
            <given-names>Tim</given-names>
            <surname>Berners-Lee</surname>
          </string-name>
          .
          <article-title>Linked Data - The Story So Far</article-title>
          .
          <source>International Journal on Semantic Web and Information Systems</source>
          ,
          <volume>5</volume>
          (
          <issue>3</issue>
          ):1-
          <fpage>22</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Usama</given-names>
            <surname>Fayyad</surname>
          </string-name>
          , Gregory Piatetsky-Shapiro, and
          <string-name>
            <given-names>Padhraic</given-names>
            <surname>Smyth</surname>
          </string-name>
          .
          <article-title>From data mining to knowledge discovery in databases</article-title>
          .
          <source>AI magazine</source>
          ,
          <volume>17</volume>
          (
          <issue>3</issue>
          ):
          <fpage>37</fpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. Daniel Hienert, Dennis Wegener, and
          <string-name>
            <given-names>Heiko</given-names>
            <surname>Paulheim</surname>
          </string-name>
          .
          <article-title>Automatic classi cation and relationship extraction for multi-lingual and multi-granular events from Wikipedia</article-title>
          . In Detection, Representation, and
          <source>Exploitation of Events in the Semantic Web (DeRiVE 2012), volume 902 of CEUR-WS</source>
          , pages 1-10,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6. Johannes Hoffart,
          <string-name>
            <surname>Fabian M. Suchanek</surname>
          </string-name>
          , Klaus Berberich, and Gerhard Weikum.
          <article-title>YAGO2: a spatially and temporally enhanced knowledge base from Wikipedia</article-title>
          .
          <source>Artificial Intelligence</source>
          , (194):28-61
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Yoonjae</given-names>
            <surname>Jeong</surname>
          </string-name>
          and
          <string-name>
            <surname>Sung-Hyon Myaeng</surname>
          </string-name>
          .
          <article-title>Feature selection using a semantic hierarchy for event recognition and type classification</article-title>
          .
          <source>In The 6th International Joint Conference on Natural Language Processing</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Venkata</given-names>
            <surname>Narasimha Pavan Kappara</surname>
          </string-name>
          , Ryutaro Ichise, and
          <string-name>
            <given-names>O.P.</given-names>
            <surname>Vyas</surname>
          </string-name>
          .
          <article-title>LiDDM: A data mining system for linked data</article-title>
          .
          <source>In Workshop on Linked Data on the Web (LDOW2011)</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Jens</given-names>
            <surname>Lehmann</surname>
          </string-name>
          .
          <article-title>DL-Learner: Learning concepts in description logics</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>10</volume>
          :
          <fpage>2639</fpage>
          -
          <fpage>2642</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Jens</given-names>
            <surname>Lehmann</surname>
          </string-name>
          , Robert Isele, Max Jakob, Anja Jentzsch, Dimitris Kontokostas,
          <string-name>
            <surname>Pablo N Mendes</surname>
            ,
            <given-names>Sebastian</given-names>
          </string-name>
          <string-name>
            <surname>Hellmann</surname>
          </string-name>
          , Mohamed Morsey, Patrick van Kleef,
          <source>Soren Auer</source>
          , et al.
          <article-title>Dbpedia-a large-scale, multilingual knowledge base extracted from wikipedia</article-title>
          .
          <source>Semantic Web Journal</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11. Uta Losch, Stephan Bloehdorn, and
          <string-name>
            <given-names>Achim</given-names>
            <surname>Rettinger</surname>
          </string-name>
          .
          <article-title>Graph kernels for RDF data</article-title>
          .
          <source>In The Semantic Web: Research and Applications - 9th Extended Semantic Web Conference, ESWC</source>
          <year>2012</year>
          , Heraklion, Crete, Greece, May
          <volume>27</volume>
          -31,
          <year>2012</year>
          . Proceedings, pages
          <volume>134</volume>
          -
          <fpage>148</fpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>Pablo N.</given-names>
            <surname>Mendes</surname>
          </string-name>
          , Max Jakob,
          <string-name>
            <given-names>Andrés</given-names>
            <surname>García-Silva</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Christian</given-names>
            <surname>Bizer</surname>
          </string-name>
          .
          <article-title>Dbpedia spotlight: Shedding light on the web of documents</article-title>
          .
          <source>In Proceedings of the 7th International Conference on Semantic Systems (I-Semantics)</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>Heiko</given-names>
            <surname>Paulheim</surname>
          </string-name>
          .
          <article-title>Generating possible interpretations for statistics from linked open data</article-title>
          .
          <source>In 9th Extended Semantic Web Conference (ESWC)</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>Heiko</given-names>
            <surname>Paulheim</surname>
          </string-name>
          and
          <string-name>
            <given-names>Christian</given-names>
            <surname>Bizer</surname>
          </string-name>
          .
          <article-title>Type inference on noisy RDF data</article-title>
          .
          <source>In 12th International Semantic Web Conference (ISWC)</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>Heiko</given-names>
            <surname>Paulheim</surname>
          </string-name>
          and
          <string-name>
            <given-names>Johannes</given-names>
            <surname>Fürnkranz</surname>
          </string-name>
          .
          <article-title>Unsupervised Generation of Data Mining Features from Linked Open Data</article-title>
          .
          <source>In International Conference on Web Intelligence, Mining, and Semantics (WIMS'12)</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>Heiko</given-names>
            <surname>Paulheim</surname>
          </string-name>
          and
          <string-name>
            <given-names>Sven</given-names>
            <surname>Hertling</surname>
          </string-name>
          .
          <article-title>Discoverability of SPARQL endpoints in linked open data</article-title>
          .
          <source>In Proceedings of the ISWC 2013 Posters &amp; Demonstrations Track</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>Anisa</given-names>
            <surname>Rula</surname>
          </string-name>
          , Matteo Palmonari, Andreas Harth, Steffen Stadtmüller, and Andrea Maurino.
          <article-title>On the diversity and availability of temporal information in linked open data</article-title>
          .
          <source>In The Semantic Web - ISWC 2012</source>
          , pages
          <fpage>492</fpage>
          -
          <lpage>507</lpage>
          . Springer,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>Axel</given-names>
            <surname>Schulz</surname>
          </string-name>
          , Heiko Paulheim, and
          <string-name>
            <given-names>Florian</given-names>
            <surname>Probst</surname>
          </string-name>
          .
          <article-title>Crisis information management in the web 3.0 age</article-title>
          .
          <source>In 9th International ISCRAM Conference</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19. W3C.
          <article-title>Resource Description Framework (RDF): Concepts and Abstract Syntax</article-title>
          . http://www.w3.org/TR/rdf-concepts/,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20. W3C.
          <article-title>SPARQL New Features and Rationale</article-title>
          . http://www.w3.org/TR/sparql-features/,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21. W3C.
          <article-title>The RDF Data Cube Vocabulary</article-title>
          . http://www.w3.org/TR/vocab-data-cube/,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>