<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>RecLAK: Analysis and Recommendation of Interlinking Datasets</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giseli Rabello Lopes</string-name>
          <email>grlopes@inf.puc-rio.br</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bernardo Pereira Nunes</string-name>
          <email>bnunes@inf.puc-rio.br</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luiz André P. Paes Leme</string-name>
          <email>lapaesleme@ic.uff.br</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco A. Casanova</string-name>
          <email>casanova@inf.puc-rio.br</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science Institute</institution>
          ,
          <addr-line>UFF, Niterói/RJ</addr-line>
          ,
          <country country="BR">Brazil</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Informatics</institution>
          ,
          <addr-line>PUC-Rio, Rio de Janeiro/RJ</addr-line>
          ,
          <country country="BR">Brazil</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents RecLAK, a Web application developed for the LAK Challenge 2014. RecLAK focuses on the analysis of the LAK dataset metadata and recommends candidate datasets to be interlinked with the LAK dataset. RecLAK generates recommendations based on Bayesian classifiers and on Social Network Analysis measures. Furthermore, RecLAK generates graph visualizations that explore the LAK dataset in relation to other datasets in the Linked Open Data cloud. The results of the experiments contribute to the understanding and improvement of the LAK dataset. They can also help researchers in the fields covered by the LAK dataset, such as learning analytics and educational data mining.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>Q1. For a dataset d, published in the LOD, is it interesting for the publisher of d to try to link it to lak?</p>
      <p>Q2. For a dataset d, published in the LOD, is it interesting for the lak administrator to try to link his dataset to d?</p>
      <p>In more detail, let t and d<sub>i</sub> be two datasets. A link from t to d<sub>i</sub> is a triple of the form (s, p, o) such that s is defined in t and o is defined in d<sub>i</sub>. We say that t is linked to d<sub>i</sub>, or that d<sub>i</sub> is linked from t, iff there is at least one link from t to d<sub>i</sub>. We also say that d<sub>i</sub> is relevant for t iff there is at least one resource defined in d<sub>i</sub> that can be linked from a resource defined in t.</p>
      <p>Questions Q1 and Q2 are special cases of the dataset interlinking recommendation problem, posed as follows:</p>
      <disp-quote>
        <p>Given a finite set of datasets D and a dataset t, compute a rank score for each dataset d<sub>i</sub> ∈ D such that the rank score of d<sub>i</sub> increases with the chances of d<sub>i</sub> being relevant for t.</p>
      </disp-quote>
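      <p>The definitions of link and linked datasets can be illustrated with a minimal sketch. It assumes that "defined in a dataset" can be approximated by a URI-prefix check against the namespaces of that dataset; this approximation, the function names and the example URIs are hypothetical and are not part of the original approach.</p>
      <preformat><![CDATA[
```python
def defined_in(resource, namespaces):
    """Approximate 'resource is defined in a dataset' by a URI-prefix check (assumption)."""
    return any(resource.startswith(ns) for ns in namespaces)


def is_link(triple, t_namespaces, d_namespaces):
    """A triple (s, p, o) is a link from t to d if s is defined in t and o in d."""
    s, _p, o = triple
    return defined_in(s, t_namespaces) and defined_in(o, d_namespaces)


def is_linked(triples, t_namespaces, d_namespaces):
    """t is linked to d iff at least one triple is a link from t to d."""
    return any(is_link(tr, t_namespaces, d_namespaces) for tr in triples)


# Hypothetical namespaces and triples, for illustration only.
T_NS = ["http://example.org/t/"]
D_NS = ["http://example.org/d/"]
triples = [
    ("http://example.org/t/paper1", "http://purl.org/dc/terms/title", "A title"),
    ("http://example.org/t/paper1", "http://www.w3.org/2002/07/owl#sameAs",
     "http://example.org/d/pub42"),
]
print(is_linked(triples, T_NS, D_NS))
```
]]></preformat>
      <p>In this sketch only the second triple counts as a link, because its object falls in the namespace assumed for d<sub>i</sub>; the first triple has a literal object.</p>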
      <p>In this paper, we first introduce two rank score functions to address the dataset interlinking recommendation problem. Then, we apply the functions to answer question Q2.</p>
      <p>The remainder of this paper is organized as follows. Section 2 presents related work. Section 3 briefly describes the recommendation approaches. Section 4 analyzes the results of the metadata exploration and the generated recommendations. Finally, Section 5 presents some final remarks.</p>
    </sec>
    <sec id="sec-2">
      <title>2. RELATED WORK</title>
      <p>
        In this paper, we use an extended version [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] of previous work
[
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ], which introduced the rank score functions based on the Bayesian and the Social Network approaches. The extended version also explores different sets of features related to the metadata of the datasets, such as properties, classes and vocabularies, to compute the rank score functions.
Nikolov et al. [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ] propose an approach to identify relevant datasets for interlinking by applying keyword searches and ontology matching techniques. Kuznetsov [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] describes a linking system that is responsible for discovering relevant datasets for a given dataset and for creating instance-level linkage. Compared with these approaches, the rank score functions applied in this paper use only metadata and are therefore much simpler to compute, yet achieve good performance [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        Loscio et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and Wagner et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] propose techniques to find relevant datasets for user queries. The first approach is based on the information quality criteria of correctness, schema completeness and data completeness, while the second one is based on the overlapping of sets of instances of datasets.
Oliveira et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] use application queries and user feedback to discover relevant datasets. These papers aim at recommending datasets with respect to user queries, which is a problem close, but not identical, to the problem discussed in this paper.
      </p>
      <p>
        Nunes et al. [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ] performed several analyses of lak, but their focus was mainly on the dataset content. They also proposed other datasets to be interlinked with lak, considering their links with DBpedia. By contrast, this paper focuses on analyzing the metadata to create rankings of candidate datasets to be interlinked with lak, using different recommendation techniques.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. RECOMMENDATION APPROACHES</title>
    </sec>
    <sec id="sec-4">
      <title>3.1 Bayesian ranking</title>
      <p>A rank score function, inspired by conditional probabilities, that induces the ranking of the datasets in D (from the largest to the smallest score) can be defined as follows:
        <disp-formula id="eq-1">
          <label>(1)</label>
          <tex-math>\mathrm{score}(d_i, t) = \left( \sum_{j=1}^{n} \log P(F_j \mid D_i) \right) + \log P(D_i)</tex-math>
        </disp-formula>
      </p>
      <p>
        Based on the maximum likelihood estimate of the probabilities [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] in a training set of datasets, the above probabilities can be estimated as follows:
        <disp-formula>
          <tex-math>P(F_j \mid D_i) = \frac{\mathrm{count}(f_j, d_i)}{\sum_{j=1}^{n} \mathrm{count}(f_j, d_i)}, \qquad P(D_i) = \frac{\mathrm{count}(d_i)}{\sum_{i=1}^{m} \mathrm{count}(d_i)}</tex-math>
        </disp-formula>
        where count(f<sub>j</sub>, d<sub>i</sub>) is the number of datasets in the training set that have feature f<sub>j</sub> and that are linked to d<sub>i</sub>, and count(d<sub>i</sub>) is the number of datasets in the training set that are linked to d<sub>i</sub>, disregarding the feature set.
      </p>
      <p>
        For the score function computation, some auxiliary functions help to avoid computing log(0) by replacing this value with c, a constant small enough to penalize the datasets d<sub>i</sub> that do not have datasets with features F<sub>j</sub> linked to them or that do not have links from other datasets [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Thus, the idea is that, if the set of features of t is very often correlated with datasets that are linked to d<sub>i</sub> and t is not already linked to d<sub>i</sub>, then it is recommended to try to link t to d<sub>i</sub>.
      </p>
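      <p>The Bayesian score can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the RecLAK implementation: the value chosen for the constant c, the choice of summing the denominator of P(F<sub>j</sub>|D<sub>i</sub>) over every feature observed for d<sub>i</sub> in the training set, the function names and the toy training data are all hypothetical.</p>
      <preformat><![CDATA[
```python
import math
from collections import defaultdict

C = -1.0e6  # assumed stand-in for log(0); the text only requires a penalizing constant


def train(training):
    """training: list of (features, linked_to) pairs, one pair per training dataset."""
    count_fd = defaultdict(int)  # count(f_j, d_i)
    count_d = defaultdict(int)   # count(d_i)
    for features, linked_to in training:
        for d in linked_to:
            count_d[d] += 1
            for f in features:
                count_fd[(f, d)] += 1
    return count_fd, count_d


def safe_log(num, den):
    """log(num / den), or the penalty C when the estimate would be zero."""
    return math.log(num / den) if num > 0 and den > 0 else C


def bayes_score(d, target_features, count_fd, count_d):
    """Sum of log P(F_j | D_i) over the target's features, plus log P(D_i)."""
    # Assumed denominator: all feature occurrences among datasets linked to d.
    feat_total = sum(n for (_f, dd), n in count_fd.items() if dd == d)
    link_total = sum(count_d.values())
    likelihood = sum(safe_log(count_fd.get((f, d), 0), feat_total)
                     for f in target_features)
    return likelihood + safe_log(count_d.get(d, 0), link_total)


# Toy training set (hypothetical): features of each dataset and the datasets it links to.
training = [
    ({"foaf:name", "dc:title"}, {"dblp"}),
    ({"foaf:name"}, {"dblp", "geonames"}),
    ({"dc:title"}, {"geonames"}),
]
count_fd, count_d = train(training)
target_features = {"foaf:name"}
ranking = sorted(count_d,
                 key=lambda d: bayes_score(d, target_features, count_fd, count_d),
                 reverse=True)
print(ranking)
```
]]></preformat>
      <p>In the toy data, foaf:name co-occurs more often with links to dblp than with links to geonames, so dblp is ranked first. A complete recommender would also skip candidates that the target already links to.</p>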
    </sec>
    <sec id="sec-5">
      <title>3.2 Social Network-based ranking</title>
      <p>
        We propose to analyze the dataset interlinking
recommendation problem in much the same way as the link prediction
problem in Social Networks [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Analogously, the Linked Data network for D is a directed graph such that the nodes are the datasets in D and there is an edge from dataset u to dataset v in D iff there is a link from u to v. To obtain more accurate results, we combine two measures, Preferential Attachment (pa) and Resource Allocation (ra), into a single score [
        <xref ref-type="bibr" rid="ref5">5</xref>
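        ], defined as follows:
        <disp-formula id="eq-2">
          <label>(2)</label>
          <tex-math>\mathrm{score}(t, d_i) = \mathrm{ra}(t, d_i) + \frac{\mathrm{pa}(t, d_i)}{|D|}</tex-math>
        </disp-formula>
        <disp-formula>
          <tex-math>\mathrm{pa}(t, d_i) = |P_{d_i}|, \qquad \mathrm{ra}(t, d_i) = \sum_{d_j \in S_t \cap P_{d_i}} \frac{1}{|P_{d_j}|}</tex-math>
        </disp-formula>
        where P<sub>d<sub>i</sub></sub> is the popularity set of a dataset d<sub>i</sub> ∈ D, that is, the set of all datasets in D that have links to d<sub>i</sub>, and S<sub>t</sub> is the similarity set of a dataset t, that is, the set of all datasets in D that have features in common with t. The combined score induces the ranking of the datasets in D (from the largest to the smallest score) and gives priority to the ra score; the pa score, normalized by the total number of datasets to be ranked (|D|), will play a role when there is a tie or when the ra value is zero.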
      </p>
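      <p>A minimal sketch of the combined Social Network-based score is given below. It assumes that links are available as (source, target) pairs and features as one set per dataset; datasets d<sub>j</sub> with an empty popularity set are skipped in the ra sum to avoid a division by zero, which is a choice of this sketch. The function names and the toy network are hypothetical.</p>
      <preformat><![CDATA[
```python
def popularity_sets(links):
    """Map each dataset d to P_d, the set of datasets that have links to d."""
    pop = {}
    for u, v in links:
        pop.setdefault(v, set()).add(u)
    return pop


def similarity_set(t, features):
    """S_t: datasets other than t that share at least one feature with t."""
    return {d for d, fs in features.items() if d != t and fs & features[t]}


def sn_score(d, pop, s_t, n_datasets):
    """ra(t, d) + pa(t, d) / |D|, following the combined score in the text."""
    p_d = pop.get(d, set())
    # Resource Allocation: datasets similar to t that link to d, weighted by 1 / |P_dj|.
    ra = sum(1.0 / len(pop[dj]) for dj in s_t & p_d if dj in pop)
    # Preferential Attachment: popularity of d.
    pa = len(p_d)
    return ra + pa / n_datasets


# Toy network (hypothetical): feature sets per dataset and directed links.
features = {
    "lak": {"foaf:name", "swc:ConferenceEvent"},
    "a": {"foaf:name"},
    "b": {"dc:title"},
    "x": set(),
    "y": set(),
}
links = [("a", "x"), ("b", "x"), ("b", "y"), ("x", "a"), ("y", "a")]

pop = popularity_sets(links)
s_t = similarity_set("lak", features)
candidates = [d for d in features if d != "lak"]
scores = {d: sn_score(d, pop, s_t, len(candidates)) for d in candidates}
ranking = sorted(candidates, key=scores.get, reverse=True)
print(ranking, scores)
```
]]></preformat>
      <p>In the toy network, x is ranked first because it is linked from a, the only dataset that shares a feature with lak; the remaining candidates have a zero ra value and are ordered by their normalized pa value alone.</p>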
    </sec>
    <sec id="sec-6">
      <title>4. RESULT ANALYSIS</title>
    </sec>
    <sec id="sec-7">
      <title>4.1 Data used in the experiments</title>
      <p>
        We selected a subset of the datasets indexed by the DataHub,
using the Learning Analytics and Knowledge dataset [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] as the target of the recommendation. From the DataHub catalog, we managed to obtain 295 datasets with at least one feature (class, property or vocabulary). Among the datasets with links defined, there are 139 datasets with 697 known links. Figure 1 presents a graph representing the datasets and their known links. In this graph, the size of a dataset node is proportional to the number of datasets linked to it (in-degree).
      </p>
      <p>[Figure 1: graph of the datasets and their known links.]</p>
      <p>
        The number of distinct features, counting both classes and properties, was 11,868. The number of relations between datasets and classes or properties was 16,750, of which 6,447 were references to classes and 10,303 were references to properties. For the details on how we extracted metadata from the DataHub catalog, see [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
    </sec>
    <sec id="sec-8">
      <title>4.2 LAK features</title>
      <p>As features of lak, we used a selected set of classes and properties obtained from lak itself and from the LinkedUp project Web site. From 51 initial features, we filtered out those that are not related to the content of the dataset and that are used in many datasets, such as owl:sameAs, rdf:Property and rdfs:Resource, among others. The core of the selected set comes from the SWC (Semantic Web Conference) ontology<sup>7</sup>, which describes academic conferences and establishes a convention on how to use classes and properties from other ontologies, mostly FOAF (Friend of a Friend), for people and organizations, and SWRC (Semantic Web for Research Communities), for papers. It also includes metadata from other ontologies, such as SIOC (Semantically-Interlinked Online Communities) and DC (Dublin Core). The selected lak features add up to 37, of which 31 are shared by other datasets in our set of data. Figure 2 shows a preview of the RecLAK interface with the selected lak classes.</p>
    </sec>
    <sec id="sec-9">
      <title>4.3 Datasets with LAK features</title>
      <p>The set of datasets (represented by their id in DataHub) that have at least one feature in common with lak consists of 132 datasets, with 376 associations between datasets and lak features. Figure 3 presents a graph representing the datasets and their associated lak features. In this graph, the size of a feature node is proportional to the number of datasets having it.</p>
      <p><sup>7</sup> http://data.semanticweb.org/ns/swc/ontology</p>
      <p>Among the lak features, the most popular are from DC (dc:title, shared by 60 datasets, and dc:creator, referenced by 56 datasets) and from FOAF (foaf:name and foaf:homepage, referenced by 41 and 36 datasets other than lak, respectively). The least popular features are metadata taken directly from the SWC and SWRC ontologies (some of them used by only 1 dataset other than lak). The datasets with more than 5 features shared with lak are shown in Table 1. The most expressive result is obtained by the rkb-explorer-webconf dataset, which shares 31 features with lak, making it the dataset most correlated with the selected classes and properties of lak. The rkb-explorer-webconf dataset is a semantic repository that publishes RDF linked data and co-reference information from the RKB Explorer initiative. It includes information about authors and publications in several conferences, such as ESWC.</p>
    </sec>
    <sec id="sec-10">
      <title>4.4 Dataset Interlinking recommendations</title>
      <p>Using the score functions briefly described in Section 3, we generated recommendations for lak. Figure 4 shows a preview of the RecLAK interface with the recommendations for lak.</p>
      <p>The top 10 recommendations generated by each of the two approaches (Bayesian and Social Network-based rankings) and the respective score values estimated for each recommended dataset are presented in Table 2. The top 10 ranked datasets for each approach are briefly described below.</p>
      <p>Bayesian ranking. The topmost-ranked dataset is a generic dataset with concepts from the Semantic Web community. Dataset #2 is a well-known lexical database of English. The datasets in positions #3 to #6 of the Bayesian ranking have tied scores. Dataset #3 contains concepts derived from tags generated by human annotators. Dataset #4 describes people, research groups and publications of the members of the Computer Science Department at the University of Sheffield. Dataset #5 is maintained by the Chamber of Deputies in Italy, which is working to publish quality linked data in several domains, including research. Dataset #6 describes the DBLP digital library, which provides bibliographic information on major computer science journals and proceedings; DBLP also indexes the papers published in the LAK and EDM conferences. Dataset #7 is the Geonames dataset, which contains information about geographical locations. Dataset #8, lexvo, brings information about languages, words, characters, and other human language-related entities to the Linked Data Web and Semantic Web; it has links to WordNet and thesauri. Dataset #9 is a Linked Data version of the Association for Computing Machinery (ACM) digital library. Finally, dataset #10 is a dataset of the Library of Congress Subject Headings (LCSH), which are used to catalog materials held by the Library of Congress and other libraries around the United States.</p>
      <p>Social Network-based ranking. Since there is some overlap between the top 10 recommendations of the Social Network-based (SN-based) and Bayesian rankings, we comment only on the datasets that appear in the top 10 of the SN-based approach alone. Dataset #2 publishes the news vocabularies used by The New York Times as Linked Open Data. It covers data and resources about people, locations and organizations. Dataset #3 covers topics related to innovation, technology, business and education. Dataset #6 has links, catalogued in the DataHub, to other bibliographic datasets such as Citeseer, DBLP, ACM, IEEE and EPrints. Dataset #7 was created with the objective of networking the wide range of resources and information held by libraries and other cultural institutions in German-speaking countries. This dataset uses established vocabularies, such as FOAF. Dataset #9 describes e-prints and has links, catalogued in the DataHub, to other bibliographic datasets such as Citeseer, DBLP, ACM and IEEE. Dataset #10 is also a Linked Data version of the publication information of the DBLP digital library, similar to sweto-dblp.</p>
      <p>Discussion. Based on the top 10 rankings of both approaches, we identified three main groups of candidate datasets that were recommended to be interlinked with lak:</p>
      <list list-type="bullet">
        <list-item><p>generic: semanticweb-org, w3c-wordnet, tags2con-delicious, geonames-semantic-web, lexvo, nytimes-linked-open-data, rkb-explorer-wiki</p></list-item>
        <list-item><p>bibliographic: dcs-sheffield, linked-open-camera, sweto-dblp, rkb-explorer-acm, lcsh, dnb-gemeinsame-normdatei, rkb-explorer-eprints, rkb-explorer-dblp</p></list-item>
        <list-item><p>educational area: gnoss.</p></list-item>
      </list>
      <p>The top 10 recommendations of the two rankings differ in some aspects. Considering the groups identified above, the Bayesian ranking contains a higher number of generic datasets, while the Social Network-based ranking contains a higher number of bibliographic datasets. This probably happens because the Bayesian ranking prioritizes datasets that are linked from a larger number of other datasets sharing a larger number of lak features. The Social Network-based ranking, on the other hand, prioritizes datasets pointed to by a larger number of other datasets that have smaller popularity and at least one feature of lak.</p>
      <p>The results also indicate that the selection of the feature set is very important, because it directly influences the generated rankings and can lead to recommendations of datasets that are more or less generic. In our experiments with lak, we filtered out some generic features (e.g., owl:sameAs), but included DC and FOAF elements. Thus, we expected that both generic and specific datasets from our set of datasets would be recommended. As the metadata used to triplify lak did not use classes and properties specifically related to the application domain, this characteristic was not evidenced in the recommendation results.</p>
    </sec>
    <sec id="sec-11">
      <title>5. CONCLUSIONS</title>
      <p>This paper presented a detailed analysis, based on Bayesian classifiers and on Social Network Analysis techniques, to address the dataset interlinking recommendation problem for lak, using only metadata. Because they rely on metadata alone, the rank score functions are potentially useful to reduce the cost of dataset interlinking. For more information, including the full set of data used in the experiments, graphical visualizations and detailed results, we refer the reader to the RecLAK Web application, available at http://www.inf.puc-rio.br/~grlopes/RecLAK.</p>
    </sec>
    <sec id="sec-12">
      <title>6. ACKNOWLEDGMENTS</title>
      <p>This work was partly funded by CNPq, under grants 160326/2012-5, 301497/2006-0 and 57128/2009-9, and by FAPERJ, under grants E-26/170028/2008 and E-26/103.070/2011.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Berners-Lee</surname>
          </string-name>
          .
          <article-title>Linked Data</article-title>
          .
          <source>In Design Issues. W3C</source>
          ,
          <year>July 2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K. A.</given-names>
            <surname>Kuznetsov</surname>
          </string-name>
          .
          <article-title>Scientific data integration system in the linked open data space</article-title>
          .
          <source>Programming and Computer Software</source>
          ,
          <volume>39</volume>
          (
          <issue>1</issue>
          ):
          <fpage>43</fpage>
          –
          <lpage>48</lpage>
          , Jan.
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L. A. P. P.</given-names>
            <surname>Leme</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. R.</given-names>
            <surname>Lopes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. P.</given-names>
            <surname>Nunes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Casanova</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Dietze</surname>
          </string-name>
          .
          <article-title>Identifying candidate datasets for data interlinking</article-title>
          .
          <source>In ICWE'13</source>
          , pages
          <fpage>354</fpage>
          –
          <lpage>366</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>G. R.</given-names>
            <surname>Lopes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. A. P. P.</given-names>
            <surname>Leme</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. P.</given-names>
            <surname>Nunes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Casanova</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Dietze</surname>
          </string-name>
          .
          <article-title>Recommending tripleset interlinking through a social network approach</article-title>
          .
          <source>In WISE'13</source>
          , pages
          <fpage>149</fpage>
          –
          <lpage>161</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G. R.</given-names>
            <surname>Lopes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. A. P.</given-names>
            <surname>Paes Leme</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. P.</given-names>
            <surname>Nunes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Casanova</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Dietze</surname>
          </string-name>
          .
          <article-title>Comparing recommendation approaches for dataset interlinking</article-title>
          .
          <source>Technical report</source>
          , Department of Informatics, PUC-Rio,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B. F.</given-names>
            <surname>Loscio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Batista</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Souza</surname>
          </string-name>
          .
          <article-title>Using information quality for the identification of relevant web data sources</article-title>
          .
          <source>In IIWAS'12</source>
          , pages
          <fpage>36</fpage>
          –
          <lpage>44</lpage>
          , New York, NY, USA,
          <year>2012</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>L.</given-names>
            <surname>Lü</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-H.</given-names>
            <surname>Jin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhou</surname>
          </string-name>
          .
          <article-title>Similarity index based on local paths for link prediction of complex networks</article-title>
          .
          <source>Physical Review E</source>
          ,
          <volume>80</volume>
          (
          <issue>4</issue>
          ):
          <fpage>046122</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Schütze</surname>
          </string-name>
          .
          <source>Foundations of Statistical Natural Language Processing</source>
          . MIT Press,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Nikolov</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>d'Aquin</surname>
          </string-name>
          .
          <article-title>Identifying Relevant Sources for Data Linking using a Semantic Web Index</article-title>
          .
          <source>In LDOW'11</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Nikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>d'Aquin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Motta</surname>
          </string-name>
          .
          <article-title>What Should I Link to? Identifying Relevant Sources and Classes for Data Linking</article-title>
          .
          <source>In JIST'12</source>
          , pages
          <fpage>284</fpage>
          –
          <lpage>299</lpage>
          . Springer Berlin Heidelberg,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>B. P.</given-names>
            <surname>Nunes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Fetahu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Casanova</surname>
          </string-name>
          .
          <article-title>Cite4me: Semantic retrieval and analysis of scientific publications</article-title>
          .
          <source>In LAK (Data Challenge)</source>
          , volume
          <volume>974</volume>
          of
          <series>CEUR Workshop Proceedings</series>
          . CEUR-WS.org
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>B. P.</given-names>
            <surname>Nunes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Fetahu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dietze</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Casanova</surname>
          </string-name>
          .
          <article-title>Cite4me: A semantic search and retrieval web application for scientific publications</article-title>
          .
          <source>In ISWC (Posters &amp; Demos)</source>
          , volume
          <volume>1035</volume>
          of
          <series>CEUR Workshop Proceedings</series>
          , pages
          <fpage>25</fpage>
          –
          <lpage>28</lpage>
          . CEUR-WS.org,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>H. R. d.</given-names>
            <surname>Oliveira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. T.</given-names>
            <surname>Tavares</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B. F.</given-names>
            <surname>Loscio</surname>
          </string-name>
          .
          <article-title>Feedback-based data set recommendation for building linked data applications</article-title>
          .
          <source>In I-SEMANTICS'12</source>
          , pages
          <fpage>49</fpage>
          –
          <lpage>55</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D.</given-names>
            <surname>Taibi</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Dietze</surname>
          </string-name>
          .
          <article-title>Fostering analytics on learning analytics research: the lak dataset</article-title>
          .
          <source>In LAK (Data Challenge)</source>
          , volume
          <volume>974</volume>
          of
          <series>CEUR Workshop Proceedings</series>
          . CEUR-WS.org
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Wagner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Haase</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rettinger</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Lamm</surname>
          </string-name>
          .
          <article-title>Discovering related data sources in data-portals</article-title>
          .
          <source>In SemStats workshop, ISWC'13</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>