=Paper=
{{Paper
|id=Vol-1481/paper1
|storemode=property
|title=DBpediaSameAs: an Approach to Tackle Heterogeneity in DBpedia Identifiers
|pdfUrl=https://ceur-ws.org/Vol-1481/paper1.pdf
|volume=Vol-1481
|dblpUrl=https://dblp.org/rec/conf/i-semantics/ValdestilhasAK15
}}
==DBpediaSameAs: an Approach to Tackle Heterogeneity in DBpedia Identifiers==
Andre Valdestilhas, Natanael Arndt, Dimitris Kontokostas
AKSW, Department of Computer Science, Augustusplatz 10, D-04109 Leipzig, Germany
valdestilhas@informatik.uni-leipzig.de, arndt@informatik.uni-leipzig.de, kontokostas@informatik.uni-leipzig.de
ABSTRACT
The DBpedia dataset contains multiple URIs, both within the dataset and from other datasets, that are connected by (transitive) owl:sameAs relations and thus refer to the same concepts. This heterogeneity of identifiers makes it difficult for users and agents to find the unique identifier that should preferably be used. We introduce the concept of the DBpedia Unique Identifier (DUI) and a dataset of linksets relating URIs to DUIs. To improve the quality of our dataset, we developed a mechanism that allows users to rate and suggest links. As a proof of concept, an implementation with a graphical web user interface is provided for accessing the linkset and rating the links. The DBpedia sameAs service is available at http://dbpsa.aksw.org/SameAsService.

Categories and Subject Descriptors
M.0 [Knowledge Management]: Knowledge Acquisition; H.4 [Information Systems Applications]: Miscellaneous

General Terms
Semantic Web, Linked Data, DBpedia, SameAs, Link Rate

1. INTRODUCTION
As DBpedia [3] evolved during the 9 years of its existence, the community extended the linksets to DBpedia resources. As a result, DBpedia has more than one URI that represents the same resource, which leads to the identifier heterogeneity problem. For instance, a DBpedia resource can contain owl:sameAs links to other data sets such as Freebase (project webpage: http://freebase.com), Wikidata, GeoNames (project and exploration webpage: http://geonames.org) or YAGO (project webpage: http://yago-knowledge.org).

Furthermore, DBpedia has more than one URI representing the same resource within the dataset itself. For example, dbpedia:Brassil has at least the following equivalents within DBpedia: dbpedia:Republica_Federativa_do_Brasil, dbpedia:ISO_3166-1:BR and dbpedia:Brazil, all of which redirect to dbpedia:Brazil. (Throughout the paper we use the following namespace definitions: owl: http://www.w3.org/2002/07/owl#, dbpedia: http://dbpedia.org/resource/.) A problem to consider is therefore how to resolve any of the equivalents directly to the final URI, e.g. http://dbpedia.org/resource/Brazil, without any redundancies.

Also, according to Halpin et al. [1] and Wood et al. [4], sameAs.org has collected millions of triples with owl:sameAs relations. It would be important to promote reciprocal owl:sameAs confirmation mechanisms and to develop effective trust mechanisms to assure the quality of owl:sameAs relations.

To tackle the identifier heterogeneity problem we make the following contributions:

• We describe an approach for mitigating the identifier heterogeneity problem and implement a prototype in which the user can evaluate existing links as well as suggest new links to be rated.
• We provide the ability to generate statistics about good and bad links, which makes quality control possible for the links to DBpedia.
• We define the DBpedia Unique Identifier (DUI): instead of several transient owl:sameAs DBpedia URIs for the same final address, it is now possible to have a single unique URI from DBpedia. A DUI goes directly to the final address instead of requiring several possible intermediate results to be processed. For example, with a URI from Freebase, 17 redundant URIs from DBpedia were avoided; if one used a service such as sameAs.org, 1141 URIs would be avoided.

The rest of the paper is organized as follows: Section 2 presents the proposed approach for tackling the identifier heterogeneity problem, Section 3 evaluates our work, Section 4 focuses on related work, and Section 5 concludes the paper and outlines future work.
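The redirect example from the introduction can be made concrete with a short sketch. The Python snippet below is only an illustration under stated assumptions, not the implementation of the DBpediaSameAs service: the `REDIRECTS` dictionary is a hand-written stand-in for the redirect and owl:sameAs links mentioned in the Brazil example, and `resolve_final_uri` is a hypothetical helper name. It follows links until it reaches a URI that has no further redirect, and it guards against cycles.

```python
# Namespace prefix used in the paper for DBpedia resources.
DBPEDIA = "http://dbpedia.org/resource/"

# Hypothetical in-memory stand-in for redirect / owl:sameAs links in DBpedia.
# Each key redirects to its value; entries mirror the Brazil example in the text.
REDIRECTS = {
    DBPEDIA + "Brassil": DBPEDIA + "Brazil",
    DBPEDIA + "Republica_Federativa_do_Brasil": DBPEDIA + "Brazil",
    DBPEDIA + "ISO_3166-1:BR": DBPEDIA + "Brazil",
}


def resolve_final_uri(uri, redirects):
    """Follow redirect links from `uri` until a URI without a redirect is reached."""
    seen = {uri}
    while uri in redirects:
        uri = redirects[uri]
        if uri in seen:
            # A loop would never terminate, so report it instead.
            raise ValueError("redirect cycle detected at " + uri)
        seen.add(uri)
    return uri


if __name__ == "__main__":
    for source in REDIRECTS:
        print(source, "->", resolve_final_uri(source, REDIRECTS))
```

Every URI in the example map resolves to http://dbpedia.org/resource/Brazil in a single call, which is the behaviour the text describes as going directly to the final address. A real deployment would read the links from the imported triples rather than from a literal dictionary.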
2. REPRESENTATION OF THE IDEA
This section explains our main idea, including its implementation and descriptions.

Before continuing, we introduce the definitions adopted in this work.

• Normalization of the URI: Normalizing URIs is understood as eliminating redundancies.
• DBpedia unique identifier: The DBpedia Unique Identifier (DUI) is a unique URI that identifies a resource in the DBpedia repository and is also the result of our normalization.

The idea started with a stand-alone service on the web: the user provides a URI as a parameter and, instead of several transient URIs with the owl:sameAs property, receives a single DUI from our service.

2.1 The work-flow
The work-flow for requesting the DUI of a given resource is shown in fig. 1. First, the user provides a URI from some source, e.g. Freebase. Then, instead of possibly several resulting URIs with the property owl:sameAs, our system returns a DUI. The user then has the possibility to rate, verify, and validate the link, and to suggest a different one. The ratings in turn allow us to compile statistics about the quality of the links.

Figure 1: General work-flow.

A service was also implemented in which the user provides a URI and the API returns the DBpedia identifier, i.e. a URI that represents the owl:sameAs equivalent of the URI provided.

2.2 Methodology
This section describes, in four steps, the technique and how the idea was developed, from the phase of importing links into a relational database to the development of the service on the web and a GUI.

(1) The files with triples that contain owl:sameAs links were downloaded.
(2) All triples were imported into a relational database (http://tinyurl.com/creatdb), because we will use some characteristics of a relational database, i.e. for a comparison with a voting system, in future work.
(3) A service on the web was implemented, where the user enters a URI and receives a DUI.
(4) To provide an interface to this service, a web system was created that receives a URI as input, returns a DBpedia identifier as output, and allows the user to rate the resulting link and make suggestions about it.

Figure 2 presents the relation of the contributions in graph form.

Figure 2: Relation of the contributions.

The DBpedia Link Repository uses the DBpediaSameAs service to tackle the heterogeneity and to give the appropriate DUI, which redirects the user to the DBpedia Link Rate interface; this provides feedback to the DBpedia Link Repository and thereby improves the quality of the DBpedia endpoint.

3. EVALUATION
The aim of this qualitative (http://tinyurl.com/rmethod) evaluation was to verify the behavior of the DBpediaSameAs service and of the Graphical User Interface (GUI) that makes it possible to verify and rate the links.

Three evaluation criteria were chosen:
(1) Normalization on DBpedia URIs: we evaluated whether DBpediaSameAs can provide a normalization of DBpedia URIs.
(2) Rate the Links: we evaluated whether DBpediaSameAs can provide a way to rate the links.
(3) DBpediaSameAs as service: we evaluated whether DBpediaSameAs can provide a stand-alone service on the web that delivers the normalization of DBpedia URIs.

3.1 Normalization on DBpedia URIs
The criterion used in this evaluation is solely the ability to tackle heterogeneity, which was observed as a redundancy problem during the search for co-references between different data sets.

When a URI from Freebase was used to obtain a DBpedia URI, we observed that at least 3 URIs were returned, all of which lead to the same final address.

As an example of a real case, executed on our public server with a URI from Freebase:
$ curl "http://dbpsa.aksw.org/SameAsService/SameAsServlet?uris=http%3A%2F%2Frdf.freebase.com%2Fns%2Fm.015fr"
returns: http://dbpedia.org/resource/Brazil

In this case, instead of 17 URIs from DBpedia that lead to the same final address, our approach takes the user directly to the final address.

Figure 4 illustrates the transitive and redirect URIs and shows that with this approach, instead of several URIs, the user obtains only one from DBpediaSameAs, which amounts to a normalization of DBpedia URIs.

3.2 Rate the links
To obtain link ratings, a GUI was implemented that allows users to give feedback and suggestions, thereby improving the quality of the links. Rating is a simple process: the GUI asks the user to rate the link with +1 if the link meets the user's expectations, or -1 if the link is wrong or some type of spam. The GUI was developed using concepts from prefix.cc (http://prefix.cc) and the work of Zaveri et al. [5], such as our rating system (+1 and -1) and the standards for web documents. Some improvements and customizations were also provided, such as the suggestions and the possibility to check the link. Figure 3 shows the moment after the user clicked on -1, indicating that they did not like the link, and was asked to suggest a new URI.

Figure 3: Rate a link. Available at http://dbpsa.aksw.org/SameAsService

The field for suggesting a new link only appears when the user is not satisfied with the current link: after the user clicks on -1, the system asks for an optional suggestion.

3.3 Results
The results of this work can also be expressed in numbers obtained while importing triples into the relational database, together with some results from the sameAs.org web site. A total of 62,531,487 triples were imported into our database in 2,220 seconds for the whole operation, i.e. about 28,167 triples per second. The source code used to obtain the results is available in our GitHub repository (https://github.com/firmao/dbpedia-links/blob/master/CreateDB.sh).

3.3.1 Transitive and Redirect Links
Transitive and redirect links are redundancies in DBpedia that are supposed to lead to the same place. In other words, they use the owl:sameAs property: such links redirect to other links and thus provide a transition between links, hence the name transitive. In this case, instead of using the transitive links that point to the same final destination URI, this final destination URI is used directly. Figure 4 illustrates this explanation.

Figure 4: Transitive / Redirect links in DBpedia.

We discovered and treated 6,473,988 triples with transitive and redirect links out of 62,531,487 imported links among 142 domains inside DBpedia. Thus, 10.35% of the links can be avoided in some cases.

3.4 Discussion
DBpediaSameAs was evaluated with respect to its normalization of URIs, its link rating, and its use as a stand-alone service on the web. As a result of the normalization, a DUI was obtained in order to tackle the heterogeneity; in other words, instead of several URIs, e.g. from sameAs.org, a single DUI was obtained. The link rating functionality further makes it possible to improve the quality of the dataset.

Besides the GUI of DBpediaSameAs, a stand-alone service on the web was developed that provides the functionality to get a DUI without a GUI, for agents or people who do not need to use DBpediaSameAs in graphical mode, allowing its use as an off-the-shelf component.

4. RELATED WORK
The work in [5] elaborates a data quality assessment methodology for DBpedia, which comprises a manual and a semi-automatic process. This work reinforces the concept of data quality used in our work; in our case the process is mostly manual, and we are likewise able to improve the DBpedia data quality.

The work in [2] presents a two-stage experiment for the creation of gold standards that act as benchmarks for several interlinking algorithms. The aspects it shares with our work are the validation of links and a so-called manual validation, where the user, i.e. the validator or evaluator, specifies whether a link generated by an interlinking tool is correct or incorrect. The results of the link validation process are used to learn presumably better link specifications and thus achieve high quality. This work also proposes an experiment to investigate the effect of user intervention in dataset interlinking on small knowledge bases.

4.1 A related problem with sameAs.org
sameAs.org is a service that is a leading source of co-reference data on the Semantic Web. As an example, consider accessing the sameAs.org web site with a URI from Freebase that should return information about the country Brazil.

We used the URI http://rdf.freebase.com/ns/m.015fr as the parameter to the service and received more than 1140 URIs in return, as shown in fig. 5, which can leave the user in doubt about which one is correct.

Figure 5: Several URIs with the property owl:sameAs from the web site sameAs.org.

Our work is not an alternative to the sameAs.org web site, but it brings additional possibilities. For example, we noticed that sameAs.org does not provide a way to rate links, whereas such a rating makes it possible to improve the quality of the data and makes things easier for the user.

5. CONCLUSION AND FUTURE WORKS
We presented an approach that tackles heterogeneity by working with the owl:sameAs redundancies observed while researching co-references between different data sets, providing a unique DBpedia identifier and giving users the chance to rate the resulting links and make suggestions.

A proof of concept was implemented as a web system in order to present and validate our idea and every concept of this work. The source code is available at https://github.com/firmao/DBPediaLinkSameAs.git.

Our results show that there are benefits when a considerable number of owl:sameAs redundancies can be avoided. Rating the links and allowing users to make link suggestions brings more quality to the repository, and the stand-alone service on the web allows DBpediaSameAs to be used in a textual command-line environment as well, and as an off-the-shelf component.

For the future we plan: (1) a study of the results of link rating, which requires a period of usage of the DBpediaSameAs service in order to gather sufficient results for proper analysis; (2) an implementation case with more members of the DBpedia community, i.e. a study of how the system behaves when deployed with the DBpedia community.

6. ACKNOWLEDGMENT
We would like to acknowledge the National Council for Scientific and Technological Development (CNPq, http://www.cnpq.br/) and Universität Leipzig for their support. Special thanks to Markus Ackermann for essential help with the deployment of the application and very good suggestions. Additionally, the research activities of this paper were funded by grants from the EU's 7th & H2020 Programmes for projects ALIGNED (GA 644055), GeoKnow (GA 318159) and LIDER (GA 610782).

7. REFERENCES
[1] H. Halpin, P. J. Hayes, J. P. McCusker, D. L. McGuinness, and H. S. Thompson. When owl:sameAs isn't the same: An analysis of identity in linked data. In P. F. Patel-Schneider, Y. Pan, P. Hitzler, P. Mika, L. Zhang, J. Z. Pan, I. Horrocks, and B. Glimm, editors, International Semantic Web Conference (1), volume 6496 of Lecture Notes in Computer Science, pages 305–320. Springer, 2010.
[2] M. Hassan, J. Lehmann, and A.-C. N. Ngomo. Interlinking: Performance assessment of user evaluation vs. supervised learning approaches. In 24th International World Wide Web Conference (WWW 2015): workshop: Linked Data on the Web (LDOW2015), Florence, Italy, May 18 to 22, 2015, Proceedings, 2015.
[3] J. Lehmann, R. Isele, M. Jakob, A. Jentzsch, D. Kontokostas, P. Mendes, S. Hellmann, M. Morsey, P. van Kleef, S. Auer, and C. Bizer. DBpedia - a large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web Journal, 2014.
[4] D. Wood, M. Zaidman, L. Ruth, and M. Hausenblas, editors. Linked Data: Structured data on the Web. Manning, 2014.
[5] A. Zaveri, D. Kontokostas, M. A. Sherif, L. Bühmann, M. Morsey, S. Auer, and J. Lehmann. User-driven quality evaluation of DBpedia. In Proceedings of the 9th International Conference on Semantic Systems, I-SEMANTICS '13, pages 97–104, New York, NY, USA, 2013. ACM.
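As a supplementary note to the request shown in Section 3.1: the `uris` query parameter carries the percent-encoded input URI. The Python sketch below only builds that request URL with the standard library and performs no network access. The helper name `build_request_url` is hypothetical, and the endpoint is simply the one quoted in the text; whether it is still reachable is not checked here.

```python
from urllib.parse import quote

# Endpoint as quoted in Section 3.1; its current availability is an open assumption.
SERVICE_ENDPOINT = "http://dbpsa.aksw.org/SameAsService/SameAsServlet"


def build_request_url(source_uri, endpoint=SERVICE_ENDPOINT):
    """Return the GET URL that passes `source_uri` in the `uris` parameter."""
    # safe="" makes quote() percent-encode ':' and '/' as well,
    # which matches the encoding used in the curl example.
    return endpoint + "?uris=" + quote(source_uri, safe="")


if __name__ == "__main__":
    print(build_request_url("http://rdf.freebase.com/ns/m.015fr"))
```

For the Freebase URI used in the paper, this reproduces the URL passed to curl in Section 3.1, so the same helper could be used by an agent that calls the service without the GUI.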