UpLOD: A Tool for Inconsistent Links Repairment in
                        the LOD

    André Gomes Regino1 , Enio de Jesus Pontes Monteiro1 , Andressa Cristina dos
                          Santos1 , Julio Cesar dos Reis1,2
                   1
                   Institute of Computing, University of Campinas, Brazil
                   andre.regino@students.ic.unicamp.br,
                  enio.monteiro@students.ic.unicamp.br,
    andressa.santos@students.ic.unicamp.br, jreis@ic.unicamp.br
       2
         Nucleus of Informatics Applied to Education, University of Campinas, Brazil


       Abstract. The amount of interconnected RDF data has expressively grown as a
       result of the adoption of Semantic Web technologies. Changes in RDF datasets
       are essential to guarantee data evolution. However, changes may affect well-
       formed and validated “sameAs” links, impacting the real meaning intended by
       ontology maintainers and/or the link creators. The manual maintenance is unfea-
       sible due to the data volume. In this article, we describe a software tool capable of
       correcting links between RDF datasets based on the evolution of the underlying
       datasets. Our tool automatically detects broken links caused by RDF changes. On
       this basis, it provides maintenance actions assisted by users for repairing links
       and turning them adequate. We present an architecture integrating these features
       and how the user interacts with the software.

       Keywords: Link Maintenance; Linked Data; Semantic Web tools


1    Introduction

The semantic definition of RDF entities tends to undergo modifications over time. These
changes can influence “sameAs” links to other datasets and potentially decrease the
quality and consistency of the links. Maintaining their accuracy is critical because ap-
plications for data search and integration are based on them for their proper function-
ing. Due to the large volume of existing links leveraged by the growing number of RDF
repositories, manual correction becomes arduous and costly. In this sense, correcting
links manually is a hard and error-prone task. This justifies the development of novel
computational tools capable of assisting domain specialists in this maintenance task.
    Literature has presented studies to address the problem of link maintenance arising
[2] due to the constant reviews and changes carried out in RDF datasets. Our literature
analysis indicates that existing approaches are not fully prepared to perform automatic
detection of changes in RDF repositories [2], and subsequently, carry out a correction
in artifacts associated with them, such as links and semantic annotations [2].
    In this article, we present a software tool capable of identifying and applying main-
tenance actions on links affected by the evolution of RDF datasets as automatically as


Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License
Attribution 4.0 International (CC BY 4.0).
possible. Our maintenance process comprises the execution of three steps: 1) identify
changes between RDF dataset versions; 2) discover broken links among these changes;
and 3) repair these links (cf. Section 2). Our software solution integrates these features
allowing dataset maintainers to visualize and analyse changes and conduct repairment
actions via the tool.


2   Link Maintenance Software Tool

The development of our tool was based on the conceptualization of the Linked Open
Data Maintenance Framework (LODMF) [1] to support the maintenance of links be-
tween RDF datasets. Our tool receives two different versions of the RDF dataset and
detects changes in the triples from one version to the other. On this basis, the tool sug-
gests maintenance actions to correct found cases of broken links. The output is a set of
fixed links. Fig. 1 presents the interaction flow to perform maintenance actions.


                    Fig. 1. User interaction flow throughout the system.


     The users can “Create a New Task” (cf. A in Fig. 2 – Figure A) to start a new dataset
maintenance; or the user can “Open a Task” (cf. B in Fig. 2 – Figure A) to keep a RDF
verification already started. The left side menu in the first screen allows to access the
recent tasks performed (cf. D in Fig. 2 – Figure A).
     When the users select the option of “Create a New Task” (A in Fig 1 – Figure A),
they encounter an interface describing the required data for input (cf. Figure B of Fig.
2). The user informs a name for the task and includes a list of parameters (B, C and
D in Fig. 1). It starts with the source RDF dataset (cf. B in Fig. 2 – Figure B) and
its second version. The user indicates a target RDF dataset (cf. C and D in Fig. 2 –
Figure B). The source RDF dataset is the location of the outgoing links and the target
is the location of the incoming links. The user can insert a “Background Knowledge”
as a semantic network used by our tool to compute semantic similarity between RDF
resources (cf. E in Fig. 2 – Figure B). This is required for computing candidate resources
in the maintenance process. The user can inform the types of predicates of the links that
should be analysed (cf. F in Fig. 2 – Figure B). As default, we consider “owl:sameAs”.
    Fig. 2 presents a dashboard with key information regarding the input datasets. It
enables the user to view the detailed percentage of the identified links and other relevant
statistics about the datasets. The user can access the list of changed links separated by
modified subject, predicate and object through the left side menu (cf. A, B and C in
Fig. 2). A graphic (cf. D in Fig. 2) enables to analyze the percentage of links that were
already verified by the user, the number of discarded and to be verified. The system
presents information about the involved datasets (E in Fig. 2) as long as the changing
operations are found in links between the processed RDF versions (F in Fig. 2).
    The analysis is made link by link to complete the maintenance process. At this
stage, we consider only those affected links by the computed RDF changes. This is a
result of a procedure from our system that categorizes an evolved link as affected. In
summary, a link is considered affected if the semantic similarity between subject and
object decreased from version v0 to version v1 [3]. The user might select one of the
affected links (steps F and G in Fig. 1) displayed in the left side menu (cf. A, B and C
in Fig. 2). This classifies the links based on the type of changes that affected them. By
selecting one link, our software tool presents details regarding such link in an interface
dedicated to its maintenance (cf. Fig. 3).
    Fig. 3 presents the interface in which the left side contains a fixed menu with the list
of changed links. It was designed with different panels illustrating useful information
about the selected link. The panel (E in Fig. 3) shows an example of the evolution
of a link between versions V0 and V1 of the dataset. In this example, the first link
(with subject represented by the ID 2643743) evolved to a new version (with subject
represented by the ID 11609024). After this evolution, our tool categorized the link as
affected.
    The panel (D in Fig. 3) shows the suggested actions that the user can take to fix this
affected link (cf. step I in Figure 1). These actions are automatically computed by the
system. In the running example, the first suggestion is the replacement of the subject
from the resource with ID 11609024 to 2643744. In this case, a subject consisting of a
higher degree of similarity with the object (“London”).
    Alternatively, if the user does not agree with the tool’s suggestion, there is a pos-
sibility to completely remove the affected link from the dataset (option “Discard link”
– D in Fig. 3) or apply other maintenance actions suggested by the tool (step J in Fig.
1). The list of suggested actions varies from choosing another subject to replacing the
predicate, for instance. To this end, the user should choose the option “Show more link
suggestions” (cf. D in Fig. 3).
    Following the repairment process, two graphs are shown to help users in understand-
ing the actions suggested by the tool (cf. H in Fig. 3). They support people comprehend-
ing visually what will be the result after applying one or other link maintenance action.
The panel (cf. G in Fig. 3) shows the values of similarity of each similarity algorithm
used by our tool. The example (cf. G in Fig. 3) illustrates three algorithms/background
knowledges: Levenshtein, WordNet and Nasari.
    The selected links (cf. F in Fig. 3) (green and orange background) are compared
using the three similarity algorithms (cf. G in Fig. 3). Values closer to 1 indicate that
the subject and object of the links are more syntactically and semantically similar when
compared to values closer to 0. The graph view shows the affected links and some of
its connected resources (cf. H in Fig. 3). The orange graph (left side) shows the link
in a broken state; the green graph (right side) presents the link after repairment, if the
suggestion made by the tool is accepted by the user. Both panels G and H serve as
guidance to the user in the final decision. The user chooses the adequate action and
commits the changes by clicking on “Apply change” (cf. D in Fig. 3). If the users need
to analyze another link, they select the desired link on the left side panel, returning to
step G (cf. Figure 1) of the system.


Fig. 2. Figure A - System home screen. Figure B - Initial and configuration interface of the Up-
LOD. Figure C - Link evaluator dashboard interface.


3   Conclusion

The real value of semantic-enabled computer systems lays on the reliability of links.
This study investigated how to keep links updated according to the evolution of RDF
data repositories. We presented a software tool for the semi-automatic maintenance of
RDF links affected by data evolution. Our defined maintenance process works on the
basis of change operations automatically identified in the evolution of RDF datasets.
We are currently investigating additional features in the tool for the adaptation of RDF-
based semantic annotations. Our next steps involve the development of a module re-
sponsible for maintaining semantic annotations. We plan to conduct complete case stud-
ies applying the use of the system in real-world scenarios.


                                Fig. 3. Link maintenance interface.


Acknowledgements
This work was financially supported by the São Paulo Research Foundation (FAPESP)
(grants #2017/02325-5, #2018/14199-7, #2019/14582-8, #2020/12466-8)3 .


References
1. Regino, A.G., dos Reis, J.C., Bonacin, R.: Lodmf: A linked open data maintenance frame-
   work. In: 2020 IEEE 29th International Conference on Enabling Technologies: Infrastructure
   for Collaborative Enterprises (WETICE). pp. 263–268. IEEE (2020)
2. Regino, A.G., dos Reis, J.C., Bonacin, R., Morshed, A., Sellis, T.: Link maintenance for in-
   tegrity in linked open data evolution: Literature survey and open challenges. Semantic Web
   12, 517–541 (2021)
3. Regino, A.G., dos Reis, J.C.: Discovering semantically broken links in lod datasets. In: Work-
   shop Managing the Evolution and Preservation of the Data Web (MEPDaW) co-located at the
   19th International Semantic Web Conference (ISWC), virtual conference. pp. 17–26 (2020)

 3
     The opinions expressed here are not necessarily shared by the financial support agency.