<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Link Maintenance in the Semantic Web</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andre Gomes Regino</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Julio Cesar dos Reis</string-name>
          <email>jreisg@ic.unicamp.br</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Computing, University of Campinas (Unicamp), Campinas, SP</institution>
          <country country="BR">Brazil</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Links among data elements are at the core of the Semantic Web. These links are built with semi-automatic linking algorithms that rely on a variety of similarity measures. Because the interlinked data evolve, automatic methods and tools are needed to keep the links consistent. Changes occur mainly in datasets that represent knowledge in rapidly evolving domains, such as biology and medicine. Although continuous change is essential to the evolution of these structured datasets, such changes can negatively affect well-established links, making it difficult to keep the connections consistent over time. In this work, we aim to investigate new methods for fixing and updating links among ontologies in the Linked Open Data context.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The remainder of this paper is organised as follows: Section 2 presents the general
and specific goals of the ongoing research; Section 3 reviews related work on link
maintenance; Section 4 presents the methodology and the framework; Section 5 presents
what has been done and achieved so far. Finally, Section 6 outlines the next steps and
challenges.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Goal</title>
    </sec>
    <sec id="sec-3">
      <title>2.1. Generic Goal</title>
      <p>We aim to investigate, formalize and implement semi-automatic link maintenance actions
that recognize affected links and bring them up to date. These actions will address
structurally broken links, in which part of the link was removed, and semantically
broken links, in which the meaning of the linked resources changed.</p>
    </sec>
    <sec id="sec-4">
      <title>2.2. Specific Goals</title>
      <p>Our research will provide the following contributions:</p>
      <list list-type="bullet">
        <list-item>
          <p>An exhaustive study of the behavior of broken links and their correlation with
simple and complex changes in Linked Open Data (LOD) datasets;</p>
        </list-item>
        <list-item>
          <p>A framework that outputs the broken links found between two versions of the
same dataset and then fixes, rebuilds or removes these links;</p>
        </list-item>
        <list-item>
          <p>Relevant knowledge about the link maintenance timeline, retrieved at each of the
steps described in Section 4: Step A yields a list of concepts changed between the
versions of the datasets; Step B, a list of links that may be broken; and Step C, an
up-to-date dataset.</p>
        </list-item>
      </list>
    </sec>
    <sec id="sec-5">
      <title>3. Related Work</title>
      <p>Previous investigations addressed the problem of broken links using different methods.
The literature shows that most proposals only handle the detection phase of the process
and do not address the fixing phase. It also shows no evidence of a solution that can
automatically detect and fix these links without human intervention.</p>
      <p>
        Dealing with broken links in the traditional Web is not a new problem.
Existing work attempted to adapt traditional Web techniques for handling broken links to
the Semantic Web environment [
        <xref ref-type="bibr" rid="ref8">Vesse et al. 2010</xref>
        ]. Existing tools send
notifications to the maintainer of a dataset when they detect that a resource has changed.
This approach suffers from scalability issues, since its cost depends on the number of
notifications sent [
        <xref ref-type="bibr" rid="ref5">Popitsch and Haslhofer 2011</xref>
        ].
      </p>
      <p>
        Another approach stores dataset versions based on the changes between them; when an
inconsistency is detected, this makes it easier to manually restore previous versions and
trace the source of a potentially broken link. One limitation of this approach is the
number and size of the deltas, i.e., the mappings of differences between two versions of a
given dataset [
        <xref ref-type="bibr" rid="ref2">Kondylakis et al. 2017</xref>
        ]. Other work studied the use of metadata stored alongside the nodes of the datasets:
when a change occurs, revisiting the nodes' metadata helps identify what happened [
        <xref ref-type="bibr" rid="ref3">Meehan et al. 2016</xref>
        ].
      </p>
    </sec>
    <sec id="sec-6">
      <title>4. Methodology</title>
      <p>To keep links up to date, we are building a framework composed of three main
steps:</p>
      <list list-type="simple">
        <list-item>
          <p>Step A: Detect the changes that occurred in a given period of time based on two
releases of the involved datasets that follow the LOD principles. These changes to the
knowledge stored in the datasets can be simple changes (atomic changes, such as additions
or removals) or complex changes (non-atomic changes, such as an update, i.e., a removal
followed by an addition);</p>
        </list-item>
        <list-item>
          <p>Step B: Recognize which of the changes found in Step A resulted in an affected
link (a semantically or structurally broken link). Such a link may have been created,
removed or updated, or it may have remained untouched (which can also indicate an
inconsistency, since the dataset evolved while the link stayed the same);</p>
        </list-item>
        <list-item>
          <p>Step C: Apply corrective actions to the recognized affected and broken links.
Such an action can be a reconnection to an unbroken resource (such as the children or
parents of the resource targeted by the outdated link) or the removal of the link.</p>
        </list-item>
      </list>
      <p>Figure 2 shows these steps, and Subsections 4.1, 4.2 and 4.3 explain the purpose
of each of them.</p>
    </sec>
    <sec id="sec-7">
      <title>4.1. Step A: Detection of Changes</title>
      <p>The initial step (Step A of Figure 2) detects the changes that occurred in a given
period of time based on two releases of the involved LOD datasets. These changes to the
knowledge stored in the datasets can be simple (such as additions or removals of triples)
or complex (such as updates).</p>
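      <p>As an illustration of Step A, the following Python sketch diffs two releases of a dataset represented as sets of triples. Additions and removals are plain set differences. Pairing a removal with an addition that has the same subject and predicate into one "update" is our own simplifying assumption, as is the use of plain strings for RDF terms. The IRIs (ex1:, ex2:) and the toy releases are hypothetical. They model the "Dubai" example of this subsection as a change to an rdfs:label value. This is not the LODMF implementation.</p>
      <preformat preformat-type="python">
```python
from typing import NamedTuple


class Triple(NamedTuple):
    """An RDF triple with terms kept as plain strings (simplifying assumption)."""
    subject: str
    predicate: str
    obj: str


def detect_changes(old: set, new: set) -> dict:
    """Diff two releases of a dataset into simple and complex changes.

    Simple changes are triple additions and removals (set differences).
    Assumption: a removal and an addition sharing subject and predicate
    are paired into one complex 'updated' change (removal + addition).
    """
    removed = old - new
    added = new - old

    # Index the added triples by (subject, predicate) so removals can be paired.
    added_by_key = {}
    for t in sorted(added):
        added_by_key.setdefault((t.subject, t.predicate), []).append(t)

    updated, pure_removals, paired = [], [], set()
    for t in sorted(removed):
        candidates = added_by_key.get((t.subject, t.predicate), [])
        # Pair with the first addition that has not been used yet.
        match = next((a for a in candidates if a not in paired), None)
        if match is None:
            pure_removals.append(t)
        else:
            paired.add(match)
            updated.append((t, match))

    return {
        "added": sorted(added - paired),
        "removed": pure_removals,
        "updated": updated,
    }


# Hypothetical toy releases mirroring the "Dubai" renaming example.
release_1 = {
    Triple("ex1:Dubai", "rdfs:label", "Dubai"),
    Triple("ex1:Dubai", "owl:sameAs", "ex2:Dubai"),
    Triple("ex1:Abu_Dhabi", "rdfs:label", "Abu Dhabi"),
}
release_2 = {
    Triple("ex1:Dubai", "rdfs:label", "Emirates of Dubai"),
    Triple("ex1:Dubai", "owl:sameAs", "ex2:Dubai"),
    Triple("ex1:Sharjah", "rdfs:label", "Sharjah"),
}
changes = detect_changes(release_1, release_2)
```
      </preformat>
      <p>In this toy run, the relabelling of ex1:Dubai is reported as one update, the Sharjah triple as a simple addition and the Abu Dhabi triple as a simple removal. The unchanged owl:sameAs triple does not appear in any list, even though its subject changed meaning. Step B is meant to catch exactly this case.</p>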
      <p>As an example, suppose there is a link connecting two resources in two different
datasets. The first dataset has a resource entitled “Dubai”, and the second dataset also has
a resource with the same title. They are connected by a “sameAs” link, stating that “Dubai”
in the first dataset is the same as “Dubai” in the second dataset. However, the maintainer
of the first dataset renamed the resource “Dubai” to “Emirates of Dubai”. Step A of LODMF
aims to detect such changes, which may affect the links.</p>
      <p>
        Figure 2. LODMF Framework [
        <xref ref-type="bibr" rid="ref6">Regino et al. 2020</xref>
        ]
      </p>
      <p>This step raises the following research questions: How can changes between two
releases of a LOD dataset be detected? Which kinds of changes occur most often in LOD
datasets? Is there a relation between the domain of a dataset and its most frequent changes?</p>
    </sec>
    <sec id="sec-8">
      <title>4.2. Step B: Recognize Affected Links</title>
      <p>Step B classifies each link as consistent or inconsistent, depending on the changes
it underwent. Following the example presented in Subsection 4.1, Step B evaluates the
changed link (the link from “Dubai” to “Dubai”, which now connects “Emirates of Dubai” to
“Dubai”). If the change did not alter the meaning intended by the link's author, the change
is categorized as valid and the link as unaffected. This is not the case for the
modification from “Dubai” to “Emirates of Dubai”, which changed the meaning of the link:
“Emirates of Dubai” in the first dataset cannot be linked to the “Dubai” resource of the
second dataset by the predicate “sameAs”, because these resources do not represent the same
thing in the real world. This semantic inconsistency is detected, and the set of affected
links proceeds to Step C.</p>
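      <p>A minimal Python sketch of the Step B decision follows. It flags a link as structurally broken when one of its endpoints no longer exists. It flags the link as possibly semantically broken when an endpoint was renamed to a label that is too dissimilar from the old one. The word-level Jaccard similarity and the 0.5 threshold are hypothetical stand-ins, not the measure used by LODMF. String similarity alone cannot decide whether the meaning intended by the link's author changed, so the result is only a candidate for further inspection.</p>
      <preformat preformat-type="python">
```python
from typing import NamedTuple


class Link(NamedTuple):
    """A cross-dataset link such as ex1:Dubai owl:sameAs ex2:Dubai."""
    source: str
    predicate: str
    target: str


def token_jaccard(a: str, b: str) -> float:
    """Jaccard similarity of lower-cased word sets (stand-in similarity measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta.intersection(tb)) / len(ta.union(tb))


def classify_link(link: Link, removed_resources: set, label_changes: dict,
                  threshold: float = 0.5) -> str:
    """Classify one link after a new dataset release.

    removed_resources: resource IDs that no longer exist in the new release.
    label_changes: maps a resource ID to its (old_label, new_label) pair.
    threshold: hypothetical cut-off; a lower similarity flags a meaning change.
    """
    endpoints = (link.source, link.target)
    # Structural check: one of the link's endpoints disappeared.
    if any(end in removed_resources for end in endpoints):
        return "structurally broken"
    # Semantic check: an endpoint was renamed to a dissimilar label.
    for end in endpoints:
        if end in label_changes:
            old_label, new_label = label_changes[end]
            if threshold > token_jaccard(old_label, new_label):
                return "possibly semantically broken"
    return "unaffected"


dubai_link = Link("ex1:Dubai", "owl:sameAs", "ex2:Dubai")
status = classify_link(dubai_link, set(),
                       {"ex1:Dubai": ("Dubai", "Emirates of Dubai")})
```
      </preformat>
      <p>For the running example, the two labels share one of three distinct words (similarity 1/3). This is below the assumed threshold, so the “sameAs” link is passed on to Step C.</p>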
      <p>The research questions related to this step are: How can a link be precisely
categorized as inconsistent? Is there a correlation between the numbers of affected and
unaffected links? Does the proportion of broken links over all links remain the same across
the many versions of a dataset over time?</p>
    </sec>
    <sec id="sec-9">
      <title>4.3. Step C: Apply Link Maintenance Action</title>
      <p>In Step C, we apply maintenance actions to the affected links detected in Step B
(Subsection 4.2). In our example, since the link between “Emirates of Dubai” and “Dubai”
is inconsistent, it should be repaired in Step C. The repair can be a reconnection to
another resource (connecting “Emirates of Dubai” from the first dataset to a synonym in the
second dataset), a replacement of the predicate connecting the resources (“sameAs” can
be changed to “differentFrom”) or, as a last resort, the complete removal of the link.</p>
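      <p>The three Step C actions can be sketched in Python as a single decision function. It tries reconnection first, then predicate replacement, and removes the link only as a last resort. This priority order is our assumption. So are the inputs "equivalent_targets" (standing in for a hypothetical synonym lookup in the second dataset) and "known_different". The OWL predicates owl:sameAs and owl:differentFrom are used only as illustrative names.</p>
      <preformat preformat-type="python">
```python
from typing import NamedTuple


class Link(NamedTuple):
    """A cross-dataset link such as ex1:Dubai owl:sameAs ex2:Dubai."""
    source: str
    predicate: str
    target: str


def repair_link(link: Link, equivalent_targets: list,
                known_different: bool) -> tuple:
    """Choose one of the three Step C actions for an inconsistent link.

    equivalent_targets: resources in the second dataset judged equivalent to
        the changed source (hypothetical input, e.g. from a synonym lookup).
    known_different: whether the current target is known to denote a
        different real-world entity.
    Returns (action, repaired_link); repaired_link is None for a removal.
    """
    if equivalent_targets:
        # Reconnection: keep the predicate and point it at the first candidate.
        return "reconnect", link._replace(target=equivalent_targets[0])
    if known_different and link.predicate == "owl:sameAs":
        # Predicate replacement: sameAs becomes differentFrom.
        return "replace_predicate", link._replace(predicate="owl:differentFrom")
    # Last resort: drop the link entirely.
    return "remove", None


broken = Link("ex1:Dubai", "owl:sameAs", "ex2:Dubai")
action, repaired = repair_link(broken, ["ex2:Emirate_of_Dubai"], known_different=False)
```
      </preformat>
      <p>Whether a reconnected link actually fits better than the old one is left open here. This corresponds to one of the research questions of this step.</p>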
      <p>The research questions derived from this step are: Once an invalid link is found,
which actions should be taken? How can we ensure that the newly created link fits better
than the older one? How can a link be semantically reconnected?</p>
    </sec>
    <sec id="sec-10">
      <title>5. Initial Results</title>
      <p>
        Tables 1, 2 and 3 show the results of a study we conducted [
        <xref ref-type="bibr" rid="ref7">Regino et al. 2019</xref>
        ]
regarding Step A of Section 4. In that work, we aimed to correlate changes in triples with
the resulting changes in the links associated with those triples. We used Agrovoc, a
well-known life-science dataset covering agriculture, food and the environment. We
collected two releases of this dataset: April 2018, with 4,254,655 triples, and April 2019,
with 4,540,205 triples. For each changed link, we correlated its change (addition, removal
or modification) with the changes in triples between the two versions of the dataset.
      </p>
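      <p>Tables of this kind cross-tabulate the type of change in the triples (rows) against the type of change in the associated links (columns). One way to compute such row percentages is sketched below in Python. We assume each observation is a (triple_change, link_change) pair. The counts in the usage example are invented toy data, not the Agrovoc results, and the sketch is not the code used in the cited study.</p>
      <preformat preformat-type="python">
```python
from collections import Counter, defaultdict


def cross_tab_percentages(observations):
    """Row-normalised percentages of link changes per triple-change type.

    observations: iterable of (triple_change, link_change) pairs, e.g.
        ("Remove", "No Remove"). Each row sums to 100 up to rounding.
    """
    counts = defaultdict(Counter)
    for triple_change, link_change in observations:
        counts[triple_change][link_change] += 1

    table = {}
    for triple_change, row in counts.items():
        total = sum(row.values())
        table[triple_change] = {
            link_change: round(100 * n / total, 2) for link_change, n in row.items()
        }
    return table


# Toy observations: three removed triples whose links stayed, one whose link was removed.
toy = [("Remove", "No Remove")] * 3 + [("Remove", "Remove")]
table = cross_tab_percentages(toy)
```
      </preformat>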
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Added cases</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Triples / Links</th>
              <th>Add</th>
              <th>No Add</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>Add</td>
              <td>98.84%</td>
              <td>0.41%</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <table-wrap id="tab2">
        <label>Table 2</label>
        <caption>
          <p>Removed cases</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Triples / Links</th>
              <th>Remove</th>
              <th>No Remove</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>Remove</td>
              <td>3.85%</td>
              <td>96.15%</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <table-wrap id="tab3">
        <label>Table 3</label>
        <caption>
          <p>Modified cases</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Triples / Links</th>
              <th>Add</th>
              <th>Remove</th>
              <th>Modify</th>
              <th>No Change</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>Modify</td>
              <td>0%</td>
              <td>0.04%</td>
              <td>4.41%</td>
              <td>95.55%</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-20">
      <p>Table 1 shows that the Agrovoc dataset applies the Linked Data principles, linking
99% of its newly added triples to an external dataset. Table 2, however, shows that in
96.15% of the identified removal cases, the connecting link remained untouched when an
internal triple was removed, generating structurally broken links. Regarding modifications,
Table 3 shows that the fourth sub-case (No Change, 95.55%) is the most frequent one, in which
modified triples led to unchanged links. This case requires further study to determine to
what extent these unchanged links became semantically inconsistent due to the
modifications of the associated RDF triples.</p>
      <p>We also surveyed the literature to understand to what extent the link maintenance
problem for integrity in Linked Open Data has been studied. We found that most of the
developed techniques focus on discovering broken links rather than fixing them. To the
best of our knowledge, there is also no evidence of a tool that can automatically discover
and fix semantically broken links. This study is not yet published.</p>
    </sec>
    <sec id="sec-21">
      <title>6. Future Work</title>
      <p>
        We are now focusing on developing novel strategies to address the challenges of
identifying broken links and maintaining them (Steps B and C of Section 4). In addition, we
are evaluating state-of-the-art methods to detect semantic inconsistencies in links, such
as genetic programming [
        <xref ref-type="bibr" rid="ref1">Isele and Bizer 2011</xref>
        ] and background knowledge (WordNet [Fellbaum 2012] and BabelNet [
        <xref ref-type="bibr" rid="ref4">Navigli and Ponzetto 2012</xref>
        ]). For evaluation purposes, we are developing a gold-standard dataset sample containing
semantically inconsistent links, links that changed but are still consistent, and unchanged
links. We will evaluate each of the three steps of the framework in terms of precision,
recall and F-measure on this gold standard to verify the quality of the links modified by
our approach.
      </p>
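      <p>The planned evaluation can be illustrated with the standard definitions of precision, recall and F-measure (F1). The Python sketch below assumes that each link in the gold standard has an identifier and that the positive class is "semantically inconsistent". The link IDs in the usage example are hypothetical.</p>
      <preformat preformat-type="python">
```python
def precision_recall_f1(predicted: set, gold: set) -> tuple:
    """Precision, recall and F1 of predicted link IDs against a gold standard.

    predicted: IDs of links the framework flagged (e.g. as inconsistent).
    gold: IDs of links the gold standard marks with the same label.
    """
    true_positives = len(predicted.intersection(gold))
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical link IDs: 4 truly inconsistent links, 3 flagged, 2 of them correct.
gold_inconsistent = {"link1", "link2", "link3", "link4"}
flagged = {"link1", "link2", "link5"}
p, r, f = precision_recall_f1(flagged, gold_inconsistent)
```
      </preformat>
      <p>Here precision is 2/3, recall is 1/2 and F1 is 4/7. The same function could score each step separately, given gold labels for that step's output.</p>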
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Fellbaum</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>WordNet</article-title>
          .
          <source>The Encyclopedia of Applied Linguistics</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Isele</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Learning linkage rules using genetic programming</article-title>
          .
          <source>In Proceedings of the 6th International Conference on Ontology Matching, Volume</source>
          <volume>814</volume>
          , pages
          <fpage>13</fpage>
          -
          <lpage>24</lpage>
          . CEUR-WS.org.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Kondylakis</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Despoina</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glykokokalos</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalykakis</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karapiperakis</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lasithiotakis</surname>
            ,
            <given-names>M.-A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Makridis</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moraitis</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Panteri</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Plevraki</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , et al. (
          <year>2017</year>
          ).
          <article-title>EvoRDF: A framework for exploring ontology evolution</article-title>
          .
          <source>In European Semantic Web Conference</source>
          , pages
          <fpage>104</fpage>
          -
          <lpage>108</lpage>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Meehan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kontokostas</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Freudenberg</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brennan</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>O'Sullivan</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Validating interlinks between linked data datasets with the SUMMR methodology</article-title>
          .
          <source>In International Conferences On the Move to Meaningful Internet Systems</source>
          , pages
          <fpage>654</fpage>
          -
          <lpage>672</lpage>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Navigli</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Ponzetto</surname>
            ,
            <given-names>S. P.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network</article-title>
          .
          <source>Artificial Intelligence</source>
          ,
          <volume>193</volume>
          :
          <fpage>217</fpage>
          -
          <lpage>250</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Popitsch</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Haslhofer</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>DSNotify: A solution for event detection and link maintenance in dynamic datasets</article-title>
          .
          <source>Web Semantics: Science, Services and Agents on the World Wide Web</source>
          ,
          <volume>9</volume>
          (
          <issue>3</issue>
          ):
          <fpage>266</fpage>
          -
          <lpage>283</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Regino</surname>
            ,
            <given-names>A. G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>dos Reis</surname>
            ,
            <given-names>J. C.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Bonacin</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>LODMF: A linked open data maintenance framework</article-title>
          .
          <source>In Proceedings of the Workshop on Semantic Technologies for Smart Information Sharing and Web Collaboration (Web2Touch) co-located with 29th IEEE International Conference on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE)</source>
          (accepted for publication).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Regino</surname>
            ,
            <given-names>A. G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Matsoui</surname>
            ,
            <given-names>J. K. R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>dos Reis</surname>
            ,
            <given-names>J. C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bonacin</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morshed</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Sellis</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>Understanding link changes in LOD via the evolution of life science datasets</article-title>
          .
          <source>In Proceedings of the Workshop on Semantic Web Solutions for Large-Scale Biomedical Data Analytics co-located with 18th International Semantic Web Conference (ISWC)</source>
          , volume
          <volume>2477</volume>
          <source>of CEUR Workshop Proceedings</source>
          , pages
          <fpage>40</fpage>
          -
          <lpage>54</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Vesse</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hall</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Carr</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>Preserving linked data on the semantic web by the application of link integrity techniques from hypermedia</article-title>
          .
          <source>In Linked Data on the Web (LDOW2010)</source>
          , 27 April 2010.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>