<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Navigation Tool for Exploring Semantic Web Corpora</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Université de Lorraine</institution>
          ,
          <addr-line>CNRS, Inria, LORIA, F-54000 Nancy</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Université de Lorraine, CNRS, Université de Strasbourg, AHP-PReST</institution>
          ,
          <addr-line>F-54000 Nancy</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Semantic Web technologies provide a way to represent and to exploit data and can be particularly suitable for historical corpora. The Henri Poincaré correspondence corpus is composed of more than 2000 letters which constitute scientific, administrative and private exchanges. Several technologies have been used for this corpus: the RDF model, the RDFS knowledge representation language and the SPARQL query language. Recently, a navigation tool has been created to explore this correspondence corpus by exploiting similarities between resources. This tool can simplify corpus exploration and highlight unexpected relations between elements. It relies on a flexible search mechanism based on the definition and the application of SPARQL query transformation rules. The system can be connected to any RDF database as long as the underlying data is exposed through a public SPARQL endpoint.</p>
      </abstract>
      <kwd-group>
        <kwd>Digital humanities</kwd>
        <kwd>Semantic Web</kwd>
        <kwd>Historical corpora</kwd>
        <kwd>SPARQL query transformation</kwd>
        <kwd>Knowledge base exploration</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        This navigation system will be presented through a live or recorded
demonstration that introduces several use cases and presents its functionalities. The
tool was designed in the context of the exploitation of the Henri Poincaré
correspondence corpus. Henri Poincaré (1854-1912) was a famous French scientist
who made significant contributions to multiple areas of mathematics,
physics and philosophy. Numerous research works are dedicated to the life and
career of this man of science. The study of his correspondence is a long-term
project which has led to the publication of several thematic volumes. An
important aspect of this project is the online publishing and exploitation
of this corpus [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The letters of the corpus are available on a website and come
with a scan of the original document (some scans are unavailable due to copyright rules), a transcription, a critical apparatus and
a set of metadata.
      </p>
      <p>Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>Semantic Web technologies have been used to offer advanced
tools for the exploitation of this corpus. For this purpose, the RDF model is used
to describe facts, the RDFS language is used to represent the domain knowledge
and the SPARQL query language is used to query the corpus graph.
When exploring a historical corpus, one of the main challenges
is to bring out new and unexpected relations between individuals,
institutions, scientific works, etc. A navigation tool has been developed for
exploring the correspondence corpus by exploiting the similarities between
documents. This system relies on a SPARQL query transformation rule
mechanism which can help to highlight unexpected relations between resources.</p>
      <p>The remainder of this article is organized as follows. First, the navigation tool
interface is presented (Section 2). Then, the need for the flexible search
mechanism and its use in this navigation tool are presented (Section 3). The
system architecture and its reusability are discussed in Section 4. Section 5
concludes and outlines future work.</p>
    </sec>
    <sec id="sec-2">
      <title>A Navigation Tool for Exploring Corpora</title>
      <p>This system can be used for the exploration of any Semantic Web graph. It is
particularly relevant for the exploration of historical corpora because results are
presented within a chronology-based interface. A demonstration video of this
tool is available online. The tool is available as a Web interface which allows
users to generate SPARQL queries and to visualize, filter and export results.</p>
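      <p>To make the idea of generating SPARQL queries from selected conditions more concrete, the following minimal sketch assembles a SELECT query from a list of conditions. It is only an illustration and not the tool's actual implementation: the Condition class, the build_query function, the ex: namespace and the property names (ex:sentBy, ex:sentTo, ex:writingDate) are all assumptions introduced for this example.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    """One search condition: a property and the value a letter must have for it."""
    prop: str   # hypothetical prefixed property name, e.g. "ex:sentBy"
    value: str  # hypothetical prefixed resource name, e.g. "ex:Henri_Poincare"


def build_query(conditions, date_prop="ex:writingDate"):
    """Return a SPARQL SELECT query matching resources that satisfy every condition."""
    lines = [
        "PREFIX ex: <http://example.org/>",  # hypothetical namespace
        "SELECT DISTINCT ?letter ?date WHERE {",
    ]
    for c in conditions:
        # Each selected condition becomes one triple pattern on ?letter.
        lines.append(f"  ?letter {c.prop} {c.value} .")
    # The date is retrieved so that results can be ordered chronologically.
    lines.append(f"  ?letter {date_prop} ?date .")
    lines.append("}")
    lines.append("ORDER BY ?date")
    return "\n".join(lines)


selected = [
    Condition("ex:sentBy", "ex:Henri_Poincare"),
    Condition("ex:sentTo", "ex:Gosta_Mittag_Leffler"),
]
query = build_query(selected)
print(query)
```
]]></preformat>
      <p>In this sketch, selecting or deselecting a condition simply adds or removes one triple pattern. The values are inserted as given, so a real system would need to validate or escape them before sending the query to an endpoint.</p>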
      <p>The top-left block of the interface gives information about the initial resource of
the current search process. In our example, this resource corresponds to a letter
sent by Henri Poincaré to Gösta Mittag-Leffler on June 29, 1881. The
bottom-left block gathers a set of search conditions which can be used to create SPARQL
queries and which are generated from the initial resource. Next to each
condition, an integer gives the number of resources matching
that condition in the RDF graph. Users can select different conditions by
clicking on them. Clicking on the "query" button updates the results presented
in the bottom-right block of the interface within a chronology-based view. For
each result, some information about the resource is given. For the Henri Poincaré
correspondence corpus, each letter is described with its label, its sender, its
recipient, its topics and the persons quoted. Above the result block, a date slider
can be used to filter the set of presented results. The system can export
the results to a CSV file which contains the IRI and some information about each
resource. Another feature is a bar chart which
shows the distribution of the results with respect to the chosen date property.
It is possible to start a new search process centered on one of the letters
presented in the result block. The idea of the system is to start with an initial
resource and to explore the corpus by navigating from one resource to another
along different paths. This way of exploring the corpus could lead to the
identification of unexpected relations between the elements of the corpus.</p>
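      <p>The result-handling features described above (date slider, bar chart and CSV export) can be sketched on in-memory data as follows. This is a hedged illustration only: the result rows, their field names and the three helper functions are hypothetical and do not come from the tool's source code.</p>
      <preformat><![CDATA[
```python
import csv
import io
from collections import Counter
from datetime import date

# Hypothetical result rows, shaped like what a query over letters might return.
results = [
    {"iri": "http://example.org/letter/1", "label": "Letter A", "date": date(1881, 6, 29)},
    {"iri": "http://example.org/letter/2", "label": "Letter B", "date": date(1881, 7, 26)},
    {"iri": "http://example.org/letter/3", "label": "Letter C", "date": date(1883, 3, 4)},
]


def filter_by_date(rows, start, end):
    """Keep the rows whose date lies in the closed interval [start, end] (date slider)."""
    return [r for r in rows if start <= r["date"] <= end]


def distribution_by_year(rows):
    """Count results per year, i.e. the data behind a bar chart."""
    return dict(sorted(Counter(r["date"].year for r in rows).items()))


def to_csv(rows):
    """Serialize rows to CSV text with the IRI and some descriptive fields."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["iri", "label", "date"])
    writer.writeheader()
    for r in rows:
        # Dates are written in ISO 8601 form so that they sort as text.
        writer.writerow({**r, "date": r["date"].isoformat()})
    return buffer.getvalue()


in_1881 = filter_by_date(results, date(1881, 1, 1), date(1881, 12, 31))
print(distribution_by_year(results))
print(to_csv(in_1881))
```
]]></preformat>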
    </sec>
    <sec id="sec-4">
      <title>Going Further by Applying a SPARQL Query Transformation Mechanism</title>
      <p>
        In some situations, the generated conditions may not be sufficient to provide
interesting or surprising results. A first solution, which is included in the tool,
is to let users manually add a new condition. Another way to overcome this
issue is to generate new conditions that the initial resource does not
necessarily match but which are related to its characteristics. For this purpose,
the system relies on transformation rules that are applied to
derive new filtering conditions from the conditions generated from
the initial resource characteristics. These transformation rules are defined and
applied using the SQTRL (SPARQL Query Transformation Rule Language) tool,
which has been introduced to allow flexible querying with SPARQL [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>On the user interface, the "More" button is used to generate new conditions
in an iterative way. For the running example, the first click on the button
generates two new conditions: {sent to Henri Poincaré} and {sent by Gösta
Mittag-Leffler}. This results from the application of a transformation rule
which exchanges the sender and the recipient of a letter. Another click on the
button adds the condition {has topic Mathematics}, based on the application
of a topic generalization rule which replaces Équations aux dérivées partielles et
espaces lacunaires by Mathematics. By applying the same transformation rule,
the tool generates the condition {has topic Travel}. Two other rule
applications generate conditions to search for letters sent or received by one of the
persons quoted: {has for correspondent Thiébaut} and {has for correspondent
Balthazar Mathis}.</p>
    </sec>
    <sec id="sec-5">
      <title>System Architecture and Reusability</title>
      <p>
        The system, its source code, and its user and technical documentation are
available online in a GitHub repository. When developing this tool, one of the
main challenges was to ensure its reusability with other corpora. To this
end, the backend application (developed with the Jena API [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]) comes with
a configuration file which defines: the URL of the SPARQL endpoint; the path to the
transformation rule file; the list of properties to be used for condition generation;
the list of properties to be displayed for each result; the property and language
for labels; the date property and the temporal interval used for results
filtering. The documentation describes another use case, the
search for literary works by querying the DBpedia public SPARQL endpoint.
      </p>
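      <p>As an illustration of the kind of settings listed above, the following sketch models them as a small configuration object loaded from JSON. The actual file format and key names used by the tool are not given here, so everything in this example is hypothetical: the NavigatorConfig class, the JSON keys, the rule file path and the property values are assumptions chosen only to mirror the listed items.</p>
      <preformat><![CDATA[
```python
import json
from dataclasses import dataclass


@dataclass
class NavigatorConfig:
    endpoint_url: str            # URL of the SPARQL endpoint
    rule_file: str               # path to the transformation rule file
    condition_properties: list   # properties used for condition generation
    display_properties: list     # properties displayed for each result
    label_property: str          # property used for labels
    label_language: str          # language used for labels
    date_property: str           # date property used for the chronological view
    date_interval: tuple         # (first year, last year) used for results filtering


def load_config(text):
    """Parse a JSON document into a NavigatorConfig; missing or unknown keys raise an error."""
    raw = json.loads(text)
    raw["date_interval"] = tuple(raw["date_interval"])
    return NavigatorConfig(**raw)


# Hypothetical configuration for a literary-works use case (illustrative values only).
EXAMPLE = """
{
  "endpoint_url": "https://dbpedia.org/sparql/",
  "rule_file": "rules/example.sqtrl",
  "condition_properties": ["dbo:author", "dbo:literaryGenre"],
  "display_properties": ["rdfs:label", "dbo:author"],
  "label_property": "rdfs:label",
  "label_language": "en",
  "date_property": "dbo:releaseDate",
  "date_interval": [1800, 1950]
}
"""

config = load_config(EXAMPLE)
print(config.endpoint_url)
print(config.condition_properties)
```
]]></preformat>
      <p>Keeping all corpus-specific choices in one such object is what would allow the same backend to be pointed at a different endpoint without code changes.</p>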
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>A navigation system, accessible through a Web user interface, has been proposed
for exploring the Henri Poincaré correspondence corpus. This system benefits
from a flexible search mechanism for exploiting the relations between
the elements of the corpus. It is a generic tool for exploring Semantic Web graphs
and could thus be reused with other corpora. Several directions for future work are
considered for improving this system. Some improvements concern the
user interface, such as adding new export formats or being able to
remove a condition from the list. A major improvement would be to keep a trace
of every action performed with the tool, making it possible to save the research
process. In the context of a historical corpus, the research methodology is an
important aspect of the work associated with the presentation of results. Another idea
is to provide the user with an explanation of the conditions generated by
the SPARQL query transformation mechanism. This could be a description of
which transformation rule has been applied and to which resources.
</p>
      <p>Acknowledgement. This work is partly supported by the French PIA project
“Lorraine Université d’Excellence”, reference ANR-15-IDEX-04-LUE.</p>
      <p>The DBpedia public SPARQL endpoint is available at https://dbpedia.org/sparql/.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bruneau</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gaillard</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lasolle</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lieber</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nauer</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reynaud</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>A SPARQL Query Transformation Rule Language - Application to Retrieval and Adaptation in Case-Based Reasoning</article-title>
          . In:
          <string-name>
            <surname>Aha</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lieber</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (eds.)
          <source>Case-Based Reasoning Research and Development. ICCBR 2017</source>
          . pp.
          <fpage>76</fpage>
          -
          <lpage>91</lpage>
          . Lecture Notes in Computer Science, Springer, Cham (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bruneau</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lasolle</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lieber</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nauer</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pavlova</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rollet</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Applying and Developing Semantic Web Technologies for Exploiting a Corpus in History of Science: the Case Study of the Henri Poincaré Correspondence</article-title>
          . Semantic Web - Interoperability, Usability, Applicability
          <volume>12</volume>
          (
          <issue>2</issue>
          ),
          <fpage>359</fpage>
          -
          <lpage>378</lpage>
          (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>McBride</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Jena: a Semantic Web toolkit</article-title>
          .
          <source>IEEE Internet Computing</source>
          <volume>6</volume>
          (
          <issue>6</issue>
          ),
          <fpage>55</fpage>
          -
          <lpage>59</lpage>
          (
          <year>2002</year>
          ). https://doi.org/10.1109/MIC.2002.1067737
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>