<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>C.: Historiography's two voices: Data infrastructure and history at scale in
the Oxford Dictionary of National Biography (ODNB). Journal of Cultural Analytics
(2018). https://doi.org/10.22148/16.028.</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.22148/16.028</article-id>
      <title-group>
        <article-title>Using the AcademySampo Portal and Data Service for Biographical and Prosopographical Research in Digital Humanities</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>HELDIG - Helsinki Centre for Digital Humanities, University of Helsinki</institution>
          ,
          <country country="FI">Finland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Semantic Computing Research Group (SeCo), Aalto University</institution>
          ,
          <country country="FI">Finland</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2007</year>
      </pub-date>
      <abstract>
        <p>This paper presents the in-use AcademySampo portal and Linked Open Data (LOD) service for biographical and prosopographical research, a new member in the Sampo series of cultural heritage applications for Digital Humanities. The portal is based on a dataset of short textual biographies about all 28 000 Finnish and Swedish academic people educated in 1640-1899 in Finland. Linked data extracted from the biography entries was enriched by internal and external data linking, and by reasoning, e.g., genealogical networks of the people mentioned. The data was published as a LOD service. This paper demonstrates how to use the AcademySampo data in Digital Humanities research by faceted search integrated seamlessly with data analytic tools of the AcademySampo portal, as well as by using the LOD service directly via a SPARQL editor and by Python scripting using Google Colab and Jupyter notebooks.</p>
      </abstract>
      <kwd-group>
        <kwd>Biography</kwd>
        <kwd>Prosopography</kwd>
        <kwd>Linked Data</kwd>
        <kwd>Digital Humanities Demo paper</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Biographical research is “concerned with the reconstruction of life histories and
the constitution of meaning based on biographical narratives and documents”<sup>3</sup>
regarding individual persons, while “prosopography is an investigation of the
common characteristics of a group of people, whose individual biographies may
be largely untraceable”<sup>4</sup> [8]. This demo paper concerns the application of Semantic
Web technologies and Linked Data in biographical and prosopographical research.</p>
      <p>
        We demonstrate how to use the new AcademySampo LOD service and semantic
portal [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">5,4,6</xref>
        ] for these research purposes, based on the Finnish registries
“Ylioppilasmatrikkeli” 1640–1899<sup>5</sup>, which contain short biographical descriptions of 28 000
students of the University of Helsinki, formerly the Royal Academy of Turku.
These registries cover a significant part of the history of Finland and the Finnish
university institution, since the University of Helsinki was the only university in
Finland during the period in focus.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Using AcademySampo Portal</title>
      <p>
        The AcademySampo portal<sup>6</sup>, based on the Sampo model<sup>7</sup>, is used by first filtering
results by faceted search, and then applying data analytic tools to the result
set. The user can search for people, places, occupations, and student clubs in
separate application perspectives. Data analytic tools are available for both 1)
individuals (biography) and 2) groups of them (prosopography). For example, Fig.
1 illustrates the family relations of the poet Johan Ludvig Runeberg (1804–1877),
extracted and reasoned from the mentions of him in the biographies [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        In prosopographical research, the user can first select a group of people and
then analyse them. For example, the students of the Swedish Småland student
club in a certain time period can be selected and their migrations visualized from
the place of birth to the place of death on a map [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
        <fn id="fn5"><label>5</label><p>The registry contains two parts: the database covering the years 1640–1852 is available in Finnish and Swedish at https://ylioppilasmatrikkeli.helsinki.fi, and the registry of 1853–1899 is available at https://ylioppilasmatrikkeli.helsinki.fi/1853-1899</p></fn>
        <fn id="fn6"><label>6</label><p>The portal with an English interface is available at https://akatemiasampo.fi/en/</p></fn>
        <fn id="fn7"><label>7</label><p>More information about the Sampo model and the Sampo series of portals and data services is available at https://seco.cs.aalto.fi/applications/sampo/</p></fn>
      </p>
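      <p>The filter-then-analyse workflow described above can be illustrated with a minimal Python sketch. It is not the portal's implementation: the record fields (<monospace>club</monospace>, <monospace>enrolled</monospace>, <monospace>born</monospace>, <monospace>died</monospace>), the helper names, and the toy records are all hypothetical and only mimic the idea of selecting a group by facets and then counting birthplace-to-deathplace pairs.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter

# Made-up toy records; the field names are hypothetical and do NOT
# reflect the actual AcademySampo data model.
PEOPLE = [
    {"name": "A", "club": "Småland", "enrolled": 1700, "born": "Växjö", "died": "Turku"},
    {"name": "B", "club": "Småland", "enrolled": 1720, "born": "Växjö", "died": "Turku"},
    {"name": "C", "club": "Småland", "enrolled": 1800, "born": "Kalmar", "died": "Helsinki"},
    {"name": "D", "club": "Viipuri", "enrolled": 1710, "born": "Viipuri", "died": "Turku"},
    {"name": "E", "club": "Småland", "enrolled": 1730, "born": "Kalmar", "died": None},
]


def facet_filter(people, club=None, year_range=None):
    """Keep the records that match every facet that was given."""
    selected = []
    for person in people:
        if club is not None and person["club"] != club:
            continue
        if year_range is not None:
            first, last = year_range
            if not (first <= person["enrolled"] <= last):
                continue
        selected.append(person)
    return selected


def migration_counts(people):
    """Count (place of birth, place of death) pairs, skipping unknown places."""
    return Counter(
        (p["born"], p["died"]) for p in people if p["born"] and p["died"]
    )


group = facet_filter(PEOPLE, club="Småland", year_range=(1690, 1750))
for (born, died), n in migration_counts(group).most_common():
    print(born, "to", died, n)
```
]]></preformat>
      <p>In the portal the resulting pairs are drawn on a map; the sketch stops at the counts, which is the part that does not depend on any particular mapping library.</p>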
    </sec>
    <sec id="sec-3">
      <title>Using the SPARQL Endpoint for Data Analysis</title>
      <p>Alternatively, the Linked Open Data service of AcademySampo<sup>8</sup> can be accessed
directly for customized analyses. For example, the YASGUI<sup>9</sup> interface can be used for
SPARQL querying and for visualizing the results, or the data can be processed by Python
scripting in Google Colab<sup>10</sup> and Jupyter<sup>11</sup> notebooks.</p>
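      <p>As a rough illustration of the scripting route, the following Python sketch sends a query to a SPARQL endpoint using only the standard library and flattens the standard SPARQL JSON result format into rows. The endpoint address <monospace>https://ldf.fi/yoma/sparql</monospace> is an assumption based on the dataset name and should be checked against the LOD service page; the query is deliberately generic (a count of instances per class) so that it assumes nothing about the AcademySampo schema. The function names are ours.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import json
import urllib.parse
import urllib.request

# ASSUMPTION: endpoint address guessed from the dataset name "yoma";
# verify it on the LOD service page before use.
ENDPOINT = "https://ldf.fi/yoma/sparql"

# Generic query that makes no assumption about the data model.
QUERY = """
SELECT ?class (COUNT(?s) AS ?n)
WHERE { ?s a ?class }
GROUP BY ?class
ORDER BY DESC(?n)
LIMIT 10
"""


def build_request(endpoint, query):
    """Build a SPARQL Protocol POST request with a form-encoded query."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )


def rows_from_results(results):
    """Flatten a SPARQL JSON result document into a list of dicts."""
    variables = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        # Unbound variables are simply absent from a binding.
        rows.append(
            {v: binding[v]["value"] if v in binding else None for v in variables}
        )
    return rows


def run_query(endpoint=ENDPOINT, query=QUERY, timeout=30):
    """Send the query over the network and return the result rows."""
    request = build_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return rows_from_results(json.load(response))
```
]]></preformat>
      <p>In a Colab or Jupyter notebook, calling <monospace>run_query()</monospace> would return a list of dictionaries that can be passed on to a plotting or data-frame library of choice.</p>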
      <p>
        For instance, Fig. 2 illustrates the distribution of the most common vocational
groups of people in the data during different time periods. The chart shows that in
the 17th and early 18th centuries religious vocations were dominant.
However, over the three centuries the proportion of religious occupations
decreased from over 50 to a mere 15 per cent. Meanwhile, the fields of public
administration and education grew steadily during the observed
time period. This analysis and visualization was created using Google Colab.
      </p>
      <p>
        <bold>Related Works</bold> Analyzing biographical data has grown into a new research
and application field, reported, e.g., in the Biographical Data in a Digital World
workshops BD2015 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], BD2017 [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and BD2019. Data analyses related to those
of this paper have been made for the dictionaries of biography of the U.K. [9], Ireland [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ],
and Finland [
        <xref ref-type="bibr" rid="ref7">7</xref>
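        ], but not for the new AcademySampo data.
        <fn id="fn8"><label>8</label><p>The AcademySampo LOD service is available at https://www.ldf.fi/dataset/yoma.</p></fn>
        <fn id="fn9"><label>9</label><p>https://yasgui.triply.cc</p></fn>
        <fn id="fn10"><label>10</label><p>https://colab.research.google.com/notebooks/intro.ipynb</p></fn>
        <fn id="fn11"><label>11</label><p>https://jupyter.org</p></fn>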
      </p>
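      <p>The period-wise distribution of vocational groups discussed in connection with Fig. 2 can be sketched in a few lines of Python. This is not the notebook used for the figure: the 50-year bin width, the function names, and the toy records are assumptions made for illustration, and the input is taken to be a list of (year, vocational group) pairs such as might be obtained from a SPARQL query.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter, defaultdict


def period_of(year, start=1640, width=50):
    """Return the (first_year, last_year) bin that contains the year."""
    first = start + ((year - start) // width) * width
    return (first, first + width - 1)


def group_shares(records, start=1640, width=50):
    """Map each period to {group: percentage of that period's records}."""
    counts = defaultdict(Counter)
    for year, group in records:
        counts[period_of(year, start, width)][group] += 1
    shares = {}
    for period, counter in sorted(counts.items()):
        total = sum(counter.values())
        shares[period] = {g: 100.0 * n / total for g, n in counter.items()}
    return shares


# Made-up toy records (year, vocational group); NOT AcademySampo data.
toy = [
    (1650, "religious"), (1655, "religious"), (1660, "administration"),
    (1850, "religious"), (1860, "education"),
    (1870, "administration"), (1880, "education"),
]

for period, share in group_shares(toy).items():
    print(period, {g: round(s, 1) for g, s in share.items()})
```
]]></preformat>
      <p>Normalizing within each period, rather than over the whole dataset, is what makes proportions comparable across periods with very different numbers of students.</p>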
      <p>
        <bold>Contributions</bold> Developing AcademySampo demonstrates how textual
biographies can be transformed into linked data and enriched with related
datasets as well as by reasoning new relations in the data [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The resulting LOD,
published on the Linked Data Finland platform<sup>12</sup>, can be used by users without
programming skills via the ready-to-use tools integrated seamlessly with faceted
search and exploration in the portal. Alternatively, the SPARQL endpoint can be
used flexibly for versatile data analyses and visualizations with little knowledge
of SPARQL and programming.
      </p>
      <p><bold>Acknowledgements</bold> Thanks to Yrjö Kotivuori and Veli-Matti Autio for
their seminal work in creating the original databases used in our work. This work
is related to the EU project InTaVia: In/Tangible European Heritage<sup>13</sup>, and
the EU COST action Nexus Linguarum<sup>14</sup> on linguistic data science. CSC – IT
Center for Science provided computational resources for the work.
        <fn id="fn12"><label>12</label><p>https://ldf.fi</p></fn>
        <fn id="fn13"><label>13</label><p>https://intavia.eu/</p></fn>
        <fn id="fn14"><label>14</label><p>https://nexuslinguarum.eu/the-action</p></fn></p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bhreathnach</surname>
            ,
            <given-names>Ú.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Burke</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fhinn</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cleircín</surname>
            ,
            <given-names>G.Ó.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raghallaigh</surname>
            ,
            <given-names>B.Ó.</given-names>
          </string-name>
          :
          <article-title>A quantitative analysis of biographical data from Ainm, the Irish-language biographical database</article-title>
          (
          <year>2019</year>
          ), http://doras.dcu.ie/23774/1/Ainm%20BD%20FINAL.docx.pdf,
          presented at the
          <source>3rd Conference on Biographical Data in a Digital World (BD 2019)</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>ter Braake</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Anstke Fokkens</surname>
            ,
            <given-names>R.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Declerck</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wandl-Vogt</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          (eds.):
          <source>BD2015, Biographical Data in a Digital World 2015</source>
          . CEUR Workshop Proceedings, Vol-
          <volume>1399</volume>
          (
          <year>2015</year>
          ), http://ceur-ws.org/Vol-1399/.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Fokkens</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>ter Braake</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sluijter</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arthur</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wandl-Vogt</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          (eds.):
          <source>BD2017 Biographical Data in a Digital World 2017</source>
          . CEUR Workshop Proceedings, Vol-
          <volume>2119</volume>
          (
          <year>2017</year>
          ), http://ceur-ws.org/Vol-2119/.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Hyvönen</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leskinen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rantala</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ikkala</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tuominen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Akatemiasampoportaali ja -datapalvelu henkilöiden ja henkilöryhmien historialliseen tutkimukseen (AcademySampo portal and data service for biographical and prosopographical research)</article-title>
          .
          <source>Informaatiotutkimus</source>
          <volume>40</volume>
          (
          <issue>2</issue>
          ),
          <fpage>28</fpage>
          -
          <lpage>56</lpage>
          (
          <year>2021</year>
          ), https://journal.fi/inf/article/view/102656.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Leskinen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hyvönen</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Linked open data service about historical Finnish academic people in 1640-1899</article-title>
          . In:
          <source>DHN 2020 Digital Humanities in the Nordic Countries. Proceedings of the Digital Humanities in the Nordic Countries 5th Conference</source>
          . pp.
          <fpage>284</fpage>
          -
          <lpage>292</lpage>
          . CEUR Workshop Proceedings, Vol.
          <volume>2612</volume>
          (
          <year>2020</year>
          ), http://ceur-ws.org/Vol-2612/short14.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Leskinen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hyvönen</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Reconciling and using historical person registers as linked open data in the AcademySampo knowledge graph</article-title>
          . In:
          <source>Proceedings of the 20th International Semantic Web Conference (ISWC 2021)</source>
          . Springer (
          <year>2021</year>
          ), https://seco.cs.aalto.fi/publications/2021/leskinen-hyvonen-reconciling-2021.pdf, in press.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Tamper</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leskinen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hyvönen</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Valjus</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Keravuori</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Analyzing biography collection historiographically as linked data: Case national biography of Finland</article-title>
          (
          <year>2021</year>
          ), https://seco.cs.aalto.fi/publications/2021/tamper-et-al-bs-2021.pdf, submitted.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>