<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Second World War on the Semantic Web: The WarSampo Project and Semantic Portal</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Eero Hyvönen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jouni Tuominen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eetu Mäkelä</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jérémie Dutruit</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kasper Apajalahti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Erkki Heino</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Petri Leskinen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Esko Ikkala</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Data</institution>
          ,
          <addr-line>Metadata Models, and Ontologies</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Semantic Computing Research Group (SeCo), Aalto University</institution>
          ,
          <country country="FI">Finland</country>
        </aff>
      </contrib-group>
      <fpage>2</fpage>
      <lpage>5</lpage>
      <abstract>
        <p>This paper initiates and fosters work on publishing Linked Open Data about the Second World War. We argue that the heterogeneous, distributed data about international war history makes a promising use case for semantic technologies. We hope that by making war data openly available we can learn from the past and promote peace.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <p>According to Georg Wilhelm Friedrich Hegel, “we learn from history that we learn nothing from history”. Hopefully this is not the case for the Second World War (WW2), now that fighting has started again even within the borders of Europe, in Ukraine. One way to promote peace is to make reliable data about the war openly available for everybody to learn from.</p>
    <p>WarSampo is a project and semantic portal that pursues this goal by publishing large heterogeneous datasets about WW2 in Finland as Linked Open Data (LOD). Application demonstrators are built that provide different perspectives on war history, for both historians and the public. The data covers the Winter War (1939–1940) against the Soviet attack, the Continuation War (1941–1944), in which the areas occupied in the Winter War were temporarily regained, and the Lapland War (1944–1945), in which the Finns pushed the German troops out of Lapland.</p>
    <p>WarSampo (http://www.sotasampo.fi) is the next step in our series of “Sampo” portals based on Linked Data, including CultureSampo (http://www.kulttuurisampo.fi) [<xref ref-type="bibr" rid="ref9">9</xref>], BookSampo (http://www.kirjasampo.fi), and TravelSampo (http://www.seco.tkk.fi/projects/subi), and continues our earlier work on modeling the First World War [<xref ref-type="bibr" rid="ref6">6</xref>,<xref ref-type="bibr" rid="ref8">8</xref>]. The project started in autumn 2014 and will finish in 2017, by the centennial of Finland's independence.</p>
    <sec id="sec-1">
      <title>Data, Metadata Models, and Ontologies</title>
      <p><bold>Data.</bold> The project initially deals with the datasets presented in <xref ref-type="table" rid="tab-1">Table 1</xref>. The casualties data (1) covers the deaths in action during the wars. War diaries (2) are digitized authentic documents of troop actions at the front. Photos and films (3) were taken during the war by the troops of the Defence Forces. The Kansa taisteli magazine (4) was published in 1957–1986; its articles mostly contain memories of the men who fought on the fronts. Karelian places (5) and maps (6) cover the war zone area of pre-war Finland that was eventually annexed by the Soviet Union. YLE’s audio and film material (7) (“Living Archive”) was recorded during the war or is related to it.</p>
      <table-wrap id="tab-1">
        <label>Table 1</label>
        <table>
          <thead>
            <tr><th>#</th><th>Dataset name</th><th>Providing organization</th><th>Size</th></tr>
          </thead>
          <tbody>
            <tr><td>1</td><td>Casualties of WW2</td><td>National Archives</td><td>93,000 death records</td></tr>
            <tr><td>2</td><td>War diaries</td><td>National Archives</td><td>23,000 war diaries of troops</td></tr>
            <tr><td>3</td><td>Photos &amp; films</td><td>Defence Forces &amp; Military Museum</td><td>160,000 photos &amp; films</td></tr>
            <tr><td>4</td><td>Kansa taisteli magazine articles</td><td>Bonnier &amp; The Assoc. for Military History in Finland</td><td>3,360 articles of veteran soldiers</td></tr>
            <tr><td>5</td><td>Karelian places</td><td>National Land Survey</td><td>30,000 places of the annexed Karelia</td></tr>
            <tr><td>6</td><td>Karelian maps</td><td>National Land Survey</td><td>War time maps of Karelia</td></tr>
            <tr><td>7</td><td>Audio &amp; films</td><td>National Broadcasting Company YLE</td><td>250 recordings and films</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>
        <bold>Metadata Models.</bold> CIDOC CRM (http://cidoc-crm.org) is used as the harmonizing basis for modeling data, with events providing the semantic glue for data linking [<xref ref-type="bibr" rid="ref3">3</xref>]. Our data model for WW1, presented in [<xref ref-type="bibr" rid="ref8">8</xref>], is used as the initial metadata model.
      </p>
      <p>
        <bold>Domain Ontologies.</bold> The data is annotated using a set of domain ontologies, including 1) an ontology of the troops and their hierarchies, 2) an ontology of persons with their ranks and roles, 3) a place ontology of historical places, 4) an event ontology of battles, politics, and other wartime incidents, 5) an ontology of time periods, 6) an ontology of weapons, 7) an ontology of vessels, and 8) a subject matter ontology. For 1–7, we have harvested named entities from the datasets and given them URIs, labels, and some initial structure, as needed in our initial demos (discussed below). However, ontology modeling and development is still underway. A challenge with the actor ontologies, for example, is modeling change: the names and positions of the troops, as well as the roles of army personnel, change frequently (e.g., promotions of persons and changes in troop leadership) and have to be conditioned on time. For 8, the KOKO ontology, a centerpiece of the Finnish ontology infrastructure [<xref ref-type="bibr" rid="ref4">4</xref>], is used.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Applications: Perspectives to War History</title>
      <p>The data and ontologies are published using SPARQL endpoints that form the basis of the WarSampo semantic portal and its applications. The idea of the portal is to provide a variety of perspectives on war data, presented on different tabs. Most datasets will have their own perspective, where the user can first search for data of interest and then get linked data related to the resources found. The perspectives enrich each other via Linked Data.</p>
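      <p>As a minimal sketch of how an application might address such a SPARQL endpoint, the following Python fragment builds a SELECT query and encodes it as an HTTP GET request in the style of the SPARQL 1.1 Protocol. The endpoint address, the ex: namespace, and the class and property names are assumptions made for illustration only; they are not taken from the WarSampo data model or platform.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from urllib.parse import urlencode

# Hypothetical endpoint address; the actual WarSampo endpoint is not given here.
ENDPOINT = "http://example.org/warsampo/sparql"

# Hypothetical vocabulary namespace, used only for this illustration.
EX = "http://example.org/schema/"


def build_query(limit=10):
    """Return a SPARQL SELECT query listing death records with place and date."""
    return (
        f"PREFIX ex: <{EX}>\n"
        "SELECT ?record ?place ?date WHERE {\n"
        "  ?record a ex:DeathRecord ;\n"
        "          ex:placeOfDeath ?place ;\n"
        "          ex:dateOfDeath ?date .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(endpoint, query):
    """Encode the query in the 'query' parameter of an HTTP GET request."""
    return endpoint + "?" + urlencode({"query": query})


# Build the request URL; no network access is performed in this sketch.
url = build_request_url(ENDPOINT, build_query(limit=5))
```
]]></preformat>
      <p>The sketch only constructs the request URL. A perspective in the portal would additionally send the request and render the result bindings, which is outside the scope of this illustration.</p>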
      <p>Initial prototypes for two perspectives have already been implemented: one for the war casualty data and one for the Kansa taisteli magazine articles. Fig. 1 depicts the user interface for the casualty data of 93,000 death records, with six facets on the left (marital status, gender, citizenship, nationality, mother tongue, and death category). At the top, an interactive timeline for the time facet is shown, and below it a heat map illustrates the death counts on a map during the selected time interval. Later on, death records will be enriched with links to, e.g., war diaries of the dead person’s troop, related photos, and articles. The second demonstrator provides a faceted search interface to the Kansa taisteli magazine articles and links each article to further contextual data, such as related places, Wikipedia articles, troops, and persons, based on the article metadata. Links to the WarSampo demonstrators as well as further information about the project are provided at http://www.sotasampo.fi/en/.</p>
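      <p>The faceted selection described above can be illustrated with a small Python sketch: facet choices and a time interval narrow the result set, and the remaining records are counted per facet value. The sample records, field names, and values below are hypothetical and chosen only for illustration; this is not the implementation used in the WarSampo demonstrators.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter
from datetime import date

# Hypothetical sample records; field names and values are illustrative only.
RECORDS = [
    {"gender": "male", "mother_tongue": "Finnish",
     "death_category": "killed in action", "died": date(1939, 12, 6)},
    {"gender": "male", "mother_tongue": "Swedish",
     "death_category": "killed in action", "died": date(1940, 2, 11)},
    {"gender": "female", "mother_tongue": "Finnish",
     "death_category": "died of wounds", "died": date(1941, 7, 3)},
    {"gender": "male", "mother_tongue": "Finnish",
     "death_category": "died of wounds", "died": date(1944, 6, 20)},
]


def select(records, selections, start=None, end=None):
    """Keep records matching every selected facet value and the time interval."""
    result = []
    for rec in records:
        # A record is dropped as soon as one selected facet value differs.
        if any(rec.get(facet) != value for facet, value in selections.items()):
            continue
        if start is not None and rec["died"] < start:
            continue
        if end is not None and rec["died"] > end:
            continue
        result.append(rec)
    return result


def facet_counts(records, facet):
    """Count how many of the given records fall under each value of a facet."""
    return Counter(rec[facet] for rec in records)


# Select male casualties within a chosen interval, then count by mother tongue.
hits = select(RECORDS, {"gender": "male"},
              start=date(1939, 11, 30), end=date(1940, 3, 13))
counts = facet_counts(hits, "mother_tongue")
```
]]></preformat>
      <p>In a faceted interface, counts of this kind are typically shown next to each facet value so that the user can see how a further selection would narrow the results.</p>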
      <p>
        WarSampo is implemented using the “7-star” Linked Data Finland platform (http://www.ldf.fi) [<xref ref-type="bibr" rid="ref7">7</xref>], based on Fuseki (http://jena.apache.org/documentation/serving_data/) with a Varnish Cache (https://www.varnish-cache.org) front end for serving LOD. As a first official LOD publication, the casualty data from the National Archives is already publicly available for everyone to use (http://www.ldf.fi/dataset/narc-menehtyneet1939-45).
      </p>
    </sec>
    <sec id="sec-3">
      <p>
        There are several projects publishing WW1 data on the web, such as Europeana Collections 1914–1918 (http://www.europeana-collections-1914-1918.eu), 1914–1918 Online (http://www.1914-1918-online.net), WW1 Discovery (http://ww1.discovery.ac.uk), Out of the Trenches (http://www.canadiana.ca/en/pcdhn-lod/), CENDARI (http://www.cendari.eu/research/first-world-war-studies/), Muninn (http://blog.muninn-project.org), and WW1LOD [<xref ref-type="bibr" rid="ref8">8</xref>]. War history makes a promising use case for Linked Data because war data is heterogeneous, distributed across different countries and organizations, and written in different languages [<xref ref-type="bibr" rid="ref5">5</xref>].
      </p>
      <p>
        Many web sites publish data about WW2. For example, the key datasets of WarSampo have been published in Finland by our collaborators, and many more sites are online in other countries, such as the World War II Database (http://ww2db.com). However, there are only a few works on linking WW2 data, such as [<xref ref-type="bibr" rid="ref1">1</xref>,<xref ref-type="bibr" rid="ref2">2</xref>]. Much of the WW2 data is still confidential because people involved in the incidents or their close relatives are still alive. WarSampo contributes to related research by initiating and fostering large-scale LOD publication of WW2 data, based on event-based data modeling. Our work is funded by the Ministry of Education and Culture and the Finnish Cultural Foundation.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>de Boer</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Doornik</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Buitinck</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marx</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Veken</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Linking the kingdom: enriched access to a historiographical text</article-title>
          .
          <source>In: Proc. of the 7th International Conference on Knowledge Capture (KCAP</source>
          <year>2013</year>
          ). pp.
          <fpage>17</fpage>
          -
          <lpage>24</lpage>
          . Association for Computing Machinery, New York (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Collins</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mulholland</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zdrahal</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Semantic browsing of digital collections</article-title>
          .
          <source>In: Proc. of the 4th International Semantic Web Conference (ISWC</source>
          <year>2005</year>
          ). Springer-Verlag (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Doerr</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>The CIDOC CRM - an ontological approach to semantic interoperability of metadata</article-title>
          .
          <source>AI Magazine</source>
          <volume>24</volume>
          (
          <issue>3</issue>
          ),
          <fpage>75</fpage>
          -
          <lpage>92</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4. Hyvönen, E.,
          <string-name>
            <surname>Viljanen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tuominen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seppälä</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Building a national semantic web ontology and ontology service infrastructure - the FinnONTO approach</article-title>
          .
          <source>In: Proc. of the 5th European Semantic Web Conference (ESWC</source>
          <year>2008</year>
          ). pp.
          <fpage>95</fpage>
          -
          <lpage>109</lpage>
          . Springer-Verlag (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. Hyvönen, E.:
          <article-title>Publishing and using cultural heritage linked data on the semantic web</article-title>
          . Morgan &amp; Claypool, Palo Alto, CA, USA (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6. Hyvönen, E.,
          <string-name>
            <surname>Lindquist</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          , Törnroos, J., Mäkelä, E.:
          <article-title>History on the semantic web as linked data - an event gazetteer and timeline for World War I</article-title>
          .
          <source>In: Proc. of CIDOC 2012 - Enriching Cultural Heritage</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7. Hyvönen, E.,
          <string-name>
            <surname>Tuominen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alonen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , Mäkelä, E.:
          <article-title>Linked Data Finland: A 7-star model and platform for publishing and re-using linked datasets</article-title>
          .
          <source>In: The Semantic Web: ESWC 2014 Satellite Events, Revised Selected Papers</source>
          . pp.
          <fpage>226</fpage>
          -
          <lpage>230</lpage>
          . Springer-Verlag (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8. Mäkelä, E., Törnroos, J.,
          <string-name>
            <surname>Lindquist</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          , Hyvönen, E.:
          <article-title>World War I as Linked Open Data</article-title>
          (
          <year>2015</year>
          ), http://www.seco.tkk.fi/publications/submitted/makela-et-al-ww1lod.pdf, submitted for review
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9. Mäkelä, E., Hyvönen, E.,
          <string-name>
            <surname>Ruotsalo</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>How to deal with massively heterogeneous cultural heritage data - lessons learned in CultureSampo</article-title>
          .
          <source>Semantic Web - Interoperability, Usability, Applicability</source>
          <volume>3</volume>
          (
          <issue>1</issue>
          ),
          <fpage>85</fpage>
          -
          <lpage>109</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>