=Paper= {{Paper |id=Vol-3152/BD2019_paper3 |storemode=property |title=Linked Data - A Paradigm Shift for Publishing and Using Biography Collections on the Semantic Web |pdfUrl=https://ceur-ws.org/Vol-3152/BD2019_paper_3.pdf |volume=Vol-3152 |authors=Eero Hyvönen,Petri Leskinen,Minna Tamper,Heikki Rantala,Esko Ikkala,Jouni Tuominen,Kirsi Keravuori |dblpUrl=https://dblp.org/rec/conf/bd/HyvonenLTRITK19 }} ==Linked Data - A Paradigm Shift for Publishing and Using Biography Collections on the Semantic Web== https://ceur-ws.org/Vol-3152/BD2019_paper_3.pdf
                         Linked Data – A Paradigm Shift for
           Publishing and Using Biography Collections on the Semantic Web

                   Eero Hyvönen¹,², Petri Leskinen¹, Minna Tamper¹, Heikki Rantala¹,
                         Esko Ikkala¹, Jouni Tuominen¹,², and Kirsi Keravuori³

                    ¹ Semantic Computing Research Group (SeCo), Aalto University, Finland
                    ² HELDIG – Helsinki Centre for Digital Humanities, University of Helsinki, Finland
                    ³ Finnish Literature Society
                          https://seco.cs.aalto.fi/projects/biografiasampo/en/
                                                                 Abstract
This paper argues for making a paradigm shift in publishing and using biographical dictionaries on the Web, based on Linked Data.
The idea is to represent biographical data in a harmonized, semantically interoperable form, which enables 1) data enrichment by
aggregating linked content from complementary, distributed, and heterogeneous data sources, as well as by reasoning, and 2) development
of intelligent services using machine “understandable” data. Based on the aggregated global knowledge graph, published in a SPARQL
endpoint, tooling for 1) biographical research of individual persons as well as for 2) prosopographical research on groups of people can
be provided. As a demonstration of these ideas, we discuss the new in-use linked data service and semantic portal BiographySampo
– Finnish Biographies on the Semantic Web that quickly attracted thousands of end users on the Web. This semantic portal is based on a
knowledge graph extracted automatically from a collection of 13 100 textual biographies, written by 980 scholars. The texts are enriched
with data linking to 16 external data sources and by harvesting external collection data from libraries, museums, and archives. Reasoning
is used for query expansion and for discovering serendipitous relations between entities, such as persons and places.


1. Biographical Dictionaries on the Web

Biographical dictionaries (Keith, 2004) may contain tens of thousands of short biographies of historical persons of importance. Traditionally, such dictionaries have been published as printed book series. The Oxford Dictionary of National Biography¹ (ODNB), with more than 60 000 lives, was first published online in 2004, and since then major biographical dictionaries have opened their editions on the Web with search engines for finding and (close) reading biographies of interest. Online national biographical collections include the USA’s American National Biography², Germany’s Neue Deutsche Biographie³, the Biography Portal of the Netherlands⁴, the Dictionary of Swedish National Biography⁵, and the National Biography of Finland⁶ (NBF).

ODNB and other early adopters of web technology started the paradigm shift in publishing and reading biographical dictionaries on the Web. We call such systems second generation publications. This paper argues for taking the next step forward into third generation systems, i.e., to publishing and using biographical dictionaries as Linked Data on the Semantic Web. The goal is to serve both machine and human readers, and to support both close and distant reading (Shultz, June 24 2011). To demonstrate and evaluate this idea in practice, we present the new in-use system BiographySampo – Finnish Biographies on the Semantic Web⁷ (Hyvönen et al., 2019), based on the NBF and other biography collections of the Finnish Literature Society⁸. The idea is to 1) transform textual biographies into Linked Data by using language technology and knowledge extraction, to 2) enrich the data by linking it to internal and external data sources and by reasoning, to 3) publish the data as a Linked Data service and a SPARQL endpoint on the web (Heath and Bizer, 2011; Hyvönen, 2012), and to 4) create end-user applications on top of the service, including data-analytic tools and visualizations for distant reading of Big Data.

This paper considers BiographySampo from a publishing paradigm shift perspective, complementing our earlier papers: (Hyvönen et al., 2019) presents an overview of BiographySampo from an end-user’s point of view; (Tamper et al., 2018) concerns knowledge extraction from texts; (Tamper et al., 2019) focuses on network analysis of the biographies; and (Hyvönen and Rantala, 2019) discusses relational search of named entities, yet another application perspective of the portal.

In the following, we first present the underlying “Sampo” publishing model and the series of semantic portals of which BiographySampo is the newest member. After this, the underlying knowledge graph is presented, and the new Linked Data based possibilities for biographical and prosopographical research are illustrated. In conclusion, related works are discussed and contributions summarized.

   ¹ https://www.oxforddnb.com
   ² http://www.anb.org/aboutanb.html
   ³ http://www.ndb.badw-muenchen.de/ndb_aufgaben_e.htm
   ⁴ http://www.biografischportaal.nl/en
   ⁵ https://sok.riksarkivet.se/Sbl/Start.aspx?lang=en
   ⁶ https://kansallisbiografia.fi/english
   ⁷ BiographySampo is available at http://biografiasampo.fi. More information and publications are available at the project homepage https://seco.cs.aalto.fi/projects/biografiasampo/en/.
   ⁸ https://www.finlit.fi/en

  Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
2. Sampo Model for Linked Data Publishing

The ideas of the Semantic Web (SW) and Linked Data can be applied to address the problems of (semantic) data interoperability and distributed content creation at the same time, as depicted in Fig. 1. Here the publication system is illustrated by a circle. A shared semantic ontology infrastructure is situated in the middle. It includes mutually aligned metadata and shared domain ontologies, modeled using SW standards. If content providers outside of the circle provide the system with (meta)data, the data are automatically linked, enrich each other, and form a knowledge graph. For example, if metadata about a painting created by Picasso comes from an art museum, it can be enriched (linked) with, e.g., biographies from Wikipedia and other sources, photos taken of Picasso or by him, information about his wives, books in a library describing his works of art, related exhibitions open in museums, and so on. At the same time, the contents of any organization in the portal having Picasso related material get enriched by the metadata of the new artwork entered in the system. This is a win-win “business model” for everybody to join such a system; collaboration pays off.

However, the model also creates new challenges. In addition to enriching information, conflicting data from different sources may also be aggregated, leading to problems of data fusion. A solution to this is to maintain provenance metadata about the primary sources (cf., e.g., (Koho et al., 2019)). This is also needed in order to promote and separate the identities of the data providers and to acknowledge their distinct contributions in the Sampo. Yet another issue is how to maintain the Sampo when the aggregated data or the ontology infrastructure changes. To make this as automatic as possible, human involvement in the annotation and publishing pipeline should be minimized, as suggested in (Koho et al., 2018). However, taking the human out of the loop may lower the quality of data, and more source criticism and understanding about the limitations of the automatically annotated and aligned data is needed from the end-user (Hyvönen et al., 2019). In general, more collaboration and mutual agreements are needed between the publishers, which complicates the publishing process. The underlying technology also needs new kinds of expertise on semantic computing.

The model of Fig. 1 fits well with the Linked Data idea of providing data as a service and as a live SPARQL endpoint (Heath and Bizer, 2011), on top of which independent applications can be created on the client side without server side concerns. We call this whole the Sampo⁹ model (Hyvönen, 2012).

The model has been developed and tested in a series of practical case studies, including CultureSampo¹⁰ (2008) for cross-cultural contents, TravelSampo¹¹ (2011) for tourism, BookSampo¹² (2011) for fiction literature, WarSampo¹³ (2015) for military history, and NameSampo¹⁴ (2019) for toponomastic research of historical place names. Our experiences suggest that the Sampo model is a promising way to create useful systems that end-users like. For example, in 2018, BookSampo had ca. 2 million users and WarSampo 230 000.

In BiographySampo, the knowledge graph was extracted from the biography collections listed in Table 1, linked not only internally but also enriched with links to the external data sources listed in Table 2. In addition, data was harvested from 1) the art collection data of the National Gallery of Finland¹⁵, 2) the National Bibliography of Finland, Fennica¹⁶, 3) the BookSampo semantic portal¹⁷ linked data for fiction literature (Mäkelä et al., 2011), 4) the critical edition of J.V. Snellman’s works (Snellman, 2002–2004)¹⁸, and 5) the Finnish history ontology HISTO¹⁹.

The core biographies were converted into RDF by using a natural language pipeline described in more detail in (Tamper et al., 2018). The data model used is an extension of CIDOC CRM (Doerr, 2003; Le Boeuf et al., 2019) that we call Bio CRM (Tuominen et al., 2018). In this model, the life of a person is essentially a chain of events in time and space in which the person participated in different roles.

The knowledge graph was published in a Linked Data service²⁰, on top of which the semantic portal BiographySampo with seven application perspectives was implemented using a standard SPARQL endpoint API.

3. New Ways for Using Biographies

3.1. From Text Publishing to Tooling for DH Research

Data analysis in Digital Humanities (DH) is typically done partly by the machine, partly by the human. In visualizations, such as maps, timelines, and networks, the machine presents target data in a form from which the human user is able to make interpretations more easily. In statistics, e.g., pie charts, line charts, and histograms are used. Another type of tooling is network analysis (Newman, 2018), where different kinds of connections between entities, such as family relations between persons or references between texts, can be represented as graphs for visual inspection and mathematical analysis. In data analysis and knowledge discovery, statistical or other patterns of data are searched for

   ⁹ In Finnish mythology and the epic Kalevala, “Sampo” is a mythical artefact of indeterminate type that gives its owner richness and good fortune, an ancient metaphor of technology.
  ¹⁰ https://seco.cs.aalto.fi/applications/kulttuurisampo/
  ¹¹ https://seco.cs.aalto.fi/applications/travelsampo/
  ¹² https://seco.cs.aalto.fi/applications/kirjasampo/
  ¹³ https://seco.cs.aalto.fi/projects/sotasampo/en/
  ¹⁴ https://seco.cs.aalto.fi/projects/nimisampo/
  ¹⁵ https://www.kansallisgalleria.fi/en/avoin-data/
  ¹⁶ https://www.kansalliskirjasto.fi/en/news/finnish-national-bibliography-released-as-open-data
  ¹⁷ http://kirjasampo.fi
  ¹⁸ http://snellman.kootutteokset.fi
  ¹⁹ https://seco.cs.aalto.fi/ontologies/histo/
  ²⁰ Hosted by the Linked Data Finland service http://ldf.fi.
        Figure 1: Sampo model for Linked Data publishing is based on a shared ontology infrastructure in the middle.

                                     Dataset name                               # of People
                                     National Biography of Finland                    6478
                                     Business Leaders                                 2235
                                     Finnish Generals and Admirals 1809–1917            481
                                     Finnish Clergy 1554–1721                         2716
                                     Finnish Clergy 1800–1920                         1234
                                     Sum                                              13144
                                Table 1: Core bios provided by the Finnish Literature Society.
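
The provenance requirement discussed in Section 2 can be illustrated with a minimal Python sketch. It is not the implementation used in the portal: the statement format, the identifiers, the values, and the source labels are all hypothetical. The point is only that conflicting values from different sources are kept side by side, each with the sources asserting it, instead of being overwritten.

```python
from collections import defaultdict


def fuse_with_provenance(records):
    """Merge (person_id, property, value, source) statements.

    Conflicting values are kept side by side, each with the set of
    sources asserting it, rather than being silently overwritten.
    """
    fused = defaultdict(lambda: defaultdict(set))
    for person_id, prop, value, source in records:
        fused[(person_id, prop)][value].add(source)
    return fused


def conflicts(fused):
    """Return the (person, property) keys that have more than one value."""
    return sorted(key for key, values in fused.items() if len(values) > 1)


# Hypothetical statements; identifiers, values, and source labels are invented.
statements = [
    ("p1", "birth_year", 1900, "source-A"),
    ("p1", "birth_year", 1900, "source-B"),
    ("p1", "birth_place", "Place X", "source-A"),
    ("p2", "birth_year", 1850, "source-A"),
    ("p2", "birth_year", 1851, "source-B"),
]
fused = fuse_with_provenance(statements)
```

Here `conflicts(fused)` lists the person and property pairs for which the sources disagree, which is where source criticism by the end-user would be needed.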


in order to find “interesting”, serendipitous (Aylett et al., 2012) new knowledge. Techniques such as topic modelling (Brett, 2012) fall in this category. Here, too, the results typically need human interpretation, as statistical methods are usually unable to explain their results. In knowledge-based systems, knowledge structures can be used for this.

Many of the methods and tools above are well-defined and domain independent, and there are many software packages available for using them, such as Gephi²¹, R (Field et al., 2015), and various Python and JavaScript libraries. However, each of them has its own input formats and user interface, and requires specific skills from the user. Furthermore, visualizations are crafted case by case; tools for formulating, adjusting, and comparing analysis results in some general ways would be helpful for the user.

Second generation dictionaries of biographies on the Web are used in the following traditional way: a search box or form is filled in, specifying the person(s) whose biographies are searched for. Then the search button is pushed, and a list of hits is shown that can be opened for close reading by clicking. The paradigm change of publishing biographies as linked data (third generation systems) makes it possible to build systems based on live data services, especially SPARQL endpoints. In this way, other parties can also reuse the data in their own applications. It is possible to not only publish biographies with search interfaces, but also to incorporate ready to use tooling for DH research on top of the data service. In addition, the SPARQL endpoint makes it possible to study the data by custom designed queries in situations where the ready to use interfaces are not enough for problem solving. The SPARQL API can also be used for extracting and downloading filtered subsets of data from the endpoint in different formats (e.g., CSV) to be used in external tools, such as spreadsheets, R, or Gephi. Some of these new possibilities are illustrated below by the ready to use tools of BiographySampo.

3.2. Examples: BiographySampo at Work

Problem solving in DH often has two phases, as in the prosopographical research method (Verboven et al., 2007, p. 47): First, a target group of entities in the data is selected that share desired characteristics for solving the research question at hand (in the case of prosopography, a group of people is selected). Second, the target group is analyzed, and possibly compared with other groups, in order to solve the research question. Using BiographySampo is based on the same pattern: First, faceted search is used for filtering out a biography or a group of them for prosopography. After this, versatile ready to use tooling can be applied for reading a single biography or for analysing groups of biographies and

  ²¹ https://gephi.org
              Data Source                                # of Links     Description
              Wikipedia                                        5806     http://fi.wikipedia.org
              Wikidata                                         6424     http://www.wikidata.org
              Fennica                                          4007     National Bibliography of Finland
              BLF                                              1084     Biografiskt Lexikon för Finland
              BookSampo                                         715     Finnish fiction literature on the Semantic Web service
              WarSampo                                          288     Second World War LOD service and portal
              ULAN                                              193     Union List of Artist Names Online
              VIAF                                             2475     Virtual International Authority Files
              Geni.com                                         5320     Family research and family tree data
              Homepages                                          43     Personal web sites
              Parliament of Finland                             631     Members of Parliament of Finland 1917–2018
              University of Helsinki (UH) Registry              379     Students and faculty of UH in 1853–1899
              Sum                                            27586
                                 Table 2: External data sources linked to BiographySampo.
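
As an illustration of the SPARQL-based data extraction mentioned in Section 3.1, the following Python sketch builds a query request that asks for CSV results and parses the response into rows. It is only a sketch under stated assumptions: the endpoint URL and the `ex:` vocabulary are placeholders, not the actual endpoint address or data model of the service. Only the general mechanism (a `query` parameter and an `Accept: text/csv` header, as defined by the SPARQL 1.1 Protocol and the SPARQL CSV results format) is standard.

```python
import csv
import io
import urllib.parse
import urllib.request

# Hypothetical endpoint URL and vocabulary; replace with the real ones.
ENDPOINT = "https://example.org/sparql"
QUERY = """
PREFIX ex: <https://example.org/schema/>
SELECT ?person ?name WHERE {
  ?person ex:profession ex:architect ;
          ex:name ?name .
}
LIMIT 100
"""


def build_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol GET request that asks for CSV results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "text/csv"})


def parse_csv(text):
    """Turn a SPARQL CSV result into a list of dicts keyed by variable name."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_rows(endpoint, query):
    """Run the query over HTTP; requires network access to a live endpoint."""
    with urllib.request.urlopen(build_request(endpoint, query)) as response:
        return parse_csv(response.read().decode("utf-8"))
```

The rows returned by `fetch_rows` could then be written to a file and opened in a spreadsheet, R, or Gephi. Keeping request building and parsing in separate functions lets both be checked without network access.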




                                        Figure 2: Homepage of Eliel Saarinen (1873–1950).


comparing them with each other or other groups.²²

Enriching the Reading Experience. After finding a biography of interest, BiographySampo provides the user with an enriched reading view of the protagonist’s life by automatically creating a “homepage” for each person, based on 1) data linking and 2) reasoning. Fig. 2 shows as an example the homepage of Eliel Saarinen (1873–1950), a prominent Finnish architect. The page contains six (6) tabs providing different biographical views of the person, here two pages based on the NBF, data at the Linked Data Finland service, a genealogical family tree and homepage by the Geni.com service, and the Finnish Wikipedia article. The entry is linked to seven (7) external data sources on the web. On the right, recommendation links to related biographies are given, e.g., to similar biographies based on their linguistic content. On the top of the page, there are five (5) tabs providing data-analytic views of Saarinen.

Network Analysis. For example, Fig. 3 presents his egocentric network based on the links between the bios in the NBF, with a coloring scheme indicating persons of different types. The depth and other parameters of the network can be controlled by the widgets on the left. In Fig. 4, another tab visualizes the international events of Saarinen’s life on a map and on four timelines of different event types (personal life, career, artistic or scientific creations, and accolades) for a spatiotemporal analysis.

Filtering Groups for Prosopography. To support prosopography, BiographySampo employs faceted search for filtering out not only individual persons but also groups of

  ²² A short video is available on the Web illustrating the ideas of BiographySampo: https://vimeo.com/328419960.
                                 Figure 3: Egocentric network analysis of Eliel Saarinen.
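
A depth-limited egocentric network such as the one in Fig. 3 can be sketched with a breadth-first traversal over the links between biographies. The Python code below is a generic illustration, not the portal's implementation; the `links` structure and the node names are hypothetical placeholders.

```python
from collections import deque


def egocentric_network(links, ego, depth):
    """Return the nodes and edges reachable from `ego` within `depth` hops.

    `links` maps a person to the persons linked from that person's
    biography; links are followed in the stored direction only.
    """
    distance = {ego: 0}
    edges = set()
    queue = deque([ego])
    while queue:
        person = queue.popleft()
        if distance[person] == depth:
            continue  # do not expand beyond the requested depth
        for other in links.get(person, ()):
            edges.add((person, other))
            if other not in distance:
                distance[other] = distance[person] + 1
                queue.append(other)
    return distance, edges


# Hypothetical link data; the names are placeholders, not NBF content.
links = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["A"],
    "D": ["E"],
}
nodes, edges = egocentric_network(links, "A", depth=2)
```

The `depth` argument plays the role of the depth widget described in the text: increasing it widens the neighbourhood shown around the selected person.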




                        Figure 4: Spatiotemporal visualization of the events in Eliel Saarinen’s life.
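
The event-based view of a life (events in time and space in which a person participates in a role) and the four timelines of Fig. 4 can be sketched as follows. The `Event` class, its field names, and the sample events are assumptions made for illustration and do not reproduce the Bio CRM schema.

```python
from collections import defaultdict
from dataclasses import dataclass

# The four event types named for the timelines in Fig. 4.
EVENT_TYPES = ("personal life", "career", "creation", "accolade")


@dataclass(frozen=True)
class Event:
    person: str
    role: str        # role of the person in the event
    event_type: str  # one of EVENT_TYPES
    year: int
    place: str


def timelines(events):
    """Group events by type and sort each group chronologically."""
    grouped = defaultdict(list)
    for event in events:
        if event.event_type not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {event.event_type}")
        grouped[event.event_type].append(event)
    return {t: sorted(grouped[t], key=lambda e: e.year) for t in EVENT_TYPES}


# Hypothetical events for a placeholder person.
events = [
    Event("person-1", "award recipient", "accolade", 1925, "City B"),
    Event("person-1", "born", "personal life", 1880, "City A"),
    Event("person-1", "employee", "career", 1905, "City A"),
    Event("person-1", "employee", "career", 1899, "City C"),
]
lines = timelines(events)
```

Each value in `lines` is one timeline; the `place` field of the same events would supply the points for the map.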


them sharing some properties, such as profession, place of birth, place of education, working organization, etc. Once the target group has been selected, various generic data-analytic tools and visualizations can be applied to the group: 1) Statistical tools include histograms showing various numeric value distributions of the biographees, e.g., their ages, number of spouses and children, and pie charts visualizing proportional distributions of professions, societal domains, and working organizations. 2) Event maps show how different events (personal life events, career events, artistic and scientific creation events, and accolades) participated in by the biographees are distributed on maps. 3) Life charts summarize the lives of persons from a transitional perspective as blue-red arrows from the birth places
(blue end) to the places of death (red end).                    al., 2018), are stored in a separate knowledge graph of over
These tools and visualization can be applied not only to one    100 million triples.
target group but also to two parallel groups in order to com-
pare them. For example, Fig. 5 compares the generals and        4. Related Works and Contributions
admirals of the Grand Duchy of Finland (1809–1917) (on          Aside the business of publishing biographical dictionaries
the left) with the clergy (1800–1920) (on the right). With      in print and on the Web, representing and analyzing bio-
a few selections from the facets the user can see that, for     graphical data has grown into a new research and appli-
some reason, quite a few officers moved the to south to die     cation field. In 2015, the first Biographical Data in Dig-
while the Lutheran ministers stayed more in Finland. The        ital World workshop BD2015 was held presenting several
arrows are interactive. For example, by clicking on the pe-     works on studying and analyzing biographies as data (ter
culiar upper arrow to the east, one can find out that this      Braake et al., 2015), and the proceedings of BD2017 con-
arrow was due to general Gustaf A. Silfverhjelm’s (1799–        tain more similar works (Fokkens et al., 2017b).
1864) biography, where one can learn that he become a           B IOGRAPHY S AMPO is a result of research in this area and
chief cartographer in western Siberia where he died.            is related to several other works. In (Larson, 2010) analytic
Searching for Historical Places B IOGRAPHY S AMPO also          visualizations were created based on a U.S. Legislator reg-
provides the user with a map search view that projects the      istry database. The work on B IOGRAPHY S AMPO is con-
places in which the ca. 100 000 biographical events ex-         tinuation to two Semantic NBF demonstrators (Hyvönen
tracted from the biographies are projected on the places        et al., 2014; Hyvönen et al., 2018), and the idea has been
where they occurred. The maps in this view are not only         applied also to a historical registry of students (Hyvönen
contemporary ones but also historical maps served by a sep-     et al., 2017) and to the U.S. Legislator data (Miyakita et
arate historical ontology and map service Hipla.fi23 . Many     al., 2018). However, B IOGRAPHY S AMPO extends these
important events of Finnish history took place in the eastern   systems into several new directions in terms of language
parts of the country that was annexed to the Soviet Union       technology used, the DH tooling provided, such as network
after the Second World War. Old Finnish places there may        analysis views, relational search, and text analysis views
have been destroyed, placenames been changed, and names         for studying the language of the biographies. Also more
are now written in Russian. Using semi-transparent histor-      heterogeneous datasets are now studied and used.
ical maps on top of contemporary maps solves the problem        Extracting RDF and OWL data from natural language texts
by giving a better historical context for the events.           has been studied in several works in semantic web re-
Relational Knowledge Discovery To utilize reasoning and         search, cf., e.g., (Gangemi et al., 2017). In BiographyNet24
knowledge discovery in Linked Data, an application per-         (Fokkens et al., 2017a), language technology was applied
spective for finding ”interesting/serendipitous” (Aylett et     to extracting entities and relations in RDF using the bi-
al., 2012) connections in the biographical knowledge graph      ographies of the Biography Portal of the Netherlands as
was created. This application idea is related to relational     data. This work was related to the larger NewsReader
search (Lohmann et al., 2010; Tartari and Hogan, 2018).         project for extracting structured data from news (Rospocher
However, in our case a new knowledge-based approach was         et al., 2016). The work on BiographyNet focuses more on
developed to find out in what ways (groups of) people are       challenges of natural language processing and managing
related to places and areas. This method rules out non-sense    the provenance information of data from multiple sources,
relations effectively and is able to create natural language    while the focus of B IOGRAPHY S AMPO is on providing the
explanations for the connections (Hyvönen and Rantala,         end-users, both DH researchers and the general public, with
2019). The queries are formulated and the problems are          intelligent search and browsing facilities, enriched reading
solved using faceted search. For example, the query “How are Finnish artists related to Italy?” is solved by selecting “Italy” from the place facet and “artist” from the profession facet. The results include connections of different types (that could be filtered in another facet), e.g., “Elin Danielson-Gambogi received in 1899 the Florence City Art Award”. The system understands, for example, that Florence is in Italy based on the historical place ontology.
Text Analysis of Biographies The biographies can also be analyzed by using linguistic analysis, providing yet another different perspective for studying them. Both individual bios as well as groups of them can be analyzed and compared with each other as in prosopography above. For example, it turns out that the biographies of female members of the Finnish Parliament frequently contain the words “family” and “child”, but these words are seldom used in the biographies of their male colleagues. The texts, analyzed by a natural language processing pipeline (Tamper et
experience, and easy to use data-analytic tooling for biography and prosopography. Extracting and studying biographical networks has also been researched in the Six Degrees of Francis Bacon25 (Warren et al., 2016) project. The statistics views and idea of analysing the biographies as a collection of texts in BiographySampo is related to (Warren, 2018) where the ODNB is analysed as an artifact. In the latter works, Linked Data is not used.
These lines of research are related to ours as they are based on the idea of extracting semantic structures from the largely unstructured biographical text collections, and on using the data for DH research in biography and prosopography. In addition and in contrast to the related works, BiographySampo employs the “Sampo model” where the data is enriched through a shared content infrastructure by related external heterogeneous datasets, here, e.g., collection databases of museums, libraries, and archives, a
  23 http://hipla.fi
  24 http://www.biographynet.nl
  25 http://www.sixdegreesoffrancisbacon.com
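The faceted query described above (selecting “Italy” from the place facet and “artist” from the profession facet) can be illustrated with a minimal Python sketch. It is not the portal's implementation, which runs on a SPARQL endpoint. The `PART_OF` table is a hypothetical stand-in for the historical place ontology, and the `Connection` fields, the relation type labels, and the two placeholder records are assumptions made for illustration. Only the Danielson-Gambogi record comes from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical part-of links standing in for the historical place ontology.
PART_OF = {"Florence": "Italy", "Rome": "Italy", "Helsinki": "Finland"}


def is_within(place: str, region: str) -> bool:
    """True if `place` is `region` itself or reaches it by following PART_OF links."""
    current: Optional[str] = place
    seen = set()
    while current is not None and current not in seen:
        if current == region:
            return True
        seen.add(current)
        current = PART_OF.get(current)
    return False


@dataclass(frozen=True)
class Connection:
    person: str
    profession: str
    place: str
    relation_type: str  # hypothetical label for the type of connection
    description: str


CONNECTIONS = [
    # The only record taken from the text.
    Connection("Elin Danielson-Gambogi", "artist", "Florence", "award",
               "received in 1899 the Florence City Art Award"),
    # Invented placeholders that only exercise the filters.
    Connection("Placeholder Person A", "artist", "Helsinki", "residence",
               "placeholder connection"),
    Connection("Placeholder Person B", "politician", "Rome", "visit",
               "placeholder connection"),
]


def faceted_search(connections, place=None, profession=None, relation_type=None):
    """Keep the connections that satisfy every selected facet (None = facet unused)."""
    results = []
    for conn in connections:
        if place is not None and not is_within(conn.place, place):
            continue
        if profession is not None and conn.profession != profession:
            continue
        if relation_type is not None and conn.relation_type != relation_type:
            continue
        results.append(conn)
    return results


if __name__ == "__main__":
    for conn in faceted_search(CONNECTIONS, place="Italy", profession="artist"):
        print(f"{conn.person} {conn.description}")
```

The place facet matches through the hierarchy rather than by string equality, which is why a connection located in Florence is returned for the selection “Italy”.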
Figure 5: Comparing the life charts of two prosopographical target groups, admirals and generals (left) and clergy (right) of
the historical Grand Duchy of Finland (1809–1917).
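The group-level text comparison mentioned earlier (how often words such as “family” and “child” occur in the biographies of two groups) can be sketched as a document-frequency comparison. This is a simplified illustration under stated assumptions, not the paper's natural language processing pipeline. The function names are hypothetical, the toy sentences are invented and are not taken from the biographies, and tokens are matched exactly, whereas a real pipeline would likely need lemmatization.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


def document_frequency(bios, word):
    """Share of biographies in `bios` that contain `word` at least once."""
    if not bios:
        return 0.0
    hits = sum(1 for bio in bios if word in tokenize(bio))
    return hits / len(bios)


def compare_groups(group_a, group_b, words):
    """Map each word to its (group_a, group_b) document frequencies."""
    return {
        word: (document_frequency(group_a, word), document_frequency(group_b, word))
        for word in words
    }


# Invented toy sentences, used only to exercise the functions.
GROUP_A = [
    "She raised a child while serving in parliament.",
    "Her family moved to Helsinki.",
    "She balanced family life and committee work.",
]
GROUP_B = [
    "He chaired the finance committee.",
    "He served as a minister.",
    "His family owned a farm.",
]

if __name__ == "__main__":
    for word, (freq_a, freq_b) in compare_groups(GROUP_A, GROUP_B, ["family", "child"]).items():
        print(f"{word}: group A {freq_a:.2f}, group B {freq_b:.2f}")
```

Document frequency (the share of texts containing a word) is used here instead of raw counts so that a single long biography cannot dominate the comparison.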


critical edition, genealogical data, and various biographical data sources and semantic portals online. Another difference is that in our work, a main goal has been to develop and provide versatile DH tooling for end-users on top of a Linked Data SPARQL endpoint.
This paper presented and demonstrated the vision of a paradigm shift in publishing biography collections on the Semantic Web. The vision has also been operationalized and implemented as the semantic portal BiographySampo, now in use on the Web by thousands of users. The biographical data of the portal was extracted and aggregated automatically by the computer and has not been fully validated by human experts, which would be impossible due to the amount and complexity of the big data. This is a typical situation in DH research, and calls for using more source criticism when interpreting the analyses than when dealing with human-curated datasets. The quality and completeness of the BiographySampo data have not yet been analyzed formally, but our informal tests suggest that the results are very useful even if errors are also encountered. This is the price to be paid for advanced end-user services and distant reading on distributed heterogeneous biographical data.
Acknowledgements This research was part of the Severi project26, funded mainly by Business Finland. Thanks to CSC – IT Center for Science, Finland, for computational server resources for the data service and applications.
  26 http://seco.cs.aalto.fi/projects/severi

5. References
R. S. Aylett, D. S. Bental, R. Stewart, J. Forth, and G. Wiggins. 2012. Supporting serendipitous discovery. In Digital Futures (Third Annual Digital Economy Conference), 23-25 October, 2012, Aberdeen, UK.
Megan R. Brett. 2012. Topic modeling: A basic introduction. Journal of Digital Humanities, 2(1).
Martin Doerr. 2003. The CIDOC CRM—an ontological approach to semantic interoperability of metadata. AI Magazine, 24(3):75–92.
Andy Field, Jeremy Miles, and Zoe Field. 2015. Discovering Statistics Using R. SAGE Publications Inc., USA.
Antske Fokkens, Serge ter Braake, Niels Ockeloen, Piek Vossen, Susan Legêne, Guus Schreiber, and Victor de Boer. 2017a. BiographyNet: Extracting relations between people and events. In Europa baut auf Biographien, pages 193–224. New Academic Press, Wien.
Antske Fokkens, Serge ter Braake, Ronald Sluijter, Paul Arthur, and Eveline Wandl-Vogt, editors. 2017b. BD2017 Biographical Data in a Digital World 2015. CEUR Workshop Proceedings, Vol-1399.
Aldo Gangemi, Valentina Presutti, Diego Reforgiato Recupero, Andrea Giovanni Nuzzolese, Francesco Draicchio, and Misael Mongiovì. 2017. Semantic web machine reading with FRED. Semantic Web Journal, 8:873–893.
Tom Heath and Christian Bizer. 2011. Linked Data: Evolving the Web into a Global Data Space (1st edition). Synthesis Lectures on the Semantic Web: Theory and Technology. Morgan & Claypool.
Eero Hyvönen and Heikki Rantala. 2019. Knowledge-based relation discovery in cultural heritage knowledge graphs. In DHN2019, Digital Humanities in the Nordic Countries 2019. CEUR Workshop Proceedings, Vol-2364.
Eero Hyvönen, Miika Alonen, Esko Ikkala, and Eetu Mäkelä. 2014. Life stories as event-based linked data: Case semantic national biography. In Proceedings of ISWC 2014 Posters & Demonstrations Track. CEUR Workshop Proceedings, Vol-1272.
Eero Hyvönen. 2012. Publishing and using cultural heritage linked data on the semantic web. Morgan & Claypool, Palo Alto, CA.
Eero Hyvönen, Petri Leskinen, Erkki Heino, Jouni Tuominen, and Laura Sirola. 2017. Reassembling and enriching the life stories in printed biographical registers: Norssi high school alumni on the semantic web. In Language, Technology and Knowledge, pages 113–119. Springer–Verlag.
Eero Hyvönen, Petri Leskinen, Minna Tamper, Jouni Tuominen, and Kirsi Keravuori. 2018. Semantic national biography of Finland. In Proceedings of the Digital Humanities in the Nordic Countries, 3rd Conference (DHN 2018), pages 372–385. CEUR Workshop Proceedings, Vol-2084.
Eero Hyvönen, Petri Leskinen, Minna Tamper, Heikki Rantala, Esko Ikkala, Jouni Tuominen, and Kirsi Keravuori. 2019. BiographySampo – publishing and enriching biographies on the semantic web for digital humanities research. In Proceedings of the 16th Extended Semantic Web Conference (ESWC 2019). Springer–Verlag.
Thomas Keith. 2004. Changing conceptions of National Biography. Cambridge University Press.
Mikko Koho, Esko Ikkala, Erkki Heino, and Eero Hyvönen. 2018. Maintaining a linked data cloud and data service for Second World War history. In Digital Heritage. Progress in Cultural Heritage: Documentation, Preservation, and Protection. 7th International Conference, EuroMed 2018, Nicosia, Cyprus, volume 11196. Springer–Verlag.
Mikko Koho, Esko Ikkala, and Eero Hyvönen. 2019. Reassembling the lives of Finnish prisoners of the Second World War on the semantic web. In BD-2019, Biographical Data in a Digital World 2019. CEUR Workshop Proceedings, http://ceur-ws.org. Accepted.
Ray Larson. 2010. Bringing lives to light: Biography in context. Final Project Report, University of Berkeley, http://metadata.berkeley.edu/Biography_Final_Report.pdf.
Patrick Le Boeuf, Martin Doerr, Christian Emil Ore, and Stephen Stead, editors. 2019. Definition of the CIDOC Conceptual Reference Model, Version 6.2.6. ICOM/CIDOC Documentation Standards Group (CIDOC CRM Special Interest Group). http://www.cidoc-crm.org/Version/version-6.2.6.
Steffen Lohmann, Philipp Heim, Timo Stegemann, and Jürgen Ziegler. 2010. The RelFinder user interface: Interactive exploration of relationships between objects of interest. In Proceedings of the 14th International Conference on Intelligent User Interfaces (IUI 2010), pages 421–422. ACM.
Eetu Mäkelä, Kaisa Hypén, and Eero Hyvönen. 2011. BookSampo—lessons learned in creating a semantic portal for fiction literature. In The Semantic Web – ISWC 2011, pages 173–188. Springer–Verlag.
Goki Miyakita, Petri Leskinen, and Eero Hyvönen. 2018. U.S. congress prosopographer - a tool for prosopographical research of legislators. In 7th International Conference, EuroMed 2018, Nicosia, Cyprus. Springer–Verlag.
Mark Newman. 2018. Networks. Oxford University Press.
Marco Rospocher, Marieke van Erp, Piek Vossen, Antske Fokkens, Itziar Aldabe, German Rigau, Aitor Soroa, Thomas Ploeger, and Tessel Bogaard. 2016. Building event-centric knowledge graphs from news. Web Semantics: Science, Services and Agents on the World Wide Web, 37:132–151.
Kathryn Shultz. June 24, 2011. What is distant reading? New York Times. https://www.nytimes.com/2011/06/26/books/review/the-mechanic-muse-what-is-distant-reading.html accessed: 13 August 2018.
Johan V. Snellman. 2002–2004. J. V. Snellman: Kootut teokset 1–24. Ministry of Education and Culture, Helsinki, Finland.
Minna Tamper, Petri Leskinen, Kasper Apajalahti, and Eero Hyvönen. 2018. Using biographical texts as linked data for prosopographical research and applications. In 7th International Conference, EuroMed 2018, Nicosia, Cyprus, Proceedings, Part I, pages 125–137. Springer–Verlag.
Minna Tamper, Eero Hyvönen, and Petri Leskinen. 2019. Visualizing and analyzing networks of named entities in biographical dictionaries for digital humanities research. In Proceedings of the 20th International Conference on Computational Linguistics and Intelligent Text Processing (CICling 2019). Springer–Verlag, April. Accepted.
Gonzalo Tartari and Aidan Hogan. 2018. WiSP: Weighted shortest paths for RDF graphs. In Proceedings of VOILA 2018. CEUR Workshop Proceedings, Vol-2187.
Serge ter Braake, Ronald Sluijter, Antske Fokkens, Thierry Declerck, and Eveline Wandl-Vogt, editors. 2015. BD2015, Biographical Data in a Digital World 2015. CEUR Workshop Proceedings, Vol-1399.
Jouni Tuominen, Eero Hyvönen, and Petri Leskinen. 2018. Bio CRM: A data model for representing biographical data for prosopographical research. In BD-2017, Biographical Data in a Digital World 2017, pages 59–66. CEUR Workshop Proceedings, Vol-2119.
Koenraad Verboven, Myriam Carlier, and Jan Dumolyn. 2007. A short manual to the art of prosopography. In Prosopography approaches and applications. A handbook, pages 35–70. Unit for Prosopographical Research (Linacre College).
Christopher N. Warren, Daniel Shore, Jessica Otis, Lawrence Wang, Mike Finegold, and Cosma Shalizi. 2016. Six Degrees of Francis Bacon: A Statistical Method for Reconstructing Large Historical Social Networks. DHQ: Digital Humanities Quarterly, 10(3).
Christopher N. Warren. 2018. Historiography’s two voices: Data infrastructure and history at scale in the Oxford Dictionary of National Biography (ODNB). Journal of Cultural Analytics, November 22. doi:10.31235/osf.io/rbkdh.