<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Crete, Hersonissos, Greece
minna.tamper@aalto.fi (M. Tamper); rafael.leal@aalto.fi (R. Leal); laura.sinikallio@helsinki.fi (L. Sinikallio);
petri.leskinen@aalto.fi (P. Leskinen); jouni.tuominen@helsinki.fi (J. Tuominen); eero.hyvonen@aalto.fi
(E. Hyvönen)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Extracting Knowledge from Parliamentary Debates for Studying Political Culture and Language</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Minna Tamper</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rafael Leal</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Laura Sinikallio</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Petri Leskinen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jouni Tuominen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eero Hyvönen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Aalto University (Semantic Computing Research Group - SeCo)</institution>
          ,
          <addr-line>Finland. https://seco.cs.aalto.fi</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Helsinki (HELDIG and HSSH)</institution>
          ,
          <addr-line>Finland. https://heldig.fi</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>This paper presents knowledge extraction and natural language processing methods used to enrich the knowledge graph of the plenary debates (textual transcripts of speeches) of the Parliament of Finland. This knowledge graph includes some 960 000 speeches (1907–2021) interlinked with a prosopographical knowledge graph about the politicians. Named entities and topical keywords were extracted from a recent subset of the speeches to support semantic search, browsing, and analysis of the data. The process is based on linguistic analysis, named entity linking, and automatic subject indexing. The results were added to the ParliamentSampo knowledge graph, which is available through a SPARQL endpoint. This data can be used for studying parliamentary language and culture in Digital Humanities research and for developing applications, such as the ParliamentSampo portal.</p>
      </abstract>
      <kwd-group>
        <kwd>parliamentary studies</kwd>
        <kwd>natural language processing</kwd>
        <kwd>linked data</kwd>
        <kwd>digital humanities</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Parliaments enact new laws, oversee the work of the government, and decide on the state
budget. Parliamentary data are used in many areas of research [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], as they provide a wealth of
information on the state and functioning of democratic systems, on political life and, more generally,
on language and culture. For these reasons, many parliamentary materials have been digitized
in recent decades [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Digitized parliamentary materials offer a wide range of perspectives on
different research topics and have been used in a variety of fields, such as linguistics, political
science, economics, and history. The most important research material for parliamentary studies is
the debates in the parliaments, i.e., sequences of transcribed speeches (minutes) of Members
of Parliament (MPs) and other politicians, through which one can study both the language itself and its
changes and the underlying societal phenomena at large [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        This paper argues and shows that by enriching textual parliamentary speeches with linked
data using knowledge extraction methods [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], it is possible to support Digital Humanities
(DH) research and enhance the usability of the data in applications, such as semantic search,
browsing, and data analysis. As a case study, a subset of the ca. 960 000 speeches of the system
ParliamentSampo – Finnish Parliament on the Semantic Web [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ] is used. In earlier work,
the speeches covering the whole history of the Parliament of Finland (PoF), 1907–2021, were
extracted from the original heterogeneous data sources and transformed into a speech knowledge
graph (S-KG) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] (and also into the Parla-CLARIN format<sup>1</sup>). At the same time, the S-KG was
interlinked with a prosopographical KG (P-KG) representing detailed biographical data and
networks of the ca. 2800 MPs and politicians involved in the PoF activities [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Both graphs
were published as a Linked Open Data (LOD) service on the Linked Data Finland platform LDF.fi [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], including a
SPARQL endpoint<sup>2</sup>. In this paper, the textual speeches of this ParliamentSampo dataset are
enriched further using knowledge extraction techniques in order to support DH [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] analysis
and the further development of the semantic portal ParliamentSampo on top of the endpoint.
      </p>
      <p>We first briefly review related work (Section 2), then describe the speech data, and
focus on the new data enrichments made using Natural Language
Processing (NLP) methods (Section 3). Section 4 discusses how the new data can be utilized
in the ParliamentSampo portal. Lastly, the contributions of this work are summarized and
discussed (Section 5).</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Regarding the digitization of parliamentary data, plenary debates have played a central role;
see, e.g., [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and the CLARIN list of parliamentary corpora<sup>3</sup> in different countries. Parliamentary
materials have also been transformed into linked data. A prominent example of this is the
LinkedEP [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] system on the European Parliament’s data. Linked data has also been used for the
Italian Parliament<sup>4</sup> and in LinkedSaeima for the Latvian Parliament [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], in addition to the
Finnish ParliamentSampo system [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ], whose data [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ] was re-used in this paper.
      </p>
      <p>
        Knowledge extraction has been applied to enrich datasets to enable distant reading approaches
to studying parliamentary debates. For example, the Latvian LinkedSaeima dataset has utilized
named entity linking (NEL) to enrich its metadata. Similarly, the Dutch parliamentary debates
dataset [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] has been enriched with named entities (NEs). The Slovenian siParl corpus [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]
includes linguistic information about the parliamentary debates in CoNLL-U format.
      </p>
      <p>
        The NLP methods used in this work have been developed mainly for handling Finnish texts.
With respect to NER, some of the most relevant tools are StanfordNER [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], FiNER [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] and
the FinBERT-based NER tool, of which the last is currently estimated to be the most
accurate [
        <xref ref-type="bibr" rid="ref18 ref19 ref20">18, 19, 20</xref>
        ]. In addition to the Turku Neural parser, there are other morphological analyzers for Finnish,
such as Voikko and uralicNLP, both used in this paper; the latter employs Omorfi [21]. Few
entity linking tools are available for Finnish; one of them is ARPA [22]. For other languages, various
tools have been created to link NEs to different datasets, such as [23, 24, 25].
      </p>
      <p><sup>1</sup> https://github.com/clarin-eric/parla-clarin
<sup>2</sup> The data will be published openly using the CC BY 4.0 license by the end of 2022.
<sup>3</sup> https://www.clarin.eu/resource-families/parliamentary-corpora
<sup>4</sup> http://data.camera.it</p>
      <p>
        In Finland, parliamentary materials have been digitized and utilized to some extent in DH and
social science research. For example, [26] examines the differences in political speech between
parties throughout the parliamentary period 1907–2018. In [27], the content of the plenary
speeches given in Parliament in 1999–2014 was studied using topic modeling, and the debates
were also examined in [28]. However, the data have so far been used in only a few studies that
deploy methods from corpus linguistics, language technology, or computer science [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>Previous search applications for the Finnish parliamentary speech data are based mostly on
traditional text search. However, search applications have been developed for other digitized
and enriched Cultural Heritage datasets [29, 30]. Few data analysis tools are available for examining
the results; one example is the concordance analysis tool of the Language Bank of Finland<sup>5</sup>, which
shows the words in their textual contexts along with some statistics on their occurrences in the
search results. The Language Bank’s tool includes many corpora, among them one small corpus covering
only a part of the entire time series of the Finnish parliamentary speeches.</p>
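      <p>The keyword-in-context (KWIC) idea behind such concordance views can be sketched in a few lines of Python. This is a minimal illustration under simple assumptions (whitespace tokenization, case-insensitive exact matching after stripping punctuation), not the Language Bank’s implementation; the function name kwic and the example sentences are hypothetical.</p>
      <preformat>
```python
import re


def kwic(texts: list[str], keyword: str, width: int = 3) -> list[tuple[str, str, str]]:
    """Return (left context, hit, right context) triples for every match.

    Tokens are split on whitespace; punctuation is stripped before a
    case-insensitive comparison with the keyword.
    """
    target = keyword.lower()
    hits = []
    for text in texts:
        tokens = text.split()
        for i, token in enumerate(tokens):
            if re.sub(r"\W", "", token).lower() == target:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                hits.append((left, token, right))
    return hits


speeches = ["Arvoisa puhemies, Ruotsi on tärkeä kumppani.", "Suhteet Ruotsiin ovat hyvät."]
for left, hit, right in kwic(speeches, "Ruotsi"):
    print(f"{left:>25} [{hit}] {right}")
```
      </preformat>
      <p>Only the first speech yields a hit: the inflected form Ruotsiin (’to Sweden’) does not match the keyword, which illustrates why plain string matching is a weak basis for searching Finnish text.</p>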
    </sec>
    <sec id="sec-3">
      <title>3. Datasets and Knowledge Extraction</title>
      <sec id="sec-3-1">
        <title>3.1. Core Datasets</title>
        <p>
          The ParliamentSampo system includes data about the MPs, parliamentary speeches, and
political organizations within the PoF. The data also covers the comments of the Speaker
(President) of the PoF and all other small comments recorded in the minutes, e.g., in connection
with voting proceedings. The ParliamentSampo data contains two major parts:
1. The Prosopographic Knowledge Graph. The Prosopographic Knowledge Graph [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]
covers all MPs of Finland since the year 1907. At its core lies an RDF conversion of data about MPs
from the Open Data service<sup>6</sup> of the PoF, where the data were originally in XML format. In addition to basic information,
such as times and places of birth and death, the data includes detailed information about
politicians’ life events, such as studies, working life, political career, and their written publications.
In addition to people, the graph contains information about organizations, professions, and
positions, as well as places. Organizations include, e.g., parties, ministries, parliamentary groups,
committees, and constituencies, as well as schools, organizations, and companies outside the
political community.
        </p>
        <p>
          2. The Parliamentary Speeches Knowledge Graph. The knowledge graph of parliamentary
speeches contains speeches collected from all the minutes of the plenary sessions of the PoF
since 1907 [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. This knowledge graph was compiled from the documents available on the Open
Data services<sup>7</sup> and web sites<sup>8</sup> of the PoF. Depending on the time period they covered, the
documents were available in different formats: PDF, HTML, or XML. PDF documents were
transformed into text with OCR.
        </p>
        <p>In addition to the actual speeches, the speech graph contains all the relevant metadata
attached to the minutes, such as interjections, information about the session where the speech
was given (time, date, serial number, etc.), speaker information (name, role, party), the possible
topic of discussion, and supporting documents (e.g., committee reports). Based on the metadata,
the speeches were linked to the P-KG of MPs. For example, speakers and the parties they represent
are resources with URI identifiers described in the P-KG.</p>
        <p><sup>5</sup> https://www.kielipankki.fi/support/access/
<sup>6</sup> https://avoindata.eduskunta.fi/#/fi/dbsearch
<sup>7</sup> https://avoindata.eduskunta.fi/#/fi/home
<sup>8</sup> https://www.eduskunta.fi/fi/Sivut/default.aspx</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Knowledge Extraction</title>
        <p>In our work, the speech knowledge graph was enriched using various NLP methods. The toolset
that was used in enriching the BiographySampo dataset [31, 32] was re-used together with
new methods for NER, lemmatization, and automatic subject indexing. The parliamentary
debates were enriched through named entity recognition (NER) and linking and through subject indexing,
and a linguistic knowledge graph containing linguistic details of the speeches was created.
Here, NLP methods were applied to a subset of the speeches dataset, consisting of speeches from
the 2015 parliamentary session to the end of the 2021 parliamentary session, totaling a little over
114 000 speeches, or about 12% of the speeches dataset.</p>
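        <p>As a quick consistency check of the share given above, using the rounded totals from the text:</p>
        <preformat>
```latex
\frac{114\,000}{960\,000} \approx 0.119 \approx 12\,\%
```
        </preformat>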
        <p>Lemmatization and Subject Indexing. The Secompling library<sup>9</sup> was used for the tasks of
lemmatization and subject indexing. It is a library under development that aims at integrating
different Finnish NLP tools.</p>
        <p>Lemmatization can be seen as a kind of text normalization, which is especially important for a language
as morphologically rich as Finnish, with its 15 inflectional cases and rich system of
derivation. Lemmatization enables exact term-based search instead of wildcard-based search or
stemming. It also lets word count-based algorithms, such as TF-IDF, work with more precision,
since the inflected forms of a word are counted as one term. Secompling employs the Turku Neural
parser pipeline [33, 34] for lemmatization, and Voikko<sup>10</sup> and uralicNLP [35] to check and,
where needed, fix errors in these base forms. The Secompling lemmatization module has not been formally evaluated yet.</p>
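        <p>To illustrate why lemmatization matters for count-based methods such as TF-IDF, the following Python sketch compares document frequencies (the basis of the IDF term) computed from surface forms and from base forms. The lemma table is a hypothetical stand-in for the output of a real lemmatizer such as the Turku Neural parser pipeline; it is not Secompling’s interface.</p>
        <preformat>
```python
from collections import Counter

# Hypothetical lemma table standing in for a real lemmatizer's output
# (e.g. the Turku Neural parser pipeline); it covers only the toy data below.
LEMMAS = {"ruotsin": "Ruotsi", "ruotsiin": "Ruotsi", "ruotsissa": "Ruotsi"}


def tokens(text: str, lemmatize: bool) -> list[str]:
    """Lower-case, strip punctuation, and optionally map words to base forms."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return [LEMMAS.get(w, w) for w in words] if lemmatize else words


def document_frequency(texts: list[str], lemmatize: bool) -> Counter:
    """Number of texts in which each term occurs (the basis of IDF)."""
    return Counter(term for text in texts for term in set(tokens(text, lemmatize)))


speeches = ["Yhteistyö Ruotsin kanssa jatkuu.", "Matkustin Ruotsiin.", "Ruotsissa on vaalit."]
print(document_frequency(speeches, lemmatize=False)["ruotsin"])  # prints 1
print(document_frequency(speeches, lemmatize=True)["Ruotsi"])    # prints 3
```
        </preformat>
        <p>With surface forms, the three mentions of Sweden are split over three rare terms; after lemmatization they form one term that occurs in every speech, which changes its TF-IDF weight accordingly.</p>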
        <p>Subject indexing allows texts to be described succinctly by focusing on keywords that best
characterize their contents. In our work, the subject indexing tool Annif [36], developed by the
National Library of Finland, is used for this task. As Annif is capable of using
machine-learning-based correlational backends such as Parabel, it may sometimes suggest NEs that are
not mentioned in the texts. Since we focus on entities that are actually mentioned in the speeches, NEs
suggested by Annif were ignored. The remaining subject keywords were filtered according to their weight.
The keywords provided by Annif are concepts from the General Finnish Ontology YSO<sup>11</sup>, which is
part of the national Finnish LOD infrastructure [37], with ready-to-use URIs for data linking.</p>
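        <p>The two filtering steps described above (dropping named-entity suggestions, then dropping low-weight ones) can be sketched as follows. The sketch assumes suggestions shaped like the uri/label/score objects returned by Annif’s REST API; the threshold value, the placeholder URIs, and the is_named_entity predicate are hypothetical, since the text gives neither the actual weight cut-off nor how NE concepts were recognized.</p>
        <preformat>
```python
from typing import Callable


def filter_keywords(
    suggestions: list[dict],
    is_named_entity: Callable[[str], bool],
    min_score: float = 0.2,  # hypothetical cut-off; the actual weight threshold is not given
) -> list[dict]:
    """Drop NE suggestions and low-weight keywords, best-scoring first."""
    kept = [
        s for s in suggestions
        if not is_named_entity(s["uri"]) and s["score"] >= min_score
    ]
    return sorted(kept, key=lambda s: s["score"], reverse=True)


# Placeholder URIs; real suggestions would carry YSO concept URIs.
suggestions = [
    {"uri": "ex:kunnat", "label": "kunnat", "score": 0.5},
    {"uri": "ex:helsinki", "label": "Helsinki", "score": 0.7},
    {"uri": "ex:poliitikot", "label": "poliitikot", "score": 0.8},
    {"uri": "ex:uinti", "label": "uinti", "score": 0.05},
]
places = {"ex:helsinki"}
print([s["label"] for s in filter_keywords(suggestions, lambda uri: uri in places)])
# prints ['poliitikot', 'kunnat']
```
        </preformat>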
        <p>A total of 10 467 distinct keywords were identified for this dataset, with an average of 23.23 keywords
per text, a maximum of 78, a minimum of 1, and a standard deviation of 8.46. The most common
subjects were poliitikot ’politicians’ (in around 49% of the texts), ministerit ’ministers’ (ca. 45%),
kunnat ’municipalities’ (ca. 42%), and lainsäädäntö ’legislation’ (ca. 39%).</p>
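        <p>Summary figures like these (mean, extremes, standard deviation, and the share of texts carrying a subject) can be computed from per-text keyword sets. A minimal Python sketch with hypothetical toy data; the text does not say whether the population or the sample standard deviation was used, so the population version is assumed here.</p>
        <preformat>
```python
import statistics


def keyword_stats(keyword_sets: list[set[str]]) -> dict[str, float]:
    """Mean, extremes and (population) standard deviation of keywords per text."""
    counts = [len(ks) for ks in keyword_sets]
    return {
        "mean": statistics.mean(counts),
        "max": max(counts),
        "min": min(counts),
        "stdev": statistics.pstdev(counts),
    }


def subject_share(keyword_sets: list[set[str]], subject: str) -> float:
    """Fraction of texts whose keywords include the given subject."""
    return sum(subject in ks for ks in keyword_sets) / len(keyword_sets)


texts = [{"poliitikot", "kunnat"}, {"poliitikot"}, {"ministerit", "kunnat", "lainsäädäntö"}, {"kunnat"}]
print(keyword_stats(texts))
print(subject_share(texts, "poliitikot"))  # prints 0.5
```
        </preformat>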
        <p>
          Named Entity Recognition and Linking. NEL was performed on the speeches to improve
data browsing and searching in the ParliamentSampo portal. Similarly to the analytics done
for the textual biographies in the BiographySampo [32] system, NEL enables more detailed data
analytics in the ParliamentSampo dataset, too. NEs were extracted using the Nelli [38, 39]
tool and linked using ARPA. Unlike in the BiographySampo dataset, here Nelli
was configured to use FinBERT’s combined NER model [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], Reksi [38], and the Turku Neural
parser pipeline. FinBERT’s NER tool is coupled with Reksi, which picks up references to legislation,
dates, and identifiers (e.g., URLs). The Turku Neural parser was selected for
morphological analysis based on its performance [33]. The extracted entities were then
linked with ARPA to the ParliamentSampo dataset and to other external datasets, such as
the Kanto ontology of Finnish actors<sup>12</sup>, the Place Name Register (PNR) ontology of contemporary
Finnish places<sup>13</sup>, and the YSO places<sup>14</sup> ontology, which also contains historical places mentioned
in the speeches.
        </p>
        <p><sup>9</sup> https://version.aalto.fi/gitlab/seco/secompling
<sup>10</sup> https://voikko.puimula.org/
<sup>11</sup> https://finto.fi/yso/fi/new?clang=en</p>
        <p>The NER tools extracted NEs from 89% of the speeches; of these, 30% contained mentions of people,
19% mentions of time, 12% organizations, and 7% places. However, the linking of entities still
requires some work. For example, full-name references to MPs were linked, whereas surname-only
references were not. Place mentions were mostly linked correctly; however, the target ontologies
lacked some of the mentioned place names, such as Wuhan.</p>
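        <p>Both linking gaps just described already show up in the simplest possible linker, a case-insensitive exact label lookup. The Python sketch below is not ARPA’s algorithm; the gazetteer, its placeholder URIs, and the MP name Matti Meikäläinen (a Finnish placeholder name) are hypothetical.</p>
        <preformat>
```python
from typing import Optional


def build_index(entities: dict[str, list[str]]) -> dict[str, str]:
    """Map every lower-cased label of an entity to the entity's URI."""
    return {label.lower(): uri for uri, labels in entities.items() for label in labels}


def link(mentions: list[str], index: dict[str, str]) -> dict[str, Optional[str]]:
    """Exact label lookup: a mention links only if its full form is a known label."""
    return {m: index.get(m.lower()) for m in mentions}


# Hypothetical gazetteer: placeholder URIs and a placeholder MP name.
index = build_index({
    "ex:mp/meikalainen": ["Matti Meikäläinen"],
    "ex:place/helsinki": ["Helsinki"],
})
print(link(["Matti Meikäläinen", "Meikäläinen", "Helsinki", "Wuhan"], index))
```
        </preformat>
        <p>The surname-only mention stays unlinked because only the full name is a label, and Wuhan stays unlinked because it is absent from the gazetteer, mirroring the two error types above.</p>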
        <p>Morpholinguistic Knowledge Graph. Lastly, the speeches were transformed into a separate
morpholinguistic knowledge graph (MLKG) containing detailed linguistic and morphological
information about the speeches using a pipeline previously used for BiographySampo [40]. This
graph can be used for linguistic analysis of the parliamentary speeches similarly to the work
done in BiographySampo [32]. For example, in the BiographySampo dataset, it was noticed that
biographies of women contained more family-related terminology while biographies about men
used more words related to war and religion. To apply the same methods to parliamentary
speeches, a similar pipeline was used, updated to use the Turku Neural parser pipeline, and
adjusted to handle larger datasets in smaller chunks of text. In this case, it was configured to
process data by year. The results are also linked to the ParliamentSampo speeches dataset to
enable analysis of speeches using the speech metadata.</p>
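        <p>Processing the corpus one year at a time can be sketched with a streaming group-by. The record layout (a dict with an ISO date string) and the parse_batch callback, which stands in for a call to the linguistic pipeline, are hypothetical; the sketch also assumes the speeches arrive sorted by date, because itertools.groupby only groups consecutive items.</p>
        <preformat>
```python
from itertools import groupby
from typing import Callable, Iterable, Iterator


def chunks_by_year(speeches: Iterable[dict]) -> Iterator[tuple[int, list[dict]]]:
    """Yield (year, speeches) batches; assumes the input is sorted by date."""
    for year, group in groupby(speeches, key=lambda s: s["date"][:4]):
        yield int(year), list(group)


def process_by_year(speeches: Iterable[dict], parse_batch: Callable[[list[dict]], object]) -> dict:
    """Run a batch parser once per year, so only one year's speeches are materialized at a time."""
    return {year: parse_batch(batch) for year, batch in chunks_by_year(speeches)}


speeches = [
    {"date": "2019-05-02", "text": ""},
    {"date": "2019-06-01", "text": ""},
    {"date": "2020-01-15", "text": ""},
]
print(process_by_year(speeches, len))  # prints {2019: 2, 2020: 1}
```
        </preformat>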
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Using the Enriched Data in ParliamentSampo</title>
      <p>
        The enriched ParliamentSampo data is used in the development of the ParliamentSampo
portal [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which is based on the Sampo model [41] and the Sampo-UI framework [42]. The portal
demonstrates how the data service can be used for developing applications for DH research. In
this application, the data can be browsed using ontology-based faceted search, and the results
can then be analyzed with the integrated visualization and data analysis tools.
      </p>
      <p>The enriched data is initially used to boost the browsing and searching capabilities of the
portal. For instance, NEs and keywords can be used via facets to find speeches that mention
a specific topic or a NE, such as a place or an organization. Coupled with the speeches, their
metadata, and the prosopographical data, this enables studying, e.g., how MPs talk about matters
related to their constituency. At the moment, mentioned organizations and places have already
been added to the portal as facets for testing the data.</p>
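      <p>The facet logic itself (counting the values shown next to each facet and narrowing the result set by the selected values) can be sketched over in-memory records. This Python sketch is only an illustration with hypothetical field names and data, not Sampo-UI’s implementation.</p>
      <preformat>
```python
from collections import Counter


def facet_counts(speeches: list[dict], facet: str) -> Counter:
    """How many speeches mention each value of a facet (each speech counted once)."""
    return Counter(value for s in speeches for value in set(s[facet]))


def apply_facets(speeches: list[dict], selected: dict[str, list[str]]) -> list[dict]:
    """Keep the speeches that mention every selected value of every facet."""
    return [
        s for s in speeches
        if all(set(values).issubset(s[facet]) for facet, values in selected.items())
    ]


speeches = [
    {"id": 1, "places": ["Tampere", "Helsinki"], "organizations": ["Kela"]},
    {"id": 2, "places": ["Tampere"], "organizations": []},
    {"id": 3, "places": ["Oulu"], "organizations": ["Kela"]},
]
print(facet_counts(speeches, "places").most_common(1))  # prints [('Tampere', 2)]
print([s["id"] for s in apply_facets(speeches, {"places": ["Tampere"], "organizations": ["Kela"]})])
# prints [1]
```
      </preformat>
      <p>Here, values selected within one facet are combined with AND; an OR semantics within a facet would test whether any selected value occurs instead.</p>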
      <p>Currently, the MLKG about the speeches is limited to a few years. It is not yet included in the
ParliamentSampo portal, but we plan to add it and develop linguistic analysis views similar to those
in BiographySampo. This will enable, e.g., studying the vocabulary used by MPs and parties in
their speeches, as well as comparing the vocabularies of men and women.</p>
      <p><sup>12</sup> https://finto.fi/finaf/fi/
<sup>13</sup> https://www.ldf.fi/dataset/pnr/
<sup>14</sup> https://finto.fi/yso-paikat/en/</p>
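      <p>One common way to compare the vocabularies of two speaker groups, for example speeches by women and by men, is a smoothed log ratio of relative lemma frequencies. The Python sketch below illustrates that general technique under assumed inputs (speeches already reduced to lemma lists, as the MLKG would allow); it is not the method used in BiographySampo, and the toy data is hypothetical.</p>
      <preformat>
```python
import math
from collections import Counter


def log_ratio(group_a: list[list[str]], group_b: list[list[str]], smoothing: float = 1.0) -> dict[str, float]:
    """Smoothed log2 ratio of relative lemma frequencies between two groups.

    Positive scores mark lemmas more typical of group A, negative ones of group B.
    """
    counts_a = Counter(lemma for doc in group_a for lemma in doc)
    counts_b = Counter(lemma for doc in group_b for lemma in doc)
    vocab = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values()) + smoothing * len(vocab)
    total_b = sum(counts_b.values()) + smoothing * len(vocab)
    return {
        lemma: math.log2(((counts_a[lemma] + smoothing) / total_a)
                         / ((counts_b[lemma] + smoothing) / total_b))
        for lemma in vocab
    }


# Hypothetical lemmatized speeches of two speaker groups.
speakers_a = [["perhe", "koulu"], ["perhe"]]
speakers_b = [["puolustus", "koulu"]]
scores = log_ratio(speakers_a, speakers_b)
print(sorted(scores, key=scores.get, reverse=True))  # prints ['perhe', 'koulu', 'puolustus']
```
      </preformat>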
      <p>In addition, the SPARQL endpoint underlying the ParliamentSampo portal can be used for
querying, analyzing, and visualizing the enriched data. In Fig. 1, e.g., the speeches mentioning
Finland’s neighbouring countries Norway, Sweden, Estonia, and Russia are counted on a yearly
basis and plotted from 2015 to 2021. The plot shows that Sweden appears more frequently than
the other neighbours and that mentions of Russia increase in 2020 and 2021. Based on an initial
analysis of the speeches mentioning Russia and Sweden, the discussions and their frequencies
are related to topics such as the annexation of Crimea, managing good relations, and the defence and
security of Finland and its nearby areas, such as the Baltic Sea. These mentions reflect the
parliament’s order of business and the domestic and world events covered in the media at the
time. Studying the context of these mentions in more detail remains future work. Similarly,
linking to place ontologies makes it possible to exploit their structured information to
create visualizations that cluster, e.g., all Russia-related place names as mentions of Russia.
It also enables map-based visualizations.</p>
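      <p>The yearly counts in Fig. 1 and the clustering of place names under their country can be illustrated with a small Python sketch. It works on in-memory records rather than querying the SPARQL endpoint; the record fields, the place hierarchy, and the data are hypothetical (Finnish names: Venäjä = Russia, Ruotsi = Sweden, Viro = Estonia, Norja = Norway).</p>
      <preformat>
```python
from collections import Counter

# Hypothetical fragment of a place hierarchy (place -> broader place). In practice
# this information would come from the linked place ontologies.
BROADER = {"Pietari": "Venäjä", "Murmansk": "Venäjä", "Tukholma": "Ruotsi",
           "Tallinna": "Viro", "Oslo": "Norja"}


def country_of(place: str) -> str:
    """Follow broader links up to a top-level place (assumes no cycles)."""
    while place in BROADER:
        place = BROADER[place]
    return place


def yearly_mentions(speeches: list[dict], countries: set[str]) -> Counter:
    """Count speeches per (year, country) that mention the country or a place in it."""
    counts: Counter = Counter()
    for speech in speeches:
        mentioned = {country_of(p) for p in speech["places"]}
        for country in mentioned.intersection(countries):
            counts[(speech["year"], country)] += 1  # each speech counts once per country
    return counts


speeches = [
    {"year": 2020, "places": ["Venäjä", "Pietari"]},
    {"year": 2020, "places": ["Tukholma"]},
    {"year": 2021, "places": ["Murmansk", "Helsinki"]},
]
neighbours = {"Norja", "Ruotsi", "Viro", "Venäjä"}
print(sorted(yearly_mentions(speeches, neighbours).items()))
# prints [((2020, 'Ruotsi'), 1), ((2020, 'Venäjä'), 1), ((2021, 'Venäjä'), 1)]
```
      </preformat>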
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>
        In this paper, we presented work for enriching the Finnish parliamentary debate corpus to
support browsing the data and using it for DH research. This is ongoing work that still requires
adjustments and extensive evaluation; likewise, the ParliamentSampo portal is still under
development. The tools used in the enrichment have been previously evaluated with different
corpora, but not with the parliamentary data. The FinBERT NER tool has achieved an accuracy
of 93.11% using the combined model in cross-corpus evaluation [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. Similarly, the Turku
Neural Parser pipeline was evaluated in the CoNLL 2018 UD Shared Task [43], reaching
LAS<sup>15</sup> 86.60%, UPOS<sup>16</sup> 96.66%, and XPOS<sup>17</sup> 97.63% [33]. Subject indexing is difficult to evaluate;
however, evaluations of the Annif tool report an accuracy of 30–50%, depending on the test
corpus [44]. These results were obtained on formal Finnish texts similar to the
Finnish parliamentary debates corpus.
      </p>
      <p><sup>15</sup> Labeled attachment score (LAS) is the proportion of words that are assigned both the correct head word and
the correct dependency relation.</p>
      <p><sup>16</sup> Universal part-of-speech tagging
<sup>17</sup> Language-specific part-of-speech tagging</p>
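      <p>Following the LAS definition above, the two kinds of scores can be sketched in Python. This is a simplified illustration assuming gold tokenization; the official CoNLL 2018 evaluation additionally aligns system and gold tokens from raw text and reports F1 scores, so its figures are not computed exactly this way. The Token type and the toy annotations are hypothetical.</p>
      <preformat>
```python
from typing import NamedTuple


class Token(NamedTuple):
    upos: str    # universal part-of-speech tag
    head: int    # index of the head word (0 = root)
    deprel: str  # dependency relation to the head


def _check(gold: list[Token], pred: list[Token]) -> None:
    if len(gold) != len(pred) or not gold:
        raise ValueError("expects the same non-empty tokenization for gold and prediction")


def upos_accuracy(gold: list[Token], pred: list[Token]) -> float:
    """Share of words with the correct universal POS tag."""
    _check(gold, pred)
    return sum(g.upos == p.upos for g, p in zip(gold, pred)) / len(gold)


def las(gold: list[Token], pred: list[Token]) -> float:
    """Share of words with both the correct head and the correct dependency relation."""
    _check(gold, pred)
    return sum(g.head == p.head and g.deprel == p.deprel for g, p in zip(gold, pred)) / len(gold)


# "Eduskunta hyväksyi lain" ('Parliament passed the law'), toy annotations.
gold = [Token("NOUN", 2, "nsubj"), Token("VERB", 0, "root"), Token("NOUN", 2, "obj")]
pred = [Token("PROPN", 3, "nsubj"), Token("VERB", 0, "root"), Token("NOUN", 2, "obl")]
print(upos_accuracy(gold, pred), las(gold, pred))  # 2/3 UPOS, 1/3 LAS
```
      </preformat>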
      <p>The data has been partially added to the ParliamentSampo knowledge graph and
already utilized in the facets of the semantic portal. The enriched data enables DH research
through topics and NEs. The enrichments help to find interesting phenomena in the
ParliamentSampo dataset. As in the Finnish dataset, NEs have been added, e.g., to the Dutch
and Latvian parliamentary debate corpora. The linked NEs and keywords enable data analytics
and search optimization in the faceted search application. The MLKG contains millions of
triples of morphological and linguistic information as linked data. Unlike, e.g., Slovenian debate
corpora, the Finnish dataset can be queried directly using SPARQL to analyze the speeches together
with their metadata. However, due to the size of the dataset, much work remains to be done to speed
up the queries. Creating applications that enable the DH community to study the debates in more
detail also remains future work.</p>
      <p>Acknowledgements Our work is part of the Semantic Parliament project<sup>18</sup>, funded by the
Academy of Finland, and is also related to the EU project InTaVia<sup>19</sup> and the EU COST action
Nexus Linguarum<sup>20</sup>. The project uses the computing resources of the CSC – IT Center for
Science.</p>
      <p><sup>18</sup> https://seco.cs.aalto.fi/projects/semparl/en/
<sup>19</sup> https://intavia.eu
<sup>20</sup> https://nexuslinguarum.eu</p>
      <p>
with Stanford NER in a Finnish Historical Newspaper and Journal Collection, in: 13th
IAPR International Workshop on Document Analysis Systems, 2018.
[21] T. A. Pirinen, Development and Use of Computational Morphology of Finnish in the Open
Source and Open Science Era: Notes on Experiences with Omorfi Development., SKY
Journal of Linguistics 28 (2015) 381–393. doi:10.23978/inf.107890.
[22] E. Mäkelä, Combining a REST Lexical Analysis Web Service with SPARQL for Mashup
Semantic Annotation from Text, in: The Semantic Web: ESWC 2014 Satellite Events
- ESWC 2014 Satellite Events, Anissaras, Crete, Greece, May 25-29, 2014, Revised
Selected Papers, Springer International Publishing, 2014, pp. 424–428. doi:10.1007/978-3-319-11955-7_60.
[23] S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, Z. Ives, DBpedia: A
Nucleus for a Web of Open Data, in: The Semantic Web: 6th International
Semantic Web Conference, 2nd Asian Semantic Web Conference, ISWC 2007 + ASWC 2007,
Busan, Korea, November 11-15, 2007, Springer, Berlin, Heidelberg, 2007, pp. 722–735.
doi:10.1007/978-3-540-76298-0_52.
[24] D. Damljanovic, K. Bontcheva, Named Entity Disambiguation using Linked Data, in:
The Semantic Web: Research and Applications - 9th Extended Semantic Web Conference,
ESWC 2012, Heraklion, Crete, Greece, May 27-31, 2012. Proceedings, Springer-Verlag
Berlin Heidelberg, 2012, pp. 231–240.
[25] L. Derczynski, D. Maynard, G. Rizzo, M. Van Erp, G. Gorrell, R. Troncy, J. Petrak,
K. Bontcheva, Analysis of named entity recognition and linking for tweets, Information
Processing and Management 51 (2015) 32–49. doi:10.1016/j.ipm.2014.10.006.
[26] S. Simola, A century of partisanship in Finnish political speech, 2020. Part of PhD thesis:
Essays in Labor and Political Economics, Aalto University.</p>
      <p>[27] K. Makkonen, P. Loukasmäki, Eduskunnan täysistunnon puheenaiheet 1999–2014: Miten
käsitellä LDA-aihemalleja? [Topics of the plenary sessions of the Finnish Parliament 1999–2014: How to
deal with LDA topic models?], Politiikka 61 (2019) 127–159.
[28] E. Lillqvist, I. K. Kavonius, M. Pantzar, “Velkakello tikittää”: Julkisyhteisöjen velka
suomalaisessa mielikuvastossa ja tilastoissa 2000–2020 [“The debt clock is ticking”: Public sector debt
in Finnish imagery and statistics 2000–2020], Kansantaloudellinen Aikakauskirja 116
(2020) 581–607.
[29] E. Hyvönen, E. Ikkala, M. Koho, R. Leal, H. Rantala, M. Tamper, How to search and
contextualize scenes inside videos for enriched watching experience: Case stories of the
second world war veterans, 2022. Under peer review.
[30] A. Brandsen, S. Verberne, K. Lambers, M. Wansleeben, Can BERT Dig It?–Named
Entity Recognition for Information Retrieval in the Archaeology Domain, arXiv preprint
arXiv:2106.07742 (2021).
[31] E. Hyvönen, P. Leskinen, M. Tamper, H. Rantala, E. Ikkala, J. Tuominen, K. Keravuori,
BiographySampo - publishing and enriching biographies on the semantic web for digital
humanities research, in: P. Hitzler, M. Fernández, K. Janowicz, A. Zaveri, A. J. Gray,
V. Lopez, A. Haller, K. Hammar (Eds.), The Semantic Web. ESWC 2019, Springer-Verlag,
2019, pp. 574–589. doi:10.1007/978-3-030-21348-0_37.
[32] M. Tamper, P. Leskinen, E. Hyvönen, R. Valjus, K. Keravuori, Analyzing Biography
Collection Historiographically as Linked Data: Case National Biography of Finland, Semantic
Web – Interoperability, Usability, Applicability (2021). Accepted.
[33] J. Kanerva, F. Ginter, N. Miekka, A. Leino, T. Salakoski, Turku Neural Parser Pipeline:
An End-to-End System for the CoNLL 2018 Shared Task, in: Proceedings of the CoNLL
2018 Shared Task: Multilingual parsing from raw text to universal dependencies, 2018, pp.
133–142. doi:10.18653/v1/K18-2013.
[34] J. Kanerva, F. Ginter, T. Salakoski, Universal Lemmatizer: A sequence-to-sequence model
for lemmatizing Universal Dependencies treebanks, Natural Language Engineering (2020)
1–30. doi:10.1017/S1351324920000224.
[35] M. Hämäläinen, UralicNLP: An NLP library for Uralic languages, Journal of Open Source
Software 4 (2019) 1345. doi:10.21105/joss.01345.</p>
      <p>[36] O. Suominen, Annif: DIY automated subject indexing using multiple algorithms, LIBER
Quarterly 29 (2019) 1–25. doi:10.18352/lq.10285.</p>
      <p>[37] E. Hyvönen, How to create a national cross-domain ontology and linked data infrastructure
and use it on the semantic web (2021). URL: https://seco.cs.aalto.fi/publications/2021/hyvonen-dcmi-2021.pdf,
keynote presentation for the DCMI 2021 conference.
[38] M. Tamper, A. Oksanen, J. Tuominen, A. Hietanen, E. Hyvönen, Automatic Annotation
Service APPI: Named Entity Linking in Legal Domain, in: The Semantic Web: ESWC 2020
Satellite Events, Springer-Verlag, 2020, pp. 110–114. doi:10.1007/978-3-030-62327-2_36.
[39] M. Tamper, E. Hyvönen, P. Leskinen, Visualizing and analyzing networks of named entities
in biographical dictionaries for digital humanities research, in: Proceedings of the 20th
International Conference on Computational Linguistics and Intelligent Text Processing
(CICling 2019), Springer, 2019. Accepted.
[40] M. Tamper, P. Leskinen, K. Apajalahti, E. Hyvönen, Using Biographical Texts as Linked
Data for Prosopographical Research and Applications, in: Digital Heritage. Progress
in Cultural Heritage: Documentation, Preservation, and Protection. 7th International
Conference, EuroMed 2018, Nicosia, Cyprus, Springer-Verlag, 2018, pp. 125–137. doi:10.1007/978-3-030-01762-0_11.
[41] E. Hyvönen, Digital humanities on the Semantic Web: Sampo model and portal series,
2021. Submitted.
[42] E. Ikkala, E. Hyvönen, H. Rantala, M. Koho, Sampo-UI: A full stack JavaScript framework
for developing semantic portal user interfaces, Semantic Web – Interoperability, Usability,
Applicability 13 (2022) 69–84. doi:10.3233/SW-210428.
[43] D. Zeman, J. Hajič, M. Popel, M. Potthast, M. Straka, F. Ginter, J. Nivre, S. Petrov, CoNLL
2018 shared task: Multilingual parsing from raw text to Universal Dependencies, in:
Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to
Universal Dependencies, Association for Computational Linguistics, Brussels, Belgium,
2018, pp. 1–21. doi:10.18653/v1/K18-2001.
[44] O. Suominen, M. Lehtinen, J. Inkinen, Annif and Finto AI: Developing and Implementing
Automated Subject Indexing, Jlis.it 13 (2022) 265–282. doi:10.4403/jlis.it-12740.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C.</given-names>
            <surname>Benoît</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Rozenberg</surname>
          </string-name>
          (Eds.),
          <source>Handbook of Parliamentary Studies: Interdisciplinary Approaches to Legislatures</source>
          , Edward Elgar Publishing,
          <year>2020</year>
          . doi:10.4337/9781789906516.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Andrushchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sandberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Turunen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Marjanen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hatavara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kurunmäki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Nummenmaa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hyvärinen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Teräs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Peltonen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Nummenmaa</surname>
          </string-name>
          ,
          <article-title>Using parsed and annotated corpora to analyze parliamentarians' talk in Finland</article-title>
          ,
          <source>Journal of the Association for Information Science and Technology</source>
          <volume>185</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          . doi:10.1002/asi.24500.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K.</given-names>
            <surname>Elo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Karimäki</surname>
          </string-name>
          ,
          <article-title>Luonnonsuojelusta ilmastopolitiikkaan: Ympäristöpoliittisen käsitteistön muutos parlamenttipuheessa 1960-2020</article-title>
          ,
          <source>Politiikka</source>
          <volume>63</volume>
          (
          <year>2021</year>
          ). doi:10.37452/politiikka.109690.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Martinez-Rodriguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Lopez-Arevalo</surname>
          </string-name>
          ,
          <article-title>Information Extraction Meets the Semantic Web: A Survey</article-title>
          ,
          <source>Semantic Web – Interoperability, Usability, Applicability</source>
          <volume>11</volume>
          (
          <year>2020</year>
          )
          <fpage>255</fpage>
          -
          <lpage>335</lpage>
          . doi:10.3233/SW-180333.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>E.</given-names>
            <surname>Hyvönen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sinikallio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Leskinen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Drobac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tuominen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Elo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>La Mela</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Koho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ikkala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tamper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Leal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kesäniemi</surname>
          </string-name>
          ,
          <article-title>Parlamenttisampo: eduskunnan aineistojen linkitetyn avoimen datan palvelu ja sen käyttömahdollisuudet</article-title>
          ,
          <source>Informaatiotutkimus</source>
          <volume>40</volume>
          (
          <year>2021</year>
          ). doi:10.23978/inf.107899.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>E.</given-names>
            <surname>Hyvönen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sinikallio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Leskinen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>La Mela</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tuominen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Elo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Drobac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Koho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ikkala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tamper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Leal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kesäniemi</surname>
          </string-name>
          ,
          <article-title>Finnish parliament on the semantic web: Using ParliamentSampo data service and semantic portal for studying political culture and language</article-title>
          , in: Digital Parliamentary Data in Action (DiPaDa 2022), Workshop at the 6th Digital Humanities in Nordic and Baltic Countries Conference, long paper,
          <source>CEUR Workshop Proceedings</source>
          , Vol.
          <volume>3133</volume>
          ,
          <year>2022</year>
          . URL: http://ceur-ws.org/Vol-3133/paper05.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>L.</given-names>
            <surname>Sinikallio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Drobac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tamper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Leal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Koho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tuominen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>La Mela</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Hyvönen</surname>
          </string-name>
          ,
          <article-title>Plenary Debates of the Parliament of Finland as Linked Open Data and in Parla-CLARIN Markup</article-title>
          , in:
          <source>3rd Conference on Language, Data and Knowledge (LDK 2021)</source>
          , volume
          <volume>93</volume>
          ,
          <year>2021</year>
          , pp.
          <fpage>8:1</fpage>
          -
          <lpage>8:17</lpage>
          . doi:10.4230/OASIcs.LDK.2021.8.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Leskinen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Hyvönen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tuominen</surname>
          </string-name>
          ,
          <article-title>Members of Parliament in Finland Knowledge Graph and Its Linked Open Data Service</article-title>
          , in:
          <source>Further with Knowledge Graphs. Proceedings of the 17th International Conference on Semantic Systems, 6-9 September 2021, Amsterdam, The Netherlands</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>255</fpage>
          -
          <lpage>269</lpage>
          . doi:10.3233/SSW210049.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>E.</given-names>
            <surname>Hyvönen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tuominen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Alonen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Mäkelä</surname>
          </string-name>
          ,
          <article-title>Linked Data Finland: A 7-star Model and Platform for Publishing and Re-using Linked Datasets</article-title>
          , in:
          <source>The Semantic Web: ESWC 2014 Satellite Events, Revised Selected Papers</source>
          , Springer-Verlag,
          <year>2014</year>
          , pp.
          <fpage>226</fpage>
          -
          <lpage>230</lpage>
          . doi:10.1007/978-3-319-11955-7_24.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>E.</given-names>
            <surname>Gardiner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. G.</given-names>
            <surname>Musto</surname>
          </string-name>
          ,
          <source>The Digital Humanities: A Primer for Students and Scholars</source>
          , Cambridge University Press, New York, NY, USA,
          <year>2015</year>
          . doi:10.1017/CBO9781139003865.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>E.</given-names>
            <surname>Lapponi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Søyland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Velldal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Oepen</surname>
          </string-name>
          ,
          <article-title>The Talk of Norway: a richly annotated corpus of the Norwegian parliament, 1998-2016</article-title>
          ,
          <source>Language Resources and Evaluation</source>
          <volume>52</volume>
          (
          <year>2018</year>
          )
          <fpage>873</fpage>
          -
          <lpage>893</lpage>
          . doi:10.1007/s10579-018-9411-5.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Van Aggelen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hollink</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kemman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kleppe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Beunders</surname>
          </string-name>
          ,
          <article-title>The debates of the European Parliament as Linked Open Data</article-title>
          ,
          <source>Semantic Web – Interoperability, Usability, Applicability</source>
          <volume>8</volume>
          (
          <year>2017</year>
          )
          <fpage>271</fpage>
          -
          <lpage>281</lpage>
          . doi:10.1007/s42001-019-00060-w.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>U.</given-names>
            <surname>Bojārs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Darģis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Lavrinovičs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Paikens</surname>
          </string-name>
          ,
          <article-title>LinkedSaeima: A Linked Open Dataset of Latvia's Parliamentary Debates</article-title>
          , in:
          <source>Semantic Systems. The Power of AI and Knowledge Graphs. SEMANTiCS 2019</source>
          , Springer,
          <year>2019</year>
          , pp.
          <fpage>50</fpage>
          -
          <lpage>56</lpage>
          . doi:10.1007/978-3-030-33220-4_4.
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D.</given-names>
            <surname>Juric</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hollink</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.-J.</given-names>
            <surname>Houben</surname>
          </string-name>
          ,
          <article-title>Bringing Parliamentary Debates to the Semantic Web</article-title>
          , in:
          <source>Proceedings of the Workshop on Detection, Representation, and Exploitation of Events in the Semantic Web (DeRiVE 2012)</source>
          ,
          <year>2012</year>
          , pp.
          <fpage>51</fpage>
          -
          <lpage>60</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Pancur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Erjavec</surname>
          </string-name>
          ,
          <article-title>The siParl corpus of Slovene parliamentary proceedings</article-title>
          , in:
          <source>Proceedings of the Second ParlaCLARIN Workshop</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>28</fpage>
          -
          <lpage>34</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Finkel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Grenager</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <article-title>Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling</article-title>
          , in:
          <source>Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05)</source>
          , June 25-30, 2005, University of Michigan, Ann Arbor, Michigan, USA, Association for Computational Linguistics,
          <year>2005</year>
          , pp.
          <fpage>363</fpage>
          -
          <lpage>370</lpage>
          . doi:10.3115/1219840.1219885.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>T.</given-names>
            <surname>Ruokolainen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kauppinen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Silfverberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lindén</surname>
          </string-name>
          ,
          <article-title>A Finnish news corpus for named entity recognition</article-title>
          ,
          <source>Language Resources and Evaluation</source>
          <volume>54</volume>
          (
          <year>2020</year>
          )
          <fpage>247</fpage>
          -
          <lpage>272</lpage>
          . doi:10.1007/s10579-019-09471-7.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>J.</given-names>
            <surname>Luoma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Oinonen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pyykönen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Laippala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pyysalo</surname>
          </string-name>
          ,
          <article-title>A Broad-coverage Corpus for Finnish Named Entity Recognition</article-title>
          , in:
          <source>Proceedings of The 12th Language Resources and Evaluation Conference</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>4615</fpage>
          -
          <lpage>4624</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A.</given-names>
            <surname>Virtanen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kanerva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ilo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Luoma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Luotolahti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Salakoski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ginter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pyysalo</surname>
          </string-name>
          ,
          <article-title>Multilingual is not enough: BERT for Finnish</article-title>
          ,
          <year>2019</year>
          . arXiv:1912.07076.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>T.</given-names>
            <surname>Ruokolainen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kettunen</surname>
          </string-name>
          ,
          <article-title>À la recherche du nom perdu-Searching for Named Entities</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>