<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Finnish Parliament on the Semantic Web: Using ParliamentSampo Data Service and Semantic Portal for Studying Political Culture and Language</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Eero Hyvönen</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Petri Leskinen</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Laura Sinikallio</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matti La Mela</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jouni Tuominen</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kimmo Elo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Senka Drobac</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mikko Koho</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Esko Ikkala</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Minna Tamper</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rafael Leal</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Joonas Kesäniemi</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centre for Parliamentary Studies, University of Turku</institution>
          ,
          <country country="FI">Finland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Helsinki Centre for Digital Humanities (HELDIG), University of Helsinki</institution>
          ,
          <country country="FI">Finland</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Semantic Computing Research Group (SeCo), Department of Computer Science, Aalto University</institution>
          ,
          <country country="FI">Finland</country>
        </aff>
      </contrib-group>
      <fpage>69</fpage>
      <lpage>85</lpage>
      <abstract>
        <p>This paper introduces ParliamentSampo: Parliament of Finland on the Semantic Web, a Linked Open Data (LOD) service, data infrastructure, and semantic portal for studying Finnish political culture, language, and networks of the Members of Parliament (MP). The article presents the vision behind the system and the LOD service, and explores how they can be used in research and application development. A knowledge graph of linked data has been created based on ca. 962 000 speeches in all plenary sessions of the Parliament of Finland in 1907–2021; the data is also available in XML format, utilizing the new international Parla-CLARIN format. For the first time, the entire time series of the Finnish parliamentary speeches has been converted into data and a data service in a unified format. In addition, the speeches have been interlinked with another knowledge graph created from the database of the MPs and enriched from other data sources into a broader ontology-based data service. The paper shows how the SPARQL endpoint of the LOD service can be used to research parliamentary culture, the use of political language, and networks of politicians through data analysis. The endpoint can also be used to develop applications for different user groups without programming skills, such as the ParliamentSampo semantic portal, which is also introduced in the paper. This application aims to make political decision making more transparent to the general public, media, politicians, and other end users.</p>
      </abstract>
      <kwd-group>
        <kwd>parliamentary studies</kwd>
        <kwd>semantic portals</kwd>
        <kwd>linked data</kwd>
        <kwd>digital humanities</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The main tasks of parliaments are to enact new laws, oversee the work of the government,
and decide on the state budget; how the parliament works in Finland is documented in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
Parliamentary data are used in many areas of research [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], as they provide a wealth of information
on the state and functioning of democratic systems, political life and, more generally, language
and culture. The most prominent part of the work of parliaments is the public plenary sessions, in
which the Members of Parliament (MP) discuss and vote on issues on the agenda and other topics
that arise. Parliaments draw up minutes of plenary sessions and make both the minutes and the
documents on which they are based available to the public. Openness and transparency in
the work of parliaments are important for the voters, media, researchers, and also parliaments
themselves: based on open data, they can look at the decision-making stages, views, and actions
expressed by parliamentarians in their work as legislators.
      </p>
      <p>
        This paper argues, inspired by [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ], for publishing and using parliamentary data in Digital
Humanities (DH) research based on Semantic Web (SW) technologies<sup>1</sup> and Linked Data (LD)
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The LD approach for Cultural Heritage [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] has arguably many advantages: 1) Linked data
and ontologies [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] provide a framework for harmonizing heterogeneous distributed datasets
and combining them into larger and richer entities. 2) The SW is based on predicate logic
[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], which provides an opportunity to enrich data by inferring new information. 3) When the
machine “understands” the content of the data, intelligent web services and data analyses can
be implemented more easily. 4) Ready-made tools by other actors can be re-used for publishing,
processing and analysing the standardized data; the wheel does not need to be reinvented.
      </p>
      <p>
        In this paper, we test and demonstrate the above arguments in DH research on parliamentary
culture and language [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] by presenting the ParliamentSampo system, a Linked Open Data
(LOD) corpus and data service of Finnish parliamentary data and a semantic portal on top of it<sup>2</sup>.
The paper presents the vision and first results of ParliamentSampo, extending our earlier
papers on creating the knowledge graphs for the speeches [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and MP networks [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and a
Finnish presentation on the project [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>The paper first reviews related research on parliamentary data (Section 2). In Section 3, our
vision of publishing and using Finnish parliamentary linked data on the SW is presented. After
this, first results obtained in developing and using the ParliamentSampo system in different
ways are presented (Section 4). Finally, the results of our work are summarized and the use of
parliamentary data in research is considered on a more general level (Section 5).</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work on Parliamentary Data</title>
      <p>
        A large amount of parliamentary material has been digitized in recent decades, arguably second
only to newspapers [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. For example, the Royal Library of Sweden has digitized Swedish printed
parliamentary documents from 1521 to 1970. This collection<sup>3</sup> is supplemented by the
parliament’s own digital materials and, e.g., by the Westac research project<sup>4</sup> at Umeå University.
Digitization has improved the accessibility and usability of parliamentary materials for both
the public and the research community. Websites have been created that make it easy for users
to browse and download materials. Examples include the website of the Lipad project<sup>5</sup> [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] that
digitized Canadian parliamentary materials, and the portal of the Italian House of Representatives<sup>6</sup>
that comprehensively presents the history of the Italian parliament in 1848–2018.
      </p>
      <p>
        <sup>1</sup> https://www.w3.org/standards/semanticweb/
<sup>2</sup> See the project homepage for more details, videos, and publications: https://seco.cs.aalto.fi/projects/semparl/en/
<sup>3</sup> http://data.riksdagen.se
<sup>4</sup> https://www.westac.se
<sup>5</sup> https://lipad.ca
<sup>6</sup> https://storia.camera.it
      </p>
      <p>
        Several parliamentary corpora have been formed from the minutes of the plenary debates,
which make it possible to study the content of the speeches and their language; see, e.g., [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and
the CLARIN list of parliamentary corpora<sup>7</sup>. The TEI-based Parla-CLARIN scheme<sup>8</sup> for session
minutes has been developed within the CLARIN infrastructure, providing a common way to
represent the corpora [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. The related ParlaMint project<sup>9</sup> brings together Parla-CLARIN-based
national corpora. Parliamentary materials have also been transformed into the form of LD when
creating the LinkedEP [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] system on the European Parliament’s data, the Italian Parliament<sup>10</sup>,
and the LinkedSaeima for the Latvian parliament [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        The materials of the Parliament of Finland (PoF) have been digitized in various contexts but
are difficult to use, as they have been produced separately for different periods and stored in
different formats [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The usability of the materials is also hampered by their varying quality
and lack of descriptive data [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. Language corpora have been published on parliamentary
debates, such as the Parliamentary Corpus of FIN-CLARIN’s Language Bank<sup>11</sup> [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], which
covers the years 2008–2016. It contains the speeches in a linguistically annotated form and
also synchronized links to original plenary session videos [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. The Voices of Democracy
project has produced a research corpus that includes plenary minutes in 1980–2018 annotated
grammatically as well as interviews of veteran MPs conducted by the PoF after 1988 [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. The
minutes of the parliamentary debates from 1991 to 2015 can also be found in the International
Harvard Parlspeech Corpus [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], but we have identified gaps in the coverage in this corpus.
      </p>
      <p>
        Digitized parliamentary materials offer a wide range of perspectives on different research
topics and have been used in a variety of fields, such as linguistics, political science, media
studies, economics, and history. The most important research materials are the debates in
the parliaments, through which one can study language and its changes as well as
the underlying societal phenomena at large [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. Metadata makes it possible to structure the
speeches, for example, between parties, gender, or professional groups. Blaxill and Beelen [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]
have examined the content of women’s parliamentary speeches, as well as the role of gender in
the speeches of MPs in the British Parliament. Parliamentary debates have been used in thematic
or conceptual analyses (cf., e.g., [
        <xref ref-type="bibr" rid="ref22 ref23 ref24">22, 23, 24, 25, 26</xref>
        ]) and to study the language and the opinions
of the parties or MPs (e.g., [27, 28]). Parliamentary debates have been used in translation studies
using, for example, the EuroParl Corpus<sup>12</sup> of the European Parliament debates.
      </p>
      <p>
        The digitized material of the Finnish Parliament has been utilized to some extent in digital
humanities and social scientific research. La Mela [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] and Kettunen and La Mela [26] have
studied the history of the concept of Everyman’s right, a Nordic right of public access to nature,
with the digitized minutes of the Parliament, and examined their quality in PDF format. The
digitized minutes have been utilized in the development of language technology methods, in this
case the Finnish Semantic Tagger [26]. Similarly, Andrushchenko et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] have used their
grammatically structured corpus and a search tool to organize and analyze parliamentary debates
in various research cases. Simola [29] has examined the differences in political speech between
parties throughout the parliamentary period 1907–2018, for which she compiled a separate
research dataset combining the debates and the speaker data. Makkonen and Loukasmäki [30]
have studied the plenary speeches given in the Parliament of Finland in 1999–2014 and their content
by using topic modeling. FIN-CLARIN’s Parliamentary Corpus has been used, for example, by
Lillqvist et al. [31] in their study on debates about public debt. Previous search applications for
Finnish parliamentary speech data are based mostly on traditional text search. Data analysis
tools for examining the results are few; one example is the concordance analysis of the Language Bank’s
Korp, which visualizes the words found in their textual contexts and shows some statistics
on word occurrences in the search results. These applications cover only a small part of the
entire time series of the Finnish parliamentary speeches.
      </p>
      <p>
        <sup>7</sup> https://www.clarin.eu/resource-families/parliamentary-corpora
<sup>8</sup> https://github.com/clarin-eric/parla-clarin
<sup>9</sup> https://www.clarin.eu/content/parlamint-towards-comparable-parliamentary-corpora
<sup>10</sup> http://data.camera.it
<sup>11</sup> http://korp.csc.fi
<sup>12</sup> https://www.statmt.org/europarl/
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. ParliamentSampo Vision</title>
      <p>
        The vision of the Semantic Parliament project [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] is to develop and implement, in a living
laboratory environment, the model shown in Fig. 1 for publishing and utilizing parliamentary
materials as LOD on the SW. The work focuses on two core datasets:
        <list list-type="order">
          <list-item>
            <p><bold>Minutes of Parliamentary Sessions.</bold> All Finnish parliamentary debates, totalling ca.
962 000 speeches and covering the existence of the PoF in 1907–2021, have been transformed
into 1) a Linked Data knowledge graph and 2) Parla-CLARIN XML form [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ].</p>
          </list-item>
          <list-item>
            <p><bold>Members of Parliament Data.</bold> A prosopographical knowledge graph has been created
for representing biographical data about all ca. 2800 Finnish MPs and other politicians
during the same time period (1907–2021) [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ].</p>
          </list-item>
        </list>
      </p>
      <p>The left side of Fig. 1 shows content providers that produce data related to the PoF in their
own local data silos and in non-interoperable formats. For example, the LawSampo
system publishes Finnish legislation, the results of parliamentary discussions, and case law data
provided by the Ministry of Justice in Finland as the LOD service Semantic Finlex [32] and a
semantic portal<sup>13</sup> [33].</p>
      <p>In the middle of Fig. 1, the data is aggregated, harmonized, enriched, interlinked, and
published as a new FinnParla LOD service on the Linked Data Finland platform LDF.fi<sup>14</sup> [34].
Its data model is based on 1) a new ontology describing the activities of the PoF and 2) a set of
related vocabularies and ontologies describing, for example, (historical) places<sup>15</sup>, professions
[35], people, and organizations. Note that many additional documents of the PoF processes,
such as propositions and bills, as well as voting data, could be interlinked with the ParliamentSampo
system in the future using its open data infrastructure.</p>
      <p>The right side of Fig. 1 depicts three ways of utilizing the FinParla data service: 1) through a semantic
portal, 2) in research with computational tools, and 3) for developing new applications.</p>
      <p>
        The knowledge graph of the parliamentary speeches (S-KG) (cf. Fig. 1) contains speeches
collected from all the minutes of the plenary sessions of the PoF since 1907. The S-KG was
compiled from several initial formats: 1) From 1907 until the middle of 1999, the minutes are
available only as scanned images embedded in PDF documents. This material was OCRed, and
minor manual corrections were made. 2) From mid-1999 to the end of 2014, the material was available
in HTML format on the Parliament’s website<sup>16</sup>. 3) From 2015 onwards, the minutes are
available through the Finnish Parliament Open Data API<sup>17</sup> in a custom XML form. The data
quality of the S-KG has been deemed satisfactory, although there were issues related to OCR errors
and to differences in how the transliteration and metadata of the
minutes have been produced in the PoF. The data model of the S-KG and the data transformation
process are described in detail in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        The S-KG was interlinked to the MPs prosopographic knowledge graph P-KG (cf. Fig. 1). For
example, speakers and the parties they represent are resources with URI identifiers described
in the P-KG graph. The data publication about MPs is a knowledge graph (P-KG) covering all
MPs who have worked in Finland [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. At its core is an RDF conversion of XML-formatted data
about MPs downloaded from the Open Data service<sup>18</sup> of the PoF. In addition to basic biographical
information, such as times and places of birth and death, the data includes detailed information
about the people’s life events, such as studying, working life, political career, and publications
written by the politicians.
      </p>
      <p>
        The Finnish parliament’s open data source has been supplemented and enriched with
information extracted from the Finnish Government’s website<sup>19</sup> and Wikidata: in addition to MPs,
some 200 other people with significant political history, such as presidents, ministers, and
ombudsmen, have been added into the knowledge graph. For example, Mauno Koivisto has served
as President and Prime Minister but never as an MP. The knowledge graph was also interlinked
with the BiographySampo system [36], which publishes biographies of ca. 13 600
significant Finnish persons as a LOD service and a semantic portal<sup>21</sup>, including biographies of
614 Finnish parliamentarians. BiographySampo is yet another example of the mutually interlinked “Sampo”
systems and LOD infrastructure<sup>20</sup> in use in Finland. The data model of P-KG, based on the Bio CRM extension [37] of
CIDOC CRM<sup>22</sup>, is described in more detail in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], including the transformation process of the
data sources into RDF. The transformation and linking could be done fairly accurately, as the
primary data were already available in structured forms.
      </p>
      <p>
        <sup>13</sup> LawSampo project: http://seco.cs.aalto.fi/projects/lawlod/
<sup>14</sup> Linked Data Finland service online: https://ldf.fi/
<sup>15</sup> https://seco.cs.aalto.fi/projects/histoplaces/
<sup>16</sup> https://www.eduskunta.fi/FI/taysistunto/Sivut/Taysistuntojen-poytakirjat.aspx
<sup>17</sup> https://avoindata.eduskunta.fi/#/fi/home
<sup>18</sup> https://avoindata.eduskunta.fi/#/fi/dbsearch
<sup>19</sup> https://valtioneuvosto.fi/hallitukset-ja-ministerit
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Using the ParliamentSampo LOD Service</title>
      <p>The goal of the ParliamentSampo system is to provide the end users with flexible and rich
possibilities for searching, browsing, and analyzing the PoF data. The new possibilities are
offered by a standard SPARQL endpoint, which supports 1) exporting the data for external use, 2) querying
the endpoint and studying the results, 3) data analysis using various tools and scripting, and
4) developing new external applications, such as the ParliamentSampo portal. These use
cases are explored next in more detail with examples.</p>
      <sec id="sec-4-1">
        <title>4.1. Exporting the Data for External Use</title>
        <p>A simple way for a researcher to use ParliamentSampo data is to download data from the data
service for local use and then apply one’s favourite tools for data analysis, such as spreadsheets,
the R<sup>23</sup> environment for statistical analysis, or Gephi<sup>24</sup> for network analysis. SPARQL querying
offers a flexible way to filter out subsets of interest from the large dataset. It is also
possible to install a local SPARQL server environment for linked data on one’s own computer,
for example Fuseki<sup>25</sup>, which is also used in the LDF.fi service. The materials in the LDF.fi service
are published using container technology (i.e., Docker<sup>26</sup>), which means that installing the data,
the server, and possible versioned software packages is automatic and effortless.</p>
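        <p>As a minimal sketch of the export step, assuming that query results have been retrieved in the standard SPARQL JSON results format, the following Python function flattens them into CSV text that can be opened in a spreadsheet or read into R. The function name and the sample values are hypothetical and are not part of the ParliamentSampo service.</p>
        <preformat><![CDATA[
```python
import csv
import io


def bindings_to_csv(results):
    """Convert a SPARQL JSON results object into CSV text."""
    columns = results["head"]["vars"]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    for binding in results["results"]["bindings"]:
        # Unbound variables are absent from a binding; emit an empty cell.
        writer.writerow([binding.get(col, {}).get("value", "") for col in columns])
    return buffer.getvalue()


# Hypothetical sample in the SPARQL JSON results layout (values are invented).
sample = {
    "head": {"vars": ["year", "FI", "SV"]},
    "results": {"bindings": [
        {"year": {"type": "literal", "value": "1907"},
         "FI": {"type": "literal", "value": "10"},
         "SV": {"type": "literal", "value": "2"}},
        {"year": {"type": "literal", "value": "1908"},
         "FI": {"type": "literal", "value": "12"}},
    ]},
}

csv_text = bindings_to_csv(sample)
print(csv_text)
```
]]></preformat>
        <p>Keeping the conversion separate from the retrieval means the same function can be applied to results saved earlier from the endpoint or from a local server.</p>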
        <p>
          An example of using ParliamentSampo data externally is reported in [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. For this case
study in political science, the Parla-CLARIN version was downloaded and a subset of the
speeches 1960–2020 was filtered out and analyzed further using custom XML-based tools. The
authors studied how the language used in discussing environmental politics has evolved in
Finland in the speeches of different parties. Eleven central environmental terms were selected
from a thesaurus<sup>27</sup> used by the PoF library, speeches where these terms were used were then
extracted, and various quantitative analyses based on them were presented and compared with
the strategy plans of the parties with qualitative interpretations. The analyses showed, for
example, a constantly increasing intensity of environmental debates and a rhetorical shift of
language from protecting nature to issues of climate change.
        </p>
        <p><sup>20</sup> LOD Infrastructure for Digital Humanities in Finland (LODI4DH): https://seco.cs.aalto.fi/projects/lodi4dh/
<sup>21</sup> The BiographySampo portal is available at https://biografiasampo.fi/
<sup>22</sup> https://cidoc-crm.org
<sup>23</sup> https://www.r-project.org
<sup>24</sup> https://gephi.org
<sup>25</sup> https://jena.apache.org/documentation/fuseki2/
<sup>26</sup> https://www.docker.com
<sup>27</sup> EKS Subject Headings: https://www.eduskunta.fi/kirjasto/EKS/index.html?kieli=en</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Querying the Endpoint and Studying Results</title>
        <p>SPARQL is a flexible way to query RDF data. The search result is presented in a tabular format
that can be examined as it is, visualized, and used for application-specific analyses. For
example, Fig. 2 shows a visualization of the number of speeches (y-axis) in the S-KG graph by
language on a timeline from 1907 to 2021 (x-axis). Speeches in Finnish (’FI’ in the figure) have
clearly been the most numerous since the beginning (’Kaikki’ in the figure denotes all the speeches).
There were originally more speeches in Swedish (’SV’ in the figure) than today, but their
number has remained very small. The graphic was created using the YASGUI editor<sup>28</sup> [38], which
can be used to edit SPARQL queries, send them to an online SPARQL endpoint, and show
the results using pre-implemented visualizations.</p>
        <p>SPARQL is an expressive and flexible way to retrieve information from graph data, and
it is suitable for use by DH researchers. The SPARQL query used to generate Fig. 2 is shown
below:</p>
        <preformat>PREFIX rdf: &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt;  # For shortening URIs
PREFIX rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt;
PREFIX semparls: &lt;http://ldf.fi/schema/semparl/&gt;
PREFIX xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt;
PREFIX dct: &lt;http://purl.org/dc/terms/&gt;

SELECT ?year (COUNT(?fin) AS ?FI) (COUNT(?swe) AS ?SV)  # Variables in the result
       (COUNT(?speech) AS ?ALL)
WHERE {
  ?speech a semparls:Speech .  # Graph pattern to be matched
  ?speech dct:date ?dateTime .
  BIND(STR(YEAR(?dateTime)) AS ?year)
  {
    BIND(&lt;http://id.loc.gov/vocabulary/iso639-2/swe&gt; AS ?swe)
    ?speech dct:language ?swe .
  } UNION {
    BIND(&lt;http://id.loc.gov/vocabulary/iso639-2/fin&gt; AS ?fin)
    ?speech dct:language ?fin .
  }
}
GROUP BY ?year ORDER BY ASC(?year)  # Grouping and ordering the results by year</preformat>
        <p>The query first introduces the namespaces used (PREFIX); they make the
URI references in the query syntactically shorter and simpler. The SELECT clause then lists the variables
of the result, and all speeches and their languages are retrieved using a graph pattern formed by variables
starting with ?, which are matched against the endpoint graph in all possible ways. The answer to
the query is a table of all value assignments for the variables that make the query
pattern match the underlying data. The results are grouped by year (GROUP BY) and
sorted by year (ORDER BY), and the number of speeches in Finnish, in Swedish, and in total
is counted (COUNT) for each year. In the visualization, the variable ?year
forms the x-axis and the y-axis presents the annual number of speeches in different languages.
When the speech graph was created, language recognition of speeches was done automatically.
Typically this could be done accurately. However, OCR errors, for example, can sometimes
make language recognition difficult, and therefore speeches whose language code could not be
identified were excluded automatically from the query result.</p>
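        <p>The grouping and counting described above can also be illustrated outside SPARQL. The following Python sketch is only an analogy, not the way the endpoint evaluates queries: it applies the same per-year counting to a handful of invented (date, language) records, and skips records without a recognized language code in the same way as the query result does. The function name and the sample records are hypothetical.</p>
        <preformat><![CDATA[
```python
from collections import defaultdict

FIN = "http://id.loc.gov/vocabulary/iso639-2/fin"
SWE = "http://id.loc.gov/vocabulary/iso639-2/swe"


def count_by_year(speeches):
    """Count speeches per year in Finnish, in Swedish, and in total.

    `speeches` is an iterable of (ISO date string, language URI or None).
    """
    counts = defaultdict(lambda: {"FI": 0, "SV": 0, "ALL": 0})
    for date, language in speeches:
        if language not in (FIN, SWE):
            continue  # no language pattern matches, so the speech is left out
        row = counts[date[:4]]  # the year is the grouping key
        row["FI" if language == FIN else "SV"] += 1
        row["ALL"] += 1
    # Sort by year, as ORDER BY ASC(?year) does.
    return [(year, dict(counts[year])) for year in sorted(counts)]


# Invented sample records, for illustration only.
sample = [
    ("1907-05-25", FIN),
    ("1907-06-01", SWE),
    ("1907-06-02", FIN),
    ("1908-02-03", FIN),
    ("1908-02-04", None),  # language could not be identified
]

table = count_by_year(sample)
for year, row in table:
    print(year, row)
```
]]></preformat>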
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Data-analysis by Scripting</title>
        <p>The PoF data can be examined computationally, for example, using Python scripting and
Jupyter notebooks in the Google Colab<sup>29</sup> environment. SPARQL queries can then be sent to the
endpoint over HTTP, and the query results can be analyzed and visualized using
tools provided by the programming environment used, e.g., by Python libraries.</p>
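        <p>A minimal sketch of such a script is given below, using only the Python standard library and the POST form of the SPARQL 1.1 Protocol. The endpoint address is an assumption made for illustration and should be checked against the service documentation; the function names are hypothetical. Building the request is kept separate from sending it, so that the request can be inspected without network access.</p>
        <preformat><![CDATA[
```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint address; verify it against the LDF.fi documentation.
ENDPOINT = "https://ldf.fi/semparl/sparql"


def build_sparql_request(query, endpoint=ENDPOINT):
    """Build an HTTP POST request carrying a form-encoded SPARQL query."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=data,
        headers={
            "Accept": "application/sparql-results+json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def run_query(query, endpoint=ENDPOINT, timeout=60):
    """Send the query and return the list of result bindings (needs network)."""
    request = build_sparql_request(query, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]


QUERY = """
PREFIX semparls: <http://ldf.fi/schema/semparl/>
SELECT (COUNT(?speech) AS ?n) WHERE { ?speech a semparls:Speech . }
"""

request = build_sparql_request(QUERY)
print(request.get_method(), request.full_url)
```
]]></preformat>
        <p>Calling run_query(QUERY) would return the bindings as Python dictionaries, which can then be passed on to data analysis and plotting libraries.</p>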
        <p>
          For example, Fig. 3 shows the ages of persons elected as MPs for the first time each year [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
In the figure, the blue solid line shows the age of all MPs, and the age of women is shown in red.
The graph shows that the starting age has remained almost constant throughout
the history of the parliament, although since 1980 women have for some time been younger than
men when starting as MPs. The relative proportion of women in the PoF
is shown by a black dotted line. Before the 1960s, the proportion remained at an average of
10%, but it has since risen to 30–50%. The graphics in the figure were implemented in Google
Colab using standard Python libraries for data analysis.
        </p>
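        <p>The kind of aggregation behind such a graph can be sketched as follows. This is an illustration under simplifying assumptions, not the analysis code of the project: ages are approximated as the difference between the year of first election and the year of birth, the input tuples are invented, and the function name is hypothetical.</p>
        <preformat><![CDATA[
```python
from collections import defaultdict
from statistics import mean


def mean_entry_age_by_year(members, gender=None):
    """Mean age of first-time MPs per election year.

    `members` is an iterable of (birth_year, first_elected_year, gender).
    If `gender` is given, only members of that gender are included.
    """
    ages = defaultdict(list)
    for birth_year, first_elected_year, member_gender in members:
        if gender is not None and member_gender != gender:
            continue
        # Approximation: the exact dates of birth and election are ignored.
        ages[first_elected_year].append(first_elected_year - birth_year)
    return {year: mean(values) for year, values in sorted(ages.items())}


# Invented sample data, for illustration only.
sample = [
    (1870, 1907, "F"),
    (1860, 1907, "M"),
    (1950, 1983, "F"),
    (1940, 1983, "M"),
    (1945, 1983, "M"),
]

all_members = mean_entry_age_by_year(sample)
women_only = mean_entry_age_by_year(sample, gender="F")
print(all_members)
print(women_only)
```
]]></preformat>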
        <p>Figure 4 shows a similarly formed tabular visualization of the correlation between the parties
and the occupations of the MPs. Here only the most popular parties and occupations over
the entire history of the PoF are considered. The parties are presented in the horizontal rows of
the table, and the number of representatives of each profession is indicated in the vertical column
corresponding to the occupation. The matrix shows, for example, that in the Centre Party,
the National Coalition Party, and the Swedish People’s Party the most common occupation is
Farmer. On the other hand, Entrepreneur has been the most common occupation in the
Finns Party.</p>
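        <p>A matrix of this kind is a cross-tabulation of (party, occupation) pairs. The sketch below shows one simple way to compute it in Python, assuming that the pairs have already been retrieved from the data service; the party labels, the counts, and the function names are invented for illustration.</p>
        <preformat><![CDATA[
```python
from collections import Counter


def cross_tabulate(pairs):
    """Return (parties, occupations, matrix) for (party, occupation) pairs.

    matrix[i][j] is the number of MPs of parties[i] with occupations[j].
    """
    counts = Counter(pairs)
    parties = sorted({party for party, _ in counts})
    occupations = sorted({occupation for _, occupation in counts})
    matrix = [[counts[(p, o)] for o in occupations] for p in parties]
    return parties, occupations, matrix


def most_common_occupation(pairs, party):
    """Return the most frequent occupation within one party."""
    counts = Counter(occupation for p, occupation in pairs if p == party)
    return counts.most_common(1)[0][0]


# Invented sample pairs, for illustration only.
sample = [
    ("Party A", "Farmer"), ("Party A", "Farmer"), ("Party A", "Teacher"),
    ("Party B", "Entrepreneur"), ("Party B", "Entrepreneur"), ("Party B", "Farmer"),
]

parties, occupations, matrix = cross_tabulate(sample)
for party, row in zip(parties, matrix):
    print(party, dict(zip(occupations, row)))
```
]]></preformat>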
        <p>The same visualization components can be reused in different contexts. For example, the
matrix visualization of Fig. 4 is re-used in Fig. 5 for analyzing interruptions of speeches in the
current PoF. The y-axis lists the most active speakers and the x-axis the MPs that have interrupted
their speeches. For example, of the interrupted speeches of MP Annika Saarikko (Centre Party),
the current Minister of Finance, 46% are due to MP Ben Zyskowicz, representing the
National Coalition Party in opposition, and 18% to MP Jukka Gustafsson representing the party
SDP in the government, possibly indicating differences of opinion within the government.</p>
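        <p>Percentages of this kind are row-wise shares of a count matrix. As a hedged sketch with invented (speaker, interrupter) events and a hypothetical function name, they could be derived as follows.</p>
        <preformat><![CDATA[
```python
from collections import Counter, defaultdict


def interruption_shares(events):
    """For each speaker, the percentage of interruptions made by each MP.

    `events` is an iterable of (speaker, interrupter) pairs.
    """
    per_speaker = defaultdict(Counter)
    for speaker, interrupter in events:
        per_speaker[speaker][interrupter] += 1
    shares = {}
    for speaker, counter in per_speaker.items():
        total = sum(counter.values())
        shares[speaker] = {who: round(100 * n / total) for who, n in counter.items()}
    return shares


# Invented sample events, for illustration only.
sample = [("Speaker 1", "MP A")] * 3 + [("Speaker 1", "MP B")] + [("Speaker 2", "MP A")] * 2

shares = interruption_shares(sample)
print(shares)
```
]]></preformat>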
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Using the ParliamentSampo Portal</title>
        <p>
          The ParliamentSampo portal, based on the Sampo model [
          <xref ref-type="bibr" rid="ref25">39</xref>
          ] and the Sampo-UI framework
[
          <xref ref-type="bibr" rid="ref26">40</xref>
          ], demonstrates how the FinnParla data service can be used for developing applications for
DH research. In the portal, the data can be filtered using faceted search [
          <xref ref-type="bibr" rid="ref27">41</xref>
          ] based on ontologies,
and the results can then be analyzed with the help of seamlessly integrated visualization and
data analysis tools. The data can be accessed along two application views for studying 1)
speeches and 2) MPs. For example, in Fig. 6, the user has selected the Plenary Speeches view,
which shows the search facets Content, Speaker, Party, (Speech) Type, Language, and Date on
the left. The search result, i.e., the speeches found, is shown by default in tabular form on the
right. The user has written a query “suomettum*” in the Content text facet, in which case only
speeches that contain the word “suomettuminen” (Finlandization) in its various inflectional
forms have been filtered into the search result, as the wildcard “*” matches any string. The
user has also limited the result on the Date facet to speeches given since June 4, 1945, when
Parliament began to convene after World War II. The result in this case is 177 speeches,
shown in a table (with paging). By selecting the tab “Timeline”, the yearly number of speeches
is visualized as a function of time.
        </p>
        <p>
          In faceted search, the filtering selections can be made flexibly in any order, and the search
engine calculates a hit count for each possible next facet selection, which tells how many results
the result set would contain if that selection were made next. For example, in
the Speaker facet, a click on “Junnila, Tuure (1910-1999) [7]” selects MP Tuure Junnila’s seven
speeches that mention “Finlandization”. The selection facets are created automatically using the
parliamentary ontology and knowledge graphs of the FinnParla data. The hit counts steer the
user away from selections that would lead to dead ends where the result set is empty. In
addition, the hit counts make it possible to investigate the result set statistically along
different facet dimensions. For example, a click on the pie symbol of the Speaker facet opens the
pie chart of Fig. 7, which shows how often different speakers mention “Finlandization” in their
speeches. The most active MPs in this case are Mr. Georg C. Ehrnrooth (21 speeches) and Mr.
Ben Zyskowicz (19 speeches), two active right-wing politicians concerned with the concept.
        </p>
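        <p>
          The hit counts described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the way the portal computes its facets: the dictionary keys, the toy items, and the function name facet_hit_counts are hypothetical.
        </p>
        <preformat>
```python
from collections import Counter

# Invented toy result items; the keys stand in for facets.
ITEMS = [
    {"speaker": "Speaker A", "party": "Party X", "type": "Speech"},
    {"speaker": "Speaker A", "party": "Party X", "type": "Reply"},
    {"speaker": "Speaker B", "party": "Party Y", "type": "Speech"},
    {"speaker": "Speaker C", "party": "Party X", "type": "Speech"},
]


def facet_hit_counts(items, selections, facet):
    """Count, per value of one facet, how many items would remain if that
    value were selected next, given the selections on the other facets."""
    others = {k: v for k, v in selections.items() if k != facet}
    remaining = [item for item in items
                 if all(item[k] == v for k, v in others.items())]
    return Counter(item[facet] for item in remaining)


# Hit counts for the Speaker facet after selecting Party X.
print(facet_hit_counts(ITEMS, {"party": "Party X"}, "speaker"))
```
        </preformat>
        <p>
          Values with no remaining items, such as Speaker B after Party X has been selected, do not appear in the counts at all, which is how the user is kept away from empty result sets. The same counts are the data behind a pie chart of a facet.
        </p>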
        <p>
          In accordance with the Sampo model, a number of pre-implemented data analysis tools and
visualizations, similar to those shown in the figures above, can be integrated into the application
perspectives of the ParliamentSampo portal. In the future, the tools and visualizations will be
available on their own tabs alongside the table visualization in Fig. 6, in the same way as, for example,
in the AcademySampo’s user interface [
          <xref ref-type="bibr" rid="ref28">42</xref>
          ]; the components of the Sampo-UI framework [
          <xref ref-type="bibr" rid="ref26">40</xref>
          ]
are reused in the implementation of both portals. Through these tools and visualizations, the
project explores the potential of Artificial Intelligence for knowledge discovery in DH research
[
          <xref ref-type="bibr" rid="ref29">43</xref>
          ], i.e., how could ParliamentSampo assist a researcher in finding research problems, in
solving them, and also in explaining solutions?
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>
        In the context of political research, the parliamentary speech is considered an important form of
political communication and political struggle. A parliamentary speech is not just any speech,
but has its own structure and its own rules, which at the same time reflect the general position
of the parliament. In addition, a parliamentary speech is an instrument of political struggle to
expose competing goals, challenge the views of an adversary, and unlock deadlocked settings.
Thus, a speech in a parliament is always also a political act, in which the words used are the
weapons of political decision-making and which not only tell about the issues under discussion,
but also reveal the different positions, values, and points of view of the speakers [
        <xref ref-type="bibr" rid="ref30">44</xref>
        ].
      </p>
      <p>Traditionally, parliamentary speeches have been studied by close reading and using content
analysis, discourse analysis, and various methods of rhetorical research. However, digitalization
has also entered this traditional area of research more and more, as data on parliamentary
debates in various countries have become increasingly available in the form of open data. In
the case of the PoF, the digitization of parliamentary documents has progressed at a reasonable
speed, and some of the material has also been available through the parliament’s open data
service. Both the availability and the quality of the data have improved in
recent years, but there are still significant differences compared with similar data in other countries.</p>
      <p>The work on ParliamentSampo is an important step in utilizing the plenary debates in PoF
as part of the field of humanities research. Although the materials have always been available
to researchers for manual study and, for some years, also in digitized form as PDF files, the
machine-readable data corpus now being prepared and published as a data service, together
with the ParliamentSampo portal, will integrate parliamentary plenary debates and other
open materials into the DH and national information infrastructure. This means in practice,
for example, the opportunity for political scientists, historians, and linguists to extract, model,
analyze, and visualize parliamentary speech through exploratory research, using a vast body of
data covering the entire period of the modern PoF since 1907.</p>
      <p>
        The possibility of exploratory data analysis opens up completely new possibilities and
perspectives for the study of parliamentary speech. In traditional close reading, the researcher is
forced to delimit the material strongly already at the collection stage, which usually happens
through either temporal or thematic delimitation – that is, either by focusing on a limited time
period or on limited themes. Digital methods make it possible to study the material without
such limitations, and thus to examine it, for example, with fully automatic or semi-automatic
classification methods. In this way, it may be possible to find, for example, new themes and
topics that have been sidelined in research in the past (cf., e.g., [
        <xref ref-type="bibr" rid="ref31 ref32">45, 46</xref>
        ]).
      </p>
      <p>
        On the other hand, distant reading and classification of data without strong presuppositions
also allows for a critical examination of previous research results, when the themes/topics
generated by distant reading can be compared with the results obtained by other methods [
        <xref ref-type="bibr" rid="ref33">47</xref>
        ].
      </p>
      <p>
        Another example of the possibilities offered by data is research on the language of politics
and its long-term change (e.g., [
        <xref ref-type="bibr" rid="ref34 ref35 ref36 ref37 ref38 ref39">48, 49, 50, 51, 52, 53</xref>
        ]). Parliamentary big data enables large-scale
and systematic application of language technology methods. Although parliamentary speech is
a linguistically distinct form of speech, it also evolves over time and
thus reflects both wider linguistic development and the social atmosphere of discussion and
the word choices that occur in it [30]. At the same time, the extensive data offer an opportunity to
study changes in language use, for example, whether the climate of public debate has become polarized or
“brutalized”, as politicians and media actors have repeatedly suggested in recent years.
      </p>
      <p>
        The third opportunity offered by parliamentary data relates to linking the use of language
more broadly to other social contexts of language users, such as education, age, and social
networks. Language can also be approached in policy research on the assumption that language
always reflects the wider world of values and ideas of its user, as well as his or her social status
and context. Discursive coalitions, which can be constructed based on the language use of the
speakers, thus offer an interesting opportunity to step outside the frame of reference
set by, for example, party background and to focus analytical attention on networks built
through the use of language. In previous studies, this type of approach has made it possible to connect
experts to different ideological positions by analyzing the content of their texts [
        <xref ref-type="bibr" rid="ref40">54</xref>
        ], an approach that we
think can also be applied to the classification of MPs.
      </p>
      <p>A few examples have been highlighted above where the utilization of parliamentary data
would seem to allow for significant new research openings in parliamentary research. However,
in the spirit of exploratory data analysis, it is worth highlighting the as-yet-unknown possibilities
that gradually emerge as researchers begin to outline new hypotheses and research questions
by examining and analyzing data. The potential of large datasets can be surprising,
which on the one hand requires an open-minded attitude towards the data and on the other
hand underscores the growing responsibility of researchers working in data analysis. When it
is no longer possible for researchers to know the material they are using thoroughly, they
must know the phenomena that are the subject of the material thoroughly. Only in this
way is it possible to assess which findings obtained through data mining, analysis, modeling,
or visualization are truly relevant.</p>
      <p>Acknowledgements. Our work is funded by the Academy of Finland and is also related to
the EU project InTaVia and the EU COST action Nexus Linguarum. The project uses the
computing resources of the CSC – IT Center for Science.
[25] P. Ihalainen, A. Sahala, Evolving conceptualisations of internationalism in the UK
parliament: Collocation analyses from the League to Brexit, in: M. Fridlund, M. Oiva, P. Paju
(Eds.), Digital histories: Emergent approaches within the new digital history, Helsinki
University Press, 2020, pp. 199–219. doi:10.33134/HUP-5-12.
[26] K. Kettunen, M. La Mela, Semantic tagging and the nordic tradition of everyman’s rights,</p>
      <p>Digital Scholarship in the Humanities (2021). doi:10.1093/llc/fqab052.
[27] G. Abercrombie, R. Batista-Navarro, Sentiment and position-taking analysis of
parliamentary debates: a systematic literature review, Journal of Computational Social Science 3
(2020) 245–270. doi:10.1007/s42001-019-00060-w.
[28] M. Magnusson, R. Öhrvall, K. Barrling, D. Mimno, Voices from the far right: a text analysis
of Swedish parliamentary debates, SocArXiv (2018). doi:10.31235/osf.io/jdsqc.
[29] S. Simola, A century of partisanship in Finnish political speech, 2020. URL: https://sites.</p>
      <p>google.com/site/sallasimolaecon/home/research.
[30] K. Makkonen, P. Loukasmäki, Eduskunnan täysistunnon puheenaiheet 1999–2014: Miten
käsitellä LDA-aihemalleja?, Politiikka 61 (2019) 127–159. URL: https://journal.fi/politiikka/
article/view/77163.
[31] E. Lillqvist, I. K. Kavonius, M. Pantzar, “Velkakello tikittää”: Julkisyhteisöjen velka
suomalaisessa mielikuvastossa ja tilastoissa 2000–2020, Kansantaloudellinen Aikakauskirja 116
(2020) 581–607. URL: https://journal.fi/politiikka/article/view/77163.
[32] A. Oksanen, J. Tuominen, E. Mäkelä, M. Tamper, A. Hietanen, E. Hyvönen, Semantic
Finlex: Transforming, publishing, and using Finnish legislation and case law as linked
open data on the web, in: Knowledge of the Law in the Big Data Age, volume 317 of
Frontiers in Artificial Intelligence and Applications , IOS Press, 2019, pp. 212–228.
[33] E. Hyvönen, M. Tamper, E. Ikkala, S. Sarsa, A. Oksanen, J. Tuominen, A. Hietanen,
Publishing and using legislation and case law as linked open data on the semantic
web, in: The Semantic Web: ESWC 2020 Satellite Events, Springer, 2020, pp. 110–114.
doi:10.1007/978-3-030-62327-2_19.
[34] E. Hyvönen, J. Tuominen, M. Alonen, E. Mäkelä, Linked Data Finland: A 7-star model
and platform for publishing and re-using linked datasets, in: The Semantic Web: ESWC
2014 Satellite Events, Revised Selected Papers, Springer-Verlag, 2014, pp. 226–230. URL:
https://doi.org/10.1007/978-3-319-11955-7_24.
[35] M. Koho, L. Gasbarra, J. Tuominen, H. Rantala, I. Jokipii, E. Hyvönen, AMMO Ontology of
Finnish Historical Occupations, in: Proceedings of the First International Workshop
on Open Data and Ontologies for Cultural Heritage (ODOCH’19), volume 2375, CEUR
Workshop Proceedings, 2019, pp. 91–96. URL: http://ceur-ws.org/Vol-2375/.
[36] E. Hyvönen, P. Leskinen, M. Tamper, H. Rantala, E. Ikkala, J. Tuominen, K. Keravuori,
BiographySampo – publishing and enriching biographies on the semantic web for digital
humanities research, in: The Semantic Web. 16th International Conference, ESWC 2019,
Proceedings, Springer, 2019, pp. 574–589. doi:10.1007/978-3-030-21348-0.
[37] J. Tuominen, E. Hyvönen, P. Leskinen, Bio CRM: A data model for representing biographical
data for prosopographical research, in: Proceedings of the Second Conference on
Biographical Data in a Digital World 2017 (BD2017), volume 2119, CEUR Workshop Proceedings,
2018, pp. 59–66. URL: http://ceur-ws.org/Vol-2119/paper10.pdf.
[38] L. Rietveld, R. Hoekstra, The YASGUI family of SPARQL clients, Semantic Web –
Interop</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hidén</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Honka-Hallila</surname>
          </string-name>
          ,
          <article-title>Miten eduskunta toimii (How Parliament of Finland works)</article-title>
          ,
          <source>Edita Publishing</source>
          , Helsinki,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Benoît</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Rozenberg</surname>
          </string-name>
          (Eds.),
          <source>Handbook of Parliamentary Studies: Interdisciplinary Approaches to Legislatures</source>
          , Edward Elgar Publishing,
          <year>2020</year>
          . doi:10.4337/9781789906516.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Van Aggelen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hollink</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kemman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kleppe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Beunders</surname>
          </string-name>
          ,
          <article-title>The debates of the European Parliament as Linked Open Data</article-title>
          ,
          <source>Semantic Web - Interoperability, Usability, Applicability</source>
          <volume>8</volume>
          (
          <year>2017</year>
          )
          <fpage>271</fpage>
          -
          <lpage>281</lpage>
          . doi:10.1007/s42001-019-00060-w.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>U.</given-names>
            <surname>Bojārs</surname>
          </string-name>
          , R. Darģis, U. Lavrinovičs,
          <string-name>
            <given-names>P.</given-names>
            <surname>Paikens</surname>
          </string-name>
          ,
          <article-title>LinkedSaeima: A linked open dataset of Latvia's parliamentary debates</article-title>
          ,
          <source>in: Semantic Systems. The Power of AI and Knowledge Graphs. SEMANTiCS 2019</source>
          , Springer,
          <year>2019</year>
          , pp.
          <fpage>50</fpage>
          -
          <lpage>56</lpage>
          . doi:10.1007/978-3-030-33220-4_4.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Heath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <article-title>Linked Data: Evolving the Web into a Global Data Space (1st edition)</article-title>
          ,
          <source>Synthesis Lectures on the Semantic Web: Theory and Technology</source>
          , Morgan &amp; Claypool,
          <year>2011</year>
          . URL: http://linkeddatabook.com/editions/1.0/.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>E.</given-names>
            <surname>Hyvönen</surname>
          </string-name>
          ,
          <article-title>Publishing and Using Cultural Heritage Linked Data on the Semantic Web</article-title>
          ,
          <source>Synthesis Lectures on the Semantic Web: Theory and Technology</source>
          , Morgan &amp; Claypool, Palo Alto, CA, USA,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Staab</surname>
          </string-name>
          , R. Studer (Eds.),
          <source>Handbook on Ontologies (2nd Edition)</source>
          , Springer,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Hitzler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krötzsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rudolph</surname>
          </string-name>
          ,
          <source>Foundations of Semantic Web Technologies</source>
          , Springer,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>L.</given-names>
            <surname>Sinikallio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Drobac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tamper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Leal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Koho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tuominen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>La Mela</surname>
          </string-name>
          , E. Hyvönen,
          <article-title>Plenary debates of the Parliament of Finland as linked open data and in Parla-CLARIN markup</article-title>
          ,
          <source>in: 3rd Conference on Language, Data and Knowledge, LDK 2021</source>
          , Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>17</lpage>
          . URL: https://drops.dagstuhl.de/opus/volltexte/2021/14544/pdf/OASIcs-LDK-2021-8.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P.</given-names>
            <surname>Leskinen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Hyvönen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tuominen</surname>
          </string-name>
          ,
          <article-title>Members of Parliament in Finland knowledge graph and its linked open data service</article-title>
          ,
          <source>in: Proceedings of the 17th International Conference on Semantic Systems, 6-9 September 2021, Amsterdam, The Netherlands</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>255</fpage>
          -
          <lpage>269</lpage>
          . URL: https://ebooks.iospress.nl/volumearticle/57420. doi:10.3233/SSW210049.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>E.</given-names>
            <surname>Hyvönen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sinikallio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Leskinen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Drobac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tuominen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Elo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>La Mela</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Koho</surname>
          </string-name>
          , E. Ikkala,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tamper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Leal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kesäniemi</surname>
          </string-name>
          ,
          <article-title>Parlamenttisampo: eduskunnan aineistojen linkitetyn avoimen datan palvelu ja sen käyttömahdollisuudet</article-title>
          ,
          <source>Informaatiotutkimus</source>
          <volume>40</volume>
          (
          <year>2021</year>
          ). URL: https://doi.org/10.23978/inf.107899.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Andrushchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sandberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Turunen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Marjanen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hatavara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kurunmäki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Nummenmaa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hyvärinen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Teräs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Peltonen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Nummenmaa</surname>
          </string-name>
          ,
          <article-title>Using parsed and annotated corpora to analyze parliamentarians' talk in Finland</article-title>
          ,
          <source>Journal of the Association for Information Science and Technology</source>
          <volume>185</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          . doi:10.1002/asi.24500.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>K.</given-names>
            <surname>Beelen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. A.</given-names>
            <surname>Thijm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Cochrane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Halvemaan</surname>
          </string-name>
          , G. Hirst,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kimmins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lijbrink</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Marx</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Naderi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Rheault</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Polyanovsky</surname>
          </string-name>
          , T. Whyte,
          <article-title>Digitization of the Canadian parliamentary debates</article-title>
          ,
          <source>Canadian Journal of Political Science</source>
          <volume>50</volume>
          (
          <year>2017</year>
          )
          <fpage>849</fpage>
          -
          <lpage>864</lpage>
          . doi:10.1017/S0008423916001165.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>E.</given-names>
            <surname>Lapponi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Søyland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Velldal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Oepen</surname>
          </string-name>
          ,
          <article-title>The Talk of Norway: a richly annotated corpus of the Norwegian parliament, 1998-2016</article-title>
          ,
          <source>Lang Resources &amp; Evaluation</source>
          <volume>52</volume>
          (
          <year>2018</year>
          )
          <fpage>873</fpage>
          -
          <lpage>893</lpage>
          . doi:10.1007/s10579-018-9411-5.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Pancur</surname>
          </string-name>
          , T. Erjavec,
          <article-title>The siParl corpus of Slovene parliamentary proceedings</article-title>
          ,
          <source>in: Proceedings of the Second ParlaCLARIN Workshop</source>
          , European Language Resources Association,
          <year>2020</year>
          , pp.
          <fpage>28</fpage>
          -
          <lpage>34</lpage>
          . URL: https://www.aclweb.org/anthology/2020.parlaclarin-1.6.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M.</given-names>
            <surname>La Mela</surname>
          </string-name>
          ,
          <article-title>Tracing the emergence of Nordic allemansrätten through digitised parliamentary sources</article-title>
          , in: M. Fridlund, M. Oiva, P. Paju (Eds.),
          <source>Digital histories: Emergent approaches within the new digital history</source>
          , Helsinki University Press,
          <year>2020</year>
          , pp.
          <fpage>181</fpage>
          -
          <lpage>197</lpage>
          . doi:10.33134/HUP-5-11.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lennes</surname>
          </string-name>
          ,
          <article-title>FIN-CLARIN and Language Bank parliamentary data</article-title>
          ,
          <source>Workshop “Digital Parliamentary Data and Research”</source>
          ,
          <year>2019</year>
          . URL: https://www2.helsinki.fi/en/helsinki-centre-for-digital-humanities/workshop-digital-parliamentary-data-and-research.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>A.</given-names>
            <surname>Mansikkaniemi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Smit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kurimo</surname>
          </string-name>
          ,
          <article-title>Automatic construction of the Finnish parliament speech corpus</article-title>
          ,
          <source>in: Proc. Interspeech</source>
          <year>2017</year>
          ,
          <year>2017</year>
          , pp.
          <fpage>3762</fpage>
          -
          <lpage>3766</lpage>
          . doi:10.21437/Interspeech.2017-1115.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>C.</given-names>
            <surname>Rauh</surname>
          </string-name>
          , P. De Wilde,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schwalbach</surname>
          </string-name>
          ,
          <article-title>The ParlSpeech data set: Annotated full-text vectors of 3.9 million plenary speeches in the key legislative chambers of seven European states</article-title>
          (
          <issue>V1</issue>
          ),
          <year>2017</year>
          . doi: 10.7910/DVN/E4RSP9.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>K.</given-names>
            <surname>Elo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Karimäki</surname>
          </string-name>
          ,
          <article-title>Luonnonsuojelusta ilmastopolitiikkaan: Ympäristöpoliittisen käsitteistön muutos parlamenttipuheessa 1960-2020</article-title>
          ,
          <source>Politiikka</source>
          <volume>63</volume>
          (
          <year>2021</year>
          ). URL: https://journal.fi/politiikka/article/view/109690. doi: 10.37452/politiikka.109690.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>L.</given-names>
            <surname>Blaxill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Beelen</surname>
          </string-name>
          ,
          <article-title>A feminized language of democracy? The representation of women at Westminster since 1945</article-title>
          ,
          <source>Twentieth Century British History</source>
          <volume>27</volume>
          (
          <year>2016</year>
          )
          <fpage>412</fpage>
          -
          <lpage>449</lpage>
          . doi: 10.1093/tcbh/hww028.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>K.</given-names>
            <surname>Quinn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Monroe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Colaresi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. H.</given-names>
            <surname>Crespin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Radev</surname>
          </string-name>
          ,
          <article-title>How to analyze political attention with minimal assumptions and costs</article-title>
          ,
          <source>American Journal of Political Science</source>
          <volume>54</volume>
          (
          <year>2010</year>
          )
          <fpage>209</fpage>
          -
          <lpage>228</lpage>
          . doi: 10.1111/j.1540-5907.2009.00427.x.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>H.</given-names>
            <surname>Baker</surname>
          </string-name>
          ,
          <string-name>
            <surname>B. V.</surname>
          </string-name>
          ,
          <string-name>
            <surname>M. T.</surname>
          </string-name>
          ,
          <article-title>Digitization of the Canadian parliamentary debates</article-title>
          , in:
          <string-name>
            <given-names>T.</given-names>
            <surname>Säily</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nurmi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Palander-Collin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Auer</surname>
          </string-name>
          (Eds.),
          <source>Exploring future paths for historical sociolinguistics</source>
          , John Benjamins, Amsterdam,
          <year>2017</year>
          , pp.
          <fpage>83</fpage>
          -
          <lpage>107</lpage>
          . doi: 10.1017/S0008423916001165.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>J.</given-names>
            <surname>Guldi</surname>
          </string-name>
          ,
          <article-title>Parliament's debates about infrastructure: An exercise in using dynamic topic models to synthesize historical change</article-title>
          ,
          <source>Technology and Culture</source>
          <volume>60</volume>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>33</lpage>
          . doi: 10.1353/tech.2019.0000
          .
          <source>Semantic Web - Interoperability, Usability, Applicability</source>
          <volume>8</volume>
          (
          <year>2017</year>
          )
          <fpage>373</fpage>
          -
          <lpage>383</lpage>
          . doi: 10.3233/SW-150197.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>E.</given-names>
            <surname>Hyvönen</surname>
          </string-name>
          ,
          <article-title>Digital humanities on the Semantic Web: Sampo model and portal series</article-title>
          ,
          <source>Semantic Web - Interoperability, Usability, Applicability</source>
          (
          <year>2022</year>
          ). Accepted, https://seco.cs.aalto.fi/publications/2021/hyvonen-sampo-model-2021.pdf .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>E.</given-names>
            <surname>Ikkala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Hyvönen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Rantala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Koho</surname>
          </string-name>
          ,
          <article-title>Sampo-UI: A full stack JavaScript framework for developing semantic portal user interfaces</article-title>
          ,
          <source>Semantic Web - Interoperability, Usability, Applicability</source>
          <volume>13</volume>
          (
          <year>2022</year>
          )
          <fpage>69</fpage>
          -
          <lpage>84</lpage>
          . doi: 10.3233/SW-210428.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tzitzikas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Manolis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Papadakos</surname>
          </string-name>
          ,
          <article-title>Faceted exploration of RDF/S datasets: a survey</article-title>
          ,
          <source>Journal of Intelligent Information Systems</source>
          <volume>48</volume>
          (
          <year>2017</year>
          )
          <fpage>329</fpage>
          -
          <lpage>364</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>E.</given-names>
            <surname>Hyvönen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Leskinen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Rantala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ikkala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tuominen</surname>
          </string-name>
          ,
          <article-title>Akatemiasampo-portaali ja -datapalvelu henkilöiden ja henkilöryhmien historialliseen tutkimukseen</article-title>
          ,
          <source>Informaatiotutkimus</source>
          <volume>40</volume>
          (
          <year>2021</year>
          )
          <fpage>28</fpage>
          -
          <lpage>56</lpage>
          . URL: https://journal.fi/inf/article/view/102656/64169 .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>E.</given-names>
            <surname>Hyvönen</surname>
          </string-name>
          ,
          <article-title>Using the Semantic Web in Digital Humanities: Shift from Data Publishing to Data-analysis and Serendipitous Knowledge Discovery</article-title>
          ,
          <source>Semantic Web - Interoperability, Usability, Applicability</source>
          <volume>11</volume>
          (
          <year>2020</year>
          )
          <fpage>187</fpage>
          -
          <lpage>193</lpage>
          . doi: 10.3233/SW-190386.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>K.</given-names>
            <surname>Palonen</surname>
          </string-name>
          ,
          <article-title>Eduskunnasta puhekunnaksi? Parlamentarismi retorisena politiikkana</article-title>
          ,
          <source>Politiikka</source>
          <volume>47</volume>
          (
          <year>2005</year>
          )
          <fpage>141</fpage>
          -
          <lpage>148</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>D.</given-names>
            <surname>Mimno</surname>
          </string-name>
          ,
          <article-title>Topic Regression</article-title>
          ,
          <source>Ph.D. thesis</source>
          , University of Massachusetts Amherst,
          <year>2012</year>
          . URL: https://scholarworks.umass.edu/open_access_dissertations/520.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [46]
          <string-name>
            <given-names>T. R.</given-names>
            <surname>Tangherlini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Leonard</surname>
          </string-name>
          ,
          <article-title>Trawling in the sea of the great unread: Sub-corpus topic modeling and humanities research</article-title>
          ,
          <source>Poetics</source>
          <volume>41</volume>
          (
          <year>2013</year>
          )
          <fpage>725</fpage>
          -
          <lpage>749</lpage>
          . doi: 10.1016/j.poetic.2013.08.002.
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [47]
          <string-name>
            <given-names>T.</given-names>
            <surname>Ylä-Anttila</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Eranti</surname>
          </string-name>
          ,
          <article-title>Aihemallinnuksesta kehysmallinnukseen</article-title>
          ,
          <source>Politiikka</source>
          <volume>60</volume>
          (
          <year>2018</year>
          )
          <fpage>148</fpage>
          -
          <lpage>156</lpage>
          . URL: http://elektra.helsinki.fi/se/p/politiikka/60/2/aihemall.pdf .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [48]
          <string-name>
            <given-names>P.</given-names>
            <surname>DiMaggio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Blei</surname>
          </string-name>
          ,
          <article-title>Exploiting affinities between topic modeling and the sociological perspective on culture: Application to newspaper coverage of U.S. Government arts funding</article-title>
          ,
          <source>Poetics</source>
          <volume>41</volume>
          (
          <year>2013</year>
          )
          <fpage>570</fpage>
          -
          <lpage>606</lpage>
          . doi: 10.1016/j.poetic.2013.08.004.
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [49]
          <string-name>
            <given-names>C.</given-names>
            <surname>Jacobi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>van Atteveldt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Welbers</surname>
          </string-name>
          ,
          <article-title>Quantitative analysis of large amounts of journalistic texts using topic modelling</article-title>
          ,
          <source>Digital Journalism</source>
          <volume>4</volume>
          (
          <year>2016</year>
          )
          <fpage>89</fpage>
          -
          <lpage>106</lpage>
          . doi: 10.1080/21670811.2015.1093271.
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [50]
          <string-name>
            <given-names>S.</given-names>
            <surname>Purhonen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Toikka</surname>
          </string-name>
          ,
          <article-title>“Big Datan” haaste ja uudet laskennalliset tekstiaineistojen analyysimenetelmät: esimerkkitapauksena aihemallianalyysi tasavallan presidenttien uudenvuodenpuheista 1935-2015</article-title>
          ,
          <source>Sosiologia</source>
          <volume>53</volume>
          (
          <year>2016</year>
          )
          <fpage>6</fpage>
          -
          <lpage>27</lpage>
          . URL: http://elektra.helsinki.fi/se/s/0038-1640/53/1/bigdatan.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [51]
          <string-name>
            <given-names>S.-M.</given-names>
            <surname>Laaksonen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nelimarkka</surname>
          </string-name>
          ,
          <article-title>Omat ja muiden aiheet: Laskennallinen analyysi vaalijulkisuuden teemoista ja aiheomistajuudesta</article-title>
          ,
          <source>Politiikka</source>
          <volume>60</volume>
          (
          <year>2018</year>
          )
          <fpage>132</fpage>
          -
          <lpage>147</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [52]
          <string-name>
            <given-names>A.</given-names>
            <surname>Törnberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Törnberg</surname>
          </string-name>
          ,
          <article-title>Muslims in social media discourse: Combining topic modeling and critical discourse analysis</article-title>
          ,
          <source>Discourse, Context and Media</source>
          <volume>13</volume>
          (
          <year>2016</year>
          )
          <fpage>132</fpage>
          -
          <lpage>142</lpage>
          . doi: 10.1016/j.dcm.2016.04.003.
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [53]
          <string-name>
            <given-names>J. B.</given-names>
            <surname>Mountford</surname>
          </string-name>
          ,
          <article-title>Topic modeling the red pill</article-title>
          ,
          <source>Social Sciences</source>
          <volume>7</volume>
          (
          <year>2018</year>
          ). doi: 10.3390/socsci7030042.
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [54]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jelveh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kogut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Naidu</surname>
          </string-name>
          ,
          <article-title>Detecting latent ideology in expert text: Evidence from academic papers in economics</article-title>
          ,
          <source>in: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          ,
          <source>ACL</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>1804</fpage>
          -
          <lpage>1809</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>