Political Stance Analysis Using Swedish Parliamentary Data

                            Jacobo Rouces, Lars Borin, Nina Tahmasebi
                               Språkbanken, University of Gothenburg, Sweden
                              {jacobo.rouces, lars.borin, nina.tahmasebi}@gu.se


       Abstract. We process and visualize Swedish parliamentary data using methods from statistics
       and machine learning, which allows us to obtain insight into the political processes behind the
       data. We produce plots that let us infer the relative stance of political parties and their members
       on different topics. In addition, we can infer the degree of homogeneity of individual votes within
       different parties, as well as the degree of multi-dimensionality of Swedish politics.


1    Introduction

With a changing political landscape and new political movements gaining traction across the
Nordic countries, Europe and the world, it is especially important for political scientists and
citizens to be able to assess the political stance of political actors with respect to specific top-
ics. Using methods from statistics and machine learning on Swedish parliamentary data, we
obtain plots that allow us to visualize the relative stance of political parties and members of
the Swedish Parliament¹ (MSPs) on different topics, the degree of homogeneity of individual
votes within different parties, as well as the degree of multi-dimensionality of Swedish
politics.
     Although the Swedish political landscape is an especially interesting case due to its rela-
tively high number of parties, the methods can be generalized to any open parliamentary data
set that includes voting records.
     Section 2 explains how the data was acquired and pre-processed. Section 3 presents the
results of our analysis. Section 4 presents lines of work that we are currently pursuing or
plan to pursue.


2    Data Preprocessing

We use parliamentary data publicly available at the website of the Parliament of Sweden²
covering the period from 2014-11-19 to 2018-06-20.
 1
   In order to avoid confusion of names, in this paper, we use the acronym MSP (for “member of the Swedish
   Parliament”), rather than the expected and more common MP (for “member of parliament”), since the latter
   acronym is long established in Swedish politics as the conventional abbreviated reference to Miljöpartiet de
   Gröna, the Swedish Green party.
 2
   http://data.riksdagen.se

     The data is originally obtained from files in comma-separated values (CSV) format. Each
file contains data for one parliamentary working year, organized in lines of text, each
containing a fixed number of text fields separated by commas. Each field contains the value
of a specific variable: identifier of a voting session, identifier of the document that the
voting session concerns, name of an MSP, value of the vote of the MSP in that voting
session, date of the voting session, etc. Some additional information is included about MSPs,
such as which party they are a member of, their birth date, gender, and constituency. In our
current experiments we only use the party affiliation. Each document identifier contains the
identifier of one of the 15 existing parliamentary committees,³ which define topics such as
taxation, finance, health and welfare, etc. We extract this committee identifier, which allows
us to connect each voting session to one of these topics.
     We carry out some preprocessing to normalize the data. The name of one party changes
in this period and we use the later name only (Folkpartiet → Liberalerna). The names of
the MSPs are stored in the CSV files in different formats, sometimes given-name first and
in other cases family-name first. We attempt to unify this by converting the first format into
the second one. If an MSP leaves a party, we use their last affiliation. The Committee on
Defense and the Committee on Foreign Affairs are joined into one because some voting
sessions use a joint identifier. There is one voting session using a joint identifier for the
Committee on Cultural Affairs and the Committee on Civil Affairs; since there seems to be
little topical overlap between these two, we chose to omit that voting session rather than
conflating all the other voting sessions of the two committees. The value of the vote of an
MSP in a voting session can be ‘Yes’, ‘No’, ‘Abstention’, ‘Absence’ or ‘Nonexistence’.
‘Nonexistence’ reflects the situation where the individual in question was not an
MSP at the time of the voting, whereas ‘Absence’ reflects that the individual was an MSP
but was not present during the voting session. ‘Nonexistence’ is the only value not explicitly
included as a record in the CSV files, so it is inferred for each voting session as the voting
value for the MSPs for which there is an explicit voting value in other voting sessions in
the time period under analysis, but not in the voting session at hand. In the remainder of
the document, we will refer to ‘Yes’, ‘No’, ‘Abstention’ and ‘Absence’ as ‘existing votes’.
We encode ‘Yes’ as the numerical value +1, ‘No’ as −1, and both ‘Abstention’ and ‘Absence’
as 0. In some experiments where a numerical value is required for every MSP and every
voting session, we encode ‘Nonexistence’ as 0 as well. However, because ‘Nonexistence’
does not reflect an actual choice of the MSP, assuming its equivalence to abstention and
absence introduces a kind of noise – had that individual been an MSP at that time, the vote
might as well have been ‘Yes’ or ‘No’. For this reason, in order to prevent this noise from
distorting our conclusions, experiments that reflect information about MSPs will differentiate
between ‘Continuous MSPs’, defined as those MSPs for which there is an existing vote for
each voting session in the period analyzed (2014–2018); and ‘Discontinuous MSPs’, defined
as those MSPs for which there is at least one voting session for which there is no existing
vote in the period analyzed.
 3
     These are explained at https://www.riksdagen.se/en/Committees/The-15-parliamentary-committees/
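
The vote encoding and the continuous/discontinuous distinction described above can be illustrated with a minimal sketch. It is not the authors' code: the record layout `(session, msp, label)`, the function name `build_vote_matrix` and the toy records are assumptions made for illustration, and ‘Nonexistence’ is encoded as 0, as the text does for experiments that need a value for every MSP and session.

```python
# Illustrative sketch only; record layout and names are assumed, not taken from the data files.
ENCODING = {"Yes": 1, "No": -1, "Abstention": 0, "Absence": 0}


def build_vote_matrix(records):
    """Encode votes and flag continuous MSPs.

    records: iterable of (session_id, msp_name, vote_label) tuples.
    Returns (sessions, matrix, continuous), where matrix maps each MSP to a
    list of encoded votes (one per session) and continuous is the set of MSPs
    with an existing vote in every session.
    """
    records = list(records)
    sessions = sorted({session for session, _, _ in records})
    msps = sorted({msp for _, msp, _ in records})
    explicit = {(session, msp): label for session, msp, label in records}

    matrix, continuous = {}, set()
    for msp in msps:
        row, has_all_votes = [], True
        for session in sessions:
            label = explicit.get((session, msp))
            if label is None:
                # No record in this session: inferred 'Nonexistence', encoded as 0.
                has_all_votes = False
                row.append(0)
            else:
                row.append(ENCODING[label])
        matrix[msp] = row
        if has_all_votes:
            continuous.add(msp)
    return sessions, matrix, continuous


# Toy data: MSP "C" has no record for session "v1", so "C" is discontinuous.
records = [
    ("v1", "A", "Yes"), ("v1", "B", "No"),
    ("v2", "A", "Absence"), ("v2", "B", "Yes"), ("v2", "C", "Abstention"),
]
sessions, matrix, continuous = build_vote_matrix(records)
```

In this toy input, "A" and "B" have an existing vote in both sessions and are therefore continuous, while "C" receives an inferred 0 for the session it did not take part in.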
    This results in information about 427 MSPs, 267 of which are continuous. In the 2014–2018
period there are 2,583 voting sessions, each of which contains exactly 349 valid votes,⁴
totaling 901,467 valid votes in the whole period. Each voting session is associated with one
of 14 committees/topics.
    Table 1 shows the parties in the period of time analyzed, together with the number of
distinct MSPs throughout that period. The abbreviations and colors in the table will be used
in the following sections.


   Initials  Party Name (English)             Party Name (Swedish)                      MSPs  CMSPs  Color
   —         No party affiliation (all previously members of SD)                           8      6
   C         Centre Party                     Centerpartiet                               25     19
   KD        Christian Democrats              Kristdemokraterna                           23     10
   L         Liberals                         Liberalerna                                 23     10
   M         Moderate Party                   Moderata samlingspartiet                   100     66
   MP        Green Party                      Miljöpartiet de Gröna                       30     21
   S         Swedish Social Democratic Party  Sveriges socialdemokratiska arbetarparti   144     79
   SD        Sweden Democrats                 Sverigedemokraterna                         46     38
   V         Left Party                       Vänsterpartiet                              28     14
Table 1: Parties in the Swedish Parliament in the period 2014–2018. The colors are used in subsequent plots
and are approximately similar to the official colors of the parties, slightly modified to increase contrast.
“L/Liberals/Liberalerna” were called “FP/Liberal People’s Party/Folkpartiet liberalerna” until 2015, but we use
the new name. CMSPs indicate continuous MSPs in 2014–2018.


3      Data Analysis
We obtain three different types of views on the parliamentary data. Each view is based on
different mathematical and computational methods and aims at giving insight into different
political aspects reflected in the parliamentary data.
    – We build similarity charts that show how close parties are to other parties, and MSPs
      to other MSPs, in terms of their voting record. This is done considering all votes as
      a whole, as well as individually for votes associated to specific topics defined by the
      parliamentary committees. We apply agglomerative hierarchical clustering (Murtagh and
      Contreras, 2012) to order the parties/MSPs in such a way that similar parties/MSPs lie
      nearby in the order, which in turn makes visible the existing block structure at the sub-
      party, party, and supra-party levels. The results are shown and discussed in Section 3.1.
 4
     The Swedish Parliament has 349 members.

 – We build an intra-party variation chart that shows the degree of variation of the voting
   record of MSPs within individual parties. This allows us to assess the degree of party
   discipline in each party for each topic. Again, this is done considering all votes as a whole,
   as well as individually for votes associated to specific topics defined by the parliamentary
   committees. The results are shown in Section 3.2.
 – Using Principal Component Analysis (PCA) (Wall et al, 2003), we build a political com-
   pass (Lester, 1994) that allows us to assess, in a more visual way, the clustering of the
   parties, the position of individual MSPs in relation to the clusters of their own party
   as well as others. PCA also allows us to assess the degree of multi-dimensionality of
   Swedish politics as it allows investigation of questions such as whether Swedish politics
   can be reduced to a single left–right axis, or otherwise, how many axes are needed. The
   results are shown and discussed in Section 3.3.


3.1   Similarity Charts

Figure 1 displays charts showing degrees of similarity (obtained through cosine distance)
between the votes of each party (which are obtained by averaging the votes of the MSPs of
each party). Each chart is obtained using the existing votes associated to each committee, and one
extra chart is obtained using all votes. Using single-linkage hierarchical clustering (Murtagh
and Contreras, 2012), parties are ordered so that those with similar voting records are close.
This creates black blocks centered around the diagonals; the blocks represent groups of par-
ties with high mutual similarity in their voting records.
     Similarly, Figure 2 contains a chart showing degrees of similarity (obtained through co-
sine distance) between the votes of each continuous MSP. Discontinuous MSPs are omitted
since any attempt at guessing a non-existent vote for a voting session would introduce a bias.
Voting sessions from all committees are considered. MSPs are again ordered using single-
linkage hierarchical clustering, which causes those with similar voting records to be close in
the linear order, and the block structure to become clearly visible.
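
The two steps used for these charts, cosine-based similarity between averaged vote vectors and an ordering derived from single-linkage clustering, can be sketched as follows. This is an illustrative reimplementation in plain Python, not the authors' code: the function names and toy votes are assumptions, distance is taken to be 1 minus cosine similarity, and the leaf order is obtained simply by concatenating clusters as they merge, which is only one of several valid dendrogram orderings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vote vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def party_means(votes, party_of):
    """Average the encoded votes of the MSPs of each party, session by session."""
    rows_by_party = {}
    for msp, row in votes.items():
        rows_by_party.setdefault(party_of[msp], []).append(row)
    return {party: [sum(col) / len(rows) for col in zip(*rows)]
            for party, rows in rows_by_party.items()}


def single_linkage_order(names, distance):
    """Order items so that members of each merged cluster stay adjacent."""
    clusters = [[name] for name in names]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: cluster distance is the closest pair of members.
                d = min(distance(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0]


# Toy data (hypothetical): parties X and Z vote alike, Y votes the opposite way to X.
votes = {"x1": [1, 1, -1, 1], "x2": [1, 1, -1, 1],
         "y1": [-1, -1, 1, -1], "z1": [1, 1, -1, -1]}
party_of = {"x1": "X", "x2": "X", "y1": "Y", "z1": "Z"}

means = party_means(votes, party_of)
order = single_linkage_order(
    sorted(means),
    lambda a, b: 1.0 - cosine_similarity(means[a], means[b]),
)
```

With these toy votes, X and Z end up adjacent in `order`, which is what produces the dark blocks around the diagonal once the similarity matrix is drawn in that order.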
     In Figures 1 and 2, the three most visible blocks correspond to the well-known formal
blocks in Swedish politics: the Alliance (M+L+C+KD), the Red-Greens (S+MP+V) and the
Sweden Democrats (SD and independents, all of whom previously belonged to SD). In addition
to this, one can draw further conclusions. Within the Red-Greens, V stands apart from
S and MP, which are very close to each other on average. In Figure 2, the clustering algorithm
is not even able to group members of MP in a single cluster separate from S, which means
that the differences among the votes of members of MP are comparable to those between
them and the members of S. The division by topics in Figure 1 shows additional insights:
for instance, on constitutional affairs, the Alliance and the Red-Greens form a single block
opposed to SD. The difference between V and S+MP is lower in some committees such as
Finance and Taxation, which should correspond to a political agreement on such issues. The
gap between SD and S+MP is comparatively lower in some committees such as Civil Affairs
and Industry and Trade. On ‘Finance’, M stands farther from the Red-Green block than
the rest of the Alliance, which produces a strong sub-block. On ‘Social Insurance’ and on
‘Environment and Agriculture’, L stands closer to the Red-Green block than the rest of the
Alliance. MSPs classified as independent are very close to SD on all matters, which is not
surprising since all of them happen to be former members of SD (and their votes during their
period in SD are counted as well even if they are labeled as independents).


Fig. 1: Charts showing similarity between the votes of each party (which are obtained averaging the votes of the
MSPs of each party). Each chart is obtained using the votes associated to each committee, and one extra chart is
obtained using all votes. Higher levels of similarity are shown in darker shades. The diagonals are black because
they stand for the perfect similarity of a party with itself.


Fig. 2: Similarity (obtained through cosine distance) between the votes of each continuous MSP. All votes are
considered. Higher levels of similarity are shown in darker shades. The diagonal is black because it stands for the
perfect similarity of an MSP with itself. The side colored bars represent the party of each MSP. The individual
names of the 160 continuous MSPs are omitted due to space constraints. The legend for colors can be found in
Table 1 and Figure 4.

3.2    Intra-party Variation Chart
Figure 3 shows the degree of variation (standard deviation) of the existing votes of MSPs
within each party, averaged across the votes associated to each committee, as well as across
all existing votes.
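
A minimal sketch of this measure is given below, under stated assumptions: votes are encoded as +1, −1 and 0, ‘Nonexistence’ is represented as `None` and excluded, and the population standard deviation (`statistics.pstdev`) is used, since the text does not say whether the population or the sample formula is applied. The function name and the toy data are hypothetical.

```python
from statistics import mean, pstdev


def intra_party_variation(votes, party_of):
    """Mean per-session standard deviation of existing votes within each party.

    votes: MSP name -> list of encoded votes, with None marking 'Nonexistence'.
    party_of: MSP name -> party label.
    """
    rows_by_party = {}
    for msp, row in votes.items():
        rows_by_party.setdefault(party_of[msp], []).append(row)

    variation = {}
    for party, rows in rows_by_party.items():
        per_session = []
        for column in zip(*rows):
            existing = [v for v in column if v is not None]
            if existing:
                # Spread of the party's existing votes in this voting session.
                per_session.append(pstdev(existing))
        variation[party] = mean(per_session) if per_session else 0.0
    return variation


# Toy data (hypothetical): party A splits in the second session, party B never does.
votes = {"a1": [1, 1], "a2": [1, -1], "b1": [1, None], "b2": [1, 0]}
party_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
variation = intra_party_variation(votes, party_of)
```

A value of 0 means every MSP of the party cast the same encoded vote in every session; larger values indicate weaker party discipline.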


Fig. 3: Standard deviation of the existing votes of MSPs within each party, averaged across the votes of each
committee (upper rows) as well as for all votes (lowest row). The numbers in parentheses indicate the number
of voting sessions associated to that committee. As the legend shows, lower levels of variation (higher similarity
among MSPs of the same party) are shown in darker shades. The horizontal axis is ordered in increasing order
towards the right using the lowest row. The positions in the vertical legend are not related to the positions in the
vertical axis of the figure (committees).


3.3    Political Compass
When dealing with high-dimensional data, one cannot visualize more than two or at most
three dimensions in a single plot (using X,Y, and Z axes). The voting data is high-dimensional
because each MSP is a point with several thousands of dimensions (one for each voting
session). Principal Component Analysis (Wall et al, 2003) is a statistical technique that finds
‘principal components’: an ordered list of new axes on which the data can be projected (which
means that each axis is a weighted sum of the many original ones), in a way that the first
components contain most of the variance of the data (i.e. the data points are more spread
apart when projected onto the first components). This allows visualizing the data in a lower
dimensional space while maintaining most of its original information.
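
To make the idea concrete, the following sketch computes principal components with power iteration and deflation on the covariance matrix, using only the Python standard library. It is an illustration under stated assumptions rather than the method used for the figures: the function names, the starting vector and the toy matrix (rows are MSPs, columns are voting sessions) are hypothetical, and a production analysis would more likely rely on an SVD routine from a numerical library.

```python
import math


def center(data):
    """Subtract the column mean from every column."""
    n = len(data)
    means = [sum(col) / n for col in zip(*data)]
    return [[x - m for x, m in zip(row, means)] for row in data]


def covariance(centered):
    """Sample covariance matrix of already centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    d = len(matrix)
    start = [1.0 / (i + 1) for i in range(d)]  # arbitrary non-zero start (assumption)
    norm = math.sqrt(sum(x * x for x in start))
    v = [x / norm for x in start]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * matrix[i][j] * v[j] for i in range(d) for j in range(d))
    return eigenvalue, v


def pca(data, k):
    """Return (components, variance_ratios, projected) for the first k components."""
    centered = center(data)
    cov = covariance(centered)
    total = sum(cov[i][i] for i in range(len(cov)))
    components, ratios = [], []
    for _ in range(k):
        eigenvalue, v = top_eigenpair(cov)
        components.append(v)
        ratios.append(eigenvalue / total if total else 0.0)
        # Deflation: remove the component just found before searching for the next one.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(len(v))]
               for i in range(len(v))]
    projected = [[sum(x * c for x, c in zip(row, comp)) for comp in components]
                 for row in centered]
    return components, ratios, projected


# Toy data (hypothetical): the first two sessions always agree, the third is independent.
data = [[1, 1, 1], [1, 1, -1], [-1, -1, 1], [-1, -1, -1]]
components, ratios, projected = pca(data, 2)
```

In this toy matrix the first component combines the two sessions that always agree and carries two thirds of the variance, while the second component is the remaining session; `projected` holds the two-dimensional coordinates that would be plotted for each row.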
    Figure 4 shows MSPs plotted in two-dimensional subspaces defined by pairs of different
principal components. In each plot, the horizontal and vertical axes each represent one
principal component of the pair. The clustering of MSPs according to party is clear. It also shows which
MSPs have voting records closer to those from other parties. Additional components may
reflect different groupings, but the higher the component index, the smaller are the distances
reflected in their axes. The plot is akin to a multi-dimensional political compass (Lester,
1994) (also known by other names such as political spectrum, Nolan chart, and Pournelle
chart), with the particularity of this one being that it is strictly driven by the available data,
although currently the axes lack an interpretation.
    Figure 5 shows how much spread (variance) is contained in each of the principal compo-
nents. It shows that the voting data cannot be linearly squeezed into a straight line (like the
traditional left-right axis), although a 5-dimensional space constitutes a reasonable summary
of the data.


Fig. 4: MSPs plotted in two-dimensional subspaces defined by pairs of different principal components. The larger
dots correspond to continuous MSPs during the whole period under analysis (2014–2018), as defined in Section 2.
The smaller dots correspond to discontinuous MSPs.


Fig. 5: Variance contained in each of the principal components, showing how the voting data cannot be linearly
squeezed into a single left-right axis, although the first 5 components contain significantly more variance. The
plot is cropped at the 50th component, beyond which the decreasing trend is smoothly continuous.


4     Ongoing Work

We are currently working on extending the work presented in this paper in the following
ways.

  – Currently, the topic of each vote is determined by the committee assigned to the vote.
    However, some committees conflate topics that might be relevant to analyze independently
    (for instance, the topics of “citizenship and migration” and “national pensions”
    are arguably distinct but both handled by the Committee on Social Insurance), while,
    hypothetically, the same topic may be addressed from different perspectives across several
    committees. Each voting session has an associated document from which topics could
    be induced and we are comparing different language technology methods in order to
    automatically assign topics to voting sessions.
  – Currently, the axes of the political compass in Figure 4 are unlabeled and, therefore, lack
    a clear interpretation. We plan to apply language technology techniques on the parliamentary
    documents associated to the voting sessions in order to identify the topics associated
    to each axis. We are also testing other matrix factorization methods that may be more
    appropriate for the task, like Independent Component Analysis (Hyvärinen and Oja, 2000),
    which tries to decompose the data into a sum of independent non-Gaussian components,
    and Non-Negative Matrix Factorization, which allows non-negative data to be decomposed
    into non-negative components (Tandon and Sra, 2010). Non-linear decompositions
    are being tested too.
  – So far we have used the data from the period 2014–2018 (a single political term), and
    analyzed differences between MSPs and parties within that period of time as a single
    block. The methods can be extended to previous legislatures and generalized in a way
    that changes over time are also analyzed. Similarly, new results can be provided as new
    data from the latest parliamentary activity is released.
  – The results can also be generalized to other data from national or supra-national parlia-
    ments (for instance, the European Parliament). If the voting sessions are not associated to
    committees that provide clear topical distinctions, the above-mentioned topic extraction
    could be used.

Acknowledgements

This work has been supported by a framework grant (Towards a knowledge-based culturomics;⁵
contract 2012-5738) as well as funding to Swedish CLARIN (Swe-Clarin;⁶ contract
2013-2003), both awarded by the Swedish Research Council, and by infrastructure funding
granted to Språkbanken by the University of Gothenburg.
  5
      https://spraakbanken.gu.se/eng/culturomics
  6
      https://sweclarin.se/eng
References

Hyvärinen A, Oja E (2000) Independent component analysis: Algorithms and applications.
  Neural Networks 13(4-5):411–430.
Lester J (1994) The evolution of the political compass (and why libertarianism is not right-
  wing). Journal of Social and Evolutionary Systems 17(3):231–241.
Murtagh F, Contreras P (2012) Algorithms for hierarchical clustering: An overview. Wiley
  Interdisciplinary Reviews: Data Mining and Knowledge Discovery 2(1):86–97.
Tandon R, Sra S (2010) Sparse non-negative matrix approximation: New formulations and
  algorithms. Tech. Rep. 193, Max Planck Institute for Biological Cybernetics, Tübingen.
Wall ME, Rechtsteiner A, Rocha LM (2003) Singular value decomposition and principal
  component analysis. In: A Practical Approach to Microarray Data Analysis, Springer,
  Berlin, pp 91–109.