<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Model to explore the French Parliamentary Debates during the early Third Republic (1881-1899)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nicolas Bourgeois</string-name>
          <email>nicolas.bourgeois@epitech.eu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aurélien Pellet</string-name>
          <email>aurelien.pellet@epitech.eu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marie Puren</string-name>
          <email>marie.puren@epitech.eu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centre Jean-Mabillon (CJM), École nationale des chartes</institution>
          ,
          <addr-line>65 rue de Richelieu, 75002 Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>EPITECH</institution>
          ,
          <addr-line>Kremlin-Bicêtre</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <abstract>
        <p>In this long paper, we use NLP techniques to explore two decades (1881-1899) of parliamentary debates of the French Third Republic (1870-1940), and more specifically to analyse the importance of the army in the political debate. We use Latent Dirichlet Allocation to partition the vocabulary into topics, and then study the distribution of the topic “army” over time. We also examine its connection with other topics, in relation to the main political and military events of the period.</p>
      </abstract>
      <kwd-group>
        <kwd>Natural Language Processing</kwd>
        <kwd>Topic Modelling</kwd>
        <kwd>Parliamentary Debates</kwd>
        <kwd>France</kwd>
        <kwd>Early Third Republic</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In this paper, we present the preliminary work we have carried out on a set of parliamentary […] onwards<sup>3</sup>. This choice is dictated by the document we are working on: it is from 1881 that the
debates held in the Chamber of Deputies are recorded in a publication specifically dedicated to
them<sup>4</sup>.</p>
      <p>https://recherche.epitech.eu/rushmore_teams/nicolas-bourgeois (N. Bourgeois);
https://recherche.epitech.eu/rushmore_teams/aurelien-pellet/ (A. Pellet);
https://recherche.epitech.eu/rushmore_teams/marie-puren/ (M. Puren)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings.</p>
      <p>
        The Chamber of Deputies played a considerable political role, especially in the nineteenth
century. At that time, the government paid particular attention to this assembly [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. We thus
have access to the full report of the debates, written by a body of specialised civil servants set up
in 1847, whose techniques aim to recreate the naturalness of the deliberations [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Parliamentary
debates are therefore an essential historical source for political history [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], but also for other
historical fields, since they make it possible to follow the major stages in the development of
the legislative framework of various social, economic, religious or cultural fields of activity [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
They are also of interest to other disciplines: political science, sociology, linguistics [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], or legal
history [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        However, the fact that all parliamentary debates since the French Revolution were made available
online between 2009 and 2016 did not prompt a new wave of research. Although they lie at the heart
of a fundamental democratic institution, the debates are indeed little known by the general
public and little studied by specialists [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. By contrast, the availability of their British
counterpart, Hansard<sup>5</sup>, as exploitable textual data has stimulated new research
in history and in political science [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] but also in linguistics and natural language processing
[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The form of the French debates, and the means provided to read them online,
make them a difficult source to work with: to navigate through the digitised reports, it is best
to already know what you are looking for (for example, the debates on a bill held on a
specific date). A full-text search is possible within an issue (which corresponds
to a parliamentary sitting), but it does not allow users to explore the corpus as a whole,
especially if they are interested in a major topic that was debated over several years.
      </p>
      <p>
        Fortunately, it is possible to extract the text of these digitised documents. From a
methodological point of view, parliamentary debates thus constitute an excellent case study for the
computational exploration of large historical corpora. While digitisation provides access to an increasingly
large amount of historical data, it requires the development of new ways of reading digitised
historical sources [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], such as the methods offered by “distant reading” as defined by Franco
Moretti [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Within the framework of the AGODA<sup>6</sup> project, funded by the National Library of
France [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], our team is working on the development of tools to facilitate the exploration of this
corpus. As part of this work, we propose to use topic modelling, a method that is particularly
appropriate for the study of large historical corpora [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>
        Topic modelling has shown its value in analysing similar sources, in particular the press
(such as in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] or [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]). Such corpora, large in volume, serial, and spanning many different
topics that evolve over time, are well suited to a topic-based exploration. We wish to show the
value of such a method for analysing and exploring our corpus. Topic modelling indeed seems to
us to be an interesting “entry point” into parliamentary debates. We start from the hypothesis
that identifying the topics present in these debates makes it possible to better understand the
evolution of political ideas and debates over time. We present here a first approach based on raw
(uncorrected) data collected on a large scale. We have chosen to focus on the topic “army” and
its co-occurring topics. The French army was indeed a stable institution during the period, and it
officially did not depend directly on political governments. However, discussions concerning
the army were numerous and recurrent in the Chamber, as the MPs had to decide on various
issues related to its functioning (budgets, reforms, conscription, etc.), its activities (wars and
conflicts, external operations, etc.) or political events (such as the infamous Dreyfus affair).
Although soldiers did not have the right to vote from 1872 to 1945, and the army was supposed
to remain politically neutral<sup>7</sup>, the military were also surprisingly present on the French political
scene in the nineteenth and twentieth centuries, even if they still receive little attention in political history
[
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. It is the Minister of War who deals with Parliament, which keeps a close eye on
him. In practice, the Minister proposes governmental projects, and Parliament chooses whether
or not to support them. The centre of decision making in defence policy, particularly with regard
to projects concerning the colonial army, is therefore in Parliament [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
      <p>We hypothesise that a topic generation model will allow us to better understand the action
of the army over the identified period, what its fields of intervention were, and to grasp to
what extent the French army (represented by the Minister of War) was able to participate in
the elaboration and execution of the political decisions. To assess this hypothesis, our study is
divided into two parts. First, we assess the consistency of the topics that our model identifies,
with particular attention to the topic “army”. We study the results obtained in the light of
current historical knowledge, in order to verify the validity of the model. We then examine a
few topics co-occurring with the topic “army”, assuming that the validity of these correlations
can be verified with the historical data at our disposal.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Data set</title>
      <p>Digitised by the National Library of France and the archives of the National Assembly, the
records of the French parliamentary debates are available online on Gallica, a freely accessible
digital library, together with some valuable metadata. Automatic transcription (OCR) has
also been performed on these records, and the resulting texts have been made available online
in ALTO-XML format and as raw text<sup>8</sup>. The transcription was generated on the fly at the time
of digitisation by OCR software (ABBYY FineReader), and put online without extensive
post-correction.</p>
      <p>A detailed analysis of the quality of this transcription, and of how to improve it, is beyond the
scope of this article. Let us just mention that while the current transcription is unfortunately
not accurate enough for precise tasks such as named entity recognition, we believe
it to be fit for the purpose of a broad analysis of the vocabulary. Most OCR errors are indeed
located in specific parts of the text, namely near the binding of the documents, where the pages can
be very curved<sup>9</sup>. Fortunately, these parts do not represent a significant share of the corpus.</p>
      <p><sup>7</sup> At the time, the French army was nicknamed the “Grande Muette”, which meant that soldiers remained
“mute” on political issues, in order to avoid any risk of political destabilisation.</p>
      <p><sup>8</sup> They can be retrieved with the following API: https://api.bnf.fr/fr/api-document-de-gallica#/</p>
      <p>We are interested in the place of the army in the parliamentary debates of the early Third
Republic, a period marked by significant military activity, particularly the wars of
colonisation (protectorate over Tunisia in 1881, Tonkin campaign in 1883-1886, abuses committed by the
military during the Voulet-Chanoine mission in 1899, etc.) and the resulting tensions with France's European
“competitors” (Fashoda Incident in 1898). The army also intervened within metropolitan
France to suppress strike movements, sometimes ending in bloodshed (Fusillade de Fourmies
(Fourmies shooting) in 1891). The trauma of the 1870 defeat also led to a reform of the army
from 1871 onwards, which continued in the following years with the expansion of recruitment
(Freycinet laws in 1889). It was also a period marked by various scandals and affairs such as
the Scandale des décorations (Medals scandal) (1887), the Schnaebelé Affair (1887), or the arrest
and conviction of Captain Dreyfus (1894-1899). We have decided to limit our work to the years
1881-1899 in order to encompass these events without extending the size of the corpus beyond
our reach. Over this period we have 2597 reports in text format, almost 4 per week, and
over 80 million words.</p>
      <p>A parliamentary sitting is a long and composite event, during which several unrelated issues
are discussed in succession. For this reason, we have divided the reports into
sections (a section corresponds to a single debate, which deals with a well-defined issue and
usually focuses on a single topic). We performed this division automatically by detecting
intermediate headings, identified as isolated sentences written in capital letters. Thus, our corpus
consists of 35891 small documents, with an average size of 2200 words.</p>
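      <p>The heading-based segmentation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: a line counts as a heading when all of its letters are capitals and it has at least four of them (the hypothetical constant MIN_LETTERS, meant to skip OCR fragments such as "I."), and the surrounding blank lines that make a heading "isolated" are not checked. The names is_heading and split_sections are hypothetical.</p>
      <preformat>
```python
# Hypothetical heuristic: a heading is a non-empty line whose letters are all
# capitals (accented capitals included) and which has at least MIN_LETTERS
# letters, so that short OCR fragments such as "I." are not taken as headings.
MIN_LETTERS = 4


def is_heading(line: str) -> bool:
    letters = [c for c in line if c.isalpha()]
    return len(letters) >= MIN_LETTERS and all(c.isupper() for c in letters)


def split_sections(report: str) -> list[tuple[str, str]]:
    """Split the text of one sitting report into (heading, body) pairs."""
    sections: list[tuple[str, str]] = []
    heading, body = "", []  # text before the first heading gets an empty heading

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:  # a heading directly followed by another heading is dropped
            sections.append((heading, text))

    for line in report.splitlines():
        stripped = line.strip()
        if stripped and is_heading(stripped):
            flush()
            heading, body = stripped, []
        else:
            body.append(line)
    flush()
    return sections
```
      </preformat>
      <p>On a toy report such as "PROCÈS-VERBAL\nLa séance est ouverte à deux heures.\n\nBUDGET DE LA GUERRE\nM. le ministre de la guerre a la parole.", split_sections returns two sections, one per capitalised heading. Speaker lines such as "M. LE PRÉSIDENT. La séance est ouverte." are not mistaken for headings because they contain lower-case letters, but a line consisting only of a capitalised name would be.</p>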
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>
        The Latent Dirichlet Allocation (LDA) topic generation model was first presented in 2003 [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
It is based on a Bayesian probabilistic model, which is derived from the following theoretical
assumption. Before any text is written, there exist topics, a term designating semantic fields,
i.e. sets of words linked by their meaning. Texts are then produced by drawing words from a
small subset of topics according to a given probability distribution. In practice, this means that the texts
are the observations derived from hidden variables, namely the topics, and that the statistical
correlations in the texts are the direct results of the semantic similarities. We therefore hope to
find the topics by reversing the generation process. In other words, we want to know the topics
as word distributions and the texts as topic distributions, conditional on the observed words.
Unfortunately, the exact computation of this posterior distribution is not feasible, because it requires
the marginal probability of the observed words, and so we have to approximate it. Many algorithms have been introduced in the literature to
deal with this issue; here we simply use the original algorithm of [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], namely the variational
mean field method.
      </p>
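      <p>For reference, the generative model and the mean-field updates of the original algorithm [17] can be written as below. The notation follows Blei et al. rather than this paper: K topics (here 50), a document d of N_d words, Dirichlet parameter α, topic-word matrix β, variational parameters γ and φ, and the digamma function Ψ. The M-step, which re-estimates α and β from the variational parameters, is omitted from this sketch.</p>
      <preformat>
```latex
\begin{gather}
\theta_d \sim \mathrm{Dir}(\alpha), \qquad
z_{dn} \mid \theta_d \sim \mathrm{Mult}(\theta_d), \qquad
w_{dn} \mid z_{dn}, \beta \sim \mathrm{Mult}(\beta_{z_{dn}}) \\
p(\theta_d, \mathbf{z}_d \mid \mathbf{w}_d, \alpha, \beta)
  = \frac{p(\theta_d, \mathbf{z}_d, \mathbf{w}_d \mid \alpha, \beta)}{p(\mathbf{w}_d \mid \alpha, \beta)} \\
q(\theta_d, \mathbf{z}_d) = q(\theta_d \mid \gamma_d) \prod_{n=1}^{N_d} q(z_{dn} \mid \phi_{dn}) \\
\phi_{dni} \propto \beta_{i, w_{dn}} \exp\bigl(\Psi(\gamma_{di}) - \Psi(\textstyle\sum_{j=1}^{K} \gamma_{dj})\bigr), \qquad
\gamma_{di} = \alpha_i + \sum_{n=1}^{N_d} \phi_{dni}
\end{gather}
```
      </preformat>
      <p>The denominator of the second line is the intractable quantity; the last two updates are iterated for each document until γ_d stabilises, and the resulting γ_d (normalised) serves as the document's topic distribution.</p>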
      <p>
        Topic modelling has been widely used in many areas of the Humanities and Social Sciences
[
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], as it is a very powerful tool for extracting information from a large corpus in an unsupervised
context, i.e. when classes are not defined a priori. However, the results are particularly reliable
when the assumptions of the model are satisfied by the corpus under study. These assumptions include: a large
number of texts; each text dealing with a limited number of topics; each topic being distributed
over several texts in the corpus; a common conceptual framework shared by all authors. While
newspaper articles are the paradigmatic example [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], parliamentary debates also meet
all these requirements. Once the topics are generated, they can be used as new variables for
the study of vocabulary. This drastically reduces the size of the variable space (from more than
50000 forms to a few dozen topics) and thus makes visualisations possible, albeit with a
significant loss of information. We can, for example, study the intensity of topics over time [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]
or study the correlation between topics.
      </p>
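      <p>Topic intensity over time can be obtained by averaging the document-topic distributions inferred by LDA over the documents of each period. The sketch below assumes these distributions are already available as lists of floats, one per document, and uses an unweighted mean; the paper does not say whether documents are weighted by their length, and the function name topic_intensity_by_period is hypothetical.</p>
      <preformat>
```python
from collections import defaultdict
from statistics import fmean


def topic_intensity_by_period(doc_topics, doc_periods):
    """Average topic distribution for each period.

    doc_topics:  one topic distribution per document (list of floats summing to 1)
    doc_periods: one period label per document, e.g. "1884" or "1884-06"
    Returns {period: [mean weight of topic 0, mean weight of topic 1, ...]},
    with periods in sorted order.
    """
    grouped = defaultdict(list)
    for dist, period in zip(doc_topics, doc_periods):
        grouped[period].append(dist)
    return {
        period: [fmean(column) for column in zip(*dists)]  # one column per topic
        for period, dists in sorted(grouped.items())
    }
```
      </preformat>
      <p>Passing month labels such as "1884-06" instead of years yields monthly series, at the cost of noisier averages for months with few sittings.</p>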
    </sec>
    <sec id="sec-4">
      <title>4. General Results</title>
      <sec id="sec-4-1">
        <title>4.1. Structure and semantic coherence of the topics</title>
        <p>The topics provided by the algorithm are remarkably coherent. If we consider the main keywords
of some of them, it is easy to guess what they represent (see Table 1). For instance, Topic
8 deals with class struggle and the situation of the working class, with words like salaire (wages),
patron (boss), syndicat (labour union), grève (strike) or ouvrier (worker). On the other hand,
Topic 11 clearly relates to the army, with words like général (general), régiment (regiment),
troupe (troop), soldat (soldier) or guerre (war).</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Categorisation of topics into classes</title>
        <p>These topics can easily be divided into two main categories. The first broad category includes
topics related to the functioning of the Chamber: speaking turns, organisation of the sitting,
votes, etc. The conduct of a sitting (even if it is sometimes disrupted) is highly codified. Even if
the aim of the stenographers is to reproduce the naturalness of the exchanges and speeches, the
transcription of the debates must accurately record each stage of the parliamentary sittings, from
the bill’s introduction to its final vote, but also all the elements relating to the functioning of the
assembly (announcement of leave, composition of committees, questions to the government,
etc.).</p>
        <p>The second category includes topics that are semantically more significant for our study,
as they capture the different issues that dominated the parliamentary debates. Naturally,
there are also some uninformative topics: for instance, Topic 12 is nothing but a list of all French
departments. This is because the names of the departments often appear in debates: at the time
of the verification of election results (each deputy represents a department), during debates
which frequently concern local life, or at the time of the vote on bills, because voting MPs are
identified by their name and the department they represent.</p>
        <p>Also, some topics are very similar to each other, especially those dealing with how the
Chamber works. Hence we grouped the 50 topics into 16 classes, whose labels are given in Table 2.
Table 2 also gives the contribution of each of these classes to the corpus (see
column “Weight in the corpus”). This allows us to assess whether a class of topics
was more or less frequently addressed in the corpus.</p>
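      <p>The text does not specify how the “Weight in the corpus” column is computed. One plausible reading, sketched below under that assumption, is the share of all words in the corpus attributed to the topics of a class, where each document's topic distribution is weighted by its length in words. The topic-to-class mapping and the function name class_weights are hypothetical.</p>
      <preformat>
```python
def class_weights(doc_topics, doc_lengths, topic_to_class):
    """Share of the corpus attributed to each class of topics.

    doc_topics:     one topic distribution per document (floats summing to 1)
    doc_lengths:    number of words in each document
    topic_to_class: maps a topic index to its class label (e.g. 11 -> "army")
    Each document contributes weight * length to the class of every topic, so
    long debates count more than short ones; the returned weights sum to 1.
    """
    totals: dict[str, float] = {}
    corpus_size = sum(doc_lengths)
    for dist, length in zip(doc_topics, doc_lengths):
        for topic, weight in enumerate(dist):
            label = topic_to_class[topic]
            totals[label] = totals.get(label, 0.0) + weight * length
    return {label: total / corpus_size for label, total in totals.items()}
```
      </preformat>
      <p>Dropping the length weighting (averaging distributions per document instead) would give every section the same influence regardless of its size, which can change the ranking of classes dominated by short procedural sections.</p>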
        <p>
          If we disregard the first two classes, which bring together topics concerning the functioning of
the Chamber of Deputies, we can see that “budget”, “working class”, “economy” and
“trains/communications” are the four classes of topics that appear most often in the corpus. “Budget” is
naturally the most important class of topics, because the key role of the Chamber of Deputies is
to discuss the state budget and to allocate the funds needed to enforce government policy. The
growth of the working class and the rise of socialism are also well reflected in the debates: MPs
address social struggles in their speeches; we also see the (timid) beginning of social legislation
in the 1890s. “Economy” is one of the classes of topics most often dealt with by the Chamber of
Deputies, as it is frequently the subject of legislation (particularly with the question of taxation)
and relates to many sectors (agriculture, trade
agreements, industrialisation, etc.). “Trains/communications” reflects the significant investment
in the development of communications infrastructures, and the creation of the French railway
network, one of the most developed in Europe at the beginning of the twentieth century. More
generally, an examination of these figures confirms the coherence of the topics we have identified:
they are quite consistent with the major themes that marked political life during the early Third
Republic [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ].
        </p>
        <p>Beyond this simple comparison of their weights in the corpus, Figure 1 shows that the
various classes have unequal variances. Let us consider the topic “army”. While the quartiles
are not extremely far from the median, there are some strong outliers: they can reach 0.2,
and 6 of them exceed 0.1. It seems that when “army” is the main topic, it tends to
become hegemonic. At the two extremes, the topic “budget” has a very high variance, while
“school” never becomes really prominent: its maximum never exceeds 0.07.</p>
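      <p>The outliers discussed above are the points drawn beyond the whiskers of a box plot. The sketch below computes such a summary for one series of intensities, assuming Tukey's rule (points more than 1.5 interquartile ranges beyond the quartiles), which is the default in common plotting libraries; the paper states neither the rule used for Figure 1 nor the units (sittings, sections, months) over which intensities are measured, and the function name box_summary is hypothetical.</p>
      <preformat>
```python
from statistics import median, quantiles


def box_summary(values, threshold=0.1):
    """Box-plot style summary of a list of topic intensities (floats in [0, 1])."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles ("exclusive" method by default)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "median": median(values),
        "q1": q1,
        "q3": q3,
        "max": max(values),
        # Tukey's rule: anything outside the fences is an outlier
        "outliers": sorted(v for v in values if v > high_fence or low_fence > v),
        # how many values exceed a fixed level, e.g. the 0.1 mentioned for "army"
        "above_threshold": sum(1 for v in values if v > threshold),
    }
```
      </preformat>
      <p>A class with tight quartiles but a few very high values, as described for “army”, shows up here as a small interquartile range together with a non-empty outlier list.</p>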
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Distribution of the topic “army” over time</title>
        <p>more represented in the first half of the period, while “colonies” circulates quietly in the corpus,
and gains in importance in the second half of the period with strong peaks in 1886, 1895 and at
the very end of the 1890s.</p>
        <p>Figures 3 and 4 show the distribution of the topic “army”, respectively for all the years considered,
and for the year 1884, during which the topic is particularly present. In Figure 3, several peaks
can be seen that were not visible in the previous figure. These peaks can be explained by the
colonial policy conducted by France, by the military reforms that took place during the period,
and by the Dreyfus affair. In 1888 and 1889, the law reducing the length of military service was
debated and passed. The years 1892 and 1893 were marked by the continuation of colonial
conquests (Comoros, Tunisia, Sudan, Dahomey, Ivory Coast, Siam). The Dreyfus affair began
in 1894 and continued until the end of the period studied. In 1897, the borders of the French
colonial empire were stabilised with the last conquests (Indochina and Madagascar), and the
Franco-Russian military alliance was affirmed in case of war.</p>
        <p>Figure 4 shows that 1884 was a year in which there were several intense discussions about
the army in the Chamber of Deputies. Two issues occupied the MPs. On the one hand, between April and
June they deliberated on a bill to reduce the length of military service (see the peaks
in April and June). On the other hand, they also had to discuss the military operations carried
out by France in Tonkin. The way in which the government conducted this war of conquest can
be seen in the shape of the graph: (1) the government asked for new credits in February, which
led to heated debates; (2) the government sought to increase the number of colonial troops to
satisfy its ambitions and proposed a bill to this effect in June; (3) the Chamber discussed the
budget in December, and in particular the credits allocated to colonial troops.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Cross study of the topics’ prevalence</title>
      <sec id="sec-5-1">
        <title>5.1. Time-based correlation between topics</title>
        <p>We now look for topics with high co-intensity, i.e. topics that tend to be frequently associated
with each other in specific parts of the corpus. Correlation is based on the number of text
units that contain a significant percentage of both topics. These text units are not defined
semantically, but as fixed-length windows of 6000 characters.</p>
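      <p>Cutting the corpus into blocks of 6000 characters can be done as below. Whether the original windows overlap, cross document boundaries or avoid splitting words is not stated; this sketch assumes consecutive, non-overlapping windows within each document, and the names WINDOW and fixed_windows are hypothetical.</p>
      <preformat>
```python
WINDOW = 6000  # characters per block, as stated in the text


def fixed_windows(text: str, size: int = WINDOW) -> list[str]:
    """Cut one document into consecutive, non-overlapping blocks of `size` characters.

    The last block holds whatever remains and may be shorter; blocks are not
    aligned on word boundaries, so a word can be split between two blocks.
    """
    return [text[i:i + size] for i in range(0, len(text), size)]
```
      </preformat>
      <p>Each block is then assigned its own topic distribution by the trained model, so that co-intensity can be measured on units of comparable size rather than on sections of very uneven length.</p>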
        <p>We create a first indicator by dividing our corpus according to the date of production of the
texts composing it. This allows us to divide the corpus into smaller segments (by month or by
year), and then determine whether a high proportion of the topic “army” is correlated with a
high or low proportion of other topics in the same period. We take the average weight of each
topic per month and calculate the Pearson correlation coefficient between the topic “army” and
all other topics. We observe that the intensity of the topic “army” over the course of a
month (Figure 5) is positively correlated with the following topics: “colonies”, “navy” and “foreign
affairs”. Perhaps most surprising is the strong correlation with the topic “school”.</p>
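      <p>The monthly indicator can be sketched as follows: average the weight of each topic over the blocks of each month, then compute Pearson's coefficient between the monthly series of “army” and that of every other topic. The input format (a list of (month, {topic: weight}) pairs) and the function names are assumptions; months without any block are simply absent from the series.</p>
      <preformat>
```python
from collections import defaultdict
from math import sqrt
from statistics import fmean


def monthly_series(blocks, topic):
    """Mean weight of `topic` per month; blocks are (month, {topic: weight}) pairs."""
    per_month = defaultdict(list)
    for month, weights in blocks:
        per_month[month].append(weights.get(topic, 0.0))
    return {month: fmean(values) for month, values in sorted(per_month.items())}


def pearson(x, y):
    """Pearson correlation coefficient (undefined if either series is constant)."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlations_with(blocks, target, topics):
    """Correlation between the monthly series of `target` and each other topic."""
    blocks = list(blocks)
    reference = monthly_series(blocks, target)
    months = list(reference)
    x = [reference[m] for m in months]
    result = {}
    for topic in topics:
        if topic != target:
            series = monthly_series(blocks, topic)  # same months as `reference`
            result[topic] = pearson(x, [series[m] for m in months])
    return result
```
      </preformat>
      <p>Because every month mixes many unrelated debates, such monthly averages smooth out the local co-occurrences, which is one reason the coefficients obtained this way stay modest.</p>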
        <p>However, these correlations are rather weak in absolute terms. In the course of a given period
(even a single parliamentary sitting, i.e. a single day), many different issues are addressed by
MPs. Information therefore tends to be spread across most subjects. In particular, some topics,
such as the names of MPs, the departments they come from, and the vocabulary describing the
functioning of parliament, are evenly spread over the period. We therefore decided to look for
correlations at the finest level, that of the text blocks. We are in fact looking to answer the following question: what
proportion of the blocks that address the topic “army” with high intensity (more than 15% of
the vocabulary) also deal with another given topic with high intensity?</p>
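      <p>This block-level question amounts to a conditional frequency: among the blocks in which one topic exceeds 15% of the vocabulary, the share in which a second topic also exceeds it. Since the question conditions on “army” while the figure reported below for “navy” is phrased the other way round, the sketch takes the conditioning topic as a parameter. The input format (one {topic: weight} dict per block) and the names HIGH and co_intensity are assumptions.</p>
      <preformat>
```python
HIGH = 0.15  # "more than 15% of the vocabulary"


def co_intensity(blocks, given, other, threshold=HIGH):
    """Share of blocks where `given` is above threshold in which `other` is too.

    blocks: one {topic: weight} dict per 6000-character block.
    Returns None when no block has a high weight for `given`.
    """
    high_given = [b for b in blocks if b.get(given, 0.0) > threshold]
    if not high_given:
        return None
    both = sum(1 for b in high_given if b.get(other, 0.0) > threshold)
    return both / len(high_given)
```
      </preformat>
      <p>co_intensity(blocks, "army", "navy") answers the question as posed, while co_intensity(blocks, "navy", "army") corresponds to the reading used in the “navy” example; the two values generally differ because the two topics do not have the same number of high-intensity blocks.</p>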
        <p>We find a very strong correlation between “army” and “navy” (15.5% of documents with a high
proportion of the topic “navy” also have a high proportion of the topic “army”), followed by
“colonies”, “school”, “law enforcement” and “budget” (see Table 3 and Figure 6). While the topic
“government/parliament” is fairly evenly distributed throughout the corpus, other topics such
as “working class”, “alcohol” or “local politics” are almost completely disconnected from the
topic “army”. The case of the names of the MPs and the departments is specific, as these two
topics are mainly present in specific sections, namely the vote counts.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Study of topics with a strong correlation with the topic “army”</title>
        <p>
          Here we focus on the strong correlations we have just identified. We will examine the
nature of these correlations by “close reading” [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] the texts we are studying. We will also check
the validity of these results in the light of current historical knowledge.
        </p>
        <p>
          Figure 7 shows that there is a strong correlation between “army” and “school”; this is mainly
related to the debates on the reform of military service. The law of 1872 had established a
military service that could last up to five years, but with a certain number of exemptions
(teachers, students of grandes écoles<sup>10</sup> and seminarians [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]). The association of “army” with
“school” refers to the exemptions granted to teachers and students of grandes écoles, which the
Republican MPs wanted to put an end to.
        </p>
        <p>
          This correlation was most intense in 1887, although the law removing these exemptions and
reforming military service was passed in 1889. This is because the law had been in the making
since the early 1880s [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. Between 1876 and 1889, there were twelve bills related to this issue
[
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. But it was really in 1887, with the renewed tensions between Germany and France, that
the Ministry of War, in agreement with the Chamber of Deputies, decided to transform the law
of 1872 [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. The report on this bill was presented and discussed between June and July
1887. After passing through the Senate, the bill was presented to the MPs again in December
1888 and voted on in January 1889 [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ].
        </p>
        <p>The topic “law enforcement” includes vocabulary related to law-making, as well as
references to punishments and means of control inside the military. Figure 7 reflects the intense
legislative activity relating to the army during this period. Discussions about military service
laws explain its strong correlation with “army” between 1881 and 1889; they also account for the strong
peak in 1887.</p>
        <p><sup>10</sup> For more information on the French system of grandes écoles, please see this Wikipedia article.</p>
        <p>
          These recruitment issues were also linked to the international context. The increase in
the intensity of the association of “army” with “foreign affairs” in 1894-1895 (see Figure 8)
can be explained by the introduction of a law in November 1894 extending the duration of
incorporation to two years, in order to increase the army’s strength [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. This was a reaction to
the changes that were taking place in the German military, whose growing power frightened
the MPs. Between 1893 and 1894, the number of German soldiers increased following the
introduction of the two-year service. Our model captures this trend well by clearly associating
the topic “army” with the topic “foreign affairs”.
        </p>
        <p>
          The peak in 1895 can be explained by the return of the project to the Chamber in June 1895,
as the German strength had just increased by 70000 men [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. The association between “army”
and “foreign affairs” also reveals the competition with Great Britain in colonial affairs. In 1884,
France took control of Annam, while Great Britain extended its influence over Burma. Both
imperialisms were in contact with South China, which led to tensions, especially in 1885 over
Siam. In 1885, there were also strong tensions with the British in the face of growing French
appetite for Madagascar. These tensions were also high at the time of the second Madagascar
expedition in 1895 [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] (see peaks in 1885 and 1895 in Figure 8). The peak in 1881 can also be
explained by tensions with another European competitor for the conquest of new territory:
a Franco-Italian crisis broke out in June following the Treaty of Bardo, which placed Tunisia
under a French protectorate.
        </p>
        <p>
          We also note the correlation of the topic “army” with the topic “colonies”. This correlation
refers to the crucial role of the army in the acquisition and defence of colonies. This association
follows a pattern quite similar to the previous association (see Figure 8): the topic is present
throughout the period, but with a strong intensity in the early years (1884-1885), and a second
peak in the mid-1890s. Our model succeeds in capturing the way in which the executive power
imposes its colonialist policy on Parliament. After the defeat of 1870, colonial ventures
were blamed for the defeat, as they were said to have taken away the men and funds
needed for national defence. Public opinion, and the MPs with it, was at best indifferent, at
worst hostile, to new conquests. The arrival in power of the Opportunist Republicans in 1879
nevertheless saw the renewal of colonial expansion, which resumed in 1880 and continued
intensively until 1885. This policy of conquest was carried out in parallel on several fronts:
notably in Tunisia (1880-1881), Annam and Tonkin (1883-1885), not to mention Sudan, Congo
and Madagascar [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]. The government therefore had to “trick” public opinion and Parliament,
and act on the sly to conceal the extent of its ambitions. Then, as the difficulties accumulated, it
gradually obtained an increase in credits and the sending of increasingly large reinforcements,
and irresistibly dragged the MPs into the spiral of conquest [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ].
        </p>
        <p>
          The 1884-1885 peak in Figure 8 is explained by the launching of the Tonkin expedition, for
which the government asked the Chamber for new credits and troops in 1884 and early 1885.
Debates were particularly intense on this subject in 1885, as the difficulties encountered by
the French army at the end of March 1885 (Retreat from Lạng Sơn) led to an outcry
in the Chamber and the fall of the government [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. The Chamber elected in 1885 was more
anti-colonialist than the previous one and avoided any colonial adventure of the importance of
Tonkin; but from 1890 onwards, the opposition began to diminish until it disappeared. The very
principle of colonisation was progressively accepted and increasingly supported by the MPs
[
          <xref ref-type="bibr" rid="ref23">23</xref>
          ], even if this did not avoid stormy debates in the assembly. The intensity of the correlation
between “army” and “colonies” from 1894 to 1896 is mainly explained by the second expedition
led by the French army in Madagascar. In November 1894, the government submitted a request
for credits to send an expeditionary corps to the island. This expedition was partly a failure;
and in March 1895, the government was interpellated by the Chamber about the pitiful state of
the troops. In July 1895, the conquest of Madagascar was resumed but it was stalled. A text is
then presented to the Chamber to reform the recruitment of the colonial armies [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
        </p>
        <p>
          Let us examine the year 1896 in particular, in which the correlation between “army”
and “colonies” is quite strong. In March and July 1896, a bill on the colonial armies was discussed
in the Chamber. Interestingly, in its second version, the bill proposed to entrust
the entire management of colonial units to the Navy [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. The bill reflects the emergence of a new
current in the Chamber in favour of this branch of the armed forces. On 27 October 1896, the
government proposed a new bill placing the colonial army under the responsibility of the Navy,
as it was the only service capable of ensuring the continuity of transport and logistics [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]; the bill was eventually rejected in December. This
explains the strong correlation between the topics “army” and “navy” in 1896 (see Figure 9).
The topics “army” and “navy” are frequently associated, whether in cooperation (the Navy
transported colonial troops) or in competition between the two branches of the military.
        </p>
        <p>
          This association was rather weak during the 1880s but peaked in 1896. Until 1895,
the Navy had been relatively indifferent to colonial troops. In June 1895, however, the Minister
of the Navy claimed responsibility for the management of colonial units from the Ministry of
War. This request stemmed from the rivalry between the two services over Madagascar,
as the Navy could not bear the idea that the Army had taken charge of the expedition [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. The
financial competition between the two services was intensifying, especially as the Navy
needed new investments to modernise the fleet and train its staff [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]. This is why a major
wave of legislative reforms concerning the organisation of the Navy was launched in 1896, notably the
creation of a naval school [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>The results of our study demonstrate the relevance of topic modelling for the analysis of parliamentary
debates and confirm the value of such a method for facilitating the analysis of this
major historical source. The study also yields a number of insights into
parliamentary debates, which we intend to explore further.</p>
      <p>
        We see that the vocabulary describing parliamentary activity itself carries considerable weight
in the corpus, a problem that the topic model handles rather well. We
then observe that parliamentary debates follow their own rhythm, imposed
by the legislative process, which requires long debates before a law is finally passed.
As a result, subjects may be dealt with by the Chamber of Deputies long before they become
newsworthy. Conversely, issues that make the news are rarely discussed in the Chamber while they are
in the headlines; they are usually taken up long afterwards. Topic modelling
therefore seems to us well suited to identifying underlying political trends.
We also note the reactive nature of parliamentary work: a major legislative
effort may take place a few months or even several years after the events that triggered it (as shown,
for instance, in [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]). Finally, another consequence of the way legislative work is carried
out is the weight that discussions of sensitive subjects can take on without leading to a
vote or to a law. The Chamber can indeed seize on a subject to interpellate the
government, as it did over Tonkin after the Retreat from Lạng Sơn in 1885.
      </p>
      <p>While encouraging, these results are still preliminary. We are working in two directions to
further complement and improve them. In order to identify more specific and
possibly unexpected correlations (e.g. the role of the Church in the army, or the influence of
the executive branch), we will use additional tools, such as word embeddings, to divide
the corpus further into a few hundred groups, some of them very specific, and to study their life cycle
in relation to the army. To improve our model, we also plan to extend the period studied
and to work on a corpus with fewer errors. Within the framework of the AGODA project, we are therefore
evaluating the solutions available to us to improve the OCR output, in the hope of further
enhancing these first results.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>We would like to thank the Bibliothèque nationale de France for its support through
the BnF DataLab.
The data and source code are available on GitHub.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H.</given-names>
            <surname>Coniez</surname>
          </string-name>
          ,
          <article-title>L'invention du compte rendu intégral des débats en France (1789-1848)</article-title>
          ,
          <source>Parlement[s], Revue d'histoire politique</source>
          <volume>2</volume>
          (
          <year>2010</year>
          )
          <fpage>146</fpage>
          -
          <lpage>159</lpage>
          . doi:10.3917/parl.014.0146.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Gardey</surname>
          </string-name>
          ,
          <article-title>Scriptes de la démocratie : les sténographes et rédacteurs des débats (1848-2005)</article-title>
          ,
          <source>Sociologie du travail</source>
          <volume>52</volume>
          (
          <year>2010</year>
          ). doi:10.4000/sdt.13695.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ouellet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Roussel-Beaulieu</surname>
          </string-name>
          ,
          <article-title>Les débats parlementaires au service de l'histoire politique</article-title>
          ,
          <source>Bulletin d'histoire politique</source>
          <volume>11</volume>
          (
          <year>2003</year>
          )
          <fpage>23</fpage>
          -
          <lpage>40</lpage>
          . doi:10.7202/1060736ar.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>Lemercier</surname>
          </string-name>
          ,
          <article-title>Le vocabulaire des débats sur la loi de 1841 sur le travail des enfants : premiers résultats sur la Chambre des pairs (4-10 mars 1840)</article-title>
          ,
          <year>2006</year>
          . URL: https://halshs.archives-ouvertes.fr/halshs-0010745.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>de Galembert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Rozenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Vigour</surname>
          </string-name>
          ,
          <source>Faire parler le parlement : méthodes et enjeux de l'analyse des débats parlementaires pour les sciences sociales</source>
          , LGDJ-Lextenso éditions, Issy-les-Moulineaux,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B.</given-names>
            <surname>Fournier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Pépratx</surname>
          </string-name>
          ,
          <article-title>La majorité politique : étude des débats parlementaires sur la fixation d'un seuil</article-title>
          , in: A. Percheron, R. Rémond (Eds.),
          <source>Age et politique</source>
          , La vie politique, Economica, Paris,
          <year>1991</year>
          , pp.
          <fpage>85</fpage>
          -
          <lpage>110</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H.</given-names>
            <surname>Bonin</surname>
          </string-name>
          ,
          <article-title>From antagonist to protagonist: 'democracy' and 'people' in British parliamentary debates, 1775-1885</article-title>
          ,
          <source>Digital Scholarship in the Humanities</source>
          <volume>35</volume>
          (
          <year>2020</year>
          )
          <fpage>759</fpage>
          -
          <lpage>775</lpage>
          . doi:10.1093/llc/fqz082.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Mollin</surname>
          </string-name>
          ,
          <article-title>The Hansard hazard: gauging the accuracy of British parliamentary transcripts</article-title>
          ,
          <source>Corpora</source>
          <volume>2</volume>
          (
          <year>2008</year>
          )
          <fpage>187</fpage>
          -
          <lpage>201</lpage>
          . doi:10.3366/cor.2007.2.2.187.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>F.</given-names>
            <surname>Clavert</surname>
          </string-name>
          ,
          <article-title>Vers de nouveaux modes de lecture des sources</article-title>
          , in: O. Le Deuff (Ed.),
          <source>Le temps des humanités digitales</source>
          , FYP Éditions, Roubaix,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>F.</given-names>
            <surname>Moretti</surname>
          </string-name>
          , Distant Reading, Verso, London,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Puren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Vernus</surname>
          </string-name>
          ,
          <article-title>AGODA : analyse sémantique et graphes relationnels pour l'ouverture et l'étude des débats à l'Assemblée nationale</article-title>
          ,
          <year>2021</year>
          . URL: https://hal.archives-ouvertes.fr/hal-03382765.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Graham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Milligan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Weingart</surname>
          </string-name>
          ,
          <source>Exploring big historical data: the historian's macroscope</source>
          , Imperial College Press, London,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>G.</given-names>
            <surname>Lavenir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Bourgeois</surname>
          </string-name>
          ,
          <article-title>Old people, video games and French press: a topic model approach on a study about discipline, entertainment and self-improvement</article-title>
          ,
          <source>MedieKultur: Journal of media and communication research</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>L.</given-names>
            <surname>Viola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Verheul</surname>
          </string-name>
          ,
          <article-title>Mining ethnicity: discourse-driven topic modelling of immigrant discourses in the USA, 1898-1920</article-title>
          ,
          <source>Digital Scholarship in the Humanities</source>
          <volume>35</volume>
          (
          <year>2020</year>
          )
          <fpage>921</fpage>
          -
          <lpage>943</lpage>
          . doi:10.1093/llc/fqz068.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>O.</given-names>
            <surname>Forcade</surname>
          </string-name>
          , É. Duhamel, P. Vial (Eds.),
          <source>Militaires en République, 1870-1962</source>
          , Éditions de la Sorbonne, Paris,
          <year>1999</year>
          . doi:10.4000/books.psorbonne.61562.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>J.-C.</given-names>
            <surname>Jauffret</surname>
          </string-name>
          ,
          <source>Parlement, gouvernement, commandement : l'armée de métier sous la IIIe République, 1871-1914</source>
          , Ph.D. thesis, Université de Paris I Panthéon-Sorbonne, Paris,
          <year>1987</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>D.</given-names>
            <surname>Blei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jordan</surname>
          </string-name>
          ,
          <article-title>Latent Dirichlet allocation</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>3</volume>
          (
          <year>2003</year>
          )
          <fpage>993</fpage>
          -
          <lpage>1022</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>D.</given-names>
            <surname>Blei</surname>
          </string-name>
          ,
          <article-title>Topic modeling and digital humanities</article-title>
          ,
          <source>Journal of Digital Humanities</source>
          <volume>2</volume>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>McCallum</surname>
          </string-name>
          ,
          <article-title>Topics over time: a non-Markov continuous-time model of topical trends</article-title>
          ,
          <source>KDD'06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          (
          <year>2006</year>
          )
          <fpage>424</fpage>
          -
          <lpage>433</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>J.-M.</given-names>
            <surname>Mayeur</surname>
          </string-name>
          ,
          <source>Les débuts de la IIIe République, 1871-1898</source>
          , Éditions du Seuil, Paris,
          <year>1973</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>A.</given-names>
            <surname>Crépin</surname>
          </string-name>
          ,
          <source>Défendre la France : les Français, la guerre et le service militaire, de la guerre de Sept Ans à Verdun</source>
          , Presses universitaires de Rennes, Rennes,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>M.</given-names>
            <surname>Battesti</surname>
          </string-name>
          ,
          <source>La Marine au XIXe siècle. Interventions extérieures et colonies</source>
          , Du May, Paris,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>D.</given-names>
            <surname>Bouche</surname>
          </string-name>
          ,
          <source>Histoire de la colonisation française. Flux et reflux : 1815-1962</source>
          , Le Grand livre du mois, Paris,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>R.</given-names>
            <surname>Monaque</surname>
          </string-name>
          , Une histoire de la marine de guerre française,
          <source>Perrin</source>
          , Paris,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>J.</given-names>
            <surname>Alerini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Olteanu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ridgway</surname>
          </string-name>
          ,
          <article-title>Markov and the Duchy of Savoy: segmenting a century with regime-switching models</article-title>
          ,
          <source>Journal de la Société Française de Statistique</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>