=Paper=
{{Paper
|id=Vol-2314/paper5
|storemode=property
|title=Towards a quantitative research framework for historical disciplines
|pdfUrl=https://ceur-ws.org/Vol-2314/paper5.pdf
|volume=Vol-2314
|authors=Barbara McGillivray,Jon Wilson,Tobias Blanke
|dblpUrl=https://dblp.org/rec/conf/comhum/McGillivrayWB18
}}
==Towards a quantitative research framework for historical disciplines==
Barbara McGillivray¹, Jon Wilson², Tobias Blanke³
¹ The Alan Turing Institute, University of Cambridge, United Kingdom
² Department of History, King's College London, United Kingdom
³ Department of Digital Humanities, King's College London, United Kingdom
bmcgillivray@turing.ac.uk, {jon.wilson, tobias.blanke}@kcl.ac.uk
1 Background and motivation

The ever-expanding wealth of digital material that researchers have at their disposal today, coupled with growing computing power, makes the use of quantitative methods in historical disciplines increasingly viable. However, applying existing techniques and tools to historical datasets is not a trivial enterprise (Piotrowski, 2012; McGillivray, 2014). Moreover, scholarly communities react differently to the idea that new research questions and insights can arise from quantitative explorations that could not be made using purely qualitative approaches. Some of them, such as linguistics (Jenset and McGillivray, 2017), have been acquainted with quantitative methods for a longer time. Others, such as history, have seen a growth in quantitative methods on the fringes of the discipline, but have not incorporated them into the mainstream of scholarly practice (Hitchcock, 2013).

Historical disciplines, i.e., those focusing on the study of the past, possess at least two characteristics which set them apart and require careful consideration in this context: the need to work with closed archives which can only be expanded by working on past records (Mayrhofer, 1980), and the focus on phenomena that change in a complex fashion over time. First, this means historical research is grounded in empirical sources which are stable and fixed (one cannot change the archival record). But these sources are often hard to access and, recording the language and actions of only a small fraction of historical reality at any moment, have a complex relationship to the past being studied. Secondly, the categories through which the past is studied themselves change, making modelling, and the automation of analysis based on a limited number of features in the historical record, a fraught enterprise.

Donald E. Knuth is perhaps the most famous godfather of computer science. For him, "[s]cience is knowledge which we understand so well that we can teach it to a computer; and if we don't fully understand something, it is an art to deal with it. . . . [T]he process of going from an art to a science means that we learn how to automate something" (Knuth, 2007). Computing science is defined by the tension between automating processes using digital means and our inability to do so, because we fail to create fully explicit ways of understanding processes. In this sense, a computational approach to collecting and processing (historical) evidence would be a science if we could learn to automate it. Many features of the past can be understood through automation. Yet the problematic nature of the relationship between sources and reality, and the mutability of categories, mean that historical analysis will always rely on a significant degree of human intuition and cannot be fully automated; computational history is an art in Knuth's terms.

The methodological reflections in this paper are part of an effort to define the possibilities and limits of quantification and automation in historical analysis. Our aim is to assist scholars in taking full advantage of quantification through a rigorous account of the boundaries between science and art in Knuth's terms. Building on McGillivray et al. (2018), in this contribution we begin with the framework proposed by Jenset and McGillivray (2017) for quantitative historical linguistics and illustrate it with two case studies.
Proceedings of the Workshop on Computational Methods in the Humanities 2018 (COMHUM 2018)
2 A quantitative framework for historical linguistics

Jenset and McGillivray (2017)'s framework is the only general framework available for quantitative historical linguistics. A comparable framework, but more limited in scope, can be found in Köhler (2012). Jenset and McGillivray (2017)'s framework starts from the assumption that linguistic historical reality is lost, and that the aim of quantitative research is to arrive at models of, and claims about, that reality which are quantitatively driven from evidence and lead to consensus among the scholarly community. The scope of application of this framework is delimited to the cases where quantifiable evidence (such as n-grams or numerical data) can be gathered from primary sources, typically in the form of corpora, i.e., collections of electronic text created for the purpose of linguistic analysis.

Jenset and McGillivray (2017) define evidence in quantitative historical linguistics as the set of "facts or properties that can be observed, independently accessed, or verified by other researchers" (Jenset and McGillivray, 2017, 39), and thus exclude intuition as inadmissible as evidence. Such facts can be pre-theoretical (as the fact that the English word the is among the most frequent ones) or based on some hypotheses or assumptions (as the fact that the class of articles in English is among the most frequent ones, which is based on the assumption that the class of articles groups certain words together). Quantitative evidence is "based on numerical or probabilistic observation or inference" (Jenset and McGillivray, 2017, 39), and the quantification should be independently verifiable. Distributional evidence, on the other hand, has the form "x occurs in context y", where the context can consist of words, classes, phonemes, etc. Annotated corpora, where linguistic (morphological, syntactic, semantic, etc.) information has been encoded in context, are considered as sources of distributional evidence to study phenomena in historical linguistics.

Following Carrier (2012), Jenset and McGillivray (2017, 40) define claims as anything that is not evidence: statements based on evidence or on other claims. The role of claims in the framework concerns their connection with truth, which can be stated in categorical terms (as in "the claim that x belongs to class y is true") or probabilistic terms (e.g., "x belongs to class y with probability p"). Claims possess a strength proportional to that of the evidence supporting them: all other things being equal, claims supported by large amounts of evidence are stronger than claims supported by little evidence.

Ultimately, research in historical linguistics aims at making (hopefully strong) claims logically following from assumptions shared by the community, other claims, or evidence. A hypothesis originates from previous research, intuition, or logical arguments, and is "a claim that can be tested empirically, through statistical hypothesis testing on corpus data" (Jenset and McGillivray, 2017, 42). In this context, "model" means a formalized representation of a phenomenon, be it statistical or symbolic (Zuidema and de Boer, 2014). Models (including those deriving from hypotheses tested quantitatively against evidence) are research tools embedding claims or hypotheses, useful in order to produce novel claims and hypotheses in turn via "a continual process of coming to know by manipulating representations" (McCarty, 2004).

Based on these definitions, Jenset and McGillivray (2017) formalize the research process they envisage as part of their framework; see Figure 1. The process starts from the historical linguistic reality, which we assume to be lost forever. Any research model can only aim at approaching this reality without reaching it completely, and quantitative historical linguistics will ultimately produce models of language that are quantitatively driven from evidence. The rest of the diagram shows how this is achieved. The historical linguistic reality gave rise to a series of primary sources, including documents and other (mainly textual) sources, and these to secondary sources like grammars and dictionaries. Based on the knowledge of the language we gather from these sources, we can draft annotation schemes which specify the rules for adding linguistic information to corpora and thus obtain annotated corpora. Corpora are the source of quantitative distributional evidence which can be used to test statistical hypotheses, formulated on the basis of our intuition of the language and of knowledge drawn from examples. Such hypotheses can also feed into the creation of linguistic models, which aim to represent the historical linguistic reality.
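As a concrete illustration (ours, not taken from Jenset and McGillivray's book), the two core steps of the framework — collecting distributional evidence of the form "x occurs in context y" from an annotated corpus, and testing a frequency hypothesis against it — can be sketched in a few lines of Python. The toy corpus, the choice of the following word's part-of-speech tag as "context", and the two-proportion z-test are all illustrative assumptions, not the framework's prescribed implementation.

```python
import math
from collections import Counter

def distributional_evidence(tagged_corpus):
    """Collect evidence of the form 'x occurs in context y'.

    Here the 'context' of a word is simply the part-of-speech tag
    of the following word (an illustrative choice, not the only one).
    """
    evidence = Counter()
    for sentence in tagged_corpus:
        for (word, _), (_, next_tag) in zip(sentence, sentence[1:]):
            evidence[(word.lower(), next_tag)] += 1
    return evidence

def two_proportion_z(k1, n1, k2, n2):
    """Two-sided two-proportion z-test: do two (sub)corpora differ
    in the relative frequency of some distributional pattern?"""
    p1, p2 = k1 / n1, k2 / n2
    p = (k1 + k2) / (n1 + n2)          # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p_value

# Toy annotated corpus: sentences as lists of (word, POS-tag) pairs.
period_a = [[("the", "DET"), ("king", "NOUN"), ("rode", "VERB")],
            [("the", "DET"), ("law", "NOUN"), ("stood", "VERB")]]
ev = distributional_evidence(period_a)
assert ev[("the", "NOUN")] == 2  # 'the' occurs twice before a noun

# Hypothesis: a construction is more frequent in corpus A (30 of 200
# relevant tokens) than in corpus B (10 of 220).
z, p_value = two_proportion_z(30, 200, 10, 220)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

The point of the sketch is the framework's division of labour: the counts are independently verifiable evidence, while the hypothesis and the significance threshold remain the researcher's claims.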
3 Model-building in history

In contrast with quantitative historical linguistics, the discipline of history possesses an extraordinary variety of idioms to describe itself, and has a much less rigorous analytical vocabulary to describe its method. Yet there are important similarities, which mean Jenset and McGillivray (2017)'s framework can be translated and modified for use in historical research more generally. First, historians assume that historical reality is lost, and can only be understood through traces left in a variety of archives (including human memory). Second, although historians rarely explicitly talk about constructing models, their practice largely consists of making claims about representations of the past which other disciplines would describe in precisely such terms. Through the process they describe as the 'interpretation' or 'analysis' (Tosh, 2015) of the sources, historians create representations which reduce the vast complexity of historical reality to a few limited, stylised characteristics: Max Weber's Protestant Ethic, Lewis Namier's system of factional interest, or C. A. Bayly's great uniformity. Third, these representations are used to make hypotheses and claims about change over time of different kinds. These might be about the endurance or rupture of certain key features in a particular sphere of activity, or about the forces responsible for causing a particular event or set of processes, for example.

Figure 1: Research process from the quantitative historical linguistics framework described in Jenset and McGillivray (2017). Figure modified from Figure 2.1 in Jenset and McGillivray (2017, 45).

We have suggested that history is (if implicitly) essentially a model-building enterprise. That allows many of the hypotheses which historians develop to be theoretically amenable to quantification. The use of quantitative methods (in particular the analysis of textual corpora) has increased recently (Guldi and Armitage, 2014). But most historians are reluctant to quantify because they are skeptical about formalising their models, believing that to do so would imply their possessing a degree of categorical rigidity unwarranted by the complexity of the past. We suggest that more explicit reflection on method, and engagement with other fields (such as historical linguistics) which deal with fuzzy categories, would help overcome these obstacles.

What's more, the use of digital datasets and the application of quantitative techniques to them allows historical claims based on the prevalence of certain features of the past to be empirically tested. Such claims are central to many forms of historical argumentation already; about the importance of particular concepts or practices at specific moments, for example. Of course such claims need to be precisely related to the structure of the (digitised) archive; as ever, limitations must be recognised. But given the amount of material which can be quickly processed, quantification allows claims previously asserted through little more than the accumulation of anecdotes to be more rigorously validated.

4 Languages of power

The first case study where we apply Jenset and McGillivray (2017)'s framework considers a recent collaboration between Digital Humanities and History at King's College London (Blanke and Wilson, 2017) to develop a "materialist sociology of political texts" following Moretti's ideas of distant reading (Moretti, 2013). The project worked on a corpus of post-1945 UK government White Papers to map connections and similarities in political language from 1945 to 2010. As the corpus is time-indexed, a quantitative analysis traced the changing shape of political language by tracking clusters of terms relating to particular concepts and charting the changing meaning of words. Creating the distributional quantitative evidence involved text pre-processing to create a term-document matrix. Using natural language processing libraries, this was annotated with grammatical information, as well as with a number of dictionaries that reflected facets such as sentiment, ambiguity and so on. These allowed the project to use models for historical texts which not only read the texts themselves but also developed ways of classifying them into time intervals. More advanced techniques were applied to trace changes of meaning in key political concepts across time intervals, using topic models and word embeddings, allowing historiographical and linguistic hypotheses to be tested.

In Jenset and McGillivray (2017)'s terms, these various techniques produced a variety of different kinds of quantitative distributional evidence, which allowed a series of hypotheses to be developed and tested.
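The pre-processing step described above — turning raw texts into a term-document count matrix — can be sketched as follows. This is our minimal standard-library illustration; the two "White Papers", the tokenisation, and the vocabulary handling are stand-ins, not the project's actual pipeline.

```python
import re
from collections import Counter

def term_document_matrix(docs):
    """Build a term-document count matrix from raw texts.

    Returns the sorted vocabulary and one row of term counts per document.
    """
    tokenised = [re.findall(r"[a-z]+", text.lower()) for text in docs]
    vocab = sorted(set(tok for toks in tokenised for tok in toks))
    matrix = []
    for toks in tokenised:
        counts = Counter(toks)
        matrix.append([counts.get(term, 0) for term in vocab])
    return vocab, matrix

# Toy time-indexed 'White Papers' (illustrative content only).
corpus = {
    1947: "The state shall direct industry and trade.",
    1967: "Economic growth and productivity drive policy.",
}
vocab, matrix = term_document_matrix(list(corpus.values()))
print(vocab)
print(matrix)
```

In practice, further columns (grammatical annotations, dictionary-based facet scores) would be joined onto such a matrix before clustering or topic modelling.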
Intuition, often developed from historical research using non-quantitative techniques, had an important role in framing hypotheses. But quantitative evidence was able to impart greater clarity and specificity to intuitional hypotheses, often closing down multiple possibilities. For example, using our dictionaries demonstrated a major break in the language of White Papers in the mid-1960s, around the election of Harold Wilson's Labour government. While this intuitively made sense, so would a break in the early 1980s, which we did not find, instead seeing a rupture in the early 1990s.

Combining our chronological analysis with topic modelling and word embeddings allowed us to build a series of models of the predominant concerns and the structure of political language in each epoch. In line with Jenset and McGillivray (2017)'s framework, these models were built by iteratively generating and testing hypotheses. For example, we tested the frequency of different term clusters generated through topic modelling, and the terms whose embedding changed most dramatically between each epoch.

Our process of hypothesis generation and testing always had in mind the commonplace assumptions made by historians using non-quantitative techniques in the field. In many respects, quantitative distributional evidence produced hypotheses at variance with those scholarly norms. For example, we found White Papers in the period from 1945 to 1964 to be dominated by post-war foreign policy concerns, not the construction of the welfare state; economic language was dominant in the period from 1965 to 1990, not afterwards; and 'the state' as a political agent was more important in the later period than before.

Yet, as challenging as they may be to much of the historiography of post-war Britain, the form of these hypotheses is very similar to the form of the claims made in standard historical argumentation; there is no dramatic epistemological leap in the type of knowledge being produced. Although our models were developed using automated techniques, they can be verified qualitatively in the same way as non-quantifiable claims, through quotation and the interpretation of words and phrases in specific contexts.

One important finding is the need to recognise the broad range of different ways in which quantitative analysis can be expressed. It is important, for example, to indicate the absolute frequency of terms in any series as well as their relation to other terms. There is significant work to be done developing ways to visually represent the quantitative features of any corpus of texts.

5 Predicting the Past

Digital humanities generally uses computational modelling for exploratory data analysis, making use of advances in the ability to visualise and interactively explore data in a relatively free fashion. Recently, we have witnessed the emergence of new combinations of exploratory data analysis with statistical evidence for discovered patterns. This approach is popular in the digital humanities too: Klingenstein et al. (2014), for instance, integrate a historical regression analysis into their data visualisations. Our first example above is an instance of exploratory data analysis, using topic modelling and other tools to provide statistical evidence for underlying trends in the documents. Models, however, often have another purpose beyond the exploration of data: they are part of predictive analytics. Abbott (2014) is one of the most famous practitioners in the field. For him, predictive analytics works on "discovering interesting and meaningful patterns in data. It draws from several related disciplines, some of which have been used to discover patterns in data for more than 100 years, including pattern recognition, statistics, machine learning, artificial intelligence, and data mining" (Abbott, 2014).

It is a common misunderstanding to reduce predictive analytics to attempts to predict the future. It is rather about developing meaningful relationships in any data. Predictive analytics, compared to traditional analytics, is driven by the data under observation rather than primarily by human assumptions about the data. The discipline strives to automate modelling and pattern-finding as far as this is possible. In this sense, it moves away from both exploratory and confirmatory data analysis, as it fully considers how computers would process evidence.

O'Neil and Schutt (2013) introduce the idea of predicting the past, which is used to model the effects of electronic health records (EHR) and to set up new monitoring programs for drugs. For O'Neil and Schutt (2013), these integrated datasets were the foundations of novel research attempts to predict the past.
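The break-detection described in Section 4 — locating a rupture in the language of a time-indexed corpus — can be sketched with a simple heuristic: compare term-frequency vectors of adjacent periods and report the boundary with the sharpest change. The cosine-distance measure and the toy yearly counts below are our illustrative assumptions; the project's actual techniques (dictionaries, topic models, word embeddings) are more sophisticated.

```python
import math

def cosine_distance(u, v):
    """1 minus the cosine similarity between two term-frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def largest_break(yearly_vectors):
    """Return the pair of adjacent periods with the largest vocabulary jump."""
    years = sorted(yearly_vectors)
    jumps = {(y1, y2): cosine_distance(yearly_vectors[y1], yearly_vectors[y2])
             for y1, y2 in zip(years, years[1:])}
    return max(jumps, key=jumps.get)

# Toy term-frequency vectors (counts over a fixed vocabulary) per period.
vectors = {
    1950: [9, 8, 1, 0],   # dominated by the first two terms
    1955: [8, 9, 2, 0],
    1965: [1, 1, 9, 8],   # vocabulary shifts sharply here
    1970: [0, 2, 8, 9],
}
print(largest_break(vectors))  # → (1955, 1965)
```

Such a heuristic is exploratory in the sense discussed above: it surfaces a candidate rupture, which then becomes a hypothesis to be tested statistically and interpreted qualitatively.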
They cite the 'Observational Medical Outcomes Partnership' (OMOP) in the US, which investigates how good we are at predicting what we already know about drug performance in health using past datasets. Once OMOP had integrated data from heterogeneous sources, it began to look into predicting the past of old drug cases and how effective their treatments were. "Employing a variety of approaches from the fields of epidemiology, statistics, computer science, and elsewhere, OMOP seeks to answer a critical challenge: what can medical researchers learn from assessing these new health databases, could a single approach be applied to multiple diseases, and could their findings be proven?" (O'Neil and Schutt, 2013). Predicting the past thus tries to understand how "well the current methods do on predicting things we actually already know" (O'Neil and Schutt, 2013).

Such a novel approach to past datasets should be of interest to digital history, which could use it to control decisions on how we organise and divide historical records. An existing example that implies predicting past events by joining historical datasets is the identification of historical spatio-temporal patterns of IED usage by the Provisional Irish Republican Army during 'The Troubles', used to attribute the 'historical behaviour of terrorism' (Tench et al., 2016).

In Blanke (2018), we demonstrate how predicting the past can complement and enhance existing work in the digital humanities that is mainly concentrated on exploring gender issues as they appear in past datasets. Blevins and Mullen (2015) provide an expert introduction to why the digital humanities should be interested in predicting genders. Gender values are often missing from datasets and need to be imputed. Predictive analytics can be seen as a corrective to existing data practices, allowing us to predict the genders in a dataset. In Blanke (2018), we compare a traditional dictionary-based approach with two machine learning strategies: first a classification algorithm is discussed, and then three different rule-based learners are introduced. We demonstrate that these rule-based learners are an effective alternative to the traditional dictionary-based method and partly outperform it.

Blanke (2018) develops the predicting-the-past methodology further and presents differences from other predictive analytics approaches. We follow all the steps of traditional predictive analytics to prepare a stable and reliable model, paying particular attention to avoiding overfitting, one of the main risks in predictive modelling. An 'overfitting' model is one that models the existing training data too closely, which negatively impacts its ability to generalize to new cases. We perform extensive cross-validation to avoid overfitting. Predicting the past, however, differs significantly from other approaches, as the model is not prepared for the future addition of data but serves to analyse existing data. The aim is to understand which (minimal) set of features makes it likely that observation x includes feature y. In Blanke (2018), we aimed to understand which combination of features makes it likely that a historical person is female, male or of unknown gender. The next step in our methodology is therefore to apply the best-performing models to the whole dataset again, to analyse what gender determinations exist in the data. Is it, e.g., more likely that vagrants in London were female?

The common approaches to gender prediction in the digital humanities use predefined dictionaries of first names and match the gender of individuals against this dictionary. These dictionaries are, firstly, heavily dependent on the culture and language they relate to. But this is not the only issue: dictionary-based approaches also assume that errors are randomly distributed, as if gender trouble were simply a problem of not recording the right gender in the data. Our predictive analytics approach in Blanke (2018), on the other hand, does not make this assumption in advance and judges gender based on the existing data. This has in turn led to interesting insights into why certain genders remain unknown to the models.

In summary, predicting the past is based, firstly, on going through all traditional predictive analytics steps to form a stable model that reflects the underlying historical evidence closely enough but does not overfit. Secondly, we use this stable model to algorithmically analyse historical evidence to gain insights into how a computer would see the relations of evidence.
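To make the contrast between the two strategies concrete, here is our minimal sketch (not Blanke's actual implementation): a dictionary lookup versus a simple rule-based learner that induces name-suffix rules from labelled training data. The names, labels and the single-character suffix heuristic are illustrative assumptions only.

```python
from collections import Counter

def dictionary_gender(name, gender_dict):
    """Traditional approach: look the first name up in a fixed dictionary."""
    return gender_dict.get(name.lower(), "unknown")

def learn_suffix_rules(training, suffix_len=1):
    """Rule-based learner: map name suffixes to their majority gender."""
    by_suffix = {}
    for name, gender in training:
        by_suffix.setdefault(name.lower()[-suffix_len:], Counter())[gender] += 1
    return {suf: counts.most_common(1)[0][0] for suf, counts in by_suffix.items()}

def rule_gender(name, rules, suffix_len=1):
    """Apply the learned suffix rules to an unseen name."""
    return rules.get(name.lower()[-suffix_len:], "unknown")

# Illustrative data only.
gender_dict = {"mary": "female", "john": "male"}
training = [("mary", "female"), ("lucy", "female"), ("nancy", "female"),
            ("john", "male"), ("henry", "male"), ("william", "male")]
rules = learn_suffix_rules(training)

# The dictionary fails on an unseen name; the learned rules generalise.
print(dictionary_gender("Betsy", gender_dict))  # unknown
print(rule_gender("Betsy", rules))              # female
```

The sketch also shows why a learned model supports predicting the past: once stable, the same rules can be run over the whole dataset to ask which records remain unclassifiable, and why.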
6 Conclusion and future work

This comparison leads us to the conclusion that, despite the broad applicability of Jenset and McGillivray (2017)'s framework in both cases, some important differences emerge between historical linguistics and history. We discuss two. First of all, the scope of primary sources and their quantitative representation is broader in history, including not only distributional but also categorical, ordinal, and numerical evidence. History requires careful discernment of which is most appropriate, and how they should be combined.

Secondly, the scope for a purely quantitative approach is less broad: quantitative evidence and models can often only contribute to informing hypotheses and claims which rely on qualitative evidence and methods. Often it seems that quantitative methods are only accepted by historical scholars if the claims developed by automated techniques can also be verified qualitatively, through anecdote, quotation and so on. In many fields quantification can be accepted because it creates results which look similar to those produced by qualitative research. But this approach limits the development of methods that use quantification to do more than simply re-frame qualitative observations, and instead make statistical arguments about aggregate behaviour in its own right. In the future, we plan to develop these insights further, in order to build a more comprehensive research framework which integrates qualitative and quantitative approaches.

Acknowledgments

This work was supported by The Alan Turing Institute under the EPSRC grant EP/N510129/1. BM is supported by the Turing award TU/A/000010 (RG88751).

References

Abbott, Dean (2014). Applied Predictive Analytics: Principles and Techniques for the Professional Data Analyst. Hoboken, NJ: Wiley.

Blanke, Tobias (2018). Predicting the past. Digital Humanities Quarterly, 12(2).

Blanke, Tobias and Jon Wilson (2017). Identifying epochs in text archives. In 2017 IEEE International Conference on Big Data (Big Data), pages 2219–2224.

Blevins, Cameron and Lincoln Mullen (2015). Jane, John . . . Leslie? A historical method for algorithmic gender prediction. Digital Humanities Quarterly, 9(3).

Carrier, Richard (2012). Proving History: Bayes's Theorem and the Quest for the Historical Jesus. Amherst, NY: Prometheus Books.

Guldi, Jo and David Armitage (2014). The History Manifesto. Cambridge: Cambridge University Press.

Hitchcock, Tim (2013). Confronting the digital: Or how academic history writing lost the plot. Cultural and Social History, 10(1):9–23.

Jenset, Gard B. and Barbara McGillivray (2017). Quantitative Historical Linguistics: A Corpus Framework. Oxford: Oxford University Press.

Klingenstein, Sara, Tim Hitchcock, and Simon DeDeo (2014). The civilizing process in London's Old Bailey. Proceedings of the National Academy of Sciences, 111(26):9419–9424. doi:10.1073/pnas.1405984111.

Knuth, Donald E. (2007). Computer programming as an art. In ACM Turing Award Lectures. New York: ACM.

Köhler, Reinhard (2012). Quantitative Syntax Analysis. Berlin: de Gruyter Mouton.

Mayrhofer, Manfred (1980). Zur Gestaltung des etymologischen Wörterbuchs einer "Großcorpus-Sprache". Wien: Österreichische Akademie der Wissenschaften, phil.-hist. Klasse.

McCarty, Willard (2004). Modeling: A study in words and meanings. In Susan Schreibman, Ray Siemens, and John Unsworth, eds., A Companion to Digital Humanities, pages 254–270. Malden, MA: Blackwell.

McGillivray, Barbara (2014). Methods in Latin Computational Linguistics. Leiden: Brill.

McGillivray, Barbara, Giovanni Colavizza, and Tobias Blanke (2018). Towards a quantitative research framework for historical disciplines. In COMHUM 2018: Book of Abstracts for the Workshop on Computational Methods in the Humanities 2018. Lausanne, Switzerland.

Moretti, Franco (2013). Distant Reading. London: Verso.

O'Neil, Cathy and Rachel Schutt (2013). Doing Data Science: Straight Talk from the Frontline. Sebastopol, CA: O'Reilly.

Piotrowski, Michael (2012). Natural Language Processing for Historical Texts. San Rafael, CA: Morgan & Claypool.

Tench, Stephen, Hannah Fry, and Paul Gill (2016). Spatio-temporal patterns of IED usage by the Provisional Irish Republican Army. European Journal of Applied Mathematics, 27(3):377–402.

Tosh, John (2015). The Pursuit of History: Aims, Methods and New Directions in the Study of History. London: Routledge, sixth ed.

Zuidema, Willem and Bart de Boer (2014). Modeling in the language sciences. In Robert J. Podesva and Devyani Sharma, eds., Research Methods in Linguistics, pages 428–445. Cambridge: Cambridge University Press.