=Paper=
{{Paper
|id=Vol-2314/paper5
|storemode=property
|title=Towards a quantitative research framework for historical disciplines
|pdfUrl=https://ceur-ws.org/Vol-2314/paper5.pdf
|volume=Vol-2314
|authors=Barbara McGillivray,Jon Wilson,Tobias Blanke
|dblpUrl=https://dblp.org/rec/conf/comhum/McGillivrayWB18
}}
==Towards a quantitative research framework for historical disciplines==
Barbara McGillivray¹, Jon Wilson², Tobias Blanke³
¹ The Alan Turing Institute, University of Cambridge, United Kingdom
² Department of History, King's College London, United Kingdom
³ Department of Digital Humanities, King's College London, United Kingdom
bmcgillivray@turing.ac.uk, {jon.wilson, tobias.blanke}@kcl.ac.uk
1 Background and motivation

The ever-expanding wealth of digital material that researchers have at their disposal today, coupled with growing computing power, makes the use of quantitative methods in historical disciplines increasingly viable. However, applying existing techniques and tools to historical datasets is not a trivial enterprise (Piotrowski, 2012; McGillivray, 2014). Moreover, scholarly communities react differently to the idea that new research questions and insights can arise from quantitative explorations that could not be made using purely qualitative approaches. Some of them, such as linguistics (Jenset and McGillivray, 2017), have been acquainted with quantitative methods for a longer time. Others, such as history, have seen a growth in quantitative methods on the fringes of the discipline, but have not incorporated them into the mainstream of scholarly practice (Hitchcock, 2013).

Historical disciplines, i.e., those focusing on the study of the past, possess at least two characteristics which set them apart and require careful consideration in this context: the need to work with closed archives which can only be expanded by working on past records (Mayrhofer, 1980), and the focus on phenomena that change in a complex fashion over time. First, this means historical research is grounded in empirical sources which are stable and fixed (one cannot change the archival record). But these sources are often hard to access and, recording the language and actions of only a small fraction of historical reality at any moment, have a complex relationship to the past being studied. Secondly, the categories through which the past is studied themselves change, making modelling, and the automation of analysis based on a limited number of features in the historical record, a fraught enterprise.

Donald E. Knuth is perhaps the most famous godfather of computer science. For him, "[s]cience is knowledge which we understand so well that we can teach it to a computer; and if we don't fully understand something, it is an art to deal with it. . . . [T]he process of going from an art to a science means that we learn how to automate something" (Knuth, 2007). Computing science is defined by the tension between automating processes using digital means and our inability to do so, because we fail to create fully explicit ways of understanding processes. In this sense, a computational approach to collecting and processing (historical) evidence would be a science if we could learn to automate it. Many features of the past can be understood through automation. Yet the problematic nature of the relationship between sources and reality, and the mutability of categories, mean that historical analysis will always rely on a significant degree of human intuition and cannot be fully automated; computational history is an art in Knuth's terms.

The methodological reflections in this paper are part of an effort to define the possibilities and limits of quantification and automation in historical analysis. Our aim is to assist scholars in taking full advantage of quantification through a rigorous account of the boundaries between science and art in Knuth's terms. Building on McGillivray et al. (2018), in this contribution we begin with the framework proposed by Jenset and McGillivray (2017) for quantitative historical linguistics and illustrate it with two case studies.
Proceedings of the Workshop on Computational Methods in the Humanities 2018 (COMHUM 2018)
2 A quantitative framework for historical linguistics

Jenset and McGillivray (2017)'s framework is the only general framework available for quantitative historical linguistics. A comparable framework, but more limited in scope, can be found in Köhler (2012). Jenset and McGillivray (2017)'s framework starts from the assumption that linguistic historical reality is lost, and that the aim of quantitative research is to arrive at models of, and claims about, that reality which are quantitatively driven from evidence and lead to consensus among the scholarly community. The scope of application of this framework is delimited to the cases where quantifiable evidence (such as n-grams or numerical data) can be gathered from primary sources, typically in the form of corpora, i.e., collections of electronic text created for the purpose of linguistic analysis.

Jenset and McGillivray (2017) define evidence in quantitative historical linguistics as the set of "facts or properties that can be observed, independently accessed, or verified by other researchers" (Jenset and McGillivray, 2017, 39), and thus exclude intuition as inadmissible as evidence. Such facts can be pre-theoretical (as the fact that the English word the is among the most frequent ones) or based on some hypotheses or assumptions (as the fact that the class of articles in English is among the most frequent ones, which is based on the assumption that the class of articles groups certain words together). Quantitative evidence is "based on numerical or probabilistic observation or inference" (Jenset and McGillivray, 2017, 39), and the quantification should be independently verifiable. Distributional evidence, on the other hand, has the form "x occurs in context y", where the context can consist of words, classes, phonemes, etc. Annotated corpora, where linguistic (morphological, syntactic, semantic, etc.) information has been encoded in context, are considered as sources of distributional evidence to study phenomena in historical linguistics.

Following Carrier (2012), Jenset and McGillivray (2017, 40) define claims as anything that is not evidence: statements based on evidence or on other claims. The role of claims in the framework concerns their connection with truth, which can be stated in categorical terms (as in "the claim that x belongs to class y is true") or probabilistic terms (e.g., "x belongs to class y with probability p"). Claims possess a strength proportional to that of the evidence supporting them: all other things being equal, claims supported by large amounts of evidence are stronger than claims supported by little evidence.

Ultimately, research in historical linguistics aims at making (hopefully strong) claims logically following from assumptions shared by the community, other claims, or evidence. A hypothesis originates from previous research, intuition, or logical arguments, and is "a claim that can be tested empirically, through statistical hypothesis testing on corpus data" (Jenset and McGillivray, 2017, 42). In this context, "model" means a formalized representation of a phenomenon, be it statistical or symbolic (Zuidema and de Boer, 2014). Models (including those deriving from hypotheses tested quantitatively against evidence) are research tools embedding claims or hypotheses, useful in order to produce novel claims and hypotheses in turn via "a continual process of coming to know by manipulating representations" (McCarty, 2004).

Based on these definitions, Jenset and McGillivray (2017) formalize the research process they envisage as part of their framework; see Figure 1. The process starts from the historical linguistic reality, which we assume to be lost forever. Any research model can only aim at approaching this reality without reaching it completely, and quantitative historical linguistics will ultimately produce models of language that are quantitatively driven from evidence. The rest of the diagram shows how this is achieved. The historical linguistic reality gave rise to a series of primary sources, including documents and other (mainly textual) sources, and these to secondary sources like grammars and dictionaries. Based on the knowledge of the language we gather from these sources, we can draft annotation schemes which specify the rules for adding linguistic information to corpora and thus obtain annotated corpora. Corpora are the source of quantitative distributional evidence which can be used to test statistical hypotheses, formulated on the basis of our intuition of the language and of knowledge drawn from examples. Such hypotheses can also feed into the creation of linguistic models, which aim to represent the historical linguistic reality.
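As a concrete illustration (ours, not taken from Jenset and McGillivray's book), the two core steps of the framework — collecting distributional evidence of the form "x occurs in context y" from an annotated corpus, and testing a frequency hypothesis against it — can be sketched in a few lines of Python. The toy corpus, the choice of the following word's part-of-speech tag as "context", and the two-proportion z-test are all illustrative assumptions, not the framework's prescribed implementation.

```python
import math
from collections import Counter

def distributional_evidence(tagged_corpus):
    """Collect evidence of the form 'x occurs in context y'.

    Here the 'context' of a word is simply the part-of-speech tag
    of the following word (an illustrative choice, not the only one).
    """
    evidence = Counter()
    for sentence in tagged_corpus:
        for (word, _), (_, next_tag) in zip(sentence, sentence[1:]):
            evidence[(word.lower(), next_tag)] += 1
    return evidence

def two_proportion_z(k1, n1, k2, n2):
    """Two-sided two-proportion z-test: do two (sub)corpora differ
    in the relative frequency of some distributional pattern?"""
    p1, p2 = k1 / n1, k2 / n2
    p = (k1 + k2) / (n1 + n2)          # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p_value

# Toy annotated corpus: sentences as lists of (word, POS-tag) pairs.
period_a = [[("the", "DET"), ("king", "NOUN"), ("rode", "VERB")],
            [("the", "DET"), ("law", "NOUN"), ("stood", "VERB")]]
ev = distributional_evidence(period_a)
assert ev[("the", "NOUN")] == 2  # 'the' occurs twice before a noun

# Hypothesis: a construction is more frequent in corpus A (30 of 200
# relevant tokens) than in corpus B (10 of 220).
z, p_value = two_proportion_z(30, 200, 10, 220)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

The point of the sketch is the framework's division of labour: the counts are independently verifiable evidence, while the hypothesis and the significance threshold remain the researcher's claims.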
3 Model-building in history

In contrast with quantitative historical linguistics, the discipline of history possesses an extraordinary variety of idioms to describe itself, and has a much less rigorous analytical vocabulary to describe its method. Yet there are important similarities, which mean Jenset and McGillivray (2017)'s framework can be translated and modified for use in historical research more generally. First, historians assume that historical reality is lost, and can only be understood through traces left in a variety of archives (including human memory). Second, although historians rarely explicitly talk about constructing models, their practice largely consists of making claims about representations of the past which other disciplines would describe in precisely such terms. Through the process they describe as the 'interpretation' or 'analysis' (Tosh, 2015) of the sources, historians create representations which reduce the vast complexity of historical reality to a few limited, stylised characteristics: Max Weber's Protestant Ethic, Lewis Namier's system of factional interest, or C. A. Bayly's great uniformity. Third, these representations are used to make hypotheses and claims about change over time of different kinds. These might be about the endurance or rupture of certain key features in a particular sphere of activity, or about the forces responsible for causing a particular event or set of processes, for example.

Figure 1: Research process from the quantitative historical linguistics framework described in Jenset and McGillivray (2017). Figure modified from Figure 2.1 in Jenset and McGillivray (2017, 45).

We have suggested that history is (if implicitly) essentially a model-building enterprise. That allows many of the hypotheses which historians develop to be theoretically amenable to quantification. The use of quantitative methods (in particular the analysis of textual corpora) has increased recently (Guldi and Armitage, 2014). But most historians are reluctant to quantify because they are skeptical about formalising their models, believing that to do so would imply their possessing a degree of categorical rigidity unwarranted by the complexity of the past. We suggest that more explicit reflection on method, and engagement with other fields (such as historical linguistics) which deal with fuzzy categories, would help overcome these obstacles.

What's more, the use of digital datasets and the application of quantitative techniques to them allows historical claims based on the prevalence of certain features of the past to be empirically tested. Such claims are central to many forms of historical argumentation already; about the importance of particular concepts or practices at specific moments, for example. Of course such claims need to be precisely related to the structure of the (digitised) archive; as ever, limitations must be recognised. But given the amount of material which can be quickly processed, quantification allows claims previously asserted through little more than the accumulation of anecdotes to be more rigorously validated.

4 Languages of power

The first case study where we apply Jenset and McGillivray (2017)'s framework considers a recent collaboration between Digital Humanities and History at King's College London (Blanke and Wilson, 2017) to develop a "materialist sociology of political texts" following Moretti's ideas of distant reading (Moretti, 2013). The project worked on a corpus of post-1945 UK government White Papers to map connections and similarities in political language from 1945 to 2010. As the corpus is time-indexed, a quantitative analysis traced the changing shape of political language by tracking clusters of terms relating to particular concepts and charting the changing meaning of words. Creating the distributional quantitative evidence involved text pre-processing to create a term-document matrix. Using natural language processing libraries, this was annotated with grammatical information, as well as with a number of dictionaries that reflected facets such as sentiment, ambiguity and so on. These allowed the project to use models for historical texts which not only read the texts themselves but also developed ways of classifying them into time intervals. More advanced techniques were applied to trace changes of meaning in key political concepts across time intervals, using topic models and word embeddings, allowing historiographical and linguistic hypotheses to be tested.

In Jenset and McGillivray (2017)'s terms, these various techniques produced a variety of different kinds of quantitative distributional evidence, which allowed a series of hypotheses to be developed and tested.
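The pre-processing step described above — turning raw texts into a term-document count matrix — can be sketched as follows. This is our minimal standard-library illustration; the two "White Papers", the tokenisation, and the vocabulary handling are stand-ins, not the project's actual pipeline.

```python
import re
from collections import Counter

def term_document_matrix(docs):
    """Build a term-document count matrix from raw texts.

    Returns the sorted vocabulary and one row of term counts per document.
    """
    tokenised = [re.findall(r"[a-z]+", text.lower()) for text in docs]
    vocab = sorted(set(tok for toks in tokenised for tok in toks))
    matrix = []
    for toks in tokenised:
        counts = Counter(toks)
        matrix.append([counts.get(term, 0) for term in vocab])
    return vocab, matrix

# Toy time-indexed 'White Papers' (illustrative content only).
corpus = {
    1947: "The state shall direct industry and trade.",
    1967: "Economic growth and productivity drive policy.",
}
vocab, matrix = term_document_matrix(list(corpus.values()))
print(vocab)
print(matrix)
```

In practice, further columns (grammatical annotations, dictionary-based facet scores) would be joined onto such a matrix before clustering or topic modelling.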
Intuition, often developed from historical research using non-quantitative techniques, had an important role in framing hypotheses. But quantitative evidence was able to impart greater clarity and specificity to intuitional hypotheses, often closing down multiple possibilities. For example, using our dictionaries demonstrated a major break in the language of White Papers in the mid-1960s, around the election of Harold Wilson's Labour government. While this intuitively made sense, so would a break in the early 1980s, which we did not find, instead seeing a rupture in the early 1990s.

Combining our chronological analysis with topic modelling and word embeddings allowed us to build a series of models of the predominant concerns and the structure of political language in each epoch. In line with Jenset and McGillivray (2017)'s framework, these models were built by iteratively generating and testing hypotheses. For example, we tested the frequency of different term clusters generated through topic modelling, and the terms whose embedding changed most dramatically between each epoch.

Our process of hypothesis generation and testing always had in mind the commonplace assumptions made by historians using non-quantitative techniques in the field. In many respects, quantitative distributional evidence produced hypotheses at variance with those scholarly norms. For example, we found White Papers in the period from 1945 to 1964 to be dominated by post-war foreign policy concerns, not the construction of the welfare state; economic language was dominant in the period from 1965 to 1990, not afterwards; and 'the state' as a political agent was more important in the later period than before.

Yet, as challenging as they may be to much of the historiography of post-war Britain, the form of these hypotheses is very similar to the form of the claims made in standard historical argumentation; there is no dramatic epistemological leap in the type of knowledge being produced. Although our models were developed using automated techniques, they can be verified qualitatively in the same way as non-quantifiable claims, through quotation and the interpretation of words and phrases in specific contexts.

One important finding is the need to recognise the broad range of different ways in which quantitative analysis can be expressed. It is important, for example, to indicate the absolute frequency of terms in any series as well as their relation to other terms. There is significant work to be done developing ways to visually represent the quantitative features of any corpus of texts.

5 Predicting the Past

Digital humanities generally uses computational modelling for exploratory data analysis, making use of advances in the ability to visualise and interactively explore data in a relatively free fashion. Recently, we have witnessed the emergence of new combinations of exploratory data analysis with statistical evidence for discovered patterns. This approach is popular in the digital humanities too: Klingenstein et al. (2014), for instance, integrate a historical regression analysis into their data visualisations. Our first example above is an instance of exploratory data analysis, using topic modelling and other tools to provide statistical evidence for underlying trends in the documents. Models, however, often have another purpose beyond the exploration of data: they are part of predictive analytics. Abbott (2014) is one of the most famous practitioners in the field. For him, predictive analytics works on "discovering interesting and meaningful patterns in data. It draws from several related disciplines, some of which have been used to discover patterns in data for more than 100 years, including pattern recognition, statistics, machine learning, artificial intelligence, and data mining" (Abbott, 2014).

It is a common misunderstanding to reduce predictive analytics to attempts to predict the future. It is rather about developing meaningful relationships in any data. Predictive analytics, compared to traditional analytics, is driven by the data under observation rather than primarily by human assumptions about the data. The discipline strives to automate modelling and pattern-finding as far as this is possible. In this sense, it moves away from both exploratory and confirmatory data analysis, as it fully considers how computers would process evidence.

O'Neil and Schutt (2013) introduce the idea of predicting the past, which is used to model the effects of electronic health records (EHR) and to set up new monitoring programs for drugs. For O'Neil and Schutt (2013), these integrated datasets were the foundations of novel research attempts to predict the past.
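The break-detection described in Section 4 — locating a rupture in the language of a time-indexed corpus — can be sketched with a simple heuristic: compare term-frequency vectors of adjacent periods and report the boundary with the sharpest change. The cosine-distance measure and the toy yearly counts below are our illustrative assumptions; the project's actual techniques (dictionaries, topic models, word embeddings) are more sophisticated.

```python
import math

def cosine_distance(u, v):
    """1 minus the cosine similarity between two term-frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def largest_break(yearly_vectors):
    """Return the pair of adjacent periods with the largest vocabulary jump."""
    years = sorted(yearly_vectors)
    jumps = {(y1, y2): cosine_distance(yearly_vectors[y1], yearly_vectors[y2])
             for y1, y2 in zip(years, years[1:])}
    return max(jumps, key=jumps.get)

# Toy term-frequency vectors (counts over a fixed vocabulary) per period.
vectors = {
    1950: [9, 8, 1, 0],   # dominated by the first two terms
    1955: [8, 9, 2, 0],
    1965: [1, 1, 9, 8],   # vocabulary shifts sharply here
    1970: [0, 2, 8, 9],
}
print(largest_break(vectors))  # → (1955, 1965)
```

Such a heuristic is exploratory in the sense discussed above: it surfaces a candidate rupture, which then becomes a hypothesis to be tested statistically and interpreted qualitatively.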
They cite the 'Observational Medical Outcomes Partnership' (OMOP) in the US, which investigates how good we are at predicting what we already know about drug performance in health using past datasets. Once OMOP had integrated data from heterogeneous sources, it began to look into predicting the past of old drug cases and how effective their treatments were. "Employing a variety of approaches from the fields of epidemiology, statistics, computer science, and elsewhere, OMOP seeks to answer a critical challenge: what can medical researchers learn from assessing these new health databases, could a single approach be applied to multiple diseases, and could their findings be proven?" (O'Neil and Schutt, 2013). Predicting the past thus tries to understand how "well the current methods do on predicting things we actually already know" (O'Neil and Schutt, 2013).

Such a novel approach to past datasets should be of interest to digital history, which could use it to control decisions on how we organise and divide historical records. An existing example that implies predicting past events by joining historical datasets is the identification of historical spatio-temporal patterns of IED usage by the Provisional Irish Republican Army during 'The Troubles', used to attribute the 'historical behaviour of terrorism' (Tench et al., 2016).

In Blanke (2018), we demonstrate how predicting the past can complement and enhance existing work in the digital humanities that is mainly concentrated on exploring gender issues as they appear in past datasets. Blevins and Mullen (2015) provide an expert introduction to why the digital humanities should be interested in predicting genders. Gender values are often missing from datasets and need to be imputed. Predictive analytics can be seen as a corrective to existing data practices, allowing us to predict the genders in a dataset. In Blanke (2018), we compare a traditional dictionary-based approach with two machine learning strategies: first a classification algorithm is discussed, and then three different rule-based learners are introduced. We demonstrate that these rule-based learners are an effective alternative to the traditional dictionary-based method and partly outperform it.

Blanke (2018) develops the predicting-the-past methodology further and presents differences from other predictive analytics approaches. We follow all the steps of traditional predictive analytics to prepare a stable and reliable model, paying particular attention to avoiding overfitting, one of the main risks in predictive modelling. An 'overfitting' model is one that models the existing training data too closely, which negatively impacts its ability to generalize to new cases. We perform extensive cross-validation to avoid overfitting. Predicting the past, however, differs significantly from other approaches, as the model is not prepared for the future addition of data but serves to analyse existing data. The aim is to understand which (minimal) set of features makes it likely that observation x includes feature y. In Blanke (2018), we aimed to understand which combination of features makes it likely that a historical person is female, male or of unknown gender. The next step in our methodology is therefore to apply the best-performing models to the whole dataset again, to analyse what gender determinations exist in the data. Is it, e.g., more likely that vagrants in London were female?

The common approaches to gender prediction in the digital humanities use predefined dictionaries of first names and match the gender of individuals against this dictionary. These dictionaries are, firstly, heavily dependent on the culture and language they relate to. But this is not the only issue: dictionary-based approaches also assume that errors are randomly distributed, as if gender trouble were simply a problem of not recording the right gender in the data. Our predictive analytics approach in Blanke (2018), on the other hand, does not make this assumption in advance and judges gender based on the existing data. This has in turn led to interesting insights into why certain genders remain unknown to the models.

In summary, predicting the past is based, firstly, on going through all traditional predictive analytics steps to form a stable model that reflects the underlying historical evidence closely enough but does not overfit. Secondly, we use this stable model to algorithmically analyse historical evidence to gain insights into how a computer would see the relations of evidence.
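To make the contrast between the two strategies concrete, here is our minimal sketch (not Blanke's actual implementation): a dictionary lookup versus a simple rule-based learner that induces name-suffix rules from labelled training data. The names, labels and the single-character suffix heuristic are illustrative assumptions only.

```python
from collections import Counter

def dictionary_gender(name, gender_dict):
    """Traditional approach: look the first name up in a fixed dictionary."""
    return gender_dict.get(name.lower(), "unknown")

def learn_suffix_rules(training, suffix_len=1):
    """Rule-based learner: map name suffixes to their majority gender."""
    by_suffix = {}
    for name, gender in training:
        by_suffix.setdefault(name.lower()[-suffix_len:], Counter())[gender] += 1
    return {suf: counts.most_common(1)[0][0] for suf, counts in by_suffix.items()}

def rule_gender(name, rules, suffix_len=1):
    """Apply the learned suffix rules to an unseen name."""
    return rules.get(name.lower()[-suffix_len:], "unknown")

# Illustrative data only.
gender_dict = {"mary": "female", "john": "male"}
training = [("mary", "female"), ("lucy", "female"), ("nancy", "female"),
            ("john", "male"), ("henry", "male"), ("william", "male")]
rules = learn_suffix_rules(training)

# The dictionary fails on an unseen name; the learned rules generalise.
print(dictionary_gender("Betsy", gender_dict))  # unknown
print(rule_gender("Betsy", rules))              # female
```

The sketch also shows why a learned model supports predicting the past: once stable, the same rules can be run over the whole dataset to ask which records remain unclassifiable, and why.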
6 Conclusion and future work

This comparison leads us to the conclusion that, despite the broad applicability of Jenset and McGillivray (2017)'s framework in both cases, some important differences emerge between historical linguistics and history. We discuss two. First of all, the scope of primary sources and their quantitative representation is broader in history, including not only distributional but also categorical, ordinal, and numerical evidence. History requires careful discernment of which is most appropriate, and how they should be combined.

Secondly, the scope for a purely quantitative approach is less broad: quantitative evidence and models can often only contribute to informing hypotheses and claims which rely on qualitative evidence and methods. Often it seems that quantitative methods are only accepted by historical scholars if the claims developed by automated techniques can also be verified qualitatively, through anecdote, quotation and so on. In many fields quantification can be accepted because it creates results which look similar to those produced by qualitative research. But this approach limits the development of methods that use quantification to do more than simply re-frame qualitative observations, and instead make statistical arguments about aggregate behaviour in its own right. In the future, we plan to develop these insights further, in order to build a more comprehensive research framework which integrates qualitative and quantitative approaches.

Acknowledgments

This work was supported by The Alan Turing Institute under the EPSRC grant EP/N510129/1. BM is supported by the Turing award TU/A/000010 (RG88751).

References

Abbott, Dean (2014). Applied Predictive Analytics: Principles and Techniques for the Professional Data Analyst. Hoboken, NJ: Wiley.

Blanke, Tobias (2018). Predicting the past. Digital Humanities Quarterly, 12(2).

Blanke, Tobias and Jon Wilson (2017). Identifying epochs in text archives. In 2017 IEEE International Conference on Big Data (Big Data), pages 2219–2224.

Blevins, Cameron and Lincoln Mullen (2015). Jane, John . . . Leslie? A historical method for algorithmic gender prediction. Digital Humanities Quarterly, 9(3).

Carrier, Richard (2012). Proving History: Bayes's Theorem and the Quest for the Historical Jesus. Amherst, NY: Prometheus Books.

Guldi, Jo and David Armitage (2014). The History Manifesto. Cambridge: Cambridge University Press.

Hitchcock, Tim (2013). Confronting the digital: Or how academic history writing lost the plot. Cultural and Social History, 10(1):9–23.

Jenset, Gard B. and Barbara McGillivray (2017). Quantitative Historical Linguistics: A Corpus Framework. Oxford: Oxford University Press.

Klingenstein, Sara, Tim Hitchcock, and Simon DeDeo (2014). The civilizing process in London's Old Bailey. Proceedings of the National Academy of Sciences, 111(26):9419–9424. doi:10.1073/pnas.1405984111.

Knuth, Donald E. (2007). Computer programming as an art. In ACM Turing Award Lectures. New York: ACM.

Köhler, Reinhard (2012). Quantitative Syntax Analysis. Berlin: de Gruyter Mouton.

Mayrhofer, Manfred (1980). Zur Gestaltung des etymologischen Wörterbuchs einer "Großcorpus-Sprache". Wien: Österreichische Akademie der Wissenschaften, phil.-hist. Klasse.

McCarty, Willard (2004). Modeling: A study in words and meanings. In Susan Schreibman, Ray Siemens, and John Unsworth, eds., A Companion to Digital Humanities, pages 254–270. Malden, MA: Blackwell.

McGillivray, Barbara (2014). Methods in Latin Computational Linguistics. Leiden: Brill.

McGillivray, Barbara, Giovanni Colavizza, and Tobias Blanke (2018). Towards a quantitative research framework for historical disciplines. In COMHUM 2018: Book of Abstracts for the Workshop on Computational Methods in the Humanities 2018. Lausanne, Switzerland.

Moretti, Franco (2013). Distant Reading. London: Verso.

O'Neil, Cathy and Rachel Schutt (2013). Doing Data Science: Straight Talk from the Frontline. Sebastopol, CA: O'Reilly.

Piotrowski, Michael (2012). Natural Language Processing for Historical Texts. San Rafael, CA: Morgan & Claypool.

Tench, Stephen, Hannah Fry, and Paul Gill (2016). Spatio-temporal patterns of IED usage by the Provisional Irish Republican Army. European Journal of Applied Mathematics, 27(3):377–402.

Tosh, John (2015). The Pursuit of History: Aims, Methods and New Directions in the Study of History. London: Routledge, sixth ed.

Zuidema, Willem and Bart de Boer (2014). Modeling in the language sciences. In Robert J. Podesva and Devyani Sharma, eds., Research Methods in Linguistics, pages 428–445. Cambridge: Cambridge University Press.