<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards a quantitative research framework for historical disciplines</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Barbara McGillivray</string-name>
          <email>bmcgillivray@turing.ac.uk</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jon Wilson</string-name>
          <email>jon.wilson@kcl.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tobias Blanke</string-name>
          <email>tobias.blanke@kcl.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Digital Humanities, King's College London</institution>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of History, King's College London</institution>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>The Alan Turing Institute, University of Cambridge</institution>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <fpage>53</fpage>
      <lpage>58</lpage>
      <abstract>
        <p>The ever-expanding wealth of digital material that researchers have at their disposal today, coupled with growing computing power, makes the use of quantitative methods in historical disciplines increasingly viable. However, applying existing techniques and tools to historical datasets is not a trivial enterprise (Piotrowski, 2012; McGillivray, 2014). Moreover, scholarly communities react differently to the idea that quantitative explorations can yield new research questions and insights that purely qualitative approaches could not. Some of them, such as linguistics (Jenset and McGillivray, 2017), have been acquainted with quantitative methods for a longer time. Others, such as history, have seen a growth in quantitative methods on the fringes of the discipline, but have not incorporated them into the mainstream of scholarly practice (Hitchcock, 2013).</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Historical disciplines, i.e., those focusing on the
study of the past, possess at least two
characteristics that set them apart and require careful
consideration in this context: the need to work with
closed archives which can only be expanded by
working on past records
        <xref ref-type="bibr" rid="ref13">(Mayrhofer, 1980)</xref>
        , and
the focus on phenomena that change in a complex
fashion over time. First, that means historical
research is grounded in empirical sources which are
stable and fixed (one cannot change the archival
record). But they are often hard to access and,
recording the language and actions of only a small
fraction of historical reality at any moment, have
a complex relationship to the past being studied.
Secondly, the categories through which the past
is studied themselves change, which makes modelling,
and the automation of analysis based on a limited
number of features in the historical record, a fraught
enterprise.
      </p>
      <p>
        Donald E. Knuth is perhaps the best-known
founding figure of computer science. For him, “[s]cience is
knowledge which we understand so well that we
can teach it to a computer; and if we don’t fully
understand something, it is an art to deal with it.
. . . [T]he process of going from an art to a science
means that we learn how to automate something”
        <xref ref-type="bibr" rid="ref11">(Knuth, 2007)</xref>
        . Computing science is defined by the
tension between the drive to automate processes using digital means
and our inability to do so, because we fail to create
fully explicit ways of understanding processes. In
this sense, a computational approach to collecting
and processing (historical) evidence would be a
science if we could learn to automate it. Many
features of the past can be understood through
automation. Yet the problematic nature of the
relationship between sources and reality, and the mutability
of categories, mean that historical analysis will always rely on a
significant degree of human intuition and cannot be
fully automated; computational history is an art in
Knuth’s terms.
      </p>
      <p>
        The methodological reflections in this paper are
part of an effort to think about how to define the
possibilities and limits of quantification and
automation in historical analysis. Our aim is to
help scholars take full advantage of
quantification through a rigorous account of the boundaries
between science and art in Knuth’s terms. Building
on
        <xref ref-type="bibr" rid="ref16">McGillivray et al. (2018)</xref>
        , in this contribution we
will begin with the framework proposed by
        <xref ref-type="bibr" rid="ref8">Jenset
and McGillivray (2017)</xref>
        for quantitative historical
linguistics and illustrate it with two case studies.
      </p>
    </sec>
    <sec id="sec-2">
      <title>A quantitative framework for historical linguistics</title>
      <p>
        <xref ref-type="bibr" rid="ref8">Jenset and McGillivray (2017)</xref>
        ’s framework is the
only general framework available for quantitative
historical linguistics. A comparable framework,
but more limited in scope, can be found in
        <xref ref-type="bibr" rid="ref12">Köhler
(2012)</xref>
        .
        <xref ref-type="bibr" rid="ref8">Jenset and McGillivray (2017)</xref>
        ’s
framework starts from the assumption that linguistic
historical reality is lost and that the aim of quantitative
research is to arrive at models of, and claims about,
that reality which are quantitatively driven by
evidence and lead to consensus among the
scholarly community. The scope of application of this
framework is limited to cases where
quantifiable evidence (such as n-grams or numerical data)
can be gathered from primary sources, typically in
the form of corpora, i.e., collections of electronic
text created with the purpose of linguistic analysis.
      </p>
      <p>
        <xref ref-type="bibr" rid="ref8">Jenset and McGillivray (2017)</xref>
        define evidence
in quantitative historical linguistics as the set of
“facts or properties that can be observed,
independently accessed, or verified by other researchers”
        <xref ref-type="bibr" rid="ref8">(Jenset and McGillivray, 2017, 39)</xref>
        , and thus
exclude intuition as evidence. Such
facts can be pre-theoretical (such as the fact that the
English word “the” is among the most frequent ones) or
based on some hypotheses or assumptions (such as the
fact that the class of articles in English is among the
most frequent ones, which is based on the
assumption that the class of articles groups certain words
together). Quantitative evidence is “based on
numerical or probabilistic observation or inference”
        <xref ref-type="bibr" rid="ref8">(Jenset and McGillivray, 2017, 39)</xref>
        , and the
quantification should be independently verifiable. On the
other hand, distributional evidence has the form
“x occurs in context y”, where context can consist
of words, classes, phonemes, etc. Annotated
corpora, where linguistic (morphological, syntactic,
semantic, etc.) information has been encoded in
context, are considered as sources of distributional
evidence to study phenomena in historical
linguistics.
      </p>
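      <p>The form “x occurs in context y” can be made concrete with a minimal Python sketch. It is an illustration under simplifying assumptions and is not part of the cited framework: the function name context_counts, the window size and the toy sentence are all hypothetical, and “context” is taken to mean the words within a fixed window around the target word.</p>
<preformat>
```python
from collections import Counter


def context_counts(tokens, target, window=2):
    """Count the words that occur within `window` tokens of `target`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != target:
            continue
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:  # skip the target occurrence itself
                counts[tokens[j]] += 1
    return counts


# Toy sentence, invented for illustration only.
tokens = "the king gave the crown to the queen".split()
evidence = context_counts(tokens, "the", window=2)
```
</preformat>
      <p>Here evidence maps each context word to the number of times it occurs within two tokens of “the”. Richer contexts (classes, phonemes, syntactic relations) would require an annotated corpus rather than raw tokens.</p>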
      <p>
        Following
        <xref ref-type="bibr" rid="ref5">Carrier (2012)</xref>
        , Jenset and
McGillivray (2017, 40) define claims as anything that is not
evidence, and statements are based on evidence or
on other claims. The role of claims in the
framework concerns their connection with truth, which
can be stated in categorical terms (as in “the claim
that x belongs to class y is true”) or probabilistic
terms (e.g., “x belongs to class y with
probability p”). Claims possess a strength proportional to
that of the evidence supporting them. For example,
all other things being equal, claims supported by
a large body of evidence are stronger than claims supported
by little evidence.
      </p>
      <p>
        Ultimately, research in historical linguistics aims
at making (hopefully strong) claims that follow logically
from assumptions shared by the community,
other claims, or evidence. A hypothesis
originates from previous research, intuition, or logical
arguments, and is “a claim that can be tested
empirically, through statistical hypothesis testing on
corpus data”
        <xref ref-type="bibr" rid="ref8">(Jenset and McGillivray, 2017, 42)</xref>
        .
In this context, “model” means a formalized
representation of a phenomenon, be it statistical or
symbolic
        <xref ref-type="bibr" rid="ref23 ref6 ref9">(Zuidema and de Boer, 2014)</xref>
        . Models
(including those deriving from hypotheses tested
quantitatively against evidence) are research tools
embedding claims or hypotheses, useful in order to
produce novel claims and hypotheses in turn via “a
continual process of coming to know by
manipulating representations”
        <xref ref-type="bibr" rid="ref14">(McCarty, 2004)</xref>
        .
      </p>
      <p>
        Based on these definitions,
        <xref ref-type="bibr" rid="ref8">Jenset and
McGillivray (2017)</xref>
        formalize the research process they
envisage as part of their framework, see Figure 1.
The process starts from the historical linguistic
reality, which we assume to be lost forever. Any
research model can only aim at approaching this
reality without reaching it completely, and quantitative
historical linguistics ultimately will produce
models of language that are quantitatively driven by
evidence. The rest of the diagram shows how this
is achieved. The historical linguistic reality gave
rise to a series of primary sources, including
documents and other (mainly textual) sources, and these
to secondary sources like grammars and
dictionaries. Based on the knowledge of the language
we gather from these sources we can draft
annotation schemes which specify the rules for adding
linguistic information to the corpora and thus
obtain annotated corpora. Corpora are the source of
quantitative distributional evidence which can be
used to test statistical hypotheses, formulated based
on our intuition of the language and on knowledge
drawn from examples. Such hypotheses can also
feed into the creation of linguistic models, which
aim to represent the historical linguistic reality.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Model-building in history</title>
      <p>
        In contrast with quantitative historical linguistics,
the discipline of history possesses an extraordinary
variety of idioms to describe itself, and has a much
less rigorous analytical vocabulary to describe its
method. Yet there are important similarities, which
mean that
        <xref ref-type="bibr" rid="ref8">Jenset and McGillivray (2017)</xref>
        ’s framework
can be translated and modified for use in
historical research more generally. First, historians
assume that historical reality is lost, and can only
be understood through traces left in a variety of
archives (including human memory). Second,
although historians rarely explicitly talk about
constructing models, their practice largely consists of
making claims about representations of the past
which other disciplines would describe in precisely
such terms. Through the process they describe as the
‘interpretation’ or ‘analysis’
        <xref ref-type="bibr" rid="ref22">(Tosh, 2015)</xref>
        of the
sources, historians create representations which
reduce the vast complexity of historical reality to a
few limited, stylised characteristics: Max Weber’s
Protestant Ethic, Lewis Namier’s system of
factional interest or C.A. Bayly’s great uniformity, for instance.
Third, these representations are used to make
hypotheses and claims about change over time of
different kinds. These might be about the
endurance or rupture of certain key features in a
particular sphere of activity, or about the forces
responsible for causing a particular event or set of
processes, for example.
      </p>
      <p>
        We have suggested that history is (if implicitly)
essentially a model-building enterprise. This
means that many of the hypotheses which historians
develop are, in theory, amenable to quantification.
The use of quantitative methods (in particular
using the analysis of textual corpora) has increased
recently
        <xref ref-type="bibr" rid="ref6">(Guldi and Armitage, 2014)</xref>
        . But most
historians are reluctant to quantify because they are
skeptical about formalising their models,
believing that to do so would imply that the models possess a
degree of categorical rigidity unwarranted by the
complexity of the past. We suggest that more
explicit reflection on method, and engagement with
other fields (such as historical linguistics) which
deal with fuzzy categories would help overcome
these obstacles.
      </p>
      <p>Moreover, the use of digital data-sets and the
application of quantitative techniques to them allow
historical claims based on the prevalence of
certain features of the past to be empirically tested.
Such claims are already central to many forms of
historical argumentation: claims about the importance
of particular concepts or practices at specific
moments, for example. Of course such claims need
to be precisely related to the structure of the
(digitised) archive; as ever, limitations must be
recognised. But given the amount of material which can
be quickly processed, quantification allows claims
previously asserted through little more than the
accumulation of anecdotes to be more rigorously validated.</p>
    </sec>
    <sec id="sec-4">
      <title>Languages of power</title>
      <p>
        The first case study where we apply
        <xref ref-type="bibr" rid="ref8">Jenset and
McGillivray (2017)</xref>
        ’s framework considers a
recent collaboration between Digital Humanities and
History at King’s College London
        <xref ref-type="bibr" rid="ref3">(Blanke and
Wilson, 2017)</xref>
        , to develop a “materialist sociology
of political texts” following Moretti’s ideas of
distant reading
        <xref ref-type="bibr" rid="ref17">(Moretti, 2013)</xref>
        . The project worked
on a corpus of post-1945 UK government White
Papers to map connections and similarities in
political language from 1945 to 2010. As the corpus
is time-indexed, a quantitative analysis traced the
changing shape of political language, by tracking
clusters of terms relating to particular concepts and
charting the changing meaning of words. Creating
the distributional quantitative evidence involved
text pre-processing to create a term-document
matrix. Using natural language processing libraries,
this was annotated with grammatical information,
as well as with a number of dictionaries that
reflected facets such as sentiment, ambiguity and so on.
These allowed the project to use models for
historical texts which not only read the texts themselves
but also developed ways of classifying them into
time intervals. More advanced techniques were
applied to trace changes of meaning in key political
concepts across time intervals, using topic models
and word embeddings, allowing historiographical
and linguistic hypotheses to be tested.
      </p>
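      <p>To make the notion of a term-document matrix over a time-indexed corpus concrete, the following Python sketch may help. It is illustrative only and does not reproduce the project’s pipeline: the function name term_document_matrix is hypothetical, tokenisation is reduced to lower-casing and splitting on whitespace, and the three dated snippets are invented stand-ins for White Papers.</p>
<preformat>
```python
from collections import Counter


def term_document_matrix(documents):
    """Return (vocabulary, matrix); matrix[i][j] counts term i in document j."""
    counts = [Counter(text.lower().split()) for text in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[count[term] for count in counts] for term in vocabulary]
    return vocabulary, matrix


# Invented snippets; the years act as the time index of the corpus.
years = [1946, 1966, 1992]
texts = [
    "housing and national insurance",
    "national economic plan",
    "market choice and economic reform",
]
vocabulary, matrix = term_document_matrix(texts)

# Frequency of one term across the time index.
economic_by_year = dict(zip(years, matrix[vocabulary.index("economic")]))
```
</preformat>
      <p>Each row of matrix is the frequency profile of one term across the dated documents, which is the kind of quantitative distributional evidence on which tracking terms over time relies. Grammatical annotation and dictionary lookups would add further columns or layers to this basic structure.</p>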
      <p>
        In
        <xref ref-type="bibr" rid="ref8">Jenset and McGillivray (2017)</xref>
        ’s terms, these
various techniques produced a variety of
quantitative distributional evidence, which allowed
a series of hypotheses to be developed and tested.
Intuition, often developed from historical research
using non-quantitative techniques, had an
important role in framing hypotheses. But quantitative
evidence was able to impart greater clarity and
specificity to intuitive hypotheses, often closing
down multiple possibilities. For example, using
our dictionaries demonstrated a major break in the
language of White Papers in the mid-1960s, around
the election of Harold Wilson’s Labour government.
While this made intuitive sense, so would a
break in the early 1980s, which we did not find;
instead we saw a rupture in the early 1990s.
      </p>
      <p>
        Combining our chronological analysis with topic
modelling and word embeddings allowed us to
build a series of models of the predominant
concerns and the structure of political language in
each epoch. In line with
        <xref ref-type="bibr" rid="ref8">Jenset and
McGillivray (2017)</xref>
        ’s framework, these models were built
by iteratively generating and testing hypotheses.
For example, we tested the frequency of different
term clusters generated through topic modelling,
and the terms whose embedding changed most
dramatically between each epoch.
      </p>
      <p>Our process of hypothesis generation and
testing always had in mind the commonplace
assumptions made by historians using non-quantitative
techniques in the field. In many respects,
quantitative distributional evidence produced hypotheses
at variance with those scholarly norms. For
example, we found White Papers in the period from
1945 to 1964 to be dominated by post-war foreign
policy concerns, not the construction of the welfare
state; economic language was dominant in
the period from 1965 to 1990, not afterwards; and ‘the
state’ as a political agent is more important in the
later period than before.</p>
      <p>Yet, as challenging as they may be to much of
the historiography of post-war Britain, the form
of these hypotheses is very similar to the form of
the claims made in standard historical
argumentation; there is no dramatic epistemological leap in
the type of knowledge being produced. Although
our models were developed using automated
techniques, they can be verified qualitatively in the
same way as non-quantifiable claims, through
quotation, and the interpretation of words and phrases
in specific contexts.</p>
      <p>One important finding is the need to recognise
the broad range of different ways in which
quantitative analysis can be expressed. It is important,
for example, to indicate the absolute frequency of
terms in any series as well as their relation to other
terms. There is significant work to be done
developing ways to visually represent the quantitative
features of any corpus of texts.</p>
    </sec>
    <sec id="sec-5">
      <title>Predicting the Past</title>
      <p>
        Digital humanities generally uses computational
modelling for exploratory data analysis. It
makes use of advances in the
ability to visualise and interactively explore data in
a relatively free fashion. Recently, we have
witnessed the emergence of new combinations of
exploratory data analysis with statistical evidence for
discovered patterns. In the digital humanities, this
is popular, too;
        <xref ref-type="bibr" rid="ref9">Klingenstein et al. (2014)</xref>
        , for
instance, integrate a historical regression analysis
into their data visualisations. Our first example
above is an instance of exploratory data analysis,
using topic modelling and other tools to provide
statistical evidence for underlying trends in the
documents, as earlier demonstrated. Models, however,
often have another purpose beyond the exploration
of data. They are part of predictive analytics.
        <xref ref-type="bibr" rid="ref1">Abbott (2014)</xref>
        is one of the best-known practitioners
in the field. For him, predictive analytics works on
“discovering interesting and meaningful patterns
in data. It draws from several related disciplines,
some of which have been used to discover patterns
in data for more than 100 years, including pattern
recognition, statistics, machine learning, artificial
intelligence, and data mining.”
        <xref ref-type="bibr" rid="ref1">(Abbott, 2014)</xref>
        .
      </p>
      <p>It is a common misunderstanding to reduce
predictive analytics to attempts to predict the future.
It is rather about finding meaningful
relationships in any data. Compared to traditional
analytics, predictive analytics is driven by the data
under observation rather than primarily by human
assumptions about the data. The discipline strives to
automate modelling and pattern-finding as far
as this is possible. In this sense, it moves away
from both exploratory and confirmatory data
analysis, as it fully considers how computers would
process evidence.</p>
      <p>
        O’Neil and Schutt (2013) introduce the idea of
predicting the past, which is used to model the
effects of electronic health records (EHR) and to
set up new monitoring programs for drugs. For
O’Neil and Schutt (2013), these integrated datasets
were the foundations of novel research attempts to
predict the past. They cite the ‘Observational
Medical Outcomes Partnership (OMOP)’ in the US that
investigates how good we are at predicting what
we already know about drug performance in health
using past datasets. Once OMOP had integrated
data from heterogeneous sources, it began to look
into predicting the past of old drug cases and how
effective their treatments were. “Employing a
variety of approaches from the fields of epidemiology,
statistics, computer science, and elsewhere, OMOP
seeks to answer a critical challenge: what can
medical researchers learn from assessing these new
health databases, could a single approach be
applied to multiple diseases, and could their findings
be proven?”
        <xref ref-type="bibr" rid="ref18">(O’Neil and Schutt, 2013)</xref>
        . Predicting
the past thus tries to understand how “well the
current methods do on predicting things we actually
already know”
        <xref ref-type="bibr" rid="ref18">(O’Neil and Schutt, 2013)</xref>
        .
      </p>
      <p>
        Such a novel approach to past data sets
should be of interest to digital history. Digital
history could use the approach to control decisions
on how we organise and divide historical records.
An existing example that involves predicting past
events by joining historical data sets is the
identification of historical spatio-temporal patterns of IED
usage by the Provisional Irish Republican Army
during ‘The Troubles’, used to attribute ‘historical
behaviour of terrorism’
        <xref ref-type="bibr" rid="ref20">(Tench et al., 2016)</xref>
        .
      </p>
      <p>
        In
        <xref ref-type="bibr" rid="ref2">Blanke (2018)</xref>
        , we demonstrate how
predicting the past can complement and enhance existing
work in the digital humanities that is mainly
concentrated on exploring gender issues as they appear
in past datasets.
        <xref ref-type="bibr" rid="ref4">Blevins and Mullen (2015)</xref>
        provide
an expert introduction to why digital humanities
should be interested in predicting genders. Gender
values are often missing from datasets and need to
be imputed. Predictive analytics can be seen as a
corrective to existing data practices and we can
predict the genders in a dataset. In
        <xref ref-type="bibr" rid="ref2">Blanke (2018)</xref>
        , we
compare a traditional dictionary-based approach
with two machine learning strategies. First, a
classification algorithm is discussed, and then three
different rule-based learners are introduced. We can
demonstrate how these rule-based learners are an
effective alternative to the traditional
dictionary-based method and partly outperform it.
      </p>
      <p>
        <xref ref-type="bibr" rid="ref2">Blanke (2018)</xref>
        develops the predicting the past
methodology further and presents differences from
other predictive analytics approaches. We follow
all the steps of traditional predictive analytics to
prepare a stable and reliable model, where we pay
particular attention to avoiding overfitting the data,
one of the main risks in predictive models. An
‘overfitting’ model is one that models existing
training data too closely, which negatively impacts its
ability to generalize to new cases. We perform
extensive cross-validations to avoid overfitting.
      </p>
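      <p>The role of cross-validation in guarding against overfitting can be sketched in a few lines of Python. This is a generic k-fold procedure written for illustration, not the procedure reported in Blanke (2018): the function names, the interleaved fold assignment, the majority-class baseline learner and the ten toy labels are all assumptions, and k is assumed not to exceed the number of items.</p>
<preformat>
```python
from collections import Counter


def k_fold_indices(n_items, k):
    """Assign item indices to k interleaved folds."""
    return [list(range(i, n_items, k)) for i in range(k)]


def cross_validate(features, labels, train, predict, k=5):
    """Return the mean accuracy on held-out folds over k train/test splits."""
    accuracies = []
    for fold in k_fold_indices(len(labels), k):
        held_out = set(fold)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        model = train([features[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        correct = sum(predict(model, features[i]) == labels[i] for i in fold)
        accuracies.append(correct / len(fold))
    return sum(accuracies) / len(accuracies)


# Hypothetical baseline learner: always predict the commonest training label.
def train_majority(xs, ys):
    return Counter(ys).most_common(1)[0][0]


def predict_majority(model, x):
    return model


# Toy data, invented for illustration only.
features = list(range(10))
labels = ["f", "m", "m", "m", "m", "f", "m", "m", "m", "m"]
score = cross_validate(features, labels, train_majority, predict_majority, k=5)
```
</preformat>
      <p>Because every item is scored only by a model that did not see it during training, a model that merely memorises its training data gains nothing. In practice the baseline would be replaced by the learner under study, and the held-out accuracy compared across candidate models.</p>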
      <p>
        Predicting the past, however, differs significantly
from other approaches, as the model is not prepared
for future addition of data but to analyse existing
data. The aim is to understand which (minimal)
set of features makes it likely that observation x
includes feature y. In
        <xref ref-type="bibr" rid="ref2">Blanke (2018)</xref>
        , we aimed to
understand which combination of features makes it
likely that a historical person is female,
male or of unknown gender. The next step in our
methodology is therefore to apply the best performing
models to the whole data set again to analyse what
gender determinations exist in the data. Is it, e.g.,
more likely that vagrants were female in London?
      </p>
      <p>
        The common approach to gender prediction
in the digital humanities uses predefined
dictionaries of first names and matches
individuals against them to assign a gender. The first
problem is that these dictionaries are heavily
dependent on the culture and language they relate to. But
this is not the only issue: dictionary-based
approaches also assume that errors are
randomly distributed. On this view, gender trouble is simply a
problem of not recording the right gender in the data.
Our predictive analytics approach in
        <xref ref-type="bibr" rid="ref2">Blanke (2018)</xref>
        , on the other hand, does not make this assumption
in advance and judges gender based on the existing
data. This has led in turn to interesting insights
on why certain genders remain unknown to the
models.
      </p>
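      <p>For comparison, the dictionary-based baseline can be reduced to a lookup, as in the following Python sketch. It is a deliberately simplified illustration and not the implementation compared in Blanke (2018): the three-entry name dictionary, the function name dictionary_gender and the sample records are hypothetical.</p>
<preformat>
```python
# Hypothetical first-name dictionary; real ones are tied to a culture and period.
NAME_TO_GENDER = {"mary": "female", "john": "male", "william": "male"}


def dictionary_gender(full_name, lookup=NAME_TO_GENDER):
    """Look up the first name; anything not listed is returned as 'unknown'."""
    parts = full_name.strip().lower().split()
    if not parts:
        return "unknown"
    return lookup.get(parts[0], "unknown")


# Invented records, including an ambiguous name and an empty field.
records = ["Mary Smith", "John Brown", "Leslie Carter", ""]
predicted = [dictionary_gender(name) for name in records]
```
</preformat>
      <p>The lookup has no way of using any other field of a record, so every name outside the dictionary collapses into “unknown” regardless of context. A learned model, by contrast, can draw on whatever regularities the existing data contain.</p>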
      <p>In summary, predicting the past is based firstly
on going through all traditional predictive analytics
steps to form a stable model that reflects the
underlying historical evidence closely enough but does
not overfit. Secondly, we use this stable model to
algorithmically analyse historical evidence to gain
insights into how a computer would see the relations
of evidence.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion and future work</title>
      <p>
        This comparison leads us to the conclusion that,
despite the broad applicability of
        <xref ref-type="bibr" rid="ref8">Jenset and
McGillivray (2017)</xref>
        ’s framework in both cases, some
important differences emerge between historical
linguistics and history. We discuss two. First of
all, the scope of primary sources and their
quantitative representation is broader in history, including
not only distributional but also categorical, ordinal,
and numerical evidence. History requires careful
discernment of which is most appropriate, and how
they should be combined.
      </p>
      <p>Secondly, the scope for a purely quantitative
approach is less broad: quantitative evidence and
models can often only help to inform
hypotheses and claims which rely on qualitative
evidence and methods. Often it seems that quantitative
methods are only accepted by historical scholars
if the claims developed by automated techniques
can also be verified qualitatively, through anecdote,
quotation and so on. In many fields quantification
can be accepted because it creates results which
look similar to those produced by qualitative
research. But this approach limits the development
of methods that use quantification to do more than
simply re-frame qualitative observations, and
instead make statistical arguments about aggregate
behaviour in their own right. In the future, we plan
to develop these insights further, in order to build
a more comprehensive research framework which
integrates qualitative and quantitative approaches.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work was supported by The Alan Turing
Institute under the EPSRC grant EP/N510129/1. BM
is supported by the Turing award TU/A/000010
(RG88751).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Abbott</surname>
            ,
            <given-names>Dean</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Applied predictive analytics: Principles and techniques for the professional data analyst</article-title>
          . Hoboken, NJ: Wiley.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Blanke</surname>
            ,
            <given-names>Tobias</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Predicting the past</article-title>
          .
          <source>Digital Humanities Quarterly</source>
          ,
          <volume>12</volume>
          (
          <issue>2</issue>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
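          <string-name>
            <surname>Blanke</surname>
            ,
            <given-names>Tobias</given-names>
          </string-name>
          and
          <string-name>
            <given-names>Jon</given-names>
            <surname>Wilson</surname>
          </string-name>
          (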
          <year>2017</year>
          ).
          <article-title>Identifying epochs in text archives</article-title>
          .
          In
          <source>2017 IEEE International Conference on Big Data (Big Data)</source>
          , pages
          <fpage>2219</fpage>
          -
          <lpage>2224</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Blevins</surname>
            ,
            <given-names>Cameron</given-names>
          </string-name>
          and
          <string-name>
            <given-names>Lincoln</given-names>
            <surname>Mullen</surname>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Jane, John ... Leslie? A historical method for algorithmic gender prediction</article-title>
          .
          <source>Digital Humanities Quarterly</source>
          ,
          <volume>9</volume>
          (
          <issue>3</issue>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Carrier</surname>
            ,
            <given-names>Richard</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>Proving history: Bayes's theorem and the quest for the historical Jesus</article-title>
          . Amherst, NY: Prometheus Books.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Guldi</surname>
            ,
            <given-names>Jo</given-names>
          </string-name>
          and
          <string-name>
            <given-names>David</given-names>
            <surname>Armitage</surname>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>The History Manifesto</article-title>
          . Cambridge: Cambridge University Press.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Hitchcock</surname>
            ,
            <given-names>Tim</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Confronting the digital: Or how academic history writing lost the plot</article-title>
          .
          <source>Cultural and Social History</source>
          ,
          <volume>10</volume>
          (
          <issue>1</issue>
          ):
          <fpage>9</fpage>
          -
          <lpage>23</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Jenset</surname>
            ,
            <given-names>Gard B.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>Barbara</given-names>
            <surname>McGillivray</surname>
          </string-name>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Klingenstein</surname>
            ,
            <given-names>Sara</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Tim</given-names>
            <surname>Hitchcock</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Simon</given-names>
            <surname>DeDeo</surname>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>The civilizing process in London's Old Bailey</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <source>Proceedings of the National Academy of Sciences</source>
          ,
          <volume>111</volume>
          (
          <issue>26</issue>
          ):
          <fpage>9419</fpage>
          -
          <lpage>9424</lpage>
          . doi: 10.1073/pnas.1405984111.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Knuth</surname>
            ,
            <given-names>Donald E.</given-names>
          </string-name>
          (
          <year>2007</year>
          ).
          <article-title>Computer programming as an art</article-title>
          .
          In
          <source>ACM Turing award lectures</source>
          , page
          <fpage>1974</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Köhler</surname>
            ,
            <given-names>Reinhard</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>Quantitative Syntax Analysis</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Mayrhofer</surname>
            ,
            <given-names>Manfred</given-names>
          </string-name>
          (
          <year>1980</year>
          ).
          <article-title>Zur Gestaltung des etymologischen Wörterbuchs einer “Großcorpus-Sprache”</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>McCarty</surname>
            ,
            <given-names>Willard</given-names>
          </string-name>
          (
          <year>2004</year>
          ).
          <article-title>Modeling: A study in words and meanings</article-title>
          . In Susan Schreibman, Ray Siemens, and John Unsworth, eds.,
          <source>A Companion to Digital Humanities</source>
          , pages
          <fpage>254</fpage>
          -
          <lpage>270</lpage>
          . Malden, MA: Blackwell.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>McGillivray</surname>
            ,
            <given-names>Barbara</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Methods in Latin Computational Linguistics</article-title>
          . Leiden: Brill.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <surname>McGillivray</surname>
            ,
            <given-names>Barbara</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Giovanni</given-names>
            <surname>Colavizza</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Tobias</given-names>
            <surname>Blanke</surname>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Towards a quantitative research framework for historical disciplines</article-title>
          .
          In
          <source>COMHUM 2018: Book of Abstracts for the Workshop on Computational Methods in the Humanities 2018</source>
          . Lausanne, Switzerland.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>Moretti</surname>
            ,
            <given-names>Franco</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Distant Reading</article-title>
          . London: Verso.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <surname>O'Neil</surname>
            ,
            <given-names>Cathy</given-names>
          </string-name>
          and
          <string-name>
            <given-names>Rachel</given-names>
            <surname>Schutt</surname>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Doing data science: Straight talk from the frontline</article-title>
          . Sebastopol, CA: O'Reilly.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <surname>Piotrowski</surname>
            ,
            <given-names>Michael</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>Natural language processing for historical texts</article-title>
          . San Rafael, CA: Morgan &amp; Claypool.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
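          <string-name>
            <surname>Tench</surname>
            ,
            <given-names>Stephen</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Hannah</given-names>
            <surname>Fry</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Paul</given-names>
            <surname>Gill</surname>
          </string-name>
          (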
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <article-title>Spatio-temporal patterns of IED usage by the Provisional Irish Republican Army</article-title>
          .
          <source>European Journal of Applied Mathematics</source>
          ,
          <volume>27</volume>
          (
          <issue>3</issue>
          ):
          <fpage>377</fpage>
          -
          <lpage>402</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <surname>Tosh</surname>
            ,
            <given-names>John</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>The Pursuit of History. Aims, Methods and New Directions in the Study of History</article-title>
          . London: Routledge, sixth ed.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
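          <string-name>
            <surname>Zuidema</surname>
            ,
            <given-names>Willem</given-names>
          </string-name>
          and
          <string-name>
            <given-names>Bart</given-names>
            <surname>de Boer</surname>
          </string-name>
          (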
          <year>2014</year>
          ).
          <article-title>Modeling in the language sciences</article-title>
          .
          In Robert J. Podesva and Devyani Sharma, eds.,
          <source>Research Methods in Linguistics</source>
          , pages
          <fpage>428</fpage>
          -
          <lpage>445</lpage>
          . Cambridge: Cambridge University Press.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>