<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Computational Humanities Research Conference, December</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Detecting Sequential Genre Change in Eighteenth-Century Texts</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jinbin Zhang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yann Ciarán Ryan</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Iiro Rastas</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Filip Ginter</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mikko Tolonen</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rohit Babbar</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Aalto University</institution>
          ,
          <country country="FI">Finland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>TurkuNLP, University of Turku</institution>
          ,
          <country country="FI">Finland</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Helsinki</institution>
          ,
          <country country="FI">Finland</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>1</volume>
      <fpage>2</fpage>
      <lpage>14</lpage>
      <abstract>
        <p>Machine classification of historical books into genres is a common task for NLP-based classifiers and has a number of applications, from literary analysis to information retrieval. However, it is not a straightforward task, as genre labels can be ambiguous and subject to temporal change, and moreover many books consist of mixed or miscellaneous genres. In this paper we describe a work-in-progress method by which genre predictions can be used to determine longer sequences of genre change within books, which we test with visualisations of some hand-picked texts. We apply state-of-the-art methods to the task, including a BERT-based transformer and a character-level Perceiver model, both pre-trained on a large collection of eighteenth-century works (ECCO), using a new set of hand-annotated documents created to reflect historical divisions. Results show that both models perform significantly better than a linear baseline, particularly when ECCO-BERT is combined with tf-idf features, though for this task the character-level model provides no obvious advantage. Initial evaluation of the genre sequence method shows it may in the future be useful in determining and dividing the multiple genres of miscellaneous and hybrid historical texts.</p>
      </abstract>
      <kwd-group>
        <kwd>BERT</kwd>
        <kwd>text classification</kwd>
        <kwd>genre change</kwd>
        <kwd>ECCO</kwd>
        <kwd>Perceiver</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Thinking about large-scale development of early modern public discourse through the use of
structured data is an exciting opportunity, as was established by Moretti some time ago [19].
Besides the use of already available bibliographic data for “distant reading”, a useful further
step is to use unstructured textual databases as source material for the creation of new
structured data on fields that are currently poorly available [15]. One such classification field
is genre. Readily available genre information is often sporadic, but the opportunities to use it
– especially when we think that many documents are composed of several sequential genres –
can open a new window to the development of public discourse. With better structured data,
we will be able to study the systematization of particular genres in a new manner and take a
fresh look at authorship and the relevance of publisher networks.</p>
      <p>
        Much work in literary history and the history of the book has relied on the analysis of
generic categories (for examples see [
        <xref ref-type="bibr" rid="ref2 ref29 ref30 ref31">20, 33, 34, 35, 2, 19</xref>
        ]). Computational genre classification
is a complex problem. Two key reasons are that genre divisions change over time, and not
every book can be unambiguously assigned a single genre label. Existing methods for genre
detection often assume each text or pre-defined chunk such as a chapter or section can be
classified as a single genre or a distribution of genre probabilities [7,
        <xref ref-type="bibr" rid="ref34 ref6">38, 6</xref>
        ], which does not
reflect the reality of many eighteenth-century texts. One important exception to this is the
page-level classification of Underwood et al. [
        <xref ref-type="bibr" rid="ref6">36</xref>
        ], subsequently used to detect sequences of
genre using a hidden Markov model [37].
      </p>
      <p>This paper describes a number of improvements to existing methods. First, rather than
relying on existing modern or broad classification systems, we use a newly created training set
of documents, with a custom-designed, domain-specific taxonomy which attempts to balance
pragmatism with capturing meaningful and fine-grained eighteenth-century organisational
categories. Second, we use a BERT transformer model which has been specifically trained on
eighteenth-century texts, and which performs significantly better than base BERT. Third, we
propose a method by which we hope this fine-grained classification can be used to represent
books as sequences and combinations of genres.</p>
      <p>
        We report on and compare results from a number of classifiers: a document-level classifier
that uses only one BERT input segment for each document (ECCO-BERT-Seq), a classifier for
text chunks, which can also be aggregated on a document level (ECCO-BERT-Chunk), and a
character-level Perceiver model using the same input as ECCO-BERT-Seq.<sup>1</sup> The BERT model
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] has achieved great improvements on various modern language datasets in comparison
to previous deep learning methods. Recently, there have also been some models which are
pre-trained on historical corpora of different languages [
        <xref ref-type="bibr" rid="ref16">21, 16, 39</xref>
        ], and pre-trained language
models are also used in the historical domain, for tasks such as predicting the year [21], named entity
recognition [
        <xref ref-type="bibr" rid="ref1 ref13 ref16 ref25">16, 13, 1, 27</xref>
        ] and emotion analysis [
        <xref ref-type="bibr" rid="ref23 ref6">26, 25</xref>
        ]. We also face some challenges from
OCR recognition errors [
        <xref ref-type="bibr" rid="ref27">10, 29</xref>
        ] when using pre-trained models for historical data.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. The ECCO Dataset</title>
      <p>
        The data used both for model training and for predictions comes from Eighteenth Century
Collections Online (ECCO). ECCO is a set of 180,000 digitised documents published originally in
the eighteenth century, created by the software and education company Gale [5]. These
digitised images have been converted into readable text data using Optical Character Recognition
(OCR). Despite its size, a recent study comparing ECCO to the English Short Title Catalogue
(ESTC) has highlighted significant gaps and imbalances [32], and the ESTC itself is known to
be incomplete [
        <xref ref-type="bibr" rid="ref20">22</xref>
        ]. These attributes, and their impact on several downstream tasks,
have been covered in detail in previous papers [
        <xref ref-type="bibr" rid="ref14 ref8">30, 8, 14</xref>
        ] and are just briefly outlined here.
First, the distribution of documents in ECCO is uneven and skewed towards the end of the
century, and second, the OCR contains significant noise and errors. Additionally, not all texts
are in the English language, and many are reprints of works published in earlier centuries. The
former have been excluded but the latter are retained for our training and test data. Despite
these caveats, ECCO is the largest and most complete source we have for eighteenth-century
text data. Though it has its own institutional history and biases, it is complete enough that it
is not limited to the more ‘important’ or ‘literary’ genres, nor is it focused solely on canonical
works. Its data and digitised images are used extensively, forming the basis of many scholarly
enquiries and research questions [
        <xref ref-type="bibr" rid="ref1">31</xref>
        ].
      </p>
      <p><sup>1</sup>In this paper the words ‘book’ and ‘document’ have distinct meanings. ‘Book’ is used to denote an edition of a
physical book, for example ‘there are over 400,000 books listed in the English Short Title Catalogue’. ‘Document’,
by contrast, is reserved for a single text document as used for data for the classification method and other tasks.
Not all documents in the ECCO data map to a single book, and vice versa.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Data Annotation</title>
      <p>Key to the work leading up to this paper was to create a usable training set of documents
annotated with genre labels. We began with a sample set of book records and a set of preliminary
genre labels. These books were then labelled by two annotators with domain expertise. At
this stage, we revisited the labels and made some adjustments to those which had
particularly low inter-annotator agreement. Once the set of genre labels had been finalised, we
annotated a large set (5,672 individual works, which correspond to 37,574 known editions, of which
30,119 correspond to ECCO documents) with genre information. After this second round, we
again checked for inter-annotator agreement, coming to a consensus following a discussion
of each disagreement. The eventual 43 fine-grained categories were then collapsed into main
categories for some of the classification tasks. These book labels were then mapped to the
equivalent ECCO document IDs. The final set of labels is given in Appendix A.</p>
      <p>
        Existing categorical distinctions were either too broad (for example fiction and non-fiction)
or too fine-grained (for example the many historical literary divisions, particularly poetic) for
our needs. Our categories attempt to reflect the divisions as found in contemporary sources
such as catalogues [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Additionally, they are closely related to the divisions used by modern
domain experts writing on the history of the book, for example the highly regarded
edited collection <italic>Books and their Readers in Eighteenth-Century England</italic>, whose
chapters are organised along similar divisions to our own [23,
        <xref ref-type="bibr" rid="ref22">24</xref>
        ]. We note that other recent
attempts to categorise eighteenth-century book genres use a similar system of division [18].
The selection is intended to provide useful genre categorisation for scholarly inquiry into book
history and book production. The selection was also pragmatic, with the aim of ending up with
a manageable number of genres, for example so that each class had enough data for the training
and test sets. The choices were also made with particular questions in mind, which we hoped would
help us to analyse works of Scottish Enlightenment thought, for instance helping to distinguish
patterns within scientific or philosophical publishing.
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Method</title>
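      <p>In this section, we introduce the pre-trained ECCO-BERT model, the fine-tuned models and the
baselines.<sup>2</sup> We denote the training dataset as <inline-formula><tex-math>\{(x_i, y_i)\}_{i=1}^{N}</tex-math></inline-formula>, where <inline-formula><tex-math>x_i</tex-math></inline-formula> is the book, and <inline-formula><tex-math>y_i</tex-math></inline-formula> is the genre
of <inline-formula><tex-math>x_i</tex-math></inline-formula>. Our goal is to learn a function <inline-formula><tex-math>f(x)</tex-math></inline-formula> to predict the genre of book <inline-formula><tex-math>x</tex-math></inline-formula> or the genre of a
chunk in book <inline-formula><tex-math>x</tex-math></inline-formula>.</p>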
      <sec id="sec-4-1">
        <title>4.1. Multi-granular Classification with ECCO-BERT</title>
        <p>
          ECCO-BERT [
          <xref ref-type="bibr" rid="ref19">21</xref>
          ] is a pre-trained language model trained on the ECCO dataset, the
configuration of which is the same as the bert-base-cased model [11] except for the vocabulary size.
The model is pre-trained with a masked language modelling task, as well as a next sentence
prediction task. The fine-tuned ECCO-BERT consists of two parts: one is the transformer
encoder and the other is a linear layer on top of the mean-pooled output of the encoder, which
scores the different genres. The Transformer model architecture on which the model is based can
accept inputs only up to a relatively short maximum length; in the ECCO-BERT case the standard
maximum of 512 input tokens applies. Inputs longer than this maximum length need to be split
into chunks.
        </p>
        <p>Because we want the training and prediction of the model to take into account the full
information of the document, a document is split into chunks of 510 tokens each to train
the model and predict results, since the maximum input size of ECCO-BERT is 512 tokens (510
input tokens and 2 special tokens expected by the model). For training the model, we assume
that each chunk has the same genre as the document, and the model is trained with the
resulting (chunk, label) pairs. During the inference procedure, we first split the document into
chunks. The fine-tuned model then scores each chunk; the predicted genre probability of the
document is the average of the probabilities of all its chunks. The inference process is shown in
Figure 1. We call this model ECCO-BERT-Chunk. For comparison, we also train a model conditioned
only on the first 510 sub-words of the document as input, which is denoted as ECCO-BERT-Seq.</p>
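        <p>The chunk-and-average inference described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the released implementation: the names split_into_chunks, document_probabilities and toy_scorer are hypothetical, and toy_scorer is a keyword-based stand-in for the fine-tuned model.</p>
        <preformat>
```python
from typing import Callable, List, Sequence

CHUNK_SIZE = 510  # content tokens per chunk; 2 special tokens bring the input to 512


def split_into_chunks(tokens: Sequence[str], size: int = CHUNK_SIZE) -> List[List[str]]:
    """Cut a token sequence into consecutive chunks of at most `size` tokens."""
    return [list(tokens[i:i + size]) for i in range(0, len(tokens), size)]


def document_probabilities(
    tokens: Sequence[str],
    score_chunk: Callable[[List[str]], List[float]],
) -> List[float]:
    """Average the per-chunk genre probabilities over the whole document."""
    chunks = split_into_chunks(tokens)
    if not chunks:
        raise ValueError("document has no tokens")
    per_chunk = [score_chunk(chunk) for chunk in chunks]
    n_genres = len(per_chunk[0])
    return [sum(p[g] for p in per_chunk) / len(per_chunk) for g in range(n_genres)]


def toy_scorer(chunk: List[str]) -> List[float]:
    """Hypothetical stand-in for the fine-tuned model: two genres, scored by a keyword."""
    share = sum(1 for t in chunk if t == "sermon") / len(chunk)
    return [share, 1.0 - share]


# Toy document: one chunk made only of "sermon" tokens, one made only of "verse" tokens.
doc = ["sermon"] * 510 + ["verse"] * 510
doc_probs = document_probabilities(doc, toy_scorer)
```
        </preformat>
        <p>In this toy case the two chunks receive opposite scores, so the document-level distribution is the even mixture of the two, which illustrates how averaging can hide a sequential change of genre that the individual chunk scores still expose.</p>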
        <p>Although the ECCO-BERT-Chunk model considers all chunks to make the final judgment,
its prediction process is very slow since a book often contains many chunks. At the same time,
the much faster ECCO-BERT-Seq is only conditioned on the first 510 sub-words, so it might
lose important information from other parts of the book. To solve this problem, we trained
a linear model by concatenating the tf-idf features of the full text with the pooling output of the
fine-tuned ECCO-BERT-Seq. The input can be denoted as <inline-formula><tex-math>[\Phi_{\text{tfidf}}(x), \Phi_{\text{ECCO-BERT}}(x[:510])]</tex-math></inline-formula>,
where <inline-formula><tex-math>\Phi_{\text{ECCO-BERT}}</tex-math></inline-formula> represents the transformer encoder and <inline-formula><tex-math>\Phi_{\text{tfidf}}</tex-math></inline-formula> the tf-idf vectorizer. We call the model
ECCO-BERT-tfidf. All results are shown in Table 1.</p>
        <p><sup>2</sup>The model implementation is available at https://github.com/HPC-HD/ECCO-genre-classification. The original
ECCO-BERT model has been released and is available at https://huggingface.co/TurkuNLP/eccobert-base-cased-v1</p>
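        <p>The concatenated input can be illustrated with a small sketch. It assumes a smoothed idf weighting (the weighting scheme is not specified in the text) and uses a zero vector as a placeholder for the 768-dimensional pooled encoder output; the function names fit_idf, tfidf_vector and combined_input are hypothetical.</p>
        <preformat>
```python
import math
from collections import Counter
from typing import Dict, List


def fit_idf(corpus: List[List[str]]) -> Dict[str, float]:
    """Smoothed inverse document frequency for every term in a tokenised corpus."""
    n_docs = len(corpus)
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(doc: List[str], idf: Dict[str, float], vocab: List[str]) -> List[float]:
    """Term frequency times idf, laid out in the fixed order given by `vocab`."""
    counts = Counter(doc)
    return [counts[t] * idf[t] for t in vocab]


def combined_input(tfidf: List[float], pooled: List[float]) -> List[float]:
    """Concatenate full-text tf-idf features with the encoder's pooled output."""
    return tfidf + pooled


corpus = [["psalm", "hymn", "psalm"], ["statute", "court"]]  # toy tokenised documents
idf = fit_idf(corpus)
vocab = sorted(idf)
pooled = [0.0] * 768  # placeholder for the pooled output of the fine-tuned encoder
features = combined_input(tfidf_vector(corpus[0], idf, vocab), pooled)
```
        </preformat>
        <p>The resulting vector has one entry per vocabulary term followed by the 768 encoder dimensions, and a linear layer over it then produces the genre scores.</p>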
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Baseline Models</title>
        <p>We adopt two baseline models for comparison. The first is a linear model whose input is the tf-idf
features of the full document. The model only contains a linear layer, and its output
dimension is the number of main or sub-categories. The second is the bert-base-cased model released by [11],
which we fine-tuned directly on our training data.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>There are 30,119 documents annotated by experts. 6,024 documents were randomly selected
and split into development and test datasets, with 3,012 documents each. The labels contain 10
main categories and 43 sub-categories. The genre labels are presented in A.1.</p>
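      <p>The arithmetic of this split can be sketched as below. The sketch assumes that the documents not held out form the training set, which is implied rather than stated above; the function name split_held_out, the integer document IDs and the seed are hypothetical.</p>
      <preformat>
```python
import random
from typing import List, Tuple


def split_held_out(
    doc_ids: List[int], n_held_out: int, seed: int = 0
) -> Tuple[List[int], List[int], List[int]]:
    """Randomly hold out `n_held_out` documents and halve them into dev and test."""
    rng = random.Random(seed)
    shuffled = list(doc_ids)
    rng.shuffle(shuffled)
    held_out, train = shuffled[:n_held_out], shuffled[n_held_out:]
    half = n_held_out // 2
    return train, held_out[:half], held_out[half:]


# 30,119 annotated documents, 6,024 of them held out for development and testing.
train, dev, test = split_held_out(list(range(30119)), 6024)
```
      </preformat>
      <p>Under this assumption the training set contains 24,095 documents, with 3,012 each for development and testing.</p>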
      <sec id="sec-5-1">
        <title>5.1. Experimental Details</title>
        <p>The sequence length of all BERT models is set to 512. For fine-tuning the ECCO-BERT-Seq
model and the bert-base-cased model, we only adopt the first 510 sub-words of the document as
input. These models are trained for 100 epochs on 1 NVIDIA V100. ECCO-BERT-Chunk is
fine-tuned on 4 NVIDIA A100 GPUs; the main category model and the sub-category model
were trained for 21 and 20 epochs respectively, using early stopping.</p>
        <p>
          The loss function of the linear model is cross entropy. We perform training for 200 epochs
with SGD with momentum [
          <xref ref-type="bibr" rid="ref26">28</xref>
          ] and a batch size of 32. The number of tf-idf features is 500,000.
        </p>
        <p>The ECCO-BERT-tfidf models are trained for 220 epochs with SGD with momentum. The
feature extractors are the encoder of the fine-tuned ECCO-BERT-Seq and the vectorizer of the linear base
model. To encourage the model to make more use of the tf-idf features, during the first 200 epochs
we mask the features from ECCO-BERT-Seq. The number of tf-idf features is 500,000 and the
dimension of the features extracted from ECCO-BERT-Seq is 768.</p>
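        <p>The masking schedule can be pictured with the following sketch. It assumes that masking means replacing the encoder features with zeros, which the text does not spell out, and that epochs are counted from zero; masked_input is a hypothetical helper name.</p>
        <preformat>
```python
from typing import List

MASK_EPOCHS = 200   # epochs during which the encoder features are hidden
TOTAL_EPOCHS = 220  # total number of training epochs


def masked_input(tfidf: List[float], pooled: List[float], epoch: int) -> List[float]:
    """Return the concatenated features, zeroing the encoder part in the masked phase.

    `epoch` is counted from 0, so epochs 0..199 are masked and 200..219 are not.
    """
    if MASK_EPOCHS > epoch:
        pooled = [0.0] * len(pooled)
    return tfidf + pooled


early = masked_input([1.0, 2.0], [0.5, 0.5], epoch=0)    # encoder part zeroed
late = masked_input([1.0, 2.0], [0.5, 0.5], epoch=200)   # encoder part visible
unmasked_epochs = sum(1 for e in range(TOTAL_EPOCHS) if e >= MASK_EPOCHS)
```
        </preformat>
        <p>With these settings the linear layer sees only tf-idf features for most of training and both feature sets for the final 20 epochs.</p>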
        <p>In addition to the primary ECCO-BERT model, we also trained the Perceiver IO model [9]
on the same data as the BERT models. Perceiver is a Transformer model that decouples input
size from overall model size and allows the model to scale linearly with the size of the input
as well as model depth. Perceiver IO generalizes Perceiver further by allowing for arbitrary
outputs. Due to their linear scaling characteristics, the Perceiver models make it practical to use
character-level input data, which could result in a model that is more robust against
character-level OCR artefacts in the ECCO dataset. Testing this property is our main motivation for using
Perceiver IO on this task. We pre-trained Perceiver on the ECCO data for 1 million steps with
an effective batch size of 768. Training is done similarly to ECCO-BERT, except that the next
sentence prediction task is not used. Fine-tuning for the genre classification task is also similar
to the BERT models, except that unfiltered, byte-level data is used as model input.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Genre Model Performance</title>
        <p>We report the models’ accuracy for main categories and sub-categories in Table 1. The
confusion matrix of ECCO-BERT-tfidf is shown in Figure 2. There is a significant gap between
the fine-tuned bert-base-cased model and the models based on ECCO-BERT, since the
bert-base-cased model is pre-trained on a modern language corpus, was not exposed to OCR noise during
pre-training, and the language has naturally evolved between the eighteenth century and present-day
English. Although ECCO-BERT-Seq is only conditioned on the first 510 tokens of the
document, its results are competitive with those of ECCO-BERT-Chunk and ECCO-BERT-tfidf,
which consider the full document. As shown in Table 1, ECCO-BERT-tfidf performs best since
it combines the transformer features with the tf-idf features of the full document. ECCO-BERT-tfidf is also
much faster than ECCO-BERT-Chunk because extracting tf-idf features is much faster than inference with
transformer models.</p>
        <p>Of particular note is the performance of all ECCO-BERT models over base BERT and the
linear model when looking at the more fine-grained categories. Somewhat disappointingly,
the fine-tuned Perceiver IO models do not perform better than BERT-based models on this task
in our evaluation. This would indicate that the OCR noise does not interfere with the genre
detection task enough to degrade the performance of BERT-based models.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Document-level Evaluation and Prediction results</title>
        <p>Here we report on both the evaluation of the document-level results for the main categories
and the predictions for unlabeled data.
The confusion matrix in Figure 2 shows that the precision of the literature category is the highest while
that of education is the lowest. We also use the ECCO-BERT-tfidf model to predict unlabeled ECCO
data and obtain model-predicted genre distributions. There are 177,494 unlabeled documents
in total. The breakdown of predicted categories is shown in Figure 3. As our label taxonomy
is custom-made, there is no ground truth for the entirety of ECCO to fully evaluate the accuracy
of the predictions. However, the predictions roughly match up with our expectations: previous
analyses of the ESTC, using the existing Dewey Decimal System labels, have found that the
most common subject category is religion [4].</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Fine-grained analysis with ECCO-BERT-Seq</title>
      <sec id="sec-6-1">
        <title>6.1. Sequential Genre Change</title>
        <p>
          As well as using ECCO-BERT-Seq to generate document-level predictions from averaged
values, we can use the individual chunk predictions directly. Here we propose a method that
uses this paragraph-level detection to find chunks within documents where the change from
one genre to another is significant and sustained. Because the predicted genre generally
oscillates significantly from one individual chunk to the next, we needed a method to capture
only sustained changes, ignoring shorter breaks within a ‘run’ of the same genre. To do this,
we used the Kleinberg algorithm for detecting ‘bursts’ of activity in time-series data [12]. This
uses a hidden Markov process to probabilistically determine when a subsequent event will
occur. When events occur more rapidly and for sustained periods in comparison to this
determination, these are labelled bursts. The bursts were computed using the R bursts
package [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], which implements the Kleinberg algorithm.
        </p>
        <p>To adapt this method, the most probable prediction for each chunk within each document
was treated as a time-series data point for Kleinberg. We have calculated sections for main and
subcategories separately. The method allows for ‘fuzzy’ and overlapping sections of genres.
Additionally, we have experimented with only retaining highly probable classifications, which
helped to further filter out noise. There are drawbacks: because the burst method looks for
change rather than simply all clusters of events, currently not all sections are detected if most
of the text is of a single genre.</p>
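        <p>The preparation step described above, turning per-chunk predictions into per-genre event positions, can be sketched as follows. This is only an illustration of the input side: the function name genre_events and the toy probabilities are hypothetical, the 0.5 threshold is one possible cut-off, and the burst detection itself (done with the R bursts package in our case) is not reproduced here.</p>
        <preformat>
```python
from typing import Dict, List


def genre_events(
    chunk_probs: List[List[float]],
    genres: List[str],
    min_prob: float = 0.5,
) -> Dict[str, List[int]]:
    """Map each genre to the chunk positions where it is the top prediction.

    Chunks whose top probability does not exceed `min_prob` are dropped as noise.
    """
    events: Dict[str, List[int]] = {g: [] for g in genres}
    for position, probs in enumerate(chunk_probs):
        top = max(range(len(probs)), key=probs.__getitem__)
        if probs[top] > min_prob:
            events[genres[top]].append(position)
    return events


genres = ["literature", "religion"]  # toy label set
chunk_probs = [
    [0.9, 0.1], [0.55, 0.45], [0.2, 0.8],
    [0.1, 0.9], [0.5, 0.5], [0.7, 0.3],
]
events = genre_events(chunk_probs, genres)
```
        </preformat>
        <p>Each list of positions can then be passed to a burst detector as the event offsets for that genre; a dense run of positions for a non-dominant genre is what the method is intended to pick up.</p>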
        <p>To give some examples, we take some exemplary texts and calculate genre bursts. To
visualise the changes in genre, top genre predictions (over .5 probability) are charted as a
scatterplot in the paragraph sequence, coloured by genre. Burst start and end points are overlaid
as coloured areas. As the method looks for periods of change rather than absolute values, it
ignores the main category of the book (which is detected by the document-level method
successfully anyway) and in most cases highlights sustained excerpts where the detected genre is
different to the dominant one. Here, we see that David Hume’s <italic>Political Discourses</italic> (Figure 4,
A) contains discrete sections on economics (categorised as scientific improvement), philosophy
(a section on the balance of power), history (a section on ‘ancient nations’) and finally law (a
chapter on the idea of the commonwealth). <italic>Wealth of Nations</italic> (Figure 4, B) begins with a section
on labour and society categorised here as philosophy, followed by smaller sections on law (a discussion
of a specific statute) and in the education genre. Most of the book is not marked as its
dominant genre (economics and trade, under the higher-level category scientific improvement) as
it does not involve change. In Villier’s <italic>Miscellaneous Works</italic> (Figure 4, C) the method detects a large number
of overlapping genre changes. Finally, <italic>Robinson Crusoe</italic> (Figure 4, D) is also mostly without
detected bursts, but of note is a section of religious genre, corresponding to a section in the
plot where Crusoe is ill and has prophetic dreams.</p>
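        <p>[Figure 4: top genre predictions (probability from 0.5 to 1.0) in chunk sequence, with detected bursts overlaid, for (A) Political Discourses, (B) Wealth of Nations, (C) Miscellaneous Works and (D) Robinson Crusoe.]</p>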
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Discussion and Conclusion</title>
      <p>In this paper we aimed to describe a process for detecting sections of fuzzy and overlapping
genre excerpts within individual editions. The results show that at the level of fine-grained
divisions (43 subcategories), a model which combines the tf-idf features of the full document and
the features of a fine-tuned ECCO-BERT model performs significantly better than the baselines,
suggesting that such combined models may be particularly useful for such tasks. That the BERT model performed so
well on fine-grained categories is significant because existing methods to look at genre have
generally used very broad divisions (such as fiction and non-fiction). The kinds of questions we
are interested in use more fine-grained categories, for example looking at the rise of medical
textbooks among certain publishers. This kind of sequencing also has other potential uses, for
example document retrieval. On the present task, we did not observe any improvement offered
by the Perceiver model, which we specifically included to test a character-level model which is
capable of accounting for OCR artefacts. At present, we think this is due to a combination of
two factors. Firstly, the base performance on the task is around 95% accuracy, leaving only very
little headroom for improvement with more advanced models. Secondly, the task is by its
nature a document-level task and the good performance of the linear baseline demonstrates that
enough information is present in the data even without explicitly accounting for OCR errors.
It is therefore possible that the advantages of character-based models such as the Perceiver
will be demonstrated on tasks where the correct modelling of individual word occurrences
in their context plays a more significant role. These would include various text tagging and
information retrieval tasks.</p>
      <p>In our future work we hope to further develop the sequencing method, and investigate the
genres in their own right, for instance looking at the sequence patterns of individual authors,
the relationship between intra-book diversity and the success of particular authors or
publishers, and understanding co-occurrence between genres.</p>
      <p>[20] M. Poovey. “Mary Wollstonecraft: The Gender of Genres in Late Eighteenth-Century
England”. In: NOVEL: A Forum on Fiction 15.2 (1982), pp. 111–126. url:
http://www.jstor.org/stable/1345219.</p>
    </sec>
    <sec id="sec-8">
      <title>A. Appendix</title>
      <sec id="sec-8-1">
        <title>A.1. The main categories and sub-categories</title>
        <sec id="sec-8-1-1">
          <title>Main categories</title>
        </sec>
        <sec id="sec-8-1-2">
          <title>Arts</title>
        </sec>
        <sec id="sec-8-1-3">
          <title>Education</title>
        </sec>
        <sec id="sec-8-1-4">
          <title>History</title>
        </sec>
        <sec id="sec-8-1-4-law">
          <title>Law</title>
        </sec>
        <sec id="sec-8-1-5">
          <title>Literature</title>
        </sec>
        <sec id="sec-8-1-6">
          <title>Philosophy</title>
        </sec>
        <sec id="sec-8-1-7">
          <title>Politics</title>
        </sec>
        <sec id="sec-8-1-8">
          <title>Religion</title>
        </sec>
        <sec id="sec-8-1-9">
          <title>Sales Catalogues</title>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Baptiste</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Favre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Auguste</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Henriot</surname>
          </string-name>
          . “
          <article-title>Transferring Modern Named Entity Recognition to the Historical Domain: How to Take the Step?”</article-title>
          In:
          <source>Workshop on Natural Language Processing for Digital Humanities (NLP4DH)</source>
          .
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B. M.</given-names>
            <surname>Benedict</surname>
          </string-name>
          . “
          <article-title>The Paradox of the Anthology: Collecting and Différence in Eighteenth-Century Britain”</article-title>
          . In:
          <source>New Literary History</source>
          34.2 (
          <year>2003</year>
          ), pp.
          <fpage>231</fpage>
          -
          <lpage>256</lpage>
          . url: http://www.jstor.org/stable/20057778.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Binder</surname>
          </string-name>
          .
          <source>bursts: Markov Model for Bursty Behavior in Streams</source>
          .
          <year>2022</year>
          . url: https://CRAN.R-project.org/package=bursts.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Feather</surname>
          </string-name>
          . “
          <article-title>British Publishing in the Eighteenth Century: a preliminary subject analysis”</article-title>
          .
          In:
          <source>The Library</source>
          s6-VIII.1 (
          <year>1986</year>
          ), pp.
          <fpage>32</fpage>
          -
          <lpage>46</lpage>
          . doi: 10.1093/library/s6-VIII.1.32. url: https://doi.org/10.1093/library/s6-VIII.1.32.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Gale</surname>
          </string-name>
          .
          <source>Eighteenth Century Collections Online</source>
          . url: https://www.gale.com/intl/primary-sources/eighteenth-century-collections-online.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Goyal</surname>
          </string-name>
          and
          <string-name>
            <given-names>V. Prem</given-names>
            <surname>Prakash</surname>
          </string-name>
          . “
          <article-title>Statistical and Deep Learning Approaches for Literary Genre Classification”</article-title>
          . In:
          <source>Advances in Data and Information Sciences</source>
          . Ed. by
          <string-name>
            <given-names>S.</given-names>
            <surname>Tiwari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Trivedi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Kolhe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Mishra</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B. K.</given-names>
            <surname>Singh</surname>
          </string-name>
          . Vol.
          <volume>318</volume>
          . Singapore: Springer Singapore,
          <year>2022</year>
          , pp.
          <fpage>297</fpage>
          -
          <lpage>305</lpage>
          . doi: 10.1007/978-981-16-5689-7_26. url: https://link.springer.com/10.1007/978-981-16-5689-7_26.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Jain</surname>
          </string-name>
          . “
          <article-title>Automated Genre Classification of Books Using Machine Learning and Natural Language Processing</article-title>
          ”. In:
          <source>2019 9th International Conference on Cloud Computing, Data Science &amp; Engineering (Confluence)</source>
          . Noida, India: IEEE,
          <year>2019</year>
          , pp.
          <fpage>269</fpage>
          -
          <lpage>272</lpage>
          . doi: 10.1109/confluence.2019.8776935. url: https://ieeexplore.ieee.org/document/8776935/.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Hill</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Hengchen</surname>
          </string-name>
          . “
          <article-title>Quantifying the impact of dirty OCR on historical text analysis: Eighteenth Century Collections Online as a case study</article-title>
          ”. In:
          <source>Digital Scholarship in the Humanities</source>
          <volume>34</volume>
          .
          <issue>4</issue>
          (
          <year>2019</year>
          ), pp.
          <fpage>825</fpage>
          -
          <lpage>843</lpage>
          . doi: 10.1093/llc/fqz024. url: https://academic.oup.com/dsh/article/34/4/825/5476122.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Jaegle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Borgeaud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-B.</given-names>
            <surname>Alayrac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Doersch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Koppula</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zoran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Brock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Shelhamer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Hénaff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Botvinick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zisserman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Vinyals</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Carreira</surname>
          </string-name>
          .
          <article-title>Perceiver IO: A General Architecture for Structured Inputs &amp; Outputs</article-title>
          .
          <year>2021</year>
          . doi: 10.48550/arxiv.2107.14795. url: https://arxiv.org/abs/2107.14795.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Worthey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. C.</given-names>
            <surname>Dubnicek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Underwood</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Downie</surname>
          </string-name>
          . “
          <article-title>Impact of OCR Quality on BERT Embeddings in the Domain Classification of Book Excerpts</article-title>
          ”. In:
          <source>CHR</source>
          .
          <year>2021</year>
          , pp.
          <fpage>266</fpage>
          -
          <lpage>279</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
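          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          . “
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          ”. In:
          <source>Proceedings of NAACL-HLT</source>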
          .
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Kleinberg</surname>
          </string-name>
          . “
          <article-title>Bursty and Hierarchical Structure in Streams</article-title>
          ”. In:
          <source>Data Mining and Knowledge Discovery</source>
          <volume>7</volume>
          .
          <issue>4</issue>
          (
          <year>2003</year>
          ), pp.
          <fpage>373</fpage>
          -
          <lpage>397</lpage>
          . doi: 10.1023/a:1024940629314. url: https://doi.org/10.1023/A:1024940629314.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>K.</given-names>
            <surname>Labusch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Neudecker</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Zellhöfer</surname>
          </string-name>
          . “
          <article-title>BERT for named entity recognition in contemporary and historical German</article-title>
          ”. In:
          <source>Proceedings of the 15th conference on natural language processing</source>
          .
          <year>2019</year>
          , pp.
          <fpage>9</fpage>
          -
          <lpage>11</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>L.</given-names>
            <surname>Lahti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Mäkelä</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Tolonen</surname>
          </string-name>
          . “
          <article-title>Quantifying Bias and Uncertainty in Historical Data Collections with Probabilistic Programming</article-title>
          ”. In: (
          <year>2020</year>
          ). url: https://helda.helsinki.fi/handle/10138/327728.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>L.</given-names>
            <surname>Lahti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Marjanen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Roivainen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Tolonen</surname>
          </string-name>
          . “
          <article-title>Bibliographic Data Science and the History of the Book (c. 1500-1800)</article-title>
          ”. In:
          <source>Cataloging &amp; Classification Quarterly</source>
          <volume>57</volume>
          .
          <issue>1</issue>
          (
          <year>2019</year>
          ), pp.
          <fpage>5</fpage>
          -
          <lpage>23</lpage>
          . doi: 10.1080/01639374.2018.1543747. url: https://doi.org/10.1080/01639374.2018.1543747.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>E.</given-names>
            <surname>Manjavacas</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Fonteyn</surname>
          </string-name>
          . “
          <article-title>Adapting vs. Pre-training Language Models for Historical Languages</article-title>
          ”. In:
          <source>Journal of Data Mining &amp; Digital Humanities</source>
          NLP4DH (
          <year>2022</year>
          ). doi: 10.46298/jdmdh.9152. url: https://jdmdh.episciences.org/9690.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J.</given-names>
            <surname>Manson</surname>
          </string-name>
          .
          <source>A catalogue of the entire and genuine library and prints of Robert Salusbury Cotton, Esq. F.A.S. [electronic resource] : Comprehending an extensive and valuable collection of books of coins, medals and antiquities, with a few fine missals and other manuscripts on vellum, which, with some other select parcels of books lately purchased, are now on sale for ready money, at the price printed in the catalogue, and on the first leaf of each book, By John Manson, bookseller, No 5, Duke's-Court, St. Martin's-Lane, where catalogues (Price 6d) may be had</source>
          . [London],
          <year>1789</year>
          , [2], 102 pages.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>D.</given-names>
            <surname>Mazella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Willan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bishop</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stravoski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Barta</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>James</surname>
          </string-name>
          . “
          <article-title>“All the modes of story”: Genre and the Gendering of Authorship in the Year 1771</article-title>
          ”. In:
          <source>ABO: Interactive Journal for Women in the Arts, 1640-1830</source>
          <volume>12</volume>
          .
          <issue>1</issue>
          (
          <year>2022</year>
          ). doi: http://doi.org/10.5038/2157-7129.12.1.1256. url: https://digitalcommons.usf.edu/abo/vol12/iss1/10.
        </mixed-citation>
        <mixed-citation>
          [19]
          <string-name>
            <given-names>F.</given-names>
            <surname>Moretti</surname>
          </string-name>
          .
          <source>Distant reading</source>
          . London; New York: Verso,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>I.</given-names>
            <surname>Rastas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. C.</given-names>
            <surname>Ryan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. L. I.</given-names>
            <surname>Tiihonen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Qaraei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Repo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Babbar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Mäkelä</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tolonen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Ginter</surname>
          </string-name>
          . “
          <article-title>Explainable Publication Year Prediction of Eighteenth Century Texts with the BERT Model</article-title>
          ”. In:
          <source>Proceedings of the 3rd Workshop on Computational Approaches to Historical Language Change</source>
          . The Association for Computational Linguistics.
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>J.</given-names>
            <surname>Raven</surname>
          </string-name>
          .
          <source>The business of books: booksellers and the English book trade, 1450-1850</source>
          . New Haven: Yale University Press,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>I.</given-names>
            <surname>Rivers</surname>
          </string-name>
          , ed.
          <source>Books and their readers in eighteenth century England</source>
          . Leicester: Leicester Univ. Press [u.a.],
          <year>1982</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>I.</given-names>
            <surname>Rivers</surname>
          </string-name>
          , ed.
          <source>Books and their readers in eighteenth-century England: new essays</source>
          . London; New York: Leicester University Press,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>T.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Dennerlein</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Wolff</surname>
          </string-name>
          . “
          <article-title>Emotion Classification in German Plays with Transformer-based Language Models Pretrained on Historical and Contemporary Language</article-title>
          ”. In:
          <source>Association for Computational Linguistics</source>
          .
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>T.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Dennerlein</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Wolff</surname>
          </string-name>
          . “
          <article-title>Using Deep Learning for Emotion Analysis of 18th and 19th Century German Plays</article-title>
          ”. In: (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>S.</given-names>
            <surname>Schweter</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>März</surname>
          </string-name>
          . “
          <article-title>Triple E-Effective Ensembling of Embeddings and Language Models for NER of Historical German</article-title>
          ”. In:
          <source>CLEF (Working notes)</source>
          .
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Martens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Dahl</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Hinton</surname>
          </string-name>
          . “
          <article-title>On the importance of initialization and momentum in deep learning</article-title>
          ”. In:
          <source>International conference on machine learning</source>
          . PMLR
          .
          <year>2013</year>
          , pp.
          <fpage>1139</fpage>
          -
          <lpage>1147</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>K.</given-names>
            <surname>Todorov</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Colavizza</surname>
          </string-name>
          . “
          <article-title>An Assessment of the Impact of OCR Noise on Language Models</article-title>
          ”. In:
          <source>arXiv preprint arXiv:2202.00470</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>M.</given-names>
            <surname>Tolonen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Mäkelä</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ijaz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Lahti</surname>
          </string-name>
          . “
          <article-title>Corpus Linguistics and Eighteenth Century Collections Online (ECCO)</article-title>
          ”. In:
          <source>Research in Corpus Linguistics</source>
          <volume>9</volume>
          .
          <issue>1</issue>
          (
          <year>2021</year>
          ), pp.
          <fpage>19</fpage>
          -
          <lpage>34</lpage>
          . doi: 10.32714/ricl.09.01.03. url: https://ricl.aelinco.es/index.php/ricl/article/view/161.
        </mixed-citation>
        <mixed-citation>
          [31]
          <string-name>
            <given-names>M.</given-names>
            <surname>Tolonen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Mäkelä</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ijaz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Lahti</surname>
          </string-name>
          . “
          <article-title>Corpus Linguistics and Eighteenth Century Collections Online (ECCO)</article-title>
          ”. In:
          <source>Research in Corpus Linguistics</source>
          <volume>9</volume>
          .
          <issue>1</issue>
          (
          <year>2021</year>
          ), pp.
          <fpage>19</fpage>
          -
          <lpage>34</lpage>
          . doi: 10.32714/ricl.09.01.03. url: https://ricl.aelinco.es/index.php/ricl/article/view/161.
        </mixed-citation>
        <mixed-citation>
          [32]
          <string-name>
            <given-names>M.</given-names>
            <surname>Tolonen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Mäkelä</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Lahti</surname>
          </string-name>
          . “
          <article-title>The Anatomy of Eighteenth Century Collections Online (ECCO)</article-title>
          ”. In:
          <source>Eighteenth-Century Studies</source>
          <volume>56</volume>
          .
          <issue>1</issue>
          (
          <year>2022</year>
          ), pp.
          <fpage>95</fpage>
          -
          <lpage>123</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>T.</given-names>
            <surname>Underwood</surname>
          </string-name>
          .
          <article-title>Distant horizons: digital evidence and literary change</article-title>
          . Chicago: The University of Chicago Press,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>T.</given-names>
            <surname>Underwood</surname>
          </string-name>
          . “
          <article-title>Genre Theory and Historicism</article-title>
          ”. In:
          <source>Journal of Cultural Analytics</source>
          <volume>2</volume>
          .
          <issue>2</issue>
          (
          <year>2016</year>
          ). doi: 10.22148/16.008. url: https://culturalanalytics.org/article/11063.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>T.</given-names>
            <surname>Underwood</surname>
          </string-name>
          . “
          <article-title>The Life Cycles of Genres</article-title>
          ”. In:
          <source>Journal of Cultural Analytics</source>
          <volume>2</volume>
          .
          <issue>2</issue>
          (
          <year>2016</year>
          ). doi: 10.22148/16.005. url: https://culturalanalytics.org/article/11061.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>T.</given-names>
            <surname>Underwood</surname>
          </string-name>
          . “
          <article-title>Understanding Genre in a Collection of a Million Volumes, Interim Report</article-title>
          ”. In: (
          <year>2014</year>
          ). doi: 10.6084/m9.figshare.1281251.v1. url: https://figshare.com/articles/journal_contribution/Understanding_Genre_in_a_Collection_of_a_Million_Volumes_Interim_Report/1281251.
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>T.</given-names>
            <surname>Underwood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Black</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Auvil</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Capitanu</surname>
          </string-name>
          .
          <article-title>Mapping Mutable Genres in Structurally Complex Volumes</article-title>
          .
          <year>2013</year>
          . doi: 10.1109/BigData.2013.6691676. url: http://arxiv.org/abs/1309.3323.
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>J.</given-names>
            <surname>Worsham</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Kalita</surname>
          </string-name>
          . “
          <article-title>Genre Identification and the Compositional Effect of Genre in Literature</article-title>
          ”. In:
          <source>Proceedings of the 27th International Conference on Computational Linguistics</source>
          . Santa Fe
          , New Mexico, USA: Association for Computational Linguistics,
          <year>2018</year>
          , pp.
          <fpage>1963</fpage>
          -
          <lpage>1973</lpage>
          . url: https://aclanthology.org/C18-1167.
        </mixed-citation>
        <mixed-citation>
          [39]
          <string-name>
            <given-names>H.</given-names>
            <surname>Yoo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Son</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Cho</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Oh</surname>
          </string-name>
          . “
          <article-title>HUE: Pretrained Model and Dataset for Understanding Hanja Documents of Ancient Korea</article-title>
          ”. In:
          <source>Findings of the Association for Computational Linguistics: NAACL 2022</source>
          . Seattle, United States: Association for Computational Linguistics,
          <year>2022</year>
          , pp.
          <fpage>1832</fpage>
          -
          <lpage>1844</lpage>
          . url: https://aclanthology.org/2022.findings-naacl.140.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>