
Computational Humanities Research Conference, December 12–14, 2022

Detecting Sequential Genre Change in Eighteenth-Century Texts

Jinbin Zhang

Yann Ciarán Ryan

Iiro Rastas

Filip Ginter

Mikko Tolonen

Rohit Babbar

Aalto University, Finland; TurkuNLP, University of Turku, Finland; University of Helsinki, Finland

Machine classification of historical books into genres is a common task for NLP-based classifiers and has a number of applications, from literary analysis to information retrieval. However, it is not a straightforward task, as genre labels can be ambiguous and subject to temporal change, and moreover many books consist of mixed or miscellaneous genres. In this paper we describe a work-in-progress method by which genre predictions can be used to determine longer sequences of genre change within books, which we test with visualisations of some hand-picked texts. We apply state-of-the-art methods to the task, including a BERT-based transformer and a character-level Perceiver model, both pre-trained on a large collection of eighteenth-century works (ECCO), using a new set of hand-annotated documents created to reflect historical divisions. Results show that both models perform significantly better than a linear baseline, particularly when ECCO-BERT is combined with tf-idf features, though for this task the character-level model provides no obvious advantage. Initial evaluation of the genre sequence method shows that it may in future be useful in determining and dividing the multiple genres of miscellaneous and hybrid historical texts.

Keywords: BERT, text classification, genre change, ECCO, Perceiver

1. Introduction

Studying the large-scale development of early modern public discourse through structured data is an exciting opportunity, as Moretti established some time ago [19]. Besides using already available bibliographic data for “distant reading”, a useful further step is to use unstructured textual databases as source material for creating new structured data on fields that are currently poorly available [15]. One such field is genre. Readily available genre information is often sporadic, but using it, especially given that many documents are composed of several sequential genres, can open a new window onto the development of public discourse. With better structured data, we will be able to study the systematization of particular genres in a new manner and take a fresh look at authorship and the relevance of publisher networks.

Much work in literary history and the history of the book has relied on the analysis of generic categories (for examples see [20, 33, 34, 35, 2, 19]). Computational genre classification is a complex problem, for two key reasons: genre divisions change over time, and not every book can be unambiguously assigned a single genre label. Existing methods for genre detection often assume that each text, or each pre-defined chunk such as a chapter or section, can be classified as a single genre or as a distribution of genre probabilities [7, 38, 6], which does not reflect the reality of many eighteenth-century texts. One important exception is the page-level classification of Underwood et al. [36], subsequently used to detect sequences of genre with a hidden Markov model [37].

This paper describes a number of improvements to existing methods. First, rather than relying on existing modern or broad classification systems, we use a newly created training set of documents with a custom-designed, domain-specific taxonomy, which attempts to balance pragmatism with capturing meaningful, fine-grained eighteenth-century organisational categories. Second, we use a BERT transformer model trained specifically on eighteenth-century texts, which performs significantly better than base BERT. Third, we propose a method by which we hope this fine-grained classification can be used to represent books as sequences and combinations of genres.

We report and compare results from a number of classifiers: a document-level classifier that uses only one BERT input segment for each document (ECCO-BERT-Seq), a classifier for text chunks whose predictions can also be aggregated at the document level (ECCO-BERT-Chunk), and a character-level Perceiver model using the same input as ECCO-BERT-Seq.¹ The BERT model [11] has achieved great improvements over previous deep learning methods on various modern-language datasets. Recently, models pre-trained on historical corpora of different languages have also appeared [21, 16, 39], and pre-trained language models have been applied in the historical domain to tasks such as predicting the publication year [21], named entity recognition [16, 13, 1, 27] and emotion analysis [26, 25]. OCR errors also pose challenges when using pre-trained models on historical data [10, 29].

2. The ECCO Dataset

The data used both for model training and for predictions comes from Eighteenth Century Collections Online (ECCO). ECCO is a set of 180,000 digitised documents originally published in the eighteenth century, created by the software and education company Gale [5]. These digitised images have been converted into machine-readable text using Optical Character Recognition (OCR). Despite its size, a recent study comparing ECCO to the English Short Title Catalogue (ESTC) has highlighted significant gaps and imbalances [32], and the ESTC itself is known to be incomplete [22]. These attributes, and their impact on several downstream tasks, have been covered in detail in previous papers [30, 8, 14] and are only briefly outlined here. First, the distribution of documents in ECCO is uneven and skewed towards the end of the century; second, the OCR contains significant noise and errors. Additionally, not all texts are in English, and many are reprints of works published in earlier centuries. The former have been excluded, but the latter are retained in our training and test data. Despite these caveats, ECCO is the largest and most complete source we have for eighteenth-century text data. Though it has its own institutional history and biases, it is complete enough that it is neither limited to the more ‘important’ or ‘literary’ genres nor focused solely on canonical works. Its data and digitised images are used extensively, forming the basis of many scholarly enquiries and research questions [31].

¹ In this paper the words ‘book’ and ‘document’ have distinct meanings. ‘Book’ denotes an edition of a physical book, for example ‘there are over 400,000 books listed in the English Short Title Catalogue’. ‘Document’, by contrast, is reserved for a single text document as used as data for the classification method and other tasks. Not all documents in the ECCO data map to a single book, and vice versa.

3. Data Annotation

Key to the work leading up to this paper was the creation of a usable training set of documents annotated with genre labels. We began with a sample set of book records and a set of preliminary genre labels. These books were then labelled by two annotators with domain expertise. At this stage, we revisited the labels and adjusted those with particularly low inter-annotator agreement. Once the set of genre labels had been finalised, we annotated a large set with genre information (5,672 individual works, corresponding to 37,574 known editions, of which 30,119 correspond to ECCO documents). After this second round, we again checked inter-annotator agreement, reaching a consensus after discussing each disagreement. The final 43 fine-grained categories were then collapsed into 10 main categories for some of the classification tasks. These book labels were then mapped to the equivalent ECCO document IDs. The final set of labels is given in Appendix A.
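The paper does not state which agreement statistic was used. As an illustration only, the sketch below computes Cohen's kappa, a common chance-corrected measure of agreement between two annotators; the genre labels in the example are toy data, not taken from the annotated set.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


annotator_1 = ["Religion", "Religion", "Law", "Literature"]
annotator_2 = ["Religion", "Law", "Law", "Literature"]
print(round(cohen_kappa(annotator_1, annotator_2), 3))  # 0.636
```

Values near 1 indicate agreement well beyond chance; labels with low per-label agreement would be the natural candidates for the kind of revision described above.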

Existing categorical distinctions were either too broad (for example fiction and non-fiction) or too fine-grained (for example the many historical literary divisions, particularly poetic ones) for our needs. Our categories attempt to reflect the divisions found in contemporary sources such as catalogues [17]. Additionally, they are closely related to the divisions used by modern domain experts writing on the history of the book, for example the highly regarded edited collection Books and their Readers in Eighteenth-Century England, whose chapters are organised along divisions similar to our own [23, 24]. We note that other recent attempts to categorise eighteenth-century book genres use a similar system of division [18]. The selection is intended to provide useful genre categorisation for scholarly inquiry into book history and book production. It was also pragmatic, aiming at a manageable number of genres, for example so that each class had enough data for the training and test sets. Finally, the categories were chosen with particular questions in mind, which we hoped would help us to analyse works of Scottish Enlightenment thought, for instance by distinguishing patterns within scientific or philosophical publishing.

4. Method

In this section, we introduce the pre-trained ECCO-BERT model, the fine-tuned models and the baselines.² We denote the training dataset as $\{(x_i, y_i)\}_{i=1}^{N}$, where $x_i$ is the $i$-th book and $y_i$ is the genre of $x_i$. Our goal is to learn a function $f(x)$ that predicts the genre of a book $x_i$ or the genre of a chunk in book $x_i$.

4.1. Multi-granular Classification with ECCO-BERT

ECCO-BERT [21] is a pre-trained language model trained on the ECCO dataset, whose configuration is the same as that of the bert-base-cased model [11] except for the vocabulary size. The model is pre-trained with a masked language modelling task as well as a next sentence prediction task. The fine-tuned ECCO-BERT consists of two parts: the transformer encoder, and a linear layer on top of the mean-pooled encoder output, which scores the different genres. The Transformer architecture on which the model is based accepts inputs only up to a relatively short maximum length; for ECCO-BERT, the standard maximum of 512 input tokens applies. Longer inputs need to be split into chunks.
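The classification head described above can be sketched in plain Python: mean pooling over the encoder's token vectors (ignoring padding positions), then a linear layer giving one score per genre. The softmax that turns scores into probabilities is our assumption about how the averaged chunk probabilities are obtained, and the 2-dimensional toy vectors and weights are illustrative; in ECCO-BERT the pooled vector has 768 dimensions, and a real implementation would use a deep learning framework rather than lists.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average the encoder's token vectors, skipping padding positions (mask value 0)."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for j, value in enumerate(vector):
                total[j] += value
    return [t / max(count, 1) for t in total]


def linear_scores(pooled, weights, bias):
    """One score per genre; `weights` holds one row of length dim per genre."""
    return [sum(w * x for w, x in zip(row, pooled)) + b for row, b in zip(weights, bias)]


def softmax(scores):
    """Turn genre scores into probabilities (shifted by the max for numerical stability)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy example: three token vectors, the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
pooled = mean_pool(tokens, attention_mask=[1, 1, 0])  # [2.0, 3.0]
probs = softmax(linear_scores(pooled, weights=[[1.0, 0.0], [0.0, 1.0]], bias=[0.0, 0.0]))
print(pooled, [round(p, 3) for p in probs])
```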

Because we want the model to take the full information of the document into account during both training and prediction, each document is split into chunks of 510 tokens, since the maximum input size of ECCO-BERT is 512 tokens (510 input tokens plus the 2 special tokens expected by the model). For training, we assume that each chunk has the same genre as its document, and the model is trained on the resulting (chunk, label) pairs. At inference time, we first split the document into chunks. The fine-tuned model then scores each chunk, and the predicted genre distribution of the document is the average of the chunks' predicted distributions. The inference process is shown in Figure 1. We call this model ECCO-BERT-Chunk. For comparison, we also train a model conditioned only on the first 510 sub-words of the document, denoted ECCO-BERT-Seq.
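The chunked training and inference procedure can be sketched as follows, assuming the document has already been converted into ECCO-BERT sub-word ids. Adding the two special tokens to each chunk is left to the tokenizer and is not shown, and the per-chunk probabilities in the example are invented stand-ins for the fine-tuned model's output.

```python
def split_into_chunks(token_ids, chunk_len=510):
    """Consecutive chunks of at most chunk_len sub-word ids.

    510 leaves room for the two special tokens within the 512-token limit;
    adding those tokens is left to the tokenizer and not shown here.
    """
    return [token_ids[i:i + chunk_len] for i in range(0, len(token_ids), chunk_len)]


def training_pairs(token_ids, label, chunk_len=510):
    """Training data: every chunk inherits the genre label of its document."""
    return [(chunk, label) for chunk in split_into_chunks(token_ids, chunk_len)]


def document_distribution(chunk_probs):
    """Inference: the document's genre distribution is the mean of its chunks' distributions."""
    n = len(chunk_probs)
    return [sum(probs[g] for probs in chunk_probs) / n for g in range(len(chunk_probs[0]))]


doc = list(range(1200))                                 # stand-in for 1,200 sub-word ids
print([len(c) for c in split_into_chunks(doc)])         # [510, 510, 180]
print(document_distribution([[0.6, 0.4], [0.2, 0.8]]))  # roughly [0.4, 0.6]
```

The number of chunks grows linearly with document length, and each chunk needs its own transformer forward pass, which is why ECCO-BERT-Chunk inference is slow on long books.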

Although the ECCO-BERT-Chunk model considers all chunks to make the final judgment, its prediction process is very slow, since a book often contains many chunks. The much faster ECCO-BERT-Seq, on the other hand, is conditioned only on the first 510 sub-words, so it may miss important information in other parts of the book. To address this, we trained a linear model on the concatenation of the tf-idf features of the full text and the pooled output of the fine-tuned ECCO-BERT-Seq. For a document $x_i$, the input can be written as $[\Phi_{\text{tf-idf}}(x_i), \Phi_{\text{BERT}}(x_i[:510])]$, where $\Phi_{\text{BERT}}$ is the transformer encoder of the fine-tuned ECCO-BERT-Seq, $\Phi_{\text{tf-idf}}$ is the tf-idf vectorizer, and $x_i[:510]$ denotes the first 510 sub-words of $x_i$. We call this model ECCO-BERT-tfidf; all results are shown in Table 1.

² The model implementation is available at https://github.com/HPC-HD/ECCO-genre-classification. The original ECCO-BERT model has been released and is available at https://huggingface.co/TurkuNLP/eccobert-base-cased-v1.

4.2. Baseline Models

We adopt two baseline models for comparison. The input of the linear model is the tf-idf features of the full document. The model consists of a single linear layer, whose output dimension is the number of main categories or sub-categories. The second baseline is bert-base-cased, released by [11], which we fine-tuned directly on our training data.
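The paper does not specify which tf-idf variant was used. The sketch below assumes the smoothed idf and L2 normalisation that scikit-learn's TfidfVectorizer applies by default, on pre-tokenised toy documents; the resulting sparse vector is the kind of input the baseline's single linear layer would consume (in the paper, over a vocabulary of 500,000 features).

```python
import math
from collections import Counter


def fit_idf(documents):
    """Smoothed idf per term: ln((1 + n) / (1 + df)) + 1, as in scikit-learn's default."""
    n = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(doc))  # count each term once per document
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(doc, idf):
    """Sparse tf-idf vector (term -> weight): raw counts times idf, L2-normalised.

    Terms not seen when fitting the idf are ignored.
    """
    counts = Counter(term for term in doc if term in idf)
    weights = {term: c * idf[term] for term, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {term: w / norm for term, w in weights.items()} if norm else {}


corpus = [["god", "grace", "god"], ["law", "statute"], ["god", "law"]]
idf = fit_idf(corpus)
print(tfidf_vector(corpus[0], idf))
```

Because the vector covers every token of the full document, this baseline sees the whole text at low cost, unlike the 510-sub-word window of ECCO-BERT-Seq.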

5. Results

There are 30,119 documents annotated by experts. Of these, 6,024 documents were randomly selected and split into development and test sets of 3,012 documents each. The labels comprise 10 main categories and 43 sub-categories. The genre labels are presented in Appendix A.1.

5.1. Experimental Details

The sequence length of all BERT models is set to 512. For fine-tuning the ECCO-BERT-Seq and bert-base-cased models, we use only the first 510 sub-words of the document as input. These models are trained for 100 epochs on a single NVIDIA V100 GPU. ECCO-BERT-Chunk is fine-tuned on 4 NVIDIA A100 GPUs; the main-category and sub-category models were trained for 21 and 20 epochs respectively, using early stopping.

The linear model is trained with a cross-entropy loss for 200 epochs using SGD with momentum [28] and a batch size of 32. The number of tf-idf features is 500,000.
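For reference, the training objective and update rule can be written out as follows, using the classical momentum formulation of Sutskever et al. [28]. The symbols are our own notation: $W = (w_1, \dots, w_C)$ and $b$ are the weights and bias of the linear layer, $\Phi(x_i)$ is the tf-idf vector of document $x_i$, $C$ is the number of main categories or sub-categories, and $B = 32$ is the batch size. The learning rate $\varepsilon$ and momentum coefficient $\mu$ are not reported in the paper.

```latex
\begin{aligned}
\mathcal{L}(\theta) &= -\frac{1}{B}\sum_{i=1}^{B}
  \log\frac{\exp\big(w_{y_i}^{\top}\Phi(x_i) + b_{y_i}\big)}
           {\sum_{c=1}^{C}\exp\big(w_{c}^{\top}\Phi(x_i) + b_{c}\big)},
  \qquad \theta = (W, b) \\
v_{t+1} &= \mu\, v_t - \varepsilon\, \nabla_{\theta}\mathcal{L}(\theta_t),
  \qquad \theta_{t+1} = \theta_t + v_{t+1}
\end{aligned}
```

The velocity $v$ accumulates past gradients, which smooths the noisy mini-batch updates over the very high-dimensional, sparse tf-idf input.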

The ECCO-BERT-tfidf models are trained for 220 epochs using SGD with momentum. The feature extractors are the encoder of the fine-tuned ECCO-BERT-Seq and the tf-idf vectorizer of the linear baseline. To make the model rely more on the tf-idf features, we mask the features from ECCO-BERT-Seq during the first 200 epochs. The number of tf-idf features is 500,000, and the dimension of the features extracted from ECCO-BERT-Seq is 768.
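The paper says the ECCO-BERT-Seq features are masked during the first 200 of the 220 epochs but not how. The sketch below assumes masking means replacing them with zeros (epochs counted from 0); the short toy vectors stand in for the 500,000-dimensional tf-idf vector of the full document and the 768-dimensional pooled ECCO-BERT-Seq output for its first 510 sub-words.

```python
def combined_input(tfidf_features, bert_features, epoch, mask_bert_until=200):
    """Input of the linear layer: [tf-idf of full text, pooled ECCO-BERT-Seq output].

    Assumption: "masking" replaces the BERT part with zeros for epochs
    0 .. mask_bert_until - 1, so early training must rely on tf-idf alone.
    """
    if epoch < mask_bert_until:
        bert_part = [0.0] * len(bert_features)
    else:
        bert_part = list(bert_features)
    return list(tfidf_features) + bert_part


tfidf = [0.5, 0.5]  # stands in for the 500,000 tf-idf features
bert = [1.0, 2.0]   # stands in for the 768-dimensional pooled ECCO-BERT-Seq output
print(combined_input(tfidf, bert, epoch=0))    # [0.5, 0.5, 0.0, 0.0]
print(combined_input(tfidf, bert, epoch=200))  # [0.5, 0.5, 1.0, 2.0]
```

With zero inputs, the gradient of a linear layer's weights for those dimensions is zero, so (ignoring weight decay) only the tf-idf weights and the bias are updated during the masked epochs; the BERT weights start learning once the mask is lifted.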

In addition to the primary ECCO-BERT model, we also trained the Perceiver IO model [9] on the same data as the BERT models. Perceiver is a Transformer model that decouples input size from overall model size, allowing the model to scale linearly with both the size of the input and the model depth. Perceiver IO further generalises Perceiver by allowing arbitrary outputs. Thanks to this linear scaling, Perceiver models make it practical to use character-level input, which could yield a model more robust to character-level OCR artefacts in the ECCO dataset. Testing this property is our main motivation for using Perceiver IO on this task. We pre-trained Perceiver on the ECCO data for 1 million steps with an effective batch size of 768. Pre-training is done similarly to ECCO-BERT, except that the next sentence prediction task is not used. Fine-tuning for the genre classification task is also similar to the BERT models, except that unfiltered, byte-level data is used as model input.
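To illustrate what unfiltered, byte-level input means for OCR noise, the sketch below encodes strings as raw UTF-8 bytes. It is a simplification: the Perceiver IO tokenizer used in practice may reserve additional ids for special tokens, which this sketch does not model. The example uses the classic ECCO artefact of the long s being read as “f”.

```python
def to_byte_ids(text):
    """Unfiltered UTF-8 bytes as integer ids in 0-255: no normalisation, no vocabulary."""
    return list(text.encode("utf-8"))


def differing_positions(a, b):
    """Number of positions at which two equal-length id sequences differ."""
    return sum(x != y for x, y in zip(a, b))


# An OCR error that reads the long s as "f" changes exactly one input id...
print(differing_positions(to_byte_ids("most"), to_byte_ids("moft")))  # 1
# ...while the long s itself is kept and takes two bytes in UTF-8.
print(to_byte_ids("moſt"))  # [109, 111, 197, 191, 116]
```

A sub-word tokenizer, by contrast, may split “moft” into entirely different pieces from “most”, which is the kind of mismatch a character-level model was expected to absorb.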

5.2. Genre Model Performance

We report the models’ accuracy for main categories and sub-categories in Table 1. The confusion matrix of ECCO-BERT-tfidf is shown in Figure 2. There is a significant gap between the fine-tuned bert-base-cased model and the models based on ECCO-BERT, since bert-base-cased is pre-trained on a modern-language corpus, was not exposed to OCR noise during pre-training, and the language has naturally evolved between eighteenth-century and present-day English. Although ECCO-BERT-Seq is conditioned only on the first 510 tokens of the document, its results are competitive with those of ECCO-BERT-Chunk and ECCO-BERT-tfidf, which consider the full document. As shown in Table 1, ECCO-BERT-tfidf performs best, since it combines the transformer features with the tf-idf features of the full document. ECCO-BERT-tfidf is also much faster than ECCO-BERT-Chunk, because extracting tf-idf features is much faster than running transformer inference.
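The evaluation quantities used in this section can be computed as below. This is a generic sketch on toy labels, not the authors' evaluation script; rows of the confusion matrix are gold labels and columns are predictions, so the precision of a genre is read from its column.

```python
def confusion_matrix(gold, predicted, labels):
    """matrix[i][j] counts items whose gold label is labels[i] and prediction is labels[j]."""
    index = {label: k for k, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for g, p in zip(gold, predicted):
        matrix[index[g]][index[p]] += 1
    return matrix


def accuracy(gold, predicted):
    """Share of items whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def precision_per_label(matrix, labels):
    """Correct predictions of a label divided by all predictions of it (its column sum)."""
    result = {}
    for j, label in enumerate(labels):
        predicted_as_label = sum(row[j] for row in matrix)
        result[label] = matrix[j][j] / predicted_as_label if predicted_as_label else 0.0
    return result


labels = ["Literature", "Religion", "Education"]
gold = ["Literature", "Literature", "Religion", "Education", "Education"]
pred = ["Literature", "Literature", "Religion", "Religion", "Education"]
m = confusion_matrix(gold, pred, labels)
print(accuracy(gold, pred), m, precision_per_label(m, labels))
```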

Of particular note is the advantage of all ECCO-BERT models over base BERT and the linear model on the more fine-grained categories. Somewhat disappointingly, the fine-tuned Perceiver IO models do not perform better than the BERT-based models on this task in our evaluation. This suggests that OCR noise does not interfere with the genre detection task enough to degrade the performance of BERT-based models.

5.3. Document-level Evaluation and Prediction results

Here we report on both the evaluation of the document-level results for the main categories and the predictions for unlabelled ECCO documents. The confusion matrix in Figure 2 shows that precision is highest for the literature category and lowest for education. We also use the ECCO-BERT-tfidf model to predict genres for the unlabelled ECCO data, obtaining model-predicted genre distributions. There are 177,494 unlabelled documents in total. The breakdown of predicted categories is shown in Figure 3. As our label taxonomy is custom-made, there is no ground truth for the entirety of ECCO against which to fully evaluate the accuracy of the predictions. However, the predictions roughly match our expectations: previous analyses of the ESTC, using the existing Dewey Decimal System labels, have found that the most common subject category is religion [4].
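A breakdown like the one in Figure 3 can be tallied from the model output. The sketch assumes each unlabelled document is counted once under its most probable genre; the paper does not say whether the figure instead sums predicted probabilities, and the input list is toy data.

```python
from collections import Counter


def genre_breakdown(top_genres):
    """Percentage of documents per predicted genre, largest share first."""
    counts = Counter(top_genres)
    total = len(top_genres)
    return [(genre, 100.0 * n / total) for genre, n in counts.most_common()]


print(genre_breakdown(["Religion", "Religion", "Law", "Religion"]))  # [('Religion', 75.0), ('Law', 25.0)]
```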


6. Fine-grained Analysis with ECCO-BERT-Seq

6.1. Sequential Genre Change

As well as using ECCO-BERT-Seq to generate document-level predictions from average values, we can use the individual chunk predictions directly. Here we propose a method that uses this paragraph-level detection to find chunks within documents where the change from one genre to another is significant and sustained. Because the predicted genre generally oscillates considerably from one chunk to the next, we needed a method that captures only sustained changes, ignoring shorter breaks within a ‘run’ of the same genre. To do this, we used the Kleinberg algorithm for detecting ‘bursts’ of activity in time-series data [12]. This uses a hidden Markov model to probabilistically determine when a subsequent event will occur; when events occur more rapidly than this expectation, and for sustained periods, they are labelled as bursts. Bursts were computed using the R bursts package [3], which implements the Kleinberg algorithm.
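The core of Kleinberg's burst model [12] can be sketched in a few dozen lines. This is a simplified two-state version written for illustration, not the code of the R bursts package (which, as we understand it, reports a hierarchy of burst levels). Each gap between consecutive events is assigned either a baseline state, whose exponential rate equals the overall event rate, or a burst state with a rate s times higher; the Viterbi algorithm picks the labelling that minimises the emission costs plus a penalty of γ ln n for every move up into the burst state, where n is the number of gaps. The values s = 2 and γ = 1 are illustrative choices.

```python
import math


def kleinberg_two_state(offsets, s=2.0, gamma=1.0):
    """Label each gap between consecutive events as baseline (0) or burst (1).

    offsets: strictly increasing event positions, e.g. the chunk indices at
    which one genre is the top prediction. Returns one state per gap.
    """
    gaps = [b - a for a, b in zip(offsets, offsets[1:])]
    if any(gap <= 0 for gap in gaps):
        raise ValueError("offsets must be strictly increasing")
    n = len(gaps)
    if n == 0:
        return []
    base_rate = n / sum(gaps)            # overall event rate
    rates = (base_rate, s * base_rate)   # baseline and burst rates
    switch_up = gamma * math.log(n)      # cost of entering the burst state

    def emission_cost(state, gap):
        # negative log-likelihood of the gap under an exponential distribution
        rate = rates[state]
        return rate * gap - math.log(rate)

    # cost[k]: cheapest labelling of the gaps seen so far that ends in state k
    cost = [emission_cost(0, gaps[0]), switch_up + emission_cost(1, gaps[0])]
    back = []  # back[t][k]: best previous state when gap t + 1 is in state k
    for gap in gaps[1:]:
        new_cost, pointers = [], []
        for state in (0, 1):
            # from baseline: pay switch_up only when moving up into the burst state;
            # from burst: staying or dropping back to baseline is free
            from_base = cost[0] + (switch_up if state == 1 else 0.0)
            from_burst = cost[1]
            pointers.append(0 if from_base <= from_burst else 1)
            new_cost.append(min(from_base, from_burst) + emission_cost(state, gap))
        cost = new_cost
        back.append(pointers)

    # trace the cheapest path backwards from the last gap
    state = 0 if cost[0] <= cost[1] else 1
    states = [state]
    for pointers in reversed(back):
        state = pointers[state]
        states.append(state)
    return states[::-1]


def burst_intervals(offsets, states):
    """Merge runs of burst-state gaps into (start, end) offset pairs."""
    intervals, start = [], None
    for i, state in enumerate(states):
        if state == 1 and start is None:
            start = offsets[i]
        elif state == 0 and start is not None:
            intervals.append((start, offsets[i]))
            start = None
    if start is not None:
        intervals.append((start, offsets[len(states)]))
    return intervals


# Sparse events, a dense run from chunk 30 to chunk 38, then sparse events again.
offsets = [0, 10, 20, 30, 31, 32, 33, 34, 35, 36, 37, 38, 48, 58]
print(burst_intervals(offsets, kleinberg_two_state(offsets)))  # [(30, 38)]
```

The switching penalty is what suppresses short, isolated runs: a burst is only reported when the cheaper emission costs of the dense gaps outweigh the one-off cost of entering the burst state.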

To adapt this method, the most probable prediction for each chunk within each document was treated as a time-series data point for the Kleinberg algorithm. We calculated sections for main categories and sub-categories separately. The method allows for ‘fuzzy’ and overlapping genre sections. We also experimented with retaining only highly probable classifications, which helped to filter out further noise. The method has a drawback: because it looks for change rather than simply for all clusters of events, it currently does not detect all sections if most of the text is of a single genre.
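The adaptation described above can be sketched as turning a document's chunk predictions into one event series per genre, each of which can then be passed to a burst detector such as the R bursts package. The dictionary input format is hypothetical, and the 0.5 threshold mirrors the cut-off used for the visualisations; the exact filtering threshold used in the experiments is not stated.

```python
def genre_offsets(chunk_predictions, threshold=0.5):
    """One event series per genre: the chunk indices where it is the top prediction.

    chunk_predictions: hypothetical format, one {genre: probability} dict per
    chunk in reading order. Chunks whose top probability is not above
    `threshold` are dropped as noise.
    """
    offsets = {}
    for index, probs in enumerate(chunk_predictions):
        genre, prob = max(probs.items(), key=lambda item: item[1])
        if prob > threshold:
            offsets.setdefault(genre, []).append(index)
    return offsets


chunks = [
    {"Law": 0.9, "History": 0.1},
    {"Law": 0.4, "History": 0.35, "Religion": 0.25},  # below threshold: dropped
    {"History": 0.7, "Law": 0.3},
    {"Law": 0.6, "History": 0.4},
]
print(genre_offsets(chunks))  # {'Law': [0, 3], 'History': [2]}
```

Treating each genre as its own event stream is what allows the detected sections to overlap: bursts of two genres can cover the same stretch of chunks.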

To give some examples, we calculate genre bursts for a few hand-picked texts. To visualise the changes in genre, top genre predictions (probability over 0.5) are charted as a scatterplot over the paragraph sequence, coloured by genre. Burst start and end points are overlaid as coloured areas. As the method looks for periods of change rather than absolute values, it ignores the main category of the book (which the document-level method detects successfully anyway) and in most cases highlights sustained excerpts where the detected genre differs from the dominant one. Here, we see that David Hume’s Political Discourses (Figure 4, A) contains discrete sections on economics (categorised as scientific improvement), philosophy (a section on the balance of power), history (a section on ‘ancient nations’) and finally law (a chapter on the idea of the commonwealth). Wealth of Nations (Figure 4, B) begins with a section on labour and society, categorised here as philosophy, followed by smaller sections on law (a discussion of a specific statute) and in the education genre. Most of the book is not detected as a burst of its dominant genre (economics and trade, under the higher-level category scientific improvement), as it involves no change. Villier’s Miscellaneous Works (Figure 4, C) shows a large number of overlapping genre changes. Finally, Robinson Crusoe (Figure 4, D) is also mostly without detected bursts, but of note is a section of religious genre, corresponding to a section of the plot in which Crusoe is ill and has prophetic dreams.


7. Discussion and Conclusion

In this paper we described a process for detecting fuzzy and overlapping genre sections within individual editions. The results show that at the level of fine-grained divisions (43 sub-categories), a model which combines the tf-idf features of the full document with the features of a fine-tuned ECCO-BERT model performs significantly better than the baselines, suggesting that such models may be particularly useful for these tasks. That the BERT model performed so well on fine-grained categories is significant, because existing methods for studying genre have generally used very broad divisions (such as fiction and non-fiction). The kinds of questions we are interested in require more fine-grained categories, for example tracing the rise of medical textbooks among certain publishers. This kind of sequencing also has other potential uses, for example in document retrieval. On the present task, we did not observe any improvement from the Perceiver model, which we included specifically to test whether a character-level model capable of accounting for OCR artefacts would help. At present, we attribute this to a combination of two factors. First, the base performance on the task is around 95% accuracy, leaving very little headroom for improvement with more advanced models. Second, the task is by its nature a document-level task, and the good performance of the linear baseline demonstrates that enough information is present in the data even without explicitly accounting for OCR errors. It is therefore possible that the advantages of character-based models such as the Perceiver will show on tasks where correctly modelling individual word occurrences in their context plays a more significant role, such as various text tagging and information retrieval tasks.

In future work we hope to develop the sequencing method further and to investigate the genres in their own right, for instance by looking at the sequence patterns of individual authors, the relationship between intra-book diversity and the success of particular authors or publishers, and co-occurrence between genres.

[20] M. Poovey. “Mary Wollstonecraft: The Gender of Genres in Late Eighteenth-Century England”. In: NOVEL: A Forum on Fiction 15.2 (1982), pp. 111–126. url: http://www.jstor.org/stable/1345219.

A. Appendix

A.1. The main categories and sub-categories

Main categories: Arts, Education, History, Law, Literature, Philosophy, Politics, Religion, Sales Catalogues, Scientific Improvement

[1] Baptiste, Favre, Auguste, and Henriot. “Transferring Modern Named Entity Recognition to the Historical Domain: How to Take the Step?” In: Workshop on Natural Language Processing for Digital Humanities (NLP4DH). 2021.

[2] B. M. Benedict. “The Paradox of the Anthology: Collecting and Différence in Eighteenth-Century Britain”. In: New Literary History 34.2 (2003), pp. 231–256. url: http://www.jstor.org/stable/20057778.

[3] Binder. bursts: Markov Model for Bursty Behavior in Streams. 2022. url: https://CRAN.R-project.org/package=bursts.

[4] Feather. “British Publishing in the Eighteenth Century: a preliminary subject analysis”. In: The Library s6-VIII.1 (1986), pp. 32–46. doi: 10.1093/library/s6-VIII.1.32. url: https://doi.org/10.1093/library/s6-VIII.1.32.

[5] Gale. Eighteenth Century Collections Online. url: https://www.gale.com/intl/primary-sources/eighteenth-century-collections-online.

[6] Goyal and V. Prem Prakash. “Statistical and Deep Learning Approaches for Literary Genre Classification”. In: Advances in Data and Information Sciences. Ed. by Tiwari, M. C. Trivedi, M. L. Kolhe, Mishra, and B. K. Singh. Vol. 318. Singapore: Springer Singapore, 2022, pp. 297–305. doi: 10.1007/978-981-16-5689-7_26. url: https://link.springer.com/10.1007/978-981-16-5689-7_26.

[7] Gupta, Agarwal, and Jain. “Automated Genre Classification of Books Using Machine Learning and Natural Language Processing”. In: 2019 9th International Conference on Cloud Computing, Data Science & Engineering (Confluence). Noida, India: IEEE, 2019, pp. 269–272. doi: 10.1109/confluence.2019.8776935. url: https://ieeexplore.ieee.org/document/8776935/.

[8] M. J. Hill and Hengchen. “Quantifying the impact of dirty OCR on historical text analysis: Eighteenth Century Collections Online as a case study”. In: Digital Scholarship in the Humanities 34.4 (2019), pp. 825–843. doi: 10.1093/llc/fqz024. url: https://academic.oup.com/dsh/article/34/4/825/5476122.

[9] Jaegle, Borgeaud, J.-B. Alayrac, C. Doersch, C. Ionescu, D. Ding, S. Koppula, D. Zoran, A. Brock, E. Shelhamer, O. Hénaff, M. M. Botvinick, A. Zisserman, O. Vinyals, and J. Carreira. Perceiver IO: A General Architecture for Structured Inputs & Outputs. 2021. doi: 10.48550/arxiv.2107.14795. url: https://arxiv.org/abs/2107.14795.

[10] Jiang, Hu, G. Worthey, R. C. Dubnicek, Underwood, and J. S. Downie. “Impact of OCR Quality on BERT Embeddings in the Domain Classification of Book Excerpts”. In: CHR 2021, pp. 266–279.

[11] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. “BERT: Pre-training of deep bidirectional transformers for language understanding”. In: Proceedings of NAACL-HLT. 2019, pp. 4171–4186.

[12] Kleinberg. “Bursty and Hierarchical Structure in Streams”. In: Data Mining and Knowledge Discovery 7.4 (2003), pp. 373–397. doi: 10.1023/a:1024940629314. url: https://doi.org/10.1023/A:1024940629314.

[13] Labusch, Kulturbesitz, Neudecker, and Zellhöfer. “BERT for named entity recognition in contemporary and historical German”. In: Proceedings of the 15th Conference on Natural Language Processing. 2019, pp. 9–11.

[14] Lahti, E. Mäkelä, and Tolonen. “Quantifying Bias and Uncertainty in Historical Data Collections with Probabilistic Programming”. In: (2020). url: https://helda.helsinki.fi/handle/10138/327728.

[15] Lahti, Marjanen, Roivainen, and Tolonen. “Bibliographic Data Science and the History of the Book (c. 1500–1800)”. In: Cataloging & Classification Quarterly 57.1 (2019), pp. 5–23. doi: 10.1080/01639374.2018.1543747. url: https://doi.org/10.1080/01639374.2018.1543747.

[16] Manjavacas and Fonteyn. “Adapting vs. Pre-training Language Models for Historical Languages”. In: Journal of Data Mining & Digital Humanities NLP4DH (2022). doi: 10.46298/jdmdh.9152. url: https://jdmdh.episciences.org/9690.

[17] Manson. A catalogue of the entire and genuine library and prints of Robert Salusbury Gotton, Esq. F.A.S. [electronic resource]: Comprehending an extensive and valuable collection of books of coins, medals and antiquities, with a few fink missals and other manuscripts on vellum, which, with some other select parcels of books lately purchased, are now on sale for ready money, at the price printed in the catalogue, and on the first leaf of each book, By John Manson, bookseller, No 5, Duke's-Court, St. Martin's-Lane, where catalogues (Price 6d) may be had. [London], 1789. [2], 102 pages.

[18] Mazella, Willan, Bishop, Stravoski, Barta, and James. “‘All the modes of story’: Genre and the Gendering of Authorship in the Year 1771”. In: ABO: Interactive Journal for Women in the Arts, 1640–1830 12.1 (2022). doi: 10.5038/2157-7129.12.1.1256. url: https://digitalcommons.usf.edu/abo/vol12/iss1/10.

[19] Moretti. Distant reading. London; New York: Verso, 2013.

[21] Rastas, Y. C. Ryan, I. L. I. Tiihonen, Qaraei, Repo, Babbar, E. Mäkelä, Tolonen, and Ginter. “Explainable Publication Year Prediction of Eighteenth Century Texts with the BERT Model”. In: Proceedings of the 3rd Workshop on Computational Approaches to Historical Language Change. The Association for Computational Linguistics. 2022.

[22] Raven. The business of books: booksellers and the English book trade, 1450–1850. New Haven: Yale University Press, 2007.

[23] I. Rivers, ed. Books and their readers in eighteenth century England. Leicester: Leicester Univ. Press [u.a.], 1982.

[24] I. Rivers, ed. Books and their readers in eighteenth-century England: new essays. London; New York: Leicester University Press, 2001.

[25] Schmidt, Dennerlein, and C. Wolff. “Emotion Classification in German Plays with Transformer-based Language Models Pretrained on Historical and Contemporary Language”. In: Association for Computational Linguistics. 2021.

[26] Schmidt, Dennerlein, and C. Wolff. “Using Deep Learning for Emotion Analysis of 18th and 19th Century German Plays”. In: (2021).

[27] Schweter and März. “Triple E-Effective Ensembling of Embeddings and Language Models for NER of Historical German”. In: CLEF (Working Notes). 2020.

[28] Sutskever, Martens, G. Dahl, and Hinton. “On the importance of initialization and momentum in deep learning”. In: International Conference on Machine Learning. PMLR. 2013, pp. 1139–1147.

[29] Todorov and G. Colavizza. “An Assessment of the Impact of OCR Noise on Language Models”. In: arXiv preprint arXiv:2202.00470 (2022).

[30] Tolonen, Mäkelä, Ijaz, and Lahti. “Corpus Linguistics and Eighteenth Century Collections Online (ECCO)”. In: Research in Corpus Linguistics 9.1 (2021), pp. 19–34. doi: 10.32714/ricl.09.01.03. url: https://ricl.aelinco.es/index.php/ricl/article/view/161.

[31] Tolonen, Mäkelä, Ijaz, and

[32] Tolonen, E. Mäkelä, and Lahti. “The Anatomy of Eighteenth Century Collections Online (ECCO)”. In: Eighteenth-Century Studies 56.1 (2022), pp. 95–123.

[33] Underwood. Distant horizons: digital evidence and literary change. Chicago: The University of Chicago Press, 2019.

[34] Underwood. “Genre Theory and Historicism”. In: Journal of Cultural Analytics 2.2 (2016). doi: 10.22148/16.008. url: https://culturalanalytics.org/article/11063.

[35] Underwood. “The Life Cycles of Genres”. In: Journal of Cultural Analytics 2.2 (2016). doi: 10.22148/16.005. url: https://culturalanalytics.org/article/11061.

[36] Underwood. “Understanding Genre in a Collection of a Million Volumes, Interim Report”. In: (2014). doi: 10.6084/m9.figshare.1281251.v1. url: https://figshare.com/articles/journal_contribution/Understanding_Genre_in_a_Collection_of_a_Million_Volumes_Interim_Report/1281251.

[37] Underwood, M. L. Black, Auvil, and B. Capitanu. Mapping Mutable Genres in Structurally Complex Volumes. 2013. doi: 10.1109/BigData.2013.6691676. url: http://arxiv.org/abs/1309.3323.

[38] Worsham and Kalita. “Genre Identification and the Compositional Effect of Genre in Literature”. In: Proceedings of the 27th International Conference on Computational Linguistics. Santa Fe, New Mexico, USA: Association for Computational Linguistics, 2018, pp. 1963–1973. url: https://aclanthology.org/C18-1167.

[39] Yoo, Jin, Son, Bak, Cho, and Oh. “HUE: Pretrained Model and Dataset for Understanding Hanja Documents of Ancient Korea”. In: Findings of the Association for Computational Linguistics: NAACL 2022. Seattle, United States: Association for Computational Linguistics, 2022, pp. 1832–1844. url: https://aclanthology.org/2022.findings-naacl.140.