Probabilistic Analysis of Early Modern British Book Prices

Iiro Tiihonen1,2, Mikko Tolonen1 and Leo Lahti2
1 Faculty of Humanities, University of Helsinki, Finland
2 Faculty of Technology, University of Turku, Finland

Abstract
Books are a valuable exception to the general rule that quantitative information about early modern history is scarce: their survival rate for the period varies between low and high tens of percent, and descriptive information summarising their properties has been collected in library catalogues. However, one critical element that is essential for the numeric characterisation of a print product is most often missing: its price. In this paper, we use an exceptionally large data set of price information extracted from the English Short Title Catalogue (ESTC) for the early modern period to train a probabilistic model that predicts the price of a print product based on its physical properties. Our results suggest that the simple physical properties of print products alone can explain a significant proportion of the variation in prices. We use the model to quantitatively address the debated question of how print product prices developed in eighteenth-century Britain. We interpret the predictions of the model as a data-driven narrative, and many of the developments it brings up can be readily linked with the relevant historical literature.

Keywords
bibliographic data science, book history, book prices, statistical modeling, early modern period

1. Introduction
Bibliographic data gives insight into various historical phenomena, from vernacularisation and political crises to the effect of books themselves on early modern society. As bibliographic data is transformed into machine-readable form, the application of modern data science becomes possible [10, 15, 8].
Price is recognised as an important aspect of bibliographical information in various branches of historical and social scientific literature, as it can be used to approximate both how wealth determined access to information and culture and how the book trade functioned [5, 13, 14, 12, 3, 1]. As questions about unequal access to information and culture and about the way the (book) trade worked are relevant for the understanding of early modern societies, the price of a print product is a valuable piece of information. Unfortunately, it is often not available, and even analyses that make use of relatively large bibliographical data sets become much more limited where price is concerned [6].

Our starting point is one of the historical debates for which the question of book prices is essential. The core of this debate is the effect of the 1774 affirmation of the Statute of Anne, which limited copyrights of books in Great Britain from perpetual ownership to a term of 14 years. By applying a quantitative model that describes book price formation, we could untangle the variation related to basic physical attributes, and we obtained results that suggest that the debated decrease of prices did indeed happen. The model itself is the most important result of the paper, as expanding on it opens the possibility for further historical research and extrapolation of data.

CHR 2021: Computational Humanities Research Conference, November 17–19, 2021, Amsterdam, The Netherlands
Email: iiro.tiihonen@helsinki.fi (I. Tiihonen); mikko.tolonen@helsinki.fi (M. Tolonen); leo.lahti@utu.fi (L. Lahti)
ORCID: 0000-0003-0703-4556 (I. Tiihonen); 0000-0003-2892-8911 (M. Tolonen); 0000-0001-5537-637X (L. Lahti)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073
There have been attempts to establish overviews of book prices and their variation in different periods and regions of early modern Europe [5, 13, 3]. Economists have been at the forefront of formal quantitative work on early modern book prices, and have used mathematical models for purposes such as the derivation of general-level time series [4, 16] and hypothesis testing regarding the impact of specific model variables, such as indicators of a competitive business environment and logistic costs [6]. We use our model for hypothesis testing as well. However, we also stress that our model has more general value as a tool for extrapolation and further hypothesis testing. We argue that carefully designed quantitative models open up a much wider array of possibilities for a detailed, systematic characterisation of the factors underlying the observed price variation [9]. Moreover, noteworthy patterns in price formation can also be detected by identifying systematic patterns that the quantitative model cannot explain.

This article describes and implements a model that aims to quantify the influence of physical attributes on the observed price variation. These factors already give significant insight into the formation of price for the majority of print products, and validation on held-out data indicates that our model can also predict price variation for new documents with relatively good accuracy.

The data set we use is based on the English Short Title Catalogue (ESTC), a union catalogue of early modern print products of the British Isles and North America, compiled from the metadata of various libraries, most notably the British national library. The tabular version used in this paper is the result of a long parsing, harmonisation and enrichment process by the Helsinki Computational History Group (COMHIS). Critically, this version of the ESTC also includes price information for roughly 30000 records.
Of these, 28274 records with a parsed price were deemed reliable enough in their information for modeling (see footnote 1). The data collection that we use is an order of magnitude larger than most data sets on early modern book prices. A recent study used a data set of 3000 purchase prices for books from the entirety of Reformation-era Europe, and characterised it as unique data for the study of book markets [6]. Another study considers an inventory of a Venetian book shop with 12 000 entries “one of the most extensive and significant sources for the study of book prices in early modern Europe.” [1]. Our data set provides exceptionally good coverage for eighteenth-century Great Britain and additionally includes remarkably rich background data on the documents’ dimensions, genres, and other properties [8]. In addition, British economic history and related background information are better known than perhaps those of any other European country, allowing us to normalise the prices with respect to a price index.

Footnote 1. We removed the categories of advertisements, pilot guides and prospectuses, as the prices mentioned in these records were most often for another product. We also removed periodicals and many kinds of public records: multiple issues and parts are often collected together in the ESTC, and it was often unclear whether the price was for a single issue or part only. We also removed cases where we found that the harmonised ESTC’s information for relevant variables like page count or plates was wrong. When found, erroneous prices were corrected if possible and removed otherwise.

Figure 1: The observations used in this article, 28251 in total, by publication year. Most of the price information is from the eighteenth century.

2. Materials and Methods

2.1. Data and Preprocessing
The price information within the ESTC is scattered across the general notes field 500a of this MARC-annotated catalogue. This means that the price information had to be parsed from natural language.
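To make the parsing step concrete, the following is a minimal sketch of how a price might be pulled out of a free-text note and converted to pence. It is not the parser used for the harmonised ESTC: the note wording, the `AMOUNT` pattern and the function names are all assumptions made for illustration, and real notes are far more varied. The only facts relied on are the pre-decimal conversions (12 pence to the shilling, 20 shillings to the pound).

```python
import re

PENCE_PER_SHILLING = 12
SHILLINGS_PER_POUND = 20

# Hypothetical amount pattern: optional pounds ("2l."), shillings ("1s.") and
# pence ("6d."), in that order. Real catalogue notes are far more varied.
AMOUNT = re.compile(
    r"(?:(?P<pounds>\d+)\s*l\.\s*)?"
    r"(?:(?P<shillings>\d+)\s*s\.\s*)?"
    r"(?:(?P<pence>\d+)\s*d\.)?"
)


def prices_in_pence(note):
    """Return every amount found in a note, converted to pence."""
    if "price" not in note.lower():
        return []  # no price keyword, so do not guess
    found = []
    for match in AMOUNT.finditer(note):
        parts = match.groupdict()
        if not any(parts.values()):
            continue  # the pattern is fully optional, so skip empty matches
        pounds = int(parts["pounds"] or 0)
        shillings = int(parts["shillings"] or 0)
        pence = int(parts["pence"] or 0)
        total_shillings = pounds * SHILLINGS_PER_POUND + shillings
        found.append(total_shillings * PENCE_PER_SHILLING + pence)
    return found


def smallest_price(note):
    """Smallest price mentioned in the note, or None if none was found."""
    found = prices_in_pence(note)
    return min(found) if found else None
```

A sketch like this will misread some notes (for example an abbreviation that happens to look like an amount), which is one reason why manual checks of suspicious values remain necessary.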
Roughly 6 percent of the half million ESTC records have a parsed price. In some cases, multiple prices are provided for different bindings, along with discounts for those who bought multiple copies, and sometimes the word price appears in contexts other than the price of the record. These more complicated cases sometimes led to errors in the extraction of the price. The most suspicious prices (e.g. very high ones) were manually checked and corrected if necessary. Unfortunately, there is no information on why the note about price was made for these specific records, making it hard to evaluate how representative our data set is.

Prior to modeling, we made choices of data processing and selection to focus our analysis. We selected the smallest price, adjusted it to the general price level and focused on the years 1680-1800. The choice of the smallest price might matter, as variants with a greater price might follow a different logic of pricing. To adjust prices to the overall price level of the publishing year, and to be able to use the same price for normal linear and Poisson models, we regularised each book price with the following formula:

$$y_{ij} = \operatorname{int}\left(\frac{y^{*}_{ij}}{p_j}\right) \tag{1}$$

where $y_{ij}$ is the regularised price of the $i$th print product published in the year $j$, $y^{*}_{ij}$ is the original price in decimals of pence, $p_j$ is the price index of London for the year $j$ from the Prices and Wages in London and Southern England, 1259-1914 data set (see footnote 2), and the operator $\operatorname{int}$ rounds the obtained value to an integer.

We limited our modeling to the observations from the period 1680-1800 for three reasons: data prior to that period is very scarce, our main interest is the eighteenth century, and the exchange rate of our main monetary units (penny, shilling, pound) stayed stable during the selected period [7]. This selection dropped the number of observations to 28251.

2.2.
Quantitative Model

Existing literature made it reasonable to assume that the amount of paper and other materials used in a print product would be connected to its price [11, 14]. As the harmonised ESTC data collection also has estimates of these attributes, they made for a good starting point for modeling prices. The paper and plate size of the print product are both fixed effects in our model. They affect the price in a linear fashion: an increase of a fixed size (e.g. the addition of a sheet of paper) to a print product always adds the same amount to its price. Linearity is both convenient to model and roughly corresponds with the notion that materials formed a significant part of the production costs of books in the early modern period.

We also had multiple reasons to assume that this general trend would not hold at all for some works. The phenomenon of luxury printing [13] is known in the literature of book history. The high price of a luxury item might not be strongly related to its basic physical properties, but to prestige or some other quality that is hard to measure. The most highly priced print products were in general very difficult for our model. Thus, our attempt was to capture a general association of price and size, but we are aware that it is prone to large errors. Additionally, both the price and the predictor variables include instances of false information, which can lead to severe errors.

These considerations motivated us to use a Bayesian regression model in which the error term is Student’s t-distributed. As this distribution has heavier tails than the normal distribution usually used in regression models, the model is less affected by deviations from the general trend. The posterior predictions of a Bayesian model are also more robust, as they include the uncertainty in the parameters.
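The robustness argument can be illustrated numerically. The sketch below compares the log-density that a Student's t error model and a normal error model assign to residuals of increasing size; the value 2.02 for the degrees of freedom is the one used in the model defined in this section, while the unit scale and the chosen residual sizes are arbitrary illustrative assumptions.

```python
import math


def student_t_logpdf(y, df, loc, scale):
    """Log-density of a location-scale Student's t distribution."""
    z = (y - loc) / scale
    return (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
        - math.log(scale)
        - (df + 1) / 2 * math.log1p(z * z / df)
    )


def normal_logpdf(y, loc, scale):
    """Log-density of a normal distribution."""
    z = (y - loc) / scale
    return -0.5 * math.log(2 * math.pi) - math.log(scale) - 0.5 * z * z


# Residuals measured in units of the scale parameter (illustrative values).
for residual in (0.0, 2.0, 10.0):
    t_value = student_t_logpdf(residual, 2.02, 0.0, 1.0)
    n_value = normal_logpdf(residual, 0.0, 1.0)
    print(f"residual={residual:5.1f}  t={t_value:8.2f}  normal={n_value:8.2f}")
```

For a residual ten scale units from the trend, the normal log-density is dominated by the quadratic term, whereas the t log-density grows only logarithmically. A single extreme price therefore pulls the fitted line much less under the t model.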
Our model is defined as follows. Let $Y$ be the price of a print product, $x$ the vector consisting of the constant 1 and the associated features of physical properties (size in terms of paper and plates; paper and plate pages are assumed to be of the same size for the same observation), and $\beta = (\beta_0, \beta_1, \beta_2)$ the vector of the effects of the constant and of the physical features. Now, our regression model (note the ’vectorised’ notation) is defined as

$$Y \mid \beta, \sigma, X \sim T(2.02,\; x^{T}\beta,\; \sigma) \tag{2}$$
$$\beta \sim T(1, 0, 1) \tag{3}$$
$$\sigma \sim IG(1, 1) \tag{4}$$

where $T$ denotes the Student’s t-distribution (with parameters degrees of freedom, location and scale) and $IG$ the inverse gamma distribution. The model fitting was implemented with Stan. The paper and plate variables were standardised prior to fitting. The data was split into 50% training and 50% test sets.

Footnote 2. Available at https://gpih.ucdavis.edu/Datafilelist.htm

Figure 2: Predicted price (mean of the posterior predictive distribution) and residuals divided by the standard deviation of the test data. The visualisations were done with different X and Y axes to demonstrate the importance of variation by price and the effect of outliers. 814 observations of the total test data are missing from the top panel and 243 from the middle panel.

The posterior distribution of the parameters and the evaluation metrics of the predictions are summarised in Table 1 (see footnote 3). A one standard deviation change in the amount of paper has a larger effect than a one standard deviation change in plates. However, as the majority of observations do not have plates, a one standard deviation increase in plates should not be directly compared with a one standard deviation increase in paper, as the latter corresponds to a larger physical increase. As plate estimates are also more prone to large errors than paper estimates, the posterior estimate for the effect of plates is less reliable.

Table 1 Posterior Distribution of Parameters and Model Performance Evaluation Metrics.
Parameter        1st Quantile   Median   3rd Quantile
β0 (constant)    38             38       38
β1 (paper)       56             56       56
β2 (plate)       11             11       12
σ                9.4            9.4      9.5

Metric           Value
R2               0.46
EMAE             0.4

3. Results
We applied the model to evaluate the price trends of early modern Great Britain. These predictions are illustrated in Figure 3, which plots the residuals of our model by year and publication place for roughly average-sized print products. Here the residual of an observation is defined as

$$\epsilon_i = Y_i - \hat{Y}_i \tag{5}$$

where $Y_i$ is the real and $\hat{Y}_i$ the predicted price of a print product. If the residuals of a certain period differ significantly from 0, we can get insight into potential temporal trends of prices without modeling them explicitly. Negative residuals indicate that the model overpredicts prices, and vice versa for positive residuals. As the residuals increase as a function of the print product's size, we limited the analysis to print products that were of average size (within half a standard deviation of zero on the standardised scale) in terms of paper and plates. Because the model accounts for these physical properties, variation in residuals over time should not be explained by (possible) associated changes in the physical composition of print products. We focused on the median and middle half of the residuals to decrease the influence of outliers.

Our main question was the disputed effect of the affirmation of the Statute of Anne in 1774, which limited copyrights of books in Great Britain from perpetual to a length of 14 years. Some have seen this as a revolutionary turning point that significantly affected the prices, and hence the accessibility, of print products and their contents [2], while others have challenged the impact of legislation on prices and the related claim about overpriced printing prior to 1774 [14, pp. 25–30].
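The residual summary described in this section (restrict to roughly average-sized products, scale the residuals, then take the median and middle half per year) can be sketched as follows. This is an illustration under stated assumptions rather than the script behind the figures: the record field names (`year`, `price`, `predicted`, `paper_z`, `plates_z`) are hypothetical, the population standard deviation is used for scaling, and the quartiles use the inclusive interpolation method of the Python standard library.

```python
from collections import defaultdict
from statistics import median, pstdev, quantiles


def yearly_residual_summary(records, half_width=0.5):
    """Map each year to (1st quartile, median, 3rd quartile) of scaled residuals.

    Each record is a dict with hypothetical keys: year, price, predicted,
    paper_z and plates_z (the last two on the standardised scale).
    """
    records = list(records)
    # Scale by the standard deviation of all observed prices in the data.
    scale = pstdev([r["price"] for r in records])

    by_year = defaultdict(list)
    for r in records:
        average_sized = (
            abs(r["paper_z"]) <= half_width and abs(r["plates_z"]) <= half_width
        )
        if average_sized:
            # Residual = observed price minus predicted price, as in equation (5).
            by_year[r["year"]].append((r["price"] - r["predicted"]) / scale)

    summary = {}
    for year, residuals in sorted(by_year.items()):
        if len(residuals) >= 2:
            q1, _, q3 = quantiles(residuals, n=4, method="inclusive")
        else:
            q1 = q3 = residuals[0]  # a single observation has no spread
        summary[year] = (q1, median(residuals), q3)
    return summary
```

A year whose median lies clearly below zero is one in which the model overpredicts typical prices, which is how a fall in the price level would show up without time being part of the model.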
The larger context for which the question is relevant is how we understand eighteenth-century British history with respect to access to culture and information: did the changes in legislation create a phenomenon of greater access to books? For this question, the national character of the ESTC (it overwhelmingly consists of British print products) is ideal. Additionally, the question is important for understanding argued differences in printing (price and quality) between Britain and continental Europe.

The first decades of the eighteenth century in Figure 3 could be interpreted as an increase of prices until the 1730s, after which a long, relatively stationary period follows. This increase is interesting, as the Licensing of the Press Act lapsed in 1695. This lapse created a situation where the monopoly of the Stationers’ Company, the guild responsible for printing, ceased. Our time series does not reflect this change, which arguably opened the way for cheaper competing printing (see footnote 4).

Footnote 3. EMAE = Explained Mean Absolute Error. Defined as the proportion of the Mean Absolute Error explained by the model compared to the Mean Absolute Error of always predicting the median of the test data.

Footnote 4. This absence of any noticeable effect caused by the change of legislation might be explained by the possibilities for cheap competition being exploited in places other than London.

Figure 3: The median residuals of the test data for roughly average-sized (standardised plates and paper within ± 0.5 SD of zero) print products 1680-1800 in London. The interval runs from the 1st to the 3rd quartile of the residuals. Residuals are standardised by dividing by the standard deviation of the test data that was used to make this figure.

The increase is followed by stagnation that lasts until the last decade of the eighteenth century. This is interesting considering the claims about the structure of the British book trade of the period that make the affirmation of the Statute of Anne a noteworthy event.
The change that the Statute arguably caused was to break the monopolistic arrangement of the book trade, centred around control of valuable copyrights that gave those holding them control over the price. This control meant that prices would (mostly) not fall if the work was not free from copyright restrictions [13]. The stagnation could be interpreted as a stability of high prices made possible by the power that publishers (copyright holders) had over the prices. This stable state is followed by a significant decrease in the 1790s and the first year of the nineteenth century. The decrease would align well with views that claim that monopolies and artificially high prices started crumbling at the end of the century. However, the change in residuals does not align perfectly with the change in legislation. One possible explanation is that the effect of the change in legislation increased over time, as larger quantities of print products became free from copyright. Although we cannot argue that we observed the causal workings of the copyright monopoly and its collapse, we can say that if that narrative were true, the variation we observed would make sense.

Our initial results differ from the results of earlier quantitative approaches. One index of the book purchasing power of the average eighteenth-century Briton suggests that books became significantly harder to acquire for the average person during the latter half of the eighteenth century [13]. This result is contrary to ours, and Gregory Clark’s Productivity in Book Production in England, 1470–1860 index also shows a late eighteenth-century decrease in the ratio of wages to book prices [3].

The model's performance was evaluated both for the sake of the specific historical research case and to see how much potential it has as a more general tool of quantitative historical research.
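The two evaluation metrics used in this article can be written out as a short sketch. The EMAE function follows one reading of the definition given in footnote 3, namely one minus the ratio of the model's Mean Absolute Error to that of always predicting the median of the test data; that reading, the function names and the toy numbers are assumptions for illustration.

```python
from statistics import mean, median


def explained_mae(observed, predicted):
    """Share of the median-baseline Mean Absolute Error removed by the model."""
    baseline = median(observed)
    mae_model = mean([abs(o - p) for o, p in zip(observed, predicted)])
    mae_baseline = mean([abs(o - baseline) for o in observed])
    return 1 - mae_model / mae_baseline


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    centre = mean(observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - centre) ** 2 for o in observed)
    return 1 - ss_residual / ss_total


# Toy data with one expensive outlier that the predictions underestimate.
observed = [1, 2, 3, 4, 10]
predicted = [1, 2, 3, 4, 6]
print(explained_mae(observed, predicted), r_squared(observed, predicted))
```

Because R2 squares each error while EMAE takes absolute values, a few very large misses move R2 more than EMAE.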
Figure 2 demonstrates that the model captures the trend of the majority of works quite well, but struggles with some cases, overpredicting or underpredicting them by a large margin. The model explains 46 percent of the variation in terms of R2 and 40 percent of the Mean Absolute Error compared to the median of the test data. The latter metric is more reliable, as R2 is more heavily affected by outliers, some of which were removed because of unreliable paper or plate information in the harmonised ESTC. As it is most likely that the data still contains erroneous information, we argue that our robust model captures an association of physical properties and price, and that this association might be stronger than indicated by the evaluation metrics. At the same time, some outliers are most likely the products of a different logic of pricing, e.g. luxury products.

4. Conclusions
The model succeeded in the task that motivated its creation, as it provided initial insight into the question of eighteenth-century book price trends in Great Britain. The disputed decrease of prices was present in our analysis. Interestingly, our results differed from some of the earlier time series that approximate the relative prices of print products. A more thorough comparison and standardisation of these different quantitative approaches, including ours, could help the discourse about eighteenth-century book prices to converge.

Meaningful application of the model in historical research requires evaluation of its performance. Our evaluation suggests that the model has significant predictive and explanatory power. This encourages us to develop the model further. The incorporation of spatiotemporal effects and the addition of publisher and edition (first or re-publication) information could improve the model's performance and usefulness significantly. It is also important to remember that there were significant outliers, and these might represent separate domain(s) of pricing.
Detection and analysis of these cases in a quantitative manner is a possible topic for further research.

The most definite result is the proof of concept. We have demonstrated the usefulness of our model both in terms of technical measures and as a tool whose predictions can be interpreted as a historical narrative. There are various other applications for the model as well. For example, it can be used for extrapolating prices for the much larger set of works in the ESTC that do not have price information but do have the features used in the prediction. Hopefully both the technical development and the application of the model will continue in the future.

Acknowledgments
We thank the participants of the seminar of the Academy of Finland project “Rise of commercial society and eighteenth-century publishing” (333716), Richard Sher and the Helsinki Computational History Group members for discussions and previous work that made this article possible. We thank the reviewers for comments that helped us to improve this article. We thank AI Academy, University of Turku, for funding the work on this article.

References
[1] F. Ammannati and A. Nuovo. “Investigating Book Prices in Early Modern Europe: Questions and Sources”. In: JLIS.it 8.3 (2017).
[2] W. St Clair. “The Political Economy of Reading”. In: The John Coffin Memorial Lecture in the History of the Book. 2005. url: https://ies.sas.ac.uk/sites/default/files/files/Publications/StClair_PolEcReading_2012.pdf.
[3] G. Clark. A Farewell to Alms: A Brief Economic History of the World. Princeton University Press, 2007.
[4] G. Clark. “Lifestyles of the Rich and Famous: Living Costs of the Rich versus the Poor in England, 1209-1869”. In: Towards a global history of prices and wages. 2004. url: http://www.iisg.nl/hpw/papers/clark.pdf.
[5] J. Dittmar. “Book Prices in Early Modern Europe: an Economic Perspective”. In: Buying and Selling: The Business of Books in Early Modern Europe. Ed. by S. Graheli. Brill, 2019, pp. 72–87.
[6] J. Dittmar and S. Seabold. “New Media and Competition: Printing and Europe’s Transformation After Gutenberg”. In: Conditionally Accepted to the Journal of Political Economy (2021).
[7] C. Emsley, T. Hitchcock, and R. Shoemaker. London History - Currency, Coinage and the Cost of Living. 2021. url: www.oldbaileyonline.org.
[8] L. Lahti, N. Ilomäki, and M. Tolonen. “A Quantitative Study of History in the English Short-Title Catalogue (ESTC), 1470-1800”. In: LIBER Quarterly 25.2 (2015), pp. 87–116. doi: http://doi.org/10.18352/lq.10112.
[9] L. Lahti, E. Mäkelä, and M. Tolonen. “Quantifying Bias and Uncertainty in Historical Data Collections With Probabilistic Programming”. In: Proc. CEUR Workshop Proceedings on Computational Humanities Research. 2020, pp. 80–89. url: http://ceur-ws.org/Vol-2723/short46.pdf.
[10] L. Lahti, J. Marjanen, H. Roivainen, and M. Tolonen. “Bibliographic Data Science and the History of the Book (c. 1500–1800)”. In: Cataloging & Classification Quarterly 57.1 (2019), pp. 5–23. doi: 10.1080/01639374.2018.1543747.
[11] D. McKenzie. “Printing and Publishing 1557–1700: Constraints on the London Book Trades”. In: The Cambridge History of the Book in Britain. Ed. by J. Barnard and D. McKenzie. Vol. 4. The Cambridge History of the Book in Britain. Cambridge University Press, 2002, pp. 553–567. doi: 10.1017/chol9780521661829.028.
[12] S. Pinker. The Better Angels of Our Nature: Why Violence Has Declined. Viking Books, 2011.
[13] J. Raven. “Book as a Commodity”. In: The Cambridge History of the Book in Britain. Ed. by M. Suarez and M. Turner. Vol. 5. The Cambridge History of the Book in Britain. Cambridge University Press, 2009, pp. 83–117. doi: 10.1017/chol9780521810173.005.
[14] R. Sher. The Enlightenment and the Book: Scottish Authors and Their Publishers in Eighteenth-Century Britain, Ireland, and America. The University of Chicago Press, 2006.
[15] M. Tolonen, L. Lahti, H. Roivainen, and J. Marjanen. “A Quantitative Approach to Book-Printing in Sweden and Finland, 1640–1828”. In: Historical Methods: A Journal of Quantitative and Interdisciplinary History 52.1 (2019), pp. 57–78. doi: 10.1080/01615440.2018.1526657.
[16] J. van Zanden. The Long Road to the Industrial Revolution: The European Economy in a Global Perspective, 1000-1800. Brill, 2009.

A. Computational Reproducibility
The script used to produce the analyses in this article is available at: https://github.com/COMHIS/article_2021_probabilistic_analysis_of_early_modern_british_book_prices.