=Paper=
{{Paper
|id=Vol-2723/short21
|storemode=property
|title=Networking Archives: Quantitative History and the Contingent Archive
|pdfUrl=https://ceur-ws.org/Vol-2723/short21.pdf
|volume=Vol-2723
|authors=Yann Ryan,Sebastian Ahnert,Ruth Ahnert
|dblpUrl=https://dblp.org/rec/conf/chr/RyanAA20
}}
==Networking Archives: Quantitative History and the Contingent Archive==
Yann Ryan (a), Sebastian Ahnert (b) and Ruth Ahnert (a)

(a) Queen Mary, University of London, Mile End Road, London E1 6NS

(b) University of Cambridge, Cambridge CB2 1TN

===Abstract===
Recent years have seen a growth in the use of network analysis on large datasets of correspondence, but studies of the epistemological basis for findings have not seen a commensurate increase. The latter are important because although large, these datasets can only ever represent a fraction of the total available correspondence, and most historically contingent letter archives have significant amounts of missing or uncertain data or records. This paper outlines three approaches to the study of missing network data. First, we suggest some strategies for dealing with missing data, beginning with understanding in detail its type and extent. Second, we outline a method for understanding the effect that missing data has specifically on historical letter archives, which compares rank correlations of metrics between the full network and progressively smaller random sub-samples. The experiments show that the most basic metric of network structure, degree, is remarkably robust to random letter removal even when large samples of letters have been removed. Last, the paper argues that the combinatory effect of joined-up letter networks can be used to further the understanding of the structure of seventeenth-century letter networks and intellectual exchange.

Keywords: network analysis, archival history, missing data, state paper office, seventeenth century

===1. Introduction===
‘Archives are the … laboratories of the historian’ [19, p. 9]. Alexandra Walsham’s metaphor opens her history of early modern archives, gesturing to the way that archives both supply and shape the narratives that historians have written.
Archives have become laboratories in another way too, providing us with the metadata and digitised collections for computational experiments. However, as Walsham rightly notes, scholars have ‘rarely paused to consider how and why these repositories came into being, despite the fact that these processes have fundamentally shaped and coloured our knowledge of the past’ [19, p. 9]. This is as true of the digital historian as it is of the analogue: digital scholarship is good at grappling with the relationship between the part and the whole when the whole is the digital collection or corpora with which scholars are working, but there remains relatively little work on how representative those archives and collections are of the larger contexts within which they sit. Andrew Piper argues that ‘where the social sciences often speak in terms of “samples” and “bias”, the notion of “representativeness” suggests there is not ultimately some stable, knowable whole against which to limit one’s “bias”’ [13, p. 9]. Whilst as a field we have moved past the claims that large data sets give us any kind of comprehensiveness (see [8, pp. 7–8] and rebuttal [2]), there is still little work that shows how the models developed in the field of computational humanities are affected by the historically specific and contingent histories of the archives they leverage.

CHR 2020: Workshop on Computational Humanities Research, November 18–20, 2020, Amsterdam, The Netherlands. Email: y.c.y.ryan@qmul.ac.uk (Y. Ryan); sea31@cam.ac.uk (S. Ahnert); r.r.ahnert@qmul.ac.uk (R. Ahnert). Web: https://www.phy.cam.ac.uk/directory/ahnerts (S. Ahnert); https://www.qmul.ac.uk/sed/staff/ahnertr.html (R. Ahnert). ORCID: 0000-0002-0877-7063 (Y. Ryan); 0000-0002-8503-1580 (R. Ahnert). © 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073.

This paper seeks to demonstrate ways that the contours of archives might be quantitatively described in order to understand the impact of those contingent histories on computational analysis of that archive at scale, and applies the methods to one such union of multiple large collections of correspondence. We suggest that these quantitative measures may be useful for others working with similarly large yet idiosyncratic network datasets.

====1.1. Networking Archives - a case study====
The project from which this work arises, ‘Networking Archives’, seeks to reconstruct epistolary networks by bringing together metadata from multiple early modern archives and catalogues to form a meta-archive of ~450,000 letters. The analysis and techniques described in this paper are a work in progress from a larger project: outputs will include a monograph as well as the code used for this analysis and others.

The letter, by design, is a technology of dispersal [6, p. 9]. The effort to re-assemble the oeuvre of specific letter writers or communities lies behind the tradition of the ‘collected correspondence’: edited volumes that reunite the incoming and outgoing missives of a given individual or individuals. Online, fully-searchable repositories of correspondence text and metadata, such as The Electronic Enlightenment and the Epsilon project, bring with them the potential for researchers to aggregate correspondence data at scale, in order to begin the task of reconstructing communities of knowledge. The model of aggregation can be extrapolated to include ever larger bodies of letters. But the same problem remains: we are dealing with overlapping partial worldviews. The question is how we can model the impact of that partiality on the knowledge we can reconstruct. In this paper, intended as a report on work-in-progress, we share the ways we are dealing with and acknowledging partial and perspectival data.

The networks of correspondence which form the basis of this project come from two digital sources: Early Modern Letters Online (EMLO) and a curated, cleaned dataset derived from State Papers Online (SPO) [note 1]. EMLO is a union of more than a hundred individual catalogues of correspondence, including a large core of metadata from digitised versions of the Bodleian card catalogue (BCC hereafter). This card catalogue was the product of work by two twentieth-century employees and one volunteer in the Bodleian and ultimately based on individual and idiosyncratic acquisitions by the Library over time, resulting in an ‘ad hoc’ and ‘iterative’ set of metadata [10]. EMLO has been formed for the purpose of understanding a particular phenomenon, the Republic of Letters, and as such makes no claims to ‘representativeness’ in terms of a more general European network of information. Instead, EMLO is best described as a curated collection based around the correspondence networks of three key groups: a group of early seventeenth-century scholars known as the ‘Dutch Humanists’, a loosely-defined ‘circle’ of correspondents with the mid seventeenth-century polymath Samuel Hartlib at its centre, and the early members of the Royal Society from the 1660s onwards.

Note 1: This data preparation builds on tools developed by the team members for previous projects, reported on in Hyvönen et al. (2019). Analysis was carried out on a version of the data from late 2018.

The correspondence in ‘State Papers Online’ (SPO), which is the digitised papers of the State Paper Office, has cohered as the result of a very different set of processes. The State Paper Office was established in 1610 with the aim of collecting the English state’s private and working manuscripts in a single place, and was to become the principal archive and working library for the parliamentary executive, while essentially being the private papers of the monarch.
We might therefore expect this ‘official’ record of the English state to present a more unified or coherent worldview than EMLO. In fact, the State Papers are also full of partial or shifting perspectives: individual secretaries often viewed their official documents as ‘private’ and kept them as their possessions on leaving office. Some, such as the papers of Secretary Morrice, disappeared almost entirely, while those of Coventry, with further private family papers added in, were only returned much later [note 2]. Document collection in the early years was often chaotic and piecemeal (Marshall, n.d.). Thus, in March 1669 Charles II issued a warrant that all officers of state were to allow Joseph Williamson, the Keeper of State Papers, ‘to peruse all such records & memorialls as now are in your custody, & from them to make Transcripts of whatsoever Treatyes, Leagues, Commissions conduceing thereto, & publike Grants which he […] shall deeme fit for Our Service’ [14, p. 188]. Records swelled in consequence thereafter. Furthermore, the metadata available to us is derived from nineteenth-century printed calendars, which do not include as such the Northern department of the Office of the secretaries of state as it existed from the Restoration onwards, although confusingly some other papers of the Northern secretaries are included.

====1.2. Evaluating the Contours of Multiple Archives====
As a record of social interactions between discrete entities, these datasets are naturally suited to analysis using tools from network science, a field concerned with understanding the structure and dynamics of complex networks as a series of mathematical phenomena. While EMLO and SPO have very different histories (one being a modern union catalogue and the other the product of a long and complex institutional history), the elements of contingency that have shaped them result in very similar network topologies.
Both contain a small number of highly connected nodes (the entities or points to be connected in a network graph), and a large number of poorly connected nodes. Moreover, there exists a continuum of connectivity between these two extremes that renders these networks ‘scale-free’, meaning that in every subregion of the network, on almost every scale, we have a few relatively well connected nodes and many relatively poorly connected ones. A scale-free network can be recognised by plotting the distribution of the number of connections for each node (its ‘degree’), which in the case of scale-free networks follows a straight line on a double logarithmic plot (figure 1).

[Figure 1: Degree distribution of EMLO and State Papers Networks, plotted on a double logarithmic scale. Panels: EMLO; Stuart SP. Horizontal axis: Unweighted Degree Score; vertical axis: Occurrences.]

So what underlies this broad structural similarity, and how does it play out across the different dimensions of our data? To tease this apart we have analysed the contours of our respective archives along five dimensions: size-distribution, space, time, absence, and interconnectedness. Each of these has helped reveal the way that the respective histories of the archive can impact analyses. We do not have space to go into all five of these here, but aspects of size-distribution, absence and interconnectedness are particularly instructive in explaining the structure observed in figure 1.

Note 2: For example the Conway Papers were removed from the State Papers office by Secretary Conway while he was still in office, and only returned by their then owner, John Croker, in 1857.

The reason for the topology of the networks that we see in figure 1 above can be gleaned in large part by understanding the letter distribution across its constituent parts. In the case of EMLO that means understanding the number of letters per catalogue.
It is made up of about 110 constituent catalogues, and the distribution of the number of letters across these catalogues is heavily skewed with 80% of the letters coming from the top twenty-three catalogues (21%). Moreover the nature of these respective catalogues has a large impact on the contours of those constitutive parts: a correspondence collection usually, though not always, prioritises an individual at its centre. While most real-world networks tend to cohere around a series of key hubs, this phenomenon is exaggerated in the EMLO data because it is in effect sampled from the perspective of the hubs, which is not dissimilar to the way that many traditional datasets used for Social Network Analysis (SNA) are based on surveys or responses to a questionnaire which relies on participants listing their connections [see 20]. Figure 1 shows that the degree distribution of the nodes in the State Papers Stuart network is strikingly similar to that derived from EMLO’s data. We could also describe this network as in effect a collection of ego networks (the name for a network derived from the perspective of a single node), as it is in essence the worldview of the principal secretaries of state plus a limited set of other nodes. Other analysis has shown that networks based on earlier Tudor correspondence had similar degree distributions, as most of the connections are assigned to the small number of individuals who formed the ‘gravitational centre’ of the English state [1, p. 32]. The topology however is also a product of missing data. We have found that the patterns of this missing data vary not only between archives, but also between catalogues within EMLO. 
Contributors to EMLO have collated correspondence inventories based on contemporary or modern print editions; born-digital editions or listings; library catalogues; and/or individual or project archival research, so the metadata available to us varies in type and quality, despite extensive standardisation and reconciliation. The missing information is systematic rather than random, and there is a strong correlation between missing data in some pairs of fields: records missing recipients are more likely to be missing authors, for example. The Stuart SPO also has missing data but it is not always directly comparable: no data on letter destination has been recorded in SPO, and there is very little truly ‘missing’ author or recipient data in SPO, though there are numerous inferred, ambiguous or unidentified person text strings. Most missing or unknown dates have been given a date range, sometimes spanning the lifetime of the correspondents in the letter, or a period of rule.

[Figure 2: Missing data in EMLO and SPO (Stuart). (a) shows the missing portions of key fields in EMLO (Destination, Origin, Date, Recipient, Author; missing versus present observations). (b) shows missing sections of the origin field, and the section of SPO where the date is given as a range rather than a specific date. (c) shows this date range data further broken down by year, 1600 to 1700: ‘warmer’ colours indicate a larger mean range for that year.]

As an example of the granularity with which we are approaching missing or uncertain data: of the approximately 177,000 records in the Stuart SPO data, 34,829 have an estimated date range rather than a single date (figure 2 (b)). 33,386 of these have a date range of one year or less.
The majority of these (21,311) have a date range of exactly 10 days (11 days after 1700) because of uncertainty between Julian and Gregorian calendars. This can be visualised with years represented by vertical bars, and the mean date range by colour, with ‘hotter’ colours indicating less accurate dates (figure 2 (c)). This plot shows a progression from less to more accurate data over the course of the seventeenth century. This fine-grained data on the ‘known’ partiality of our networks has helped us add necessary caveats or controls to our results: for example, knowing that the SPO and BCC data usually does not contain information on place of destination means that our analysis of the geographic origins and transmission of intelligence or information takes that into account. Similarly, temporal analysis of the formation of these networks must allow for the fact that early records are more likely to be missing either a sender or recipient.

====1.3. Missing Data and Network Analysis====
Recent years have seen a number of projects in the computational humanities using network theory as the basis for historical analysis [1, 4, 18]. Many of these are based on incomplete data. Because historians are used to working with partial archives, there is some well-founded scepticism about the impact of such absence on large-scale computational analysis. However, in network science and cognate fields, there is a more established discourse and methodology around the nature and likely impact of incomplete data. For example, missing social network analysis data can be the result of boundary specification problems (difficulties with establishing the parameters of the network) or data collection inaccuracy (incomplete or biased surveys, for example). Studies have found that these types of missing data are likely to result in individual missing nodes or edges [9, p. 248].
Our data, by comparison, is more likely to have missing hub nodes along with most of their connections, because entire personal archives have been lost, or are currently unavailable. As the figures above have shown, data derived from archives like ours are also likely to be missing or have inaccurate years or year ranges because longitudinal historical data can be interrupted by war, change in executive, or bureaucratic practices. However, there is little understanding of the impact of this missing data on network results. Intuitively, one might assume that network metrics based on partial archives are therefore unreliable. It is possible, however, to test this assumption quantitatively. Researchers have previously used quantitative methods to estimate the effect of missing nodes and edges on network measures [16, 17]. Costenbader and Valente, for example, analysed the sensitivity of eleven centrality measures using social networks derived from questionnaire results and found that eigenvector centrality (a recursive measure of a node’s importance, based on the importance of its connections) was particularly robust as a measure of centrality using sampled data [3, p. 299]. To understand the effect that missing network information might have on our network, we adapted a technique developed by Matthew Peeples for measuring the sensitivity of network measures to random node removal from archaeological networks [12]. This technique involves removing progressively larger random samples from the network, and comparing the results to those found in the full network. Using this, we can infer how various types of gaps in the data might affect the rankings of individual nodes (figure 3). The code for this, as well as a version with a user-friendly interface, will be made available on the project’s GitHub repository to coincide with the publication of the full results.
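
The removal-and-compare procedure described above can be illustrated with a minimal sketch. It is not the project's code or Peeples's original script: it assumes that letters are held as a plain list of (sender, recipient) pairs, that degree means the number of distinct correspondents, and that only people who still appear in the subsample are compared. All function names are hypothetical, and Spearman's rho is computed by hand so that the example needs only the Python standard library.

```python
import random
from collections import defaultdict


def degree_scores(letters):
    """Unweighted degree: number of distinct correspondents per person."""
    neighbours = defaultdict(set)
    for sender, recipient in letters:
        if sender != recipient:
            neighbours[sender].add(recipient)
            neighbours[recipient].add(sender)
    return {person: len(others) for person, others in neighbours.items()}


def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation of the rank-transformed values."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when every rank is identical
    return cov / (var_x * var_y) ** 0.5


def degree_rank_robustness(letters, fraction_removed, repeats=50, seed=0):
    """Mean rho between full-network and subsample degree over several draws.

    `letters` must be a list; each draw keeps a random subset of letters and
    compares degree only for the people who remain in that subset.
    """
    rng = random.Random(seed)
    full = degree_scores(letters)
    keep = round(len(letters) * (1 - fraction_removed))
    rhos = []
    for _ in range(repeats):
        sub = degree_scores(rng.sample(letters, keep))
        people = sorted(sub)
        if len(people) < 2:
            continue
        rho = spearman_rho([full[p] for p in people], [sub[p] for p in people])
        if rho == rho:  # skip undefined (NaN) correlations
            rhos.append(rho)
    return sum(rhos) / len(rhos) if rhos else float("nan")
```

Calling `degree_rank_robustness(letters, f)` for a series of fractions `f` (for example 0.1, 0.2, and so on up to 0.9) would give one point per fraction for a curve of the kind plotted in figure 3. Because several letters usually attest the same pair of correspondents, removing a letter often leaves the corresponding edge, and therefore the degree, unchanged.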
What we have found is that even with very large numbers of letters removed, for those remaining nodes, there remains a strong correlation between the original ranked score of a node and its rank in the random subsample of the network, and very little variation between random samples. The model shows that individuals who are not removed are very likely to remain in similar ranked positions for degree scores despite significant missing data. Degree is a key measurement of a node’s centrality to a network, and we surmise that network analysis of correspondence in large historical collections yields results that are remarkably robust to missing data, likely because we have multiple letters (evidence of links) for many of the node relationships. Caution must still be exercised, because in some cases none of the possible links between two individuals will have survived, but the result does give us confidence about the use of this most basic measure of network centrality in historical findings.

This robustness is partially because of another phenomenon that we have found to be the result of working with many combined catalogues: the emergence of secondary ‘informal’ catalogues, found at the intersections between the primary ego networks. Like the measure of betweenness centrality (a metric which gives a score to each node based on the number of times it is traversed on the shortest paths between every other pair of nodes, and which is often used to find individuals with structural importance but not necessarily the highest total connections), looking at overlapping or intersecting neighbourhoods helps us to find individuals who bridge multiple overlapping ego networks, whose significance was harder to derive before the creation of union catalogues and meta-archives such as EMLO.

[Figure 3: Effect of random letter removal on degree ranks. Horizontal axis: Percentage Removed; vertical axis: Spearman’s Rho. In the left-hand plot, we removed progressively larger random samples of letters from the full network, recalculated degree ranks and then plotted the resulting Spearman’s rank correlation coefficient between the original and sampled network (this process was repeated 50 times). The right-hand plot is the result when removing edges from a random Barabási-Albert network with the same number of nodes and edges, which follows a more severe pattern of decay in correlations as larger numbers of edges are removed. What this tells us is that degree rank correlations are remarkably robust to random letter removals from the network.]

====1.4. Intersections====
One of the key aims of the Networking Archives project is to examine the extent to which the multiple overlapping ego networks of the EMLO archive are intermeshed internally, and to what extent they in turn overlap with the political archive of SPO. The reconciliation of EMLO and SPO is still being completed so the following results are based on EMLO only.

The purpose for which EMLO was originally created means that the catalogues of correspondence are deeply enmeshed and overlapping. Even when the individuals at the centre of catalogues are not directly connected by an edge, it is likely they will have correspondents in common. We exploit this fact to uncover or highlight ‘emergent’ ego networks of individuals whose correspondence has not been consolidated into a single collection. To do this, we developed a tool that finds the intersecting set of correspondents for each pair of individuals in the network, whether they corresponded with each other or not (figure 4).
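
The pairwise intersection described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the project's Overlaps Tool: letters are assumed to be (sender, recipient) pairs, the function names are hypothetical, and every pair of people is compared directly, which is quadratic in the number of people and would need a more economical strategy for a network of EMLO's size.

```python
from collections import defaultdict
from itertools import combinations


def build_neighbours(letters):
    """Map each person to the set of people they exchanged letters with."""
    neighbours = defaultdict(set)
    for sender, recipient in letters:
        if sender != recipient:
            neighbours[sender].add(recipient)
            neighbours[recipient].add(sender)
    return neighbours


def shared_correspondents(letters, min_shared=1):
    """Map each pair (a, b) to the set of people who corresponded with both.

    The pair is included whether or not a and b wrote to each other.
    """
    neighbours = build_neighbours(letters)
    overlaps = {}
    for a, b in combinations(sorted(neighbours), 2):
        shared = neighbours[a] & neighbours[b]
        if len(shared) >= min_shared:
            overlaps[(a, b)] = shared
    return overlaps


def overlap_counts(letters, min_shared=2):
    """Per person, count the pairs sharing at least `min_shared` correspondents."""
    counts = defaultdict(int)
    for a, b in shared_correspondents(letters, min_shared):
        counts[a] += 1
        counts[b] += 1
    return dict(counts)


# The configuration of figure 4: A and B never correspond directly,
# but both exchange letters with C, D and E.
figure_4 = [("A", "C"), ("A", "D"), ("A", "E"),
            ("B", "C"), ("B", "D"), ("B", "E")]
print(shared_correspondents(figure_4)[("A", "B")])
```

With the default threshold of two, `overlap_counts` corresponds to the quantity by which table 1 is arranged: the number of times an individual shares at least two correspondents with another individual.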
We have found that in partial networks, individuals with a high number of these ‘shared correspondents’ are often important figures whose contributions to their particular networks have been overlooked. To find these emergent informal catalogues, we looked for individuals who a) had many significant intersecting correspondents, b) did not have their own catalogue or edited collection and c) could be found across numerous catalogues (table 1).

[Figure 4: The Overlaps Tool lists the overlapping set (C, D, E) of neighbours of A and B for each pair of nodes in the network, whether or not A and B are also direct neighbours.]

From these criteria, the seventeenth-century Scottish minister John Dury emerged as an interesting case study. Despite his prominent work as a pamphlet writer, minister, diplomat, tutor, and theologian, there is no single ‘Dury Archive’: John Dury left many of his papers with Samuel Hartlib when he travelled on the continent, and therefore the largest single collection of Dury material can be found within the Hartlib papers, held by Sheffield University. Because of this, he does not have a centralised collection of correspondence in EMLO, although parts of it have been collected in printed editions elsewhere [5, p. 39]. Only a small number (1,372) of Dury’s total surviving letters are listed in EMLO, spread across a number of catalogues, and therefore a ‘standard’ quantitative analysis of Dury’s correspondence might underestimate his importance to the network of which he forms a part. With a union catalogue like EMLO we can infer the centrality of individuals even without a centralised catalogue. As well as appearing as an author and recipient in Hartlib’s catalogue, Dury can be found in eight others on EMLO, albeit in much smaller numbers.
In this way Dury’s centrality is no longer defined by his appearance in a single catalogue, but is inferred from his appearance across other collections, for example the three letters found in a printed edition of the letters of Joseph Mede, or the two ‘lost letters’ of his to Robert Boyle found in a list compiled by William Wotton shortly after Dury’s death [11, pp. 804, 865–65, 866–67], [7, p. 246]. Like scientists inferring the presence of black holes by their absence and the behaviour of objects around them, looking at individuals and their overlaps allows us to establish a sense of their contribution to the network, even when their correspondence has not been systematically collected in one place.

For Dury, this matches up with what we know about him as a figure within a European intellectual network. A prominent Irenicist (working towards a peaceful Protestant union of churches), Dury was in regular contact with theologians, scholars and diplomats throughout Europe. Dury spent much of his life on the move: living or travelling through the Netherlands, Sweden, Poland, Germany, and Switzerland, communicating state affairs to diplomats in England and moving on when he had established contacts, aiming to continue the correspondence by letter after he had left. His mobility and continental connections meant that he acted as a ‘bridge’ in a European network of communication centred around Samuel Hartlib, as well as sending intelligence to Cromwell’s secretary of state, John Thurloe, in the mid-1650s. By finding Dury amongst established archives, and specifically by linking evidence of his correspondence across multiple catalogues, we can more fully exploit the surviving records of his contributions, even though most of his correspondence is either lost or not available as structured metadata.

{| class="wikitable"
|+ Table 1: Individuals in EMLO, arranged by number of times they share at least two correspondents with another individual. Individuals without their own catalogue are highlighted in gray.
! Name !! Overlaps !! Degree !! Catalogues
|-
| Oldenburg, Henry || 734 || 526 || 17
|-
| Wallis, John (Dr) || 678 || 389 || 10
|-
| Huygens, Constantijn || 618 || 1415 || 17
|-
| Vossius, Gerardus Joannes || 588 || 925 || 12
|-
| Boyle, Robert || 561 || 511 || 13
|-
| Sancroft, William || 527 || 927 || 2
|-
| Dury, John || 515 || 283 || 9
|-
| Smith, Thomas || 501 || 369 || 3
|-
| Mersenne, Marin || 489 || 254 || 15
|-
| Komenský, Jan Amos || 474 || 234 || 9
|-
| Huygens, Christiaan || 472 || 372 || 15
|-
| Hartlib, Samuel || 469 || 364 || 10
|-
| Vossius, Isaac (Dr) || 456 || 313 || 8
|-
| Charles II, King of England, Scotland, and Ireland || 447 || 37 || 4
|-
| Groot, Hugo de || 441 || 540 || 13
|-
| Charlett, Arthur (Reverend) || 436 || 441 || 4
|-
| Saumaise, Claude de || 415 || 34 || 11
|-
| Polyander van den Kerckhoven, Johannes || 407 || 177 || 5
|-
| Lister, Martin || 402 || 245 || 7
|-
| Hevelius, Johannes || 399 || 36 || 9
|}

====1.5. Conclusion====
The growth of the use of network theory in the humanities over the past decade has been accompanied by self-reflection on its use: at one extreme, it has been used to make longue durée claims about hugely complex historical forces [15], while at the same time other researchers have, quite rightly, pointed out that networks based on assembled ego networks may tell us more about the collection practices than about a historical phenomenon more generally (Weingart 2011). This paper has put forward some techniques for understanding and working with partial network data as found in historically contingent archives, in a way which we believe strengthens the epistemological underpinnings of historical network analysis. Some of the unknown data is ‘known’: partially missing people, dates or place names which we can measure, understand and visualise using the methods described above. The much larger part of the missing data is truly unknown: destroyed, intercepted or unavailable archives, not to mention the part of the network reliant on oral exchange.
Despite this, our findings suggest that network rankings stay remarkably stable when large parts of the network are removed, either at random or by removing entire catalogues; we can also infer that this would be the case were additional catalogues added to a meta-collection. Crucially, the method allows us to reconstruct likely connections in incomplete, historically contingent collections. Quantitative studies of the Republic of Letters have so far focused on individuals whose collections have survived in a single archive or have been assembled afterwards in a single repository [note 3]. This may have the effect of biasing results towards letters received by geographically stable nodes. John Dury’s centrality to the network, on the other hand, was one based on breadth rather than overall strength, and is harder to measure because of the geographic dispersal of his letters: establishing network strengths across catalogues helps to find these intersecting nodes, and counteract the natural bias towards histories based on individuals as found in formal collections or nineteenth-century printed editions.

The adoption of linked open data means that access to linked discrete archives is set to increase, and existing datasets will continue to expand. The ‘Networking Archives’ project, for example, will link together three large datasets, and one of these, EMLO, is continually expanding its metadata with new catalogues of correspondence. Our understanding of John Dury is a direct benefit of this: almost seventy additional letters by John Dury found in the Stuart State Papers Online will, when linked to the existing metadata, help the project to understand more fully his role as a provider of intelligence to the English state. This presents an enormous opportunity: each new addition leads to further intersections and helps us to understand both the added network and the entire network as a whole.
This work is part of a larger project, from which will come a project volume, code and analysis. Through this we hope to be transparent about the ways in which our data is biased or partial, but also reassure readers and other historical network analysis practitioners that results do have validity, despite missing data.

Note 3: The Mapping the Republic of Letters project, for example, has taken as case studies Athanasius Kircher and John Locke. The bulk of the former’s correspondence is preserved in the Archives of the Pontifical Gregorian University in Rome, and the latter’s was assembled as a multi-volume scholarly edition by Oxford University Press, and made available digitally on the Electronic Enlightenment website.

===Author contributions===
YR conceived original idea; adapted code; wrote and revised manuscript. SA wrote and revised manuscript; developed idea; checked code and statistics. RA supervised PI; developed idea; wrote and revised manuscript.

===Acknowledgments===
This work is funded by the AHRC as part of the project Networking Archives (AH/R014817/1). The work we report on here builds on the work of the project team past and present, including Philip Beeley, Arno Bosse, Howard Hotson (PI), Miranda Lewis, Esther van Raamsdonk, and Matthew Wilcoxson. We would especially like to thank Philip Beeley, Esther van Raamsdonk, Miranda Lewis and Howard Hotson for their comments on this paper, and Matthew Peeples for developing the original robustness code and his permission to adapt it. For a full list of data contributors to Early Modern Letters Online, see http://emlo-portal.bodleian.ox.ac.uk/collections/?page_id=2259.

===References===
[1] R. Ahnert and S. E. Ahnert. “Metadata, Surveillance and the Tudor State”. In: History Workshop Journal 87 (Jan. 2019), pp. 27–51. issn: 1363-3554.

[2] K. Bode. “The Equivalence of “Close” and “Distant” Reading; or, Toward a New Object for Data-Rich Literary History”. In: Modern Language Quarterly 78.1 (Mar. 2017), pp. 77–106. issn: 0026-7929.

[3] E. Costenbader and T. W. Valente. “The stability of centrality measures when networks are sampled”. In: Social Networks 25.4 (Oct. 2003), pp. 283–307. issn: 03788733. (Visited on 07/17/2020).

[4] C. Edmondson and D. Edelstein, eds. Networks of enlightenment: digital approaches to the republic of letters. Oxford University studies in the Enlightenment 2019:06. OCLC: on1057786719. Liverpool: Liverpool University Press on behalf of Voltaire Foundation, 2019. isbn: 9781786941961.

[5] M. Greengrass. “Archive Refractions: Hartlib’s Papers and the Workings of an Intelligencer”. In: Archives of the Scientific Revolution: The Formation and Exchange of Ideas in Seventeenth-Century Europe. Ed. by M. Hunter. Woodbridge: Boydell Press, 1998, pp. 35–48.

[6] H. Hotson and T. Wallnig. “Introduction”. In: Reassembling the Republic of Letters in the digital age: standards, systems, scholarship. Göttingen: Göttingen University Press, 2019, pp. 7–23. isbn: 9783863954031.

[7] M. Hunter, A. Clericuzio, and L. M. Principe, eds. The Correspondence of Robert Boyle. London: Pickering and Chatto, 2001.

[8] M. L. Jockers. Macroanalysis: digital methods and literary history. Topics in the digital humanities. Urbana: University of Illinois Press, 2013. isbn: 9780252037528 9780252079078 9780252094767.

[9] G. Kossinets. “Effects of missing data in social networks”. In: Social Networks 28.3 (July 2006), pp. 247–268. issn: 03788733. doi: 10.1016/j.socnet.2005.07.002. (Visited on 06/15/2020).

[10] M. Lewis. Ghosts in the Machine: (Re)Constructing the Bodleian’s Index of Literary Correspondence, 1927-1963. 2013. url: http://www.culturesofknowledge.org/?p=295.

[11] J. Mede. The works of the pious and profoundly-learned Joseph Mede, B.D. sometime fellow of Christ’s College in Cambridge. London: Printed by Roger Norton for Richard Royston, 1677.

[12] M. Peeples. Network Science and Statistical Techniques for Dealing with Uncertainties in Archaeological Datasets. 2017.

[13] A. Piper. Enumerations: data and literary study. London: The University of Chicago Press, 2018. isbn: 9780226568614 9780226568751.

[14] M. Riordan. “‘The King’s Library of Manuscripts’: The State Paper Office as Archive and Library”. In: Information & Culture 48.2 (2013), pp. 181–193. issn: 21648034, 21663033.

[15] M. Schich et al. “A network framework of cultural history”. In: Science 345.6196 (Aug. 2014), pp. 558–562. issn: 0036-8075, 1095-9203. (Visited on 07/17/2020).

[16] J. A. Smith and J. Moody. “Structural effects of network sampling coverage I: Nodes missing at random”. In: Social Networks 35.4 (Oct. 2013), pp. 652–668. issn: 03788733. (Visited on 06/15/2020).

[17] J. A. Smith, J. Moody, and J. H. Morgan. “Network sampling coverage II: The effect of non-random missing data on network measurement”. In: Social Networks 48 (Jan. 2017), pp. 78–99. issn: 03788733. (Visited on 06/15/2020).

[18] I. Van Vugt. “Using Multi-Layered Networks to Disclose Books in the Republic of Letters”. In: Journal of Historical Network Research 1 (Oct. 2017), pp. 25–51. issn: 2535-8863. (Visited on 06/15/2020).

[19] A. Walsham. “The Social History of the Archive: Record-Keeping in Early Modern Europe”. In: Past & Present 230.suppl 11 (2016), pp. 9–48. issn: 0031-2746, 1477-464X. (Visited on 06/15/2020).

[20] S. Wasserman and K. Faust. Social Network Analysis Methods and Applications. Cambridge: Cambridge University Press, 1994.