<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Comparative Religion, Topic Models, and Conceptualization: Towards the Characterization of Structural Relationships between Online Religious Discourses</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Zachary</forename><forename type="middle">K</forename><surname>Stine</surname></persName>
							<email>zkstine@ualr.edu</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Arkansas at Little Rock</orgName>
								<address>
									<addrLine>2801 S. University Ave</addrLine>
									<postCode>72204</postCode>
									<settlement>Little Rock</settlement>
									<region>AR</region>
									<country key="US">United States</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">James</forename><forename type="middle">E</forename><surname>Deitrick</surname></persName>
							<email>deitrick@uca.edu</email>
							<affiliation key="aff1">
								<orgName type="institution">University of Central Arkansas</orgName>
								<address>
									<addrLine>201 Donaghey Ave</addrLine>
									<postCode>72035</postCode>
									<settlement>Conway</settlement>
									<region>AR</region>
									<country key="US">United States</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Nitin</forename><surname>Agarwal</surname></persName>
							<email>nxagarwal@ualr.edu</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Arkansas at Little Rock</orgName>
								<address>
									<addrLine>2801 S. University Ave</addrLine>
									<postCode>72204</postCode>
									<settlement>Little Rock</settlement>
									<region>AR</region>
									<country key="US">United States</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Comparative Religion, Topic Models, and Conceptualization: Towards the Characterization of Structural Relationships between Online Religious Discourses</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">55D481F185D55C5B8FD8373D663C7BE6</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-23T22:09+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>comparative religion</term>
					<term>topic modeling</term>
					<term>information theory</term>
					<term>digital religion</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The similarity between the lexicons of different religious discourses does not necessarily reflect the similarity between the ways of understanding the world inherent in their discourses. Drawing on scholarship from comparative religion that distinguishes between surface-level, lexical distinctions and deeper grammatical and structural distinctions between two religious traditions, we present a computational approach to assessing the structural similarity between religious discourses irrespective of their lexical differences. We argue that unsupervised machine learning models trained on different discourses can be indirectly compared by how consistently they organize information as an operationalization of structural similarity. This consistency can be quantified as the mutual information between the models' clusterings of a designated set of comparison data. We present our approach through a case study comparing discussions from Reddit concerning Buddhism and Christianity.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Comparative analyses of culturally specific discourses are complicated by the possibility that the discourses being compared reflect ways of understanding the world that are fundamentally similar yet expressed through distinct, culturally specific terms. This is especially problematic for comparative religion in cases where it is possible for one to adopt the forms of a religious tradition without necessarily adopting the deeper structures beneath those forms (i.e., something like a worldview). For example, it has been argued that the religious life of Henry Steel Olcott, a notable convert to Buddhism, can be understood as comprising an American Protestant structure that informs Olcott's identity despite his adoption of a Buddhist and South Asian cultural lexicon <ref type="bibr" target="#b32">[33]</ref>. A distinction in how religious identities are expressed is made here between the consciously chosen forms that signal an identity (the cultural lexicon) and the deeper cultural structure (the cultural grammar) underlying those forms.</p><p>In this paper, we put forward an operationalization of how religious discourses might be compared empirically at the level of cultural grammar rather than cultural lexicon, in order to measure what we call their structural similarity. We assume that a particular discourse reflects a way of understanding the world or some aspect of it <ref type="bibr" target="#b17">[18]</ref>. In other words, a discourse reflects a cultural grammar or structure. However, as in the example of Olcott, a discourse may be expressed within a cultural lexicon that is incongruous with the underlying cultural grammar (see section 2.1 for a more detailed discussion of this phenomenon). 
Importantly, our use of the term "lexical" should be understood to refer to how culturally specific a particular term is, reflecting this notion of cultural lexicon.</p><p>Our approach is based on the assumption that a discourse divides the world, or some aspect of it, in a particular way. In other words, a categorization scheme is implicit within a discourse. Given this assumption, we argue that if discourses are structurally similar, they can be expected to produce categorization schemes that carve up information in a mutually consistent manner, despite differences in culturally specific lexicons used in each discourse. We operationalize this notion using unsupervised machine learning models that are trained on each discourse being compared. Each model learns a clustering scheme that is specific to the discourse used to train it, thereby acting as a plausible representation of that discourse's categorization scheme. We then interrogate the relationship between categorization schemes (represented by the learned models) by forcing each model to apply its discourse-specific scheme to the unseen discourse with which it is being compared. We then measure the mutual consistency with which each discourse-specific model classifies both its own discourse and the comparison discourse using the mutual information between the resulting clusterings.</p><p>In order to better clarify what we are attempting to do in this approach, we draw on and extend a particular usage of the term "conceptualization." A clustering of a data set can be understood as implying a particular conceptualization of that data, and multiple clusterings may imply various ways that a researcher might conceptualize the data, with potential differences or similarities between them <ref type="bibr" target="#b14">[15]</ref>. In this sense, clustering data leads a researcher to interpret the resulting clusters as salient concepts for understanding the data. 
In that case, a single data set is explored through various clusterings in order to find useful conceptualization schemes. Here, we use "conceptualization" to mean how one discourse, as represented within a model trained on it, organizes another discourse in terms of its own semantic elements. In other words, the representation of an unseen discourse by a model trained on a different discourse can be understood as how the training discourse "conceptualizes" this unseen discourse.</p><p>This usage of "conceptualization" is especially useful given the type of unsupervised model we use: latent Dirichlet allocation (or LDA). LDA learns two things from a corpus: a set of word-usage patterns (or topics), which can be understood as corpus-specific concepts, and a representation of each document in the corpus as a mixture of these word-usage patterns <ref type="bibr" target="#b4">[5]</ref>. Importantly, these word-usage patterns may be characterized by the corpus-specific lexicon alongside less corpus-specific terms. When we force such a model to represent the documents of a different corpus as mixtures of its own word-usage patterns, we get a representation of the different corpus through the lens of the corpus which was used to train the model. In other words, we get a conceptualization of this different corpus in terms of the training corpus. We argue that the mutual consistency with which both corpus-specific models "conceptualize" each other reflects their structural similarity, that is, the consistency with which each discourse-as-model categorizes both training corpora. In this operationalization, the structural similarity is reflected by the mutual information between how two models organize input, regardless of how different the actual word-usage patterns are between the two models. In this way, we are not comparing the features of each model directly, but instead are comparing only how consistently these corpus-specific features are applied by each model. 
From this, we get a mapping from the "true" word-usage patterns (from the model trained on the corpus) to those used in another model's conceptualization of the corpus. This mapping can be usefully thought of as the interpretation of one model's topics by another model.</p><p>Motivated by prior work in comparative religion concerning encounters between Buddhism and American Protestantism <ref type="bibr" target="#b11">[12,</ref><ref type="bibr" target="#b10">11]</ref>, we explore the empirical implications of this operationalization in a narrow case study between two English-language discourses from the popular discussion platform, Reddit: r/Buddhism and r/Christianity. Importantly, there is no reason to assume that either discourse we examine constitutes a general representation of global Buddhism or Christianity (assuming such general forms, untethered from particular social systems, are even valid to begin with). Instead, these discourses should be understood to reflect only the particular versions of Buddhism and Christianity which emerge from these online communities. In other words, rather than focus our comparisons on abstract representations of Buddhism and Christianity, we focus our comparisons on specific communities engaged in discussing Buddhism and Christianity. Therefore, our findings should not be construed as reflections of Buddhism and Christianity as transcendent forms, but as contingent upon these online communities. We include two additional communities to help contextualize our results.</p><p>Far from being trivial or unserious objects of scholarly inquiry, such online discourses offer valuable insights into how religious traditions are understood and engaged with in popular culture. In recent years, a body of literature has emerged specifically around the study of religion in digital contexts under the name of "digital religion" <ref type="bibr" target="#b5">[6]</ref>. 
Given the popularity of Reddit, it is reasonable to think that an understanding of its religious communities does have salience for understanding popular conceptions of religious traditions in the English-speaking world. Additionally, the quantity of data that is available from these communities is sufficiently large to be an obstacle to researchers analyzing these data without the aid of computational tools. Quantitative methods are underused within the study of digital religion <ref type="bibr" target="#b22">[23,</ref><ref type="bibr" target="#b16">17]</ref>, and so another goal of this work is to demonstrate how such methods may be imported and customized as useful complements to qualitative methods.</p><p>We find evidence that our proposed operationalization of structural similarity accords with our expectations about the relationship between the two subreddits' discourses and with the discourses of two secondary subreddits. Additionally, we investigate which features from models of r/Buddhism and r/Christianity are most responsible for their structural similarities by calculating the pointwise mutual information between each possible pair of features. We find the context in which the two corpora are compared is highly influential on which feature pairs emerge as most strongly related between models. 
We also find that, while these feature pairs may have stark differences between the lexical items that characterize them, their mappings between models often appear surprisingly reasonable as if analogies for each other within their different lexical contexts.</p><p>In the following sections of this article, we provide background for understanding the theoretical framework we present, describe the data and methods used to illustrate this framework within the case study of the r/Buddhism and r/Christianity discourses from Reddit, present our findings from the case study, and briefly discuss what these findings suggest about our operationalization and directions for further investigation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Background</head><p>In the following subsections, we provide background information necessary for constructing our argument that the mutual information between topic models trained on lexically distinct religious discourses can be understood as a reflection of their structural relationship.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Comparative religion and religious creolization</head><p>While this comparative problem may be faced in a variety of cultural contexts, we explore it from within the context of comparative religion, and so a brief consideration of the problems faced in comparative religion will provide important context for understanding the challenges faced by this work. Paden identifies three primary criticisms that have been levied against traditional comparative approaches <ref type="bibr" target="#b30">[31]</ref>. First, comparativism may mislead by suppressing differences between cultures, engaging in colonialist reductiveness. Second, comparativists have sometimes been guilty of introducing theological or ontological assumptions into their work in an unscientific manner. Finally, charges have been made that comparativism is untheoretical in that it lacks the ability to explain religious differences and similarities.</p><p>The use of empirical methods in comparative religion has been suggested as a possible antidote to this last criticism <ref type="bibr" target="#b23">[24]</ref>, and while computational methods are certainly not objective, they at least reduce the ways in which researchers may introduce their own faulty assumptions into an analysis or make those assumptions explicit. However, the potential for reductionism in computational approaches is worth consideration. Computational methods, specifically those from machine learning, are effective in identifying large-scale patterns within data too numerous for individuals to comb through. Such large-scale analyses require a trade-off between the particular and the general. In other words, machine learning methods excel at illuminating trends and generalities, but potentially at the expense of finer-grained variation. 
While reductionism is certainly a concern, it has been argued by some that a preoccupation with reductionism has substantially hindered comparative religion <ref type="bibr" target="#b36">[37,</ref><ref type="bibr" target="#b8">9]</ref>.</p><p>With these challenges in mind, we now turn to the comparative work undertaken by Deitrick concerning the relationship between the social ethics of engaged Buddhism and mainstream American religion, which serves as the inspiration for the present study. In <ref type="bibr" target="#b11">[12]</ref>, Deitrick invokes a theory of religious creolization put forward to describe the religious life of Henry Steel Olcott <ref type="bibr" target="#b32">[33]</ref>. This theory posits a distinction between a religion's grammatical structures and the particular lexical forms through which these structures are expressed. In the case of American engaged Buddhism, Deitrick argues that, in terms of its social ethics, it can be understood as the adoption of a Buddhist lexicon to describe cultural structures that ultimately reflect mainstream American religion. Deitrick refers to this as an "inverse creole faith" in that it reverses the power dynamics of what is typically referred to as "creole": a dominant group adopts the lexicon of a minority group <ref type="bibr" target="#b11">[12]</ref>.</p><p>In the present study, we are interested in whether the Buddhist discourse from Reddit is only lexically distinct from the Christian discourse, or if it is both lexically and structurally distinct.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Religion on Reddit</head><p>Reddit consists of a large number of communities, called subreddits, which facilitate discussions around a defined theme or topic. Users can author submissions to a subreddit and author comments within discussion threads that accompany each submission. Data from Reddit have been usefully analyzed in work ranging from the effectiveness of hate speech bans <ref type="bibr" target="#b7">[8]</ref>, violations of community norms <ref type="bibr" target="#b6">[7]</ref>, persuasion <ref type="bibr" target="#b40">[41]</ref>, birth narratives <ref type="bibr" target="#b1">[2]</ref>, and discourses around China <ref type="bibr" target="#b39">[40]</ref>.</p><p>Reddit is a useful source of popular discourses for several reasons. Most importantly, each community constitutes a discourse that is endogenously defined. Constructing a corpus that represents a particular religious tradition is complicated by the decisions that must be made about which documents to include and exclude from the corpus. In the case of Reddit, such consequential decisions are avoided: The community of users and their discussions presents an unambiguously delineated discourse. Additionally, comparative analyses of subreddits have the benefit that all subreddits being analyzed are subject to the same effects that stem from simply being on Reddit, whether in the form of demographic trends of its users or the affordances of the platform. Each subreddit we analyze is predominantly English-language.</p><p>While a number of subreddits exist which focus on Buddhist and Christian traditions, we limit ourselves to r/Buddhism and r/Christianity for two reasons. First, our primary goal in this paper is to explain our proposed approach for making structural comparisons between religious discourses; therefore, we analyze these two subreddits to serve as a focused case study. 
Second, r/Buddhism and r/Christianity appear to be the most general subreddits dedicated to their respective religious traditions as well as having the largest discussion histories. We are more interested in popular conceptions of Buddhism generally rather than engagements with more specific traditions within, for example, Theravada, Mahayana, or Vajrayana Buddhism. Similarly, we are interested in general conceptions of Christianity rather than in specific denominations. This is not to suggest that communities with a narrower focus on more specific traditions and denominations are irrelevant to our questions, but simply that, within the current study, we are interested in the two most popular subreddits that involve discussions of Buddhism or Christianity.</p><p>Various sects and denominations are surely represented to some extent in these communities, but there is no reason to think they are represented in a balanced way; certain perspectives may loom larger than others. However, to reiterate a previous point, we are not studying r/Buddhism because we mistakenly believe it to be an accurate representation of global Buddhist perspectives. Instead, we study it because it is a wildly popular Buddhist discussion community on a wildly popular social media platform and its discourse is therefore salient for understanding Buddhism within popular English-language online culture. The same applies to r/Christianity. In future work, we intend to extend our approach to other communities including several smaller sect-specific subreddits alongside those analyzed here. However, r/Buddhism and r/Christianity remain reasonable and interesting starting places for our case study for the reasons just given.</p><p>To provide context for our results comparing r/Buddhism and r/Christianity, we also report results comparing them with two other subreddits: r/religion and r/math. 
Our rationale for including r/religion is that we expect it to reflect a tendency in Western culture to associate the notion of religion with Abrahamic traditions and especially with Christianity. We include r/math because we expect that, while r/Buddhism and r/Christianity may present two distinct discourses, they are more likely to reflect similar conceptualization schemes with each other than with discussions about mathematics. Additionally, the inclusion of r/math serves as a check to make sure that our approach is still capable of showing dissimilarity and not simply forcing all corpora being compared to appear mostly similar.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Latent Dirichlet allocation</head><p>We use latent Dirichlet allocation (LDA) to represent each discourse as a topic model. LDA views a corpus as the result of a generative statistical process in which each document in the corpus is generated by drawing a probability distribution over a set of "topics" (probability distributions over the vocabulary of the corpus), from which each word in the document is then drawn <ref type="bibr" target="#b4">[5]</ref>. In training, LDA attempts to infer the distribution over topics for each document in the corpus as well as the distributions over the vocabulary (or "topics"). The learned topics correspond to latent features underlying the corpus. While these features may sometimes correspond to colloquial usages of "topic," they are better understood as patterns of word usage, or as <ref type="bibr" target="#b0">[1]</ref> suggests, contexts. The topics of LDA can also be understood to reflect several concepts from the sociology of culture <ref type="bibr" target="#b13">[14]</ref>. LDA not only provides a representation of each document in the corpus as a mixture of these features but can also provide representations of unseen documents not included in the training corpus as mixtures of these features.</p><p>An unsupervised algorithm, LDA learns the topics and document-topic distributions without any specifications of what the content of its features ought to look like. However, LDA does require the selection of the number of topics, k. Different choices of k may influence the specificity of the learned features, with smaller values of k yielding more general topics and larger values yielding more specific topics <ref type="bibr" target="#b28">[29,</ref><ref type="bibr" target="#b0">1]</ref>. 
Quantitative evaluation of LDA models is a complex problem, and qualitative evaluation is typically necessary to ensure that a model is understandable and therefore helpful to a researcher <ref type="bibr" target="#b34">[35]</ref>. Ultimately, it may not make sense to think of one model as more correct than another, even if one appears optimal according to one or more evaluation metrics, but to simply see each as plausible representations of the training corpus.</p><p>LDA has been previously used within the context of religious studies including a comparative analysis of three Confucian texts <ref type="bibr" target="#b29">[30]</ref> and an investigation into mind-body holism in medieval Chinese thought <ref type="bibr" target="#b37">[38]</ref>. LDA has also been used in comparative contexts outside of religious studies to compare the proceedings of natural language processing conferences over time <ref type="bibr" target="#b15">[16]</ref> and to compare two discourses about China from Reddit <ref type="bibr" target="#b39">[40]</ref>. In each of these cases, LDA is used to train a common topic model that is shared by each of the collections being compared. The relevant documents, terms, or collections of documents are then compared within this shared topic space. This approach makes sense when the objects being compared are not characterized by distinct lexicons or if such lexical distinctions are of interest. What differentiates the approach we describe here is that we are not comparing objects within a shared topic space but are instead comparing how topic models try to fit unseen, lexically distinct discourses into their own topic spaces that are specific to their training discourses. We are not looking at which topics are associated with which discourse but are instead comparing how much consistency exists between how models place documents from different discourses within their own discourse-specific features. 
In other words, we are looking at how different models "conceptualize" other discourses and measuring the consistency between those conceptualizations rather than measuring the similarity between the concepts themselves.</p><p>The LDA models trained on the discussions of r/Buddhism and r/Christianity can be thought of as representations of their corresponding discourses, where we understand a discourse as a way of understanding the world or some aspects of it <ref type="bibr" target="#b17">[18]</ref>. While useful, these representations are not perfect, functioning more like metonyms of the corresponding discourses <ref type="bibr" target="#b31">[32]</ref>. We propose thinking of LDA models as not only representations of a discourse, but also as operationalizations of a discourse in that we can deploy the organizational scheme of the model in novel contexts to see how the model organizes new information, i.e., how it conceptualizes. In addition to learning features and a representation of the training corpus as mixtures of those features, a trained LDA model also has the ability to infer the topic mixtures of new documents using the posterior parameter for document-topic distributions (typically notated as α) that becomes the prior in the inference process for new documents' topic distributions. When inferring the topic distributions of unseen documents, this prior acts as the conceptual disposition of the model, which is taken in along with the observed text of the new document to determine its topic distribution. If we were to ask the model to infer the topic mixture of a blank document, it would simply assign this prior topic mixture.</p><p>Contrary to the usual goals of machine learning, we do not want these discourse models to generalize beyond their training data. Instead, we want them to reflect only the conceptual schemes latent in their training corpus. 
Rather than examine the similarity between the features of the models, which reflect differences in lexical content, we are interested in the mutual consistency between how the models conceptualize: do certain features tend to be co-applied to documents regardless of the lexical differences that constitute those features?</p></div>
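<div xmlns="http://www.tei-c.org/ns/1.0"><p>The role of the prior when a trained model represents unseen documents can be illustrated with a minimal sketch. It is not the inference procedure used in this study: it is a simplified, expectation-maximization-style "fold-in" that holds the learned topics fixed and treats the prior weights as additive pseudo-counts, whereas typical LDA implementations use variational or sampling-based inference. The function name, the toy topics, and the prior values are all hypothetical and chosen only for illustration.</p>

```python
def infer_topic_mixture(doc_tokens, topics, alpha, n_iter=50):
    """Estimate a document's topic mixture while keeping the topics fixed.

    topics: list of dicts mapping word -> probability (one dict per topic)
    alpha:  list of positive prior weights, one per topic
    NOTE: simplified fold-in approximation, assumed here for illustration.
    """
    k = len(topics)
    # Keep only the tokens that at least one topic can explain.
    counts = {}
    for tok in doc_tokens:
        if any(t.get(tok, 0.0) > 0.0 for t in topics):
            counts[tok] = counts.get(tok, 0) + 1

    total_alpha = sum(alpha)
    n_tokens = sum(counts.values())
    theta = [a / total_alpha for a in alpha]  # start from the prior mixture
    for _ in range(n_iter):
        expected = [0.0] * k
        for word, n in counts.items():
            # Share of responsibility each topic takes for this word.
            weights = [theta[j] * topics[j].get(word, 0.0) for j in range(k)]
            norm = sum(weights)
            for j in range(k):
                expected[j] += n * weights[j] / norm
        # Combine prior pseudo-counts with the expected topic counts.
        theta = [(alpha[j] + expected[j]) / (total_alpha + n_tokens)
                 for j in range(k)]
    return theta


# Hypothetical two-topic model: one "Buddhist" and one "Christian" pattern.
toy_topics = [
    {"dharma": 0.5, "meditation": 0.4, "suffering": 0.1},
    {"church": 0.5, "prayer": 0.4, "suffering": 0.1},
]
toy_alpha = [0.6, 0.4]

blank_mixture = infer_topic_mixture([], toy_topics, toy_alpha)
prayer_mixture = infer_topic_mixture(
    ["prayer", "prayer", "suffering"], toy_topics, toy_alpha
)
```

<p>In this sketch, the blank document simply receives the normalized prior, while the document about prayer is pulled toward the second topic; the shared word "suffering" is divided between topics according to the current mixture, which is where the prior disposition of the model enters.</p></div>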
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.">Information theory</head><p>Information theory provides a useful means for quantifying the kinds of relationships we are trying to uncover between discourses. To quantify the consistency with which two LDA models conceptualize a discourse, we use the mutual information between each model's topic assignments. Introduced in the context of communication channels by <ref type="bibr" target="#b35">[36]</ref>, the mutual information of two random variables, I(X; Y), quantifies the reduction in uncertainty about X (or Y) that is provided by knowing Y (or X), expressed in bits <ref type="bibr" target="#b9">[10]</ref>. A common usage of mutual information is to measure how similarly two clustering schemes partition a set of observations (e.g., <ref type="bibr" target="#b12">[13]</ref>). Typically, this is done for hard clusterings in which each observation is assigned to a single class, as distinct from LDA, which assigns observations (documents) to a mixture of multiple classes (topics). A method for "hardening" topic mixtures from LDA is proposed by <ref type="bibr" target="#b39">[40]</ref>. However, we calculate the mutual information between the probabilistic clusters of documents, following <ref type="bibr" target="#b20">[21]</ref>, which does introduce some complications. Other information-theoretic quantities exist for comparing two clusterings, including variations based on mutual information (e.g., <ref type="bibr" target="#b27">[28]</ref>) and the metric, variation of information <ref type="bibr" target="#b24">[25]</ref>, which we plan to compare with the standard mutual information in further work.</p><p>Additionally, measures of information divergence provide a useful means for quantifying how lexically distinct two discourses are and how distinguishing each term is individually. 
One such quantity, the Kullback-Leibler divergence, provides an asymmetric measure of how much one probability distribution differs from an expectation based on another distribution <ref type="bibr" target="#b19">[20]</ref>. The Kullback-Leibler divergence (or KLD) has been previously used alongside LDA to characterize the reading behavior of Charles Darwin <ref type="bibr" target="#b26">[27]</ref>, innovation within parliamentary speeches <ref type="bibr" target="#b3">[4]</ref>, and legislative change <ref type="bibr" target="#b38">[39]</ref>. The Jensen-Shannon divergence (or JSD) is a symmetrical divergence derived from the KLD <ref type="bibr" target="#b21">[22]</ref>. The JSD has been previously used to measure the distinguishability between distributions of features from violent and non-violent court trials <ref type="bibr" target="#b18">[19]</ref>. It has also been used to measure the difference between LDA topics (e.g., <ref type="bibr" target="#b25">[26]</ref>).</p><p>The contribution of each feature to the total JSD between distributions can also be calculated. For example, this is done in <ref type="bibr" target="#b18">[19]</ref> to identify which trial features most distinguish violent from non-violent trials and vice versa. We use the JSD between the relative frequencies of each word between subreddits to quantify how lexically distinct the discourses of the subreddits are from each other. Additionally, we can characterize the extent to which each word functions as part of a discourse's lexicon by calculating each word's individual contribution to the total JSD between discourses. In this context, a word's contribution to the JSD between discourses represents how strongly the word implies one discourse over another.</p></div>
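<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two quantities discussed in this section can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the implementation used in the study. In particular, the joint distribution over topic pairs is built here by averaging, over documents, the product of the two models' topic mixtures; this is one simple way to handle probabilistic clusterings and is assumed for illustration rather than taken from the cited method. All function names and toy inputs are hypothetical.</p>

```python
import math


def mutual_information(theta_a, theta_b):
    """Mutual information, in bits, between two soft clusterings.

    theta_a[d] and theta_b[d] are the topic mixtures that models A and B
    assign to the same document d.
    """
    n_docs = len(theta_a)
    k_a, k_b = len(theta_a[0]), len(theta_b[0])
    # ASSUMPTION: joint over topic pairs = document-averaged outer product.
    joint = [[0.0] * k_b for _ in range(k_a)]
    for mix_a, mix_b in zip(theta_a, theta_b):
        for i in range(k_a):
            for j in range(k_b):
                joint[i][j] += mix_a[i] * mix_b[j] / n_docs
    p_a = [sum(row) for row in joint]
    p_b = [sum(joint[i][j] for i in range(k_a)) for j in range(k_b)]
    mi = 0.0
    for i in range(k_a):
        for j in range(k_b):
            if joint[i][j] > 0.0:
                mi += joint[i][j] * math.log2(joint[i][j] / (p_a[i] * p_b[j]))
    return mi


def jsd_contributions(p, q):
    """Per-word contributions, in bits, to the Jensen-Shannon divergence.

    p and q map words to relative frequencies in two corpora; the
    contributions sum to the total JSD between the two distributions.
    """
    contributions = {}
    for word in set(p) | set(q):
        pw, qw = p.get(word, 0.0), q.get(word, 0.0)
        m = 0.5 * (pw + qw)  # the equal-weight mixture of p and q
        term = 0.0
        if pw > 0.0:
            term += 0.5 * pw * math.log2(pw / m)
        if qw > 0.0:
            term += 0.5 * qw * math.log2(qw / m)
        contributions[word] = term
    return contributions


# Two models that sort documents identically but label topics differently.
theta_a = [[1.0, 0.0], [0.0, 1.0]]
theta_b = [[0.0, 1.0], [1.0, 0.0]]
mi_bits = mutual_information(theta_a, theta_b)

# Toy relative frequencies for two corpora sharing one word.
contrib = jsd_contributions(
    {"dharma": 0.5, "suffering": 0.5},
    {"church": 0.5, "suffering": 0.5},
)
total_jsd = sum(contrib.values())
```

<p>In the toy example, the two models agree perfectly on how documents are grouped even though their topic labels differ, so the mutual information is maximal for two equally sized clusters. In the divergence example, only the corpus-specific words contribute to the JSD, while the shared word "suffering" contributes nothing, mirroring the idea that a word's contribution indicates how strongly it implies one discourse over the other.</p></div>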
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Methods and Data</head><p>In this section, we describe our data collection and preprocessing steps and then assemble our framework from the concepts introduced in the previous section, explaining it alongside the methods we use to compare the discourses of r/Buddhism and r/Christianity.<ref type="foot" target="#foot_0">1</ref> </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Data collection and preprocessing</head><p>We collected data from the two subreddits of primary interest, r/Buddhism and r/Christianity, as well as from r/religion and r/math. For the subreddits of interest, we first collected all available submission IDs from the creation date of the subreddit through the end of 2019. These submission IDs were collected from the service PushShift.io (using the Python wrapper PSAW), which maintains historical data from Reddit. We then used Reddit's own Application Programming Interface (API) (using the Python wrapper PRAW) to collect the submission title, body text, and all comments for each submission ID, which were written to CSV files along with relevant metadata such as user ID and timestamps.</p><p>After collecting the submissions from each subreddit, we performed basic preprocessing on the text. Tokens are lowercase strings with a minimum length of three characters. URLs are tokenized so that they are reduced to their hostname with hyphens replacing any punctuation (e.g., "en.wikipedia.org" becomes "en-wikipedia-org"). References to users and subreddits are preceded by "u/" and "r/" respectively. We preserve these indicators when tokenizing so that a distinction is made in cases where a user name or subreddit name overlaps with another word type. For example, if a comment references the subreddit r/Buddhism, that reference will be assigned to the word type "r-buddhism" in order to distinguish it from the word type "buddhism." Tokens other than URLs, user names, and subreddit names do not include punctuation or numeric characters.</p><p>We created a custom set of 42 stopwords from the most frequent words in each subreddit, which were removed from all documents. Additionally, words which occurred in fewer than five documents within each subreddit were removed from all documents. 
After word removal, the final vocabulary was limited to words that were within the 30,000 most frequent words of a subreddit. Using this final vocabulary, only documents with 20 or more tokens were included in each subreddit's corpus. An overview of the data collected can be seen in Table <ref type="table" target="#tab_0">1</ref>.</p></div>
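<div xmlns="http://www.tei-c.org/ns/1.0"><p>The tokenization rules described above can be illustrated with a minimal sketch. This is not the code used for the study: the function name tokenize, the regular expressions, the whitespace-based splitting, and the decision to break words at apostrophes and digits are all assumptions made for illustration only.</p>

```python
import re
from urllib.parse import urlparse

# Hypothetical pattern for references such as "r/Buddhism" or "/u/example".
REF_RE = re.compile(r"/?([ur])/([A-Za-z0-9_-]+)")


def tokenize(text, min_len=3):
    """Lowercase tokens of at least min_len characters, following the stated rules."""
    tokens = []
    for chunk in text.split():
        lowered = chunk.lower()
        if lowered.startswith(("http://", "https://")):
            # Reduce a URL to its hostname, with hyphens replacing punctuation.
            host = urlparse(lowered).hostname or ""
            tokens.append(re.sub(r"[^a-z0-9]+", "-", host).strip("-"))
            continue
        ref = REF_RE.fullmatch(chunk.strip(".,;:!?()[]\"'"))
        if ref:
            # Keep the u/ or r/ indicator so "r-buddhism" differs from "buddhism".
            tokens.append(f"{ref.group(1)}-{ref.group(2).lower()}")
            continue
        # Ordinary words: alphabetic runs only (no punctuation or digits).
        tokens.extend(re.findall(r"[a-z]+", lowered))
    return [t for t in tokens if len(t) >= min_len]


example = tokenize("See https://en.wikipedia.org/wiki/Zen and r/Buddhism, or ask u/example about 42 buddhism!")
```

<p>In this sketch the subreddit reference and the plain word stay distinct word types, and the short word "or" and the number "42" are dropped. Stopword removal and the document-frequency and vocabulary-size filters would be applied as separate later steps.</p></div>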
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Quantifying the lexical distinctness of discourses</head><p>While it might be reasonable to take it for granted that the discourses of r/Buddhism and r/Christianity use cultural lexicons that distinguish each from the other, we use the JSD between the relative word frequencies from each subreddit to quantify the degree to which they are lexically distinct from each other. For each word type in the combined vocabulary of the subreddits, we calculate the probability of each word within a subreddit as the number of times that word occurs divided by the total number of tokens present in all documents from that subreddit. We then calculate the JSD between the two distributions for each pair of subreddits under consideration. Additionally, we calculate the individual contributions of each word to the JSD between r/Buddhism and r/Christianity to see if the words which contribute the most to the total JSD reasonably correspond to what we would expect to see in the cultural lexicons of the subreddits. The way in which we calculate the JSD contribution of each term differs slightly from the method used by <ref type="bibr" target="#b18">[19]</ref>. There, the authors calculate the partial KLD of each feature from one distribution to the mean of the two distributions, which quantifies how much each feature signals one particular distribution over the other. Here, we simply calculate the per-feature JSD contributions by calculating the partial KLD of each feature for both distributions. This results in two partial KLD values for each feature, from which we take the mean to get the partial JSD of the feature. Done this way, we can see which terms are most distinguishing between the two subreddits from both directions, rather than which terms distinguish one subreddit over the other.</p></div>
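<div xmlns="http://www.tei-c.org/ns/1.0"><p>The per-word calculation described above can be sketched as follows, assuming word counts for two subreddits aligned over a shared vocabulary. The function name jsd_contributions and the toy counts are hypothetical and serve only to illustrate the arithmetic; they are not data from the study.</p>

```python
import numpy as np


def jsd_contributions(counts_p, counts_q):
    """Per-word contributions (in bits) to the JSD between two count vectors."""
    p = np.asarray(counts_p, dtype=float)
    q = np.asarray(counts_q, dtype=float)
    p = p / p.sum()          # relative word frequencies in the first corpus
    q = q / q.sum()          # relative word frequencies in the second corpus
    m = 0.5 * (p + q)        # mean of the two distributions

    def partial_kld(a):
        # Partial KLD of each word from distribution a to the mean m;
        # words with zero probability in a contribute nothing.
        out = np.zeros_like(a)
        nz = a > 0
        out[nz] = a[nz] * np.log2(a[nz] / m[nz])
        return out

    # Mean of the two partial KLD values gives the partial JSD of each word.
    return 0.5 * (partial_kld(p) + partial_kld(q))


# Toy counts over a four-word vocabulary (illustrative only).
vocab = ["god", "buddha", "practice", "people"]
contrib = jsd_contributions([90, 1, 9, 100], [2, 60, 38, 100])
total_jsd = float(contrib.sum())
ranked = [vocab[i] for i in np.argsort(-contrib)]
```

<p>Summing the contributions recovers the total JSD, and sorting them in descending order gives the kind of ranking reported in Table 3. A word that is equally frequent in both corpora, such as "people" in the toy counts, contributes nothing.</p></div>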
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Structural comparisons between discourses</head><p>We now propose and explain our implementation of the structural comparisons between the discourses of r/Buddhism and r/Christianity. We separately train LDA models with 30 topics on the r/Buddhism corpus and the r/Christianity corpus using the Gensim package for Python <ref type="bibr" target="#b33">[34]</ref>. For brevity, we will refer to the 30-topic model trained on r/Buddhism as model B and the 30-topic model trained on r/Christianity as model C, and refer to the i-th topic of a model as B.i or C.i. After training each model, we get three primary results: a set of "topics" (or features) as probability distributions over the vocabulary, a representation of all documents in the training corpus as distributions of topics, and a way to infer topic distributions for unseen documents. We qualitatively choose labels for each topic based on the highest probability words in the topic as well as close readings of exemplar documents of the topic.</p><p>A more common way to compare these two models would be to calculate the similarity or distance between the topics from one model and the topics from the other model (e.g., as in <ref type="bibr" target="#b25">[26]</ref>). However, we are less interested in how similar the models' topics are, and more interested in how similarly the models apply their topics. This is a substantial distinction. We are acknowledging that the two different models, trained on two different corpora, may have completely different topics. 
However, as long as the models apply those topics to documents in a mutually consistent fashion, the models functionally conceptualize the documents similarly.</p><p>Our assumption is that, if two models organize input in a mutually consistent fashion, then they are similar at a structural level regardless of how different their particular features are from each other.</p><p>It is common to think of LDA models as primarily being the set of inferred topics, but this is only half of the full picture. In addition to their topics, LDA models instantiate a particular organizational scheme that takes input text and categorizes it as a mixture of those topics, weighing the observed text being input with a model's learned disposition for applying its topics. In other words, LDA models can be thought of as both a representation and an operationalization of a discourse, with topics being the former and the way in which models apply those topics to particular documents being the latter. Drawing on and extending the use of the term in <ref type="bibr" target="#b2">[3]</ref> and <ref type="bibr" target="#b14">[15]</ref>, we frame this activity of assigning topics to new pieces of information as conceptualization: the activity of the model representing the novel information in terms of its own discursive features (topics) and dispositions (the trained model's posterior document-topic distribution parameters, which act as a prior when doing inference on new documents). By comparing how two models apply their topics to a set of documents (rather than comparing their topics directly), we are comparing how each model conceptualizes that particular set of documents. If two models conceptualize information in a mutually consistent way, then they share a kind of similarity that is deeper than the particular forms their concepts (or topics) take. 
This is what we are referring to as structural similarity, distinct from lexical similarity.</p><p>To quantify this shared consistency between two ways of conceptualizing input, we calculate the mutual information between their conceptualizations of a set of documents. Given two clusterings of the same set of objects, the mutual information between the two clusterings represents how much information knowing one cluster assignment provides for knowing the assignment made by the other clustering. As previously noted, mutual information is typically used to quantify the similarity between two "hard" clusterings, that is, those in which each object is assigned to a single cluster. However, we calculate the mutual information between document-topic distributions from two models following the proposed method in <ref type="bibr" target="#b20">[21]</ref> by multiplying the transposed document-topic probability matrix from one model with the document-topic matrix of the other model to create a kind of contingency table from which the joint and marginal probabilities of the topics from the two models can be calculated.</p><p>Calculating the mutual information between two LDA models in this way requires us to choose the set of documents across which the two models will be compared, and there is no reason to suppose that the mutual information between models will be the same when different document sets are used to calculate it. If we assume that the topic assignments made on the same documents which were used to train the model are the "true" topic assignments of that corpus, we can think of the mutual information between the models based on that corpus as representing how well the other model is able to interpret the first.</p><p>For example, when comparing models B and C on the r/Buddhism corpus used to train model B, we consider the topic assignments made by model B to be the "true" assignments, since B was trained from this corpus. 
The topic assignments made by model C, on the other hand, represent something very different. Model C, acting as an extension of its training corpus, is forced to apply its own set of topics (or contexts) from r/Christianity to r/Buddhism. In other words, model C conceptualizes r/Buddhism based on the broad discourse underlying r/Christianity. So if model B assigns a document from r/Buddhism to have high probability of topic B.i, the topic assignment made by model C can be understood as model C's interpretation of B.i. If model C is highly certain about how to assign a topic mixture, perhaps assigning the document to have high probability for topic C.j, then model C can be understood as interpreting B.i as C.j within the context of this single document. If, on the other hand, model C is highly uncertain about how to assign a topic mixture to the document, the resulting distribution of topics may be highly spread out, lacking a clear mapping from model B to C.</p><p>As this is repeated over all of the documents from r/Buddhism, if B.i and C.j continue to occur with high probability in the same documents, then the association between them (in the context of r/Buddhism) continues to strengthen. If, however, model C applies a variety of topics to documents with topic B.i, whether by topic distributions that are continually spread out over the topics or by applying high probability topics which vary from document to document, then the interpretation of B.i by model C becomes less clear. This relationship between the topics is quantified by the mutual information between the models. Specific relationships between a single topic from one model with a single topic from the other model are quantified by the pointwise mutual information between them. The mutual information is simply the expected pointwise mutual information between all topic pairs across models. 
Importantly, the mutual information between two models is contingent on the set of documents over which they are compared. As we will show, the mutual information between B and C will depend on the comparison corpus, and more notably, the strongest mappings between topic pairs will also depend on the comparison corpus.</p><p>The argument we are exploring here is that if two models representing two lexically distinct discourses are functionally similar (in that they organize information similarly), then the discourses represented by the two models are structurally similar: the two discourses divide aspects of the world up into similar categories, despite using different lexical items to describe the categories. The degree of structural similarity between models is reflected in the mutual information between them on a particular discourse.</p><p>In the present article, we empirically explore this argument by comparing how models trained on the discourses of r/Buddhism and r/Christianity interpret each other by calculating the mutual information between their topic assignments twice: once with each corpus acting as the comparison corpus. To contextualize these results, we compare them to the self-mutual information of each corpus and corresponding model. We also compare how each model interprets models trained on the discussions of r/math and r/religion. For each comparison, we refer to the model trained on the comparison corpus as the source model and refer to the model trained on a corpus other than the comparison corpus as the interpreting model.</p><p>To better understand what the mutual information between models represents, we look at which topic pairs between models B and C have the largest pointwise mutual information. 
To assess how different our proposed method for comparing models is from a direct comparison of topics between models, we also calculate the distance between all topic pairs using the Jensen-Shannon divergence.</p><p>To get a sense of how dependent these results are on using models with 30 topics, we also train models on each subreddit with 60 topics and calculate the mutual information between them. To differentiate models with different numbers of topics, we subscript the model name with the number of topics k (e.g., B 30 or C 60).</p></div>
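<div xmlns="http://www.tei-c.org/ns/1.0"><p>The contingency-table calculation described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: theta_a and theta_b are assumed to be document-topic matrices (one row per document, rows summing to one) inferred by two models on the same comparison corpus, and the function name soft_mutual_information is hypothetical.</p>

```python
import numpy as np


def soft_mutual_information(theta_a, theta_b):
    """Mutual information (bits) and PMI matrix between two soft clusterings.

    theta_a: documents x topics matrix from one model.
    theta_b: documents x topics matrix from the other model, same documents.
    """
    theta_a = np.asarray(theta_a, dtype=float)
    theta_b = np.asarray(theta_b, dtype=float)

    # Transposed matrix of one model times the matrix of the other gives a
    # soft contingency table of topic co-occurrence across documents.
    contingency = theta_a.T @ theta_b
    joint = contingency / contingency.sum()      # joint topic probabilities
    p_a = joint.sum(axis=1, keepdims=True)       # marginal over model A topics
    p_b = joint.sum(axis=0, keepdims=True)       # marginal over model B topics

    # Pointwise mutual information for every topic pair (may be -inf or nan
    # where a pair or a topic never occurs).
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log2(joint / (p_a * p_b))

    # Mutual information is the expected PMI under the joint distribution.
    nz = joint > 0
    mi = float(np.sum(joint[nz] * pmi[nz]))
    return mi, pmi


# Two models that agree perfectly on two balanced hard clusters: 1 bit.
agree_mi, agree_pmi = soft_mutual_information([[1, 0], [0, 1]], [[1, 0], [0, 1]])
# A second model that is maximally uncertain about every document: 0 bits.
flat_mi, _ = soft_mutual_information([[1, 0], [0, 1]], [[0.5, 0.5], [0.5, 0.5]])
```

<p>The two toy cases mirror the interpretation given above: a model whose topic mixtures are consistently spread out shares no information with the source model, while a consistent one-to-one mapping between topics maximizes it.</p></div>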
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Results</head><p>In this section, we report the results of our methodology within the narrow case study of the subreddits previously described. Our goal in reporting these results is to illustrate the empirical implications of the way we have defined and operationalized the notion of structural similarity.  information of model C 30 . We likewise find 0.125 bits of mutual information between B 60 and C 60 (40% of the self-mutual information of C 60). These results can be understood to reflect how well the two models (and, as imperfect representations, the two discourses) interpret each other. In the form of their corresponding models, the discourse of r/Christianity is capable of interpreting the discourse of r/Buddhism better than r/Buddhism can interpret r/Christianity. This is true in the case of the models with 30 topics as well as the 60-topic models (see Tables <ref type="table" target="#tab_3">5 and 6</ref>).</p><p>Simply knowing the mutual information values between the models does not provide strong intuitions about their structural similarity, so we contextualize these values with comparisons to r/religion and r/math. Given the number of topics in the model trained on r/religion that reflect generally Abrahamic and monotheistic religious concerns, we expect r/Christianity and r/religion to have higher structural similarity with each other than any other subreddit pairing. We find that the largest mutual information between any two subreddits occurs between r/Christianity and r/religion when the comparison corpus is r/Christianity. This is the case for both the 30-topic and 60-topic models. In the case of the 30-topic models, r/religion interprets r/Christianity with 54% of the self-mutual information of r/Christianity, the third-highest percentage. 
In the 60-topic models, r/religion interprets r/Christianity with 63% of the self-mutual information of r/Christianity, rising to the second-highest percentage.</p><p>In the case of r/math, we expect both r/Buddhism and r/Christianity to be highly distinct, both lexically and structurally. Accordingly, the four comparisons done between the subreddits of interest and r/math generate the lowest four mutual information values (as percentages of the appropriate self-mutual information). This is true for the 30-topic models and for the 60-topic models. Mutual information values for models with 30 topics can be seen in Table <ref type="table" target="#tab_3">5</ref>, and values for models with 60 topics can be seen in Table <ref type="table">6</ref>.</p></div>
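<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a worked check of how the percentages are formed, the mutual information between two models can be divided by the self-mutual information of the source model. The values below are copied from Tables 4 and 5 for the 30-topic models of r/Buddhism and r/Christianity; the dictionary layout and the function name are illustrative assumptions.</p>

```python
# Self-mutual information in bits for k = 30 (Table 4).
self_mi_30 = {"r/Christianity": 0.401, "r/Buddhism": 0.306}

# Mutual information in bits for k = 30 (Table 5),
# keyed as (interpreting model corpus, source model corpus).
mi_30 = {
    ("r/Christianity", "r/Buddhism"): 0.182,
    ("r/Buddhism", "r/Christianity"): 0.168,
}


def percent_of_source_self_mi(interpreting, source):
    """Mutual information as a percentage of the source model's self-MI."""
    return 100.0 * mi_30[(interpreting, source)] / self_mi_30[source]


c_reads_b = round(percent_of_source_self_mi("r/Christianity", "r/Buddhism"))
b_reads_c = round(percent_of_source_self_mi("r/Buddhism", "r/Christianity"))
```

<p>This reproduces the 59% and 42% reported in Table 5. Because the tabulated values are themselves rounded to three decimals, recomputing other rows in this way may differ from the reported percentage by a point.</p></div>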
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Pointwise mutual information between topics</head><p>While the mutual information between models on a comparison corpus provides a high-level picture of the relationship between models, it is also possible to dig into which features of the discourses are mapped together by looking at which topic pairs between models have the highest pointwise mutual information. For brevity, we only focus on the 30-topic models, B 30 and C 30 , as a case study for which we obtain all 900 pointwise mutual information values for each combination of topics for both r/Buddhism and r/Christianity with each as the source corpus.</p><p>As examples, we report the ten topic pairs with the highest pointwise mutual information in Tables <ref type="table" target="#tab_5">7 and 8</ref>, annotated with our qualitative topic labels based on high-probability topic words and close readings of exemplar documents for each topic. Notably, these examples reveal that, despite their lexical differences, these mappings appear surprisingly reasonable in many cases. The topic pairs with high pointwise mutual information suggest interesting analogies. For example, the association that emerges between topics B.24 and C.18 suggests that discussions about dietary ethics are to r/Buddhism what discussions about abortion are to r/Christianity. The content of these discussions is considerably distinct lexically. Yet, these divisive ethical and moral debates occur in both subreddits with the particular focus of the debates marking the discourse as that of r/Christianity (in the case of abortion) or of r/Buddhism (in the case of eating meat).</p><p>This example provides important clues as to how this method of comparison works. When model B 30 encounters discussions about abortion in r/Christianity, it is confronted with terms that are not prominent in its training corpus from r/Buddhism. 
None of the topics in model B 30 include the term "abortion" as a high-probability term and so the term does not play much of a role in model B 30 choosing an appropriate topic mixture. Instead, model B 30 is forced to ignore lexically distinct terms like "abortion" in favor of terms that are less distinguishing between the two discourses. Thus a common structural property between discourses emerges that we might label as something that is non-discourse specific such as "contentious ethical issues."</p><p>Additionally, we find that the relative strength of the associations between topics is dependent on the comparison corpus used. The interpretation by model C 30 of B.24 as C.18 has the second-highest pointwise mutual information (see Table <ref type="table" target="#tab_4">7</ref>), whereas the interpretation by model B 30 of C.18 as B.24 ranks tenth (see Table <ref type="table" target="#tab_5">8</ref>).</p><p>In order to assess how different these topic mappings are from those we might get using a more standard method of comparing topics directly, we calculate the Jensen-Shannon divergence between each pair of topics between model B 30 and model C 30 . The ten most similar topic pairs (i.e., those with the lowest Jensen-Shannon divergence) can be seen in Table <ref type="table" target="#tab_6">9</ref>. We find that, while overlap certainly exists, the ten most similar pairs of topics between models are not necessarily those that appear most salient when making indirect comparisons within the context of a comparison corpus.</p><p>Topics B.16 and C.15 appear as the most similar when compared directly in this way. This is also true when compared indirectly through the interpretation of r/Buddhism by r/Christianity (in the form of model B 30 and C 30 ) as shown in Table <ref type="table" target="#tab_4">7</ref>. However, this topic pair is ranked twelfth when indirectly compared through the interpretation of r/Christianity by r/Buddhism. 
Evidently, the choice of comparison corpus is consequential for how salient the same topic pair is within the comparison. The differences between direct and indirect comparisons can also be severe. When r/Buddhism interprets r/Christianity, the relationship between C.04 and B.07 is strongest. When r/Christianity interprets r/Buddhism, this pairing is ranked 32nd. When compared directly using the Jensen-Shannon divergence, the pair is ranked 672nd. Clearly, indirect comparisons between topics within the context specified by a comparison corpus are capable of painting substantially different pictures of how the features between two models are mapped.</p></div>
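<div xmlns="http://www.tei-c.org/ns/1.0"><p>The direct comparison and the rank lookups discussed above can be sketched as follows, assuming each model's topics are available as rows of a topic-word probability matrix over a shared vocabulary. The function names pairwise_jsd and rank_of_pair, and the toy matrices, are hypothetical and are not taken from the study's code.</p>

```python
import numpy as np


def _kld_terms(p, m):
    # Elementwise p * log2(p / m), treating terms with p == 0 as zero.
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = p * np.log2(p / m)
    return np.where(p > 0, terms, 0.0)


def pairwise_jsd(topics_a, topics_b):
    """JSD in bits between every topic of one model and every topic of another."""
    a = np.asarray(topics_a, dtype=float)[:, None, :]   # shape (Ka, 1, V)
    b = np.asarray(topics_b, dtype=float)[None, :, :]   # shape (1, Kb, V)
    m = 0.5 * (a + b)                                   # shape (Ka, Kb, V)
    return 0.5 * (_kld_terms(a, m).sum(axis=-1) + _kld_terms(b, m).sum(axis=-1))


def rank_of_pair(scores, i, j, ascending):
    """1-based rank of pair (i, j); ascending=True ranks the lowest score first."""
    key = -np.asarray(scores, dtype=float) if ascending else np.asarray(scores, dtype=float)
    return 1 + int(np.sum(key > key[i, j]))


# Toy topic-word distributions over a two-word vocabulary (illustrative only).
jsd = pairwise_jsd([[1.0, 0.0], [0.5, 0.5]], [[1.0, 0.0], [0.0, 1.0]])
most_similar_rank = rank_of_pair(jsd, 0, 0, ascending=True)
least_similar_rank = rank_of_pair(jsd, 0, 1, ascending=True)
```

<p>For the direct comparison, pairs are ranked from the lowest JSD upward, whereas a matrix of pointwise mutual information values would be ranked with ascending=False so that the strongest association comes first. With 30 topics per model, either matrix holds 900 pairs, which is the range over which ranks such as 32nd and 672nd are drawn.</p></div>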
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Discussion</head><p>Our goal in reporting the above results is not to prove the validity of our operationalization of structural similarity but to provide a glimpse of what this operationalization looks like within a narrow case study, and to see how closely the results of this case study conform to our intuitions.</p><p>Further work is therefore necessary to continue exploring the method we have proposed here. While we have suggested one possible operationalization of structural similarity, there are likely to be many different possible operationalizations which may overcome limitations present in ours.</p><p>An important limitation of the analysis we present here is that we have only considered two sets of LDA models for representing the discourses. LDA models trained on the same corpus and with the same parameters may still exhibit differences due to the randomness in the training process. For this reason, it is possible that particularities within these models may produce mutual information that is highly dependent on those particularities. In future work, we will examine the relationships between corpora where each is represented by a variety of LDA models in order to get a more robust reading of the mutual information that tends to occur between models trained on different corpora.</p><p>We believe that an important strength of the approach we outline here is that it does not require any significant modifications to each corpus beyond standard preprocessing. However, our next steps will include an approach in which each corpus is modified in such a way that it is forced to be less lexically distinct from the corpora with which it is compared. Possibilities for reducing the lexical distinctness between two corpora might include the removal of certain terms based on their contribution to the JSD between the vocabulary distributions of the corpora being compared. 
Additionally, the methods put forward by <ref type="bibr" target="#b41">[42]</ref> to reduce the correlation between the topics of an LDA model and metadata may be appropriate for this context as well.</p><p>If our attempt at quantifying structural relationships between discourses has some validity, we can begin to explore comparative religion (and perhaps comparative culture more broadly conceived) as a meta-clustering problem in which relationships between various clustering schemes learned from different discourses suggest similarities and differences that go far deeper than lexical distinctions. This is similar to the meta-clustering problem described in <ref type="bibr" target="#b14">[15]</ref>, except in that case, the different clusterings being compared are all learned from the same set of observations. Our case, wherein each clustering is learned from a different set of observations, brings up additional complications. Most importantly, it is not clear whether or not the structural similarity, as we have defined it here, between two discourses is stable across various contexts in which the discourses are compared (i.e., the comparison corpus). As our results show, the structural similarity is contingent on the context in which the discourses are compared. However, it is possible that, as two discourses are compared within a greater variety of comparison corpora, their structural similarity becomes stable. Even if a stable trend of structural similarity does not emerge between discourses, examining the contexts in which their structural similarity differs should still offer useful insights.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion</head><p>Drawing from the comparative religion research in <ref type="bibr" target="#b11">[12,</ref><ref type="bibr" target="#b10">11]</ref> and the framing of unsupervised machine learning models as conceptualization schemes found in <ref type="bibr" target="#b14">[15]</ref> and <ref type="bibr" target="#b2">[3]</ref>, we have proposed a computational theory of the structural similarity between lexically distinct religious discourses, that is, discourses that are characterized by distinct lexicons. We have argued that, if two unsupervised machine learning models organize information with a high degree of mutual consistency as quantified by the mutual information between them, then they share a high degree of structural similarity, regardless of the lexical distinctions between the models' representations. Using latent Dirichlet allocation as our model of choice, we developed our theory and explored its empirical implications for a case study comparing the discourses of two discussion communities from Reddit: r/Buddhism and r/Christianity. The results from this case study suggest that our method for quantifying structural similarity has merit and warrants further exploration.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Overview of Collected Data</figDesc><table><row><cell>Subreddit</cell><cell>Date Created</cell><cell>Subscribers as of 2020-06-22</cell><cell>Accessible Submissions</cell><cell>Submissions in Corpus</cell><cell>Raw Vocab Size</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 3</head><label>3</label><figDesc>Word Types with the Largest JSD Contributions Between r/Buddhism and r/Christianity</figDesc><table><row><cell>Word Type</cell><cell>Contribution to JSD (bits)</cell><cell>Word Type</cell><cell>Contribution to JSD (bits)</cell></row><row><cell>god</cell><cell>5.66 × 10⁻³</cell><cell>sin</cell><cell>9.67 × 10⁻⁴</cell></row><row><cell>buddhism</cell><cell>2.61 × 10⁻³</cell><cell>christians</cell><cell>9.34 × 10⁻⁴</cell></row><row><cell>buddha</cell><cell>2.42 × 10⁻³</cell><cell>mind</cell><cell>8.10 × 10⁻⁴</cell></row><row><cell>jesus</cell><cell>2.02 × 10⁻³</cell><cell>self</cell><cell>7.04 × 10⁻⁴</cell></row><row><cell>church</cell><cell>1.79 × 10⁻³</cell><cell>suffering</cell><cell>6.78 × 10⁻⁴</cell></row><row><cell>buddhist</cell><cell>1.78 × 10⁻³</cell><cell>dharma</cell><cell>6.66 × 10⁻⁴</cell></row><row><cell>bible</cell><cell>1.53 × 10⁻³</cell><cell>path</cell><cell>6.30 × 10⁻⁴</cell></row><row><cell>christ</cell><cell>1.26 × 10⁻³</cell><cell>zen</cell><cell>5.55 × 10⁻⁴</cell></row><row><cell>meditation</cell><cell>1.22 × 10⁻³</cell><cell>karma</cell><cell>5.38 × 10⁻⁴</cell></row><row><cell>practice</cell><cell>1.13 × 10⁻³</cell><cell>faith</cell><cell>5.20 × 10⁻⁴</cell></row><row><cell>christian</cell><cell>1.04 × 10⁻³</cell><cell>enlightenment</cell><cell>5.16 × 10⁻⁴</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 4</head><label>4</label><figDesc>Self-Mutual Information of Models</figDesc><table><row><cell>Training Corpus</cell><cell>Self-MI (bits), k = 30</cell><cell>Self-MI (bits), k = 60</cell></row><row><cell>r/math</cell><cell>0.629</cell><cell>0.547</cell></row><row><cell>r/Christianity</cell><cell>0.401</cell><cell>0.309</cell></row><row><cell>r/Buddhism</cell><cell>0.306</cell><cell>0.236</cell></row><row><cell>r/religion</cell><cell>0.250</cell><cell>0.265</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 5</head><label>5</label><figDesc>Mutual Information Between Models with 30 Topics</figDesc><table><row><cell>Interpreting Model Corpus</cell><cell>Source Model Corpus</cell><cell>MI (bits)</cell><cell>Percent of Source Self-MI</cell></row><row><cell>r/Christianity</cell><cell>r/religion</cell><cell>0.198</cell><cell>79%</cell></row><row><cell>r/Christianity</cell><cell>r/Buddhism</cell><cell>0.182</cell><cell>59%</cell></row><row><cell>r/religion</cell><cell>r/Christianity</cell><cell>0.218</cell><cell>54%</cell></row><row><cell>r/religion</cell><cell>r/Buddhism</cell><cell>0.137</cell><cell>45%</cell></row><row><cell>r/Buddhism</cell><cell>r/religion</cell><cell>0.108</cell><cell>43%</cell></row><row><cell>r/Buddhism</cell><cell>r/Christianity</cell><cell>0.168</cell><cell>42%</cell></row><row><cell>r/math</cell><cell>r/Buddhism</cell><cell>0.095</cell><cell>31%</cell></row><row><cell>r/Christianity</cell><cell>r/math</cell><cell>0.167</cell><cell>27%</cell></row><row><cell>r/Buddhism</cell><cell>r/math</cell><cell>0.139</cell><cell>22%</cell></row><row><cell>r/math</cell><cell>r/Christianity</cell><cell>0.078</cell><cell>20%</cell></row><row><cell cols="4">Table 6: Mutual Information Between Models with 60 Topics</cell></row><row><cell>Interpreting Model Corpus</cell><cell>Source Model Corpus</cell><cell>MI (bits)</cell><cell>Percent of Source Self-MI</cell></row><row><cell>r/Christianity</cell><cell>r/religion</cell><cell>0.183</cell><cell>69%</cell></row><row><cell>r/religion</cell><cell>r/Christianity</cell><cell>0.196</cell><cell>63%</cell></row><row><cell>r/Christianity</cell><cell>r/Buddhism</cell><cell>0.133</cell><cell>57%</cell></row><row><cell>r/religion</cell><cell>r/Buddhism</cell><cell>0.118</cell><cell>50%</cell></row><row><cell>r/Buddhism</cell><cell>r/Christianity</cell><cell>0.125</cell><cell>40%</cell></row><row><cell>r/Buddhism</cell><cell>r/religion</cell><cell>0.098</cell><cell>37%</cell></row><row><cell>r/math</cell><cell>r/Buddhism</cell><cell>0.081</cell><cell>34%</cell></row><row><cell>r/Christianity</cell><cell>r/math</cell><cell>0.143</cell><cell>26%</cell></row><row><cell>r/math</cell><cell>r/Christianity</cell><cell>0.075</cell><cell>24%</cell></row><row><cell>r/Buddhism</cell><cell>r/math</cell><cell>0.117</cell><cell>21%</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 7</head><label>7</label><figDesc>Ten Topic Pairs with Highest PMI Between 30-topic Models Trained on r/Buddhism and r/Christianity and Compared on the Documents of r/Buddhism</figDesc><table><row><cell>r/Buddhism Source Topics</cell><cell>r/Christianity Interpreted Topics</cell><cell>Pointwise Mutual Information</cell></row><row><cell>B.16 Relationships</cell><cell>C.15 Relationships</cell><cell>3.095</cell></row><row><cell>B.24 Dietary Ethics &amp; Meat</cell><cell>C.18 Abortion</cell><cell>2.797</cell></row><row><cell>B.05 Repeated Text</cell><cell>C.27 Repeated Text: Moderators</cell><cell>2.761</cell></row><row><cell>B.05 Repeated Text</cell><cell>C.10 Repeated Text: Verse Bot</cell><cell>2.743</cell></row><row><cell>B.21 Intl. Politics &amp; Conflict</cell><cell>C.08 American Politics &amp; Race</cell><cell>2.670</cell></row><row><cell>B.12 Text Quotations</cell><cell>C.23 Bible Verses</cell><cell>2.665</cell></row><row><cell>B.25 Precepts</cell><cell>C.25 Sex &amp; Morality</cell><cell>2.617</cell></row><row><cell>B.03 Monastic Practice &amp; Monks</cell><cell>C.03 Churches &amp; Fellowship</cell><cell>2.583</cell></row><row><cell>B.25 Precepts</cell><cell>C.29 Sexual Preferences</cell><cell>2.295</cell></row><row><cell>B.26 Source Text Discussion</cell><cell>C.22 Historical Jesus &amp; Accuracy</cell><cell>2.156</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 8</head><label>8</label><figDesc>Ten Topic Pairs with Highest PMI Between 30-topic Models Trained on r/Buddhism and r/Christianity and Compared on the Documents of r/Christianity</figDesc><table><row><cell>r/Christianity</cell><cell>r/Buddhism</cell><cell>Pointwise</cell></row><row><cell>Source Topics</cell><cell>Interpreted Topics</cell><cell>Mutual Information</cell></row><row><cell>C.04 Prayer</cell><cell>B.07 Schools &amp; Sects</cell><cell>3.113</cell></row><row><cell>C.27 Repeated Text: Moderators</cell><cell>B.05 Repeated Text</cell><cell>3.103</cell></row><row><cell>C.16 Health</cell><cell>B.19 Mental Health</cell><cell>2.844</cell></row><row><cell>C.14 Science &amp; Evolution</cell><cell>B.17 Mind &amp; Reality</cell><cell>2.821</cell></row><row><cell>C.25 Sex &amp; Morality</cell><cell>B.25 Precepts</cell><cell>2.711</cell></row><row><cell>C.01 Resources &amp; Bible Versions</cell><cell>B.06 Resources</cell><cell>2.705</cell></row><row><cell>C.03 Churches &amp; Fellowship</cell><cell>B.03 Monastic Practice &amp; Monks</cell><cell>2.538</cell></row><row><cell>C.01 Resources &amp; Bible Versions</cell><cell>B.26 Source Text Discussion</cell><cell>2.478</cell></row><row><cell>C.29 Sexual Preferences</cell><cell>B.25 Precepts</cell><cell>2.382</cell></row><row><cell>C.18 Abortion</cell><cell>B.24 Dietary Ethics &amp; Meat</cell><cell>2.274</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_6"><head>Table 9</head><label>9</label><figDesc>Ten Topic Pairs from 30-topic Models with Lowest Jensen-Shannon Divergence</figDesc><table><row><cell>r/Buddhism</cell><cell>r/Christianity</cell><cell>Jensen-Shannon</cell></row><row><cell>Topics</cell><cell>Topics</cell><cell>Divergence (bits)</cell></row><row><cell>B.16 Relationships</cell><cell>C.15 Relationships</cell><cell>0.153</cell></row><row><cell>B.09 Debate, Opinions, Questions</cell><cell>C.28 Debate, Non-Christians, Criticisms</cell><cell>0.159</cell></row><row><cell>B.00 Advice</cell><cell>C.17 Advice</cell><cell>0.185</cell></row><row><cell>B.09 Debate, Opinions, Questions</cell><cell>C.09 Debate, Theology, Apologetics</cell><cell>0.222</cell></row><row><cell>B.18 Dealing with People</cell><cell>C.17 Advice</cell><cell>0.258</cell></row><row><cell>B.09 Debate, Opinions, Questions</cell><cell>C.07 Bible &amp; Interpretation</cell><cell>0.270</cell></row><row><cell>B.17 Mind &amp; Reality</cell><cell>C.09 Debate, Theology, Apologetics</cell><cell>0.271</cell></row><row><cell>B.18 Dealing with People</cell><cell>C.28 Debate, Non-Christians, Criticisms</cell><cell>0.282</cell></row><row><cell>B.21 Intl. Politics &amp; Conflict</cell><cell>C.21 Money &amp; Society</cell><cell>0.293</cell></row><row><cell>B.00 Advice</cell><cell>C.11 References, Stories, Humor</cell><cell>0.294</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">All code used for this analysis is available at https://github.com/zacharykstine/chr2020_comp_relg_lda.</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This research is funded in part by grants from the U.S. National Science Foundation (OIA-1946391, OIA-1920920, IIS-1636933, ACI-1429160, and IIS-1110868), U.S. Office of Naval Research (N00014-10-1-0091, N00014-14-1-0489, N00014-15-P-1187, N00014-16-1-2016, N00014-16-1-2412, N00014-17-1-2675, N00014-17-1-2605, N68335-19-C-0359, N00014-19-1-2336, N68335-20-C-0540), U.S. Air Force Research Lab, U.S. Army Research Office (W911NF-17-S-0002, W911NF-16-1-0189), U.S. Defense Advanced Research Projects Agency (W31P4Q-17-C-0059), Arkansas Research Alliance, the Jerry L. Maulden/Entergy Endowment at the University of Arkansas at Little Rock, and the Australian Department of Defense Strategic Policy Grants Program (SPGP) (award number: 2020-106-094) to the third co-author, Nitin Agarwal. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding organizations. The researcher gratefully acknowledges the support.</p></div>
			</div>


			<div type="funding">
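<div xmlns="http://www.tei-c.org/ns/1.0"><table><row><cell>r/Buddhism</cell><cell>2008-03-25</cell><cell>254,693</cell><cell>87,792</cell><cell>66,108</cell><cell>223,356</cell></row><row><cell>r/Christianity</cell><cell>2008-01-25</cell><cell>241,539</cell><cell>412,930</cell><cell>298,502</cell><cell>618,370</cell></row><row><cell>r/math</cell><cell>2008-01-24</cell><cell>1,198,611</cell><cell>155,873</cell><cell>103,471</cell><cell>237,742</cell></row><row><cell>r/religion</cell><cell>2008-01-25</cell><cell>53,167</cell><cell>88,390</cell><cell>31,283</cell><cell>160,562</cell></row></table></div>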
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Lexical comparisons</head><p>When we calculate the JSD between the vocabulary distributions of each pair of subreddits, we find that r/Christianity and r/religion have the least divergence between them. In other words, they are the least lexically distinct pair. We also find that r/Buddhism is slightly less lexically distinct from r/religion than from r/Christianity. The JSD values between the vocabulary distributions of the subreddits are given in Table <ref type="table">2</ref>. The relationships between subreddits that emerge from their lexical distinctness provide a good baseline against which we can compare their structural similarity. As we show below, the ordering of lexical similarity between subreddits in Table <ref type="table">2</ref> disagrees somewhat with the orderings we obtain from their structural similarity, reported in the subsections that follow. This disagreement, though slight, is an encouraging sign that our approach to calculating structural similarity is not simply a more complicated, but functionally equivalent, calculation of lexical similarity: it measures something different.</p><p>A sample of the twenty-two words with the largest contributions to the JSD between the vocabulary distributions of r/Buddhism and r/Christianity is provided in Table <ref type="table">3</ref>. Most of these highly distinguishing terms are reasonable candidates for the cultural lexicons of either subreddit. Some terms, such as "practice" or "suffering", may not be unique to a single religious lexicon. However, their relatively large JSD contributions indicate that they are highly distinguishing terms between the subreddits: they are strong signals of one discourse over the other. Importantly, this way of quantifying the extent to which a word functions as part of a discourse's lexicon depends on the discourse with which it is being compared.</p></div>
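<div xmlns="http://www.tei-c.org/ns/1.0"><p>The lexical comparison described above can be illustrated with a minimal sketch. It assumes each vocabulary distribution is a plain relative-frequency distribution over a shared vocabulary, and it uses the standard decomposition of the JSD into non-negative per-word terms. The function names and the toy token lists are hypothetical and are not taken from the published analysis code.</p>
<p><code lang="python"><![CDATA[
```python
import math
from collections import Counter


def word_distribution(tokens, vocab):
    """Relative frequency of each vocabulary word in a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts[w] for w in vocab)
    return {w: counts[w] / total for w in vocab}


def jsd_contributions(p, q):
    """Per-word contributions (in bits) to the JSD between p and q.

    Both arguments are dicts over the same vocabulary. The contributions
    are non-negative and sum to the Jensen-Shannon divergence.
    """
    contrib = {}
    for w in p:
        m = 0.5 * (p[w] + q[w])  # mixture probability of word w
        c = 0.0
        if p[w] > 0:
            c += 0.5 * p[w] * math.log2(p[w] / m)
        if q[w] > 0:
            c += 0.5 * q[w] * math.log2(q[w] / m)
        contrib[w] = c
    return contrib


# Toy token lists standing in for two subreddit corpora (hypothetical data).
corpus_a = ["practice", "suffering", "meditation", "practice", "life"]
corpus_b = ["god", "church", "faith", "god", "life"]
vocab = sorted(set(corpus_a) | set(corpus_b))

p = word_distribution(corpus_a, vocab)
q = word_distribution(corpus_b, vocab)
contrib = jsd_contributions(p, q)
total_jsd = sum(contrib.values())

# Words ranked by how strongly they distinguish the two corpora.
ranked = sorted(contrib, key=contrib.get, reverse=True)
print(round(total_jsd, 3), ranked[:3])
```
]]></code></p>
<p>Ranking words by their contribution gives the kind of list sampled in Table <ref type="table">3</ref>. A word shared equally by both corpora, such as "life" in the toy data, contributes nothing, which reflects the point that a word's distinguishing power depends on the discourse it is compared with.</p></div>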
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Mutual information between models</head><p>Given that we are calculating mutual information between probabilistic clusterings of documents, we first calculate the mutual information between each model and itself. This self-mutual information gives us a rough sense of the maximum mutual information possible for the corpus on which the model was trained. For that reason, when we report mutual information between models trained on different corpora, we also express that value as a percentage of the self-mutual information of the source corpus. The self-mutual information values for each subreddit can be found in Table <ref type="table">4</ref>.</p><p>When we calculate the mutual information between models B 30 and C 30 with r/Buddhism as the comparison corpus, we get 0.182 bits or 59% of the mutual information model B 30 has with itself. Similarly, we find that B 60 and C 60 have mutual information of 0.133 bits (57% of the self-mutual information of B 60 ). When we calculate the mutual information between B 30 and C 30 within the context of the r/Christianity corpus, we get 0.168 bits or 42% of the self-mutual</p></div>			</div>
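			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The comparison in Section 4.2 can be sketched as follows. This is an illustration under stated assumptions rather than the published implementation: it assumes that the joint distribution over topic pairs is the average, over the documents of the comparison corpus, of the outer product of each document's topic distributions under the two models. The function names and the small document-topic matrices are hypothetical.</p>
<p><code lang="python"><![CDATA[
```python
import math


def joint_topic_distribution(theta_a, theta_b):
    """Joint distribution over topic pairs from two soft clusterings.

    theta_a and theta_b hold one topic distribution per document (same
    documents, same order). Assumption: the joint is the per-document
    outer product averaged over all documents.
    """
    n_docs = len(theta_a)
    k_a, k_b = len(theta_a[0]), len(theta_b[0])
    joint = [[0.0] * k_b for _ in range(k_a)]
    for doc_a, doc_b in zip(theta_a, theta_b):
        for i, prob_a in enumerate(doc_a):
            for j, prob_b in enumerate(doc_b):
                joint[i][j] += prob_a * prob_b / n_docs
    return joint


def mutual_information(joint):
    """Mutual information (in bits) of a joint distribution over topic pairs."""
    row = [sum(r) for r in joint]
    col = [sum(c) for c in zip(*joint)]
    mi = 0.0
    for i, r in enumerate(joint):
        for j, p_ij in enumerate(r):
            if p_ij > 0:
                mi += p_ij * math.log2(p_ij / (row[i] * col[j]))
    return mi


# Hypothetical topic distributions for four documents of one comparison
# corpus, inferred by the model trained on that corpus (source) and by a
# model trained on a different corpus (other).
theta_source = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
theta_other = [[0.7, 0.3], [0.6, 0.4], [0.3, 0.7], [0.4, 0.6]]

cross_mi = mutual_information(joint_topic_distribution(theta_source, theta_other))
self_mi = mutual_information(joint_topic_distribution(theta_source, theta_source))
percent_of_self = 100.0 * cross_mi / self_mi

print(round(cross_mi, 3), round(self_mi, 3), round(percent_of_self, 1))
```
]]></code></p>
<p>Because the clusterings are probabilistic, the self-mutual information is generally smaller than the entropy of the topic marginal, which is why it serves only as a rough reference point when reporting percentages.</p></div>
			</div>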
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">LDA Topic Modeling: Contexts for the History &amp; Philosophy of Science</title>
		<author>
			<persName><forename type="first">C</forename><surname>Allen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Murdock</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Dynamics of Science: Computational Frontiers in History and Philosophy of Science</title>
				<editor>
			<persName><forename type="first">G</forename><surname>Ramsey</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>De Block</surname></persName>
		</editor>
		<meeting><address><addrLine>Pittsburgh</addrLine></address></meeting>
		<imprint>
			<publisher>Pittsburgh University Press</publisher>
			<date type="published" when="2020-05">May 2020</date>
		</imprint>
	</monogr>
	<note>Preprint of a chapter forthcoming</note>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Narrative Paths and Negotiation of Power in Birth Stories</title>
		<author>
			<persName><forename type="first">M</forename><surname>Antoniak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Mimno</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Levy</surname></persName>
		</author>
		<idno type="DOI">10.1145/3359190</idno>
	</analytic>
	<monogr>
		<title level="m">Proc. ACM Hum.-Comput. Interact. 3</title>
				<meeting>ACM Hum.-Comput. Interact. 3</meeting>
		<imprint>
			<publisher>CSCW</publisher>
			<date type="published" when="2019-11">Nov. 2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Typologies and Taxonomies: An Introduction to Classification Techniques</title>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">D</forename><surname>Bailey</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Quantitative Applications in the Social Sciences</title>
				<meeting><address><addrLine>Beverly Hills, CA</addrLine></address></meeting>
		<imprint>
			<publisher>Sage</publisher>
			<date type="published" when="1994">1994</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Individuals, institutions, and innovation in the debates of the French Revolution</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">T J</forename><surname>Barron</surname></persName>
		</author>
		<idno type="DOI">10.1073/pnas.1717729115</idno>
		<ptr target="https://www.pnas.org/content/115/18/4607" />
	</analytic>
	<monogr>
		<title level="j">Proceedings of the National Academy of Sciences</title>
		<idno type="ISSN">0027-8424</idno>
		<imprint>
			<biblScope unit="volume">115</biblScope>
			<biblScope unit="issue">18</biblScope>
			<biblScope unit="page" from="4607" to="4612" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Latent Dirichlet Allocation</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">M</forename><surname>Blei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">Y</forename><surname>Ng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">I</forename><surname>Jordan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="993" to="1022" />
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Making Space for Religion in Internet Studies</title>
		<author>
			<persName><forename type="first">H</forename><surname>Campbell</surname></persName>
		</author>
		<idno type="DOI">10.1080/01972240591007625</idno>
	</analytic>
	<monogr>
		<title level="j">The Information Society</title>
		<imprint>
			<biblScope unit="volume">21</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="309" to="315" />
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">The Internet&apos;s Hidden Rules: An Empirical Study of Reddit Norm Violations at Micro, Meso, and Macro Scales</title>
		<author>
			<persName><forename type="first">E</forename><surname>Chandrasekharan</surname></persName>
		</author>
		<idno type="DOI">10.1145/3274301</idno>
	</analytic>
	<monogr>
		<title level="m">Proc. ACM Hum.-Comput. Interact. 2</title>
				<meeting>ACM Hum.-Comput. Interact. 2</meeting>
		<imprint>
			<publisher>CSCW</publisher>
			<date type="published" when="2018-11">Nov. 2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">You Can&apos;t Stay Here: The Efficacy of Reddit&apos;s 2015 Ban Examined Through Hate Speech</title>
		<author>
			<persName><forename type="first">E</forename><surname>Chandrasekharan</surname></persName>
		</author>
		<idno type="DOI">10.1145/3134666</idno>
	</analytic>
	<monogr>
		<title level="m">Proc. ACM Hum.-Comput. Interact. 1</title>
				<meeting>ACM Hum.-Comput. Interact. 1</meeting>
		<imprint>
			<publisher>CSCW</publisher>
			<date type="published" when="2017-12">Dec. 2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Religion as a Complex and Dynamic System</title>
		<author>
			<persName><forename type="first">F</forename><surname>Cho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">K</forename><surname>Squiers</surname></persName>
		</author>
		<idno type="DOI">10.1093/jaarel/lft016</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of the American Academy of Religion</title>
		<idno type="ISSN">0002-7189</idno>
		<imprint>
			<biblScope unit="volume">81</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="357" to="398" />
			<date type="published" when="2013-04">Apr. 2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">Elements of Information Theory</title>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">M</forename><surname>Cover</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">A</forename><surname>Thomas</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2006">2006</date>
			<publisher>John Wiley &amp; Sons, Inc</publisher>
			<pubPlace>Hoboken, NJ</pubPlace>
		</imprint>
	</monogr>
	<note>2nd</note>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Engaged Buddhist ethics: Mistaking the boat for the shore</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">E</forename><surname>Deitrick</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Action Dharma: New Studies in Engaged Buddhism</title>
				<editor>
			<persName><forename type="first">C</forename><surname>Queen</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Prebish</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Keown</surname></persName>
		</editor>
		<meeting><address><addrLine>New York, NY</addrLine></address></meeting>
		<imprint>
			<publisher>RoutledgeCurzon</publisher>
			<date type="published" when="2003">2003</date>
			<biblScope unit="page" from="252" to="269" />
		</imprint>
	</monogr>
	<note>RoutledgeCurzon Critical Studies in Buddhism</note>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">E</forename><surname>Deitrick</surname></persName>
		</author>
		<idno>UMI Number: 3041445</idno>
		<title level="m">Mistaking the Boat for the Shore?: A Critical Analysis of Socially Engaged Buddhism in the United States</title>
				<meeting><address><addrLine>Los Angeles, CA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2000">2000</date>
		</imprint>
		<respStmt>
			<orgName>University of Southern California</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Information-Theoretic Co-Clustering</title>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">S</forename><surname>Dhillon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Mallela</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">S</forename><surname>Modha</surname></persName>
		</author>
		<idno type="DOI">10.1145/956750.956764</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD &apos;03</title>
				<meeting>the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD &apos;03<address><addrLine>Washington, D.C.</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2003">2003</date>
			<biblScope unit="page" from="89" to="98" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Exploiting affinities between topic modeling and the sociological perspective on culture: Application to newspaper coverage of U.S. government arts funding</title>
		<author>
			<persName><forename type="first">P</forename><surname>DiMaggio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Nag</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Blei</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.poetic.2013.08.004</idno>
		<ptr target="http://www.sciencedirect.com/science/article/pii/S0304422X13000661" />
	</analytic>
	<monogr>
		<title level="m">Topic Models and the Cultural Sciences</title>
				<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="volume">41</biblScope>
			<biblScope unit="page" from="570" to="606" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">General purpose computer-assisted clustering and conceptualization</title>
		<author>
			<persName><forename type="first">J</forename><surname>Grimmer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>King</surname></persName>
		</author>
		<idno type="DOI">10.1073/pnas.1018067108</idno>
	</analytic>
	<monogr>
		<title level="j">Proceedings of the National Academy of Sciences</title>
		<idno type="ISSN">0027-8424</idno>
		<imprint>
			<biblScope unit="volume">108</biblScope>
			<biblScope unit="issue">7</biblScope>
			<biblScope unit="page" from="2643" to="2650" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Studying the History of Ideas Using Topic Models</title>
		<author>
			<persName><forename type="first">D</forename><surname>Hall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Jurafsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Conference on Empirical Methods in Natural Language Processing</title>
				<meeting>the Conference on Empirical Methods in Natural Language Processing<address><addrLine>Honolulu, Hawaii</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="363" to="371" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Digital Humanities and the Study of Religion</title>
		<author>
			<persName><forename type="first">T</forename><surname>Hutchings</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Between Humanities and the Digital</title>
				<editor>
			<persName><forename type="first">P</forename><surname>Svensson</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><forename type="middle">T</forename><surname>Goldberg</surname></persName>
		</editor>
		<meeting><address><addrLine>Cambridge, MA</addrLine></address></meeting>
		<imprint>
			<publisher>The MIT Press</publisher>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="283" to="294" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<title level="m" type="main">Discourse Analysis as Theory and Method</title>
		<author>
			<persName><forename type="first">M</forename><surname>Jørgensen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">J</forename><surname>Phillips</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2002">2002</date>
			<publisher>Sage</publisher>
			<pubPlace>London</pubPlace>
		</imprint>
	</monogr>
	<note>1st</note>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">The civilizing process in London&apos;s Old Bailey</title>
		<author>
			<persName><forename type="first">S</forename><surname>Klingenstein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Hitchcock</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>DeDeo</surname></persName>
		</author>
		<idno type="DOI">10.1073/pnas.1405984111</idno>
		<ptr target="https://www.pnas.org/content/111/26/9419" />
	</analytic>
	<monogr>
		<title level="j">Proceedings of the National Academy of Sciences</title>
		<idno type="ISSN">0027-8424</idno>
		<imprint>
			<biblScope unit="volume">111</biblScope>
			<biblScope unit="issue">26</biblScope>
			<biblScope unit="page" from="9419" to="9424" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">On Information and Sufficiency</title>
		<author>
			<persName><forename type="first">S</forename><surname>Kullback</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">A</forename><surname>Leibler</surname></persName>
		</author>
		<idno type="DOI">10.1214/aoms/1177729694</idno>
		<ptr target="https://doi.org/10.1214/aoms/1177729694" />
	</analytic>
	<monogr>
		<title level="j">Ann. Math. Statist</title>
		<imprint>
			<biblScope unit="volume">22</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="79" to="86" />
			<date type="published" when="1951-03">Mar. 1951</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Generalized information theoretic cluster validity indices for soft clusterings</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Lei</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE Symposium on Computational Intelligence and Data Mining (CIDM)</title>
				<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="24" to="31" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Divergence measures based on the Shannon entropy</title>
		<author>
			<persName><forename type="first">J</forename><surname>Lin</surname></persName>
		</author>
		<idno type="DOI">10.1109/18.61115</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Information Theory</title>
		<imprint>
			<biblScope unit="volume">37</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="145" to="151" />
			<date type="published" when="1991">1991</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Considering critical methods and theoretical lenses in digital religion studies</title>
		<author>
			<persName><forename type="first">M</forename><surname>Lövheim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">A</forename><surname>Campbell</surname></persName>
		</author>
		<idno type="DOI">10.1177/1461444816649911</idno>
	</analytic>
	<monogr>
		<title level="j">New Media &amp; Society</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="5" to="14" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Comparison</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">H</forename><surname>Martin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Guide to the Study of Religion</title>
				<editor>
			<persName><forename type="first">W</forename><surname>Braun</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><forename type="middle">T</forename><surname>McCutcheon</surname></persName>
		</editor>
		<meeting><address><addrLine>London</addrLine></address></meeting>
		<imprint>
			<publisher>Cassell</publisher>
			<date type="published" when="2005">2005</date>
			<biblScope unit="page" from="45" to="56" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Comparing clusterings-an information based distance</title>
		<author>
			<persName><forename type="first">M</forename><surname>Meilă</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.jmva.2006.11.013</idno>
		<ptr target="http://www.sciencedirect.com/science/article/pii/S0047259X06002016" />
	</analytic>
	<monogr>
		<title level="j">Journal of Multivariate Analysis</title>
		<idno type="ISSN">0047-259X</idno>
		<imprint>
			<biblScope unit="volume">98</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="873" to="895" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Is the Sample Good Enough? Comparing Data from Twitter&apos;s Streaming API with Twitter&apos;s Firehose</title>
		<author>
			<persName><forename type="first">F</forename><surname>Morstatter</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International AAAI Conference on Web and Social Media</title>
				<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Exploration and exploitation of Victorian science in Darwin&apos;s reading notebooks</title>
		<author>
			<persName><forename type="first">J</forename><surname>Murdock</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Allen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>DeDeo</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.cognition.2016.11.012</idno>
		<ptr target="http://www.sciencedirect.com/science/article/pii/S0010027716302840" />
	</analytic>
	<monogr>
		<title level="j">Cognition</title>
		<idno type="ISSN">0010-0277</idno>
		<imprint>
			<biblScope unit="volume">159</biblScope>
			<biblScope unit="page" from="117" to="126" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Improved mutual information measure for clustering, classification, and community detection</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">E J</forename><surname>Newman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">T</forename><surname>Cantwell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-G</forename><surname>Young</surname></persName>
		</author>
		<idno type="DOI">10.1103/PhysRevE.101.042304</idno>
		<ptr target="https://link.aps.org/doi/10.1103/PhysRevE.101.042304" />
	</analytic>
	<monogr>
		<title level="j">Phys. Rev. E</title>
		<imprint>
			<biblScope unit="volume">101</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page">042304</biblScope>
			<date type="published" when="2020-04">Apr. 2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<monogr>
		<title level="m" type="main">How we do things with words: Analyzing text as social and cultural data</title>
		<author>
			<persName><forename type="first">D</forename><surname>Nguyen</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1907.01468</idno>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Modeling the Contested Relationship between Analects, Mencius, and Xunzi: Preliminary Evidence from a Machine-Learning Approach</title>
		<author>
			<persName><forename type="first">R</forename><surname>Nichols</surname></persName>
		</author>
		<idno type="DOI">10.1017/S0021911817000973</idno>
	</analytic>
	<monogr>
		<title level="j">The Journal of Asian Studies</title>
		<imprint>
			<biblScope unit="volume">77</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="19" to="57" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Comparative Religion</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">E</forename><surname>Paden</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Routledge Companion to the Study of Religion</title>
				<editor>
			<persName><forename type="first">J</forename><forename type="middle">R</forename><surname>Hinnells</surname></persName>
		</editor>
		<meeting><address><addrLine>New York, NY</addrLine></address></meeting>
		<imprint>
			<publisher>Routledge</publisher>
			<date type="published" when="2005">2005</date>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page" from="208" to="225" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">There Will Be Numbers</title>
		<author>
			<persName><forename type="first">A</forename><surname>Piper</surname></persName>
		</author>
		<idno type="DOI">10.22148/16.006</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Cultural Analytics</title>
		<imprint>
			<date type="published" when="2016-05">May 2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Prothero</surname></persName>
		</author>
		<title level="m">The White Buddhist: The Asian Odyssey of Henry Steel Olcott</title>
				<meeting><address><addrLine>IN</addrLine></address></meeting>
		<edition>1st</edition>
		<imprint>
			<publisher>Indiana University Press</publisher>
			<date type="published" when="1996">1996</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<analytic>
		<title level="a" type="main">Software Framework for Topic Modelling with Large Corpora</title>
		<author>
			<persName><forename type="first">R</forename><surname>Řehůřek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Sojka</surname></persName>
		</author>
		<ptr target="http://is.muni.cz/publication/884893/en" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks</title>
				<meeting>the LREC 2010 Workshop on New Challenges for NLP Frameworks<address><addrLine>Valletta, Malta</addrLine></address></meeting>
		<imprint>
			<publisher>ELRA</publisher>
			<date type="published" when="2010-05">May 2010</date>
			<biblScope unit="page" from="45" to="50" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<analytic>
		<title level="a" type="main">Navigating the Local Modes of Big Data: The Case of Topic Models</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">E</forename><surname>Roberts</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">M</forename><surname>Stewart</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Tingley</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Computational Social Science: Discovery and Prediction</title>
				<editor>
			<persName><forename type="first">R</forename><forename type="middle">M</forename><surname>Alvarez</surname></persName>
		</editor>
		<meeting><address><addrLine>New York, NY</addrLine></address></meeting>
		<imprint>
			<publisher>Cambridge University Press</publisher>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="51" to="97" />
		</imprint>
	</monogr>
	<note>Chap. 2</note>
</biblStruct>

<biblStruct xml:id="b35">
	<analytic>
		<title level="a" type="main">A Mathematical Theory of Communication</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">E</forename><surname>Shannon</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The Bell System Technical Journal</title>
		<imprint>
			<biblScope unit="volume">27</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="379" to="423" />
			<date type="published" when="1948">1948</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b36">
	<analytic>
		<title level="a" type="main">Who&apos;s Afraid of Reductionism? The Study of Religion in the Age of Cognitive Science</title>
		<author>
			<persName><forename type="first">E</forename><surname>Slingerland</surname></persName>
		</author>
		<idno type="DOI">10.1093/jaarel/lfn004</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of the American Academy of Religion</title>
		<imprint>
			<biblScope unit="volume">76</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="375" to="411" />
			<date type="published" when="2008-03">Mar. 2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b37">
	<analytic>
		<title level="a" type="main">The Distant Reading of Religious Texts: A &quot;Big Data&quot; Approach to Mind-Body Concepts in Early China</title>
		<author>
			<persName><forename type="first">E</forename><surname>Slingerland</surname></persName>
		</author>
		<idno type="DOI">10.1093/jaarel/lfw090</idno>
		<ptr target="https://doi.org/10.1093/jaarel/lfw090" />
	</analytic>
	<monogr>
		<title level="j">Journal of the American Academy of Religion</title>
		<idno type="ISSN">0002-7189</idno>
		<imprint>
			<biblScope unit="volume">85</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="985" to="1016" />
			<date type="published" when="2017-03">Mar. 2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b38">
	<analytic>
		<title level="a" type="main">A Quantitative Portrait of Legislative Change in Ukraine</title>
		<author>
			<persName><forename type="first">Z</forename><forename type="middle">K</forename><surname>Stine</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Agarwal</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Social, Cultural, and Behavioral Modeling</title>
				<editor>
			<persName><forename type="first">R</forename><surname>Thomson</surname></persName>
		</editor>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="50" to="59" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b39">
	<analytic>
		<title level="a" type="main">Comparative Discourse Analysis Using Topic Models: Contrasting Perspectives on China from Reddit</title>
		<author>
			<persName><forename type="first">Z</forename><forename type="middle">K</forename><surname>Stine</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Agarwal</surname></persName>
		</author>
		<idno type="DOI">10.1145/3400806.3400816</idno>
	</analytic>
	<monogr>
		<title level="m">International Conference on Social Media and Society. SMSociety&apos;20</title>
				<meeting><address><addrLine>Toronto, ON, Canada</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="73" to="84" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b40">
	<analytic>
		<title level="a" type="main">Winning Arguments: Interaction Dynamics and Persuasion Strategies in Good-Faith Online Discussions</title>
		<author>
			<persName><forename type="first">C</forename><surname>Tan</surname></persName>
		</author>
		<idno type="DOI">10.1145/2872427.2883081</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 25th International Conference on World Wide Web. WWW &apos;16</title>
				<meeting>the 25th International Conference on World Wide Web. WWW &apos;16<address><addrLine>Montréal, Québec, Canada</addrLine></address></meeting>
		<imprint>
			<publisher>International World Wide Web Conferences Steering Committee</publisher>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="613" to="624" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b41">
	<analytic>
		<title level="a" type="main">Authorless Topic Models: Biasing Models Away from Known Structure</title>
		<author>
			<persName><forename type="first">L</forename><surname>Thompson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Mimno</surname></persName>
		</author>
		<ptr target="https://www.aclweb.org/anthology/C18-1329" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 27th International Conference on Computational Linguistics</title>
				<meeting>the 27th International Conference on Computational Linguistics<address><addrLine>Santa Fe, New Mexico, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2018-08">Aug. 2018</date>
			<biblScope unit="page" from="3903" to="3914" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
