<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Evaluating the Similarity of Location-based Corpora Identified in Reddit Comments</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Cillian</forename><surname>Berragan</surname></persName>
							<email>c.berragan@liverpool.ac.uk</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Liverpool</orgName>
								<address>
									<postCode>L69 3BX</postCode>
									<settlement>Liverpool</settlement>
									<country key="GB">United Kingdom</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Alex</forename><surname>Singleton</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Liverpool</orgName>
								<address>
									<postCode>L69 3BX</postCode>
									<settlement>Liverpool</settlement>
									<country key="GB">United Kingdom</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Alessia</forename><surname>Calafiore</surname></persName>
							<email>calafio@ed.ac.uk</email>
							<affiliation key="aff1">
								<orgName type="institution">University of Edinburgh</orgName>
								<address>
									<settlement>Edinburgh</settlement>
									<postCode>EH8 9YL</postCode>
									<country key="GB">United Kingdom</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Jeremy</forename><surname>Morley</surname></persName>
							<email>jeremy.morley@os.uk</email>
							<affiliation key="aff2">
								<orgName type="institution">Ordnance Survey</orgName>
								<address>
									<postCode>SO16 0AS</postCode>
									<settlement>Southampton</settlement>
									<country key="GB">United Kingdom</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Evaluating the Similarity of Location-based Corpora Identified in Reddit Comments</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">BBD750A0D08CD1590A66EB502EA1516C</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-06-19T14:30+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Social media</term>
					<term>Natural Language Processing</term>
					<term>Social Interaction</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Social interaction is typically studied in the context of physical movement, where geographic distance and ease of connectivity influence the strength of interaction between regions. On social media networks, however, these limitations appear to persist even though interactions do not rely on physical movement, suggesting that non-physical geographic characteristics influence interaction between social communities. Unlike geotags, which provide explicit geographic information about social media users as coordinates, unstructured text offers an alternative perspective for the study of social interaction between regions: it allows comparison of the language used when locations are mentioned in context. Our paper analyses the corpora associated with major cities across the UK, first vectorising Reddit comments through transformer-based embeddings, which capture semantic information, then using these to derive unsupervised clusters and the similarity between them. We find that distinct groups emerge which broadly conform to established regional identities of locations across the UK, but with interesting deviations.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Social interaction is typically studied in the context of mobility, using data sources like Census or transport records, where physical movement is restricted by distance and ease of connectivity between two locations <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2]</ref>. In contrast, social interaction has also been studied using phone call data <ref type="bibr" target="#b2">[3]</ref> and social media networks <ref type="bibr" target="#b3">[4]</ref>, where the spatial and temporal bounds of connectivity between two locations do not restrict interactions. Despite this, many studies have found that geographic identities within communities persist in these networks, with interaction strength influenced by the geographic distance between them <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b5">6]</ref>.</p><p>Social media also presents rich semantic information regarding locations through text associated with geotagged social media posts. Comparative analysis of corpora associated with geotagged locations similarly exhibits regionality; for example, tweets from the North East of England are statistically different from those from the South <ref type="bibr" target="#b4">[5]</ref>.</p><p>Our paper explores the similarity of corpora built from location mentions taken directly from text, without relying on geotagged metadata. This approach offers an alternative perspective for the analysis of social interaction, built directly from the semantic information associated with locations, rather than the location associated with social media users themselves. Collective semantic information from social media embeds the regional identity of locations across a continuous spectrum, allowing for direct comparison between these identities and their relationships.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Methodology</head><p>The following section gives an overview of our data source and the data processing methodology used in our paper. All code, analysis and data are available on our DagsHub repository.</p><p>Reddit is a public discussion and news aggregation social network, among the top 20 most visited websites in the United Kingdom. As of 2020, Reddit had around 430 million monthly active users, comparable to the number of Twitter users <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b7">8]</ref>. Reddit is divided into independent subreddits, each with a specific topic of discussion. Users may submit posts to a subreddit, and each post has a dedicated nested conversation thread to which users can add comments. Subreddits cover a wide range of topics, and in the interest of geography, they also act as forums for the discussion of local places. The United Kingdom subreddit acts as a general hub for related topics, notably including a list of smaller and more specific related subreddits. This list provides a 'Places' section, a collection of local British subreddits, ranging in scale from country level (/r/England) and regional level (/r/thenorth, /r/Teeside) to cities (/r/Manchester) and small towns (/r/Alnwick). In total there are 213 subreddits that relate to 'places' within the United Kingdom<ref type="foot" target="#foot_0">1</ref>. For each subreddit, every historic comment was retrieved using the Pushshift Reddit archive <ref type="bibr" target="#b8">[9]</ref>. In total 8,282,331 comments were extracted, submitted by 490,535 unique users between 2011-01-01 and 2022-04-17.</p><p>We extracted and geolocated all place names in this collection of comments using a custom-built geoparsing pipeline. To identify place names, we used a BERT transformer-based NER model trained on the WNUT 2017 dataset <ref type="bibr" target="#b9">[10]</ref>, available on the HuggingFace Model Hub. We then implemented a disambiguation methodology that uses contextual place names and two gazetteers to geolocate place names: OS Open Names and 'natural' location types from the Gazetteer of British Place Names. Processed comments consist of a collection of geolocated place names, alongside their natural language context sentence.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Similarity of Place Corpora</head><p>Comparing the similarity between two or more distinct texts first relies on an appropriate method for processing the text into a numerical format. For each location we obtained a corpus of comments, consisting of the sentences in which that location is mentioned. These were then processed into a single vector, reflecting the semantic information attributed to each location.</p><p>Typically, a TF-IDF approach is used to generate document embeddings <ref type="bibr" target="#b10">[11]</ref>; however, we found that comparing these embeddings did not always provide insightful information. Each vector shared similar properties, giving cosine similarities that showed no distinct variation between locations. This is likely because the language used across locations shares similar properties, meaning the more nuanced semantic information is not captured through a TF-IDF method.</p><p>We therefore extracted embeddings from a transformer, a type of deep neural network. Unlike TF-IDF or simpler neural network models, transformers are able to use contextual information to generate word embeddings, meaning the same word in two different contexts will not share the exact same vector, capturing different embedded semantic information <ref type="bibr" target="#b11">[12]</ref>. Additionally, transformers are pre-trained on a large corpus of text, meaning general information regarding the English language is already embedded within the model, allowing for improved understanding of semantic information. These core features mean that embeddings generated from transformers are likely to capture information that allows for more accurate comparative analysis. We generated embeddings using the all-mpnet-base-v2 model from the sentence-transformers library in Python <ref type="bibr" target="#b12">[13]</ref>. Unlike a standard 'BERT'-like transformer, this library implements modifications to base models that more appropriately capture semantic information in their output embeddings.</p><p>Before calculating embeddings, we masked every mention of a location with the generic token 'PLACE'. This ensured that no explicit geographic information was captured accidentally when analysing embeddings. For example, comments about Manchester and Liverpool may frequently mention the same locations because the two cities are geographically close. To both remove noise and reduce the computational requirements for this work, only locations with over 10,000 unique mentions were kept; for each of these, a random sample of 1,000 comments was selected. Once embeddings were generated for every comment in each city corpus, the mean embedding of each corpus was calculated, giving a vector of 768 decimal values for each city.</p><p>With a single vector for each selected location, we first calculated K-Means clusters to determine whether distinct groupings of locations could be identified across the UK. To visualise these clusters we used a PCA decomposition to reduce the dimensionality from 768 down to 2 dimensions. Finally, we calculated the cosine similarity between every pair of location vectors.</p></div>
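<div xmlns="http://www.tei-c.org/ns/1.0"><p>The masking, sampling and averaging steps described above can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not the published pipeline: the function names (mask_places, sample_corpus, mean_vector), the regular-expression matching against a known list of names and the toy inputs are all hypothetical, and the comment embeddings are taken as given rather than computed with all-mpnet-base-v2.</p>
<p><code lang="python">
```python
import random
import re
from statistics import fmean


def mask_places(sentence, place_names, token="PLACE"):
    """Replace every listed place name in a sentence with a generic token."""
    if not place_names:
        return sentence
    # Match longer names first so multi-word names are masked as one unit.
    ordered = sorted(place_names, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(name) for name in ordered) + r")\b",
        flags=re.IGNORECASE,
    )
    return pattern.sub(token, sentence)


def sample_corpus(comments, min_mentions=10_000, sample_size=1_000, seed=0):
    """Return a fixed-size random sample, or None for rarely mentioned places."""
    if len(comments) > min_mentions:
        return random.Random(seed).sample(comments, sample_size)
    return None


def mean_vector(vectors):
    """Average equal-length comment vectors element-wise into one corpus vector."""
    return [fmean(column) for column in zip(*vectors)]


# Toy inputs (hypothetical): two comments and two 2-dimensional "embeddings".
comments = [
    "I moved from Leeds to Manchester last year",
    "manchester has great music venues",
]
masked = [mask_places(c, ["Manchester", "Leeds"]) for c in comments]
print(masked)
print(mean_vector([[1.0, 2.0], [3.0, 4.0]]))
```
</code></p>
<p>On the toy inputs, every city name is replaced by PLACE regardless of case, and the two vectors average to [2.0, 3.0]. In the study the same averaging is applied to 768-dimensional comment embeddings, and place names come from the NER model rather than from a fixed list.</p></div>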
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Results &amp; Discussion</head><p>Figure <ref type="figure">1</ref> shows K-Means clusters with 𝑘 = 5 for transformer embeddings decomposed into two dimensions. These clusters group corpora that share similar semantic properties. Points that are closer together likely indicate increased similarity, but it is worth noting that their positions reflect PCA-decomposed values, which capture less information than the non-decomposed vectors on which the clusters were calculated. Notably, London appears as a single value in a cluster, suggesting that the corpus associated with the capital of the UK is semantically distinct from the rest of the country. There is also a single cluster associated with the four Scottish cities considered in our study (Cluster 1), as well as a cluster for Cambridge and Oxford (Cluster 5). Figure <ref type="figure">1</ref>(B) reveals that clusters do broadly appear to reflect distance-restricted geographic properties, while also capturing some divergences from this, with locations like London, Newcastle, Bristol and Brighton geographically distant from the locations they share clusters with.</p><p>Figure <ref type="figure" target="#fig_1">2</ref> compares the cosine similarity between our high-dimensional transformer embeddings. The highest and lowest similarity scores for each location are highlighted in green and red respectively. As with Figure <ref type="figure">1</ref>, corpora in Scottish cities appear to largely share similarities, with Glasgow and Edinburgh sharing their highest similarity values. Oxford has the lowest similarity to the largest number of other locations, sharing low values with cities in Scotland, as well as Liverpool and Manchester. London again stands out, with very low similarities to all other cities overall; its highest similarity is with Manchester.</p></div>
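<div xmlns="http://www.tei-c.org/ns/1.0"><p>The clustering step behind Figure 1 can be illustrated with a plain implementation of Lloyd's algorithm, the standard K-Means procedure. This is a sketch for exposition only and is not the code used in the study: the function kmeans, its initialisation from randomly chosen data points, the fixed seed and the toy two-dimensional points are assumptions, and an optimised library implementation would normally be preferred. In the study, clustering was applied to the 768-dimensional city vectors with 𝑘 = 5.</p>
<p><code lang="python">
```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: return one cluster label per input point."""
    rng = random.Random(seed)
    # Initialise centroids from k randomly chosen data points (copied).
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy points (hypothetical): two well-separated groups of three.
points = [
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
]
print(kmeans(points, k=2))
```
</code></p>
<p>On the toy points the first three and the last three points receive different labels; which group is numbered 0 depends on the random initialisation. Because K-Means labels are arbitrary, only cluster membership is meaningful, not the cluster numbers.</p></div>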
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Conclusion</head><p>Our paper demonstrates that Reddit comments relating to cities across the UK can be compared using document embeddings generated from a transformer neural network. Instead of focussing on physical interactions between people or on social media interactions, our work identifies relationships between cities through their semantic footprint; analysing each corpus computationally allows for direct comparisons between cities through clustering and cosine similarity.</p><p>Our analysis reveals distinct clusters which largely reflect the geographic proximity of locations; however, interesting deviations from proximity do emerge. Oxford and Cambridge are clustered together and share a high cosine similarity, but generate the lowest similarity with many other locations in the UK, including London. London in particular appears distinct from the rest of the UK, while some cities that are not geographically close exhibit clustering and high similarity, such as Liverpool and Newcastle. The information generated through our work presents an alternative view of relationships between cities that are not captured by existing data sources, all of which rely on explicit geographic coordinate information. Instead, we build similarities and clusters directly from the semantic information that exists within their respective corpora. Unlike traditional data, which captures objective social interactions between regions, the deviations from the restriction of geographic distance between several cities in our work appear to reflect the more subjective language that shapes the cultural and perceived identity of regions, and the relationships between them.</p><p>While our work enables direct numerical comparison between city-based corpora, it cannot explain the similarities and dissimilarities between them. Future work may explore the use of topic modelling to identify shared topics between locations; differences in sentiment towards these topics may explain dissimilarity.</p></div>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1:</head><label>1</label><figDesc>Figure 1: K-Means clusters (𝑘 = 5) for transformer embeddings decomposed into two dimensions using PCA. Figure 1(B) reveals that clusters do broadly appear to reflect distance-restricted geographic properties.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2:</head><label>2</label><figDesc>Figure 2: Cosine similarity between every pair of location-related transformer embeddings. Values are scaled between 0 and 1. Green highlights indicate the highest value in each row, while red highlights indicate the lowest value in each row.</figDesc></figure>
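<div xmlns="http://www.tei-c.org/ns/1.0"><p>The similarity values summarised in Figure 2 could be produced along the following lines. This is a sketch under stated assumptions rather than the code used in the study: the helper names, the min-max rescaling to the range 0 to 1 (one plausible reading of the scaling mentioned in the caption) and the toy vectors labelled A, B and C are all hypothetical.</p>
<p><code lang="python">
```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def similarity_table(vectors):
    """Pairwise cosine similarity between every pair of distinct labels."""
    return {
        a: {b: cosine_similarity(u, v) for b, v in vectors.items() if b != a}
        for a, u in vectors.items()
    }


def min_max_scale(table):
    """Rescale all values to the range 0 to 1 (assumes they are not all equal)."""
    values = [s for row in table.values() for s in row.values()]
    low, high = min(values), max(values)
    return {
        a: {b: (s - low) / (high - low) for b, s in row.items()}
        for a, row in table.items()
    }


def row_extremes(table):
    """For each row, the label with the highest and the lowest similarity."""
    return {
        a: (max(row, key=row.get), min(row, key=row.get))
        for a, row in table.items()
    }


# Toy vectors (hypothetical labels and values, not results from the study).
toy = {"A": [1.0, 0.0], "B": [2.0, 1.0], "C": [0.0, 1.0]}
scaled = min_max_scale(similarity_table(toy))
print(row_extremes(scaled))
```
</code></p>
<p>Each entry of the result pairs a label with its most and least similar counterpart, which corresponds to the green and red highlights in each row of the figure. Self-similarity is excluded from the table so that it does not trivially become every row's maximum.</p></div>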
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://www.reddit.com/r/unitedkingdom/wiki/british_subreddits</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">From spatial interaction data to spatial interaction information? Geovisualisation and spatial structures of migration from the 2001 UK census</title>
		<author>
			<persName><forename type="first">A</forename><surname>Rae</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.compenvurbsys.2009.01.007</idno>
	</analytic>
	<monogr>
		<title level="j">Computers, Environment and Urban Systems</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="161" to="178" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Assessing the extent of transport social exclusion among the elderly</title>
		<author>
			<persName><forename type="first">H</forename><surname>Titheridge</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Achuthan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">L</forename><surname>Mackett</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Solomon</surname></persName>
		</author>
		<idno type="DOI">10.5198/jtlu.v2i2.44</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Transport and Land Use</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Delineating Geographical Regions with Networks of Human Interactions in an Extensive Set of Countries</title>
		<author>
			<persName><forename type="first">S</forename><surname>Sobolevsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Szell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Campari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Couronné</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Smoreda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Ratti</surname></persName>
		</author>
		<idno type="DOI">10.1371/journal.pone.0081707</idno>
	</analytic>
	<monogr>
		<title level="j">PLoS ONE</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page">e81707</biblScope>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Geographies of an Online Social Network</title>
		<author>
			<persName><forename type="first">B</forename><surname>Lengyel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Varga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Ságvári</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Á</forename><surname>Jakobi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kertész</surname></persName>
		</author>
		<idno type="DOI">10.1371/journal.pone.0137248</idno>
	</analytic>
	<monogr>
		<title level="j">PLOS ONE</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page">e0137248</biblScope>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">The human geography of Twitter: Quantifying regional identity and inter-region communication in England and Wales</title>
		<author>
			<persName><forename type="first">R</forename><surname>Arthur</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">T P</forename><surname>Williams</surname></persName>
		</author>
		<idno type="DOI">10.1371/journal.pone.0214466</idno>
	</analytic>
	<monogr>
		<title level="j">PLOS ONE</title>
		<imprint>
			<biblScope unit="volume">14</biblScope>
			<biblScope unit="page">e0214466</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Redrawing the Map of Great Britain from a Network of Human Interactions</title>
		<author>
			<persName><forename type="first">C</forename><surname>Ratti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sobolevsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Calabrese</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Andris</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Reades</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Martino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Claxton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">H</forename><surname>Strogatz</surname></persName>
		</author>
		<idno type="DOI">10.1371/journal.pone.0014248</idno>
	</analytic>
	<monogr>
		<title level="j">PLoS ONE</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="page">e14248</biblScope>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<author>
			<persName><forename type="first">N</forename><surname>Murphy</surname></persName>
		</author>
		<ptr target="https://www.redditinc.com/blog/reddits-2019-year-in-review/#content" />
		<title level="m">Reddit&apos;s 2019 Year in Review - Upvoted</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<author>
			<persName><surname>Statista</surname></persName>
		</author>
		<ptr target="https://www.statista.com/statistics/272014/global-social-networks-ranked-by-number-of-users/" />
		<title level="m">Most popular social networks worldwide as of January 2022, ranked by number of monthly active users</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Baumgartner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Zannettou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Keegan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Squire</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Blackburn</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2001.08435</idno>
		<title level="m">The Pushshift Reddit Dataset</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Results of the WNUT2017 Shared Task on Novel and Emerging Entity Recognition</title>
		<author>
			<persName><forename type="first">L</forename><surname>Derczynski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Nichols</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Van Erp</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Limsopatham</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/W17-4418</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 3rd Workshop on Noisy User-generated Text</title>
				<meeting>the 3rd Workshop on Noisy User-generated Text<address><addrLine>Copenhagen, Denmark</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="140" to="147" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><surname>Jurafsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">H</forename><surname>Martin</surname></persName>
		</author>
		<title level="m">Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition</title>
		<imprint>
			<publisher>Prentice Hall</publisher>
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Vaswani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shazeer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Parmar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Uszkoreit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">N</forename><surname>Gomez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Kaiser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Polosukhin</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1706.03762</idno>
		<title level="m">Attention Is All You Need</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks</title>
		<author>
			<persName><forename type="first">N</forename><surname>Reimers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Gurevych</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/D19-1410</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</title>
				<meeting>the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)<address><addrLine>Hong Kong, China</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="3980" to="3990" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
