<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Twin BERT Contextualized Sentence Embedding Space Learning and Gradient-Boosted Decision Tree Ensembles for Scene Segmentation in German Literature</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Sebastian</forename><surname>Gombert</surname></persName>
							<email>gombert@dipf.de</email>
							<affiliation key="aff0">
								<orgName type="department" key="dep1">DIPF | Leibniz Institute for Research and Information in Education</orgName>
								<address>
									<settlement>Frankfurt am Main</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Twin BERT Contextualized Sentence Embedding Space Learning and Gradient-Boosted Decision Tree Ensembles for Scene Segmentation in German Literature</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">0308B5458EBACFCBBB01E3694B750B02</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T14:04+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper documents a submission to the shared task on scene segmentation hosted at KONVENS 2021 <ref type="bibr" target="#b15">(Zehe et al., 2021b)</ref>. The aim of this shared task was to find methods for segmenting narrative texts into different scenes: segments of text where location, time and the constellation of characters stay more or less coherent. This task is formulated as a sentence classification task in which sentences bordering scenes have to be distinguished from in-scene sentences. The approach presented in this paper is based on two steps. In the first, a twin BERT training setup is used to learn a sentence embedding space in which sentences functioning as scene borders are well-separated from ones that are in-scene. In the second, the sentence embeddings generated by this model are used as feature vectors to feed a gradient-boosted decision tree ensemble which conducts the final predictions. In the shared task leaderboard, the system ranked second in track 1 and first in track 2.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Scene segmentation in narrative texts is a novel task in natural language processing introduced by <ref type="bibr" target="#b14">Zehe et al. (2021a)</ref>. The aim of this task is to segment pieces of literature into scenes: sections of text where the relation of story time and discourse time, the location and the character constellations stay more or less the same. From a formal point of view, this problem can be interpreted as a sentence-in-context classification task in which sentences separating scenes have to be distinguished from in-scene ones. Such segmentation is needed because the typical length of longer narrative texts such as novels prevents techniques like co-reference resolution, which are useful for subsequent steps of analysis, from functioning well <ref type="bibr" target="#b14">(Zehe et al., 2021a)</ref>. Once a text is segmented into coherent scenes, each scene can be processed separately, improving the performance of such follow-up processing.</p><p>This paper presents a system that participated in the KONVENS 2021 shared task on scene segmentation <ref type="bibr" target="#b15">(Zehe et al., 2021b)</ref>. It relies on two steps. In the first, a BERT-based <ref type="bibr" target="#b1">(Devlin et al., 2019)</ref> neural network trained in a twin network setup is used to predict embeddings for given input sentences <ref type="bibr" target="#b11">(Reimers and Gurevych, 2019)</ref>. This network was trained to provide an embedding space in which sentences bordering scenes are well-separated from in-scene ones. In the second step, gradient-boosted decision tree ensembles <ref type="bibr" target="#b6">(Mason et al., 1999)</ref> are fed these sentence embeddings as feature vectors to carry out the final predictions.</p><p>For the shared task evaluations, this system was trained on a data set consisting of various German dime novels in which scene borders had been previously annotated. 
Participating systems were evaluated in two tracks using F1 scores. In the first track, the models were evaluated on a test set consisting of additional dime novels. Here, the system presented in this paper achieved second place with an F1 score of 0.16. In the second track, domain adaptability was probed by evaluating the systems on a set of contemporary German highbrow literature. Here, the presented system performed better and ranked first with an F1 score of 0.26.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Background</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Task Description</head><p>In <ref type="bibr" target="#b14">Zehe et al. (2021a)</ref>, the authors interpreted the task of scene segmentation as a sentence classification task. They defined four different classes of sentences: no border, scene-to-scene, scene-to-nonscene and nonscene-to-scene. The latter three are used to mark the different kinds of textual borders among the sentences. They trained a BERT-based <ref type="bibr" target="#b1">(Devlin et al., 2019)</ref> classifier utilising a sliding window over multiple sentences for context encoding to carry out sentence classification.</p><p>This approach was evaluated against the unsupervised TextTiling <ref type="bibr" target="#b2">(Hearst, 1997)</ref> and TopicTiling <ref type="bibr" target="#b12">(Riedl and Biemann, 2012)</ref> methods on a corpus consisting of 15 German dime novels using cross validation. While the supervised BERT model achieved superior results (γ = 0.15) compared to the unsupervised methods (γ = 0.01; γ = 0.02), the overall results turned out subpar, which led the authors to conclude that scene segmentation can be regarded as an inherently hard task.</p><p>For the KONVENS 2021 shared task, the organizers provided an expanded version of the data set presented by <ref type="bibr" target="#b14">Zehe et al. (2021a)</ref>. This data set is composed of various German dime novels. The authors chose this genre as they deemed it easier for potential models to deal with.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Related Work</head><p>While segmenting text into smaller units such as tokens, sentences or spans is one of the oldest and most researched topics in natural language processing, the task of semantically segmenting narrative texts into scenes is a new one. In this form, scene segmentation was first introduced by <ref type="bibr" target="#b14">Zehe et al. (2021a)</ref>. From a problem-centric point of view, <ref type="bibr" target="#b14">Zehe et al. (2021a)</ref> relate scene segmentation to topic segmentation, the task of segmenting a text by topic changes, as changes of time, place and character constellation can be interpreted as special cases of topic changes.</p><p>Most of the more recent work in this area <ref type="bibr" target="#b12">(Riedl and Biemann, 2012;</ref><ref type="bibr" target="#b7">Misra et al., 2011)</ref> builds upon latent Dirichlet allocation <ref type="bibr" target="#b0">(Blei et al., 2003)</ref>. This method discovers fields of words consistently co-occurring in the same contexts. By monitoring changes in their distribution throughout a text, one can define topic-wise section borders. Another related topic according to <ref type="bibr" target="#b14">Zehe et al. (2021a)</ref> is discourse coherence. Recent approaches in this area rely on neural networks to detect textual coherence in various setups and use cases <ref type="bibr" target="#b4">(Li and Jurafsky, 2017;</ref><ref type="bibr" target="#b9">Pichotta and Mooney, 2016)</ref>. Changes in these coherence scores can be used for detecting borders within texts as well.</p></div>
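The topic-change idea sketched above, placing section borders where topic distributions shift, can be illustrated briefly. This is not part of the submitted system: scikit-learn's LDA implementation stands in for those used in the cited work, and the function name, topic count and distance threshold are illustrative assumptions.

```python
# Illustrative sketch: place candidate section borders where the LDA topic
# mixture of adjacent text blocks changes sharply (a TopicTiling-style idea).
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def topic_boundaries(blocks, n_topics=2, threshold=0.5):
    """Return indices i where the topic distribution shifts strongly
    between block i and block i+1 (cosine distance above threshold)."""
    counts = CountVectorizer().fit_transform(blocks)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    theta = lda.fit_transform(counts)  # one topic distribution per block
    cos = [
        float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
        for a, b in zip(theta, theta[1:])
    ]
    return [i for i, s in enumerate(cos) if 1.0 - s > threshold]
```

A real topic segmenter would additionally smooth the per-block distributions and pick boundaries at local minima of the similarity curve rather than by a fixed threshold.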
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">System Description</head><p>My code is publicly available.<ref type="foot" target="#foot_1">1</ref></p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Adjustments to the Tag Set</head><p>While Zehe et al. (2021a) used a quaternary tag set distinguishing scene-to-scene and nonscene-to-scene borders, which is also used for the official shared task evaluations, my system internally relies on a ternary tag set consisting of the tags O, SCENE and NONSCENE. The latter two mark the first sentence of a corresponding section. The reason for this adjustment is that the number of border sentences is low compared to the number of non-border sentences. The ternary tag set is the smallest classification setup that can still distinguish scenes from non-scenes. Using it results in all scene-to-scene and nonscene-to-scene sentences being grouped under the SCENE tag, and all scene-to-nonscene ones under the NONSCENE tag.</p></div>
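As a minimal sketch of this reduction, assuming placeholder label strings for the shared-task annotations (the actual label names in the data may differ):

```python
# Map the quaternary shared-task labels onto the internal ternary tag set.
# The quaternary label strings here are hypothetical placeholders.
QUATERNARY_TO_TERNARY = {
    "no_border": "O",                 # in-scene sentence
    "scene_to_scene": "SCENE",        # first sentence of a new scene
    "nonscene_to_scene": "SCENE",     # also starts a scene
    "scene_to_nonscene": "NONSCENE",  # first sentence of a non-scene section
}

def reduce_tags(tags):
    """Collapse quaternary border tags into O / SCENE / NONSCENE."""
    return [QUATERNARY_TO_TERNARY[t] for t in tags]
```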
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Twin BERT Embedding Space Learning</head><p>My system is built around the idea of neural embedding space learning. <ref type="bibr" target="#b11">Reimers and Gurevych (2019)</ref> introduced the idea of using twin and triplet network-based training setups for fine-tuning transformer language models to map sentences into meaningful semantic vector spaces under the name Sentence Transformers. In their training setup, two or three different sentences are fed into the same transformer language model. These pairs and triplets of sentences are assigned scores such as cosine similarity or concrete training labels. A prediction head which is fed the output of the transformer language model for all two or three sentences is trained to predict the assigned scores or labels. After this training process, the transformer language model can embed sentences into a vector space where they are well-separated according to the respective training objective.</p><p>The idea behind the system presented in this paper is to combine this approach of twin network embedding space learning with the sliding window-based approach from <ref type="bibr" target="#b14">Zehe et al. (2021a)</ref>. More precisely, my approach is to utilise a twin network-based training setup to learn an embedding space encoding information about a sentence as well as the sentences surrounding it. The goal here is that, within this vector space, the embeddings of sentences bordering scenes are well-separated from those of in-scene ones.</p><p>Instead of a single BERT model as in <ref type="bibr" target="#b11">(Reimers and Gurevych, 2019)</ref>, my system uses two of them, with one functioning as a sentence encoder and the other as a context encoder. In both cases, the regular pooling layer output of these networks is used to encode given input sentences. 
While the sentence encoder is only used to predict a sentence embedding for a given target sentence, the context encoder also predicts sentence embeddings for a context window of n sentences to the left and to the right of this target sentence. The outputs of both encoders are concatenated to acquire the final embedding of a sentence and its context in vector space.</p><p>m(s t ) = e sent (s t ) ⊕ e cont (s t )</p><p>(1)</p><formula xml:id="formula_0">e sent (s t ) = B 1 (s t ) (2) e cont (s t ) = c left (s t ) ⊕ B 2 (s t ) ⊕ c right (s t ) (3) c left (s t ) = B 2 (s t−n ) ⊕ • • • ⊕ B 2 (s t−1 ) (4) c right (s t ) = B 2 (s t+1 ) ⊕ • • • ⊕ B 2 (s t+n ) (5)</formula><p>In these equations, s t is a given sentence at time step (position in text) t. m(s t ) refers to the function used for predicting embeddings. e sent (s t ) and e cont (s t ) are the two different encoder networks. B 1 and B 2 refer to the two underlying BERT networks, and c left (s t ) and c right (s t ) are the functions used for acquiring the context of a given sentence s t . n determines the size of this context.</p><p>For training such a sentence embedding model, I randomly sampled from the training set 15000 pairs of sentences which were both either scene or non-scene borders, and 15000 pairs where the two sentences came from different categories, the majority of them being pairs of a scene border and an in-scene sentence. While the former set of pairs is assigned a score of 1, the pairs from the latter set are assigned a score of -1. </p><formula xml:id="formula_1">m concat (p) = m(s 1 (p)) ⊕ m(s 2 (p)) (6) f (p) = L(m concat (p))<label>(7</label></formula><formula xml:id="formula_2">f (x, y) = x if y = 1; max(0, δ − x) if y = -1 (8)</formula><p>Within this function, x is a predicted score, y a gold standard one and δ the so-called margin, a hyperparameter which can be used to control the distances between the vectors a given model learns. 
This function is used to learn a maximum margin-like embedding space which separates scene borders from in-scene sentences.</p><p>The GermanBERT variant provided by Huggingface Transformers <ref type="bibr" target="#b13">(Wolf et al., 2020)</ref> under the id bert-base-german-dbmdz-uncased<ref type="foot" target="#foot_2">2</ref> is used as the base for both the sentence encoder and the context encoder. The reason for choosing this model was that the data it was pre-trained on includes narrative texts, which makes it an appropriate basis for a model dealing with literary data. The model was trained using AdamW (Kingma and Ba, 2015; <ref type="bibr" target="#b5">Loshchilov and Hutter, 2019)</ref>.</p><p>As visible in figure <ref type="figure">3</ref>, the model indeed learned to embed sentences into a vector space in which they were well-separated into two distinct clusters. However, it does not seem that the model generalized well from the training data what exactly constitutes a scene border. While for 'Der kleine Chinesengott', the German dime novel provided as the trial corpus, the majority of scene borders is located in the smaller of the two clusters, there are also borders located in the larger cluster, and, moreover, many in-scene sentences are also sorted into the smaller cluster. This phenomenon was visible across multiple training runs with differently sampled pairs of sentences, which suggests that drawing clear distinctions between scene borders and in-scene sentences is hard for purely BERT-based models.</p></div>
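The embedding function and training loss above can be condensed into a small sketch. The two GermanBERT encoders are abstracted as plain callables returning pooled vectors, so nothing here is the actual implementation; the handling of out-of-range context positions and all names are assumptions.

```python
# Illustrative sketch of equations (1)-(5) and (8): m(s_t) concatenates the
# sentence encoder output with context encoder outputs for a window of n
# sentences, and the hinge embedding loss separates positive from negative
# pairs. b1/b2 stand in for the pooled outputs of the two GermanBERT models.
import numpy as np

def embed(sentences, t, b1, b2, n=1):
    """m(s_t) = e_sent(s_t) ⊕ e_cont(s_t). Out-of-range context positions
    are simply skipped here; the real system would need padding."""
    left = [b2(sentences[i]) for i in range(max(0, t - n), t)]
    right = [b2(sentences[i]) for i in range(t + 1, min(len(sentences), t + n + 1))]
    return np.concatenate([b1(sentences[t])] + left + [b2(sentences[t])] + right)

def hinge_embedding_loss(x, y, delta=1.0):
    """Eq. (8): identity for positive pairs (y = 1), margin term for y = -1."""
    return x if y == 1 else max(0.0, delta - x)
```

During training, the embeddings of both sentences of a pair are additionally concatenated (eq. 6) and passed through a linear scoring layer (eq. 7) whose output enters this loss.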
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Gradient Boosted Decision Tree Ensembles</head><p>As the embedding model seemingly did not learn a precise enough distinction between scene borders and in-scene sentences, using maximum margin classification with the resulting embeddings as feature vectors was not an option. Instead, I chose gradient-boosted decision tree ensembles <ref type="bibr" target="#b6">(Mason et al., 1999)</ref> as the classification algorithm because of their ability to select distinctive features and ignore less distinctive ones.</p><p>During training, this algorithm creates an ensemble of weak regression trees trained to predict the logits within a specialized logistic regression setup. Combining enough such trees results in a strong learner. This is conducted by means of gradient descent and decision tree learning: each subsequent tree is trained to correct erroneous predictions of the previous ones. As each tree is limited to a small subset of the input features, the trained ensemble can automatically isolate the features which best distinguish scene borders from in-scene sentences within the training set.</p><p>For implementing this part of the system, I used Catboost <ref type="bibr" target="#b10">(Prokhorenkova et al., 2018)</ref> as the framework. The model is based upon its multi-class classification mode. The tree growth policy is set to lossguide and class weights are used. The following formula is used for calculating them:</p><formula xml:id="formula_3">w c = 1 − num(c) / Σ c′∈C num(c′) (9)</formula><p>Here, w c is the weight of class c, C is the set of all classes, c′ ranges over C, and num(c) is a function which returns the number of training examples for a given class. Additionally, I used early stopping to prevent overfitting. 
For this, I set the number of training iterations to 5000, let the framework choose a learning rate automatically, and then used the checkpoint of the model which performed best on the trial dime novel.</p></div>
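A condensed sketch of the class weighting in equation (9) follows; the CatBoost configuration mirroring the description above is left commented out so the sketch has no dependency beyond the standard library, and its parameter values are taken from the text, not from the actual code.

```python
# Class-weight computation from equation (9): w_c = 1 - num(c) / sum_c' num(c').
from collections import Counter

def class_weights(labels):
    """Return a weight per class that down-weights frequent classes."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {c: 1.0 - n / total for c, n in counts.items()}

# Hypothetical training setup mirroring the description above:
# from catboost import CatBoostClassifier
# model = CatBoostClassifier(
#     loss_function="MultiClass",
#     grow_policy="Lossguide",
#     iterations=5000,
#     class_weights=class_weights(train_labels),
#     use_best_model=True,  # keep the checkpoint best on the trial novel
# )
# model.fit(train_embeddings, train_labels, eval_set=(trial_X, trial_y))
```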
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Evaluation</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Results</head><p>Shared task evaluations were carried out on two different corpora, resulting in two evaluation tracks. The first corpus consisted of five additional dime novels similar to the ones systems were trained on, addressing the in-domain transfer capabilities of the participating systems. The corpus used for the second track consisted of two pieces of highbrow German literature; the aim of this track was to evaluate the out-of-domain transfer capabilities of the participating systems.</p><p>Figure <ref type="figure">3</ref>: The embeddings predicted for the sentences from the dime novel 'Der kleine Chinesengott' used as trial data in the shared task, visualized in 2D using principal component analysis <ref type="bibr" target="#b8">(Pearson, 1901)</ref>. 0/brown corresponds to in-scene sentences, 1/green to scene borders and 2/blue to non-scene borders.</p><p>My system ranked second out of four in the first track, reaching a micro F1 score of 0.16, and first out of five in the second track, reaching a micro F1 score of 0.26. These results confirm the difficulty of this task observed by <ref type="bibr" target="#b14">Zehe et al. (2021a)</ref>.</p></div>
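The inspection shown in Figure 3 can be reproduced along these lines; the array and variable names are hypothetical, and in the paper the embeddings come from the twin BERT model applied to the trial novel.

```python
# Sketch of the Figure 3-style inspection: project the contextualized
# sentence embeddings to 2D with PCA, then color the points by class.
import numpy as np
from sklearn.decomposition import PCA

def project_2d(embeddings):
    """Reduce an (n_sentences, dim) embedding matrix to (n_sentences, 2)."""
    return PCA(n_components=2).fit_transform(np.asarray(embeddings))

# Plotting (matplotlib assumed):
# import matplotlib.pyplot as plt
# coords = project_2d(embeddings)
# plt.scatter(coords[:, 0], coords[:, 1], c=labels, s=4)
```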
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Qualitative Error Analysis</head><p>To further analyze the results of my system, I turned to qualitative error analysis. For this purpose, I collected the false negative and false positive scene border sentences produced by my system for the trial corpus and analyzed a selection of them with regard to common structural patterns. 128 of the sentences marked as scene borders within the trial corpus were false positives.</p><p>Most false positive sentences mention time, characters or location without explicitly signifying a change. This supports the assumption that the model might have overgeneralized these signals:</p><p>In der Nähe des schlesischen Bahnhofs.</p><p>"Tom, was tust Du, mußte das sein!" Bill lag wieder still. Auch Tom lauschte und schien unschlüssig zu sein.</p><p>Isaak Kornblum besaß Telephon. Ich tat es.</p><p>On the other hand, many of the false negatives contain similar signals, which calls the assumption that the model overgeneralized such signals into question. Of course, one needs to consider that the majority of dimensions of the respective embeddings encode sentences from the context of a particular target sentence. Given this fact in combination with the observation that false positives and false negatives share similar patterns, it seems very likely that these local context sentences played a major role in classification. The following utterances are examples of false negatives:</p><p>Ich fuhr zur Linienstraße. Dann aber schlich ich mich in den dunklen Hausflur.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusion &amp; Outlook</head><p>I presented my submission to the shared task on scene segmentation at KONVENS 2021, a system aimed at segmenting German narrative texts into distinct scenes: spans of text where character constellations, discourse and story time, and locations stay more or less the same. For its implementation, the task was interpreted as a sentence-in-context classification task. To solve it, I first trained a neural model consisting of two GermanBERT networks, a sentence encoder and a context encoder, which, in conjunction, predict contextualized sentence embeddings. This was conducted in a twin network setup where triples of two sentences and a corresponding score were fed to a linear layer responsible for predicting that score.</p><p>The goal was to train a model able to embed sentences into a vector space in which sentences functioning as scene borders would be well-separated from in-scene ones, so that the embeddings could be used as feature vectors in regular classification. While the model indeed learned a vector space in which sentences were more or less sorted into two distinct clusters, these clusters did not seem to capture a general understanding of the concept of scene borders. This is shown by the observation that gold standard scene borders from the trial set were sorted into both clusters when embedded by the model.</p><p>For this reason, gradient boosting was chosen as the subsequent classification algorithm for its ability to isolate a subset of features which would still separate the classes well. Early stopping was used during training, meaning that the model was trained for 5000 iterations on the shared task training data and the iteration which achieved the best results on the trial data set was chosen as final. This achieved comparably poor results with micro F1 scores of 0.16 for track 1 and 0.26 for track 2. 
Nonetheless, these results were sufficient for ranks 2/4 and 1/5 in the two tracks.</p><p>It is an interesting observation that my system performs better on highbrow literature even though its training data consisted solely of dime novels, as this contradicts the authors' assumption that dime novels would be easier for participating systems to deal with than highbrow literature. A possible explanation could lie in the more formal nature of highbrow literature, which might result in more regularities that are useful for successful classification. However, without further inspection, this remains speculation.</p><p>Further work could optimize the architecture and training procedure of the contextualized sentence embedding model presented in this paper, which might lead to improved downstream training results. Moreover, as gradient boosting functions as a feature-based learning algorithm, it could be an option to combine contextualized sentence embeddings with statistical and hand-crafted features for representing sentences in context. In general, the problem is far from solved, as suggested by the poor results. However, learning contextualized sentence embeddings and optimizing the corresponding training procedure could be a useful direction for future work on the topic.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: The architecture of the neural network model in prediction mode when generating contextualized sentence embeddings.</figDesc><graphic coords="3,101.75,62.79,158.75,90.33" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>)</head><label></label><figDesc>In these equations, p refers to a triple of two sentences from the training set and a corresponding score (-1 or 1, depending on class equality). s 1 (p) and s 2 (p) are functions retrieving the first and second sentence, respectively, from a given training input triple. f (p) refers to the final output score calculated by the network during training and L to a linear feedforward layer. During training, both sentences of a triple and their local context sentences are propagated through the sentence and context encoders. The pooling layer outputs for both sentences are concatenated and propagated into a linear layer whose single output neuron is trained to predict the corresponding score using hinge embedding loss:</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: A visualisation of the twin network-based training setup.</figDesc><graphic coords="4,128.69,62.79,340.17,131.16" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head></head><label></label><figDesc>What became quickly visible was that some false positives contained changes of time, character constellations and/or location. As these function as important signals for a scene change, the model seems to have overgeneralized such cases. The following utterances are examples of a signified change in time among the false positives: Langsam verstrich die Zeit. Natürlich kamen wir zu spät. unendlich langsam verstrich die Zeit [...]. Ich wartete also noch eine Weile, dann aber [...] Gerade in dem Moment vernahm ich [...] Examples of a change in character constellation are the following: Bills Alarmruf hatte den Spitzbuben verscheucht. Der Verfolger war [...] untergetaucht. Da hörte ich Tom plötzlich aufstehen [...]. Tom erhob sich jetzt und entschuldigte sich [...]. Dem herbeieilenden Portier berichtete ich [...]. Ich war wieder allein [...]. Bill meldete in diesem Moment den Besuch Dr. Türks. Ich fand ihn ohnmächtig auf dem Fußboden liegen. The following utterances are examples of a location change: Wir verließen unser Häuschen [...]. "Schnell, zu Wertheim," raunte Tom mir zu. Wir trafen uns erst wieder draußen in der Linienstraße. Wir durchsuchten noch einmal das Arbeitszimmer [...]. Endlich erreichten wir den kleinen Antiquitätenladen.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Table 1: The shared task evaluation results of my system.</figDesc><table><row><cell>Track</cell><cell>F1</cell><cell>γ</cell><cell>Rank</cell></row><row><cell>Dime Novels</cell><cell>0.16</cell><cell>0.085</cell><cell>2/4</cell></row><row><cell>Highbrow Literature</cell><cell>0.26</cell><cell>0.175</cell><cell>1/5</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0">Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_1">https://github.com/SGombert/ ssts-2021-sego</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_2">https://huggingface.co/ bert-base-german-dbmdz-uncased</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Latent Dirichlet allocation</title>
		<author>
			<persName><forename type="first">David</forename><forename type="middle">M</forename><surname>Blei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andrew</forename><forename type="middle">Y</forename><surname>Ng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Michael</forename><forename type="middle">I</forename><surname>Jordan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="issue">4-5</biblScope>
			<biblScope unit="page" from="993" to="1022" />
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">BERT: Pre-training of deep bidirectional transformers for language understanding</title>
		<author>
			<persName><forename type="first">Jacob</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ming-Wei</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kenton</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kristina</forename><surname>Toutanova</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
		<title level="s">Long and Short Papers</title>
		<meeting>the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies<address><addrLine>Minneapolis, Minnesota</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="4171" to="4186" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">TextTiling: Segmenting text into multi-paragraph subtopic passages</title>
		<author>
			<persName><forename type="first">Marti</forename><forename type="middle">A</forename><surname>Hearst</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computational Linguistics</title>
		<imprint>
			<biblScope unit="volume">23</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="33" to="64" />
			<date type="published" when="1997">1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Adam: A method for stochastic optimization</title>
		<author>
			<persName><forename type="first">Diederik</forename><forename type="middle">P</forename><surname>Kingma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jimmy</forename><surname>Ba</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">3rd International Conference on Learning Representations, ICLR 2015</title>
				<meeting><address><addrLine>San Diego, CA, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2015-05-07">2015. May 7-9, 2015</date>
		</imprint>
	</monogr>
	<note>Conference Track Proceedings</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Neural net models of open-domain discourse coherence</title>
		<author>
			<persName><forename type="first">Jiwei</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dan</forename><surname>Jurafsky</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing</title>
				<meeting>the 2017 Conference on Empirical Methods in Natural Language Processing<address><addrLine>Copenhagen, Denmark</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="198" to="209" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Decoupled weight decay regularization</title>
		<author>
			<persName><forename type="first">Ilya</forename><surname>Loshchilov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Frank</forename><surname>Hutter</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">7th International Conference on Learning Representations, ICLR 2019</title>
				<meeting><address><addrLine>New Orleans, LA, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019-05-06">2019. May 6-9, 2019</date>
		</imprint>
	</monogr>
	<note>OpenReview.net</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Boosting algorithms as gradient descent</title>
		<author>
			<persName><forename type="first">Llew</forename><surname>Mason</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jonathan</forename><surname>Baxter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Peter</forename><surname>Bartlett</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Marcus</forename><surname>Frean</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 12th International Conference on Neural Information Processing Systems, NIPS&apos;99</title>
				<meeting>the 12th International Conference on Neural Information Processing Systems, NIPS&apos;99<address><addrLine>Cambridge, MA, USA</addrLine></address></meeting>
		<imprint>
			<publisher>MIT Press</publisher>
			<date type="published" when="1999">1999</date>
			<biblScope unit="page" from="512" to="518" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Text segmentation: A topic modeling perspective</title>
		<author>
			<persName><forename type="first">Hemant</forename><surname>Misra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">François</forename><surname>Yvon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Olivier</forename><surname>Cappé</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Joemon</forename><surname>Jose</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Information Processing &amp; Management</title>
		<imprint>
			<biblScope unit="volume">47</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="528" to="544" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">LIII. on lines and planes of closest fit to systems of points in space</title>
		<author>
			<persName><forename type="first">Karl</forename><surname>Pearson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="issue">11</biblScope>
			<biblScope unit="page" from="559" to="572" />
			<date type="published" when="1901">1901</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Learning statistical scripts with LSTM recurrent neural networks</title>
		<author>
			<persName><forename type="first">Karl</forename><surname>Pichotta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Raymond</forename><forename type="middle">J</forename><surname>Mooney</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI&apos;16</title>
				<meeting>the Thirtieth AAAI Conference on Artificial Intelligence, AAAI&apos;16</meeting>
		<imprint>
			<publisher>AAAI Press</publisher>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="2800" to="2806" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">CatBoost: unbiased boosting with categorical features</title>
		<author>
			<persName><forename type="first">Liudmila</forename><surname>Ostroumova Prokhorenkova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gleb</forename><surname>Gusev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Aleksandr</forename><surname>Vorobev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Anna</forename><forename type="middle">Veronika</forename><surname>Dorogush</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andrey</forename><surname>Gulin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018</title>
				<meeting><address><addrLine>Montréal, Canada</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018-12-03">2018. December 3-8, 2018</date>
			<biblScope unit="page" from="6639" to="6649" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Sentence-BERT: Sentence embeddings using Siamese BERT-networks</title>
		<author>
			<persName><forename type="first">Nils</forename><surname>Reimers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Iryna</forename><surname>Gurevych</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</title>
				<meeting>the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)<address><addrLine>Hong Kong, China</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="3982" to="3992" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">TopicTiling: A text segmentation algorithm based on LDA</title>
		<author>
			<persName><forename type="first">Martin</forename><surname>Riedl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Chris</forename><surname>Biemann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of ACL 2012 Student Research Workshop</title>
				<meeting>ACL 2012 Student Research Workshop<address><addrLine>Jeju Island, Korea</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="37" to="42" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Transformers: State-of-the-art natural language processing</title>
		<author>
			<persName><forename type="first">Thomas</forename><surname>Wolf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lysandre</forename><surname>Debut</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Victor</forename><surname>Sanh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Julien</forename><surname>Chaumond</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Clement</forename><surname>Delangue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Anthony</forename><surname>Moi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Pierric</forename><surname>Cistac</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tim</forename><surname>Rault</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Rémi</forename><surname>Louf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Morgan</forename><surname>Funtowicz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Joe</forename><surname>Davison</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sam</forename><surname>Shleifer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Patrick</forename><surname>Von Platen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Clara</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yacine</forename><surname>Jernite</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Julien</forename><surname>Plu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Canwen</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Teven</forename><surname>Le Scao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sylvain</forename><surname>Gugger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mariama</forename><surname>Drame</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Quentin</forename><surname>Lhoest</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alexander</forename><forename type="middle">M</forename><surname>Rush</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations</title>
		<meeting>the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations<address><addrLine>Online</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="38" to="45" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Detecting scenes in fiction: A new segmentation task</title>
		<author>
			<persName><forename type="first">Albin</forename><surname>Zehe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Leonard</forename><surname>Konle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lea</forename><forename type="middle">Katharina</forename><surname>Dümpelmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Evelyn</forename><surname>Gius</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andreas</forename><surname>Hotho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Fotis</forename><surname>Jannidis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lucas</forename><surname>Kaufmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Markus</forename><surname>Krug</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Frank</forename><surname>Puppe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Nils</forename><surname>Reiter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Annekea</forename><surname>Schreiber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Nathalie</forename><surname>Wiedmer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume</title>
				<meeting>the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume</meeting>
		<imprint>
			<date type="published" when="2021">2021a</date>
			<biblScope unit="page" from="3167" to="3177" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Shared task on scene segmentation @ KONVENS 2021</title>
		<author>
			<persName><forename type="first">Albin</forename><surname>Zehe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Leonard</forename><surname>Konle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Svenja</forename><surname>Guhr</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lea</forename><forename type="middle">Katharina</forename><surname>Dümpelmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Evelyn</forename><surname>Gius</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andreas</forename><surname>Hotho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Fotis</forename><surname>Jannidis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lucas</forename><surname>Kaufmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Markus</forename><surname>Krug</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Frank</forename><surname>Puppe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Nils</forename><surname>Reiter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Annekea</forename><surname>Schreiber</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Shared Task on Scene Segmentation</title>
				<imprint>
			<date type="published" when="2021">2021b</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
