<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Re-Evaluating GermEval17 Using German Pre-Trained Language Models</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Matthias</forename><surname>Aßenmacher</surname></persName>
							<email>matthias@stat.uni-muenchen.de</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Statistics</orgName>
								<orgName type="institution">Ludwig-Maximilians-Universität</orgName>
								<address>
									<settlement>Munich</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Alessandra</forename><surname>Corvonato</surname></persName>
							<email>alessandracorvonato@yahoo.de</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Statistics</orgName>
								<orgName type="institution">Ludwig-Maximilians-Universität</orgName>
								<address>
									<settlement>Munich</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Re-Evaluating GermEval17 Using German Pre-Trained Language Models</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">AE61EDBA280C88770EDB20913ABBE7B9</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T19:36+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The lack of a commonly used benchmark data set (collection) such as (Super)GLUE <ref type="bibr" target="#b44">(Wang et al., 2018</ref>, <ref type="bibr" target="#b43">2019)</ref> for the evaluation of non-English pre-trained language models is a severe shortcoming of current English-centric NLP research. It concentrates a large part of the research on English and leaves it uncertain whether conclusions found for the English language transfer to other languages. We evaluate the performance of the German and multilingual BERT models currently available via the huggingface transformers library on four subtasks of Aspect-based Sentiment Analysis (ABSA) from the GermEval17 workshop. We compare them to pre-BERT architectures <ref type="bibr" target="#b46">(Wojatzki et al., 2017;</ref><ref type="bibr" target="#b35">Schmitt et al., 2018;</ref><ref type="bibr" target="#b0">Attia et al., 2018)</ref> as well as to an ELMo-based architecture <ref type="bibr" target="#b3">(Biesialska et al., 2020)</ref> and a BERT-based approach <ref type="bibr" target="#b9">(Guhr et al., 2020)</ref>. We put the observed improvements in relation to those reported for a similar ABSA task <ref type="bibr" target="#b29">(Pontiki et al., 2014)</ref> and similar models (pre-BERT vs. BERT-based) for the English language, and check whether the reported improvements correspond to those we observe for German.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>(Aspect-based) Sentiment Analysis is often used to transform reviews into helpful information on how a product or service of a company is perceived among its customers. Until recently, Sentiment Analysis was mainly conducted using traditional machine learning and recurrent neural networks, like LSTMs <ref type="bibr" target="#b13">(Hochreiter and Schmidhuber, 1997)</ref> or GRUs <ref type="bibr" target="#b5">(Cho et al., 2014)</ref>. Those models have been largely replaced by language models relying on (parts of) the Transformer architecture, a novel framework proposed by <ref type="bibr" target="#b42">Vaswani et al. (2017)</ref>. <ref type="bibr" target="#b7">Devlin et al. (2019)</ref> developed a Transformer-encoder-based language model called BERT (Bidirectional Encoder Representations from Transformers), achieving state-of-the-art (SOTA) performance on several benchmark tasks -mainly for the English language -and becoming a milestone in the field of NLP.</p><p>Up to now, only a few researchers have focused on sentiment-related problems for German reviews, even though language-specific evaluation is a crucial driving force for a more universal model development and improvement. Unique characteristics of the different languages present different challenges to the models, which is why sole evaluation on English data is a severe shortcoming.</p><p>The first shared task on German ABSA, which provides a large annotated data set for training and evaluation, is the GermEval17 Shared Task <ref type="bibr" target="#b46">(Wojatzki et al., 2017)</ref>. The participating teams back then analyzed the data using mostly standard machine learning techniques such as SVMs, CRFs, or LSTMs. In contrast to 2017, today, different pre-trained BERT models are available for a variety of languages, including German. 
We re-analyzed the complete GermEval17 Task using seven pre-trained BERT models suitable for German, provided by the huggingface transformers library <ref type="bibr" target="#b47">(Wolf et al., 2020)</ref>. We evaluate which of the models is best suited for the different GermEval17 subtasks by comparing their performance values. Furthermore, we compare our findings on whether (and by how much) BERT-based models are able to improve the pre-BERT SOTA in German ABSA with the SOTA developments for English ABSA, using the example of SemEval-2014 <ref type="bibr" target="#b29">(Pontiki et al., 2014)</ref>.</p><p>We first give an overview of the GermEval17 tasks (cf. Sec. 2) and of related work <ref type="bibr">(cf. Sec. 3)</ref>. Second, we present the data and the models (cf. Sec. 4), while Section 5 presents the results of our re-evaluation. Sections 6 and 7 conclude our work by stating our main findings and drawing parallels to the English language.</p><p>2 The GermEval17 Task(s)</p><p>The GermEval17 Shared Task <ref type="bibr" target="#b46">(Wojatzki et al., 2017)</ref> is a task on analyzing aspect-based sentiments in customer reviews about "Deutsche Bahn" (DB) -the German public train company. The main data was crawled from various social media platforms such as Twitter, Facebook and Q&amp;A websites from May 2015 to June 2016. The documents were manually annotated and split into a training (train), a development (dev) and a synchronic (test syn ) test set. A diachronic test set (test dia ) was collected the same way from November 2016 to January 2017 in order to test for temporal robustness. The task comprises four subtasks representing a complete classification pipeline. Subtask A is a binary Relevance Classification task which aims at identifying whether the feedback refers to DB. Subtask B aims at classifying the Document-level Polarity ("negative", "positive" and "neutral"). 
In Subtask C, the model has to identify all the aspect categories with associated sentiment polarities in a relevant document. This multi-label classification task was divided into Subtask C1 (Aspect-only) and Subtask C2 (Aspect+Sentiment). For this purpose, the organizers defined 20 different aspect categories, e.g. Allgemein (General), Sonstige Unregelmäßigkeiten (Other irregularities). Finally, Subtask D refers to the Opinion Target Extraction (OTE), i.e. a sequence labeling task extracting the linguistic phrase used to express an opinion. We differentiate between exact match (Subtask D1) and overlapping match, tolerating errors of +/− one token (Subtask D2).</p></div>
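The distinction between exact match (Subtask D1) and overlapping match (Subtask D2) described above can be expressed as a small helper. This is a sketch of our reading of the criterion; the function name and the (start, end) token-span convention are our own assumptions, not taken from the shared task definition.

```python
def spans_match(pred, gold, tolerance=0):
    """Check whether a predicted token span matches a gold span.

    pred/gold are (start, end) token indices with exclusive end.
    tolerance=0 corresponds to the exact match (Subtask D1);
    tolerance=1 tolerates boundary errors of +/- one token (Subtask D2).
    """
    return (abs(pred[0] - gold[0]) <= tolerance
            and abs(pred[1] - gold[1]) <= tolerance)

# Exact match (D1): boundaries must agree perfectly.
assert spans_match((3, 5), (3, 5))
assert not spans_match((2, 5), (3, 5))

# Overlapping match (D2): errors of +/- one token are tolerated.
assert spans_match((2, 5), (3, 5), tolerance=1)
assert not spans_match((1, 5), (3, 5), tolerance=1)
```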
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Related Work</head><p>Even before BERT, many researchers focused on (English) Sentiment Analysis <ref type="bibr" target="#b2">(Behdenna et al., 2018)</ref>. The most common architectures were traditional machine learning classifiers and recurrent neural networks (RNNs). SemEval14 <ref type="bibr">(Task 4;</ref><ref type="bibr">Pontiki et al., 2014)</ref> was the first workshop to introduce Aspect-based Sentiment Analysis (ABSA), which was expanded within SemEval15 Task 12 <ref type="bibr" target="#b28">(Pontiki et al., 2015)</ref> and SemEval16 Task 5 <ref type="bibr" target="#b27">(Pontiki et al., 2016)</ref>. Here, restaurant and laptop reviews were examined at different granularities. The best model at SemEval16 was an SVM/CRF architecture using GloVe embeddings <ref type="bibr" target="#b25">(Pennington et al., 2014)</ref>. However, many works have recently focused on re-evaluating the SemEval Sentiment Analysis task using BERT-based language models <ref type="bibr" target="#b12">(Hoang et al., 2019;</ref><ref type="bibr" target="#b49">Xu et al., 2019;</ref><ref type="bibr" target="#b39">Sun et al., 2019;</ref><ref type="bibr" target="#b18">Li et al., 2019;</ref><ref type="bibr" target="#b15">Karimi et al., 2020;</ref><ref type="bibr" target="#b41">Tao and Fang, 2020)</ref>.</p><p>In comparison, little research deals with German ABSA. For instance, <ref type="bibr" target="#b1">Barriere and Balahur (2020)</ref> trained a multilingual BERT model for German Document-level Sentiment Analysis on the SB-10k data set <ref type="bibr" target="#b6">(Cieliebak et al., 2017)</ref>. Regarding GermEval17 Subtask B, <ref type="bibr" target="#b9">Guhr et al. (2020)</ref> considered both FastText <ref type="bibr" target="#b4">(Bojanowski et al., 2017)</ref> and BERT, achieving notable improvements. <ref type="bibr" target="#b3">Biesialska et al. 
(2020)</ref> made use of ensemble models: One is an ensemble of ELMo <ref type="bibr" target="#b26">(Peters et al., 2018)</ref>, GloVe and a bi-attentive classification network (BCN; <ref type="bibr">McCann et al., 2017)</ref>, achieving a score of 0.782, and the other one consists of ELMo and a Transformer-based Sentiment Analysis model (TSA), reaching a score of 0.789 on the synchronic test data set. Moreover, <ref type="bibr" target="#b0">Attia et al. (2018)</ref> trained a convolutional neural network (CNN), achieving a score of 0.7545 on the synchronic test set. <ref type="bibr" target="#b35">Schmitt et al. (2018)</ref> advanced the SOTA for Subtask C by employing biLSTMs and CNNs to carry out end-to-end Aspect-based Sentiment Analysis. The highest score was achieved using an end-to-end CNN architecture with FastText embeddings, scoring 0.523 and 0.557 on the synchronic and diachronic test data sets for Subtask C1, respectively, and 0.423 and 0.465 for Subtask C2.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Materials and Methods</head><p>Data The GermEval17 data is freely available in .xml- and .tsv-format<ref type="foot" target="#foot_0">1</ref>. Each data split <ref type="bibr">(train, validation, test)</ref> in .tsv-format contains the following variables:</p><p>• document id (URL)</p><p>• document text</p><p>• relevance label (true, false)</p><p>• document-level sentiment label (negative, neutral, positive)</p><p>• aspects with respective polarities (e.g. Ticketkauf#Haupt:negative)</p><p>For documents which are annotated as irrelevant, the sentiment label is set to neutral and no aspects are available. Notably, the .tsv-formatted data does not contain the target expressions or their associated sequence positions. Consequently, Subtask D can only be conducted using the data in .xml-format, which additionally holds the information on the starting and ending sequence positions of the target phrases.</p><p>The data set comprises ∼26k documents in total, including the diachronic test set with around 1.8k examples. Further, the main data was randomly split by the organizers into a train data set for training, a development data set for validation and a synchronic test data set. The distribution of the sentiments is depicted in Table <ref type="table">3</ref>, which shows that between 65% and 69% of the documents (per split) belong to the neutral class, 25-31% to the negative and only 4-6% to the positive class.</p><p>Pre-trained architectures BERT was initially introduced in a base (110M parameters) and a large (340M parameters) variant; Sanh et al. (2019) proposed an even smaller BERT model (DistilBERT, 60M parameters), trained via knowledge distillation <ref type="bibr" target="#b11">(Hinton et al., 2015)</ref>. The exact model specifications regarding the number of layers (L), number of attention heads (A) and embedding size (H) for the available German BERT models are depicted in the last column of Table <ref type="table" target="#tab_5">5</ref>. 
Both architectures were pre-trained on the Masked Language Modeling task as well as on the auxiliary Next Sentence Prediction task (only BERT) and can subsequently be fine-tuned on a task at hand.</p><p>We include three German (Distil)BERT models pre-trained by DBMDZ<ref type="foot" target="#foot_2">3</ref> and one by Deepset.ai<ref type="foot" target="#foot_3">4</ref>. The latter is pre-trained on the German Wikipedia (6GB of raw text files), the Open Legal Data dump (2.4GB; <ref type="bibr" target="#b24">Ostendorff et al., 2020)</ref> and news articles (3.6GB). DBMDZ combined Wikipedia, EU Bookshop <ref type="bibr" target="#b37">(Skadiņš et al., 2014)</ref>, Open Subtitles <ref type="bibr" target="#b19">(Lison and Tiedemann, 2016</ref><ref type="bibr">), CommonCrawl (Ortiz Suárez et al., 2019</ref><ref type="bibr">), ParaCrawl (Esplà-Gomis et al., 2019)</ref> and News Crawl <ref type="bibr" target="#b10">(Haddow, 2018)</ref> into a corpus with a total size of 16GB and ∼2,350M tokens. Besides this, we use the three multilingual (Distil)BERT models included in the transformers module. This amounts to five BERT and two DistilBERT models, two of which are "uncased" (i.e. every character is lower-cased) while the other five models are "cased" ones.</p></div>
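The aspect annotations in the .tsv files described above (e.g. Ticketkauf#Haupt:negative) combine an aspect category and a polarity. A minimal parsing sketch; the helper name is ours, only the label format follows the examples in the data description:

```python
def parse_aspect_label(label):
    """Split a GermEval17 aspect annotation 'Category#Subcategory:polarity'
    into the aspect category (used for Subtask C1) and the polarity
    (needed in addition for Subtask C2)."""
    aspect, _, polarity = label.rpartition(":")
    return aspect, polarity

assert parse_aspect_label("Ticketkauf#Haupt:negative") == ("Ticketkauf#Haupt", "negative")
assert parse_aspect_label("Allgemein:neutral") == ("Allgemein", "neutral")
```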
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Results</head><p>For the re-evaluation, we used the latest data provided in .xml-format. Duplicates were not removed in order to make our results as comparable as possible. We tokenized the documents and fixed single spelling mistakes in the labels<ref type="foot" target="#foot_4">5</ref>. For Subtask D, the BIO-tags were added based on the provided sequence positions, i.e. one entity corresponds to at least one token tag starting with B- for "Beginning" and continuing with I- for "Inner". If a token does not belong to any entity, the tag O for "Outer" is assigned. For instance, the sequence "fährt nicht" (engl. "does not run") consists of two tokens and would receive the entity Zugfahrt:negative and the token tags [B-Zugfahrt:negative, I-Zugfahrt:negative] if it refers to a DB train which is not running.</p><p>The models were fine-tuned on one Tesla V100 PCIe 16GB GPU using Python 3.8.7. Moreover, the transformers module (version 4.0.1) and torch (version 1.7.1) were used 6 . The considered values for the hyperparameters for fine-tuning follow the recommendations of <ref type="bibr" target="#b7">Devlin et al. (2019)</ref>:</p><p>• Batch size ∈ {16, 32},</p><p>• Adam learning rate ∈ {5e-5, 3e-5, 2e-5},</p><p>• # epochs ∈ {2, 3, 4}.</p><p>After evaluating the model performance for combinations<ref type="foot" target="#foot_5">7</ref> of the different hyperparameters, all pre-trained architectures were fine-tuned with a learning rate of 5e-5 for four epochs, which turned out to be the most promising combination across the different models. The maximum sequence length was set to 256, which is sufficient since the evaluated data set consists of rather short texts from social media, and a batch size of 32 was chosen.</p><p>Other models Eight teams officially participated in the GermEval17 shared task, five of which analyzed Subtask A, all of them Subtask B, and two each Subtasks C and D. 
We furthermore consider the system by <ref type="bibr" target="#b32">Ruppert et al. (2017)</ref>, even though they were the organizers and did not "officially" participate. They also tackled all four subtasks. Since 2017, several other authors have analyzed (parts of) the GermEval17 subtasks using more advanced models, which we also consider for comparison here. Table <ref type="table">6</ref> shows which authors employed which kinds of models to solve which task.</p></div>
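The BIO-tagging scheme used for Subtask D can be sketched as follows. The token-index span convention and the function name are our assumptions; the original .xml data provides character-level sequence positions from which such token spans would first have to be derived.

```python
def bio_tags(tokens, entities):
    """Assign BIO tags to a token sequence.

    `tokens` is a list of strings; `entities` is a list of
    (start_token, end_token_exclusive, label) triples derived from the
    sequence positions in the .xml data.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in entities:
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # inner tokens of the entity
    return tags

# "fährt nicht" annotated as Zugfahrt:negative, as in the example above.
tokens = ["Der", "Zug", "fährt", "nicht"]
assert bio_tags(tokens, [(2, 4, "Zugfahrt:negative")]) == \
    ["O", "O", "B-Zugfahrt:negative", "I-Zugfahrt:negative"]
```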
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Table <ref type="table">6</ref> gives an overview of all the models discussed in this article and indicates on which of the subtasks (A, B, C1, C2, D1 and D2) each architecture was evaluated: the models from 2017 <ref type="bibr" target="#b46">(Wojatzki et al., 2017;</ref><ref type="bibr" target="#b32">Ruppert et al., 2017)</ref> and our BERT models cover all six subtasks; the CNN <ref type="bibr" target="#b0">(Attia et al., 2018)</ref>, the ELMo+GloVe+BCN ensemble <ref type="bibr" target="#b3">(Biesialska et al., 2020)</ref>, the ELMo+TSA model <ref type="bibr" target="#b3">(Biesialska et al., 2020)</ref>, the FastText model <ref type="bibr" target="#b9">(Guhr et al., 2020)</ref> and bert-base-german-cased <ref type="bibr" target="#b9">(Guhr et al., 2020)</ref> were only evaluated on Subtask B; CNN+FastText <ref type="bibr" target="#b35">(Schmitt et al., 2018)</ref> was evaluated on Subtasks C1 and C2.</p><p>Subtask A The Relevance Classification is a binary document classification task with classes true and false. All models outperform the best model from 2017 by 1.0-4.0 percentage points for the synchronic, and by 1.6-5.0 percentage points for the diachronic test set. On the synchronic test set, the uncased German BERT-BASE model by dbmdz performs best with a score of 0.807, followed by its cased variant with 0.799. For the diachronic test set, the uncased German BERT-BASE model exceeds the other models with a score of 0.800, followed by the cased German BERT-BASE model reaching a score of 0.793. The three multilingual models generally perform worse than the German models on this task. 
Besides this, all the models perform slightly better on the synchronic data set than on the diachronic one. The FastText-based model <ref type="bibr" target="#b9">(Guhr et al., 2020)</ref> does not even come close to the baseline from 2017 (score of 0.698), while the ELMo-based models <ref type="bibr" target="#b3">(Biesialska et al., 2020)</ref> outperform it but fall short of the BERT-based models.</p><p>Subtask C Here, the pre-trained models surpass the best model from 2017 by 15.7-25.9 percentage points and 20.7-26.5 percentage points, respectively, for the synchronic and diachronic test sets. Again, the best model is the uncased German BERT-BASE dbmdz model, reaching scores of 0.655 and 0.689, respectively. The CNN models <ref type="bibr" target="#b35">(Schmitt et al., 2018)</ref> are also outperformed. For both Subtask C1 and C2, all the displayed models perform better on the diachronic than on the synchronic test data.</p><p>Subtask D Subtask D refers to the Opinion Target Extraction (OTE) and is thus a token-level classification task. As this is a rather difficult task, <ref type="bibr" target="#b46">Wojatzki et al. (2017)</ref> evaluated both an exact match and an overlapping match tolerating errors of +/− one token. In Table <ref type="table" target="#tab_1">11</ref>, we compare the pre-trained models using an "ordinary" softmax layer to those using a CRF layer for Subtask D1. The pre-trained models outperform the best system from 2017 on both test sets, by 5.6-21.7 percentage points on the diachronic one.</p><p>For the overlapping match (cf. Tab. 12), the best system from 2017 is outperformed by 4.9-17.5 percentage points on the synchronic and by 4.2-16.8 percentage points on the diachronic test set. Again, the uncased German BERT-BASE model by dbmdz with CRF layer performs best, with a micro F1 score of 0.523 on the synchronic and 0.533 on the diachronic set. 
To our knowledge, there are no other models to compare our performance values with besides the results from 2017.</p><p>Main Takeaways For the first two subtasks, which are rather simple binary and multi-class classification tasks, the pre-trained models are able to improve modestly upon the already solid performance values from 2017. Further, we do not see large differences between the different pre-trained models. Nevertheless, the small differences we can observe already point in the same direction as what can be observed for the primary ABSA tasks of interest, C1 and C2:</p><p>• Uncased models tend to outperform their cased counterparts among the monolingual models; for the multilingual models, this cannot be clearly confirmed. • Monolingual models outperform the multilingual ones. • There are no large performance differences between the two cased BERT models by DBMDZ and Deepset.ai, which suggests only a minor influence of the different corpora the models were pre-trained on.</p><p>• The monolingual DistilBERT model is highly competitive: it consistently outperforms its multilingual counterpart as well as the multilingual BERT models on Subtasks A-C and is at least competitive with the monolingual BERT models. For D1 and D2, we observe a rather clear dominance of the uncased monolingual model which is not observable to this extent for the other tasks.</p></div>
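All comparisons above rely on the micro F1 score. For the multi-label setting of Subtasks C1/C2, it is computed globally over all label decisions, so frequent classes dominate the score. A minimal sketch with invented toy labels:

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over multi-label predictions.

    `gold` and `pred` are lists of label sets, one set per document
    (e.g. the aspect categories of Subtask C1). True/false positives
    and false negatives are counted globally across all documents.
    """
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

gold = [{"Allgemein"}, {"Zugfahrt", "Ticketkauf"}]
pred = [{"Allgemein"}, {"Zugfahrt"}]
# tp=2, fp=0, fn=1 -> precision=1.0, recall=2/3, F1=0.8
assert abs(micro_f1(gold, pred) - 0.8) < 1e-9
```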
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Discussion</head><p>After having observed a notable performance increase for German ABSA when employing pre-trained models, the next step is to compare these observations to what was reported for the English language. Therefore, we examine the temporal development of the SOTA performance on the most widely adopted data sets for English ABSA, originating from the SemEval Shared Tasks <ref type="bibr" target="#b29">(Pontiki et al., 2014</ref><ref type="bibr" target="#b28">, 2015</ref><ref type="bibr" target="#b27">, 2016)</ref>. When looking at public leaderboards, e.g. https://paperswithcode.com/, Subtask SB2 (aspect term polarity) from SemEval-2014 is the task that attracts the most attention from researchers. This task is related, but not perfectly similar, to Subtask C2, since in this case the aspect term is always a word which has to be present in the given review. For this task, a comparison of pre-BERT and BERT-based methods reveals no big "jump" in the performance values, but rather a steady increase over time (cf. Tab. 13).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Table <ref type="table">13</ref> compares selected models on the Laptops and Restaurants data sets: the best (pre-BERT) model from SemEval-2014 <ref type="bibr" target="#b29">(Pontiki et al., 2014)</ref> scored 0.7048 and 0.8095, respectively, and subsequent models such as MemNet <ref type="bibr" target="#b40">(Tang et al., 2016)</ref> steadily raised these scores. Selected models were picked from https://paperswithcode.com/sota/aspect-based-sentiment-analysis-on-semeval.</p><p>Clearly more related, but unfortunately also less used, are the subtasks SB3 (aspect category extraction; comparable to Subtask C1) and SB4 (aspect category polarity; comparable to Subtask C2) from SemEval-2014. 9 Limitations with respect to comparability arise from the different numbers of categories: Subtask SB4 only exhibits five aspect categories (as opposed to 20 categories for GermEval17), which leads to an easier classification problem and is reflected in the already rather high scores of the 2014 baselines. Table <ref type="table" target="#tab_14">14</ref> shows the performance of the best model from 2014 as well as the performance of subsequent (pre-BERT and BERT-based) models for subtasks SB3 and SB4. In contrast to what can be observed for SB2, the performance increase on SB4 caused by the introduction of BERT is striking. While the ATAE-LSTM <ref type="bibr" target="#b45">(Wang et al., 2016)</ref> only slightly increased the performance compared to 2014, the BERT-based models led to a jump of more than 6 percentage points. When taking into account the potential room for improvement (0.16 for SB4 vs. 0.60 for C2), the improvements relative to the potential (0.06/0.16 for SB4 vs. 0.23/0.60 for C2) are quite similar. Another issue is that (partly) highly specialized (T)ABSA architectures were used for improving the SOTA on the SemEval-2014 tasks, while we "only" applied standard pre-trained German BERT models without any task-specific modifications or extensions. 
This leaves room for further improvements on this task on German data, which should be an objective of future research. 9 Since the data sets (Restaurants and Laptops) have been further developed for SemEval-2015 and SemEval-2016, subtasks SB3 and SB4 are revisited under the names Slot 1 and Slot 3 for the in-domain ABSA in SemEval-2015. Slot 2 from SemEval-2015 aims at OTE and thus corresponds to Subtask D from GermEval17. For SemEval-2016, the same task names as in 2015 were used, subdivided into Subtask 1 (sentence-level ABSA) and Subtask 2 (text-level ABSA).</p></div>
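The relative-improvement argument above (absolute gain divided by the remaining headroom) can be verified with the numbers given in the discussion:

```python
# Numbers from the discussion: SB4 improved by 0.06 with 0.16 headroom,
# GermEval17 Subtask C2 by 0.23 with 0.60 headroom.
relative_sb4 = 0.06 / 0.16  # SemEval-2014 SB4
relative_c2 = 0.23 / 0.60   # GermEval17 C2

# Both improvements consume a similar fraction of the available headroom.
assert round(relative_sb4, 3) == 0.375
assert round(relative_c2, 3) == 0.383
```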
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">Conclusion</head><p>As one would have hoped, all the state-of-the-art pre-trained language models clearly outperform all the models from 2017, proving the power of transfer learning also for German ABSA. Throughout the presented analyses, the models always achieve similar results on the synchronic and the diachronic test sets, indicating temporal robustness of the models. Nonetheless, the diachronic data was collected only half a year after the main data. It would be interesting to see whether the trained models would return similar predictions on data collected a couple of years later.</p><p>The uncased German BERT-BASE model by dbmdz achieves the best results across all subtasks. Since <ref type="bibr" target="#b31">Rönnqvist et al. (2019)</ref> showed that monolingual BERT models often outperform the multilingual models for a variety of tasks, one might have already suspected that a monolingual German BERT performs best across the performed tasks. It may not seem evident at first that an uncased language model ends up as the best performing model since, e.g. in Sentiment Analysis, capitalized letters might be an indicator for polarity. In addition, since nouns and beginnings of sentences always start with a capital letter in German, one might assume that lower-casing the whole text changes the meaning of some words and thus confuses the language model. Nevertheless, the GermEval17 documents are very noisy since they were retrieved from social media. That means that the data contains many misspellings, grammar and expression mistakes, dialect, and colloquial language. For this reason, some participating teams in 2017 already applied elaborate pre-processing to the text data in order to eliminate some of the noise <ref type="bibr" target="#b14">(Hövelmann and Friedrich, 2017;</ref><ref type="bibr" target="#b34">Sayyed et al., 2017;</ref><ref type="bibr" target="#b36">Sidarenka, 2017)</ref>. 
Among other things, <ref type="bibr" target="#b14">Hövelmann and Friedrich (2017)</ref> transformed the text to lower-case and replaced, for example, "S-Bahn" and "S Bahn" with "sbahn". We suppose that in this case, lower-casing the texts improves the data quality by eliminating some of the noise and acts as a sort of regularization. As a result, the uncased models potentially generalize better than the cased models. The findings of <ref type="bibr" target="#b20">Mayhew et al. (2019)</ref>, who compare cased and uncased pre-trained models on social media data for NER, corroborate this hypothesis.</p><p>Because of the high number of classes and their skewed distribution, it is interesting to take a more detailed look at the model performance for this subtask on category level. Table <ref type="table" target="#tab_16">15</ref> shows the performance of the uncased German BERT-BASE model by dbmdz per test set for Subtask C1. The support indicates the number of appearances, which are also displayed in Table <ref type="table" target="#tab_3">4</ref>. All the aspect categories displayed in Table 16 are also visible in Table <ref type="table" target="#tab_16">15</ref>, and most of them have negative sentiment. Allgemein:neutral and Sonstige Unregelmäßigkeiten:negative show the highest scores. Again, we assume that here, 48 categories could not be identified due to data sparsity. However, having this in mind, the model achieves a relatively high overall performance for both Subtask C1 and C2 (cf. Tab. 9 and Tab. 10). This is mainly owed to the high score of the majority classes Allgemein and Allgemein:neutral, respectively, because the micro F1 score puts a lot of weight on majority classes. It might be interesting to investigate whether the classification of the rare categories can be improved by balancing the data. 
We experimented with removing general categories such as Allgemein, Allgemein:neutral or documents with sentiment neutral, since these are usually less interesting for a company. We observe a large drop in the overall F1 score, which we attribute to the absence of the strong majority class and the resulting data loss. Indeed, the classification of some single categories could be improved, but the rare categories could still not be identified by the model.</p></div>
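The claim that the micro F1 score puts a lot of weight on majority classes can be illustrated numerically. The toy counts below are invented for illustration: a classifier that gets every majority-class example right but misses every rare-class example still obtains a high micro F1, while the macro average (which weights all classes equally) reveals the failure.

```python
def f1(tp, fp, fn):
    """F1 score from true positives, false positives and false negatives."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

# 90 majority-class examples, all correct; 10 rare-class examples, all
# misclassified as the majority class (hypothetical counts).
tp_major, fp_major, fn_major = 90, 10, 0
tp_rare, fp_rare, fn_rare = 0, 0, 10

micro = f1(tp_major + tp_rare, fp_major + fp_rare, fn_major + fn_rare)
macro = (f1(tp_major, fp_major, fn_major) + f1(tp_rare, fp_rare, fn_rare)) / 2

assert abs(micro - 0.9) < 1e-9  # dominated by the majority class
assert macro < 0.5              # the missed rare class pulls macro F1 down
```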
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B Detailed results (per category) for Subtask D</head><p>As for Subtask C, the results for the best model are investigated in more detail. For Subtask D1, the model returns a positive score for 25 entity categories on at least one of the two test sets. The category Zugfahrt:negative can be classified best on both test sets, followed by Sonstige Unregelmäßigkeiten:negative and Sicherheit:negative for the synchronic test set and by Connectivity:negative and Allgemein:positive for the diachronic set. The scores differ more between the two test sets here than in the classification report of the previous task.</p><p>The report for the overlapping match (cf. Tab. 18) shows slightly better results on some categories than for the exact match. The third-best score on the diachronic test data is now Sonstige Unregelmäßigkeiten:negative. Besides this, the top three categories per test set remain the same.</p><p>Apart from the fact that this is a different kind of task than before, one can notice that even though the overall micro F1 scores are lower for Subtask D than for Subtask C, the model manages to successfully identify a larger variety of categories, i.e. it achieves a positive score for more categories. This is probably due to the more balanced data for Subtask D than for Subtask C2, resulting in a lower overall score and mostly higher scores per category.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc></figDesc><table><row><cell></cell><cell></cell><cell>displays the</cell></row><row><cell cols="3">number of documents for each split.</cell></row><row><cell>train</cell><cell cols="2">dev test syn test dia</cell></row><row><cell cols="2">19,432 2,369</cell><cell>2,566 1,842</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1 :</head><label>1</label><figDesc>Number of documents per split of the data set.</figDesc><table><row><cell cols="5">While roughly 74% of the documents form the train</cell></row><row><cell cols="5">set, the development split and the synchronic test</cell></row><row><cell cols="5">split contain around 9% and around 10%, respec-</cell></row><row><cell cols="5">tively. The remaining 7% of the data belong to</cell></row><row><cell cols="5">the diachronic set (cf. Tab. 1). Table 2 shows</cell></row><row><cell cols="5">the relevance distribution per data split. This un-</cell></row><row><cell cols="5">veils a pretty skewed distribution of the labels since</cell></row><row><cell cols="5">the relevant documents represent the clear majority</cell></row><row><cell cols="3">with over 80% in each split.</cell><cell></cell><cell></cell></row><row><cell>Relevance</cell><cell>train</cell><cell cols="3">dev test syn test dia</cell></row><row><cell>true</cell><cell cols="2">16,201 1,931</cell><cell cols="2">2,095 1,547</cell></row><row><cell>false</cell><cell>3,231</cell><cell>438</cell><cell>471</cell><cell>295</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2 :</head><label>2</label><figDesc>Relevance distribution for Subtask A.</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4</head><label>4</label><figDesc></figDesc><table /><note>Table 4 holds the distribution of the 20 different aspect categories assigned to the documents 2 .</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 4</head><label>4</label><figDesc>Aspect category distribution for Subtask C. Multiple mentions of the same aspect category in a document are only considered once.</figDesc><table /><note>Pre-trained architectures BERT was initially introduced in a base (110M parameters) and a large (340M) variant; Sanh et al. (2019) proposed an even smaller BERT model (DistilBERT, 60M parameters) trained via knowledge distillation.</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 5 :</head><label>5</label><figDesc>Pre-trained models provided by huggingface transformers (version 4.0.1) suitable for German. For all available models, see: https://huggingface.co/transformers/pretrained_models.html.</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_7"><head></head><label></label><figDesc>Table 7 displays the micro F1 score obtained by each language model on each test set (best result per data set in bold). Subtask B Subtask B refers to the Document-level Polarity, which is a multi-class classification task with three classes. Table 8 shows the performance on the two test sets.</figDesc><table><row><cell></cell><cell></cell><cell>Language model</cell><cell>test syn</cell><cell>test dia</cell></row><row><cell></cell><cell></cell><cell>Best models 2017 (test syn : Ruppert et al., 2017) (test dia : Sayyed et al., 2017)</cell><cell>0.767</cell><cell>0.750</cell></row><row><cell></cell><cell></cell><cell>bert-base-german-cased</cell><cell>0.798</cell><cell>0.793</cell></row><row><cell></cell><cell></cell><cell>bert-base-german-dbmdz-cased</cell><cell>0.799</cell><cell>0.785</cell></row><row><cell></cell><cell></cell><cell>bert-base-german-dbmdz-uncased</cell><cell>0.807</cell><cell>0.800</cell></row><row><cell></cell><cell></cell><cell>bert-base-multilingual-cased</cell><cell>0.790</cell><cell>0.780</cell></row><row><cell></cell><cell></cell><cell>bert-base-multilingual-uncased</cell><cell>0.784</cell><cell>0.766</cell></row><row><cell></cell><cell></cell><cell>distilbert-base-german-cased</cell><cell>0.798</cell><cell>0.776</cell></row><row><cell></cell><cell></cell><cell>distilbert-base-multilingual-cased</cell><cell>0.777</cell><cell>0.770</cell></row><row><cell></cell><cell></cell><cell>CNN (Attia et al., 2018)</cell><cell>0.755</cell><cell>-</cell></row><row><cell></cell><cell></cell><cell>ELMo+GloVe+BCN</cell><cell>0.789 †</cell><cell>-</cell></row><row><cell>Language model</cell><cell cols="2">test syn test dia</cell></row><row><cell>Best model 2017 (Sayyed et al., 2017)</cell><cell>0.903</cell><cell>0.906</cell></row><row><cell>bert-base-german-cased</cell><cell>0.950</cell><cell>0.939</cell></row><row><cell>bert-base-german-dbmdz-cased</cell><cell>0.951</cell><cell>0.946</cell></row><row><cell>bert-base-german-dbmdz-uncased</cell><cell>0.957</cell><cell>0.948</cell></row><row><cell>bert-base-multilingual-cased</cell><cell>0.942</cell><cell>0.933</cell></row><row><cell>bert-base-multilingual-uncased</cell><cell>0.944</cell><cell>0.939</cell></row><row><cell>distilbert-base-german-cased</cell><cell>0.944</cell><cell>0.939</cell></row><row><cell>distilbert-base-multilingual-cased</cell><cell>0.941</cell><cell>0.932</cell></row><row><cell cols="3">Table 7: F1 scores for Subtask A on synchronic and diachronic test sets.</cell></row><row><cell cols="3">All the models outperform the best result achieved in 2017 on both test data sets. For the synchronic test set, the previous best result is surpassed by 3.8-5.4 percentage points. For the diachronic test set, the absolute difference to the best contender of 2017 varies between 2.6 and 4.2 percentage points. With micro F1 scores of 0.957 and 0.948, respectively, the best scoring pre-trained language model is the uncased German BERT-BASE variant by dbmdz, followed by its cased version.</cell></row></table><note>All the pre-trained models perform slightly better on the synchronic test data than on the diachronic data. <ref type="bibr" target="#b0">Attia et al. (2018)</ref>, <ref type="bibr" target="#b35">Schmitt et al. (2018)</ref>, <ref type="bibr" target="#b3">Biesialska et al. (2020)</ref> and <ref type="bibr" target="#b9">Guhr et al. (2020)</ref> did not evaluate their models on this task.</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_8"><head>Table 8 :</head><label>8</label><figDesc>Micro-averaged F1 scores for Subtask B on synchronic and diachronic test sets.</figDesc><table /><note>† Guhr et al. (2020) created their own (balanced &amp; unbalanced) data splits, which limits comparability. We compare to the performance on the unbalanced data since it most closely resembles the original data splits.</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_9"><head>Table 9 :</head><label>9</label><figDesc>Micro-averaged F1 scores for Subtask C1 (Aspect-only) on synchronic and diachronic test sets. A detailed overview of per-class performances for error analysis can be found in Table 15 in Appendix A.</figDesc><table><row><cell>Subtask C Subtask C is split into Aspect-only (Subtask C1) and Aspect+Sentiment Classification (Subtask C2), each being a multi-label classification task 8 . As the organizers provide 20 aspect categories, Subtask C1 includes 20 labels, whereas Subtask C2 has 60 labels, since each aspect category is combined with each of the three sentiments. Consistent with <ref type="bibr" target="#b16">Lee et al. (2017)</ref> and <ref type="bibr" target="#b22">Mishra et al. (2017)</ref>, we do not account for multiple mentions of the same label in one document. The results for Subtask C1 are shown in Table 9.</cell></row></table><note>The ELMo-based models are pretty competitive; interestingly, two of the multilingual models are even outperformed by them. All pre-trained German BERTs clearly surpass the best performance from 2017 as well as the results reported by <ref type="bibr" target="#b35">Schmitt et al. (2018)</ref>, who are the only ones of the other authors to evaluate their models on this task. Regarding the synchronic test set, the absolute improvement ranges between 16.9 and 22.4 percentage points, while for the diachronic test data, the models outperform the previous results by 17.8-23.5 percentage points. The best model is again the uncased German BERT-BASE model by dbmdz, reaching scores of 0.761 and 0.791, respectively, followed by the two cased German BERT-BASE models. One more time, the multilingual models exhibit the poorest performances amongst the evaluated models. Next, Table 10 shows the results for Subtask C2.</note></figure>
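Since Subtasks C1 and C2 are multi-label, each document is encoded as a multi-hot target vector and each label receives an independent sigmoid score (cf. footnote 8: sigmoid + binary cross-entropy instead of softmax). A minimal sketch of that setup, using a hypothetical, abbreviated label inventory (the real task has 20 aspects, hence 60 aspect:sentiment labels):

```python
import itertools
import math

# Illustrative subset of aspect categories; the real inventory has 20.
aspects = ["Allgemein", "Zugfahrt", "Connectivity"]
sentiments = ["positive", "neutral", "negative"]
labels = [f"{a}:{s}" for a, s in itertools.product(aspects, sentiments)]

def to_multi_hot(annotations):
    """Encode a document's annotations as a multi-hot target vector;
    repeated mentions of the same label count only once."""
    present = set(annotations)
    return [1.0 if lab in present else 0.0 for lab in labels]

def predict(logits, threshold=0.5):
    """One independent sigmoid per label (multi-label), in contrast to a
    single softmax over classes (multi-class, as in Subtask B)."""
    sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
    return {lab for lab, z in zip(labels, logits) if sigmoid(z) > threshold}

doc = ["Allgemein:neutral", "Zugfahrt:negative", "Zugfahrt:negative"]
print(sum(to_multi_hot(doc)))  # -> 2.0, the duplicate mention is collapsed
```

The thresholding step means a document can receive zero, one, or several labels, which is exactly what a softmax output layer could not express.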
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_11"><head>Table 12 :</head><label>12</label><figDesc>Entity-level micro-averaged F1 scores for Subtask D2 (overlapping match) on synchronic and diachronic test sets. The best performing model on both test sets is the uncased German BERT-BASE model by dbmdz with CRF layer, with scores of 0.515 and 0.518, respectively. Overall, the results from 2017 are outperformed by 11.8-28.6 percentage points. A detailed overview of per-class performances for error analysis can be found in Table 18 in Appendix B.</figDesc><table><row><cell></cell><cell>Language model</cell><cell>test syn</cell><cell>test dia</cell></row><row><cell></cell><cell>Best models 2017 (test syn : Lee et al., 2017) (test dia : Ruppert et al., 2017)</cell><cell>0.348</cell><cell>0.365</cell></row><row><cell>without CRF</cell><cell>bert-base-german-cased</cell><cell>0.471</cell><cell>0.474</cell></row><row><cell></cell><cell>bert-base-german-dbmdz-cased</cell><cell>0.491</cell><cell>0.488</cell></row><row><cell></cell><cell>bert-base-german-dbmdz-uncased</cell><cell>0.501</cell><cell>0.518</cell></row><row><cell></cell><cell>bert-base-multilingual-cased</cell><cell>0.457</cell><cell>0.473</cell></row><row><cell></cell><cell>bert-base-multilingual-uncased</cell><cell>0.435</cell><cell>0.417</cell></row><row><cell></cell><cell>distilbert-base-german-cased</cell><cell>0.397</cell><cell>0.407</cell></row><row><cell></cell><cell>distilbert-base-multilingual-cased</cell><cell>0.433</cell><cell>0.429</cell></row><row><cell>with CRF</cell><cell>bert-base-german-cased</cell><cell>0.455</cell><cell>0.457</cell></row><row><cell></cell><cell>bert-base-german-dbmdz-cased</cell><cell>0.476</cell><cell>0.469</cell></row><row><cell></cell><cell>bert-base-german-dbmdz-uncased</cell><cell>0.523</cell><cell>0.533</cell></row><row><cell></cell><cell>bert-base-multilingual-cased</cell><cell>0.476</cell><cell>0.474</cell></row><row><cell></cell><cell>bert-base-multilingual-uncased</cell><cell>0.484</cell><cell>0.464</cell></row><row><cell></cell><cell>distilbert-base-german-cased</cell><cell>0.433</cell><cell>0.423</cell></row><row><cell></cell><cell>distilbert-base-multilingual-cased</cell><cell>0.442</cell><cell>0.427</cell></row></table></figure>
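Subtask D distinguishes an exact-match evaluation (D1) from a more lenient overlapping-match evaluation (D2). A hedged sketch of that distinction, assuming entities are represented as character-offset spans (the offsets below are made up, and the official evaluation additionally requires the Aspect:Sentiment label to match):

```python
def exact_match(gold_span, pred_span):
    """Subtask D1-style criterion: predicted offsets must
    equal the gold offsets exactly."""
    return gold_span == pred_span

def overlapping_match(gold_span, pred_span):
    """Subtask D2-style criterion: any shared character between
    the two half-open spans [start, end) counts as a match."""
    (gs, ge), (ps, pe) = gold_span, pred_span
    return max(gs, ps) < min(ge, pe)

gold = (10, 18)  # gold entity offsets (illustrative)
pred = (12, 18)  # prediction starts two characters late
print(exact_match(gold, pred))        # -> False
print(overlapping_match(gold, pred))  # -> True
```

This is why the D2 scores in Table 12 are consistently at least as high as their D1 counterparts: every exact match is also an overlapping match, but not vice versa.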
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_12"><head>Table 13 :</head><label>13</label><figDesc>Development of the SOTA Accuracy for the aspect term polarity task (SemEval-2014; <ref type="bibr" target="#b29">Pontiki et al., 2014)</ref>.</figDesc><table><row><cell>0.7221</cell><cell>0.8095</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_14"><head>Table 14 :</head><label>14</label><figDesc>Development of the SOTA F1 score (SB3) and Accuracy (SB4) for the aspect category extraction/polarity task (SemEval-2014; <ref type="bibr" target="#b29">Pontiki et al., 2014)</ref>.</figDesc><table /><note>† Additional auxiliary sentences were used.</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_15"><head></head><label></label><figDesc>Seven categories are summarized in Rest because they have an F1 score of 0 for both test sets, i.e. the model is not able to correctly identify any of these seven aspects appearing in the test data. The table is sorted by the score on the synchronic test set.</figDesc><table><row><cell></cell><cell cols="2">test syn</cell><cell cols="2">test dia</cell></row><row><cell>Aspect Category</cell><cell>Score</cell><cell>Support</cell><cell>Score</cell><cell>Support</cell></row><row><cell>Allgemein</cell><cell>0.854</cell><cell>1,398</cell><cell>0.877</cell><cell>1,024</cell></row><row><cell>Sonstige Unregelmäßigkeiten</cell><cell>0.782</cell><cell>224</cell><cell>0.785</cell><cell>164</cell></row><row><cell>Connectivity</cell><cell>0.750</cell><cell>36</cell><cell>0.838</cell><cell>73</cell></row><row><cell>Zugfahrt</cell><cell>0.678</cell><cell>241</cell><cell>0.687</cell><cell>184</cell></row><row><cell>Auslastung und Platzangebot</cell><cell>0.645</cell><cell>35</cell><cell>0.667</cell><cell>20</cell></row><row><cell>Sicherheit</cell><cell>0.602</cell><cell>84</cell><cell>0.639</cell><cell>42</cell></row><row><cell>Atmosphäre</cell><cell>0.600</cell><cell>148</cell><cell>0.532</cell><cell>53</cell></row><row><cell>Barrierefreiheit</cell><cell>0.500</cell><cell>9</cell><cell>0</cell><cell>2</cell></row><row><cell>Ticketkauf</cell><cell>0.481</cell><cell>95</cell><cell>0.506</cell><cell>48</cell></row><row><cell>Service und Kundenbetreuung</cell><cell>0.476</cell><cell>63</cell><cell>0.417</cell><cell>27</cell></row><row><cell>DB App und Website</cell><cell>0.455</cell><cell>28</cell><cell>0.563</cell><cell>18</cell></row><row><cell>Informationen</cell><cell>0.329</cell><cell>58</cell><cell>0.464</cell><cell>35</cell></row><row><cell>Komfort und Ausstattung</cell><cell>0.286</cell><cell>24</cell><cell>0</cell><cell>11</cell></row><row><cell>Rest</cell><cell>0</cell><cell>24</cell><cell>0</cell><cell>20</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_16"><head>Table 15 :</head><label>15</label><figDesc>Micro-averaged F1 scores and support by aspect category (Subtask C1). Seven categories are summarized in Rest and each show a score of 0.</figDesc><table /><note>The F1 scores for Allgemein (General), Sonstige Unregelmäßigkeiten (Other irregularities) and Connectivity are the highest. 13 categories, mostly similar between the two test sets, show a positive F1 score on at least one of the two test sets. For the categories subsumed under Rest, the model was not able to learn how to correctly identify these categories. Subtask C2 exhibits a similar distribution of the true labels, with the Aspect+Sentiment category Allgemein:neutral as majority class. Over 50% of the true labels belong to this class. Table 16 shows that only 12 out of 60 labels can be detected by the model.</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_17"><head>Table 16 :</head><label>16</label><figDesc>Micro-averaged F1 scores and support by Aspect+Sentiment category (Subtask C2). 48 categories are summarized in Rest and each show a score of 0.</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_18"><head></head><label></label><figDesc>Table 17 gives the detailed classification report for the uncased German BERT-BASE model with CRF layer on Subtask D1. Only entities that were correctly detected at least once are displayed. The table is sorted by the score on the synchronic test set. The classification report for Subtask D2 is displayed analogously in Table 18.</figDesc><table><row><cell></cell><cell cols="2">test syn</cell><cell cols="2">test dia</cell></row><row><cell>Category</cell><cell>Score</cell><cell>Support</cell><cell>Score</cell><cell>Support</cell></row><row><cell>Zugfahrt:negative</cell><cell>0.702</cell><cell>622</cell><cell>0.729</cell><cell>495</cell></row><row><cell>Sonstige Unregelmäßigkeiten:negative</cell><cell>0.681</cell><cell>693</cell><cell>0.581</cell><cell>484</cell></row><row><cell>Sicherheit:negative</cell><cell>0.604</cell><cell>337</cell><cell>0.457</cell><cell>122</cell></row><row><cell>Connectivity:negative</cell><cell>0.598</cell><cell>56</cell><cell>0.620</cell><cell>109</cell></row><row><cell>Barrierefreiheit:negative</cell><cell>0.595</cell><cell>14</cell><cell>0</cell><cell>3</cell></row><row><cell>Auslastung und Platzangebot:negative</cell><cell>0.579</cell><cell>66</cell><cell>0.447</cell><cell>31</cell></row><row><cell>Connectivity:positive</cell><cell>0.571</cell><cell>26</cell><cell>0.555</cell><cell>60</cell></row><row><cell>Allgemein:negative</cell><cell>0.545</cell><cell>807</cell><cell>0.343</cell><cell>139</cell></row><row><cell>Atmosphäre:negative</cell><cell>0.500</cell><cell>403</cell><cell>0.337</cell><cell>164</cell></row><row><cell>Ticketkauf:negative</cell><cell>0.383</cell><cell>96</cell><cell>0.583</cell><cell>74</cell></row><row><cell>Ticketkauf:positive</cell><cell>0.368</cell><cell>59</cell><cell>0</cell><cell>13</cell></row><row><cell>Komfort und Ausstattung:negative</cell><cell>0.357</cell><cell>24</cell><cell>0</cell><cell>16</cell></row><row><cell>Atmosphäre:neutral</cell><cell>0.348</cell><cell>40</cell><cell>0.111</cell><cell>14</cell></row><row><cell>Service und Kundenbetreuung:negative</cell><cell>0.323</cell><cell>74</cell><cell>0.286</cell><cell>31</cell></row><row><cell>Informationen:negative</cell><cell>0.301</cell><cell>68</cell><cell>0.505</cell><cell>46</cell></row><row><cell>Zugfahrt:positive</cell><cell>0.276</cell><cell>62</cell><cell>0.343</cell><cell>83</cell></row><row><cell>DB App und Website:negative</cell><cell>0.232</cell><cell>39</cell><cell>0.375</cell><cell>33</cell></row><row><cell>DB App und Website:neutral</cell><cell>0.188</cell><cell>23</cell><cell>0</cell><cell>11</cell></row><row><cell>Sonstige Unregelmäßigkeiten:neutral</cell><cell>0.179</cell><cell>13</cell><cell>0.222</cell><cell>2</cell></row><row><cell>Allgemein:positive</cell><cell>0.157</cell><cell>86</cell><cell>0.586</cell><cell>92</cell></row><row><cell>Service und Kundenbetreuung:positive</cell><cell>0.115</cell><cell>23</cell><cell>0</cell><cell>5</cell></row><row><cell>Atmosphäre:positive</cell><cell>0.105</cell><cell>26</cell><cell>0</cell><cell>15</cell></row><row><cell>Ticketkauf:neutral</cell><cell>0.040</cell><cell>144</cell><cell>0.222</cell><cell>25</cell></row><row><cell>Connectivity:neutral</cell><cell>0</cell><cell>11</cell><cell>0.211</cell><cell>15</cell></row><row><cell>Toiletten:negative</cell><cell>0</cell><cell>15</cell><cell>0.160</cell><cell>23</cell></row><row><cell>Rest</cell><cell>0</cell><cell>355</cell><cell>0</cell><cell>115</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_19"><head>Table 17 :</head><label>17</label><figDesc>Micro-averaged F1 scores and support by Aspect+Sentiment entity with exact match (Subtask D1). 35 categories are summarized in Rest, each of them exhibiting a score of 0.</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_21"><head>Table 18 :</head><label>18</label><figDesc>Micro-averaged F1 scores and support by Aspect+Sentiment entity with overlapping match (Subtask D2). 35 categories are summarized in Rest and each show a score of 0.</figDesc><table /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">The data sets (in both formats) can be obtained from http://ltdata1.informatik.uni-hamburg.de/germeval2017/.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">Multiple annotations per document are possible; for a detailed category description see https://sites.google.com/view/germeval2017-absa/data.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">MDZ Digital Library team at the Bavarian State Library. Visit https://www.digitale-sammlungen.de for details and https://github.com/dbmdz/berts for their repository on pre-trained BERT models.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">Visit https://deepset.ai/german-bert for details.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">"positve" in the train set was replaced with "positive", " negative" in the test dia set was replaced with "negative".</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_5">Due to memory limitations, not every hyperparameter combination was applicable.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_6">This leads to a change of activation functions in the final layer from softmax to sigmoid + binary cross entropy loss.</note>
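The loss change described in footnote 8 can be written out explicitly. A minimal plain-Python sketch (not the actual fine-tuning code) of binary cross-entropy summed over independent per-label sigmoid probabilities:

```python
import math

def binary_cross_entropy(targets, probs, eps=1e-12):
    """Sum of per-label BCE terms: each of the 20 (Subtask C1) or
    60 (Subtask C2) labels is treated as an independent Bernoulli
    output of a sigmoid unit; eps guards against log(0)."""
    return -sum(t * math.log(p + eps) + (1.0 - t) * math.log(1.0 - p + eps)
                for t, p in zip(targets, probs))

# A confident correct prediction incurs a much smaller loss
# than a confident wrong one on the same target vector.
print(binary_cross_entropy([1.0, 0.0], [0.99, 0.01])
      < binary_cross_entropy([1.0, 0.0], [0.01, 0.99]))  # -> True
```

Unlike softmax + categorical cross-entropy, this loss does not force the per-label probabilities to sum to one, which is what makes multiple simultaneous labels per document possible.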
		</body>
		<back>
			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Appendix</head><p>A Detailed results (per category) for Subtask C</p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Multilingual multi-class sentiment classification using convolutional neural networks</title>
		<author>
			<persName><forename type="first">Mohammed</forename><surname>Attia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Younes</forename><surname>Samih</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ali</forename><surname>Elkahky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Laura</forename><surname>Kallmeyer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)</title>
				<meeting>the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)<address><addrLine>Miyazaki, Japan</addrLine></address></meeting>
		<imprint>
			<publisher>ELRA</publisher>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Improving sentiment analysis over non-English tweets using multilingual transformers and automatic translation for data-augmentation</title>
		<author>
			<persName><forename type="first">Valentin</forename><surname>Barriere</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alexandra</forename><surname>Balahur</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.coling-main.23</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 28th International Conference on Computational Linguistics</title>
				<meeting>the 28th International Conference on Computational Linguistics<address><addrLine>Barcelona, Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="266" to="271" />
		</imprint>
	</monogr>
	<note>International Committee on Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Document level sentiment analysis: A survey</title>
		<author>
			<persName><forename type="first">Salima</forename><surname>Behdenna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Fatiha</forename><surname>Barigou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ghalem</forename><surname>Belalem</surname></persName>
		</author>
		<idno type="DOI">10.4108/eai.14-3-2018.154339</idno>
	</analytic>
	<monogr>
		<title level="j">EAI Endorsed Transactions on Contextaware Systems and Applications</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="page">154339</biblScope>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">Katarzyna</forename><surname>Biesialska</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Magdalena</forename><surname>Biesialska</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Henryk</forename><surname>Rybinski</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2003.05574</idno>
		<title level="m">Sentiment analysis with contextual embeddings and self-attention</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Enriching word vectors with subword information</title>
		<author>
			<persName><forename type="first">Piotr</forename><surname>Bojanowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Edouard</forename><surname>Grave</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Armand</forename><surname>Joulin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tomas</forename><surname>Mikolov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Transactions of the Association for Computational Linguistics</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="page" from="135" to="146" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">Learning phrase representations using RNN encoder-decoder for statistical machine translation</title>
		<author>
			<persName><forename type="first">Kyunghyun</forename><surname>Cho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Bart</forename><surname>Van Merriënboer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Caglar</forename><surname>Gulcehre</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dzmitry</forename><surname>Bahdanau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Fethi</forename><surname>Bougares</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Holger</forename><surname>Schwenk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yoshua</forename><surname>Bengio</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1406.1078</idno>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">A Twitter corpus and benchmark resources for German sentiment analysis</title>
		<author>
			<persName><forename type="first">Mark</forename><surname>Cieliebak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jan</forename><forename type="middle">Milan</forename><surname>Deriu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dominic</forename><surname>Egger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Fatih</forename><surname>Uzdilli</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/W17-1106</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media</title>
				<meeting>the Fifth International Workshop on Natural Language Processing for Social Media<address><addrLine>Valencia, Spain</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="45" to="51" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</title>
		<author>
			<persName><forename type="first">Jacob</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ming-Wei</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kenton</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kristina</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/N19-1423</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
		<title level="s">Long and Short Papers</title>
		<meeting>the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies<address><addrLine>Minneapolis, Minnesota</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="4171" to="4186" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">ParaCrawl: Web-scale parallel corpora for the languages of the EU</title>
		<author>
			<persName><forename type="first">M</forename><surname>Esplà-Gomis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Forcada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gema</forename><surname>Ramírez-Sánchez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hieu</forename><forename type="middle">T</forename><surname>Hoang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">MT-Summit</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Training a Broad-Coverage German Sentiment Classification Model for Dialog Systems</title>
		<author>
			<persName><forename type="first">Oliver</forename><surname>Guhr</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Anne-Kathrin</forename><surname>Schumann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Frank</forename><surname>Bahrmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hans-Joachim</forename><surname>Böhme</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020)</title>
				<meeting>the 12th Conference on Language Resources and Evaluation (LREC 2020)<address><addrLine>Marseille, France</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="1627" to="1632" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m" type="main">News Crawl Corpus</title>
		<author>
			<persName><forename type="first">Barry</forename><surname>Haddow</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">Distilling the knowledge in a neural network</title>
		<author>
			<persName><forename type="first">Geoffrey</forename><surname>Hinton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Oriol</forename><surname>Vinyals</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jeff</forename><surname>Dean</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1503.02531</idno>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Aspect-based sentiment analysis using BERT</title>
		<author>
			<persName><forename type="first">Mickel</forename><surname>Hoang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Oskar</forename><surname>Alija Bihorac</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jacobo</forename><surname>Rouces</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 22nd Nordic Conference on Computational Linguistics</title>
				<meeting>the 22nd Nordic Conference on Computational Linguistics<address><addrLine>Turku, Finland</addrLine></address></meeting>
		<imprint>
			<publisher>Linköping University Electronic Press</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="187" to="196" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Long short-term memory</title>
		<author>
			<persName><forename type="first">Sepp</forename><surname>Hochreiter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jürgen</forename><surname>Schmidhuber</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Neural computation</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="issue">8</biblScope>
			<biblScope unit="page" from="1735" to="1780" />
			<date type="published" when="1997">1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Fasttext and Gradient Boosted Trees at GermEval-2017 Tasks on Relevance Classification and Document-level Polarity</title>
		<author>
			<persName><forename type="first">Leonard</forename><surname>Hövelmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christoph</forename><forename type="middle">M</forename><surname>Friedrich</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the GermEval 2017 - Shared Task on Aspect-based Sentiment in Social Media Customer Feedback</title>
				<meeting>the GermEval 2017 - Shared Task on Aspect-based Sentiment in Social Media Customer Feedback<address><addrLine>Berlin, Germany</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<title level="m" type="main">Adversarial training for aspect-based sentiment analysis with BERT</title>
		<author>
			<persName><forename type="first">Akbar</forename><surname>Karimi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Leonardo</forename><surname>Rossi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andrea</forename><surname>Prati</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">UKP TU-DA at GermEval 2017: Deep Learning for Aspect Based Sentiment Detection</title>
		<author>
			<persName><forename type="first">Ji-Ung</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Steffen</forename><surname>Eger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Johannes</forename><surname>Daxenberger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Iryna</forename><surname>Gurevych</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the GermEval 2017 - Shared Task on Aspect-based Sentiment in Social Media Customer Feedback</title>
				<meeting>the GermEval 2017 - Shared Task on Aspect-based Sentiment in Social Media Customer Feedback<address><addrLine>Berlin, Germany</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Hierarchical attention based position-aware network for aspect-level sentiment analysis</title>
		<author>
			<persName><forename type="first">Lishuang</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yang</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Anqiao</forename><surname>Zhou</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/K18-1018</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 22nd Conference on Computational Natural Language Learning</title>
				<meeting>the 22nd Conference on Computational Natural Language Learning<address><addrLine>Brussels, Belgium</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="181" to="189" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Exploiting BERT for end-to-end aspect-based sentiment analysis</title>
		<author>
			<persName><forename type="first">Xin</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lidong</forename><surname>Bing</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Wenxuan</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Wai</forename><surname>Lam</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/D19-5505</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)</title>
				<meeting>the 5th Workshop on Noisy User-generated Text (W-NUT 2019)<address><addrLine>Hong Kong, China</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="34" to="41" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles</title>
		<author>
			<persName><forename type="first">Pierre</forename><surname>Lison</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jörg</forename><surname>Tiedemann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016)</title>
				<meeting>the 10th International Conference on Language Resources and Evaluation (LREC 2016)</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">ner and pos when nothing is capitalized</title>
		<author>
			<persName><forename type="first">Stephen</forename><surname>Mayhew</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tatiana</forename><surname>Tsygankova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dan</forename><surname>Roth</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing</title>
				<meeting>the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing<address><addrLine>Hong Kong, China</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="6256" to="6261" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Learned in translation: Contextualized word vectors</title>
		<author>
			<persName><forename type="first">Bryan</forename><surname>McCann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">James</forename><surname>Bradbury</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Caiming</forename><surname>Xiong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Richard</forename><surname>Socher</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems</title>
				<imprint>
			<publisher>Curran Associates, Inc</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="volume">30</biblScope>
			<biblScope unit="page" from="6294" to="6305" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">GermEval 2017: Sequence based Models for Customer Feedback Analysis</title>
		<author>
			<persName><forename type="first">Pruthwik</forename><surname>Mishra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Vandan</forename><surname>Mujadia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Soujanya</forename><surname>Lanka</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the GermEval 2017 - Shared Task on Aspect-based Sentiment in Social Media Customer Feedback</title>
				<meeting>the GermEval 2017 - Shared Task on Aspect-based Sentiment in Social Media Customer Feedback<address><addrLine>Berlin, Germany</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Asynchronous Pipeline for Processing Huge Corpora on Medium to Low Resource Infrastructures</title>
		<author>
			<persName><forename type="first">Pedro</forename><forename type="middle">Javier</forename><surname>Ortiz Suárez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Benoît</forename><surname>Sagot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Laurent</forename><surname>Romary</surname></persName>
		</author>
		<idno type="DOI">10.14618/IDS-PUB-9021</idno>
	</analytic>
	<monogr>
		<title level="m">7th Workshop on the Challenges in the Management of Large Corpora (CMLC-7)</title>
				<meeting><address><addrLine>Cardiff, United Kingdom</addrLine></address></meeting>
		<imprint>
			<publisher>Leibniz-Institut für Deutsche Sprache</publisher>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Towards an Open Platform for Legal Information</title>
		<author>
			<persName><forename type="first">Malte</forename><surname>Ostendorff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Till</forename><surname>Blume</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Saskia</forename><surname>Ostendorff</surname></persName>
		</author>
		<idno type="DOI">10.1145/3383583.3398616</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, JCDL &apos;20</title>
				<meeting>the ACM/IEEE Joint Conference on Digital Libraries in 2020, JCDL &apos;20<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="385" to="388" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Glove: Global vectors for word representation</title>
		<author>
			<persName><forename type="first">Jeffrey</forename><surname>Pennington</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Richard</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christopher</forename><surname>Manning</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</title>
				<meeting>the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1532" to="1543" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<monogr>
		<author>
			<persName><forename type="first">Matthew</forename><forename type="middle">E</forename><surname>Peters</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mark</forename><surname>Neumann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mohit</forename><surname>Iyyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Matt</forename><surname>Gardner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christopher</forename><surname>Clark</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kenton</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Luke</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1802.05365</idno>
		<title level="m">Deep contextualized word representations</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Semeval-2016 task 5: Aspect based sentiment analysis</title>
		<author>
			<persName><forename type="first">Maria</forename><surname>Pontiki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dimitris</forename><surname>Galanis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Haris</forename><surname>Papageorgiou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ion</forename><surname>Androutsopoulos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Suresh</forename><surname>Manandhar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mohammad</forename><surname>Al-Smadi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mahmoud</forename><surname>Al-Ayyoub</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yanyan</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Bing</forename><surname>Qin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Orphee</forename><surname>De Clercq</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Veronique</forename><surname>Hoste</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Marianna</forename><surname>Apidianaki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Xavier</forename><surname>Tannier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Natalia</forename><surname>Loukachevitch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Evgeny</forename><surname>Kotelnikov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Nuria</forename><surname>Bel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Salud</forename><forename type="middle">María</forename><surname>Jiménez-Zafra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gülşen</forename><surname>Eryiğit</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/S16-1002</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)</title>
				<meeting>the 10th International Workshop on Semantic Evaluation (SemEval-2016)</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="19" to="30" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">SemEval-2015 task 12: Aspect based sentiment analysis</title>
		<author>
			<persName><forename type="first">Maria</forename><surname>Pontiki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dimitris</forename><surname>Galanis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Haris</forename><surname>Papageorgiou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Suresh</forename><surname>Manandhar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ion</forename><surname>Androutsopoulos</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/S15-2082</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)</title>
				<meeting>the 9th International Workshop on Semantic Evaluation (SemEval 2015)<address><addrLine>Denver, Colorado</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="486" to="495" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">SemEval-2014 task 4: Aspect based sentiment analysis</title>
		<author>
			<persName><forename type="first">Maria</forename><surname>Pontiki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dimitris</forename><surname>Galanis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">John</forename><surname>Pavlopoulos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Harris</forename><surname>Papageorgiou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ion</forename><surname>Androutsopoulos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Suresh</forename><surname>Manandhar</surname></persName>
		</author>
		<idno type="DOI">10.3115/v1/S14-2004</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)</title>
		<meeting>the 8th International Workshop on Semantic Evaluation (SemEval 2014)<address><addrLine>Dublin, Ireland</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="27" to="35" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Adapt or get left behind: Domain adaptation through BERT language model finetuning for aspect-target sentiment classification</title>
		<author>
			<persName><forename type="first">Alexander</forename><surname>Rietzler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sebastian</forename><surname>Stabinger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Paul</forename><surname>Opitz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Stefan</forename><surname>Engl</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 12th Language Resources and Evaluation Conference</title>
				<meeting>the 12th Language Resources and Evaluation Conference<address><addrLine>Marseille, France</addrLine></address></meeting>
		<imprint>
			<publisher>European Language Resources Association</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="4933" to="4941" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">Is Multilingual BERT Fluent in Language Generation?</title>
		<author>
			<persName><forename type="first">Samuel</forename><surname>Rönnqvist</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jenna</forename><surname>Kanerva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tapio</forename><surname>Salakoski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Filip</forename><surname>Ginter</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the First NLPL Workshop on Deep Learning for Natural Language Processing</title>
				<meeting>the First NLPL Workshop on Deep Learning for Natural Language Processing<address><addrLine>Turku, Finland</addrLine></address></meeting>
		<imprint>
			<publisher>Linköping University Electronic Press</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="29" to="36" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title level="a" type="main">LT-ABSA: An Extensible Open-Source System for Document-Level and Aspect-Based Sentiment Analysis</title>
		<author>
			<persName><forename type="first">Eugen</forename><surname>Ruppert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Abhishek</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Chris</forename><surname>Biemann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the GermEval 2017 - Shared Task on Aspect-based Sentiment in Social Media Customer Feedback</title>
				<meeting>the GermEval 2017 - Shared Task on Aspect-based Sentiment in Social Media Customer Feedback<address><addrLine>Berlin, Germany</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<monogr>
		<title level="m" type="main">DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter</title>
		<author>
			<persName><forename type="first">Victor</forename><surname>Sanh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lysandre</forename><surname>Debut</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Julien</forename><surname>Chaumond</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Thomas</forename><surname>Wolf</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1910.01108</idno>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b34">
	<analytic>
		<title level="a" type="main">IDS-IUCL: Investigating Feature Selection and Oversampling for GermEval</title>
		<author>
			<persName><forename type="first">Zeeshan</forename><surname>Ali Sayyed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Daniel</forename><surname>Dakota</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sandra</forename><surname>Kübler</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the GermEval 2017 - Shared Task on Aspect-based Sentiment in Social Media Customer Feedback</title>
				<meeting>the GermEval 2017 - Shared Task on Aspect-based Sentiment in Social Media Customer Feedback<address><addrLine>Berlin, Germany</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b35">
	<analytic>
		<title level="a" type="main">Joint aspect and polarity classification for aspect-based sentiment analysis with end-to-end neural networks</title>
		<author>
			<persName><forename type="first">Martin</forename><surname>Schmitt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Simon</forename><surname>Steinheber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Konrad</forename><surname>Schreiber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Benjamin</forename><surname>Roth</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/D18-1139</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing</title>
				<meeting>the 2018 Conference on Empirical Methods in Natural Language Processing<address><addrLine>Brussels, Belgium</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="1109" to="1114" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b36">
	<analytic>
		<title level="a" type="main">PotTS at GermEval-2017 Task B: Document-Level Polarity Detection Using Hand-Crafted SVM and Deep Bidirectional LSTM Network</title>
		<author>
			<persName><forename type="first">Uladzimir</forename><surname>Sidarenka</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the GermEval 2017 -Shared Task on Aspect-based Sentiment in Social Media Customer Feedback</title>
				<meeting>the GermEval 2017 -Shared Task on Aspect-based Sentiment in Social Media Customer Feedback<address><addrLine>Berlin, Germany</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b37">
	<analytic>
		<title level="a" type="main">Billions of Parallel Words for Free: Building and Using the EU Bookshop Corpus</title>
		<author>
			<persName><forename type="first">Raivis</forename><surname>Skadiņš</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jörg</forename><surname>Tiedemann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Roberts</forename><surname>Rozis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Daiga</forename><surname>Deksne</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014)</title>
				<meeting>the 9th International Conference on Language Resources and Evaluation (LREC 2014)<address><addrLine>Reykjavik, Iceland</addrLine></address></meeting>
		<imprint>
			<publisher>ELRA</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1850" to="1855" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b38">
	<monogr>
		<title level="m" type="main">Attentional encoder network for targeted sentiment classification</title>
		<author>
			<persName><forename type="first">Youwei</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jiahai</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tao</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Zhiyue</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yanghui</forename><surname>Rao</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1902.09314</idno>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b39">
	<analytic>
		<title level="a" type="main">Utilizing BERT for aspect-based sentiment analysis via constructing auxiliary sentence</title>
		<author>
			<persName><forename type="first">Chi</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Luyao</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Xipeng</forename><surname>Qiu</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/N19-1035</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
		<title level="s">Long and Short Papers</title>
		<meeting>the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies<address><addrLine>Minneapolis, Minnesota</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="380" to="385" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b40">
	<monogr>
		<title level="m" type="main">Aspect level sentiment classification with deep memory network</title>
		<author>
			<persName><forename type="first">Duyu</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Bing</forename><surname>Qin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ting</forename><surname>Liu</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1605.08900</idno>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b41">
	<analytic>
		<title level="a" type="main">Toward multi-label sentiment analysis: a transfer learning based approach</title>
		<author>
			<persName><forename type="first">Jie</forename><surname>Tao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Xing</forename><surname>Fang</surname></persName>
		</author>
		<idno type="DOI">10.1186/s40537-019-0278-0</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Big Data</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page">1</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b42">
	<analytic>
		<title level="a" type="main">Attention Is All You Need</title>
		<author>
			<persName><forename type="first">Ashish</forename><surname>Vaswani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Noam</forename><surname>Shazeer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Niki</forename><surname>Parmar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jakob</forename><surname>Uszkoreit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Llion</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Aidan</forename><forename type="middle">N</forename><surname>Gomez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lukasz</forename><surname>Kaiser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Illia</forename><surname>Polosukhin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">31st Conference on Neural Information Processing Systems (NIPS 2017)</title>
				<meeting><address><addrLine>Long Beach, California, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b43">
	<analytic>
		<title level="a" type="main">SuperGLUE: A stickier benchmark for general-purpose language understanding systems</title>
		<author>
			<persName><forename type="first">Alex</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yada</forename><surname>Pruksachatkun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Nikita</forename><surname>Nangia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Amanpreet</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Julian</forename><surname>Michael</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Felix</forename><surname>Hill</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Omer</forename><surname>Levy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Samuel</forename><surname>Bowman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in neural information processing systems</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="3266" to="3280" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b44">
	<monogr>
		<title level="m" type="main">GLUE: A multi-task benchmark and analysis platform for natural language understanding</title>
		<author>
			<persName><forename type="first">Alex</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Amanpreet</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Julian</forename><surname>Michael</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Felix</forename><surname>Hill</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Omer</forename><surname>Levy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Samuel</forename><forename type="middle">R</forename><surname>Bowman</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1804.07461</idno>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b45">
	<analytic>
		<title level="a" type="main">Attention-based LSTM for aspect-level sentiment classification</title>
		<author>
			<persName><forename type="first">Yequan</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Minlie</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Xiaoyan</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Li</forename><surname>Zhao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2016 conference on empirical methods in natural language processing</title>
		<meeting>the 2016 conference on empirical methods in natural language processing</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="606" to="615" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b46">
	<analytic>
		<title level="a" type="main">GermEval 2017: Shared Task on Aspect-based Sentiment in Social Media Customer Feedback</title>
		<author>
			<persName><forename type="first">Michael</forename><surname>Wojatzki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Eugen</forename><surname>Ruppert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sarah</forename><surname>Holschneider</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Torsten</forename><surname>Zesch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Chris</forename><surname>Biemann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the GermEval 2017 Shared Task on Aspect-based Sentiment in Social Media Customer Feedback</title>
		<meeting>the GermEval 2017 Shared Task on Aspect-based Sentiment in Social Media Customer Feedback<address><addrLine>Berlin, Germany</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="1" to="12" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b47">
	<analytic>
		<title level="a" type="main">Transformers: State-of-the-Art Natural Language Processing</title>
		<author>
			<persName><forename type="first">Thomas</forename><surname>Wolf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lysandre</forename><surname>Debut</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Victor</forename><surname>Sanh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Julien</forename><surname>Chaumond</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Clement</forename><surname>Delangue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Anthony</forename><surname>Moi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Pierric</forename><surname>Cistac</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tim</forename><surname>Rault</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Rémi</forename><surname>Louf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Morgan</forename><surname>Funtowicz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Joe</forename><surname>Davison</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sam</forename><surname>Shleifer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Patrick</forename><surname>von Platen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Clara</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yacine</forename><surname>Jernite</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Julien</forename><surname>Plu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Canwen</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Teven</forename><surname>Le Scao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sylvain</forename><surname>Gugger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mariama</forename><surname>Drame</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Quentin</forename><surname>Lhoest</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alexander</forename><forename type="middle">M</forename><surname>Rush</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.emnlp-demos.6</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations</title>
		<meeting>the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations</meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="38" to="45" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b48">
	<monogr>
		<title level="m" type="main">Context-guided BERT for targeted aspect-based sentiment analysis</title>
		<author>
			<persName><forename type="first">Zhengxuan</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Desmond</forename><forename type="middle">C</forename><surname>Ong</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2010.07523</idno>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b49">
	<monogr>
		<title level="m" type="main">BERT post-training for review reading comprehension and aspect-based sentiment analysis</title>
		<author>
			<persName><forename type="first">Hu</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Bing</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lei</forename><surname>Shu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Philip</forename><forename type="middle">S</forename><surname>Yu</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b50">
	<monogr>
		<author>
			<persName><forename type="first">Heng</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Biqing</forename><surname>Zeng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jianhao</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Youwei</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ruyang</forename><surname>Xu</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1912.07976</idno>
		<title level="m">A multi-task learning model for Chinese-oriented aspect polarity classification and aspect term extraction</title>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
