<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Clinical NER using Spanish BERT Embeddings</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Ramya</forename><surname>Vunikili</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Digital Technology &amp; Innovation</orgName>
								<orgName type="institution">Siemens Healthineers</orgName>
								<address>
									<region>NJ</region>
									<country key="US">USA</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Digital Technology &amp; Innovation</orgName>
								<orgName type="institution">Siemens Healthineers</orgName>
								<address>
									<settlement>Bangalore</settlement>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Vasile</forename><forename type="middle">George</forename><surname>Marica</surname></persName>
							<affiliation key="aff2">
								<orgName type="institution">Siemens</orgName>
								<address>
									<settlement>Brasov</settlement>
									<country key="RO">Romania</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Oladimeji</forename><surname>Farri</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Digital Technology &amp; Innovation</orgName>
								<orgName type="institution">Siemens Healthineers</orgName>
								<address>
									<region>NJ</region>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Clinical NER using Spanish BERT Embeddings</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">518A9CA76B05B01A70C436769E42752F</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T04:20+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Bidirectional Encoder Representations</term>
					<term>BERT</term>
					<term>NER</term>
					<term>IberLEF 2020</term>
					<term>Spanish embeddings</term>
					<term>BETO</term>
					<term>CANTEMIST</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper presents an overview of a transfer learning-based approach to the Named Entity Recognition (NER) sub-task of the Cancer Text Mining Shared Task (CANTEMIST), conducted as part of the Iberian Languages Evaluation Forum (IberLEF) 2020. We explore the use of contextual embeddings from Bidirectional Encoder Representations from Transformers (BERT), trained on general-domain Spanish text, to extract tumor morphology mentions from clinical reports written in Spanish. We achieve an F1 score of 73.4% on NER without using any feature-engineered or rule-based approaches, and present our work as inspiration for further research on this task.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>There is significant demand for automated analysis of electronic health record (EHR) documents to support clinical decision making and precision medicine. This is particularly true for documents written in Spanish, since nearly 10K such documents are generated every 10 minutes in Spanish-speaking regions <ref type="bibr" target="#b0">[1]</ref>.</p><p>According to the World Health Organisation (WHO), cancer was the second leading cause of death in 2018<ref type="foot">1</ref>. Applying Natural Language Processing (NLP) techniques to cancer-related EHR documents can not only expedite decision making but also improve the quality of patient care by surfacing information embedded in the clinical narrative. CANTEMIST <ref type="bibr" target="#b0">[1]</ref> therefore focuses on the automatic detection of tumor morphology mentions through three independent sub-tasks. We focus on the first sub-task, NER, and explore contextual embeddings for it.</p><p>Contextualized language models rely heavily on large data sets to learn the deep embedding patterns that capture semantic meaning. As clinical text about cancer is scarce, we chose to apply transfer learning using BETO <ref type="bibr" target="#b2">[3]</ref>, a BERT <ref type="bibr" target="#b1">[2]</ref> model pre-trained on general-domain Spanish text. Table <ref type="table" target="#tab_0">1</ref> compares the training corpus used for BETO with the CANTEMIST dataset.</p><p>BETO replicates the architecture of BERT, whose contextualized embeddings build on the Transformer <ref type="bibr" target="#b3">[4]</ref>, and is enhanced through training techniques such as dynamic masking <ref type="bibr" target="#b4">[5]</ref> and whole-word masking. As an example, Figure <ref type="figure" target="#fig_0">1</ref> shows the embedding of a Spanish sentence from the CANTEMIST corpus.</p><p>Since BETO outperformed multilingual BERT (M-BERT) <ref type="bibr" target="#b1">[2]</ref> on seven of the eight NLP tasks evaluated in <ref type="bibr" target="#b2">[3]</ref>, we chose BETO as the base model for the CANTEMIST NER task.</p><note place="foot">Disclaimer: The concepts and information presented in this paper are based on research results that are not commercially available. Email: ramya.vunikili@siemens-healthineers.com (R. Vunikili); supriya.hn@siemens-healthineers.com (S.H. N); george.marica@siemens.com (V.G. Marica); oladimeji.farri@siemens-healthineers.com (O. Farri). ORCID: 0000-0003-4629-3307 (R. Vunikili).</note></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>Contextualized language models have improved performance on a wide range of NLP tasks while relying on a common deep network architecture. These models are often pre-trained on a single large corpus of general-domain, often multilingual, text and subsequently fine-tuned on specific data sets through transfer learning.</p><p>One important reference in this field is the BERT language representation model, which serves as the basis for many zero-shot cross-lingual transfer approaches. Trained on the Wikipedias of the top 104 languages, multilingual BERT has proven competitive on many NLP tasks <ref type="bibr" target="#b5">[6]</ref>. Despite not benefiting from explicit cross-lingual alignment, M-BERT outperforms models based on cross-lingual embeddings <ref type="bibr" target="#b6">[7]</ref>.</p><p>The adaptability of M-BERT to various NLP tasks has been investigated and explained through the overlap of word-pieces across languages: shared pieces such as common nouns, word roots, numbers, and URLs are mapped to a shared embedding space, which in turn pulls co-occurring pieces into that space as well <ref type="bibr" target="#b7">[8]</ref>. Another study on the cross-lingual ability of BERT concludes that performance is relatively insensitive to word-piece overlap and to the complexity of multi-head attention <ref type="bibr" target="#b8">[9]</ref>, and suggests that this versatility stems instead from network depth and from structural and semantic similarity between languages.</p><p>Starting from the hypothesis that different languages share a common structural core to which M-BERT adapts during training, <ref type="bibr" target="#b9">[10]</ref> split an M-BERT sentence representation into a language-neutral (language-agnostic) component and a language-specific component. Through a series of tasks covering language identification, language similarity, parallel sentence retrieval and word alignment, this study concludes that the shared cross-lingual representations are not language-neutral enough to mirror similar semantic structures. Consequently, multilingual embeddings alone are not good enough to solve difficult NLP tasks after zero-shot transfer.</p><p>In the same vein, an extensive study <ref type="bibr" target="#b10">[11]</ref> of the internal structure of M-BERT applied canonical correlation analysis (CCA) <ref type="bibr" target="#b11">[12]</ref> to compare corresponding representations across multiple languages. Examining the similarity of deep-layer representations revealed a divergence pattern: rather than simply mapping different languages into the same space, M-BERT reflected "linguistic and evolutionary relationships". Embedding similarity was observed mostly with word-piece tokenization rather than with word or character tokenization, and Romance and Germanic languages were clustered into different branches.</p><p>A more targeted approach to transfer learning would be to exploit language families, where word-piece overlap and similar grammatical structure preserve the compact nature of a semantic representation. English-to-Spanish transfer learning has been shown to improve POS tagging when labeled data is scarce <ref type="bibr" target="#b12">[13]</ref>, and to improve NER on proper nouns or niche concepts <ref type="bibr" target="#b13">[14]</ref>. When large quantities of data are available for individual languages, it is advisable to combine language-specific word representations with language-family models <ref type="bibr" target="#b14">[15]</ref>.</p><p>Considering these findings, we believe that multilingual contextualized embeddings are not optimal for NLP tasks where either the word-piece overlap or the semantic-structure similarity between the pre-training corpus and the task corpus is low. We therefore searched for a pre-trained BERT model whose training data closely matches the CANTEMIST data set. Ideally, such a model would have been pre-trained on Spanish EHR documents (labelled and/or unlabelled). Instead, we explored the performance of a model pre-trained on general-domain Spanish text and then fine-tuned, as the results can provide additional evidence for the hypothesis that linguistic and evolutionary relationships can be learned in one domain and transferred to another.</p></div>
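<div xmlns="http://www.tei-c.org/ns/1.0"><p>For readers unfamiliar with canonical correlation analysis <ref type="bibr" target="#b11">[12]</ref>, the classical formulation is summarised below. The specific CCA variant used in <ref type="bibr" target="#b10">[11]</ref> is not described here, so this is only the textbook definition. Given two sets of representations $X$ and $Y$ computed for the same $n$ inputs (for example, activations of a layer for a sentence and for its translation), CCA seeks projection vectors $\mathbf{a}$ and $\mathbf{b}$ that maximise the correlation between $X\mathbf{a}$ and $Y\mathbf{b}$:</p>
<p>$$\rho_1 = \max_{\mathbf{a},\,\mathbf{b}} \frac{\mathbf{a}^\top \Sigma_{XY}\, \mathbf{b}}{\sqrt{\mathbf{a}^\top \Sigma_{XX}\, \mathbf{a}}\;\sqrt{\mathbf{b}^\top \Sigma_{YY}\, \mathbf{b}}}$$</p>
<p>where $\Sigma_{XX}$ and $\Sigma_{YY}$ are the covariance matrices of $X$ and $Y$, and $\Sigma_{XY}$ is their cross-covariance. Further canonical correlations $\rho_2, \rho_3, \dots$ are obtained by repeating the maximisation with the constraint that each new pair of projections is uncorrelated with the earlier ones; a common single-number summary of how similar two representations are is the mean of the $\rho_i$.</p></div>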
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Dataset and Experiments</head><p>Our task is the automatic named entity recognition of tumor morphology mentions in plain-text medical documents.</p><p>The CANTEMIST dataset contains 6,933 de-identified clinical documents annotated with the BRAT tool <ref type="bibr" target="#b15">[16]</ref> for mentions related to tumor morphology, denoted by the entity type MORPHOLOGIA_NEOPLASIA. The annotations follow well-established guidelines published by the Spanish Ministry of Health and were made by clinical coding experts according to eCIE-O-3.1 codes<ref type="foot" target="#foot_0">2</ref>, with multiple iterations of quality control and annotation-consistency checks. The selected reports faithfully reflect the narrative of electronic clinical reports. Table <ref type="table" target="#tab_1">2</ref> summarises the train, development and test splits along with the average number of tokens per report in each split.</p><p>As a pre-processing step, all reports are lower-cased and split into sentences or sections so that each sequence contains at most 512 tokens. Each sentence is further broken down into word-level tokens, preserving the start and end offsets of every token with respect to the original report. These tokens are then labelled in the BILOU scheme (Begin, Inside, Last, Outside, Unit) and used to fine-tune the BERT model on the CANTEMIST dataset. At prediction time, all tokens are given the placeholder tag O, since the ground truth is not provided. The output of the BERT model is then collected and post-processed into BRAT format. Figure <ref type="figure" target="#fig_1">2</ref> shows an overview of the prediction pipeline. 
The BERT model is fine-tuned with the AllenNLP platform <ref type="bibr" target="#b16">[17]</ref> on an NVIDIA Tesla V100 (32GB) GPU for 40 epochs, on the shuffled union of the train, dev1 and dev2 sets. Predictions are made on both the test and background sets. The hyper-parameters of the best model are summarised in Table <ref type="table" target="#tab_2">3</ref>.</p></div>
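<div xmlns="http://www.tei-c.org/ns/1.0"><p>The pre-processing described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the regular-expression word splitter, the (start, end, label) span format and the helper names word_tokens and bilou_tags are assumptions made for the example, and the sentence and mention offsets are invented. The property it demonstrates is that lower-casing is applied to the token text only, so each token keeps its character offsets into the original report, and BILOU tags are derived by matching those offsets against the annotated mention spans.</p>
<p>
```python
import re
from typing import List, Tuple

# A token is (text, start, end); offsets index into the ORIGINAL report.
Token = Tuple[str, int, int]
# A gold mention is (start, end, label), e.g. as read from a BRAT .ann line.
Span = Tuple[int, int, str]


def word_tokens(text: str) -> List[Token]:
    """Split a report into word-level tokens, keeping character offsets.

    Runs of word characters form one token; any other non-space character
    (punctuation) is a token of its own. Only the token text is lower-cased,
    so the offsets still point into the original, unmodified report.
    """
    return [(m.group().lower(), m.start(), m.end())
            for m in re.finditer(r"\w+|[^\w\s]", text)]


def bilou_tags(tokens: List[Token], spans: List[Span]) -> List[str]:
    """Tag tokens with BILOU labels (Begin, Inside, Last, Outside, Unit)."""
    tags = ["O"] * len(tokens)
    for span_start, span_end, label in spans:
        # Indices of the tokens that lie entirely inside this mention.
        covered = [i for i, (_, start, end) in enumerate(tokens)
                   if start >= span_start and span_end >= end]
        if not covered:
            continue  # mention does not contain any whole token
        if len(covered) == 1:
            tags[covered[0]] = "U-" + label
            continue
        tags[covered[0]] = "B-" + label
        for i in covered[1:-1]:
            tags[i] = "I-" + label
        tags[covered[-1]] = "L-" + label
    return tags


# Hypothetical example sentence and mention offsets, for illustration only.
report = "Carcinoma ductal infiltrante de mama."
tokens = word_tokens(report)
tags = bilou_tags(tokens, [(0, 28, "MORPHOLOGIA_NEOPLASIA")])
for (text, start, end), tag in zip(tokens, tags):
    print(f"{text}\t{start}\t{end}\t{tag}")
```
</p><p>Tokens only partially covered by a mention are left as O in this sketch; a real pipeline needs an explicit policy for mention boundaries that fall inside a token.</p></div>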
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Results</head><p>Table <ref type="table" target="#tab_3">4</ref> summarises the results obtained on the test set using the official CANTEMIST evaluation library<ref type="foot" target="#foot_1">3</ref>, and Figure <ref type="figure" target="#fig_2">3</ref> presents excerpts from two reports along with the entities predicted by the BERT model. To explain the lower precision (72.7% versus 74.1% recall), it is worth examining the overlap between the BETO and CANTEMIST vocabularies. The two vocabularies overlap by 24%, as shown in Figure <ref type="figure" target="#fig_3">4</ref>. The majority of the shared entries are suffix pieces such as '##s', '##l', '##al', '##a' and '##op', which carry little or no information specific to the medical domain. Consequently, the model struggled to distinguish between words such as mycoplasma (a genus of bacteria) and neoplasm (an abnormal growth of cells), and labelled the former as a tumor-related entity. To avoid such issues, frequently occurring cancer-related terms could be added to the unused tokens of the BETO vocabulary, so that the model can learn a distinct embedding for each such term irrespective of its suffix.</p></div>
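<div xmlns="http://www.tei-c.org/ns/1.0"><p>The suffix-sharing problem and the proposed fix can be illustrated with a toy greedy longest-match-first WordPiece tokenizer, the segmentation strategy used by BERT-style models. Everything in this sketch is hypothetical: the eight-entry vocabulary is invented, the Spanish words micoplasma and neoplasma stand in for mycoplasma and neoplasm, and the placeholder naming [unused0], [unused1], ... follows the original BERT vocabularies and may differ in BETO. It shows both words sharing the piece ##plasma before augmentation, and each becoming a single token once it occupies an unused slot.</p>
<p>
```python
from typing import List, Set


def wordpiece(word: str, vocab: Set[str], unk: str = "[UNK]") -> List[str]:
    """Greedy longest-match-first WordPiece segmentation of one word.

    Non-initial pieces carry the "##" continuation prefix. If some position
    cannot be matched at all, the whole word maps to the unknown token.
    """
    pieces: List[str] = []
    start = 0
    while len(word) > start:
        end = len(word)
        match = None
        while end > start:  # try the longest remaining substring first
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]
        pieces.append(match)
        start = end
    return pieces


def fill_unused_slots(vocab: List[str], new_terms: List[str]) -> List[str]:
    """Overwrite placeholder rows (ASSUMED to be named "[unused...]") with new terms.

    The vocabulary length, and hence the embedding matrix shape, is unchanged.
    Terms already present in the vocabulary are skipped.
    """
    existing = set(vocab)
    pending = [t for t in new_terms if t not in existing]
    result = []
    for entry in vocab:
        if pending and entry.startswith("[unused"):
            result.append(pending.pop(0))
        else:
            result.append(entry)
    return result


# Invented toy vocabulary; BETO's real vocabulary and segmentations differ.
toy_vocab = ["[PAD]", "[UNK]", "[unused0]", "[unused1]",
             "mico", "neo", "##plasma", "##s"]
print(wordpiece("micoplasma", set(toy_vocab)))  # ['mico', '##plasma']
print(wordpiece("neoplasma", set(toy_vocab)))   # ['neo', '##plasma']

augmented = fill_unused_slots(toy_vocab, ["micoplasma", "neoplasma"])
print(wordpiece("micoplasma", set(augmented)))  # ['micoplasma']
print(wordpiece("neoplasma", set(augmented)))   # ['neoplasma']
```
</p><p>Reusing placeholder rows rather than appending new ones keeps the embedding matrix of the pre-trained checkpoint the same shape; the embeddings of the reassigned rows still have to be learned during fine-tuning.</p></div>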
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Future Work</head><p>As Spanish and English are syntactically similar, architectures that work well for English may also transfer to Spanish. One such model, based on BERT and dynamic span graphs, is DyGIE++ <ref type="bibr" target="#b17">[18]</ref>. As a next step, we plan to apply this architecture to CANTEMIST using BETO embeddings.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: BETO embedding representation for the sentence: la broncoscopia no mostraba lesiones endobronquiales (the bronchoscopy showed no endobronchial lesions).</figDesc><graphic coords="2,85.57,70.16,424.13,89.84" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Overview of the prediction pipeline.</figDesc><graphic coords="4,85.57,197.15,424.13,220.49" type="bitmap" /></figure>
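<div xmlns="http://www.tei-c.org/ns/1.0"><p>The post-processing step of the prediction pipeline in Figure 2, which turns BILOU predictions back into BRAT annotations, can be sketched as below. The decoder is a hypothetical lenient one (the paper does not say how ill-formed tag sequences are handled), the helper names are illustrative, and the output follows the standard BRAT standoff layout for text-bound annotations: an ID such as T1, a tab, the label with start and end offsets, a tab, and the mention text. Because the tokens carry offsets into the original report, the mention text is copied with its original casing even though the model saw lower-cased input.</p>
<p>
```python
from typing import List, Optional, Tuple

Token = Tuple[str, int, int]   # (text, start, end) offsets into the report
Span = Tuple[int, int, str]    # (start, end, label)


def bilou_to_spans(tokens: List[Token], tags: List[str]) -> List[Span]:
    """Decode BILOU tags back into character-level mention spans.

    Lenient by design: an O, B- or U- tag, or a label change, closes any
    entity still open, and an entity left open at the end is closed too.
    """
    spans: List[Span] = []
    current: Optional[list] = None  # [start, end, label] being built
    for (_, start, end), tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if current is not None and (prefix in ("O", "B", "U") or label != current[2]):
            spans.append((current[0], current[1], current[2]))
            current = None
        if prefix == "U":
            spans.append((start, end, label))
        elif prefix in ("B", "I", "L"):
            if current is None:
                current = [start, end, label]
            else:
                current[1] = end  # extend the open entity to this token
            if prefix == "L":
                spans.append((current[0], current[1], current[2]))
                current = None
    if current is not None:
        spans.append((current[0], current[1], current[2]))
    return spans


def to_brat(report: str, spans: List[Span]) -> str:
    """Render spans as BRAT text-bound lines: ID, label start end, mention."""
    lines = [f"T{n}\t{label} {start} {end}\t{report[start:end]}"
             for n, (start, end, label) in enumerate(spans, start=1)]
    return "\n".join(lines)


# Hypothetical model output for an invented example report.
report = "Carcinoma ductal infiltrante de mama."
tokens = [("carcinoma", 0, 9), ("ductal", 10, 16), ("infiltrante", 17, 28),
          ("de", 29, 31), ("mama", 32, 36), (".", 36, 37)]
tags = ["B-MORPHOLOGIA_NEOPLASIA", "I-MORPHOLOGIA_NEOPLASIA",
        "L-MORPHOLOGIA_NEOPLASIA", "O", "O", "O"]
print(to_brat(report, bilou_to_spans(tokens, tags)))
```
</p></div>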
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Excerpts from two reports along with the named entities predicted by BERT. Green marks correctly identified mentions and their spans. Yellow marks mentions annotated as a single entity but identified by the model as separate entities. Red marks mentions predicted by the model that are not present in the ground truth.</figDesc><graphic coords="5,85.57,285.06,424.14,234.20" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: BERT and BETO vocabulary overlap</figDesc><graphic coords="6,210.04,70.16,175.20,115.68" type="bitmap" /></figure>
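<div xmlns="http://www.tei-c.org/ns/1.0"><p>The paper does not state how the 24% vocabulary overlap is normalised, so the hypothetical helper vocab_overlap below reports several common definitions side by side: the share of each vocabulary found in the other, the Jaccard index, and the fraction of shared entries that are ##-prefixed suffix pieces, which is the quantity the Results discussion points to. The toy token lists are invented for illustration and are not the real BETO or CANTEMIST vocabularies.</p>
<p>
```python
from typing import Dict, Iterable


def vocab_overlap(vocab_a: Iterable[str], vocab_b: Iterable[str]) -> Dict[str, float]:
    """Overlap statistics between two vocabularies under several definitions."""
    a, b = set(vocab_a), set(vocab_b)
    shared = a.intersection(b)
    n_suffix = sum(1 for tok in shared if tok.startswith("##"))
    return {
        "shared": len(shared),
        "share_of_a": len(shared) / len(a),        # |A ∩ B| / |A|
        "share_of_b": len(shared) / len(b),        # |A ∩ B| / |B|
        "jaccard": len(shared) / len(a.union(b)),  # |A ∩ B| / |A ∪ B|
        # fraction of shared entries that are "##" continuation (suffix) pieces
        "suffix_fraction": n_suffix / len(shared) if shared else 0.0,
    }


# Invented toy token lists; not the real BETO or CANTEMIST vocabularies.
beto_like = ["la", "de", "##s", "##al", "tumor", "plasma"]
task_like = ["tumor", "##s", "##al", "carcinoma", "neoplasia",
             "metastasis", "##op", "ganglio"]
stats = vocab_overlap(beto_like, task_like)
for key, value in stats.items():
    print(f"{key}: {value:.3f}")
```
</p><p>Which ratio is appropriate depends on the question: the share of the task vocabulary covered by the pre-trained vocabulary is usually the most informative for transfer, while the Jaccard index is symmetric.</p></div>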
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>BETO vs CANTEMIST corpus comparison</figDesc><table><row><cell>Criterion</cell><cell>BETO</cell><cell>CANTEMIST</cell></row><row><cell>Training corpus</cell><cell>ES Wiki; OPUS</cell><cell>-</cell></row><row><cell>Total number of tokens</cell><cell>3 billion</cell><cell>1.15 million</cell></row><row><cell>Unique tokens</cell><cell>31K</cell><cell>10.5K</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Summary of the data splits provided for the CANTEMIST-NER sub-task.</figDesc><table><row><cell>Split</cell><cell>Dataset</cell><cell>Number of reports</cell><cell>Average number of tokens</cell></row><row><cell>Training Set</cell><cell>Train</cell><cell>501</cell><cell>739</cell></row><row><cell>Validation Set</cell><cell>Dev1</cell><cell>250</cell><cell>734</cell></row><row><cell>Validation Set</cell><cell>Dev2</cell><cell>250</cell><cell>585</cell></row><row><cell>Testing Set</cell><cell>Test + Background</cell><cell>300 + 4932</cell><cell>348</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3</head><label>3</label><figDesc>Hyper-parameters of the BERT model</figDesc><table><row><cell>Parameter</cell><cell>Value</cell></row><row><cell>Learning rate</cell><cell>0.001</cell></row><row><cell>Optimizer</cell><cell>Adam</cell></row><row><cell>Maximum Sequence Length</cell><cell>512</cell></row><row><cell>Epochs</cell><cell>40</cell></row></table></figure>
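<div xmlns="http://www.tei-c.org/ns/1.0"><p>Table 3 lists a maximum sequence length of 512. One way to respect that limit while keeping sentences intact is to pack consecutive sentences greedily, as in the hypothetical pack_sentences below; the paper does not specify its exact splitting rule, so this is an assumed strategy. Two caveats apply: BERT's limit is counted in word pieces rather than words, and two positions are taken by the [CLS] and [SEP] special tokens, so in practice the token lists should be word pieces and max_len closer to 510.</p>
<p>
```python
from typing import List


def pack_sentences(sentences: List[List[str]], max_len: int = 512) -> List[List[str]]:
    """Greedily pack consecutive tokenized sentences into segments.

    No segment exceeds max_len tokens. Sentences are kept whole whenever they
    fit; a sentence longer than max_len is hard-split into max_len-sized chunks.
    """
    segments: List[List[str]] = []
    current: List[str] = []
    for sent in sentences:
        chunks = [sent[i:i + max_len] for i in range(0, len(sent), max_len)]
        for chunk in chunks:
            # Close the current segment if adding this chunk would overflow it.
            if current and len(current) + len(chunk) > max_len:
                segments.append(current)
                current = []
            current = current + chunk
    if current:
        segments.append(current)
    return segments


# Toy input: four "sentences" of 300, 300, 100 and 1100 tokens.
sentences = [["a"] * 300, ["b"] * 300, ["c"] * 100, ["d"] * 1100]
segments = pack_sentences(sentences, max_len=512)
print([len(s) for s in segments])  # [300, 400, 512, 512, 76]
```
</p></div>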
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4</head><label>4</label><figDesc>Performance metrics for NER.</figDesc><table><row><cell>Dataset</cell><cell>Precision</cell><cell>Recall</cell><cell>F1 Score</cell></row><row><cell>Test</cell><cell>72.7%</cell><cell>74.1%</cell><cell>73.4%</cell></row></table></figure>
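<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a consistency check, the F1 score in Table 4 follows from the reported precision $P$ and recall $R$ as their harmonic mean:</p>
<p>$$F_1 = \frac{2PR}{P + R} = \frac{2 \times 72.7 \times 74.1}{72.7 + 74.1} = \frac{10774.14}{146.8} \approx 73.39\%$$</p>
<p>which rounds to the reported 73.4%.</p></div>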
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0">https://eciemaps.mscbs.gob.es/ecieMaps/browser/index_o_3.html</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_1">https://github.com/TeMU-BSC/cantemist-evaluation-library</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Named entity recognition, concept normalization and clinical coding: Overview of the CANTEMIST track for cancer text mining in Spanish, corpus, guidelines, methods and results</title>
		<author>
			<persName><forename type="first">A</forename><surname>Miranda-Escalada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Farré</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Krallinger</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020)</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1810.04805</idno>
		<title level="m">BERT: Pre-training of deep bidirectional transformers for language understanding</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Spanish pre-trained BERT model and evaluation data</title>
		<author>
			<persName><forename type="first">J</forename><surname>Cañete</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Chaperon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Fuentes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-H</forename><surname>Ho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Kang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Pérez</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Practical ML for Developing Countries Workshop@ ICLR 2020</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Attention is all you need</title>
		<author>
			<persName><forename type="first">A</forename><surname>Vaswani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shazeer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Parmar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Uszkoreit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">N</forename><surname>Gomez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ł</forename><surname>Kaiser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Polosukhin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in neural information processing systems</title>
				<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="5998" to="6008" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ott</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Du</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Joshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Levy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Stoyanov</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1907.11692</idno>
		<title level="m">RoBERTa: A robustly optimized BERT pretraining approach</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Dredze</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1904.09077</idno>
		<title level="m">Beto, Bentz, Becas: The surprising cross-lingual effectiveness of BERT</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">L</forename><surname>Smith</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">H</forename><surname>Turban</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hamblin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">Y</forename><surname>Hammerla</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1702.03859</idno>
		<title level="m">Offline bilingual word vectors, orthogonal transformations and the inverted softmax</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><surname>Pires</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Schlinger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Garrette</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1906.01502</idno>
		<title level="m">How multilingual is Multilingual BERT?</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Cross-lingual ability of multilingual BERT: An empirical study</title>
		<author>
			<persName><forename type="first">K</forename><surname>Karthikeyan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Mayhew</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Roth</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Learning Representations</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Libovický</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Rosa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Fraser</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1911.03310</idno>
		<title level="m">How language-neutral is Multilingual BERT?</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">BERT is not an interlingua and the bias of tokenization</title>
		<author>
			<persName><forename type="first">J</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Mccann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Xiong</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP</title>
				<meeting>the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019)</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="47" to="55" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Relations between two sets of variates</title>
		<author>
			<persName><forename type="first">H</forename><surname>Hotelling</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Breakthroughs in statistics</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="1992">1992</date>
			<biblScope unit="page" from="162" to="190" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<author>
			<persName><forename type="first">Z</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Salakhutdinov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">W</forename><surname>Cohen</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1703.06345</idno>
		<title level="m">Transfer learning for sequence tagging with hierarchical recurrent networks</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Spanish NER with word representations and conditional random fields</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">L C</forename><surname>Zea</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">E O</forename><surname>Luna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Thorne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Glavaš</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the sixth named entity workshop</title>
				<meeting>the sixth named entity workshop</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="34" to="40" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Cross-lingual transfer learning for POS tagging without cross-lingual resources</title>
		<author>
			<persName><forename type="first">J.-K</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y.-B</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Sarikaya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Fosler-Lussier</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2017 conference on empirical methods in natural language processing</title>
				<meeting>the 2017 conference on empirical methods in natural language processing</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="2832" to="2838" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">brat: a web-based tool for NLP-Assisted Text Annotation</title>
		<author>
			<persName><forename type="first">P</forename><surname>Stenetorp</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Pyysalo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Topić</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Ohta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ananiadou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tsujii</surname></persName>
		</author>
		<ptr target="https://www.aclweb.org/anthology/E12-2021" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics</title>
				<meeting>the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics<address><addrLine>Avignon, France</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="102" to="107" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Gardner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Grus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Neumann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Tafjord</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Dasigi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">F</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Peters</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Schmitz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">S</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1803.07640</idno>
		<title level="m">AllenNLP: A deep semantic natural language processing platform</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Entity, Relation, and Event Extraction with Contextualized Span Representations</title>
		<author>
			<persName><forename type="first">D</forename><surname>Wadden</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Wennberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Luan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Hajishirzi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">EMNLP/IJCNLP</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
