<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">ADOP FERT-Automatic Detection of Occupations and Profession in Medical Texts using Flair and BERT</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Fazlourrahman</forename><surname>Balouchzahi</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Center for Computing Research</orgName>
								<orgName type="institution">Instituto Politécnico Nacional</orgName>
								<address>
									<settlement>CDMX</settlement>
									<country key="MX">Mexico</country>
								</address>
							</affiliation>
						</author>
						<author role="corresp">
							<persName><forename type="first">Grigori</forename><surname>Sidorov</surname></persName>
							<email>sidorov@cic.ipn.mx</email>
							<affiliation key="aff0">
								<orgName type="department">Center for Computing Research</orgName>
								<orgName type="institution">Instituto Politécnico Nacional</orgName>
								<address>
									<settlement>CDMX</settlement>
									<country key="MX">Mexico</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Hosahalli</forename><forename type="middle">Lakshmaiah</forename><surname>Shashirekha</surname></persName>
							<affiliation key="aff1">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">Mangalore University</orgName>
								<address>
									<postCode>574199</postCode>
									<settlement>Mangalore</settlement>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">ADOP FERT-Automatic Detection of Occupations and Profession in Medical Texts using Flair and BERT</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">3673B072B4A3E7CA8DFFB3769F57F83C</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T00:23+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Profession</term>
					<term>Medical Documents</term>
					<term>NER</term>
					<term>BERT</term>
					<term>Flair</term>
					<term>Embeddings</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Technological developments in the healthcare industry are generating large volumes of electronic health records as well as text data, usually referred to as medical text data. Processing medical text data in unstructured form is not only challenging but also has many applications. Named entity recognition, the task of extracting named entities and classifying them into predefined categories, is an important preprocessing step in the NLP pipeline. Extracting named entities from medical text is very useful for many applications and, at the same time, very challenging because of the characteristics of medical text data. Considering the importance of medical text processing, in this paper we (Team MUCIC) describe the models submitted to "MEDical DOcuments PROFessions recognition" (MEDDOPROF), the first shared task of its kind, consisting of three Tracks in Spanish, namely Track 1: MEDDOPROF-NER, Track 2: MEDDOPROF-CLASS, and Track 3: MEDDOPROF-NORM. We participated in Tracks 1 and 2 and proposed two models based on fine-tuning BERT embeddings using i) BertForTokenClassification from the transformers library and ii) the Flair framework, for the automatic detection of occupations and professions in medical text. The model using BertForTokenClassification obtained micro F1 scores of 0.629 and 0.598 for Tracks 1 and 2 respectively, while the Flair framework model obtained micro F1 scores of 0.8 and 0.764. Further, the Flair framework model was among the best models for the MEDDOPROF-NER Track.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Recent advances in medical and healthcare information systems are generating a large amount of Electronic Health Records (EHRs) <ref type="bibr" target="#b0">[1]</ref> as well as text data in the medical domain. Despite the popularity of existing systems for managing EHRs, there is a massive amount of unstructured medical text data that must be transformed into a more structured format for further processing <ref type="bibr" target="#b1">[2]</ref>. Medical text processing, or text analytics, is one of the exciting areas of research in NLP that deals with various applications such as Text Classification (TC) (classification of medical records, classification of medical news articles), Text Summarization (automatic generation of summaries from medical news articles, summarization of clinical information), Hypothesis Generation, Knowledge Discovery, and so on.</p><p>One of the most popular text processing applications is Named Entity Recognition (NER), which is used to automatically recognize and classify Named Entities (NEs) <ref type="bibr" target="#b2">[3]</ref> representing names of persons, organizations, locations, and so on in a given natural language text. NER is a crucial step in the NLP pipeline, as the performance of the NER module decides the performance of subsequent modules <ref type="bibr" target="#b3">[4]</ref>, and NER systems also act as a preprocessing step for tasks like Relation Extraction <ref type="bibr" target="#b4">[5]</ref>. Medical NER, which deals with extracting medical NEs such as disease names, symptoms, medical conditions, medications, medical professions, employment status, etc., from medical texts, is challenging due to specialized terminology, a huge number of alternate spellings, and multi-word NEs. 
Even though a variety of works have explored processing medical texts from diverse aspects, very few works are reported in the literature on processing texts related to medical professions and employment status in general, and in particular on identifying and classifying the NEs describing medical occupations in medical documents.</p><p>To address the challenges of identifying and classifying the NEs describing medical occupations and employment status in Spanish medical documents, in this paper we (Team MUCIC) describe the models submitted to two Tracks of the MEDical Documents PROFessions recognition (MEDDOPROF) <ref type="bibr" target="#b5">[6]</ref> task. MEDDOPROF is the first shared task of its kind and consists of three sub-tracks; the descriptions of Tracks 1 and 2 (the ones in which we participated) are briefly given below:</p><p>-Track 1 -MEDDOPROF-NER: identifying the portions of text that mention an occupation and classifying them into one of three predefined categories, namely: PROFESION (PROFESSION), SITUACION LABORAL (WORKING STATUS) or ACTIVIDAD (ACTIVITY). -Track 2 -MEDDOPROF-CLASS: automatically finding the beginning and end of occupation mentions and classifying them into one of the following categories, namely: PACIENTE (Patient), FAMILIAR (Family member), SANITARIO (Health professional) or OTROS (Other).</p><p>Based on the descriptions of the Tracks and categories, Tracks 1 and 2 can be modeled as an NER task of identifying the NEs (tokens), which could be either single words or multi-word, and then classifying/labeling them into one of the predefined categories according to the Track. Of late, transformer-based models have been achieving state-of-the-art results for many NLP tasks compared to various Machine Learning (ML) and Deep Learning (DL) models. 
To explore transformers <ref type="bibr" target="#b17">[18]</ref>, we propose two models based on fine-tuning Bidirectional Encoder Representations from Transformers (BERT) <ref type="bibr" target="#b6">[7]</ref> embeddings using i) the BertForTokenClassification class from the transformers library and ii) the Flair framework, for the automatic detection of occupations and professions in Spanish medical texts for Tracks 1 and 2 of MEDDOPROF.</p><p>BERT is a language representation model that pre-trains bidirectional representations from text by jointly conditioning on both left and right context. It can also be fine-tuned for downstream tasks such as NER and TC simply by adding a task-specific output layer <ref type="bibr" target="#b7">[8]</ref>. The difference between BERT and Embeddings from Language Models (ELMo) <ref type="bibr" target="#b8">[9]</ref>, which also uses pre-trained language models, is that ELMo uses the language model as additional features, whereas BERT fine-tunes all parameters of the pre-trained language model to make it task-specific for the downstream task <ref type="bibr" target="#b6">[7]</ref>.</p><p>The Flair framework provides a standard model for training along with hyperparameter selection and a unified interface that reduces the complexity of using various embeddings and enables researchers to mix embeddings effectively. It also offers various embeddings that are publicly available on HuggingFace <ref type="bibr" target="#b18">[19]</ref>. In the current work, Flair is used with BERT embeddings <ref type="bibr" target="#b9">[10]</ref>. The Generative Pre-trained Transformer (OpenAI GPT) is another architecture that allows fine-tuning. 
However, it is limited to unidirectional representations, whereas BERT uses bidirectional representations, which effectively overcomes this restriction of the OpenAI GPT architecture <ref type="bibr" target="#b6">[7]</ref>.</p><p>The rest of the paper is organized as follows: Section 2 gives an overview of the works carried out in the related area and Section 3 describes the proposed methodology. Section 4 presents the experiments and results, and Section 5 concludes and outlines future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related Work</head><p>ML classifiers have reported reasonable and competitive performance for various TC applications such as NER, Sentiment Analysis, Opinion Mining, etc. However, these days Neural Network (NN) based systems are commonly used for various TC applications in many domains, including the medical domain. Some recent works in medical text processing are described below:</p><p>Yepes et al. <ref type="bibr" target="#b10">[11]</ref> developed an NN-based system for the identification of medical NEs in Twitter posts. The authors used 148 million collected tweets to generate CBOW word embeddings that are used as weights in model construction. Two LSTMs are used to construct a sequence-to-sequence model, where the first LSTM acts as an encoder that encodes the texts into vectors and the second LSTM serves as the main classification model to classify the tokens. On the Micromed <ref type="bibr" target="#b19">[20]</ref> dataset containing 1,300 tweets, the proposed model obtained F1 scores of 0.665, 0.682, and 0.718 on disease, pharmacological substance, and symptom entities respectively.</p><p>Li et al. <ref type="bibr" target="#b11">[12]</ref> presented an NN-based model for medical NER in Chinese texts. The authors used character-level and word-level embeddings to capture orthographic and lexico-semantic features, along with POS tags as word information features. A Chinese medical corpus containing 12,498 records is used, and 1,739 of them were manually annotated into two categories, namely subject and lesion, where symptoms related to the body are considered the subject and lesion refers to the pathological changes of the subject. The dataset is transformed into the BIESO NE representation, where B, I, E, and O indicate the beginning, inside, end, and outside of an entity respectively, and S indicates that the entity consists of only a single word. 
RNN, LSTM, GRU, BLSTM, and BGRU were experimented with in various configurations and feature combinations. Among them, BGRU using only POS tag features and no embeddings had the best performance, with F1 scores of 90.36% and 90.48% for the subject and lesion detection tasks respectively. Feature engineering is one of the important steps in any NLP task, as it aims to improve the performance of the system. Weegar et al. <ref type="bibr" target="#b12">[13]</ref> explored the impact of simple feature engineering in NER systems for medical texts in three languages, namely English, Swedish, and Spanish. The authors examined some basic features including POS and semantic tags, along with prefixes, window size, and capitalization. The averaged structured perceptron algorithm was used with three datasets: the SemEval-2014 Task 7 Analysis of Clinical Text shared task dataset containing 9,694 disease NEs for English; EHRs consisting of patient records developed by Oronoz et al. <ref type="bibr" target="#b13">[14]</ref>, containing 3,362 disease and 1,406 drug entities, as the Spanish dataset; and a Swedish dataset released by Dalianis et al. <ref type="bibr" target="#b14">[15]</ref> containing 4,000 entities corresponding to body parts, disorders, and findings from over 500 different clinical units at Karolinska University Hospital. The results illustrate that in many cases simple but neglected features can significantly enhance the performance of the systems. Their best performing systems, which obtained F1 scores of 66.40, 68.41, and 68.22 for English, Swedish, and Spanish respectively, used specialized medical dictionaries. Sometimes, instead of working on features and model construction, proposing a new representation for the data might be more efficient. In one such study, Hamada et al. <ref type="bibr" target="#b3">[4]</ref> proposed the FROBES Segment Representation (SR) model, an extension of the IOBES model for NEs that are multi-word in nature. 
In the proposed FROBES model, F, R, O, B, and E represent front, rear, outside, begin, and end respectively, and S represents a single word. FROBES extends IOBES by replacing the tag I with F and R when an entity has more than two words. As it considers both halves of an entity, the first half is annotated with B and F and the second half with R and E. The proposed SR scheme is evaluated using BiLSTM as the baseline model on two datasets, namely the i2b2/VA 2010 challenge dataset and the JNLPBA 2004 shared task dataset, and the results reported by the authors illustrate that using FROBES improved the performance slightly. However, ensembling the baseline models with different SR models, namely IOB2, IOBES, and FROBES, outperformed the baseline models with F1 scores of 71.99 and 83.62 on the same datasets.</p></div>
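As a concrete illustration of these segment representations, the following minimal sketch (with hypothetical tag sequences, not data from the cited papers) rewrites IOB tags into the IOBES scheme discussed above:

```python
def iob_to_iobes(tags):
    """Rewrite IOB tags into IOBES: a single-token entity becomes S-,
    and the last token of a multi-token entity becomes E-."""
    out = []
    for i, tag in enumerate(tags):
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        if tag == "O":
            out.append(tag)
        elif tag.startswith("B-"):
            # B- stays B- only if the entity continues on the next token.
            out.append(("B-" if nxt == "I-" + tag[2:] else "S-") + tag[2:])
        else:  # I- tag
            # I- stays I- only if the entity continues; otherwise it ends here.
            out.append(("I-" if nxt == "I-" + tag[2:] else "E-") + tag[2:])
    return out

print(iob_to_iobes(["B-X", "I-X", "I-X", "O", "B-Y"]))
# → ['B-X', 'I-X', 'E-X', 'O', 'S-Y']
```

FROBES would further split the I- run of long entities into F- (first half) and R- (second half) tags, as described above.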
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Methodology</head><p>The two proposed models based on fine-tuning BERT embeddings using i) BertForTokenClassification from the transformers library and ii) the Flair framework, designed and evaluated for Tracks 1 and 2 of MEDDOPROF, are described in this section.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Data Transformation</head><p>The datasets provided by the organizers of the MEDDOPROF shared task for the sub-Tracks are in Brat standoff annotation format. As per this format, for each text file there is a corresponding annotation file consisting of an annotation ID, a label, and the beginning and ending offsets of each NE, which could be a single word or multi-word. More details of the Brat standoff format can be found on its website <ref type="bibr" target="#b20">[21]</ref>. As data in the CONLL IOB <ref type="bibr">[22]</ref> format is easy to handle, the given data in Brat standoff annotation format is transformed to CONLL format with IOB representation using the brat_to_conll.py <ref type="bibr" target="#b21">[23]</ref> module. The IOB representation assigns the tags I and O to tokens that are inside and outside an NE respectively, and assigns the tag B to the first word of an NE <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b3">4]</ref>. A snapshot of data in Brat format and CONLL (IOB) format is shown in Figure <ref type="figure" target="#fig_0">1</ref>. As the data transformed into CONLL IOB format is used to train the classifier models, the predictions of the models will also be in CONLL IOB format. This requires a post-processing step to transform the predictions in CONLL IOB format back to Brat standoff annotation format to generate the .ann output files required by the organizers.</p></div>
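The offset-based transformation described above can be sketched in a few lines of Python. This is a simplified illustration with a made-up Spanish sentence and entity span, not the actual conversion module used in the paper, and it only handles whitespace tokenization:

```python
def to_iob(tokens, spans):
    """Assign IOB tags to tokens given character-offset entity spans.

    `tokens` is a list of (text, start_offset) pairs; `spans` is a list of
    (start, end, label) annotations, as found in Brat .ann files.
    """
    tags = []
    for text, start in tokens:
        end = start + len(text)
        tag = "O"
        for s, e, label in spans:
            if start >= s and end <= e:
                # First token of the span gets B-, subsequent tokens get I-.
                tag = ("B-" if start == s else "I-") + label
                break
        tags.append(tag)
    return tags

# Toy sentence with one multi-word entity annotated at character offsets.
sentence = "Trabaja como auxiliar de enfermeria"
tokens, pos = [], 0
for word in sentence.split():
    start = sentence.index(word, pos)
    tokens.append((word, start))
    pos = start + len(word)
spans = [(13, 35, "PROFESION")]  # "auxiliar de enfermeria"
print(to_iob(tokens, spans))
# → ['O', 'O', 'B-PROFESION', 'I-PROFESION', 'I-PROFESION']
```

A real converter must also handle punctuation, sentence splitting, and entities that do not align with token boundaries.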
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Models</head><p>The main component of the proposed models is BETO <ref type="bibr" target="#b15">[16,</ref><ref type="bibr" target="#b16">17,</ref><ref type="bibr" target="#b22">24]</ref>, a Spanish BERT language model trained on a large amount of unannotated Spanish corpora <ref type="bibr" target="#b23">[25]</ref>. In this work, we have used the bert-base-spanish-wwm-cased <ref type="bibr" target="#b24">[26]</ref> model, which is more effective for NER tasks as capitalization plays a major role in identifying NEs.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>BertForTokenClassification using Transformers:</head><p>The first step of this model is to fine-tune the BERT model on the downstream task using the transformers library. Using the data in CONLL IOB format, the fine-tuned models are further trained for Tracks 1 and 2 of the shared task. For each test dataset, the models generate tagged sequences sentence-wise in IOB annotation format, which are then converted back to Brat standoff annotation format.</p><p>As BERT-based models must be fed sequences of the same length, the maximum sequence length is set to 510 and shorter sequences are padded to this length. However, an attention mask is employed so that the padded elements do not distract the model. Similar to Keras <ref type="bibr" target="#b26">[28]</ref>, BERT supports attention masks that allow the model to focus on the main part of the sequence, ignoring padded elements. In other words, a mask is typically used for attention when a batch contains sentences of varying lengths. Real tokens are thus used for training by assigning 1 to in-sequence tokens and 0 to out-of-sequence (padding) positions. After assembling the training data and the corresponding masks using PyTorch <ref type="bibr" target="#b25">[27]</ref>, BETO is initialized using the BertForTokenClassification class from the transformers library, which adds a token-level predictor on top of the BERT model.</p><p>Setting the optimizer to AdamW, the models were trained for 50 epochs. Figure <ref type="figure">2</ref> shows the training and validation loss, where the validation set is 10% of the training set. An overview of the model based on BERT using the transformers library is shown in Figure <ref type="figure">3</ref>.  
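The padding-and-mask step described above can be illustrated in plain Python. This is a sketch of the idea only, with made-up token IDs; the actual implementation builds PyTorch tensors:

```python
def pad_with_masks(batch, pad_id=0, max_len=510):
    """Pad token-id sequences to a common length and build attention masks:
    1 for real (in-sequence) tokens, 0 for padding, as described in the text."""
    length = min(max(len(seq) for seq in batch), max_len)
    ids, masks = [], []
    for seq in batch:
        seq = seq[:length]  # truncate sequences longer than max_len
        pad = [pad_id] * (length - len(seq))
        ids.append(seq + pad)
        masks.append([1] * len(seq) + [0] * len(pad))
    return ids, masks

# Two toy sequences of different lengths (101/102 stand in for [CLS]/[SEP]).
ids, masks = pad_with_masks([[101, 7, 8, 102], [101, 5, 102]])
print(masks)
# → [[1, 1, 1, 1], [1, 1, 1, 0]]
```

The mask rows are passed alongside the input IDs so the self-attention layers ignore the padded positions.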
Flair with BERT Embeddings: Flair is a PyTorch-based NLP tool that provides a model training framework in which various embeddings and language models can be used individually or in combination and fine-tuned for downstream tasks, with special support for medical domain data <ref type="bibr" target="#b9">[10]</ref>. To compare the performance of this model with that of the BertForTokenClassification model, the bert-base-spanish-wwm-cased model is used and fine-tuned using the SequenceTagger from Flair, which has a BiLSTM-based backend. It is also possible to use a CRF on top of the model, but this is not done in this work. As Flair requires the training data in CONLL format, the data in Brat standoff annotation format is transformed to CONLL IOB format as described in Section 3.1 and is loaded using the ColumnCorpus class from the Flair library. A summary of the layers used in this model is given on our GitHub page <ref type="bibr">[29]</ref>. The parameters of the proposed model are set as given in Table <ref type="table" target="#tab_0">1</ref> and an overview of the proposed Flair model is presented in Figure <ref type="figure" target="#fig_2">4</ref>. The main requirement of any task is an annotated dataset for training the models. The MEDDOPROF corpus provided by the organizers contains 1,844 clinical cases covering more than 20 specialties, annotated manually by clinical and linguistic experts following strict guidelines. Each clinical case is stored as a single text file along with a corresponding Brat standoff annotation file. The description of the dataset is available on the task website <ref type="bibr" target="#b27">[30]</ref> and the descriptions of the labels for both Tracks are given in Table <ref type="table" target="#tab_1">2</ref>.</p><p>Evaluating the models' performance is an equally important task. 
As per the submission guidelines <ref type="bibr" target="#b5">[6]</ref>, the predictions for each test file should be in Brat standoff annotation format, i.e., each annotation file should have the extension .ann and should consist of an annotation ID, a label, and the correct beginning and ending offsets for each predicted NE, on one line, similar to the annotation files given in the training set. However, the annotation ID is generated at random, as it has no influence on the prediction. Annotation files are generated for each file in the test set and submitted to the task organizers for evaluation. The performance of the models is evaluated in terms of micro-averaged Precision, Recall, and F1-score.</p><p>The organizers provided, as a baseline, the results obtained by a simple lookup system using the annotations from the training data. The baseline results and the performances of the proposed models reported by the organizers for both Tracks, as micro-averaged scores, are shown in Table <ref type="table" target="#tab_2">3</ref>. The results illustrate that the proposed models obtained good performance for both Tracks, and that they performed better on the MEDDOPROF-NER task. In addition, the model using the Flair framework with BERT embeddings outperformed the other proposed model and was one of the best performing models in the shared task. </p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. Snapshot of data in Brat standoff and CONLL IOB format</figDesc><graphic coords="5,137.61,370.32,340.13,123.25" type="bitmap" /></figure>
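The post-processing step that maps IOB predictions back to Brat standoff lines can be sketched as follows. This is a simplified illustration with sequential IDs and a made-up example (the text above notes that the actual annotation IDs are generated at random):

```python
def iob_to_ann(tokens, tags):
    """Collect B-/I- runs into entities and emit Brat standoff lines of the
    form 'T<id>\t<LABEL> <start> <end>\t<text>'.

    `tokens` is a list of (text, start, end) triples aligned with `tags`.
    """
    entities, current = [], None
    for (text, start, end), tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append(current)
            current = [tag[2:], start, end, text]  # open a new entity
        elif tag.startswith("I-") and current:
            current[2] = end                        # extend the entity span
            current[3] += " " + text
        else:  # O tag: close any open entity
            if current:
                entities.append(current)
            current = None
    if current:
        entities.append(current)
    return [f"T{i}\t{lab} {s} {e}\t{txt}"
            for i, (lab, s, e, txt) in enumerate(entities, 1)]

tokens = [("auxiliar", 13, 21), ("de", 22, 24), ("enfermeria", 25, 35)]
print(iob_to_ann(tokens, ["B-PROFESION", "I-PROFESION", "I-PROFESION"]))
```

Each emitted line corresponds to one predicted NE in the .ann file, ready to be scored against the gold standoff annotations.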
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 2 .Fig. 3 .</head><label>23</label><figDesc>Fig. 2. Training and Validation loss while fine-tuning BERT</figDesc><graphic coords="6,165.95,461.06,283.46,154.55" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 4 .</head><label>4</label><figDesc>Fig. 4. Overview of the proposed Flair model</figDesc><graphic coords="7,137.60,552.11,340.16,88.15" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>Parameters in the Flair model</figDesc><table><row><cell>Parameter</cell><cell>Max len</cell><cell>Hidden size</cell><cell>Learning rate</cell><cell>Mini batch size</cell><cell>Epochs</cell></row><row><cell>Value</cell><cell>512</cell><cell>256</cell><cell>5.0e-6</cell><cell>4</cell><cell>10</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 .</head><label>2</label><figDesc>Labels description in MEDDOPROF-NER and MEDDOPROF-CLASS</figDesc><table><row><cell>Track</cell><cell>Labels</cell><cell>Token Description</cell></row><row><cell cols="3">MEDDOPROF-NER PROFESION Indicates a profession</cell></row><row><cell></cell><cell>SITUACION LABORAL</cell><cell>Indicates an employment status</cell></row><row><cell></cell><cell cols="2">ACTIVIDAD Indicates an activity</cell></row><row><cell cols="3">MEDDOPROF-CLASS PACIENTE Token is related to the patient</cell></row><row><cell></cell><cell cols="2">FAMILIAR Token is related to a family member</cell></row><row><cell></cell><cell cols="2">SANITARIO Token is related to a health professional</cell></row><row><cell></cell><cell>OTROS</cell><cell>Token is related to someone else</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 .</head><label>3</label><figDesc>Performances of proposed models (Micro average)Medical text processing is one of the most exciting as well as vital tasks in NLP. Considering its importance, MEDDOPROF organized a shared task with three Tracks, and we participated in two of them, namely MEDDOPROF-NER and MEDDOPROF-CLASS, for the automatic detection of occupations and professions in Spanish medical texts. We (Team MUCIC) proposed two models using BERT embeddings, namely BertForTokenClassification from the transformers library and the Flair framework. The results illustrate that the models performed better on NER, and the Flair model outperformed the other model in both Tracks, obtaining very good results with micro F1-scores of 0.8 and 0.764 for MEDDOPROF-NER and MEDDOPROF-CLASS respectively. Further, the Flair model became one of the best performing models in the shared task. As future work, we plan to explore the Language Understanding with Knowledge-based Embeddings (LUKE) model, a new pre-trained contextualized representation of words and entities based on the transformer architecture. 
Improving the performance of the system through modifications to the NE representations, and exploring various learning approaches for the task of NER in medical texts, are other plans for future work.</figDesc><table><row><cell>Subtask</cell><cell>Model</cell><cell cols="2">Precision Recall F1-score</cell></row><row><cell cols="2">MEDDOPROF-NER Baseline</cell><cell>0.465</cell><cell>0.508 0.486</cell></row><row><cell></cell><cell>BERT</cell><cell>0.809</cell><cell>0.515 0.629</cell></row><row><cell></cell><cell cols="2">Flair-BERT embeddings 0.813</cell><cell>0.788 0.8</cell></row><row><cell cols="2">MEDDOPROF-CLASS Baseline</cell><cell>0.391</cell><cell>0.377 0.384</cell></row><row><cell></cell><cell>BERT</cell><cell>0.77</cell><cell>0.488 0.598</cell></row><row><cell></cell><cell cols="2">Flair-BERT embeddings 0.77</cell><cell>0.75 0.764</cell></row><row><cell cols="2">5 Conclusion and Future Work</cell><cell></cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Acknowledgements</head><p>Team MUCIC deeply appreciates the organizers of the MEDDOPROF shared task for their efforts, guidance, and support during the task, and the anonymous reviewers for their valuable comments.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Improving NER for Clinical Texts by Ensemble Approach using Segment Representations</title>
		<author>
			<persName><forename type="first">H</forename><surname>Nayel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">L</forename><surname>Shashirekha</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 14th International Conference on Natural Language Processing</title>
				<meeting>the 14th International Conference on Natural Language Processing<address><addrLine>ICON-</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017-12">2017. 2017 Dec</date>
			<biblScope unit="page" from="197" to="204" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Regular Expression Based Medical Text Classification using Constructive Heuristic Approach</title>
		<author>
			<persName><forename type="first">M</forename><surname>Cui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Bai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Aickelin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Ge</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Access</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page" from="147892" to="147904" />
			<date type="published" when="2019-10-11">2019 Oct 11</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">PUNER-Parsi ULMFiT for Named-Entity Recognition in Persian Texts</title>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">L</forename><surname>Balouchzahi Fazlourrahman</surname></persName>
		</author>
		<author>
			<persName><surname>Shashirekha</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2020">2020</date>
			<publisher>EasyChair</publisher>
			<biblScope unit="volume">4224</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Improving Multi-Word Entity Recognition for Biomedical Texts</title>
		<author>
			<persName><forename type="first">H</forename><surname>Nayel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">L</forename><surname>Shashirekha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Shindo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Matsumoto</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1908.05691.2019</idno>
		<imprint>
			<date>Aug 15</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">A Comparative Study of Segment Representation for Biomedical Named Entity Recognition</title>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">L</forename><surname>Shashirekha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">A</forename><surname>Nayel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Advances in Computing, Communications and Informatics (ICACCI)</title>
				<imprint>
			<date type="published" when="2016-09-21">2016. 2016 Sep 21</date>
			<biblScope unit="page" from="1046" to="1052" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">NLP applied to occupational health: MEDDOPROF shared task at IberLEF 2021 on automatic recognition, classification and normalization of professions and occupations from medical texts</title>
		<author>
			<persName><forename type="first">Salvador</forename><surname>Lima-López</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Eulàlia</forename><surname>Farré-Maduell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Antonio</forename><surname>Miranda-Escalada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Vicent</forename><surname>Brivá-Iglesias</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Martin</forename><surname>Krallinger</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Procesamiento del Lenguaje Natural</title>
		<imprint>
			<biblScope unit="volume">67</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding</title>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1810.04805</idno>
		<imprint>
			<date type="published" when="2018-10-11">2018 Oct 11</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">A Clinical Trials Corpus Annotated with UMLS Entities to Enhance the Access to Evidence-Based Medicine</title>
		<author>
			<persName><forename type="first">L</forename><surname>Campillos-Llanos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Valverde-Mateos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Capllonch-Carrión</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Moreno-Sandoval</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">BMC Medical Informatics and Decision Making</title>
		<imprint>
			<date type="published" when="2021-12">2021 Dec</date>
			<biblScope unit="volume">21</biblScope>
			<biblScope unit="page" from="1" to="9" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">E</forename><surname>Peters</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Neumann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Iyyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gardner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Clark</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1802.05365</idno>
		<title level="m">Deep Contextualized Word Representations</title>
				<imprint>
			<date type="published" when="2018-02-15">2018 Feb 15</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">FLAIR: An Easy-To-Use Framework for State-Of-The-Art NLP</title>
		<author>
			<persName><forename type="first">A</forename><surname>Akbik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Bergmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Blythe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Rasul</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Schweter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Vollgraf</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations)</title>
				<meeting>the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations)</meeting>
		<imprint>
			<date type="published" when="2019-06">2019 Jun</date>
			<biblScope unit="page" from="54" to="59" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">NER for Medical Entities in Twitter using Sequence to Sequence Neural Networks</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">J</forename><surname>Yepes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mackinlay</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Australasian Language Technology Association Workshop</title>
				<meeting>the Australasian Language Technology Association Workshop</meeting>
		<imprint>
			<date type="published" when="2016-12">2016 Dec</date>
			<biblScope unit="page" from="138" to="142" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">WCP-RNN: a Novel RNN-based Approach for Bio-NER in Chinese EMRs</title>
		<author>
			<persName><forename type="first">J</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Pan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The Journal of Supercomputing</title>
		<imprint>
			<date type="published" when="2020-03">2020 Mar</date>
			<biblScope unit="volume">76</biblScope>
			<biblScope unit="page" from="1450" to="1467" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">The Impact of Simple Feature Engineering in Multilingual Medical NER</title>
		<author>
			<persName><forename type="first">R</forename><surname>Weegar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Casillas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">D</forename><surname>De Ilarraza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Oronoz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Pérez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Gojenola</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)</title>
				<meeting>the Clinical Natural Language Processing Workshop (ClinicalNLP)</meeting>
		<imprint>
			<date type="published" when="2016-12">2016 Dec</date>
			<biblScope unit="page" from="1" to="6" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Automatic Annotation of Medical Records in Spanish with Disease, Drug and Substance Names</title>
		<author>
			<persName><forename type="first">M</forename><surname>Oronoz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Casillas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Gojenola</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Perez</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Iberoamerican Congress on Pattern Recognition</title>
				<meeting><address><addrLine>Berlin, Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2013-11-20">2013 Nov 20</date>
			<biblScope unit="page" from="536" to="543" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">HEALTH BANK-A Workbench for Data Science Applications in Healthcare</title>
		<author>
			<persName><forename type="first">H</forename><surname>Dalianis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Henriksson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kvist</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Velupillai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Weegar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">CAiSE Industry Track</title>
		<imprint>
			<biblScope unit="volume">1381</biblScope>
			<biblScope unit="page" from="1" to="8" />
			<date type="published" when="2015-06-11">2015 Jun 11</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Spanish Pre-trained BERT Model and Evaluation data</title>
		<author>
			<persName><forename type="first">J</forename><surname>Cañete</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Chaperon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Fuentes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Pérez</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">PML4DC at ICLR</title>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Dredze</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1904.09077</idno>
		<title level="m">Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT</title>
				<imprint>
			<date type="published" when="2019-04-19">2019 Apr 19</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<ptr target="https://huggingface.co/transformers/model_doc/bert.html#bertfortokenclassification" />
		<title level="m">BertForTokenClassification</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<ptr target="https://huggingface.co/" />
		<title level="m">Hugging Face homepage</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<ptr target="https://github.com/IBMMRL/medinfo2015" />
		<title level="m">MedInfo</title>
				<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<ptr target="https://brat.nlplab.org/standoff.html" />
		<title level="m">brat standoff format homepage</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<title level="m">NeuroNER</title>
		<ptr target="https://github.com/Franck-Dernoncourt/NeuroNER" />
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<title level="m" type="main">BETO: Spanish BERT</title>
		<ptr target="https://github.com/dccuchile/beto" />
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<ptr target="https://github.com/josecannete/spanish-corpora" />
		<title level="m">Spanish Unannotated Corpora</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<monogr>
		<title level="m">bert-base-spanish-wwm-cased</title>
		<ptr target="https://huggingface.co/dccuchile/bert-base-spanish-wwm-cased" />
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<monogr>
		<ptr target="https://pytorch.org/" />
		<title level="m">PyTorch homepage</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<monogr>
		<title level="m">Keras homepage</title>
		<ptr target="https://keras.io/" />
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<monogr>
		<ptr target="https://temu.bsc.es/meddoprof/data/" />
		<title level="m">MEDDOPROF homepage</title>
				<imprint/>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
