<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">HapLap at eHealth-KD Challenge 2020</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Sergio</forename><surname>Santana</surname></persName>
							<email>ssantana005@ikasle.ehu.eus</email>
							<affiliation key="aff0">
								<orgName type="department">HiTZ Center - Ixa</orgName>
								<orgName type="institution">University of the Basque Country UPV/EHU</orgName>
								<address>
									<addrLine>Manuel Lardizabal 1</addrLine>
									<postCode>20080</postCode>
									<settlement>Donostia</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Alicia</forename><surname>Pérez</surname></persName>
							<email>alicia.perez@ehu.eus</email>
							<affiliation key="aff0">
								<orgName type="department">HiTZ Center - Ixa</orgName>
								<orgName type="institution">University of the Basque Country UPV/EHU</orgName>
								<address>
									<addrLine>Manuel Lardizabal 1</addrLine>
									<postCode>20080</postCode>
									<settlement>Donostia</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Arantza</forename><surname>Casillas</surname></persName>
							<email>arantza.casillas@ehu.eus</email>
							<affiliation key="aff0">
								<orgName type="department">HiTZ Center - Ixa</orgName>
								<orgName type="institution">University of the Basque Country UPV/EHU</orgName>
								<address>
									<addrLine>Manuel Lardizabal 1</addrLine>
									<postCode>20080</postCode>
									<settlement>Donostia</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">HapLap at eHealth-KD Challenge 2020</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">23729BAD149265B7E122FC68E538251B</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T04:19+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Entity recognition</term>
					<term>Relation extraction</term>
					<term>Joint AB-LSTM neural network</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>We present the work carried out by the HapLap group in subtask B of the eHealth-KD 2020 competition. Relation extraction was addressed with a pipeline system that combines a Joint AB-LSTM neural network with a pre-processing and a post-processing phase. We obtained a score of 0.316 in Scenario 3.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>We present the work carried out by the HapLap group in the eHealth-KD 2020 task <ref type="bibr" target="#b0">[1]</ref>. In this third edition, the purpose of the task is to automatically extract knowledge, represented by means of thirteen semantic relations, from Spanish electronic health documents. We took part in the optional subtask B: the input is plain text with entity annotations in a BRAT file, and the output is the same BRAT file with both the entities and the relations between them. To address this, we implemented a pipeline system that combines a Joint AB-LSTM neural network with a pre-processing and a post-processing phase.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>In recent years, several competitions related to relation extraction have emerged, such as SemEval-2018 Task 7 <ref type="bibr" target="#b1">[2]</ref>, to extract relations from scientific texts, and eHealthKD 2018 <ref type="bibr" target="#b2">[3]</ref>, eHealthKD 2019 <ref type="bibr" target="#b3">[4]</ref> and BioNLP <ref type="bibr" target="#b4">[5]</ref>, to extract and classify relations from clinical texts. The relation extraction problem is thus attracting interest in different areas, including clinical documentation. Since the resurgence of neural networks, different approaches have been implemented for extracting clinical relations. The DET-BLSTM system <ref type="bibr" target="#b5">[6]</ref> makes use of a Bi-LSTM network. In <ref type="bibr" target="#b6">[7]</ref> the authors presented a combination of two different networks, a gated recurrent unit (GRU) and a convolutional neural network (CNN), to detect clinical relations. In <ref type="bibr" target="#b7">[8]</ref> a convolutional neural network is also used to classify relations. In <ref type="bibr" target="#b8">[9]</ref> a Joint AB-LSTM neural network is used to extract adverse drug reaction relations. In this paper we present a Joint AB-LSTM neural network, a modification of the work presented in <ref type="bibr" target="#b9">[10]</ref>, for the extraction of clinical relations in the context of the eHealthKD 2020 competition.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Materials and Methods</head><p>For this work we divided the system into three phases. First comes the pre-process, where we adapt the data format for use with the Joint AB-LSTM. Next comes the training phase, where we train and evaluate the neural network and obtain the predictions. Finally comes the post-process, where we convert those predictions into the data format used in the competition.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Pre-process</head><p>In the pre-process we perform the following operations:</p><p>• Convert the input from the Brat standoff format to the format used in the eHealthKD 2019 challenge.</p><p>• Convert the data in the eHealthKD 2019 format into the format used by the Joint AB-LSTM.</p><p>• Create the NO_RELATION relations.</p><p>In the first part of the system we pre-process the input relations. We converted the Brat standoff input format (also referred to as ann) to the format used in the previous eHealthKD 2019 competition by means of the ann2txt script (https://github.com/knowledge-learning/ehealthkd-2019/blob/master/scripts/ann2txt.py) provided there. Next, we adapted it to what the Joint AB-LSTM requires. Three programs were implemented for the pre-processing and their code has been posted on GitHub (https://github.com/Porobu/HAPLAP-MAL). These three programs load the instances that are in the eHealthKD 2019 data format and join them into a single file.</p><p>To enable the neural network to learn to discriminate between positive and negative relations (absence of relation), both types of instances should be provided in the inference stage. To this end, an auxiliary relation class, NO_RELATION, was also created in the pre-processing. A critical point, hence, is how to choose pairs of entities that could be related (candidate relations) and label them as negative instances. Both the selection and the proportions might be crucial. We used a simple selection rule that only creates negative (NO_RELATION) relations between entity pairs that have at least one positive relation instance in the data set. To further reduce the number of negative relations, we only created them between entity pairs in the same sentence.</p><p>At this stage we have a data set with the candidates marked as either related or not related. 
At this point a multi-class approach enables us to predict whether a candidate pair is related by any of the available relation classes (including NO_RELATION). This was, indeed, our approach 1: a pair of entities that could be related (a relation candidate) is directly classified by means of the Joint AB-LSTM.</p><p>Needless to say, in the aforementioned sample negative instances substantially exceed the positive ones, leading to a skewed class distribution. In table <ref type="table" target="#tab_0">1</ref> we can see the number of positive and negative relations in our training and development data sets. We have to remember that in our multi-class classification approach (approach 1) the positive relation count is spread over all thirteen classes, further skewing the data. Inference tends to be biased towards the majority class.</p><p>To cope with this we proposed to tackle the classification in two stages (our approach 2):</p><p>• In the first phase we create a binary data set in which all the positive relations (target, causes...) are grouped in the RELATION class. In this phase we filter out the negative relations, to reduce the imbalance.</p><p>• In the second phase we keep only the data set with the positive relations (arg, target, subject...), and we train the system to predict the relation.</p><p>Both approaches (and both phases in the second approach) were implemented by means of the Joint AB-LSTM. Further details are given in the following section.</p></div>
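<div xmlns="http://www.tei-c.org/ns/1.0"><p>The candidate generation and the binary grouping described in this section can be illustrated with a minimal sketch. It is not the code released in the repository: the Entity record, the functions build_candidates and to_binary, and the toy data are hypothetical. The sketch also assumes that "entity pairs" means pairs of entity types (for instance Action and Concept) that occur in at least one positive relation; a variant based on the entity texts would follow the same structure.</p>
<p><code lang="python"><![CDATA[
```python
from collections import namedtuple
from itertools import permutations

# Hypothetical entity record: id, entity type and sentence index.
Entity = namedtuple("Entity", ["id", "type", "sentence"])


def build_candidates(entities, positives):
    """Return (source_id, target_id, label) triples for one data set.

    `positives` maps (source_id, target_id) to a relation label.
    Assumption: a pair may become NO_RELATION only if its pair of
    entity types occurs in some positive relation of the data set.
    """
    by_id = {e.id: e for e in entities}
    # Type pairs that have at least one positive instance.
    seen_types = {
        (by_id[src].type, by_id[dst].type) for (src, dst) in positives
    }
    candidates = []
    for a, b in permutations(entities, 2):
        if a.sentence != b.sentence:
            continue  # candidates are only created within a sentence
        label = positives.get((a.id, b.id))
        if label is not None:
            candidates.append((a.id, b.id, label))
        elif (a.type, b.type) in seen_types:
            candidates.append((a.id, b.id, "NO_RELATION"))
    return candidates


def to_binary(candidates):
    """Group every positive label under RELATION (first phase of approach 2)."""
    return [
        (src, dst, label if label == "NO_RELATION" else "RELATION")
        for (src, dst, label) in candidates
    ]


# Toy data: three entities in sentence 0 and one in sentence 1.
entities = [
    Entity("T1", "Action", 0),
    Entity("T2", "Concept", 0),
    Entity("T3", "Concept", 0),
    Entity("T4", "Concept", 1),
]
positives = {("T1", "T2"): "target"}
candidates = build_candidates(entities, positives)
binary = to_binary(candidates)
```
]]></code></p>
<p>In this toy example only the pair (T1, T3) becomes a negative instance: it shares a sentence with a positive relation and has the same type pair, whereas pairs that cross sentences or whose type pair was never seen in a positive relation are discarded.</p></div>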
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Joint AB-LSTM network</head><p>After pre-processing the instances we load them into the Joint AB-LSTM neural network, which has been implemented using TensorFlow. The network also does its own pre-processing: first, all tokens are lower-cased.</p><p>The network employs word embeddings as the main feature. For this work we used pre-trained embeddings from the clinical domain. The embeddings were trained on corpora consisting of EHRs (electronic health records) that are not publicly available due to confidentiality issues. Other choices might have been more appropriate than ours, since the amount and type of data employed has a big impact on the resulting embeddings. Apart from the word embeddings, the network employs another powerful feature: the distance embeddings. The distance is simply computed as the number of tokens between each annotated word in the sentence and the target entity word.</p><p>Once the relations are completely pre-processed, the neural network is trained. This network combines two neural networks widely used in NLP: a Bi-LSTM with max pooling and an attentive Bi-LSTM. The Joint AB-LSTM is fed with the pre-processed sentences, their entities and the relations between them, and the previously created distance embeddings.</p><p>We optimised two hyper-parameters of the neural network, the dropout and the learning rate, to get the final model. We trained the model with a mixture of the eHealthKD 2019 train+dev and the eHealthKD 2020 datasets, and we used the eHealthKD 2020 dev dataset for validation. Note that this optimisation was done over the so-called multi-class dataset (approach 1), not over the binary dataset (approach 2). After the optimisation, we set the learning rate to 0.001 and used no dropout.</p></div>
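<div xmlns="http://www.tei-c.org/ns/1.0"><p>The distance feature can be illustrated with a minimal sketch, under assumptions that are not stated in the description above: distances are taken as signed token offsets to each entity of the candidate pair, and they are clipped and shifted so that they can index an embedding table. The function names, the clipping value of 30 and the toy sentence are hypothetical and are not taken from the system.</p>
<p><code lang="python"><![CDATA[
```python
def relative_distances(num_tokens, entity_index):
    """Signed token distance from every token to the entity token."""
    return [i - entity_index for i in range(num_tokens)]


def to_embedding_ids(distances, max_distance=30):
    """Clip distances and shift them so they can index an embedding table.

    Assumption: distances are clipped to [-max_distance, max_distance];
    the value 30 is illustrative, not taken from the system.
    """
    return [
        min(max(d, -max_distance), max_distance) + max_distance
        for d in distances
    ]


# Toy sentence, lower-cased as the network does in its own pre-processing.
tokens = "El asma afecta las vías respiratorias".lower().split()
source_index = tokens.index("asma")    # first entity of the pair
target_index = tokens.index("afecta")  # second entity of the pair

dist_to_source = relative_distances(len(tokens), source_index)
dist_to_target = relative_distances(len(tokens), target_index)
source_ids = to_embedding_ids(dist_to_source)
target_ids = to_embedding_ids(dist_to_target)
```
]]></code></p>
<p>Each token thus receives two integer identifiers, one per entity, which a network can map to distance embeddings and concatenate with the word embedding of the token.</p></div>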
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Postprocess</head><p>After getting the predictions from the neural network, we post-process them to produce the output relations in the Brat standoff format, respecting the IDs of the gold entities.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Results</head><p>As described in section 3.1, we provided two different approaches. The results achieved with each of them are given in table <ref type="table" target="#tab_1">2</ref>. Approach 1 outperforms Approach 2 in terms of precision, whereas the opposite holds for recall. Nevertheless, both approaches reach the same F1-measure.</p></div>
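<div xmlns="http://www.tei-c.org/ns/1.0"><p>The F1-measure is the harmonic mean of precision and recall. The short check below recomputes it from the precision and recall values listed in table 2. Since those inputs are already rounded to three decimals, the recomputed figures may differ from the reported 0.316 in the last digit. The function name f1_score is chosen here for illustration only.</p>
<p><code lang="python"><![CDATA[
```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded precision and recall as listed in table 2.
approach_1 = f1_score(0.336, 0.298)
approach_2 = f1_score(0.328, 0.306)
print(f"Approach 1: {approach_1:.4f}")
print(f"Approach 2: {approach_2:.4f}")
```
]]></code></p>
<p>Both recomputed values lie within 0.001 of the reported F1, which is consistent with the two approaches trading precision for recall without changing the overall score.</p></div>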
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusions</head><p>Relation extraction was addressed with a neural approach, the Joint AB-LSTM network. We applied two simple pre-processing approaches to get both positive and negative instances. This stage might be naive in the way the sampling was carried out and the proportions were selected. We explored two pre-processing approaches: a direct one (approach 1), which just copes with the multi-class problem, and a filtered one (approach 2), which tries to get rid of negative candidates prior to the multi-class stage. Neither of them surpassed the other significantly. For future work, we should explore the embeddings provided to the network. Embeddings are the main source of knowledge in this setting with limited training sets, and they have proven significantly influential in related works.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Number of positive, negative and total relations in the training and development data sets</figDesc><table><row><cell>Data set</cell><cell>Positive Relations</cell><cell>Negative Relations</cell><cell>Total</cell></row><row><cell>Training</cell><cell>8597</cell><cell>50812</cell><cell>59409</cell></row><row><cell>Development</cell><cell>1204</cell><cell>7144</cell><cell>8348</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Results on the eHealthKD 2020 dev dataset attained with Approach 1 (multi-class) and Approach 2 (working in two phases to filter binary relations).</figDesc><table><row><cell></cell><cell>Precision</cell><cell>Recall</cell><cell>F1</cell></row><row><cell>Approach 1</cell><cell>0.336</cell><cell>0.298</cell><cell>0.316</cell></row><row><cell>Approach 2</cell><cell>0.328</cell><cell>0.306</cell><cell>0.316</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work was partially supported by the Spanish Ministry of Science and Technology PAD-MED (PID2019-106942RB-C31) and by the Basque Government (IXA IT-1343-19 and a Grant for the student Sergio Santana published in the 12/03/2020 BOPV).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Overview of the eHealth Knowledge Discovery Challenge at IberLEF</title>
		<author>
			<persName><forename type="first">A</forename><surname>Piad-Morffis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Gutiérrez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Cañizares-Diaz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Estevez-Velarde</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Almeida-Cruz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Muñoz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Montoyo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Iberian Languages Evaluation Forum co-located with 36th Conference of the Spanish Society for Natural Language Processing, IberLEF@SEPLN 2020</title>
				<meeting>the Iberian Languages Evaluation Forum co-located with 36th Conference of the Spanish Society for Natural Language Processing, IberLEF@SEPLN 2020<address><addrLine>Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020-09">September 2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Semeval-2018 task 7: Semantic relation extraction and classification in scientific papers</title>
		<author>
			<persName><forename type="first">K</forename><surname>Gábor</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Buscaldi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A.-K</forename><surname>Schumann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Qasemizadeh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zargayouna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Charnois</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of The 12th International Workshop on Semantic Evaluation</title>
				<meeting>The 12th International Workshop on Semantic Evaluation</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="679" to="688" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">Martínez</forename><surname>Cámara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">Almeida</forename><surname>Cruz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">C</forename><surname>Díaz Galiano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Estévez-Velarde</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">Á</forename><surname>García Cumbreras</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>García</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Vega</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gutiérrez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Montejo Ráez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Montoyo</surname></persName>
		</author>
		<author>
			<persName><surname>Muñoz</surname></persName>
		</author>
		<title level="m">Overview of TASS 2018: Opinions, health and emotions</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Overview of the eHealth Knowledge Discovery Challenge at IberLEF</title>
		<author>
			<persName><forename type="first">A</forename><surname>Piad-Morffis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Gutiérrez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Consuegra-Ayala</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Estevez-Velarde</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Almeida-Cruz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Munoz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Montoyo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Iberian Languages Evaluation Forum</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<meeting>the Iberian Languages Evaluation Forum<address><addrLine>IberLEF</addrLine></address></meeting>
		<imprint>
			<publisher>CEUR-WS</publisher>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<ptr target="https://www.aclweb.org/anthology/W19-5000" />
		<title level="m">Proceedings of the 18th BioNLP Workshop and Shared Task, Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">D</forename><surname>Demner-Fushman</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><forename type="middle">B</forename><surname>Cohen</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Ananiadou</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Tsujii</surname></persName>
		</editor>
		<meeting>the 18th BioNLP Workshop and Shared Task, Association for Computational Linguistics<address><addrLine>Florence, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Biomedical event extraction via long short term memory networks along dynamic extended tree</title>
		<author>
			<persName><forename type="first">L</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Lin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Bioinformatics and Biomedicine (BIBM)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="739" to="742" />
		</imprint>
	</monogr>
	<note>IEEE International Conference on</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Convolutional gated recurrent units for medical relation classification</title>
		<author>
			<persName><forename type="first">B</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Guan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Dai</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE International Conference on Bioinformatics and Biomedicine (BIBM)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="646" to="650" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Joint classification of key-phrases and relations in electronic health documents</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">Medina</forename><surname>Herrera</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">Turmo</forename><surname>Borras</surname></persName>
		</author>
		<idno>CEUR-WS.org</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of TASS 2018: Workshop on Semantic Analysis at SEPLN (TASS 2018) co-located with 34nd SEPLN Conference (SEPLN 2018)</title>
				<meeting>TASS 2018: Workshop on Semantic Analysis at SEPLN (TASS 2018) co-located with 34nd SEPLN Conference (SEPLN 2018)<address><addrLine>Sevilla, Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018-09-18">September 18th, 2018</date>
			<biblScope unit="page" from="83" to="88" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Exploring Joint AB-LSTM with embedded lemmas for adverse drug reaction discovery</title>
		<author>
			<persName><forename type="first">S</forename><surname>Santiso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Perez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Casillas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE journal of biomedical and health informatics</title>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">Adverse drug reaction extraction on electronic health records written in Spanish</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">Santiso</forename><surname>González</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
