<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">UH-MAJA-KD at eHealth-KD Challenge 2019: Deep Learning Models for Knowledge Discovery in Spanish eHealth Documents</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Jorge</forename><forename type="middle">Mederos</forename><surname>Alvarado</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Havana</orgName>
								<address>
									<country key="CU">Cuba</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Ernesto</forename><forename type="middle">Quevedo</forename><surname>Caballero</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Havana</orgName>
								<address>
									<country key="CU">Cuba</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Alejandro</forename><surname>Rodríguez Pérez</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Havana</orgName>
								<address>
									<country key="CU">Cuba</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Rocío</forename><forename type="middle">Cruz</forename><surname>Linares</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Havana</orgName>
								<address>
									<country key="CU">Cuba</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">UH-MAJA-KD at eHealth-KD Challenge 2019: Deep Learning Models for Knowledge Discovery in Spanish eHealth Documents</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">91BFDC5554551F123508CA7368097919</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T16:57+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>eHealth</term>
					<term>Knowledge discovery</term>
					<term>Keyphrase extraction</term>
					<term>Keyphrase classification</term>
					<term>Relationships extraction</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper describes the solution presented by the UH-MAJA-KD team in the IberLEF eHealth-KD 2019: eHealth Knowledge Discovery challenge. Separate strategies were developed to solve subtasks A and B, both based on deep learning models that use domain-specific word embeddings and architectures built on Bidirectional Long Short-Term Memory (BiLSTM) cells. For Subtask A, a Conditional Random Field layer was used to produce output in the BMEWO-V tag system to extract keyphrases. For Subtask B, two stacked BiLSTM layers are used, along with the Shortest Dependency Path between a pair of keyphrases, to determine possible relationships between them.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>In the health domain, the large number of research works and publications every year makes it nearly impossible for doctors and biomedical researchers to keep up to date with the literature in their fields. Thus, finding ways to effectively manage the vast amounts of information and to extract knowledge from it is very important nowadays. This could help in obtaining new and better scientific results or in the diagnosis of complex diseases. For all of these reasons, considerable interest has arisen in the scientific community in developing systems that automatically extract knowledge from medical texts.</p><p>There is an increasing number of efforts oriented in this direction. One of them is the IberLEF eHealth-KD 2019: eHealth Knowledge Discovery challenge <ref type="bibr" target="#b7">[8]</ref>, in whose context this paper was developed. The goal of this challenge was the discovery of knowledge in medical texts, via the extraction and classification of keyphrases, as well as the determination of semantic relationships between pairs of keyphrases. The challenge was divided into two subtasks, A and B: one for keyphrase extraction and classification, and the other oriented to the extraction of semantic relationships.</p><p>This paper describes the solution presented by the UH-MAJA-KD team in the IberLEF eHealth-KD 2019: eHealth Knowledge Discovery challenge. For Subtask A, it proposes a hybrid model that combines a Bidirectional Long Short-Term Memory (BiLSTM) layer with a Conditional Random Field (CRF) layer. This model is inspired by the one presented by the UCM team <ref type="bibr" target="#b9">[10]</ref> in the previous edition of the challenge; in addition, domain-specific word embeddings are used. For Subtask B, a multiclass classifier is proposed, taking as input the sequence of feature vectors of the tokens in the Shortest Dependency Path between a pair of keyphrases.</p><p>The rest of the paper is organized as follows. Section 2 gives a brief overview of word embeddings and of the particular one used throughout the rest of the paper. Sections 3 and 4 describe the approaches to Subtasks A and B respectively. Then, the results of the proposed models are presented in Section 5, and finally, brief conclusions and future work lines are presented in Section 6.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Word embeddings</head><p>Word embeddings are a strategy to represent words as real-valued vectors in a reduced-dimension space. These vectors are desired to have the property of context similarity: for words that commonly appear in the same contexts, the corresponding vectors must be close in the embedding space under some distance measure. There are many methods in the literature to obtain such embeddings, most of them based on probabilistic models and/or neural networks. Among the most popular are word2vec <ref type="bibr" target="#b4">[5]</ref>, the fastText morphological representation <ref type="bibr" target="#b0">[1]</ref> and GloVe (Global Vectors for Word Representation) <ref type="bibr" target="#b6">[7]</ref>.</p><p>Regarding neural-network-based word embeddings, the corpus used to train them is crucial to their performance, precisely because the corpus determines the words and the contexts in which they appear. Intuitively, domain-specific corpora should be better at exposing the contextual and semantic relations of that specific domain. Consequently, a corpus was built from the Spanish Wikipedia<ref type="foot" target="#foot_0">1</ref>, extracting pages with medical content. The corpus contains approximately 27 million words, essentially of medical content. To capture domain-specific semantic and contextual information, a word embedding was trained on this corpus, using the word2vec implementation offered by the gensim <ref type="bibr" target="#b8">[9]</ref> Python library with the CBOW (Continuous Bag of Words) architecture <ref type="bibr" target="#b1">[2]</ref>. The embedding details are as follows:</p><p>. Embedding space dimensions: 300. . Window size: 5. . Vocabulary size: approximately 500 thousand words. . Negative samples: 5.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Subtask A</head><p>The goal of Subtask A was to extract keyphrases from sentences and to classify them as Concept, Action, Reference or Predicate. The proposed solution splits this subtask into four more specific ones, one each to extract and classify concepts, actions, references and predicates. The architecture is the same in all four cases, but each model is trained independently, using as training examples only those of its corresponding task (e.g. the model that extracts and classifies Concept keyphrases only receives Concept annotations as input). This is done to improve weight learning specific to each type of keyphrase, since the types could follow different hypothesis functions, which would make it difficult for a single model to learn 'good' weights for all of them together. Moreover, processing them jointly could introduce more ambiguity in the decoding process (explained in Section 3.3), making more solutions unfeasible. Finally, the keyphrases detected by the four models are put together.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Model Input</head><p>The system receives a sentence string as input, so some preprocessing is needed to build an appropriate input for the models. The first step is to tokenize the sentence, since all model inputs expect a sequence of tokens.</p><p>For each token into which the sentence was split, the input consists of a list of three feature vectors:</p><p>. Character encodings: Concatenation of one-hot encoded vectors of the characters contained in the word. . PoS-tag vector: One-hot encoded vector of Part of Speech (PoS) information. . Word indexes: One-hot encoded index in the word embedding vocabulary.</p><p>To obtain the first, the standard ASCII alphabet was used. To extract PoS-tag information, the Python library spaCy<ref type="foot" target="#foot_2">2</ref> was used. In the case of the third input, some words are captured using regular expressions and substituted with special tokens defined in the word embedding vocabulary (e.g. currencies, units of measurement and other words with digits or non-Latin characters). For words not appearing in the vocabulary, a special token 'unseen' was defined.</p></div>
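The character encodings and vocabulary lookup described above can be sketched as follows; the alphabet choice and the helper names are illustrative assumptions, not the exact implementation used by the team.

```python
import string

# Standard ASCII alphabet, as in the paper; slot 0 is reserved for
# characters outside the alphabet.
ALPHABET = string.printable

def one_hot_char(ch):
    """One-hot vector for a single character."""
    vec = [0] * (len(ALPHABET) + 1)
    vec[ALPHABET.find(ch) + 1] = 1  # find() returns -1 for unknown -> slot 0
    return vec

def char_encodings(word):
    """Sequence of one-hot character vectors, one per character of the word."""
    return [one_hot_char(c) for c in word]

def word_index(word, vocab):
    """Index in the embedding vocabulary, falling back to the 'unseen' token."""
    return vocab.get(word, vocab["unseen"])
```

For example, with a hypothetical vocabulary `{"cáncer": 0, "pulmón": 1, "unseen": 2}`, an out-of-vocabulary word such as "xyz" maps to index 2.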
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Model Architecture</head><p>Each of the four models used to solve Subtask A receives a sequence of token inputs as described in 3.1, and produces a sequence of the same size with a label for each token in the BMEWO-V tagging system, which is described in the next section.</p><p>The architecture comprises four main components:</p><p>. Word embedding matrix. . Char embedding BiLSTM <ref type="bibr" target="#b2">[3]</ref>. . Token-level BiLSTM. . CRF classifier <ref type="bibr" target="#b3">[4]</ref>.</p><p>It is pipelined as follows. For each token in the input sequence, the pre-trained word embedding layer produces an embedding vector using the word index input. The character embedding layer receives the sequence of encodings of the characters contained in the word and produces a vector capturing character-level information for each word. These two vectors are concatenated with the PoS-tag vector of the word, and together they serve as input to each time step of the token-level BiLSTM layer. Finally, the outputs of the BiLSTM layer are passed to a CRF layer.</p><p>A summary of the model is shown in Figure <ref type="figure" target="#fig_0">1</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Postprocessing</head><p>The CRF layer produces a sequence of tags in the BMEWO-V tagging system. In this scheme, B marks the beginning of a keyphrase, M a middle token, E the end, W tokens that constitute a keyphrase by themselves, and O tokens that do not belong to any keyphrase. It also accounts for the possibility of overlapping keyphrases, using the tag V in such cases. For the sentence El cáncer de pulmón causa muerte prematura, the model detecting Concept keyphrases should produce the output:</p><formula xml:id="formula_0">O-V-M-E-O-B-E.</formula><p>Since the expected output of Subtask A is a sequence of keyphrases for each sentence, a procedure is needed to transform the BMEWO-V tag sequence obtained for a given sentence into the corresponding keyphrase sequence. This process was called decoding. There is an important challenge here: the tokens belonging to a keyphrase are not necessarily contiguous in the sentence. Taking this into account, the decoding process is divided into two stages: first, discontinuous keyphrases are detected, and then continuous ones.</p><p>In accordance with correct Spanish usage, the set of tag sequences that must be interpreted as a group of discontinuous keyphrases was reduced to those that match the regular expressions (V+)((M*EO*)+)(M*E) and ((BO)+)(B)(V+). The first corresponds to keyphrases that share their initial tokens, and the second to those that share their final tokens. These two capture most of the desired discontinuous keyphrases. An example of the first case is the fragment cáncer de pulmón y de mama, tagged as V-M-E-O-M-E, where the keyphrases cáncer de pulmón and cáncer de mama are found. An example of the latter is the fragment tejidos y órganos humanos, tagged as B-O-B-V, where the keyphrases tejidos humanos and órganos humanos are found. When a match is detected and the keyphrases are extracted, all the tags in that fragment are set to O.</p><p>After the detection of possible discontinuous keyphrases, the second stage starts, assuming all the remaining keyphrases appear as continuous sequences of tokens. To extract continuous keyphrases, an iterative process is carried out over the tag sequence produced by the model. Due to limitations of the BMEWO-V system, the procedure also assumes that the maximum overlapping depth is 2. Assuming otherwise would only make the process more ambiguous without capturing much more information, since it is not common in Spanish to find examples with deeper overlapping. Given this, two in-construction keyphrases are maintained along the procedure. In each iteration these keyphrases are created, extended or emitted according to rules that consider only the previous and the current tag. Tag B indicates the start of a new keyphrase, M the extension of an existing one and E its ending. Tag V introduces overlapping, hence it is the one that allows two in-construction keyphrases to exist at a given moment. Tag W causes the current token to be reported directly as a keyphrase.</p></div>
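The continuous stage of the decoder can be sketched as follows. This is a simplified, illustrative version (the function name is hypothetical) that handles only the B/M/E/W/O tags for non-overlapping keyphrases, leaving aside the V tag and the depth-2 overlap bookkeeping described above.

```python
def decode_continuous(tokens, tags):
    """Emit continuous, non-overlapping keyphrases from a B/M/E/W/O tag sequence.

    Simplified sketch: the V (overlap) tag and the discontinuous patterns
    handled by the first decoding stage are out of scope here.
    """
    keyphrases, current = [], []
    for tok, tag in zip(tokens, tags):
        if tag == "B":          # start a new in-construction keyphrase
            current = [tok]
        elif tag == "M" and current:  # extend it
            current.append(tok)
        elif tag == "E" and current:  # close and emit it
            current.append(tok)
            keyphrases.append(" ".join(current))
            current = []
        elif tag == "W":        # a single-token keyphrase, emitted directly
            keyphrases.append(tok)
    return keyphrases
```

For the sentence from the example above with the simplified tags O-B-M-E-O-B-E, the decoder emits "cáncer de pulmón" and "muerte prematura".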
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Subtask B</head><p>The goal of Subtask B was to detect semantic relationships between pairs of keyphrases. The proposed solution consists of traversing every pair of keyphrases and determining, via a multiclass classifier, whether one of the defined semantic relationships is established between them. This is accomplished by building a dependency tree over the tokens in the sentence and finding the shortest path between the keyphrases along this tree, known as the Shortest Dependency Path <ref type="bibr" target="#b5">[6]</ref>. The model is agnostic to any restrictions defined on the relation domain (e.g. it is not told in advance that for the Subject relation one of the keyphrases should be an Action) and must learn them by itself.</p></div>
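The Shortest Dependency Path between two tokens can be computed with a breadth-first search over an undirected view of the dependency tree. This is a self-contained sketch; in practice the tree comes from spaCy's parser, which the hypothetical `heads` list (index of each token's head, with the root pointing to itself) stands in for.

```python
from collections import deque

def shortest_dependency_path(heads, start, end):
    """Token indices on the shortest path from `start` to `end`.

    `heads[i]` is the index of token i's head; the root satisfies
    heads[i] == i. Dependency arcs are treated as undirected edges.
    """
    n = len(heads)
    adj = [set() for _ in range(n)]
    for i, h in enumerate(heads):
        if h != i:              # skip the root's self-loop
            adj[i].add(h)
            adj[h].add(i)
    prev = {start: None}
    queue = deque([start])
    while queue:                # plain BFS; a tree has a unique simple path
        u = queue.popleft()
        if u == end:
            break
        for v in adj[u]:
            if v not in prev:
                prev[v] = u
                queue.append(v)
    path, node = [], end
    while node is not None:     # walk the predecessor chain back to start
        path.append(node)
        node = prev[node]
    return path[::-1]
```

For a toy parse of "El cáncer causa muerte prematura" with heads [1, 2, 2, 2, 3], the path from token 1 (cáncer) to token 4 (prematura) runs through the root causa and muerte.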
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Model Input</head><p>Similarly to the Subtask A models, this model expects a sequence of tokens. For each token in that sequence, the input consists of a list of four feature vectors:</p><p>. Word indexes: One-hot encoded index in the word embedding vocabulary. . Syntactic dependency relation vector: One-hot encoded vector of syntactic dependency information. . BMEWO-V tag encoding: One-hot encoded BMEWO-V tag. . Subtask A keyphrase type encoding: One-hot encoding of the Concept, Action, Reference or Predicate class of the keyphrase to which the token belongs.</p><p>The word indexes are obtained as described in 3.1. To extract syntactic dependency information, the Python library spaCy was used. The third and fourth inputs are obtained from the output of Subtask A when both subtasks are pipelined, as in Scenario 1 of the challenge.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Model Architecture</head><p>The architecture comprises three main components:</p><p>. Word embedding matrix. . Stacked BiLSTMs. . Two dense multiclass classifiers.</p><p>It is pipelined as follows. For each token in the input sequence, the pre-trained word embedding layer produces an embedding vector using the word index input. The embedding vector is then concatenated with the other three input vectors, and together they serve as input to each time step of the stacked BiLSTM layers. Finally, the last time-step output of the stacked BiLSTM layers serves as input to two Dense layers acting as multiclass classifiers, one for each direction in which relationships could be established between the pair of keyphrases, since relationships are not symmetric.</p><p>A summary of the model is shown in Figure <ref type="figure" target="#fig_1">2</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Results</head><p>The evaluation of both subtasks was carried out using the annotated corpus proposed in the challenge. The results were measured with precision, recall and F1 in three scenarios, as described in the overview of IberLEF eHealth-KD 2019: eHealth Knowledge Discovery <ref type="bibr" target="#b7">[8]</ref>. Tables 1, 2 and 3 show the results obtained by the participants in Scenarios 1, 2 and 3 respectively. Scenario 2 measures the results of Subtask A only and Scenario 3 those of Subtask B only, whereas Scenario 1 combines Subtasks A and B.</p><p>As can be observed, the proposal for Subtask A had a competitive performance, being only 0.0047 points below the first place in F1 score. However, the results on Subtask B are not as promising: the first place clearly outperformed the model proposed for Subtask B.</p><p>In the case of Subtask A, the model showed faster convergence when training on the Action and Reference labels. This is probably due to the syntactic patterns they exhibit, which are rapidly captured by the model.</p><p>It is worth mentioning the evaluations made on the BMEWO-V decoder. It achieved over 99% in both precision and recall when evaluated on perfectly annotated labels. It showed, however, a non-linear decline in performance when evaluated on inaccurately classified labels.</p><p>The parameters and hyper-parameters used to test the models were chosen as follows: the number of epochs was selected empirically, based on the fast convergence of the models, which tended to quickly overfit the training dataset even though validation data was used. The remaining parameters were selected as standard for similar applications in the literature.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Conclusions and Future Work</head><p>This work described the models presented by the UH-MAJA-KD team for the IberLEF eHealth-KD 2019: eHealth Knowledge Discovery challenge.</p><p>For Subtask A, a hybrid BiLSTM-CRF model with domain-specific pre-trained word embeddings was proposed; our model obtained the third place in Scenario 2. For Subtask B, a multiclass classifier using the Shortest Dependency Path together with the same domain-specific pre-trained word embeddings was proposed; our model obtained the sixth place in Scenario 3. Our team reached the sixth position in the overall competition standings.</p><p>The corpus on which the domain-specific word embedding was trained is relatively small. As future work, we propose building a larger and more expressive corpus to improve the word embedding performance. It could also be promising to concatenate domain-specific and general-purpose word embeddings, in order to combine the specificity of the former with the generalization capability of the latter. To improve the capabilities of the system on the overall task, it could be convenient to train the system (i.e. both models) as a whole, providing Subtask B with the output of Subtask A, so that the Subtask B model learns to deal with the errors produced by the Subtask A model.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1.</head><label>1</label><figDesc>Fig. 1. Subtask A model summary</figDesc><graphic coords="4,165.95,305.30,283.47,238.88" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 2 .</head><label>2</label><figDesc>Fig. 2. Subtask B model summary</figDesc><graphic coords="7,165.95,115.84,283.47,238.71" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 3 .</head><label>3</label><figDesc>Scenario 3 results    </figDesc><table><row><cell>Scenario 3</cell><cell cols="2">F1 Precision Recall</cell></row><row><cell>TALP</cell><cell>0.6269</cell><cell>0.6667 0.5915</cell></row><row><cell>NLP UNED(lsi uned)</cell><cell>0.5337</cell><cell>0.6235 0.4665</cell></row><row><cell>VSP</cell><cell>0.4933</cell><cell>0.5892 0.4243</cell></row><row><cell>coin flipper (ncatala)</cell><cell>0.4931</cell><cell>0.7133 0.3768</cell></row><row><cell>IxaMed(iakesg)</cell><cell>0.4356</cell><cell>0.5195 0.3750</cell></row><row><cell>UH-MAJA-KD</cell><cell cols="2">0.4336 0.4306 0.4366</cell></row><row><cell cols="2">LASTUS-TALN (abravo) 0.2298</cell><cell>0.1705 0.3521</cell></row><row><cell>baseline</cell><cell>0.1231</cell><cell>0.4878 0.0704</cell></row><row><cell>Hulat-TaskAB</cell><cell>0.1231</cell><cell>0.4878 0.0704</cell></row><row><cell>Hulat-TaskA(jlcuad)</cell><cell>0.1231</cell><cell>0.4878 0.0704</cell></row><row><cell>lsi2 uned</cell><cell>0.1231</cell><cell>0.4878 0.0704</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">es.wikipedia.org</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_2">spacy.io</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_3">Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019)</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_4">https://www.cupet.cu/footer/informatica-automatica-y-comunicaciones/</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>We would like to acknowledge the joint project Tec-UH of the Tecnomática 3 enterprise and the Artificial Intelligence Group at the University of Havana, for allowing us to use high-performance computational equipment to develop and test our ideas.</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0" />			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Enriching word vectors with subword information</title>
		<author>
			<persName><forename type="first">P</forename><surname>Bojanowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Grave</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Joulin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Transactions of the Association for Computational Linguistics</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="page" from="135" to="146" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Deep Learning with Keras</title>
		<author>
			<persName><forename type="first">A</forename><surname>Gulli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Pal</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2017">2017</date>
			<publisher>Packt Publishing Ltd</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Long short-term memory</title>
		<author>
			<persName><forename type="first">S</forename><surname>Hochreiter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Schmidhuber</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Neural computation</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="issue">8</biblScope>
			<biblScope unit="page" from="1735" to="1780" />
			<date type="published" when="1997">1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Conditional random fields: Probabilistic models for segmenting and labeling sequence data</title>
		<author>
			<persName><forename type="first">J</forename><surname>Lafferty</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>McCallum</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">C</forename><surname>Pereira</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Distributed representations of sentences and documents</title>
		<author>
			<persName><forename type="first">Q</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International conference on machine learning</title>
				<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1188" to="1196" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">A neural joint model for entity and relation extraction from biomedical text</title>
		<author>
			<persName><forename type="first">F</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Fu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Ji</surname></persName>
		</author>
		<idno type="DOI">10.1186/s12859-017-1609-9</idno>
		<ptr target="https://doi.org/10.1186/s12859-017-1609-9" />
	</analytic>
	<monogr>
		<title level="j">BMC Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">18</biblScope>
			<biblScope unit="issue">12</biblScope>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Glove: Global vectors for word representation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Pennington</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Manning</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP)</title>
				<meeting>the 2014 conference on empirical methods in natural language processing (EMNLP)</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1532" to="1543" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">Overview of the ehealth knowledge discovery challenge at iberlef</title>
		<author>
			<persName><forename type="first">A</forename><surname>Piad-Morffis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Gutiérrez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Consuegra-Ayala</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Estevez-Velarde</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Almeida-Cruz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Muñoz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Montoyo</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2019">2019. 2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Software framework for topic modelling with large corpora</title>
		<author>
			<persName><forename type="first">R</forename><surname>Rehurek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Sojka</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks</title>
				<meeting>the LREC 2010 Workshop on New Challenges for NLP Frameworks</meeting>
		<imprint>
			<publisher>Citeseer</publisher>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">A hybrid bi-lstm-crf model for knowledge recognition from ehealth documents</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">M R</forename><surname>Zavala</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Martínez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Segura-Bedmar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of TASS 2018: Workshop on Semantic Analysis at SEPLN (TASS</title>
				<meeting>TASS 2018: Workshop on Semantic Analysis at SEPLN (TASS</meeting>
		<imprint>
			<date type="published" when="2018">2018. 2018</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
