<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Construction of UMLS Metathesaurus with Knowledge-Infused Deep Learning</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Hong</forename><forename type="middle">Yung</forename><surname>Yip</surname></persName>
							<email>hyip@email.sc.edu</email>
							<affiliation key="aff0">
								<orgName type="department">Artificial Intelligence Institute</orgName>
								<orgName type="institution">University of South Carolina</orgName>
								<address>
									<settlement>Columbia</settlement>
									<region>SC</region>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Vinh</forename><surname>Nguyen</surname></persName>
							<email>vinh.nguyen@nih.gov</email>
							<affiliation key="aff1">
								<orgName type="department">National Library of Medicine</orgName>
								<orgName type="institution">National Institutes of Health</orgName>
								<address>
									<settlement>Bethesda</settlement>
									<region>MD</region>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Olivier</forename><surname>Bodenreider</surname></persName>
							<email>obodenreider@mail.nih.gov</email>
							<affiliation key="aff1">
								<orgName type="department">National Library of Medicine</orgName>
								<orgName type="institution">National Institutes of Health</orgName>
								<address>
									<settlement>Bethesda</settlement>
									<region>MD</region>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Construction of UMLS Metathesaurus with Knowledge-Infused Deep Learning</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">C731611B1005DE704023772FA35D2A6C</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T09:04+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Unified Medical Language System</term>
					<term>Semantic Similarity</term>
					<term>Deep Learning</term>
					<term>Contextualized Knowledge Graph</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The Unified Medical Language System (UMLS) is a Metathesaurus of biomedical vocabularies developed to integrate the variety of ways the same concepts are expressed by different terminologies and to provide a cross-walk among them. However, the current process of constructing and inserting new resources into the existing Metathesaurus relies heavily on lexical knowledge, semantic pre-processing, and manual audits by human editors. This project explores the use of a supervised Deep Learning approach to identify synonymy and non-synonymy among English UMLS concepts at the atom level. We use a Siamese network with Long Short-Term Memory and Convolutional Neural Network models to learn the similarities and dissimilarities between pairs of atoms from the active subset of the 2019AA UMLS. To disambiguate concepts with lexically identical atoms, we contextualize the pairs with various enrichment strategies that reflect the information available to the UMLS editors, including the source synonymy, hierarchical context, and source semantic group. Learning from the base lexical features of the atoms yields an overall F1-score of 75.97%. Infusing source synonymy into the base yields a higher precision and overall F1-score of 86.54% and 87.63%, respectively, whereas infusing hierarchical context trades precision for a higher recall of 90.38%. Infusing source synonymy, hierarchical context, and semantic group together increases overall accuracy to 95.20%. However, infusing the source synonymy of the hierarchical context does not yield any noticeable improvement. A knowledge-infused learning approach performs well, indicating promising potential for emulating the current building process. Future work includes evaluation against the rule-based normalization approach of constructing the Metathesaurus and investigation of the applicability, maintenance, and scalability of these models.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>The Unified Medical Language System (UMLS) is a rich repository of biomedical vocabularies developed by the US National Library of Medicine. It is an effort to overcome challenges to the effective retrieval of machine-readable information, one of which is the variety of ways the same concepts are expressed by different terminologies <ref type="bibr" target="#b0">[1]</ref>. For example, the concept of "Addison's Disease" is expressed as "Primary hypoadrenalism" in the Medical Dictionary for Regulatory Activities (MedDRA) and as "Primary adrenocortical insufficiency" in the 10th revision of the International Statistical Classification of Diseases and Related Health Problems (ICD-10). The lack of integration between these synonymous terms often leads to poor interoperability between information systems (i.e., how does one map a concept from one terminology to another?) and confusion among health professionals. Hence, the UMLS aims to integrate and provide a cross-walk among various terminologies as well as facilitate the creation of more effective and interoperable biomedical information systems and services, including electronic health records <ref type="foot" target="#foot_0">3</ref> . To date, it is increasingly used in areas such as patient care coordination, clinical coding, information retrieval, and data mining. There are three components to the UMLS Knowledge Sources: the Metathesaurus, the Semantic Network, and the SPECIALIST Lexicon and Lexical Tools.</p><p>The Metathesaurus is a vocabulary database organized by concept or meaning. It is built from the electronic versions of various thesauri, code sets, classifications, and lists of controlled terms used in biomedical, clinical, and health services, known as "terminologies" or interchangeably as "source vocabularies". It connects alternative names (i.e. 
name variants) that are considered to be synonymous under the same concept and identifies useful relationships between various concepts <ref type="bibr" target="#b0">[1]</ref>. Concepts are assigned at least one Semantic Type from the Semantic Network to provide semantic categorization. The Lexical Tools provide lexical information for language processing, such as identifying string variants, and provide normalized string indexes to the Metathesaurus. As of May 6, 2019, the 2019AA release of the UMLS Metathesaurus contains approximately 3.85 million biomedical and health-related concepts and 14.6 million concept names from 210 source vocabularies, including the National Center for Biotechnology Information (NCBI) taxonomy, the Systematized Nomenclature of Medicine - Clinical Terms (SNOMED CT), the Gene Ontology, the Medical Subject Headings (MeSH), and OMIM<ref type="foot" target="#foot_1">4</ref> .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.1">Construction of the UMLS Metathesaurus</head><p>The current approach to building the Metathesaurus relies on lexical knowledge, semantic pre-processing, and UMLS human editors. The core idea is that synonymous terms originating from different source vocabularies are clustered into a concept with a preferred term and a Concept Unique Identifier (CUI). The basic building block of the Metathesaurus, also known as an "atom", is a concept string from one of the source vocabularies. Simply put, each occurrence of a string in each source vocabulary is assigned a unique atom identifier (AUI). When a lexically identical string appears in multiple source vocabularies, for example "Headache" appearing in both MeSH and ICD-10, each occurrence is assigned a different AUI. These AUIs are then linked to a single string identifier (SUI) to represent occurrences of the same string. Each SUI is linked to all of its English lexical variants (detected using the Lexical Variant Generator tool) by a common term identifier (LUI). These LUIs may subsequently be linked to more than one CUI because strings that are lexical variants of each other may have different meanings. Table <ref type="table" target="#tab_0">1</ref> illustrates how synonymous terms are clustered into a CUI. In addition, some source vocabularies provide source synonyms, hierarchical and non-hierarchical relationships, as well as metadata information for semantic pre-processing. UMLS human editors associate concepts and perform manual reviews <ref type="bibr" target="#b0">[1]</ref>. These processes of constructing and inserting new resources into the existing Metathesaurus, from identifying lexical variants to manual audits by domain experts, can be both arduous and time-consuming given that the Metathesaurus currently comprises over 3.85 million concepts. 
Given the recent successes of supervised Deep Learning (DL) approaches in their applications to the medical and healthcare domains <ref type="bibr" target="#b1">[2]</ref>, we hypothesize that these DL models can be trained to emulate the current building process.</p></div>
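As a minimal illustration of the AUI/SUI/LUI layering described above, the following Python sketch assigns identifiers to string occurrences. The identifier formats, the `build_identifiers` helper, and the crude case/plural variant rule are illustrative assumptions, not the NLM implementation (which uses the Lexical Variant Generator and human review):

```python
# Illustrative sketch of the Metathesaurus identifier layers (AUI/SUI/LUI).
# The identifier formats and the trivial lowercase/singular variant rule are
# simplifications standing in for the Lexical Variant Generator.

def build_identifiers(occurrences):
    """occurrences: list of (string, source) pairs, one per atom occurrence."""
    atoms, suis = [], {}
    for i, (string, source) in enumerate(occurrences):
        # Every occurrence of a string in a source gets its own atom (AUI).
        aui = f"A{i:07d}"
        # Identical strings share a single string identifier (SUI).
        sui = suis.setdefault(string, f"S{len(suis):07d}")
        atoms.append({"aui": aui, "sui": sui, "string": string, "source": source})
    # Lexical variants (here crudely: case- and plural-insensitive) share a
    # common term identifier (LUI).
    luis = {}
    for atom in atoms:
        key = atom["string"].lower().rstrip("s")
        atom["lui"] = luis.setdefault(key, f"L{len(luis):07d}")
    return atoms

atoms = build_identifiers([
    ("Headache", "MSH"), ("Headache", "ICD10"), ("Headaches", "MDR"),
])
```

In this toy run, the two "Headache" occurrences receive distinct AUIs but share one SUI, while "Headaches" gets its own SUI yet shares the LUI, mirroring Table 1.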
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.2">Supervised Deep Learning</head><p>Supervised DL learns a function that maps an input to an output from examples of input-output pairs through layers of dense networks <ref type="bibr" target="#b2">[3]</ref>. The Metathesaurus comprises approximately 10 million English atoms, each assigned a CUI. One could simply train a supervised classifier to predict which CUI should be assigned to a "new" atom (since atoms having the same CUI are synonymous) as an approach to inserting new resources into the current Metathesaurus. However, this approach amounts to an extreme classification task <ref type="bibr" target="#b3">[4]</ref> due to the huge prediction space of 3.85 million CUIs. Nonetheless, the CUI is merely a "mechanism" to cluster synonymous terms under the same "bucket". We are primarily interested in whether two atoms are synonymous, and hence should be labeled with the same CUI, regardless of whether this CUI already exists in the Metathesaurus. Hence, this project is modeled as a similarity task where we want to assess similarity based not only on the lexical features of an atom but also on its context (represented by the lexical features of neighboring concepts in its source vocabulary). Concretely, a fully-trained model should identify and learn two scenarios: (1) atoms that are lexically similar but not synonymous, e.g., "Lung disease and disorder" versus "Head disease and disorder"; and (2) atoms that are lexically dissimilar but synonymous, e.g., "Addison's disease" versus "Primary adrenal deficiency". Similarity assessment between words and sentences, also known as the Semantic Text Similarity (STS) task, is an active research area in Natural Language Processing (NLP) due to its crucial role in various downstream tasks such as information retrieval, machine translation, and, in our case, synonym clustering. 
The STS task can be expressed as follows: given two sentences, a system returns a score between 0 and 1 indicating their degree of similarity. STS is a challenging task due to the inherent complexity of language expressions, word ambiguities, and variable sentence lengths. Traditional approaches rely on hand-engineered lexical features (e.g. word overlap and subwords <ref type="bibr" target="#b4">[5]</ref>, syntactic relationships <ref type="bibr" target="#b5">[6]</ref>, structural representations <ref type="bibr" target="#b6">[7]</ref>), linguistic resources (e.g. corpora), and bag-of-words and term frequency-inverse document frequency (TF-IDF) models that incorporate a variety of similarity measures <ref type="bibr" target="#b7">[8]</ref>, for example string-based <ref type="bibr" target="#b8">[9]</ref> and term-based <ref type="bibr" target="#b9">[10]</ref> measures. However, most are syntactically and semantically constrained. Recent successes in STS <ref type="bibr" target="#b10">[11]</ref> in predicting sentence similarity and relatedness have been obtained by using corpus-based <ref type="bibr" target="#b11">[12]</ref> and knowledge-based similarity, e.g. word embeddings for feature representation <ref type="bibr" target="#b12">[13]</ref>, with supervised DL approaches, e.g. Siamese networks with Recurrent Neural Networks (RNN) <ref type="bibr" target="#b13">[14]</ref> and Convolutional Neural Networks (CNN) <ref type="bibr" target="#b14">[15]</ref>, to perform deep analysis of words and sentences and learn the necessary semantics and structure.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.3">Siamese Recurrent Architecture</head><p>Unlike a traditional neural network, which takes one input at a time, the Siamese network is an architecture that takes a pair of inputs and learns representations based on explicit similarity and dissimilarity information (i.e. pairs of similar and dissimilar inputs) <ref type="bibr" target="#b15">[16]</ref>. It was originally used for signature verification <ref type="bibr" target="#b15">[16]</ref> and has since been applied to various applications such as face verification <ref type="bibr" target="#b16">[17]</ref>, unsupervised acoustic modeling <ref type="bibr" target="#b17">[18]</ref>, and learning semantic entailment <ref type="bibr" target="#b13">[14]</ref> as well as text similarity <ref type="bibr" target="#b18">[19]</ref>. A variety of DL models can be incorporated within the Siamese architecture. The RNN is a type of DL model that excels at processing sequential information due to the presence of a memory cell to store and "remember" data read over time <ref type="bibr" target="#b19">[20]</ref>. A variant of the RNN is the Long Short-Term Memory (LSTM). It enhances the standard RNN to handle long-term dependencies and to mitigate the inherent vanishing gradient problem of RNNs by introducing "gates" (input, output, and forget gates) to control the flow of information and retain it better through time. It handles long sequences more accurately, but at the cost of higher memory consumption and slower training compared to the standard RNN, which is faster but less accurate. Nonetheless, combinations of the Siamese network with RNNs and LSTMs have been applied to various NLP tasks, including similarity assessment, with great success <ref type="bibr" target="#b13">[14,</ref><ref type="bibr" target="#b20">21,</ref><ref type="bibr" target="#b21">22]</ref>. 
On the other hand, the CNN (another type of DL model) has also performed well in NLP due to its ability to extract distinctive features at a finer granularity <ref type="bibr" target="#b22">[23]</ref>. A Siamese CNN model learns sentence embeddings and predicts sentence similarity with features from various convolution and pooling operations <ref type="bibr" target="#b23">[24]</ref>.</p><p>In this paper, we explore the use of DL, specifically the Siamese recurrent architecture with a combination of LSTM and CNN, with the following contributions:</p><p>1. Identify synonymy and non-synonymy among English UMLS concepts at the atom level (i.e., given two English atoms, are they synonymous and thus belong to the same CUI?). 2. Investigate whether the DL approach can emulate the current Metathesaurus building process.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Methodology</head><p>The scope of this project can be divided into four components: (i) retrieving and parsing the UMLS dataset, (ii) generating features for learning, (iii) designing the Siamese architecture, and (iv) evaluating the Siamese network with different data enrichment strategies (i.e., infusing various knowledge provided by the source vocabularies). The UMLS dataset used in this study can be retrieved with a UMLS license at https://www.nlm.nih.gov/research/umls/licensedcontent/umlsknowledgesources.html.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Dataset</head><p>We use the active subset of the 2019AA UMLS and remove the derivative, duplicative, and spelling variants sources. The final dataset consists of 9,533,853 atoms grouped into 3,793,516 CUIs. Table <ref type="table" target="#tab_1">2</ref> shows the sources removed.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Feature Engineering</head><p>The goal is to learn the similarities between pairs of atoms within a CUI and the dissimilarities between pairs of atoms from different CUIs. Prior to generating the positive and negative pairs, we preprocess the lexical features of the atoms in the same way <ref type="bibr" target="#b24">[25]</ref> preprocess their dataset (remove all punctuation except hyphens, lowercase, and tokenize by space) to ensure conformity, as we leverage their pre-trained BioWordVec embedding in our downstream network (Section 2.4).</p><p>Synonyms. We generate positive pairs based on CUI-asserted synonymy between atoms. Table <ref type="table" target="#tab_2">3</ref> shows examples of positive pairs generated from one CUI. Non-Synonyms. By contrast, it is computationally infeasible, in both time and space, to generate all the negative pairs, which would require pairing each of the approximately 9.5 million atoms against all atoms from unrelated CUIs. In addition, the class imbalance between positive and negative pairs would induce a learning bias: the model would suffer lower precision in detecting synonyms due to a higher preference for non-synonyms. Intuitively, we want the DL model to learn interesting negative pairs that are lexically similar but differ in semantics. Hence, we adopt a heuristic approach to reduce the sample space: we compute the Jaccard index between atoms and include only negative pairs from different CUIs with high Jaccard similarity (with a cut-off threshold of 0.6) (Table <ref type="table" target="#tab_3">4</ref>). The pairs are then sorted from highest to lowest Jaccard index, and the number of included pairs is shown in Table <ref type="table" target="#tab_4">5</ref>. The final dataset consists of pairs of strings sampled in 1:1, 3:1, 4:1, 6:1, and 10:1 ratios of between-CUI (negative) pairs to within-CUI (positive) pairs. 
These ratios are adopted from <ref type="bibr" target="#b17">[18,</ref><ref type="bibr" target="#b18">19]</ref> for Siamese networks.</p><formula xml:id="formula_0">JaccardIndex(A, B) = |A ∩ B| / |A ∪ B| = |A ∩ B| / (|A| + |B| − |A ∩ B|) (1)</formula></div>
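The preprocessing and Jaccard-based negative-pair filtering described above can be sketched in a few lines of Python. The helper names and toy atoms are illustrative assumptions; the 0.6 cut-off comes from the text:

```python
import re
from itertools import combinations

def preprocess(atom):
    """Lowercase, strip punctuation except hyphens, tokenize by space."""
    atom = re.sub(r"[^\w\s-]", " ", atom.lower())
    return atom.split()

def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| over token sets (Eq. 1)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def candidate_negatives(atoms_by_cui, threshold=0.6):
    """Keep only between-CUI pairs with Jaccard similarity >= threshold,
    sorted from highest to lowest similarity."""
    flat = [(c, a) for c, atom_list in atoms_by_cui.items() for a in atom_list]
    pairs = []
    for (cui1, atom1), (cui2, atom2) in combinations(flat, 2):
        if cui1 != cui2:
            sim = jaccard(preprocess(atom1), preprocess(atom2))
            if sim >= threshold:
                pairs.append((sim, atom1, atom2))
    return sorted(pairs, reverse=True)

negatives = candidate_negatives({
    "C1": ["Lung disease and disorder"],
    "C2": ["Head disease and disorder"],
    "C3": ["Addison's disease"],
})
```

Here only the "Lung/Head disease and disorder" pair survives the filter (Jaccard 3/5 = 0.6), i.e. exactly the kind of lexically similar, semantically different negative the model should learn from.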
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Experiments</head><p>The entry point of our experiments is the lexical features of an atom. However, in order to disambiguate concepts with lexically identical atoms, e.g. the concept "nail" with CUIs "C0222001" and "C0021885" shown in Figure <ref type="figure">1</ref>, there is a need to contextualize the two different "nail" concepts (denoted by two distinct CUIs). We consider the following enrichment strategies.</p><p>Base. The base consists of only the lexical features of an atom for all synonym (positive) and non-synonym (negative) pairs.</p><p>Source synonymy. Some source vocabularies provide synonyms for their atoms, which enrich the original atom with additional lexical features that are synonymous. We generate these source synonyms based on the Source Concept Unique Identifier (SCUI) of each atom.</p><p>Hierarchical context. Some source vocabularies provide hierarchical relationships (ancestor-descendant, parent-child, or broader-narrower relations) which extend the original atom with surrounding context. We generate the hierarchical context using the unique lexical features of the immediate (1-level) parents and children based on the source relations.</p><p>Semantic group. The semantic group provides an additional layer of high-level semantic categorization to an atom. Figure <ref type="figure">1</ref> shows that the two "nail" concepts are syntactically similar but differ in semantics: one refers to "anatomy" and the other to "devices". We assign the semantic group based on the second-level concept from the root node of the original atom as a proxy for semantic categorization. For source vocabularies that do not provide hierarchical relationships, we assign a semantic group to the source of these atoms based on the best knowledge of the human editors.</p></div>
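Two of the enrichment strategies above can be sketched as simple lookups. The toy identifiers, dictionaries, and function names here are illustrative assumptions, not the actual UMLS data model:

```python
# Sketch of two enrichment strategies: source synonymy (atoms sharing a
# Source Concept Unique Identifier) and 1-level hierarchical context.

def source_synonyms(aui, scui_of, atoms_of_scui):
    """Other atoms sharing this atom's SCUI, i.e. source-asserted synonyms."""
    return [a for a in atoms_of_scui[scui_of[aui]] if a != aui]

def hierarchical_context(aui, parents, children, string_of):
    """Unique lexical features of the immediate (1-level) parents and
    children of an atom, per the source-provided relations."""
    features = set()
    for neighbor in parents.get(aui, []) + children.get(aui, []):
        features.update(string_of[neighbor].lower().split())
    return sorted(features)

# Toy data for the anatomical "nail" sense (all values hypothetical).
string_of = {"A1": "Nail", "A2": "Nail structure", "A3": "Hand", "A4": "Toenail"}
scui_of = {"A1": "S1", "A2": "S1"}
atoms_of_scui = {"S1": ["A1", "A2"]}
parents, children = {"A1": ["A3"]}, {"A1": ["A4"]}
```

With this toy data, the hierarchical context of the anatomical "nail" contributes tokens like "hand" and "toenail", which a device "nail" would not share, which is exactly the disambiguation signal the enrichment is meant to provide.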
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4">Siamese Models</head><p>Two different Siamese models are designed: the Siamese LSTM and the Siamese CNN-LSTM.</p><p>Siamese LSTM. This model adopts the Siamese structure from <ref type="bibr" target="#b13">[14]</ref> (Figure <ref type="figure" target="#fig_0">2</ref>). A pair of atoms is first transformed into their respective numerical word representations, i.e. embeddings of word vectors. A word embedding is a language modeling and feature learning technique in NLP where words are mapped to vectors of real numbers of varying dimensions. These word vectors are positioned in the vector space such that words sharing similar contexts in the corpus are situated close to one another <ref type="bibr" target="#b25">[26]</ref>. Instead of training the word vectors from scratch, we leverage the pre-trained biomedical word embedding (BioWordVec-intrinsic), with a dimension of 200 per word vector, trained on the PubMed text corpus and MeSH data <ref type="bibr" target="#b24">[25]</ref>. The rationale is to "precondition" the Siamese network with prior knowledge of the inherent similarity between words in the UMLS vocabulary. A word length distribution shows that approximately 97% of atoms in the UMLS have a word length less than or equal to 30. Hence, we apply padding or truncation to restrict the word length of each atom to a maximum of 30 to ensure uniform dimensions and speed up the training process. The embeddings of the pair of atoms are fed to LSTM_A and LSTM_B, each of which processes one of the atoms in the given pair and consists of 50 hidden learning units. These units learn the specific semantic and syntactic features, based on word order, of each individual atom through time. 
The output of the model is a Manhattan distance similarity function,</p><formula xml:id="formula_1">exp(−||LSTM_A − LSTM_B||_1) ∈ [0, 1],</formula><p>a function that is well-suited for high-dimensional spaces <ref type="bibr" target="#b26">[27]</ref>. We apply this model to Experiment 1.</p></div>
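The Manhattan-distance similarity above can be written as a small function. This is a plain-Python sketch of the output layer's formula, not the authors' implementation:

```python
import math

def manhattan_similarity(h_a, h_b):
    """exp(-||h_a - h_b||_1): maps the L1 (Manhattan) distance between the
    two branch outputs to a similarity score in (0, 1]."""
    l1 = sum(abs(a - b) for a, b in zip(h_a, h_b))
    return math.exp(-l1)
```

Identical branch outputs give a similarity of exactly 1.0, and the score decays exponentially toward 0 as the L1 distance between the two representations grows.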
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Siamese CNN-LSTM. We use this model for Experiments 2, 3, 4, and 5 to infuse the additional knowledge and features: source synonymy, hierarchical context, and semantic group information. This model adopts the Siamese structure from <ref type="bibr" target="#b27">[28]</ref> (Figure <ref type="figure" target="#fig_1">3</ref>). It differs from the first architecture in its hidden learning layers. Instead of having only one embedding from the lexical features of the atoms, we concatenate to the original atom vector two extra vectors learned from embeddings that represent the additional context information.</p><p>To generate the "context bag", we extract 60 unique lexical features from source synonyms and/or hierarchical context to enrich the base features of an atom, and sort them alphabetically, to minimize word-order randomness, before transforming them into a context embedding. We apply one layer of CNN with 100 filters and a window size of 5 <ref type="bibr" target="#b27">[28]</ref>, with batch normalization (to reduce overfitting), to extract an intermediary representation, and subsequently apply a layer of LSTM with 50 hidden learning units to learn these features. Similarly, the semantic group information is "infused" by transforming it using the BioWordVec embedding and subsequently feeding it to a layer of LSTM with 50 hidden units. The outputs of each LSTM layer (base, context, and semantic group) are averaged over time, and these three 50-dimensional vectors are concatenated and used as input to a 2-layer dense Fully Connected (FC) network with 128 and 50 learning units respectively, with the Manhattan distance similarity function,</p><formula xml:id="formula_2">exp(−||FC_A − FC_B||_1) ∈ [0, 1],</formula><p>as the final output layer. 
The parameters of both models are optimized using the Adam method <ref type="bibr" target="#b28">[29]</ref>. Each experiment (Experiments 1, 2, 3, 4, and 5) is trained on five different proportions (1:1, 3:1, 4:1, 6:1, and 10:1 ratios) of negative to positive pairs independently for 20 epochs and validated with 5-fold cross-validation on the Biowulf cluster of the National Institutes of Health (NIH) High-Performance Computing (HPC) systems, using a mix of Nvidia Tesla P100 and V100 graphics processing units. A preliminary set of experiments is conducted on a small dataset (training and validation sizes of 100,000 and 20,000, respectively) to gauge the performance and desired capabilities of the models as well as to fine-tune the hyper-parameters over incremental ranges (e.g. learning rate from 0.0005 to 0.001, batch size from 128 to 512). Table <ref type="table" target="#tab_8">7</ref> summarizes the final set of parameters and hyper-parameters used for the Siamese LSTM (baseline Experiment 1) and the Siamese CNN-LSTM (enriched Experiments 2, 3, 4, and 5) respectively.</p></div>
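The "context bag" construction described above can be sketched as follows. The 60-feature cap and alphabetical sorting come from the text; the function name and toy token inputs are illustrative assumptions:

```python
def context_bag(synonym_tokens, hierarchy_tokens, max_features=60):
    """Collect unique lexical features from source synonyms and hierarchical
    context, sorted alphabetically to minimize word-order randomness, and
    capped at 60 features as described in the text."""
    features = set(synonym_tokens) | set(hierarchy_tokens)
    return sorted(features)[:max_features]

bag = context_bag(["nail", "plate", "unguis"], ["finger", "integument"])
```

The resulting bag is a fixed, order-normalized token list that can then be mapped through the BioWordVec embedding and fed to the CNN-LSTM context branch.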
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Results and Evaluations</head><p>We evaluate the performance of the models in terms of validation accuracy, precision, recall, overall F1-score, specificity, sensitivity, and false-positive rate. Of all the proportions of negative to positive pairs, the 6:1 ratio achieves the best validation accuracy in identifying and classifying synonyms and non-synonyms. Table <ref type="table" target="#tab_9">8</ref> shows the full performance metrics achieved with the 6:1 ratio of negative to positive pairs, and Table <ref type="table" target="#tab_10">9</ref> shows examples of true positives and true negatives correctly identified, false positives identified, and false negatives missed by Experiment 5.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Discussion</head><p>Based on Table <ref type="table" target="#tab_9">8</ref>, we observe that using only the lexical features of an atom yields an overall F1-score of 75.97%. Infusing source synonymy into the base yields a higher precision and overall F1-score of 86.54% and 87.63%, respectively, whereas infusing hierarchical context trades precision for a higher recall of 90.38%. Infusing source synonymy, hierarchical context, and the semantic group together boosts the overall accuracy to 95.20%. However, infusing the source synonymy of the hierarchical context does not yield any noticeable improvement. A plausible explanation is that synonyms provided by the source are closely related, alternative variants of the base atom, hence the higher precision, whereas hierarchical contexts, i.e. parent and child relationships, represent broader and narrower relations that contribute a wider variety of lexical features to the base atom, hence the higher recall. However, extending the hierarchical context to include the source synonymy of the parent and child atoms may stretch too far from the original semantics of the base atom, and the model may perceive these features as noise.</p><p>Based on Table <ref type="table" target="#tab_10">9</ref>, we observe the performance of the trained Siamese model from Experiment 5 on real-scenario examples. With the incorporation of LSTM, the model is able to handle both short and long sequences as well as learn the positional variants of the atoms, e.g. "injury of salivary gland" versus "salivary gland injury". Combined with CNN, the model is able to extract and learn pairs that are lexically similar in nature but are not synonymous, e.g., "product containing only iron medicinal product" versus "product containing only levorphanol medicinal product", and vice versa, atoms that are lexically dissimilar but are synonymous, e.g., "avulsion" versus "fracture sprain". 
Nonetheless, for words that are closely related semantically, such as "wrist" and "knee", or "wound" and "cyst", the model fails to recognize them as non-synonyms.</p><p>In addition, the model fails to identify synonyms with rare lexical features, such as "pyelotomy", which indicates that there is still room for fine-tuning the model, e.g. expanding the capacity of the current architecture to learn from more examples.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Conclusion</head><p>In conclusion, this study demonstrates the feasibility of using DL to identify synonymy and non-synonymy among atoms with relatively good performance, indicating a promising potential for emulating the current Metathesaurus building process. In addition, a knowledge-infused DL approach leveraging multiple streams of knowledge provides the necessary contextualization to disambiguate lexically identical features and achieves an overall higher performance than a vanilla DL approach. Future work includes (a) evaluation against the manual rule-based normalization process of constructing the Metathesaurus, since the current evaluations are done within the scope of DL, i.e. evaluating whether infusing additional knowledge (features) provides better performance, but not comparing the traditional and automatic building processes, and (b) investigation of the scalability, maintenance, and applicability of these models to complement the current lexical processing and the UMLS human editors.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 2 .</head><label>2</label><figDesc>Fig. 2. The Siamese LSTM Model. Both the left and right branches of the model share the same weights of all the layers.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 3 .</head><label>3</label><figDesc>Fig. 3. The Siamese CNN-LSTM Model. Similarly, both the left and right branches of the model share the same weights of all the layers.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>Metathesaurus AUI, SUI, LUI, and CUI</figDesc><table><row><cell>String (Source)</cell><cell>AUI</cell><cell>SUI</cell><cell>LUI</cell><cell>CUI</cell></row><row><cell>Headache (MeSH)</cell><cell>A0066000</cell><cell>S0046854</cell><cell></cell><cell></cell></row><row><cell>Headache (ICD-10)</cell><cell>A0065992</cell><cell></cell><cell>L0018681</cell><cell>C0018681</cell></row><row><cell>Headaches (MedDRA)</cell><cell>A0066007</cell><cell>S0046855</cell><cell></cell><cell></cell></row><row><cell>Headaches (OMIM)</cell><cell>A12003304</cell><cell></cell><cell></cell><cell></cell></row><row><cell>Cephalodynia (MeSH)</cell><cell>A0540936</cell><cell>S0475647</cell><cell>L0380797</cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 .</head><label>2</label><figDesc>Sources Removed</figDesc><table><row><cell>Sources Removed</cell><cell>Sources</cell></row><row><cell></cell><cell>NCI_BRIDG, NCI_BioC, NCI_CDC, NCI_CDISC,</cell></row><row><cell></cell><cell>NCI_CDISC-GLOSS, NCI_CPTAC, NCI_CRCH,</cell></row><row><cell></cell><cell>NCI_CTCAE, NCI_CTCAE_3, NCI_CTCAE_5,</cell></row><row><cell></cell><cell>NCI_CTEP-SDC, NCI_CTRP, NCI_CareLex,</cell></row><row><cell>Derivative and Duplicative</cell><cell>NCI_DCP, NCI_DICOM, NCI_DTP, NCI_EDQM-HC, NCI_FDA, NCI_GAIA, NCI_GENC, NCI_ICH, NCI_INC, NCI_JAX, NCI_KEGG, NCI_NCI-GLOSS,</cell></row><row><cell></cell><cell>NCI_NCI-HGNC, NCI_NCI-HL7, NCI_NCPDP,</cell></row><row><cell></cell><cell>NCI_NICHD, NCI_PI-RADS, NCI_PID, NCI_RENI,</cell></row><row><cell></cell><cell>NCI_UCUM, NCI_ZFin, HCDT, HCPT,</cell></row><row><cell></cell><cell>ICPC2P, LCH_NW</cell></row><row><cell>Spelling Variants</cell><cell>ICD10AE, ICD10AMAE, MTHICPC2EAE, MTHICPC2ICD10AE</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 .</head><label>3</label><figDesc>Positive Pairs from a Single CUI</figDesc><table><row><cell>CUI</cell><cell>Atom</cell></row><row><cell></cell><cell>Addison disease</cell></row><row><cell>C0001403</cell><cell>Primary hypoadrenalism Primary adrenocortical insufficiency</cell></row><row><cell></cell><cell>Addison's disease (disorder)</cell></row><row><cell cols="2">Positive Pairs</cell></row><row><cell>Addison disease</cell><cell>Primary hypoadrenalism</cell></row><row><cell>Addison disease</cell><cell>Primary adrenocortical insufficiency</cell></row><row><cell>Addison disease</cell><cell>Addison's disease (disorder)</cell></row><row><cell>Primary hypoadrenalism</cell><cell>Primary adrenocortical insufficiency</cell></row><row><cell>Primary hypoadrenalism</cell><cell>Addison's disease (disorder)</cell></row><row><cell>Primary adrenocortical insufficiency</cell><cell>Addison's disease (disorder)</cell></row></table></figure>
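The pairwise expansion in Table 3 (four atoms under CUI C0001403 yielding six positive pairs) can be sketched in a few lines of Python; the `atoms` list and the use of `itertools.combinations` are illustrative assumptions, not the authors' actual pipeline code.

```python
from itertools import combinations

# Atoms grouped under a single CUI (C0001403), as in Table 3
atoms = [
    "Addison disease",
    "Primary hypoadrenalism",
    "Primary adrenocortical insufficiency",
    "Addison's disease (disorder)",
]

# Every unordered pair of atoms within the CUI is a positive (synonym) pair:
# n atoms yield n*(n-1)/2 pairs, so 4 atoms yield 6 pairs.
positive_pairs = list(combinations(atoms, 2))
```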
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4 .</head><label>4</label><figDesc>Jaccard Computation on a Pair of Atoms from Different CUIs</figDesc><table><row><cell>C0000473</cell><cell>C0038784</cell></row><row><cell>Product containing para-aminobenzoic acid</cell><cell>Product containing sulfuric acid</cell></row><row><cell cols="2">Jaccard Index = Intersection (3) / Union (5) = 0.6</cell></row></table></figure>
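The token-level Jaccard computation in Table 4 can be reproduced with a short sketch (assuming whitespace tokenization and case-insensitive matching, which the table's count of 3 shared tokens out of 5 distinct tokens implies):

```python
def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard index between two atom strings."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

# Table 4: shared tokens {product, containing, acid} (3) out of 5 distinct tokens
score = jaccard("Product containing para-aminobenzoic acid",
                "Product containing sulfuric acid")
# -> 0.6
```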
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 5 .</head><label>5</label><figDesc>Final Dataset Size. Pairs are enriched with additional features/knowledge that indicate different meanings. Hence, we compose the experiments (Table 6) with different data enrichment strategies, i.e., infusing various knowledge that reflects the information available to the UMLS editors during manual construction of the Metathesaurus, including source synonymy, hierarchical context, and source semantic group.</figDesc><table><row><cell>Feature</cell><cell>Number of Pairs</cell></row><row><cell>Synonyms</cell><cell>15,647,133</cell></row><row><cell cols="2">Ratio of between-CUI non-synonym pairs to within-CUI synonym pairs</cell></row><row><cell>1:1</cell><cell>15,647,133</cell></row><row><cell>3:1</cell><cell>46,941,399</cell></row><row><cell>4:1</cell><cell>62,588,532</cell></row><row><cell>6:1</cell><cell>93,882,798</cell></row><row><cell>10:1</cell><cell>156,471,330</cell></row></table></figure>
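The negative-pair counts in Table 5 are direct multiples of the 15,647,133 within-CUI synonym pairs; a one-liner confirms the table's arithmetic (the variable names here are illustrative):

```python
synonym_pairs = 15_647_133  # within-CUI positive (synonym) pairs, Table 5

# Between-CUI non-synonym pairs sampled at each negative:positive ratio
ratios = [1, 3, 4, 6, 10]
negative_counts = {f"{r}:1": synonym_pairs * r for r in ratios}
# e.g. the 6:1 ratio yields 93,882,798 non-synonym pairs
```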
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 6 .</head><label>6</label><figDesc>Five Experimental Setups</figDesc><table><row><cell>Experiment</cell><cell>Features</cell></row><row><cell>1</cell><cell>Base Atom Lexical Features</cell></row><row><cell>2</cell><cell>Base Atom Lexical Features + Source Synonymy</cell></row><row><cell></cell><cell>Base Atom Lexical Features</cell></row><row><cell>3</cell><cell>+ Hierarchical Context</cell></row><row><cell></cell><cell>+ Semantic Group</cell></row><row><cell></cell><cell>Base Atom Lexical Features</cell></row><row><cell>4</cell><cell>+ Source Synonymy + Hierarchical Context</cell></row><row><cell></cell><cell>+ Semantic Group</cell></row><row><cell></cell><cell>Base Atom Lexical Features</cell></row><row><cell></cell><cell>+ Source Synonymy</cell></row><row><cell>5</cell><cell>+ Hierarchical Context</cell></row><row><cell></cell><cell>+ Hierarchical Source Synonymy</cell></row><row><cell></cell><cell>+ Semantic Group</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_8"><head>Table 7 .</head><label>7</label><figDesc>The sets of parameters used for the Siamese LSTM and Siamese CNN-LSTM, respectively.</figDesc><table><row><cell>Parameters/ Hyperparameters</cell><cell>Siamese LSTM</cell><cell>Siamese CNN-LSTM</cell></row><row><cell>Framework</cell><cell cols="2">Keras 2.0 with TensorFlow backend</cell></row><row><cell>Word Vector Size</cell><cell></cell><cell>200</cell></row><row><cell>Maximum Input Length</cell><cell></cell><cell>30</cell></row><row><cell>Maximum Context Input Length</cell><cell>-</cell><cell>60</cell></row><row><cell>Embedding</cell><cell></cell><cell>BioWordVec</cell></row><row><cell>LSTM Hidden Units</cell><cell></cell><cell>50</cell></row><row><cell>LSTM Activation</cell><cell></cell><cell>Tanh</cell></row><row><cell>CNN Filters</cell><cell>-</cell><cell>100</cell></row><row><cell>CNN Window Size</cell><cell>-</cell><cell>5</cell></row><row><cell>CNN Activation</cell><cell>-</cell><cell>ReLU with batch normalization</cell></row><row><cell>Fully Connected Layer 1</cell><cell>-</cell><cell>128 units with ReLU activation</cell></row><row><cell>Fully Connected Layer 2</cell><cell>-</cell><cell>50 units with ReLU activation</cell></row><row><cell>Weights and Biases</cell><cell cols="2">Random Initialization</cell></row><row><cell>Optimizer</cell><cell></cell><cell>Adam</cell></row><row><cell>Learning Rate</cell><cell></cell><cell>0.001</cell></row><row><cell>Loss Function</cell><cell cols="2">Mean Squared Error (MSE)</cell></row><row><cell>Batch Size</cell><cell></cell><cell>128</cell></row><row><cell>Number of Training Epochs</cell><cell></cell><cell>20</cell></row><row><cell>Validation</cell><cell cols="2">5-fold cross-validation</cell></row></table></figure>
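The Siamese branches in Figs. 2-3 reduce each atom to a fixed-length encoding (50 LSTM hidden units per Table 7) and score the pair by a similarity over the two encodings. This excerpt does not spell out the similarity function; the sketch below assumes the exponential negative Manhattan distance used in the Siamese LSTM work the paper cites (Mueller and Thyagarajan), so it is an illustration of that choice rather than the authors' confirmed implementation:

```python
import math

def manhattan_similarity(h_left, h_right):
    """exp(-||h_left - h_right||_1): 1.0 for identical branch outputs,
    decaying toward 0 as the two encodings diverge."""
    l1 = sum(abs(a - b) for a, b in zip(h_left, h_right))
    return math.exp(-l1)

# Two hypothetical 50-dimensional branch outputs (LSTM hidden size in Table 7)
h = [0.1] * 50
identical = manhattan_similarity(h, h)          # exp(0) = 1.0
divergent = manhattan_similarity(h, [0.2] * 50) # exp(-5), close to 0
```

Because the output lies in (0, 1], it can be trained against a 0/1 synonymy label with the MSE loss listed in Table 7.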
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_9"><head>Table 8 .</head><label>8</label><figDesc>Performance of the 6:1 Ratio of Negative to Positive Pairs</figDesc><table><row><cell>Model/</cell><cell>Siamese LSTM</cell><cell></cell><cell cols="2">Siamese CNN-LSTM</cell><cell></cell></row><row><cell>Performance Metrics</cell><cell>Exp. 1</cell><cell>Exp. 2</cell><cell>Exp. 3</cell><cell>Exp. 4</cell><cell>Exp. 5</cell></row><row><cell></cell><cell>Base</cell><cell>Base + SS</cell><cell>Base + HC + SG</cell><cell>Base + SS + HC + SG</cell><cell>Base + SS + HC + HSS + SG</cell></row><row><cell>Accuracy</cell><cell>0.93333</cell><cell>0.8720</cell><cell>0.9486</cell><cell>0.9520</cell><cell>0.9541</cell></row><row><cell>Precision</cell><cell>0.7828</cell><cell>0.8654</cell><cell>0.7643</cell><cell>0.8296</cell><cell>0.8009</cell></row><row><cell>Recall</cell><cell>0.7379</cell><cell>0.8874</cell><cell>0.8381</cell><cell>0.9038</cell><cell>0.8978</cell></row><row><cell>F1-Score</cell><cell>0.7597</cell><cell>0.8763</cell><cell>0.7995</cell><cell>0.8428</cell><cell>0.8466</cell></row><row><cell>Specificity</cell><cell>0.9659</cell><cell>0.8560</cell><cell>0.9640</cell><cell>0.9601</cell><cell>0.9633</cell></row><row><cell>Sensitivity</cell><cell>0.7379</cell><cell>0.8874</cell><cell>0.8381</cell><cell>0.9038</cell><cell>0.8978</cell></row><row><cell>False Positive Rate</cell><cell>0.0341</cell><cell>0.1440</cell><cell>0.0360</cell><cell>0.0399</cell><cell>0.0367</cell></row><row><cell cols="6">Exp.: Experiment, SS: Source Synonymy, HC: Hierarchical Context, SG: Semantic Group,</cell></row><row><cell cols="2">HSS: Hierarchical Source Synonymy</cell><cell></cell><cell></cell><cell></cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_10"><head>Table 9 .</head><label>9</label><figDesc>Examples of True Positives, True Negatives, False Positives, and False Negatives from Experiment 5</figDesc><table><row><cell cols="2">True Positives (Synonyms) Correctly Identified</cell></row><row><cell>nail clipper</cell><cell>cutters nail</cell></row><row><cell>injury of salivary gland</cell><cell>salivary gland injury</cell></row><row><cell>avulsion</cell><cell>fracture sprain</cell></row><row><cell cols="2">True Negatives (Non-synonyms) Correctly Identified</cell></row><row><cell>fingernail</cell><cell>infection of fingernail</cell></row><row><cell>product containing only iron</cell><cell>product containing only levorphanol</cell></row><row><cell>medicinal product</cell><cell>medicinal product</cell></row><row><cell>medical and surgical gastrointestinal system</cell><cell>medical and surgical gastrointestinal system</cell></row><row><cell>insertion ileum via natural or artificial</cell><cell>revision stomach via natural or artificial</cell></row><row><cell>opening endoscopic infusion device</cell><cell>opening endoscopic other device</cell></row><row><cell cols="2">False Positives (Non-synonyms) Identified</cell></row><row><cell>finding of wrist joint</cell><cell>finding of knee joint</cell></row><row><cell>malignant neoplasm of upper limb</cell><cell>malignant neoplasm of muscle of upper limb</cell></row><row><cell>skin wound of axillary fold</cell><cell>skin cyst of axillary fold</cell></row><row><cell cols="2">False Negatives (Synonyms) Not Identified</cell></row><row><cell>hla antigen</cell><cell>human leukocyte antigen</cell></row><row><cell>pyelotomy</cell><cell>incision of renal pelvis treatment</cell></row><row><cell>routine cervical smear</cell><cell>screening for malignant neoplasm of cervix</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_0">https://www.nlm.nih.gov/research/umls/index.html</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_1">https://www.nlm.nih.gov/research/umls/knowledge_sources/metathesaurus/release/notes.html</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Acknowledgment</head><p>This work was supported by the Intramural Research Program of the NIH, National Library of Medicine. This research was also supported in part by an appointment to the National Library of Medicine Research Participation Program. This program is administered by the Oak Ridge Institute for Science and Education through an inter-agency agreement between the U.S. Department of Energy and the National Library of Medicine.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">The Unified Medical Language System (UMLS): integrating biomedical terminology</title>
		<author>
			<persName><forename type="first">O</forename><surname>Bodenreider</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nucleic Acids Research</title>
		<imprint>
			<biblScope unit="volume">32</biblScope>
			<biblScope unit="page" from="D267" to="D270" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">A guide to deep learning in healthcare</title>
		<author>
			<persName><forename type="first">A</forename><surname>Esteva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Robicquet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Ramsundar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Kuleshov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Depristo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Cui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Thrun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dean</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nature Medicine</title>
		<imprint>
			<biblScope unit="volume">25</biblScope>
			<biblScope unit="page" from="24" to="29" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Russell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Norvig</surname></persName>
		</author>
		<title level="m">Artificial intelligence: a modern approach</title>
				<imprint>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Bengio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Dembczyński</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Joachims</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kloft</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Varma</surname></persName>
		</author>
		<title level="m">Extreme Classification</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">A denotational and distributional approach to semantics</title>
		<author>
			<persName><forename type="first">A</forename><surname>Lai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hockenmaier</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 8th International Workshop on Semantic Evaluation</title>
				<meeting>the 8th International Workshop on Semantic Evaluation<address><addrLine>SemEval</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="329" to="334" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Ecnu: One stone two birds: Ensemble of heterogeneous measures for semantic relatedness and textual entailment</title>
		<author>
			<persName><forename type="first">J</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 8th International Workshop on Semantic Evaluation</title>
				<meeting>the 8th International Workshop on Semantic Evaluation<address><addrLine>SemEval</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="271" to="277" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Learning semantic textual similarity with structural representations</title>
		<author>
			<persName><forename type="first">A</forename><surname>Severyn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Nicosia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Moschitti</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics</title>
		<title level="s">Short Papers</title>
		<meeting>the 51st Annual Meeting of the Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="714" to="718" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">A survey of text similarity approaches</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">H</forename><surname>Gomaa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">A</forename><surname>Fahmy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of Computer Applications</title>
		<imprint>
			<biblScope unit="volume">68</biblScope>
			<biblScope unit="issue">13</biblScope>
			<biblScope unit="page" from="13" to="18" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Approximate string matching</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">A</forename><surname>Hall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">R</forename><surname>Dowling</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Computing Surveys (CSUR)</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="381" to="402" />
			<date type="published" when="1980">1980</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Term-weighting approaches in automatic text retrieval</title>
		<author>
			<persName><forename type="first">G</forename><surname>Salton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Buckley</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Information Processing &amp; Management</title>
		<imprint>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="513" to="523" />
			<date type="published" when="1988">1988</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Distributed representations of words and phrases and their compositionality</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">S</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dean</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in neural information processing systems</title>
				<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="3111" to="3119" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">A solution to Plato&apos;s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge</title>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">K</forename><surname>Landauer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">T</forename><surname>Dumais</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Psychological Review</title>
		<imprint>
			<biblScope unit="volume">104</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page">211</biblScope>
			<date type="published" when="1997">1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Corpus-based and knowledge-based measures of text semantic similarity</title>
		<author>
			<persName><forename type="first">R</forename><surname>Mihalcea</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Corley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Strapparava</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">AAAI</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page" from="775" to="780" />
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Siamese recurrent architectures for learning sentence similarity</title>
		<author>
			<persName><forename type="first">J</forename><surname>Mueller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Thyagarajan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Thirtieth AAAI Conference on Artificial Intelligence</title>
				<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Multi-perspective sentence similarity modeling with convolutional neural networks</title>
		<author>
			<persName><forename type="first">H</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Gimpel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing</title>
				<meeting>the 2015 Conference on Empirical Methods in Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="1576" to="1586" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Signature verification using a &quot;siamese&quot; time delay neural network</title>
		<author>
			<persName><forename type="first">J</forename><surname>Bromley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Guyon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Lecun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Säckinger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Shah</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in neural information processing systems</title>
				<imprint>
			<date type="published" when="1994">1994</date>
			<biblScope unit="page" from="737" to="744" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Learning a similarity metric discriminatively, with application to face verification</title>
		<author>
			<persName><forename type="first">S</forename><surname>Chopra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Hadsell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Lecun</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">CVPR</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="539" to="546" />
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">A temporal coherence loss function for learning unsupervised acoustic embeddings</title>
		<author>
			<persName><forename type="first">G</forename><surname>Synnaeve</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Dupoux</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Procedia Computer Science</title>
		<imprint>
			<biblScope unit="volume">81</biblScope>
			<biblScope unit="page" from="95" to="100" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Learning text similarity with siamese recurrent networks</title>
		<author>
			<persName><forename type="first">P</forename><surname>Neculoiu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Versteegh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rotaru</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 1st Workshop on Representation Learning for NLP</title>
				<meeting>the 1st Workshop on Representation Learning for NLP</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="148" to="157" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Samsung Poland NLP Team at SemEval-2016 Task 1: Necessity for diversity; combining recursive autoencoders, WordNet and ensemble methods to measure semantic similarity</title>
		<author>
			<persName><forename type="first">B</forename><surname>Rychalska</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Pakulska</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chodorowska</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Walczak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Andruszkiewicz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)</title>
				<meeting>the 10th International Workshop on Semantic Evaluation (SemEval-2016)</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="602" to="608" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">LSTM: A search space odyssey</title>
		<author>
			<persName><forename type="first">K</forename><surname>Greff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">K</forename><surname>Srivastava</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Koutník</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">R</forename><surname>Steunebrink</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Schmidhuber</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Neural Networks and Learning Systems</title>
		<imprint>
			<biblScope unit="volume">28</biblScope>
			<biblScope unit="issue">10</biblScope>
			<biblScope unit="page" from="2222" to="2232" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<title level="m" type="main">Improved semantic representations from tree-structured long short-term memory networks</title>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">S</forename><surname>Tai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1503.00075</idno>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<title level="m" type="main">Convolutional neural networks for sentence classification</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Kim</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1408.5882</idno>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Multi-perspective sentence similarity modeling with convolutional neural networks</title>
		<author>
			<persName><forename type="first">H</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Gimpel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing</title>
				<meeting>the 2015 Conference on Empirical Methods in Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="1576" to="1586" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">BioWordVec, improving biomedical word embeddings with subword information and MeSH</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Lu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Scientific Data</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page">52</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<monogr>
		<title level="m" type="main">Efficient estimation of word representations in vector space</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dean</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1301.3781</idno>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">On the surprising behavior of distance metrics in high dimensional space</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">C</forename><surname>Aggarwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Hinneburg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">A</forename><surname>Keim</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International conference on database theory</title>
		<meeting><address><addrLine>Berlin, Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2001">2001</date>
			<biblScope unit="page" from="420" to="434" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<monogr>
		<title level="m" type="main">Predicting the Semantic Textual Similarity with Siamese CNN and LSTM</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">L</forename><surname>Pontes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Huet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">C</forename><surname>Linhares</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Torres-Moreno</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1810.10641</idno>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b28">
	<monogr>
		<title level="m" type="main">Adam: A method for stochastic optimization</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">P</forename><surname>Kingma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ba</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1412.6980</idno>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
