<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Exploring the Synergies between Biocuration and Ontology Alignment Automation</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">David</forename><surname>Dearing</surname></persName>
							<email>ddearing@stottlerhenke.com</email>
							<affiliation key="aff0">
								<orgName type="institution">Stottler Henke Associates, Inc</orgName>
								<address>
<addrLine>1107 NE 45th St, Suite 310</addrLine>
									<postCode>98105</postCode>
									<settlement>Seattle</settlement>
									<region>WA</region>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Terrance</forename><surname>Goan</surname></persName>
							<email>goan@stottlerhenke.com</email>
							<affiliation key="aff0">
								<orgName type="institution">Stottler Henke Associates, Inc</orgName>
								<address>
<addrLine>1107 NE 45th St, Suite 310</addrLine>
									<postCode>98105</postCode>
									<settlement>Seattle</settlement>
									<region>WA</region>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Exploring the Synergies between Biocuration and Ontology Alignment Automation</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">309ACF0B5D0BEAC166CFB90B869D0A2A</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T09:46+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Ontology Matching Ensembles</term>
					<term>Word Embeddings</term>
					<term>Biocuration</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Researchers have long recognized the value trapped in natural language publications and have continued to advance the development of ontologies that can help unleash this value. Among these advances are efforts to apply NLP techniques to streamline the labor-intensive process of scientific literature curation, which encodes relevant information in a form that is accessible to both humans and computers. In this paper, we report on our initial efforts to improve ontology alignment within the context of scientific literature curation by exploiting value within a large corpus of annotated PubMed abstracts. We employ an ensemble learning approach to augment a collection of publicly available ontology matching systems with a matching technique that leverages the word embeddings learned from this corpus in order to more successfully match the concepts of two disease ontologies (MeSH and OMIM). Our experiments show that word embedding-based similarity scores contribute value beyond that of traditional matching systems: the performance of an ensemble trained on a small number of manually reviewed mappings improves when these scores are included.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Technological advancements have given rise to an explosion in the rate at which biomedical data is generated. The incredible volume of data now far exceeds the ability of researchers to capitalize on it. This is due, in large part, to the vagaries of the natural languages in which that data is published for consumption by human readers. The wide variety of lexical forms employed in the research literature presents persistent challenges for both humans and computers in finding, assessing, and assimilating relevant data.</p><p>The research community has long recognized the value trapped in natural language publications and has continued to advance the development of ontologies that can mitigate the challenges posed by natural language. Today, ontologies are a critical foundation for emerging technologies that seek to better inform and accelerate biomedical research. Notable among recent advances are efforts to apply Natural Language Processing (NLP) techniques to streamline the labor-intensive processes of biocuration and systematic scientific reviews.</p><p>Biocuration involves the interpretation, representation, and integration of information relevant to biology into a form that is accessible to both humans and computers. This process results in databases or knowledgebases (e.g., UniProt <ref type="bibr" target="#b0">[1]</ref>, NCBI Database Resources <ref type="bibr" target="#b1">[2]</ref>, and the Rat Genome Database (RGD) <ref type="bibr" target="#b2">[3]</ref>) that assimilate the scientific literature as well as large data sets. 
Biocuration efforts range in both approach and scope, but they are increasingly supported by automated tools that facilitate information triage and tagging <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b4">5]</ref>.</p><p>Similar to biocuration is the systematic review: a literature review that gathers and analyzes research literature according to a structured methodology, guided by one or more specific research questions. The aim of a systematic review is to produce an exhaustive summary of current literature relevant to those research questions. Sometimes a systematic review is simply an instance of a biocuration effort without sufficient resources to codify the collected knowledge <ref type="bibr" target="#b5">[6]</ref>. As with biocuration, there are increasing efforts to employ natural language processing and other artificial intelligence methods to streamline an expert-driven process that is otherwise very labor intensive <ref type="bibr" target="#b6">[7]</ref><ref type="bibr" target="#b7">[8]</ref><ref type="bibr" target="#b8">[9]</ref><ref type="bibr" target="#b9">[10]</ref>.</p><p>Biocuration and systematic review processes (whether manual or automated) are complicated by the applicability of overlapping ontologies that cover a breadth of multispecies knowledge ranging across biological scales from molecules to populations. Ultimately, the exploitation of numerous (but well-aligned) ontologies will provide a comprehensive landscape of biomedical knowledge that will speed the identification of new hypotheses and avenues of investigation.</p><p>In this paper, we report on our initial efforts to improve ontology alignment within the context of scientific literature curation. 
More specifically, we describe an ensemble learning approach that augments a collection of ontology matching systems with word embeddings generated from an annotated corpus of relevant scientific literature.</p><p>The rest of this paper is organized as follows: In the next section, we provide background and discuss related work. In Sections 3 and 4 we describe our experiments, research hypothesis, and results. Finally, in Section 5, we summarize our conclusions and plans for future work, including extensions that support learning from work-centered user interactions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Background and Related Work</head><p>The best-performing ontology matching tools all rely on collections of complementary matchers in order to compensate for the context-specific weaknesses of each contributing/competing heuristic. The challenge of matcher selection and evidence combination has been addressed in a variety of ways, ranging from ad hoc rules and manual settings <ref type="bibr" target="#b10">[11]</ref> to ensemble learning methods <ref type="bibr" target="#b11">[12,</ref><ref type="bibr" target="#b12">13]</ref> that utilize machine learning to select and weight contributing matchers. Methods such as "mapping gain" measurement are applicable to the related challenge of selecting appropriate background knowledge sources <ref type="bibr" target="#b13">[14]</ref>. Background knowledge sources play an important role in the performance of ontology matching tools. While string distance measures and taxonomic structure comparison form the backbone of most tools for ontology matching, it is also widely recognized that ontologies constructed by independent experts can differ significantly in both organization and lexical features. In these situations, researchers commonly seek to bridge the gap by drawing on various sources of background knowledge, such as other ontologies, thesauri, lexical databases, online encyclopedias, and text corpora <ref type="bibr" target="#b10">[11,</ref><ref type="bibr" target="#b13">14]</ref>. These knowledge sources can then be used to implement matching functions that account for spelling variations and synonyms, and that also support some measure of semantic comparison <ref type="bibr" target="#b14">[15]</ref>.</p><p>One approach to measuring the semantic similarity of elements is to employ WordNet similarity <ref type="bibr" target="#b15">[16]</ref>. However, WordNet offers little coverage of the concepts found in real-world ontologies. 
Another approach is to learn word embeddings directly from text corpora. Word embeddings are distributed word representations that are trained through deep neural networks. Each dimension of an embedding represents a latent feature of the word, often capturing useful syntactic and semantic properties <ref type="bibr" target="#b16">[17]</ref>.</p><p>Word embeddings have proven useful for improving the performance of a wide range of NLP tasks <ref type="bibr" target="#b17">[18]</ref>. Zhang et al. <ref type="bibr" target="#b14">[15]</ref> showed that word embeddings learned over Wikipedia can improve the effectiveness of matcher ensembles applied to the OAEI benchmark, conference track, and real-world ontologies.</p><p>Our own work is similar to that of Zhang et al. <ref type="bibr" target="#b14">[15]</ref> but is differentiated in two primary ways. First, we learn word embeddings from a corpus of annotated scientific literature related to the ontologies to be aligned, rather than from Wikipedia. Second, we employ ensemble learning to integrate open source ontology matchers with our word embedding-based matcher.</p></div>
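To illustrate the vector comparison that underlies embedding-based matching, the following sketch computes the cosine similarity between word vectors. The vectors and their dimensionality are hypothetical (trained embeddings typically have hundreds of dimensions); only the arithmetic reflects the technique described above.

```python
import math

def cosine_similarity(u, v):
    # cos(u, v) = (u . v) / (|u| * |v|)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical low-dimensional embeddings for three concepts.
vec_hypertension = [0.9, 0.1, 0.3]
vec_blood_pressure = [0.8, 0.2, 0.4]
vec_liver = [0.1, 0.9, 0.0]

print(cosine_similarity(vec_hypertension, vec_blood_pressure))  # high
print(cosine_similarity(vec_hypertension, vec_liver))           # lower
```

Concepts that occur in similar contexts end up with nearby vectors, so a high cosine score is read as evidence of semantic relatedness.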
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Experimental Setup</head><p>Our research centers on the hypothesis that information gleaned from word embeddings learned from a relevant, annotated corpus would improve matching results within a learned ensemble of existing open source ontology matchers. We tested this hypothesis with systematic experiments using the datasets and techniques described below.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Datasets</head><p>To evaluate our ensemble matching system, we used two ontologies of disease vocabularies: the subset of the Online Mendelian Inheritance in Man (OMIM) disease vocabulary, a flat list of disease terms covering genetic disorders; and the 'Diseases' branch of the National Library of Medicine's Medical Subject Headings (MeSH). A third vocabulary, the Comparative Toxicogenomics Database's (CTD) 'merged disease vocabulary' (MEDIC) <ref type="bibr" target="#b18">[19]</ref>, serves as a reference alignment between OMIM and MeSH. We chose these datasets primarily because there exists a corpus of PubMed titles and abstracts where disease mentions are annotated with the corresponding MEDIC identifiers-such a corpus is needed to train the neural network underlying our word embedding matcher. In particular, PubTator (a Web-based tool for accelerating manual literature curation) provides an archive of computer-generated annotation results for its entire collection of PubMed articles<ref type="foot" target="#foot_0">1</ref>. This computer-annotated corpus is generated using the DNorm tool for disease named entity recognition <ref type="bibr" target="#b19">[20]</ref>.</p><p>The data files for the MeSH, OMIM, and MEDIC disease vocabularies were collected at the end of 2015. The ontology for the MeSH 'Diseases' branch includes 11,344 concepts. The ontology of OMIM genetic disorders includes 8,064 concepts. The MEDIC reference alignment identifies 3,435 direct mappings between MeSH and OMIM concepts. Lastly, the entire PubTator corpus contains 14,412,044 documents.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Word Embedding Matcher (Word2vec)</head><p>Our word embedding matcher uses the similarity scores, as learned by the Word2vec component of the Deeplearning4j library <ref type="bibr" target="#b20">[21]</ref>, as the confidence for a match between a given pair of ontology concepts. Word2vec is a two-layer neural net that processes text, taking a text corpus as input and outputting a set of feature vectors for words in the corpus. The vectors used to represent words are called neural word embeddings and represent a word with numbers based on other neighboring words within the input corpus (see Table <ref type="table" target="#tab_0">1</ref>). Given a large enough corpus, Word2vec can make highly accurate guesses about a particular word's meaning-without human intervention-based solely on numerical representations of word features, such as the context of individual words. Word embedding similarity scores are calculated as the cosine similarity of the vectors for a pair of concepts in the MeSH and OMIM ontologies. Before training the Word2vec model, we preprocess the PubTator corpus so that the annotated phrases for each PubMed document (title and abstract) are replaced by a unique single-token identifier for the corresponding MeSH or OMIM concept. This is necessary because Word2vec learns similarity vectors based on individual words/tokens (and not multi-word phrases). The unique identifier allows us to look up similarity scores for a given pair of concepts from the trained word embedding model. We used Deeplearning4j's suggested configuration: a word window size of 10 for calculating within-sentence word context and the skip-gram technique for predicting the target context, which produces more accurate results on large datasets.</p><p>Training the Word2vec model for more than 14 million documents is very time consuming (on the order of weeks). 
Once the model is built, however, extracting the similarity score for a given pair of terms is fast. The training time can be reduced by distributing the processing with, for example, an Apache Spark cluster.</p></div>
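The preprocessing step described above, which replaces each annotated disease mention with a single-token concept identifier before Word2vec training, can be sketched as follows. The offset-based annotation tuples and the concept identifiers shown here are illustrative, not the exact PubTator file format.

```python
def replace_annotations(text, annotations):
    """Replace annotated character spans with single-token concept IDs.

    `annotations` is a list of (start, end, concept_id) tuples; this
    shape is illustrative, not the actual PubTator offset format.
    """
    # Apply replacements right-to-left so earlier offsets stay valid.
    for start, end, concept_id in sorted(annotations, reverse=True):
        token = "CONCEPT_" + concept_id.replace(":", "_")
        text = text[:start] + token + text[end:]
    return text

doc = "Patients with essential hypertension and type 2 diabetes were studied."
anns = [(14, 36, "MESH:D006973"), (41, 56, "MESH:D003924")]
print(replace_annotations(doc, anns))
```

After this pass, each MeSH or OMIM concept appears in the corpus as one opaque token, so Word2vec learns a single vector per concept that can later be looked up for the cosine comparison.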
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Ontology Matching Systems</head><p>In addition to the word embedding matcher, we also utilized a number of publicly available ontology matching systems. These matchers are used both alone and as part of a learned ensemble to evaluate the relative impact of the addition of our word embedding matcher. These systems have all participated in past Ontology Alignment Evaluation Initiative (OAEI) campaigns.</p><p>LogMap. LogMap <ref type="bibr" target="#b21">[22]</ref> is a scalable ontology matching system that utilizes highly optimized data structures to index the input ontologies (both lexically and structurally) to compute an initial set of anchor mappings with corresponding confidence values. These anchors are then used in an iterative process of mapping repair and mapping discovery to uncover new mappings.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>AgreementMakerLight (AML).</head><p>AML is an ontology matching framework based on AgreementMaker <ref type="bibr" target="#b22">[23]</ref>, one of the leading ontology matching systems. However, whereas AgreementMaker is memory-intensive and was not designed to match ontologies with more than a few thousand concepts, AML is a lightweight system developed with a focus on computational efficiency and is specialized on the biomedical domain but applicable to any ontologies.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Generic Ontology Matching and Mapping Management (GOMMA)</head><p>. GOMMA provides a comprehensive and scalable infrastructure to manage large life science ontologies, but as a generic tool it can be used to match ontologies from other domains <ref type="bibr" target="#b23">[24]</ref>. GOMMA preprocesses all information relevant for matching ontology concepts (e.g., name, synonyms, comments) and uses maximal string similarity to generate matches before aggregating the mappings, filtering out any mappings below a certain threshold, and applying constraints to improve the consistency of mappings.</p><p>(not) Yet Another Matcher (YAM++). The underlying idea of the YAM++ system is that the complexity and, therefore, the cost of the ontology matching algorithms can be reduced by using indexing data structures to avoid exhaustive pair-wise comparisons <ref type="bibr" target="#b24">[25]</ref>. YAM++ preprocesses the input ontologies to calculate the information content of each word to determine the weights of labels. Candidate mappings are passed to a process that uses machine learning to combine several different string-based comparisons to compare the labels/synonyms of entities. Those results are then passed to a structural matcher, which looks at related entities to find more mappings, before combining and filtering the results.</p><p>Falcon-AO. Falcon-AO is a prominent component of the Falcon infrastructure for Semantic Web applications <ref type="bibr" target="#b25">[26]</ref>. For our datasets, Falcon-AO primarily uses partitionbased block matching (PBM), which first divides each ontology into blocks that have a high degree of cohesiveness; then, mappings are discovered by matching similar blocks. The similarity between blocks is a function of the number of "anchors" (alignments with high similarity based on string comparison techniques) that they share.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Ensemble Learning</head><p>We utilize machine learning techniques to determine the weights and confidence level thresholds for each ensemble configuration, allowing for the systematic learning of rules for estimating the correctness of a correspondence based on the output of the different techniques. Our experiments were conducted with the Weka Toolkit <ref type="bibr" target="#b26">[27]</ref>, using the Weka implementation of the REPTree classifier, a fast decision tree learner that builds a tree using information gain as the splitting criterion and then prunes it using reduced error pruning. Our feature vectors comprise the individual mapping confidence scores for each technique being evaluated as well as a single meta-level feature: average matcher confidence. The inclusion of this meta-level feature is based on the findings of Eckert et al. <ref type="bibr" target="#b11">[12]</ref>, who found that the most significant feature was not the confidence scores themselves, but the fraction of matchers that found a correspondence. All experiments were conducted with the default Weka classifier settings, making our experiments more easily reproducible.</p><p>Dealing with imbalanced data. Each individual matcher can generate mappings with a range of confidence scores between 0.0 and 1.0 and, unsurprisingly, a large number of incorrect mappings appear at low confidence levels. This introduces a problem during classifier training known as class imbalance: a large difference in the number of positive and negative instances used to train a classifier (i.e., correct vs. incorrect mappings), which may result in a classifier that is biased towards the majority class. At the extreme, this can lead to a classifier with high accuracy that has actually learned to always choose the majority class (i.e., that the mapping is incorrect). 
To account for this when training the classifier, we use a common resampling approach in which training instances are sampled to provide an even distribution of correct and incorrect training instances. We achieve this using the Resample filter of the Weka framework to sample without replacement while biasing towards a uniform class distribution (i.e., an even split between positive and negative instances).</p></div>
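The two steps above can be sketched as follows: building a feature vector from per-matcher confidences plus the average-confidence meta-level feature, and resampling without replacement toward a uniform class distribution. This is an illustrative stand-in for the Weka REPTree/Resample pipeline, not the actual Weka API.

```python
import random

def make_instance(confidences):
    # Feature vector: one confidence score per matcher (0.0 when a
    # matcher did not propose the mapping), plus the average matcher
    # confidence as a meta-level feature.
    return confidences + [sum(confidences) / len(confidences)]

def resample_balanced(instances, labels, seed=0):
    # Sample without replacement, biasing towards a uniform class
    # distribution (an even split of correct/incorrect mappings).
    rng = random.Random(seed)
    pos = [i for i, correct in enumerate(labels) if correct]
    neg = [i for i, correct in enumerate(labels) if not correct]
    n = min(len(pos), len(neg))
    keep = rng.sample(pos, n) + rng.sample(neg, n)
    rng.shuffle(keep)
    return [instances[i] for i in keep], [labels[i] for i in keep]

# Example: three matcher confidence scores for one candidate mapping.
print(make_instance([0.9, 0.0, 0.6]))  # last value is the average, 0.5
```

Downsampling the majority class this way trades away some training data to keep the learner from defaulting to "mapping is incorrect".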
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Results</head><p>Here we describe the results of our experiments to evaluate the performance of our Word2vec-based word embedding matcher. We analyze the performance of the word embedding matcher both in isolation and by measuring its contribution when combined with one or more existing ontology matching systems, showing that this novel technique adds value that is not identified by standard ontology matching systems.</p><p>For the evaluation of each particular classifier configuration, we follow a technique meant to mimic a practical training process for each classifier within the context of scientific literature curation. More specifically, we limit the training of each classifier to a small subset of the mappings produced by the corresponding matchers. We split the training collection into n folds, with each fold consisting of approximately 362 instances, and train a separate classifier on each of the n individual folds. This is meant to simulate the process of training the classifier with a small number of manually reviewed mappings. A fixed fold size of approximately 362 was chosen so that the smallest training collection (YAM++ by itself; 3,628 mappings) would have 10 folds for training. Every evaluation uses the same test collection, consisting of the union of all of the potential mappings generated by each of the matching systems (including Word2vec). This allows for a more accurate comparison of the evaluation results across different classifier configurations. We report the average and standard deviation of the traditional precision, recall, and F-measure metrics across each of the n folds for each classifier configuration.</p></div>
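The fold-based evaluation protocol above can be sketched as follows; the helper names are ours, and the statistics follow the standard definitions of precision, recall, and F-measure.

```python
import math

def make_folds(instances, n_folds):
    # Split the training collection into n folds of roughly equal size
    # (about 362 instances each in the setup described above).
    return [instances[i::n_folds] for i in range(n_folds)]

def prf(tp, fp, fn):
    # Traditional precision, recall, and (balanced) F-measure.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

def mean_std(values):
    # Average and (population) standard deviation across folds.
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var)

folds = make_folds(list(range(3628)), 10)
print([len(f) for f in folds])  # ten folds of ~362 instances
```

One classifier is trained per fold, each is scored against the shared test collection, and `mean_std` summarizes the per-fold precision, recall, and F-measure.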
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Word Embedding Similarity Scores</head><p>We first analyzed the similarity scores produced by the Word2vec technique, which are the cosine similarity of the vectors for each pair of concepts in the MeSH and OMIM ontologies. For comparison, we built two word embedding models for the PubTator corpus: one with the standard configuration and one providing a list of stop words, which Word2vec ignores during training. The chart in Fig. <ref type="figure" target="#fig_0">1</ref> shows the raw counts of the correct and incorrect mappings for both of these models. The results from both models are very similar, with the global distribution of similarity scores (both correct and incorrect) following a normal distribution. The Word2vec model that ignores stop words finds slightly more correct mappings at lower similarity score thresholds (i.e., below 0.9). It is understandable that ignoring stop words makes little difference if the window size is sufficient, since the Word2vec model automatically accounts for the information gain afforded by specific context words (which should be near zero for stop words). In both models, the number of incorrect mappings increases drastically as the similarity score threshold decreases, with the number of correct and incorrect mappings being roughly equal at a similarity score threshold of 0.85.</p><p>For our experiments, we use similarity scores of at least 0.69. This threshold was chosen so that the number of mappings would be at least twice the size of the larger of the two ontologies (the MeSH ontology contains 11,344 concepts) because a concept in the MeSH ontology may map to more than one concept in the smaller OMIM ontology (8,064 concepts), but not the other way around. By comparison, the number of potential mappings generated by the other ontology matching systems ranges from 3,628 to 7,145. 
Classifiers trained from the Word2vec similarity scores alone do not perform particularly well (Table <ref type="table" target="#tab_2">2</ref>). Surprisingly, precision was high and recall was low, the reverse of what we had expected. For our remaining reported experimental results, we use the model with stop words ignored, representing 25,610 total instances (5.6% of which are correct mappings).</p></div>
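The threshold choice described above, i.e., the highest similarity cutoff that still yields at least twice as many candidate mappings as the larger ontology has concepts, can be sketched as follows (the function name is ours):

```python
def choose_threshold(scores, target_count):
    # Return the highest similarity threshold that keeps at least
    # `target_count` candidate mappings (e.g., 2 * 11,344 for MeSH).
    for threshold in sorted(set(scores), reverse=True):
        if sum(s >= threshold for s in scores) >= target_count:
            return threshold
    return min(scores)

# Toy example with five candidate mappings.
print(choose_threshold([0.9, 0.8, 0.8, 0.7, 0.6], 3))  # 0.8
```

Applied to the full distribution of Word2vec cosine scores, this style of count-driven cutoff yields the 0.69 threshold reported above.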
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Ensemble Comparisons</head><p>For our baseline, we first look at each ontology matching system alone, using our ensemble approach to learn how to distinguish correct from incorrect mappings using only the confidence scores produced by each system (Table <ref type="table" target="#tab_3">3</ref>).</p><p>The scores for each individual matching system vary widely, which is not particularly surprising given the relatively small fixed-size folds that are used for training each classifier. In the individual configuration, GOMMA and Falcon-AO perform the best on these datasets, with F-measures of 0.590 and 0.546, respectively. Having identified the baseline values for each ontology matching system, we then included the similarity scores generated from our Word2vec word embedding matcher when training a new ensemble for each of the individual ontology matching systems (Table <ref type="table" target="#tab_3">3</ref>).</p><p>When including the Word2vec similarity scores, we see improved F-measure scores across the board and, in general, the standard deviation for each statistic decreases. The most significant gains are in the recall of the LogMap and AML systems as well as in the precision of LogMap and YAM++. Interestingly, the recall for YAM++ drops when adding Word2vec similarity scores. Finally, we combined all of the ontology matching systems to compare the results both with and without Word2vec, as shown in Table <ref type="table" target="#tab_4">4</ref>. The F-measure for the model trained using the results from all of the ontology matching systems (without Word2vec) improves over the classifiers trained on the results of each system alone (even if the improvement is only marginal, as in the case of GOMMA). The only evaluation statistics to decrease in the full ensemble configuration are the recall for GOMMA and for YAM++. 
Interestingly, when comparing the performance of the full ensemble classifier (with Word2vec) against the individual matchers each paired with Word2vec, we see that the F-measure for both AML and GOMMA does not change significantly when including the other systems. This would seem to indicate that neither GOMMA nor AML, when combined with Word2vec, is further improved by adding any of the additional matching systems. However, note that GOMMA produces the highest recall of any combination evaluated (0.846 ±0.239), whereas the full ensemble and AML (each including Word2vec) appear to be more balanced, as illustrated by their lower recall and higher precision scores.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusions and Future Work</head><p>In this paper, we have described an ensemble learning approach that augments a collection of ontology matching systems with word embeddings generated from an annotated corpus of relevant scientific literature. We have shown that, within this ensemble approach to ontology matching, the information within word embeddings does contribute to learning an improved model for identifying correct alignments between two ontologies, beyond what state-of-the-art ontology matching systems identify, both individually and in combination. More specifically, the best overall performance (by F-measure) was found in the combination of word embedding-derived similarity scores with either the full ensemble containing all of the matching systems under evaluation or the individual AML and GOMMA matching systems. However, each of those configurations differed in precision and recall and, therefore, the needs of any particular use case will inform the best configuration for each individual situation. Several questions remain to be answered by future work as well as by our own ongoing research. First, we are currently analyzing the PubTator corpus to extract a list of multi-word expressions, using a novel technique for extracting salient variable-length phrases from large text corpora <ref type="bibr" target="#b27">[28]</ref>; we will use these expressions in a similar approach to preprocess the corpus and, prior to training the word embedding model, remove all text that is not among the top expressions in the corpus. We also see opportunities to improve upon our ensemble learning approach by providing additional meta-level features when training our ensemble model, such as binary matcher voting, global ontology features, and concept-specific lexical features used by Eckert et al. 
<ref type="bibr" target="#b11">[12]</ref>.</p><p>Repeating our experiments with different ontologies and/or in a different domain would help to corroborate our results. Training the relevant Word2vec model, however, requires identifying a sufficiently large domain-relevant corpus that is also annotated with concepts from those ontologies. Given a domain-relevant corpus, it may be possible to use an automated system to detect and annotate concept labels in text, as was done by the DNorm disease tagger for the PubTator corpus.</p><p>There is also an opportunity to significantly reduce the processing time needed to train a Word2vec model from a given corpus. We briefly explored using Deeplearning4j's support for the Apache Spark cluster-computing framework, but we were unable to fully implement the functionality due to time limitations. With Spark, Deeplearning4j can distribute the processing and train models in parallel for individual shards of the large corpus before iteratively averaging the parameters into a central model.</p><p>Lastly, with specific regard to manual biocuration and systematic review processes, we see an opportunity to exploit additional sources of evidence beyond the resulting annotated corpus. More specifically, it may be possible to collect incremental pieces of feedback from work-centered interfaces over the course of a user's normal interaction during biocuration and annotation tasks (for example, while searching for or disambiguating specific concepts when annotating a particular text mention or reference) that can be utilized to further improve ontology matching processes.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1.</head><label>1</label><figDesc>Fig. 1. The raw number of correct and incorrect mappings by Word2vec similarity score, for two word embedding models trained with and without stop words ignored.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1.</head><label>1</label><figDesc>Examples of neural word embedding vectors learned from the PubTator corpus.</figDesc><table><row><cell>bone marrow</cell><cell>(bmt), solid-organ, disseminated, allogeneic, …</cell></row><row><cell>blood pressure</cell><cell>rate, hypotension, arterial, concentration, …</cell></row><row><cell>heart</cell><cell>rate, cardiac, re-infarction, pressure, o2, arterial, …</cell></row><row><cell>liver</cell><cell>renal, hepatic, failure, acute, function, chronic, …</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2 .</head><label>2</label><figDesc>The average and standard deviation of the F-measure and corresponding precision and recall statistics for each Word2vec word embedding model alone.</figDesc><table><row><cell></cell><cell>Precision</cell><cell>F-measure</cell><cell>Recall</cell></row><row><cell>Word2vec</cell><cell>0.623 ±0.278</cell><cell>0.281 ±0.111</cell><cell>0.190 ±0.082</cell></row><row><cell>Word2vec; Stop Words Ignored</cell><cell>0.618 ±0.234</cell><cell>0.301 ±0.099</cell><cell>0.208 ±0.078</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 3 .</head><label>3</label><figDesc>The average and standard deviation of the F-measure and corresponding precision and recall statistics for each ontology matching system alone and the difference when combined with the Word2vec word embedding matcher.</figDesc><table><row><cell></cell><cell>Precision</cell><cell>F-measure</cell><cell>Recall</cell></row><row><cell>LogMap</cell><cell>0.304 ±0.270</cell><cell>0.260 ±0.269</cell><cell>0.293 ±0.345</cell></row><row><cell>Δ LogMap with Word2vec</cell><cell>+0.243 ±0.179</cell><cell>+0.344 ±0.121</cell><cell>+0.477 ±0.226</cell></row><row><cell>AML</cell><cell>0.471 ±0.200</cell><cell>0.436 ±0.165</cell><cell>0.530 ±0.148</cell></row><row><cell>Δ AML with Word2vec</cell><cell>+0.131 ±0.123</cell><cell>+0.203 ±0.038</cell><cell>+0.217 ±0.159</cell></row><row><cell>GOMMA</cell><cell>0.460 ±0.158</cell><cell>0.590 ±0.202</cell><cell>0.821 ±0.282</cell></row><row><cell>Δ GOMMA with Word2vec</cell><cell>+0.084 ±0.172</cell><cell>+0.038 ±0.124</cell><cell>+0.025 ±0.239</cell></row><row><cell>Falcon-AO</cell><cell>0.500 ±0.122</cell><cell>0.546 ±0.113</cell><cell>0.658 ±0.087</cell></row><row><cell>Δ Falcon-AO with Word2vec</cell><cell>+0.039 ±0.179</cell><cell>+0.025 ±0.142</cell><cell>+0.023 ±0.217</cell></row><row><cell>YAM++</cell><cell>0.340 ±0.242</cell><cell>0.331 ±0.158</cell><cell>0.705 ±0.288</cell></row><row><cell>Δ YAM++ with Word2vec</cell><cell>+0.236 ±0.106</cell><cell>+0.249 ±0.083</cell><cell>-0.084 ±0.145</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 4 .</head><label>4</label><figDesc>The average and standard deviation of the F-measure and corresponding precision and recall statistics for all of the ontology matching systems combined and when combined with the Word2vec word embedding matcher.</figDesc><table><row><cell></cell><cell>Precision</cell><cell>F-measure</cell><cell>Recall</cell></row><row><cell>ALL without Word2vec</cell><cell>0.593 ±0.023</cell><cell>0.593 ±0.061</cell><cell>0.683 ±0.165</cell></row><row><cell>Δ ALL with Word2vec</cell><cell>+0.053 ±0.151</cell><cell>+0.040 ±0.082</cell><cell>+0.083 ±0.213</cell></row></table><note>Word2vec contributes value beyond the traditional matching systems: including the Word2vec similarity scores when training the ensemble model boosts recall, precision, and F-measure (though the standard deviation across each training fold also increases).</note></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/PubTator/tutorial/index.html#DownloadFTP</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work is supported by the US Army Medical Research and Materiel Command under Contract No. W81XWH-13-C-0036.</p><p>The views, opinions and/or findings contained in this report are those of the author(s) and should not be construed as an official Department of the Army position, policy or decision unless so designated by other documentation.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">UniProt: a hub for protein information</title>
		<author>
			<orgName type="collaboration">The UniProt Consortium</orgName>
		</author>
		<idno type="DOI">10.1093/nar/gku989</idno>
	</analytic>
	<monogr>
		<title level="j">Nucleic Acids Res</title>
		<imprint>
			<biblScope unit="volume">43</biblScope>
			<biblScope unit="page" from="D204" to="D212" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Database resources of the national center for biotechnology information</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">W</forename><surname>Sayers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Barrett</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">A</forename><surname>Benson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Bolton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">H</forename><surname>Bryant</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Canese</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Feolo</surname></persName>
		</author>
		<idno type="DOI">10.1093/nar/gks1189</idno>
	</analytic>
	<monogr>
		<title level="j">Nucleic Acids Res</title>
		<imprint>
			<biblScope unit="volume">40</biblScope>
			<biblScope unit="issue">D1</biblScope>
			<biblScope unit="page" from="D13" to="D25" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">The rat genome database 2015: genomic, phenotypic and environmental variations and disease</title>
		<author>
			<persName><forename type="first">M</forename><surname>Shimoyama</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>De Pons</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">T</forename><surname>Hayman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">J</forename><surname>Laulederkind</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Nigam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Petri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">R</forename><surname>Smith</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Tutaj</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">J</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Worthey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Dwinell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Jacob</surname></persName>
		</author>
		<idno type="DOI">10.1093/nar/gku1026</idno>
	</analytic>
	<monogr>
		<title level="j">Nucleic Acids Res</title>
		<imprint>
			<biblScope unit="volume">43</biblScope>
			<biblScope unit="page" from="D743" to="D750" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Introducing a text annotation tool (OnToMate); assisting curation at rat genome database</title>
		<author>
			<persName><forename type="first">O</forename><surname>Ghiasvand</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Shimoyama</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 7th ACM international conference on bioinformatics, computational biology, and health informatics (BCB &apos;16</title>
				<meeting>the 7th ACM international conference on bioinformatics, computational biology, and health informatics (BCB &apos;16<address><addrLine>New York</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="465" to="465" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">On expert curation and scalability: UniProtKB/Swiss-Prot as a case study</title>
		<author>
			<persName><forename type="first">S</forename><surname>Poux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">N</forename><surname>Arighi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Magrane</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bateman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">H</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Boutet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Bye-A-Jee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">L</forename><surname>Famiglietti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Roechert</surname></persName>
		</author>
		<idno type="DOI">10.1093/bioinformatics/btx439</idno>
		<ptr target="https://doi.org/10.1093/bioinformatics/btx439" />
	</analytic>
	<monogr>
		<title level="j">Bioinformatics</title>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Biocuration with insufficient resources and fixed timelines</title>
		<author>
			<persName><forename type="first">R</forename><surname>Rodriguez-Esteban</surname></persName>
		</author>
		<idno type="DOI">10.1093/database/bav116</idno>
	</analytic>
	<monogr>
		<title level="j">Database: The Journal of Biological Databases and Curation</title>
		<imprint>
			<biblScope unit="volume">2015</biblScope>
			<biblScope unit="page">116</biblScope>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Systematic review toolbox: A catalogue of tools to support systematic reviews</title>
		<author>
			<persName><forename type="first">C</forename><surname>Marshall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Brereton</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 19th international conference on evaluation and assessment in software engineering</title>
				<meeting>the 19th international conference on evaluation and assessment in software engineering<address><addrLine>New York</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page">23</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Automatic evidence retrieval for systematic reviews</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">K</forename><surname>Choong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Galgani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">G</forename><surname>Dunn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Tsafnat</surname></persName>
		</author>
		<idno type="DOI">10.2196/jmir.3369</idno>
	</analytic>
	<monogr>
		<title level="j">J Med Internet Res</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="issue">10</biblScope>
			<biblScope unit="page">e223</biblScope>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Extracting PICO sentences from clinical trial reports using supervised distant supervision</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">C</forename><surname>Wallace</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kuiper</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sharma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">B</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">J</forename><surname>Marshall</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">17</biblScope>
			<biblScope unit="issue">132</biblScope>
			<biblScope unit="page" from="1" to="25" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">Systematic reviews by automatically building information extraction training corpora</title>
		<author>
			<persName><forename type="first">T</forename><surname>Basu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kalyan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Jayaswal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Pettifer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Jonnalagadda</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1606.06424</idno>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Ontology matching: state of the art and future challenges</title>
		<author>
			<persName><forename type="first">P</forename><surname>Shvaiko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Euzenat</surname></persName>
		</author>
		<idno type="DOI">10.1109/TKDE.2011.253</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Knowledge and Data Engineering</title>
		<imprint>
			<biblScope unit="volume">25</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="158" to="176" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Improving ontology matching using meta-level learning</title>
		<author>
			<persName><forename type="first">K</forename><surname>Eckert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Meilicke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Stuckenschmidt</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-642-02121-3</idno>
	</analytic>
	<monogr>
		<title level="j">LNCS</title>
		<editor>Aroyo L, et al.</editor>
		<imprint>
			<biblScope unit="volume">5554</biblScope>
			<biblScope unit="page" from="158" to="172" />
			<date type="published" when="2009">2009</date>
			<publisher>Springer</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Uncertain schema matching</title>
		<author>
			<persName><forename type="first">A</forename><surname>Gal</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Synthesis Lectures on Data Management</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="1" to="97" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Automatic background knowledge selection for matching biomedical ontologies</title>
		<author>
			<persName><forename type="first">D</forename><surname>Faria</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Pesquita</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Santos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">F</forename><surname>Cruz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">M</forename><surname>Couto</surname></persName>
		</author>
		<idno type="DOI">10.1371/journal.pone.0111226</idno>
		<ptr target="https://doi.org/10.1371/journal.pone.0111226" />
	</analytic>
	<monogr>
		<title level="j">PLoS ONE</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="issue">11</biblScope>
			<biblScope unit="page">e111226</biblScope>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Ontology matching with word embeddings</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Lai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Lv</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-319-12277-9</idno>
	</analytic>
	<monogr>
		<title level="m">Chinese computational linguistics and natural language processing based on naturally annotated big data</title>
				<editor>
			<persName><forename type="first">S</forename><surname>Maosong</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Zhao</surname></persName>
		</editor>
		<meeting><address><addrLine>Cham, Switzerland</addrLine></address></meeting>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="34" to="45" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">A survey of exploiting wordnet in ontology matching</title>
		<author>
			<persName><forename type="first">F</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Sandkuhl</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-0-387-34747-9</idno>
	</analytic>
	<monogr>
		<title level="m">Artificial intelligence in theory and practice</title>
				<editor>
			<persName><forename type="first">M</forename><surname>Bramer</surname></persName>
		</editor>
		<meeting><address><addrLine>New York</addrLine></address></meeting>
		<imprint>
			<publisher>Springer US</publisher>
			<date type="published" when="2008">2008</date>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="341" to="350" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Word representations: a simple and general method for semi-supervised learning</title>
		<author>
			<persName><forename type="first">J</forename><surname>Turian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Ratinov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 48th annual meeting of the association for computational linguistics. Association for Computational Linguistics</title>
				<meeting>the 48th annual meeting of the association for computational linguistics. Association for Computational Linguistics<address><addrLine>Stroudsburg, PA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="384" to="394" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Word embedding for understanding natural language: survey</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Yang</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-319-53817-4</idno>
	</analytic>
	<monogr>
		<title level="m">Guide to big data applications</title>
				<editor>
			<persName><forename type="first">S</forename><surname>Srinivasan</surname></persName>
		</editor>
		<meeting><address><addrLine>Cham, Switzerland</addrLine></address></meeting>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="volume">26</biblScope>
			<biblScope unit="page" from="83" to="104" />
		</imprint>
	</monogr>
	<note>Studies in big data</note>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">MEDIC: a practical disease vocabulary used at the comparative toxicogenomics database</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">P</forename><surname>Davis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">C</forename><surname>Wiegers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">C</forename><surname>Rosenstein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">J</forename><surname>Mattingly</surname></persName>
		</author>
		<idno type="DOI">10.1093/database/bar065</idno>
	</analytic>
	<monogr>
		<title level="j">Database: The Journal of Biological Databases and Curation</title>
		<imprint>
			<biblScope unit="volume">2012</biblScope>
			<biblScope unit="page">65</biblScope>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">DNorm: disease name normalization with pairwise learning to rank</title>
		<author>
			<persName><forename type="first">R</forename><surname>Leaman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Islamaj Doğan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Lu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="issue">22</biblScope>
			<biblScope unit="page" from="2909" to="2917" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<ptr target="https://deeplearning4j.org/about" />
		<title level="m">Deeplearning4j: Open-source distributed deep learning for the JVM</title>
				<imprint>
			<date type="published" when="2016">2016</date>
			<date type="accessed" when="2017-07-27">Accessed 27 July 2017</date>
		</imprint>
		<respStmt>
			<orgName>Deeplearning4j Development Team</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">LogMap: Logic-based and scalable ontology matching</title>
		<author>
			<persName><forename type="first">E</forename><surname>Jiménez-Ruiz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Cuenca Grau</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-642-25073-6_18</idno>
	</analytic>
	<monogr>
		<title level="m">The semantic web -ISWC 2011</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">L</forename><surname>Aroyo</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin, Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2011">2011</date>
			<biblScope unit="volume">7031</biblScope>
			<biblScope unit="page" from="273" to="288" />
		</imprint>
	</monogr>
	<note>ISWC 2011</note>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">The Agreement-MakerLight ontology matching system</title>
		<author>
			<persName><forename type="first">D</forename><surname>Faria</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Pesquita</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Santos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Palmonari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">F</forename><surname>Cruz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">M</forename><surname>Couto</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-642-41030-7_38</idno>
	</analytic>
	<monogr>
		<title level="m">On the move to meaningful internet systems: OTM 2013 Conferences. OTM 2013</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">R</forename><surname>Meersman</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin, Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="volume">8185</biblScope>
			<biblScope unit="page" from="527" to="541" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">GOMMA: a component-based infrastructure for managing and analyzing life science ontologies and their evolution</title>
		<author>
			<persName><forename type="first">T</forename><surname>Kirsten</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gross</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hartung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Rahm</surname></persName>
		</author>
		<idno type="DOI">10.1186/2041-1480-2-6</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Biomedical Semantics</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page">6</biblScope>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<monogr>
		<title level="m" type="main">Overview of YAM++-(not) Yet Another Matcher for ontology alignment task</title>
		<author>
			<persName><forename type="first">D</forename><surname>Ngo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Bellahsene</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
		<respStmt>
			<orgName>LIRMM</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Dissertation</note>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Falcon-AO: A practical ontology matching system</title>
		<author>
			<persName><forename type="first">W</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Qu</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.websem.2008.02.006</idno>
	</analytic>
	<monogr>
		<title level="j">Web Semantics: Science, Services and Agents on the World Wide Web</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="237" to="239" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">The WEKA data mining software: an update</title>
		<author>
			<persName><forename type="first">M</forename><surname>Hall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Frank</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Holmes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Pfahringer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Reutemann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Witten</surname></persName>
		</author>
		<idno type="DOI">10.1145/1656274.1656278</idno>
	</analytic>
	<monogr>
		<title level="j">ACM SIGKDD Explorations Newsletter</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="10" to="18" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<monogr>
		<title level="m" type="main">Automated phrase mining from massive text corpora</title>
		<author>
			<persName><forename type="first">J</forename><surname>Shang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">R</forename><surname>Voss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Han</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1702.04457</idno>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
