<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">A curation pipeline and web-services for PDF documents</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">André</forename><surname>Santos</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">DETI/IEETA</orgName>
								<orgName type="institution">University of Aveiro</orgName>
								<address>
									<postCode>3810-193</postCode>
									<settlement>Aveiro</settlement>
									<country key="PT">Portugal</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Sérgio</forename><surname>Matos</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">DETI/IEETA</orgName>
								<orgName type="institution">University of Aveiro</orgName>
								<address>
									<postCode>3810-193</postCode>
									<settlement>Aveiro</settlement>
									<country key="PT">Portugal</country>
								</address>
							</affiliation>
						</author>
						<author role="corresp">
							<persName><forename type="first">David</forename><surname>Campos</surname></persName>
							<email>david.campos@bmd-software.com</email>
							<affiliation key="aff1">
								<orgName type="institution">BMD Software</orgName>
								<address>
									<postCode>3810-074</postCode>
									<settlement>Aveiro</settlement>
									<country key="PT">Portugal</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">José</forename><forename type="middle">Luís</forename><surname>Oliveira</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">DETI/IEETA</orgName>
								<orgName type="institution">University of Aveiro</orgName>
								<address>
									<postCode>3810-193</postCode>
									<settlement>Aveiro</settlement>
									<country key="PT">Portugal</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">A curation pipeline and web-services for PDF documents</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">0B29FD6A0C0753607B321E55F3AA64BB</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T20:55+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The continuous growth of the biomedical literature, and the need to efficiently find and extract information from its content, led to the development of various text mining tools. More recently, these tools have been integrated into user-friendly applications that facilitate their use by expert database curators. However, these tools were mainly designed to extract information from text-based documents, in XML and other formats, while a considerable part of the biomedical literature is today published and distributed in PDF format.</p><p>To address this limitation, we extended the web-based literature curation tool Egas, adding support for direct document curation and annotation over PDF files, with side-by-side visualization of the original PDF document and of the extracted textual content. Egas' PDF document processing and text-mining features are supported by a newly developed web-services platform built over Neji, a highly efficient information extraction framework. These web services allow PDF text extraction and annotation capabilities to be integrated into other tools and text mining pipelines.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>The large amount of information and knowledge continuously produced in the biomedical domain is reflected in the number of published journal articles. In 2015, the bibliographic database MEDLINE contained over 23 million references to journal articles in the life sciences, of which 1 million were added in that year <ref type="bibr" target="#b12">(U.S. National Library of Medicine, 2016)</ref>. At this rate, staying up to date with current knowledge and identifying the most relevant publications and information on a given subject is a very challenging task for researchers.</p><p>To facilitate access to knowledge, several resources began manually curating scientific articles, extracting and structuring relevant and validated information. However, with the rapid growth of data this task became infeasible <ref type="bibr" target="#b13">(Yeh et al., 2003;</ref><ref type="bibr" target="#b10">Rebholz-Schuhmann et al., 2005)</ref>, and automatic information extraction tools were developed and integrated into the curation pipeline in order to accelerate the curation process <ref type="bibr" target="#b8">(Neves and Leser, 2012)</ref>. This also created the need for end-user interfaces to these tools that allow curators to use them efficiently. The success of the BioCreative Interactive Annotation Task series demonstrates the importance of these efforts <ref type="bibr" target="#b0">(Arighi et al., 2013)</ref>.</p><p>While existing information extraction tools have been shown to achieve robust performance in various tasks, and various literature curation tools that make use of such automated methods have been proposed, they were generally designed to work with plain text or with structured formats such as XML. 
There is, however, a lack of tools supporting curation workflows that make use of the Portable Document Format (PDF), which has become one of the most popular file formats for publishing and sharing documents.</p><p>We have previously presented Neji <ref type="bibr" target="#b2">(Campos et al., 2013)</ref>, an open source framework for biomedical concept recognition, and Egas <ref type="bibr" target="#b3">(Campos et al., 2014)</ref>, a web-based tool for literature curation built with modern web technologies that provides simple inline representation of annotations and user-friendly interaction. In this paper we present new features added to Egas and Neji to support text-mining and curation workflows over PDF documents. In Section 2 we describe Neji's new PDF processing functionalities and present its new web-services platform. These web services are used by the curation tool for extracting the text from PDF documents and for obtaining automatic concept annotations, and they also facilitate the integration of Neji's functionalities into external text-mining pipelines and tools. Egas is described in Section 3, highlighting the new PDF annotation features, including side-by-side synchronous visualization of the extracted text and of the original PDF, and the display of concept annotations over the PDF document.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Neji</head><p>Neji is an open source framework for biomedical concept recognition built around four crucial characteristics: modularity, scalability, speed and usability. It implements state-of-the-art methods for biomedical natural language processing (NLP), namely for sentence splitting, tokenization, lemmatization, part-of-speech (POS) tagging, chunking and dependency parsing. Concept recognition is performed using dictionary matching and machine learning (ML) techniques, combined with normalization of the recognized concepts. The framework implements a flexible and efficient concept tree to store document annotations, supporting nested and intersecting concepts with one or more identifiers. It supports several input and output formats, including the most popular ones in biomedical text mining, such as IeXML, PubMed XML, A1, CoNLL and BioC. Neji's architecture allows users to configure document processing according to their specific goals, for example by combining existing or new modules for reading, processing and writing data, or by selecting the dictionaries or machine learning models appropriate to the concept types of interest.</p><p>Neji has been evaluated on several corpora, covering different concept types <ref type="bibr" target="#b2">(Campos et al., 2013;</ref><ref type="bibr" target="#b4">Campos et al., 2015;</ref><ref type="bibr" target="#b7">Matos et al., 2016)</ref>. Table <ref type="table" target="#tab_0">1</ref> summarizes its concept identification performance.</p></div>
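<div xmlns="http://www.tei-c.org/ns/1.0"><p>The nesting behaviour of such a concept tree can be sketched in a few lines of Python. This is a hypothetical illustration, not Neji's data structure: ConceptNode, insert and render are invented names, and id-1 to id-5 are placeholders for real concept identifiers. A concept enclosed by another becomes its descendant, intersecting (overlapping but not nested) concepts stay siblings, and a second identifier for an identical span is merged into the existing node.</p>
<p>
```python
from dataclasses import dataclass, field


@dataclass
class ConceptNode:
    """Hypothetical concept-tree node: a character span and its identifiers."""
    start: int
    end: int
    ids: list[str] = field(default_factory=list)
    children: list["ConceptNode"] = field(default_factory=list)

    def contains(self, other: "ConceptNode") -> bool:
        # True when the other span lies entirely within this span
        return other.start >= self.start and self.end >= other.end


def insert(parent: ConceptNode, node: ConceptNode) -> None:
    """Insert a concept, nesting it under any existing concept that encloses it."""
    for child in parent.children:
        if (child.start, child.end) == (node.start, node.end):
            # Same span: keep a single node and merge the identifiers
            for cid in node.ids:
                if cid not in child.ids:
                    child.ids.append(cid)
            return
        if child.contains(node):
            insert(child, node)
            return
    # Siblings enclosed by the new concept become its children;
    # intersecting (overlapping but not nested) concepts remain siblings
    kept, nested = [], []
    for child in parent.children:
        (nested if node.contains(child) else kept).append(child)
    node.children.extend(nested)
    parent.children = sorted(kept + [node], key=lambda c: (c.start, -c.end))


def render(node: ConceptNode, text: str, depth: int = 0) -> list[str]:
    """Flatten the tree into indented 'span ids' lines for inspection."""
    lines = []
    for child in node.children:
        label = text[child.start:child.end] + " " + ",".join(child.ids)
        lines.append("  " * depth + label)
        lines.extend(render(child, text, depth + 1))
    return lines


text = "human insulin receptor gene"
root = ConceptNode(0, len(text))
insert(root, ConceptNode(6, 13, ["id-1"]))   # "insulin"
insert(root, ConceptNode(6, 22, ["id-2"]))   # "insulin receptor" encloses "insulin"
insert(root, ConceptNode(14, 27, ["id-3"]))  # "receptor gene" intersects "insulin receptor"
insert(root, ConceptNode(0, 5, ["id-4"]))    # "human"
insert(root, ConceptNode(6, 13, ["id-5"]))   # second identifier for "insulin"
lines = render(root, text)
print("\n".join(lines))
```
</p><p>With this example text, "insulin" ends up nested under "insulin receptor" carrying both of its identifiers, while "receptor gene" remains a sibling of "insulin receptor".</p></div>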
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Pipeline and modules</head><p>The main component of Neji is the processing pipeline (Figure <ref type="figure">1</ref>), a series of independent modules, each responsible for a specific processing task, that are executed sequentially. We used Monq.jfa<ref type="foot" target="#foot_0">1</ref>, a library for fast and flexible text filtering with regular expressions, to implement each pipeline module as a custom deterministic finite automaton (DFA) with specific rules and actions.</p></div>
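<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the rule-and-action idea concrete, the following Python sketch chains independent modules, each defined by regular-expression rules paired with actions, and runs them one after another on the same text. RuleModule, run_pipeline, the example rules and the bracketed tag notation are hypothetical illustrations, not Neji's or Monq.jfa's API. Python's re module is a backtracking engine rather than a deterministic finite automaton, so only the structure carries over, not the performance profile.</p>
<p>
```python
import re
from typing import Callable

# An action receives the match and returns the text that replaces it
Action = Callable[[re.Match], str]


class RuleModule:
    """Hypothetical pipeline module: a list of (regex, action) rules.

    Monq.jfa compiles rules like these into a DFA; Python's re module
    backtracks, so this sketch mirrors only the rule/action structure.
    """

    def __init__(self, name: str, rules: list[tuple[str, Action]]):
        self.name = name
        self.actions = [action for _, action in rules]
        # One capturing group per rule; rule patterns must use only
        # non-capturing groups so that m.lastindex identifies the rule
        self.regex = re.compile("|".join(f"({pattern})" for pattern, _ in rules))

    def process(self, text: str) -> str:
        # Replace every match with the output of the rule that fired
        return self.regex.sub(lambda m: self.actions[m.lastindex - 1](m), text)


def run_pipeline(modules: list[RuleModule], text: str) -> str:
    """Execute the modules sequentially, each consuming the previous output."""
    for module in modules:
        text = module.process(text)
    return text


whitespace = RuleModule("whitespace", [(r"\s+", lambda m: " ")])
tagger = RuleModule("dictionary", [
    (r"\b(?:mice|mouse)\b", lambda m: f"[{m.group(0)}|Species]"),
    (r"\binsulin\b", lambda m: f"[{m.group(0)}|Chemical]"),
])
result = run_pipeline([whitespace, tagger], "insulin  levels in\tmice")
print(result)  # [insulin|Chemical] levels in [mice|Species]
```
</p><p>Because each module only transforms its input and passes the result on, modules can be reordered, removed or replaced without touching the others, which is the property a modular pipeline relies on.</p></div>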
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.1">Handling PDF files</head><p>Thanks to Neji's modular architecture, adding PDF processing capabilities only required the implementation of a new reader module. For this, we integrated LA-PDFText <ref type="bibr" target="#b9">(Ramakrishnan et al., 2012)</ref>, a state-of-the-art open-source tool for handling PDF documents. LA-PDFText makes use of a carefully crafted set of rules defined in the business rules management system DROOLS, allowing it to correctly handle different PDF layouts, such as one-column, two-column and mixed layouts. This feature also allows defining different sets of rules for specific PDF layouts if necessary, and we therefore included an optional parameter for this in the new Neji reader.</p><p>To evaluate the text extraction quality, we obtained the original PDF documents corresponding to the 67 full-text articles that compose the CRAFT corpus <ref type="bibr" target="#b1">(Bada et al., 2012)</ref>, and compared the text extracted by LA-PDFText, through our processing pipeline, to the text distributed with the corpus, which was extracted from XML files. For these articles, published in 21 different journals with distinct layouts, we obtained an exact match for 90% of the extracted sentences.</p><p>Apart from extracting the text, which is sufficient for running the processing pipeline, we added further capabilities to the reader in order to support PDF processing in the curation tool Egas. Namely, we apply sentence splitting to the extracted chunks of text and extract the position of each sentence in each page, which allows aligning and navigating between the plain text and PDF views in the user interface. This information is associated with each sentence and carried over to the remaining modules in the pipeline. A new writer module was also implemented that exports this extended information in JSON format, for simple reuse in external tools.</p></div>
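<div xmlns="http://www.tei-c.org/ns/1.0"><p>The sentence-level layout information and its JSON export can be sketched as follows, assuming each sentence carries its start and end offsets in the extracted plain text, a page number and a single bounding box. The record fields, the JSON keys and the helper exact_match_rate are hypothetical; the helper shows one plausible way of computing a sentence-level exact-match rate like the one reported for CRAFT, not necessarily the procedure used in that evaluation.</p>
<p>
```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SentenceRecord:
    """Hypothetical sentence record produced by a PDF reader module."""
    start: int   # offset of the first character in the extracted plain text
    end: int     # offset just past the last character
    page: int    # 1-based page number in the PDF
    bbox: tuple[float, float, float, float]  # x0, y0, x1, y1 on that page


def to_json(text: str, sentences: list[SentenceRecord]) -> str:
    """Serialize the plain text together with per-sentence layout information."""
    payload = {
        "text": text,
        "sentences": [
            {**asdict(s), "text": text[s.start:s.end]} for s in sentences
        ],
    }
    return json.dumps(payload, indent=2)


def exact_match_rate(extracted: list[str], reference: list[str]) -> float:
    """Fraction of extracted sentences found verbatim among the reference sentences."""
    if not extracted:
        return 0.0
    known = set(reference)
    return sum(sentence in known for sentence in extracted) / len(extracted)


text = "Mice were used. Insulin was measured."
sentences = [
    SentenceRecord(0, 15, 1, (72.0, 700.0, 300.0, 712.0)),
    SentenceRecord(16, 37, 1, (72.0, 688.0, 290.0, 700.0)),
]
print(to_json(text, sentences))
rate = exact_match_rate(
    ["Mice were used.", "Insulin was measurd."],  # second sentence has an extraction error
    ["Mice were used.", "Insulin was measured."],
)
print(rate)  # 0.5
```
</p><p>Storing offsets rather than copies of the text keeps the plain text as the single source of truth, while the page and box let a viewer locate the same sentence on the PDF.</p></div>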
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Web-services</head><p>Neji's web services are intended to facilitate access to and use of Neji's functionalities by providing a simple RESTful API that allows developers to submit input documents and receive both the plain text extracted from a submitted PDF file and the annotation results in various well-known formats, including standoff (A1) <ref type="bibr" target="#b6">(Kim et al., 2009;</ref><ref type="bibr" target="#b11">Stenetorp et al., 2012)</ref> and BioC <ref type="bibr" target="#b5">(Comeau et al., 2013)</ref>.</p><p>Different annotation services can be configured in the platform, where each service is an annotation pipeline with a custom set of resources (dictionaries and ML models) and processing properties. This provides a way to easily manage concurrent annotation services, allowing the properties and resources of each one to be configured independently. Additionally, resources are loaded into memory as soon as a new service is created. Since loading is usually an expensive step, especially for large ML models, keeping the resources in memory avoids reloading them for each request and greatly reduces the total annotation time.</p></div>
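<div xmlns="http://www.tei-c.org/ns/1.0"><p>A minimal Python sketch of the service design described above, assuming a service is reduced to a dictionary matcher: the resource loader runs once, when the service is created, and every later request reuses the in-memory dictionary. AnnotationService, ServiceRegistry and load_species_dictionary are hypothetical names, not Neji's REST API. The output follows the brat standoff (A1) text-bound annotation layout: an identifier, a type with start and end offsets, and the covered text, separated by tabs.</p>
<p>
```python
import re
from typing import Callable


class AnnotationService:
    """Hypothetical annotation service: a dictionary matcher with preloaded resources."""

    def __init__(self, name: str, loader: Callable[[], dict[str, str]]):
        self.name = name
        # Resources are loaded once, when the service is created, so that
        # annotation requests never pay the (possibly large) loading cost
        self.dictionary = loader()

    def annotate(self, text: str) -> str:
        """Return matches as A1 standoff lines: ID, type with offsets, covered text."""
        spans = []
        for term, concept_type in self.dictionary.items():
            pattern = r"\b" + re.escape(term) + r"\b"
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                spans.append((m.start(), m.end(), concept_type))
        spans.sort()
        return "\n".join(
            f"T{i}\t{ctype} {s} {e}\t{text[s:e]}"
            for i, (s, e, ctype) in enumerate(spans, start=1)
        )


class ServiceRegistry:
    """Keeps independently configured services, addressed by name."""

    def __init__(self) -> None:
        self.services: dict[str, AnnotationService] = {}

    def create(self, name: str, loader: Callable[[], dict[str, str]]) -> None:
        self.services[name] = AnnotationService(name, loader)

    def annotate(self, name: str, text: str) -> str:
        return self.services[name].annotate(text)


load_calls = []


def load_species_dictionary() -> dict[str, str]:
    """Stand-in for an expensive resource load; records each call."""
    load_calls.append(1)
    return {"mice": "Species", "human": "Species"}


registry = ServiceRegistry()
registry.create("species", load_species_dictionary)
first = registry.annotate("species", "Mice and human cells")
second = registry.annotate("species", "Human samples")
print(first)
print(len(load_calls))  # 1: the dictionary was loaded only at service creation
```
</p><p>The call counter confirms that two requests trigger a single load; a real service would replace the toy dictionary with its configured dictionaries and ML models.</p></div>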
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Egas</head><p>Egas is a web-based platform for biomedical text mining and collaborative curation that supports inline annotation of concept occurrences and of relations between these concepts. Annotations can be performed automatically, using the available services for automatic concept and relation identification, or manually, in which case a user can add new annotations and also edit or remove automatically generated ones. The results can then be exported to various standard annotation formats.</p><p>To adapt Egas to support literature curation over PDF documents, we integrated PDF text extraction using the Neji web services RESTful API and adapted the interface for side-by-side visualization of the extracted text alongside the original PDF. Egas' file import web services were also extended to support PDF files. As with the other file formats, this web service is responsible for receiving the file, extracting its text content using Neji's PDF processing feature as described above, and creating the whole data structure that supports document annotations. This structure also includes sentence information retrieved from Neji, such as each sentence's start and end offsets in the extracted plain text and its position within the PDF page, allowing synchronous scrolling and navigation between the plain text and PDF views.</p><p>Figure <ref type="figure" target="#fig_0">2</ref> shows Egas' user interface for PDF annotation. The original PDF document is displayed on the right-side panel, while the left panel is the annotation panel containing the extracted text, allowing annotation using the same simple interactions as for other document formats, as described in <ref type="bibr" target="#b3">(Campos et al., 2014)</ref>. As can be seen in the figure, concept annotations added by the automatic annotation services or by the curator are displayed on the plain text as well as on the PDF document. 
Additionally, a tooltip with the information associated with each annotation is shown when hovering the mouse over the annotation in either panel. Clicking a sentence number on the annotation panel scrolls the PDF document accordingly and briefly highlights the corresponding sentence to facilitate its identification. Conversely, double-clicking a sentence on the PDF scrolls the text on the annotation panel and highlights the corresponding sentence.</p></div>
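<div xmlns="http://www.tei-c.org/ns/1.0"><p>The synchronization between the two panels can be sketched as lookups over the sentence records received from Neji. The Python below is a hypothetical illustration, not Egas code: Sentence and SentenceIndex are invented names, and it assumes one bounding box per sentence and page coordinates whose y axis grows downwards. A binary search over sentence start offsets maps a character offset (for example, the start of an annotation) to its sentence, the sentence's page and top edge give the scroll target for the PDF view, and a point hit test maps a double-click on the PDF back to a sentence.</p>
<p>
```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sentence:
    """Hypothetical sentence record: plain-text offsets plus one box on a PDF page."""
    start: int
    end: int
    page: int
    x0: float
    y0: float  # top edge, assuming y grows downwards as in screen coordinates
    x1: float
    y1: float  # bottom edge


class SentenceIndex:
    """Maps between plain-text offsets, sentence indexes and PDF positions."""

    def __init__(self, sentences: list[Sentence]):
        self.sentences = sorted(sentences, key=lambda s: s.start)
        self.starts = [s.start for s in self.sentences]

    def sentence_at_offset(self, offset: int) -> Optional[int]:
        """Sentence containing a character offset, e.g. to place an annotation on the PDF."""
        i = bisect_right(self.starts, offset) - 1
        if i >= 0 and self.sentences[i].end > offset:
            return i
        return None  # offset falls between sentences

    def scroll_target(self, i: int) -> tuple[int, float]:
        """Page and top position to scroll the PDF view to for sentence i."""
        s = self.sentences[i]
        return s.page, s.y0

    def sentence_at_point(self, page: int, x: float, y: float) -> Optional[int]:
        """Sentence whose box contains a point double-clicked on the PDF."""
        for i, s in enumerate(self.sentences):
            if s.page == page and s.x1 >= x >= s.x0 and s.y1 >= y >= s.y0:
                return i
        return None


index = SentenceIndex([
    Sentence(0, 15, 1, 72.0, 100.0, 300.0, 112.0),
    Sentence(16, 37, 1, 72.0, 112.0, 290.0, 124.0),
    Sentence(38, 60, 2, 72.0, 80.0, 310.0, 92.0),
])
print(index.sentence_at_offset(20))              # 1
print(index.scroll_target(2))                    # (2, 80.0)
print(index.sentence_at_point(1, 100.0, 118.0))  # 1
```
</p><p>Sentences that wrap across lines or columns would in practice need several boxes each; the hit test would then check every box belonging to a sentence.</p></div>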
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Conclusions</head><p>Assisted literature curation tools, based on text mining and information extraction methods, are increasingly used by curation teams to expedite their tasks. However, there is a lack of tools that support direct annotation of PDF documents, PDF being a very common format for the scientific literature and for other document types, such as patents. We present a new feature of Egas that allows direct document curation and annotation over PDF files, with side-by-side visualization of the original PDF document and of the extracted textual content. By combining the user-friendliness of Egas with the possibility of reading the document in a familiar format such as PDF, we provide a more convenient and pleasant literature curation environment, which could contribute to improved curation efficiency.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 2:</head><label>2</label><figDesc>Figure 2: Egas PDF annotation interface</figDesc><graphic coords="4,72.00,62.80,453.55,231.03" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 1:</head><label>1</label><figDesc>Figure 1: Neji processing pipeline and modular architecture (Campos et al., 2013)</figDesc><graphic coords="2,86.17,62.81,425.20,212.38" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1:</head><label>1</label><figDesc>Neji concept recognition results on a variety of corpora and concept types. D: Dictionary; ML: Machine learning.</figDesc><table><row role="label"><cell>Corpus</cell><cell>Concept type</cell><cell>F-score</cell><cell>Method</cell></row><row><cell>CRAFT</cell><cell>Species</cell><cell>95%</cell><cell>D</cell></row><row><cell></cell><cell>Cell</cell><cell>92%</cell><cell>D</cell></row><row><cell></cell><cell>Gene and Protein</cell><cell>76%</cell><cell>ML</cell></row><row><cell></cell><cell>Chemicals</cell><cell>65%</cell><cell>D</cell></row><row><cell></cell><cell>Cellular Component</cell><cell>83%</cell><cell>D</cell></row><row><cell></cell><cell>Biological Process and Molecular Function</cell><cell>63%</cell><cell>D</cell></row><row><cell>NCBI Disease</cell><cell>Disorders</cell><cell>85%</cell><cell>D</cell></row><row><cell>AnEM</cell><cell>Anatomy</cell><cell>82%</cell><cell>D</cell></row><row><cell>BioCreative II Gene Mention</cell><cell>Gene and Protein</cell><cell>87%</cell><cell>ML</cell></row><row><cell>tmVar</cell><cell>Genetic Variants</cell><cell>86%</cell><cell>ML</cell></row><row><cell>BioCreative IV CHEMDNER</cell><cell>Chemicals</cell><cell>87%</cell><cell>ML</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">http://www.pifpafpuf.de/Monq.jfa/</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">An overview of the BioCreative 2012 Workshop Track III: interactive text mining task</title>
		<author>
			<persName><forename type="first">Cecilia</forename><forename type="middle">N</forename><surname>Arighi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ben</forename><surname>Carterette</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Bretonnel Cohen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Martin</forename><surname>Krallinger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">John</forename><surname>Wilbur</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Petra</forename><surname>Fey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Robert</forename><surname>Dodson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Laurel</forename><surname>Cooper</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ceri</forename><forename type="middle">E</forename><surname>Van Slyke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Wasila</forename><surname>Dahdul</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Database</title>
		<imprint>
			<biblScope unit="page">56</biblScope>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Concept annotation in the CRAFT corpus</title>
		<author>
			<persName><forename type="first">Michael</forename><surname>Bada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Miriam</forename><surname>Eckert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Donald</forename><surname>Evans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kristin</forename><surname>Garcia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Krista</forename><surname>Shipley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dmitry</forename><surname>Sitnikov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">William</forename><forename type="middle">A</forename><surname>Baumgartner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Bretonnel Cohen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Karin</forename><surname>Verspoor</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Judith</forename><forename type="middle">A</forename><surname>Blake</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">BMC Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page">1</biblScope>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">A modular framework for biomedical concept recognition</title>
		<author>
			<persName><forename type="first">David</forename><surname>Campos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sérgio</forename><surname>Matos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">José Luís</forename><surname>Oliveira</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">BMC Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">14</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page">281</biblScope>
			<date type="published" when="2013-01">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Egas: a collaborative and interactive document curation platform</title>
		<author>
			<persName><forename type="first">David</forename><surname>Campos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jóni</forename><surname>Lourenço</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sérgio</forename><surname>Matos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">José Luís</forename><surname>Oliveira</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Database: the journal of biological databases and curation</title>
		<imprint>
			<date type="published" when="2014-01">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">A document processing pipeline for annotating chemical entities in scientific documents</title>
		<author>
			<persName><forename type="first">David</forename><surname>Campos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sérgio</forename><surname>Matos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">José L</forename><surname>Oliveira</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Cheminformatics</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page">1</biblScope>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">BioC: a minimalist approach to interoperability for biomedical text processing</title>
		<author>
			<persName><forename type="first">Donald</forename><forename type="middle">C</forename><surname>Comeau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Rezarta</forename><surname>Islamaj Dogan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Paolo</forename><surname>Ciccarese</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kevin</forename><surname>Bretonnel Cohen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Martin</forename><surname>Krallinger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Florian</forename><surname>Leitner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Zhiyong</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yifan</forename><surname>Peng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Fabio</forename><surname>Rinaldi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Manabu</forename><surname>Torii</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Database</title>
		<imprint>
			<biblScope unit="page">64</biblScope>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Overview of BioNLP&apos;09 shared task on event extraction</title>
		<author>
			<persName><forename type="first">Jin-Dong</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tomoko</forename><surname>Ohta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sampo</forename><surname>Pyysalo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yoshinobu</forename><surname>Kano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jun'ichi</forename><surname>Tsujii</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing: Shared Task</title>
				<meeting>the Workshop on Current Trends in Biomedical Natural Language Processing: Shared Task</meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="1" to="9" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Mining clinical attributes of genomic variants through assisted literature curation in Egas</title>
		<author>
			<persName><forename type="first">Sérgio</forename><surname>Matos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">David</forename><surname>Campos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Renato</forename><surname>Pinho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Raquel</forename><forename type="middle">M</forename><surname>Silva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Matthew</forename><surname>Mort</surname></persName>
		</author>
		<author>
			<persName><forename type="first">David</forename><forename type="middle">N</forename><surname>Cooper</surname></persName>
		</author>
		<author>
			<persName><forename type="first">José Luís</forename><surname>Oliveira</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Database</title>
		<imprint>
			<biblScope unit="page">96</biblScope>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">A survey on annotation tools for the biomedical literature</title>
		<author>
			<persName><forename type="first">Mariana</forename><surname>Neves</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ulf</forename><surname>Leser</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Briefings in Bioinformatics</title>
		<imprint>
			<biblScope unit="page">84</biblScope>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Layout-aware text extraction from full-text PDF of scientific articles</title>
		<author>
			<persName><forename type="first">Cartic</forename><surname>Ramakrishnan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Abhishek</forename><surname>Patnia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Eduard</forename><surname>Hovy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gully APC</forename><surname>Burns</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Source Code for Biology and Medicine</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page">7</biblScope>
			<date type="published" when="2012-01">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Facts from text - is text mining ready to deliver?</title>
		<author>
			<persName><forename type="first">Dietrich</forename><surname>Rebholz-Schuhmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Harald</forename><surname>Kirsch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Francisco</forename><surname>Couto</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">PLoS Biology</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page">e65</biblScope>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">BRAT: a web-based tool for NLP-assisted text annotation</title>
		<author>
			<persName><forename type="first">Pontus</forename><surname>Stenetorp</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sampo</forename><surname>Pyysalo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Goran</forename><surname>Topić</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tomoko</forename><surname>Ohta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sophia</forename><surname>Ananiadou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jun'ichi</forename><surname>Tsujii</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2012-04">2012</date>
			<biblScope unit="page" from="102" to="107" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<author>
			<orgName>U.S. National Library of Medicine</orgName>
		</author>
		<title level="m">Detailed Indexing Statistics: 1965-2015</title>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Evaluation of text data mining for database curation: lessons learned from the KDD Challenge Cup</title>
		<author>
			<persName><forename type="first">Alexander</forename><forename type="middle">S</forename><surname>Yeh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lynette</forename><surname>Hirschman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alexander</forename><forename type="middle">A</forename><surname>Morgan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="331" to="339" />
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
	<note>suppl</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
