<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Populating and Refining an Ontology of Cellulose Materials with Terms from Scientific Publications: Extended Abstract</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Umayer</forename><surname>Reza</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">School of Computing and Information Science</orgName>
								<orgName type="institution">University of Maine</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Torsten</forename><surname>Hahmann</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">School of Computing and Information Science</orgName>
								<orgName type="institution">University of Maine</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">Populating and Refining an Ontology of Cellulose Materials with Terms from Scientific Publications: Extended Abstract</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">BA9C9BA5E8D124B8FFF9569D536DD020</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:27+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Named Entity Recognition</term>
					<term>Cellulose Ontology</term>
					<term>Knowledge Graph</term>
					<term>Scientific Publication</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Cellulose is a highly versatile biopolymer with numerous applications, such as paper and paperboard production, textiles, packaging, biofuels, and biomedical applications. However, the scattered nature of cellulose knowledge, with its ambiguous terms and datasets, presents significant obstacles to its optimal utilization. This project seeks to address these challenges by systematically accumulating scattered knowledge about cellulose and making it modifiable, extensible, and reusable. The objective of the project is to develop an automated system that extracts relevant cellulosic terms from scientific publications and improves named entity classification by drawing additional context and disambiguating information from an existing cellulose ontology. To accomplish this, an incremental training process will be used to train a ScispaCy language model, which is specifically designed for analyzing scientific, clinical, and biomedical texts. The system will also generate new terms for the ontology by taking the existing ontology into account. The proposed system will thus facilitate the extension of the ontology while simultaneously benefiting from the ontology to enhance performance in named entity classification. By meeting these objectives, the project aims to contribute to the development of a sustainable bioproduct-based society by providing a resource of state-of-the-art knowledge in cellulose materials that can facilitate material science research.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Cellulose, the most abundant and versatile biopolymer on earth, found in plant cells and some bacteria, is the building block of cellulosic materials, which have many applications in various domains because of their sustainable, renewable, and biodegradable nature. One of the most significant applications of cellulosic materials is the production of paper and paperboard. The unique dimensions and characteristics of cellulose nanofibrils (CNFs) make them crucial in papermaking for enhancing the strength properties of paper <ref type="bibr" target="#b0">[1]</ref>. In addition to their use in paper production, several nanocelluloses (NCs) serve as alternatives for the textile industry because of their higher mechanical resistance <ref type="bibr" target="#b1">[2]</ref>, as a substitute for petroleum-based packaging <ref type="bibr" target="#b2">[3]</ref>, and as a natural polymer with low toxicity, high crystallinity, biocompatibility, and biosafety for biomedical applications <ref type="bibr" target="#b3">[4]</ref>.</p><p>However, much of the knowledge on cellulose is stored only in scientific publications and, even when available in digital formats like PDF or HTML, is not easily processed at a large scale due to the scattered and ambiguous nature of the text. Information extraction (IE) approaches are needed to extract this information from scientific publications and make it accessible in structured formats. Once the information is placed in an ontology, the knowledge can be queried and reasoned with more efficiently. One of the key steps in IE is Named Entity Recognition (NER), which is the task of extracting nouns and noun chunks, called entities or named entities, from text. Identifying and classifying named entities is a key part of the information extraction process. 
However, current NER approaches are limited in their ability to recognize cellulosic named entities and to handle variations in their naming conventions. ChemSpot <ref type="bibr" target="#b4">[5]</ref> is a hybrid chemical named entity recognizer that uses a CRF (Conditional Random Field) model to identify chemical named entities in natural language texts. In the biomedical domain, tmChem <ref type="bibr" target="#b5">[6]</ref> employs a model combination approach using two different CRF models to recognize chemical mentions, properties, and their relationships. <ref type="bibr" target="#b6">Akkasi et al. (2016)</ref> introduced ChemTok <ref type="bibr" target="#b6">[7]</ref>, a rule-based tokenizer specifically designed for chemical named entity recognition. <ref type="bibr" target="#b7">Swain et al. (2016)</ref> presented ChemDataExtractor <ref type="bibr" target="#b7">[8]</ref>, a toolkit capable of extracting chemical entities along with their properties, measurements, and relationships. <ref type="bibr" target="#b8">Corbett and Boyle (2018)</ref> developed Chemlistem <ref type="bibr" target="#b8">[9]</ref>, a chemical named entity recognizer based on recurrent neural networks. Other methods are available for extracting material entities from text, but they also struggle to recognize the diverse range of entities encountered in the cellulosic domain. Zhao et al. (2021) introduced a fine-tuned BERT model <ref type="bibr" target="#b9">[10]</ref> specifically designed for materials named entity recognition. Similarly, Miah and Sulaiman (2023) proposed a deep neural network-based model <ref type="bibr" target="#b10">[11]</ref> tailored for materials named entity recognition. <ref type="bibr" target="#b11">Shetty et al. (2023)</ref> presented an alternative approach <ref type="bibr" target="#b11">[12]</ref> for extracting material property data. Furthermore, Weston et al. 
(2019) presented a comprehensive approach <ref type="bibr" target="#b12">[13]</ref> that not only extracts material properties but also captures their applications and mentions of inorganic materials.</p><p>To comprehensively capture cellulosic knowledge, it is crucial to extract a wide range of relevant entities beyond just chemicals and materials. This includes properties associated with materials and chemicals, manufacturing processes, and names of products and equipment. Existing methods often fail to accurately identify a significant portion of these cellulosic entities because of their limited familiarity with cellulosic data. The ultimate goal of the project is to contribute to growing an ontology-guided body of knowledge about cellulose by extracting relevant terms from the scientific literature, which is the preferred source of knowledge. Initially, a manually built cellulose ontology will play a significant role in NER by providing a structured representation of the knowledge and relationships among entities. It can also help improve NER performance by providing additional context and disambiguating information, as well as enabling more sophisticated reasoning and inference. Later, the cellulose ontology itself will grow as newly identified cellulosic entities are added, which speeds up ontology development. The manual amendment of ontologies can be a time-consuming and costly process, which limits their usefulness in practice. In contrast, an automated process can help to overcome these limitations and enable more efficient and effective use of ontologies in NER and other NLP tasks. 
Additionally, the automatic amendment of ontologies can help to ensure that they are up to date and reflect the latest developments in the domain. However, there is little work on leveraging the synergies between NER and ontologies, namely (1) utilizing ontologies for NER and (2) using NER to amend and populate ontologies.</p><p>In scientific domains where accurate organization of terms is important to avoid misrepresentation of knowledge, relying solely on an automatic ontology construction method that builds the ontology from scratch may not be effective. Instead, a semi-automated method, in which domain experts contribute initial concepts and relationships to establish a core domain ontology, can be used to amend the ontology with additional terms. The purpose of incorporating an ontology in the named entity recognition process is to improve its performance in the cellulosic domain. This integration will enable the system to identify the named entities that align with the concepts and relationships of the ontology and classify them accordingly.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Research Questions</head><p>The proposed dissertation aims to leverage the synergies between ontologies and NER by specifically addressing the following three research questions:</p><p>1. Under what conditions can pre-trained language models be incrementally re-trained for improved named entity recognition of terms in the cellulosic domain?</p><p>2. How can cellulose-related terms that are identified by such NER approaches be categorized more effectively and precisely by leveraging a small hand-curated ontology of cellulose materials?</p><p>3. What methods are suitable for determining whether a particular identified term refers to a concept that already exists in the ontology, or a new concept that requires amending the ontology?</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Objective</head><p>The objective of the project is to develop an automated system that identifies cellulosic terms in a given text with higher accuracy. Additionally, the system aims to classify these identified terms, as far as possible, under the most relevant concepts available in a cellulose ontology, which is currently being developed. To accomplish this, the proposed system will communicate internally with the ontology, enabling the identification and classification of new cellulosic terms that are not yet incorporated in the ontology. The resulting set of new terms will be shared with domain experts for review, allowing them to assess the relevance of those terms and determine their appropriate placement within the taxonomy. If a new term is found to carry a more refined semantic meaning than an existing term in the ontology, the ontology will be amended accordingly. Furthermore, identifying and excluding irrelevant terms in every NER run will accelerate the continual assessment of recognized cellulosic terms and their association with the ontology over time.</p></div>
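<div xmlns="http://www.tei-c.org/ns/1.0"><p>The separation of already-known terms from new candidate terms described above can be illustrated with a minimal sketch. It assumes that an extracted term counts as known when its normalized form exactly equals a normalized ontology label; this matching rule, the function names normalize and split_known_and_new, and the example labels are all hypothetical and are not taken from the proposed system, which may rely on a richer comparison.</p>

```python
import re


def normalize(term: str) -> str:
    """Lowercase, turn hyphens into spaces, and collapse runs of whitespace."""
    term = term.lower().replace("-", " ")
    return re.sub(r"\s+", " ", term).strip()


def split_known_and_new(extracted, ontology_labels):
    """Partition extracted terms into known terms and candidate terms.

    A term is treated as known when its normalized form equals a normalized
    ontology label (an illustrative assumption, not the authors' method).
    Duplicate mentions of the same normalized term are reported only once.
    """
    known_labels = {normalize(label) for label in ontology_labels}
    known, candidates = [], []
    seen = set()
    for term in extracted:
        key = normalize(term)
        if key in seen:
            continue
        seen.add(key)
        (known if key in known_labels else candidates).append(term)
    return known, candidates


# Hypothetical ontology labels and NER output, for illustration only.
ontology_labels = ["cellulose nanofibril", "nanocellulose", "paperboard"]
extracted = ["Cellulose  Nanofibril", "bacterial cellulose", "Paperboard", "bacterial cellulose"]

known, candidates = split_known_and_new(extracted, ontology_labels)
print("known:", known)
print("candidates for expert review:", candidates)
```

<p>In this sketch only the candidate list would be forwarded to domain experts; exact label matching is the simplest possible choice and would miss synonyms and abbreviations such as CNF.</p></div>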
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Research Methodology</head><p>A ScispaCy <ref type="bibr" target="#b13">[14]</ref> language model will be selected based on its current performance on cellulosic data. This performance will be measured by the number of correctly recognized cellulosic terms in a curated corpus of evaluation data. The best-performing model will go through an incremental training process using the spaCy <ref type="bibr" target="#b14">[15]</ref> model training pipeline. In the first phase of training, the selected model will be trained on the CHEMDNER corpus <ref type="bibr" target="#b15">[16]</ref>, which is currently the largest corpus for chemical terms. The model will then be trained on various corpora covering materials, properties, processes, and more across subsequent training phases. This iterative process will enable the model to progressively enhance its understanding and knowledge of the diverse terms of the cellulosic domain. After each phase of training, the performance of the model will be evaluated using standard metrics such as precision, recall, and F1-score. To assess the improvement, the performance of the model in the current training phase will also be compared with the performance of the models from the previous training phases on an identical evaluation dataset. Finally, the improved language model will be employed to extract cellulosic named entities from a vast collection of text documents. The extracted cellulosic terms will be comprehensively compared to the existing terms in the ontology, generating a set of candidate terms to be forwarded for further verification by domain experts. The domain experts will assess the suitability of these terms and make informed decisions regarding their rejection or integration into the ontology. 
This collaborative process will ensure that the ontology is enriched with relevant and accurate terms, enhancing its overall effectiveness and comprehensiveness. The effectiveness of the approach will be measured by calculating the percentage of terms accepted by domain experts. This metric will provide valuable insights into the success and acceptance of the proposed method in enriching the ontology with relevant and authoritative terms.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>FOIS 2023</head><label>2023</label><figDesc>Early Career Symposium (ECS), held at FOIS 2023, co-located with the 9th Joint Ontology Workshops (JOWO 2023), 19-20 July 2023, Sherbrooke, Québec, Canada. Email: a.reza@maine.edu (U. Reza); torsten.hahmann@maine.edu (T. Hahmann). Web: https://umaine.edu/scis/people/rezaumayer (U. Reza); https://umaine.edu/scis/people/torsten-hahmann (T. Hahmann). ORCID: 0000-0003-4013-3513 (U. Reza); 0000-0002-5331-5052 (T. Hahmann)</figDesc></figure>
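<div xmlns="http://www.tei-c.org/ns/1.0"><p>The evaluation measures named in the methodology can be made concrete with a short sketch. It assumes entity-level exact matching, where a predicted entity counts as correct only if its start offset, end offset, and label all equal those of a gold annotation; this matching criterion, the label names, the example spans, and the numbers passed to acceptance_rate are illustrative assumptions rather than details of the planned evaluation.</p>

```python
def precision_recall_f1(gold, predicted):
    """Entity-level precision, recall, and F1 under exact (start, end, label) matching."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold.intersection(predicted))
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def acceptance_rate(num_accepted, num_proposed):
    """Percentage of proposed candidate terms that domain experts accepted."""
    return 100.0 * num_accepted / num_proposed if num_proposed else 0.0


# Hypothetical annotations: (start offset, end offset, label).
gold = [(0, 9, "MATERIAL"), (24, 27, "MATERIAL"), (40, 56, "PROPERTY")]
predicted = [(0, 9, "MATERIAL"), (40, 56, "PROPERTY"), (60, 65, "PROCESS"), (70, 75, "PROCESS")]

precision, recall, f1 = precision_recall_f1(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")

# Hypothetical review outcome: 30 of 40 candidate terms accepted.
print(f"acceptance rate={acceptance_rate(30, 40):.1f}%")
```

<p>Running the same function on one fixed evaluation set after every training phase would give directly comparable scores across phases, which is the comparison the methodology calls for.</p></div>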
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This research was supported in part by the U.S. Department of Agriculture:</p><p>• Forest Service, Project 20-JV-11111124-055 • USDA Agricultural Research Service (ARS), Project 0204-41510-001-98S • National Institute of Food and Agriculture (NIFA), Award 2021-67022-34366</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Role of cellulose nanofibrils in improving the strength properties of paper: a review</title>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">B</forename><surname>Jele</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Lekha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">B</forename><surname>Sithole</surname></persName>
		</author>
		<idno type="DOI">10.1007/s10570-021-04294-8</idno>
	</analytic>
	<monogr>
		<title level="j">Cellulose</title>
		<imprint>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="page" from="55" to="81" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Trends on the cellulose-based textiles: Raw materials and technologies</title>
		<author>
			<persName><forename type="first">C</forename><surname>Felgueiras</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">G</forename><surname>Azoia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">M A</forename><surname>Gonçalves</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gama</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Dourado</surname></persName>
		</author>
		<idno type="DOI">10.3389/fbioe.2021.608826</idno>
	</analytic>
	<monogr>
		<title level="j">Frontiers in Bioengineering and Biotechnology</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Prospects for replacement of some plastics in packaging with lignocellulose materials: A brief review</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Su</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Cao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Lutes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>He</surname></persName>
		</author>
		<idno type="DOI">10.15376/biores.13.2.Su</idno>
	</analytic>
	<monogr>
		<title level="j">Bioresources</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="page" from="4550" to="4576" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">General scenarios of cellulose and its use in the biomedical field</title>
		<author>
			<persName><forename type="first">S</forename><surname>Gopi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Balakrishnan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chandradhara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Poovathankandy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Thomas</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.mtchem.2019.04.012</idno>
	</analytic>
	<monogr>
		<title level="j">Materials Today Chemistry</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="page" from="59" to="78" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Chemspot: a hybrid system for chemical named entity recognition</title>
		<author>
			<persName><forename type="first">T</forename><surname>Rocktäschel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Weidlich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Leser</surname></persName>
		</author>
		<idno type="DOI">10.1093/bioinformatics/bts183</idno>
	</analytic>
	<monogr>
		<title level="j">Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">28</biblScope>
			<biblScope unit="page" from="1633" to="1640" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">tmchem: a high performance approach for chemical named entity recognition and normalization</title>
		<author>
			<persName><forename type="first">R</forename><surname>Leaman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C.-H</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Lu</surname></persName>
		</author>
		<idno type="DOI">10.1186/1758-2946-7-S1-S3</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Cheminformatics</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page">S3</biblScope>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Chemtok: A new rule based tokenizer for chemical named entity recognition</title>
		<author>
			<persName><forename type="first">A</forename><surname>Akkasi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Varoğlu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Dimililer</surname></persName>
		</author>
		<idno type="DOI">10.1155/2016/4248026</idno>
	</analytic>
	<monogr>
		<title level="j">BioMed Research International</title>
		<imprint>
			<biblScope unit="volume">2016</biblScope>
			<biblScope unit="page" from="1" to="9" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Chemdataextractor: A toolkit for automated extraction of chemical information from the scientific literature</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">C</forename><surname>Swain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Cole</surname></persName>
		</author>
		<idno type="DOI">10.1021/acs.jcim.6b00207</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Chemical Information and Modeling</title>
		<imprint>
			<biblScope unit="volume">56</biblScope>
			<biblScope unit="page" from="1894" to="1904" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Chemlistem: chemical named entity recognition using recurrent neural networks</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">T</forename><surname>Corbett</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Boyle</surname></persName>
		</author>
		<idno type="DOI">10.1186/s13321-018-0313-8</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Cheminformatics</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page">59</biblScope>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Fine-tuning BERT model for materials named entity recognition</title>
		<author>
			<persName><forename type="first">X</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Greenberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>An</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><forename type="middle">T</forename><surname>Hu</surname></persName>
		</author>
		<idno type="DOI">10.1109/BigData52589.2021.9671697</idno>
	</analytic>
	<monogr>
		<title level="m">2021 IEEE International Conference on Big Data (Big Data)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="3717" to="3720" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Material named entity recognition (mner) for knowledge-driven materials using deep learning approach</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">S U</forename><surname>Miah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sulaiman</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-981-19-9483-8_17</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Fourth International Conference on Trends in Computational and Cognitive Engineering</title>
				<editor>
			<persName><forename type="first">M</forename><forename type="middle">S</forename><surname>Kaiser</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Waheed</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Bandyopadhyay</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Mahmud</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Ray</surname></persName>
		</editor>
		<meeting>the Fourth International Conference on Trends in Computational and Cognitive Engineering<address><addrLine>Singapore, Singapore</addrLine></address></meeting>
		<imprint>
			<publisher>Springer Nature</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="199" to="208" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">A general-purpose material property data extraction pipeline from large polymer corpora using natural language processing</title>
		<author>
			<persName><forename type="first">P</forename><surname>Shetty</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">C</forename><surname>Rajan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Kuenneth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gupta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">P</forename><surname>Panchumarti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Holm</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ramprasad</surname></persName>
		</author>
		<idno type="DOI">10.1038/s41524-023-01003-w</idno>
	</analytic>
	<monogr>
		<title level="j">npj Computational Materials</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="page">52</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Named entity recognition and normalization applied to large-scale information extraction from the materials science literature</title>
		<author>
			<persName><forename type="first">L</forename><surname>Weston</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Tshitoyan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dagdelen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><forename type="middle">V</forename><surname>Kononova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Trewartha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">A</forename><surname>Persson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Ceder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Jain</surname></persName>
		</author>
		<idno type="DOI">10.1021/acs.jcim.9b00470</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Chemical Information and Modeling</title>
		<imprint>
			<biblScope unit="volume">59</biblScope>
			<biblScope unit="page" from="3692" to="3702" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Scispacy: Fast and robust models for biomedical natural language processing</title>
		<author>
			<persName><forename type="first">M</forename><surname>Neumann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>King</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Beltagy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Ammar</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/W19-5034</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 18th BioNLP Workshop and Shared Task, Association for Computational Linguistics</title>
				<meeting>the 18th BioNLP Workshop and Shared Task, Association for Computational Linguistics<address><addrLine>Florence, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="319" to="327" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">spacy: Industrial-strength natural language processing in python</title>
		<author>
			<persName><forename type="first">M</forename><surname>Honnibal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Montani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Van Landeghem</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Boyd</surname></persName>
		</author>
		<idno type="DOI">10.5281/zenodo.1212303</idno>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">The chemdner corpus of chemicals and drugs and its annotation principles</title>
		<author>
			<persName><forename type="first">M</forename><surname>Krallinger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Rabal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Leitner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Vázquez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Salgado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Leaman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D.-H</forename><surname>Ji</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">M</forename><surname>Lowe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">A</forename><surname>Sayle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">T</forename><surname>Batista-Navarro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Rak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Huber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Rocktäschel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Matos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Campos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Munkhdalai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">H</forename><surname>Ryu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">V</forename><surname>Ramanan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">S</forename><surname>Nathan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Žitnik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bajec</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Weber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Irmer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">A</forename><surname>Akhondi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">A</forename><surname>Kors</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>An</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><forename type="middle">K</forename><surname>Sikdar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ekbal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Yoshioka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">M</forename><surname>Dieb</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Choi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">M</forename><surname>Verspoor</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Khabsa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">L</forename><surname>Giles</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">E</forename><surname>Ravikumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lamurias</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">M</forename><surname>Couto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H.-J</forename><surname>Dai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">T.-H</forename><surname>Tsai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Ata</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Can</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Usie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Alves</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Segura-Bedmar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Martínez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Oyarzábal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Valencia</surname></persName>
		</author>
		<idno type="DOI">10.1186/1758-2946-7-S1-S2</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Cheminformatics</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page">S2</biblScope>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
