<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">SCAIView - A Semantic Search Engine for Biomedical Research Utilizing a Microservice Architecture</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Jens</forename><surname>Dörpinghaus</surname></persName>
							<email>jens.doerpinghaus@scai.fraunhofer.de</email>
							<affiliation key="aff0">
								<orgName type="department">Fraunhofer Institute for Algorithms and Scientific Computing</orgName>
								<orgName type="institution">Schloss Birlinghoven</orgName>
								<address>
									<settlement>Sankt Augustin</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Jürgen</forename><surname>Klein</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Fraunhofer Institute for Algorithms and Scientific Computing</orgName>
								<orgName type="institution">Schloss Birlinghoven</orgName>
								<address>
									<settlement>Sankt Augustin</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Johannes</forename><surname>Darms</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Fraunhofer Institute for Algorithms and Scientific Computing</orgName>
								<orgName type="institution">Schloss Birlinghoven</orgName>
								<address>
									<settlement>Sankt Augustin</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Sumit</forename><surname>Madan</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Fraunhofer Institute for Algorithms and Scientific Computing</orgName>
								<orgName type="institution">Schloss Birlinghoven</orgName>
								<address>
									<settlement>Sankt Augustin</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Marc</forename><surname>Jacobs</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Fraunhofer Institute for Algorithms and Scientific Computing</orgName>
								<orgName type="institution">Schloss Birlinghoven</orgName>
								<address>
									<settlement>Sankt Augustin</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">SCAIView - A Semantic Search Engine for Biomedical Research Utilizing a Microservice Architecture</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">4358991CA36DBE51015AFBB46508C3EC</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T08:04+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Biological and medical researchers explore the mechanisms of living organisms and seek a better understanding of the fundamental biological processes of life. To tackle such complex tasks, they constantly need to gather and accumulate new knowledge by performing experiments and studying scientific literature. We present the novel semantic search engine "SCAIView" for knowledge discovery and retrieval and, additionally, discuss the most recent paradigm shifts in communication technologies, which lead to a completely new architecture that improves scalability, achieves better interoperability, and increases fault tolerance.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Biological and medical researchers are interested in exploring the mechanisms of living organisms and gaining a better understanding of the fundamental biological processes of life. To tackle such complex tasks, they constantly gather and accumulate new knowledge by performing experiments and by studying scientific literature that reports the results of experiments performed by other researchers. Existing solutions are mainly based on biomedical text mining methods that extract key information from unstructured biomedical text (such as publications, patents, and electronic health records).</p><p>In the biomedical sciences in particular, we have a long history of developing applications that solve the above-mentioned tasks. For instance, SCAIView<ref type="foot" target="#foot_0">3</ref> is an information retrieval system that allows semantic searches in large textual collections by combining free-text searches with the ontological representations of automatically recognized biological entities (see Hodapp et al. <ref type="bibr" target="#b4">[5]</ref>). SCAIView has been used in many recent research projects, for example on neurodegenerative diseases <ref type="bibr" target="#b3">[4]</ref> or brain imaging features <ref type="bibr" target="#b5">[6]</ref>. Furthermore, it has also been used for document classification and clustering <ref type="bibr" target="#b2">[3]</ref>. Another important real-world task is the creation of biological knowledge graphs, which is tackled by the BELIEF environment <ref type="bibr" target="#b8">[9]</ref>. It assists researchers during the curation process by providing relationships that are extracted by automatic text mining solutions and represented in a human-readable form <ref type="bibr" target="#b9">[10]</ref>. 
At the core of both technologies are several implementations of biomedical text mining methods.</p><p>In this poster we present the recent development of SCAIView and show how SCAIView (as well as BELIEF) evolved, using the same core technologies, into an interoperable software system.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">SCAIView architecture</head><p>To keep up with state-of-the-art technologies and to be prepared for the integration of novel and game-changing developments, we migrated the SCAIView ecosystem from a large monolith to a microservice-based system. This allows us to reuse parts for different purposes, and the data itself can be easily processed, shared, and accessed. Additionally, the new system allows us to focus on the FAIR (Findable, Accessible, Interoperable, and Reusable) principles, introduced in <ref type="bibr" target="#b10">[11]</ref>, which are becoming a standard in the biological scientific community.</p><p>The microservice infrastructure of SCAIView is an ecosystem of three main services: Core, API, and Indexer (Figure <ref type="figure" target="#fig_0">1</ref>), which communicate through a message broker (Apache ActiveMQ). The Core fulfills various important tasks to persist, retrieve, and process data. Besides further text mining microservices, there are also specialized microservices that are shared by the BELIEF and SCAIView ecosystems, such as BEL Commons Professional, which allows the validation of text-mined biological entities and relationships. SCAIView's user interface itself is a web-based microservice application running on Apache Tomcat that communicates with the backend via REST API calls. The visualization of the document corpus includes document elements that are stored and represented as semantic digital assets (SDA) (Jacobs et al. <ref type="bibr" target="#b6">[7]</ref>). The SDAs represent various semantically enriched domain models that can be binary data such as images or plain text such as natural language. The corpus itself is pre-processed and stored in a document store.</p><p>The Document Store is based on Apache Accumulo and Apache Solr. The former is used to persist raw results of the text mining pipelines. 
This allows us to compare and validate old and new text mining components quickly, which is necessary in a research setting. The latter contains SDAs such as the document text, recognized semantic concepts, and further metadata needed for fast retrieval. SCAIView can also handle multiple text mining and knowledge discovery pipelines by communicating through the message broker. Common steps are the use of a DocumentDecomposer, a Lemmatizer, and JProMiner for named entity recognition. Other text processing components, such as UIMA Ruta-based components (see <ref type="bibr" target="#b7">[8]</ref>) or ChemoCR (see <ref type="bibr" target="#b11">[12]</ref>), can be used on demand and easily integrated into processing pipelines.</p><p>Search queries and knowledge discovery in SCAIView are linked to ontology and terminology data. Semantic searches are a combination of free-text search and entities represented in ontologies or terminologies. For instance, SCAIView includes the Alzheimer's Disease Ontology (ADO), the BioMarker terminology, drug names, the Hypothesis Finder, and many more. These resources are displayed in a tree format and can be used to make detailed, faceted search queries and to perform statistical analysis on the retrieved document corpus. Access to these resources is provided by our internally hosted OLS service (Ontology Lookup Service <ref type="bibr" target="#b1">[2]</ref>) and the upcoming TeMOwl (Terminology Management based on OWL) service.</p><p>In general, SCAIView is developed to handle any kind of document corpus, but currently we focus on the biomedical research area. Therefore, as input we use databases such as PubMed 2017 <ref type="bibr" target="#b0">[1]</ref>, which contains around 27 million abstracts, and PMC 2017<ref type="foot" target="#foot_1">4</ref>, which includes around 2 million biomedical full-text articles. 
Following <ref type="bibr" target="#b6">[7]</ref> and <ref type="bibr" target="#b4">[5]</ref>, processing such large volumes of data is not only possible but also very efficient, and the microservice infrastructure is highly scalable.</p></div>
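<div xmlns="http://www.tei-c.org/ns/1.0"><p>The combination of free-text search with ontology-based filtering described in Section 2 can be illustrated with a minimal Python sketch. It is not SCAIView's implementation, which relies on Apache Solr and the OLS and TeMOwl services: the toy ontology, the concept identifiers, the documents, and the function names (descendants, semantic_search, concept_counts) are hypothetical and chosen only for illustration. The sketch assumes that each document already carries concept annotations produced by a text mining pipeline. Selecting a node of the ontology tree expands to all concepts below it, the free-text term is matched by a simple substring search, and the concept counts over the hits stand in for a statistical analysis of the retrieved corpus.</p>
<p><code lang="python"><![CDATA[
```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical toy ontology (parent concept -> child concepts), not a real ADO excerpt.
ONTOLOGY = {
    "disease": ["neurodegenerative_disease"],
    "neurodegenerative_disease": ["alzheimers_disease", "parkinsons_disease"],
    "alzheimers_disease": [],
    "parkinsons_disease": [],
}


@dataclass
class Document:
    doc_id: str
    text: str
    # Concept identifiers assumed to have been assigned by a text mining pipeline.
    concepts: set = field(default_factory=set)


# Hypothetical mini corpus used only for illustration.
DOCS = [
    Document("d1", "Tau protein aggregation in patients", {"alzheimers_disease"}),
    Document("d2", "Tau protein levels in healthy controls"),
    Document("d3", "Dopamine loss and tau pathology", {"parkinsons_disease"}),
    Document("d4", "Amyloid plaques in mouse models", {"alzheimers_disease"}),
]


def descendants(concept, ontology):
    """Return the concept together with all concepts below it in the tree."""
    found = {concept}
    stack = [concept]
    while stack:
        for child in ontology.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def semantic_search(docs, free_text, concept, ontology):
    """Keep documents that contain the free-text term AND are annotated with
    the selected concept or any of its descendants."""
    wanted = descendants(concept, ontology)
    term = free_text.lower()
    return [d for d in docs if term in d.text.lower() and d.concepts & wanted]


def concept_counts(docs):
    """Count how often each annotated concept occurs in a result set."""
    return Counter(c for d in docs for c in d.concepts)


if __name__ == "__main__":
    hits = semantic_search(DOCS, "tau", "neurodegenerative_disease", ONTOLOGY)
    print([d.doc_id for d in hits])
    print(concept_counts(hits))
```
]]></code></p>
<p>In this toy corpus, searching for "tau" under the hypothetical concept neurodegenerative_disease returns the documents d1 and d3: d2 mentions the term but has no matching annotation, and d4 is annotated but does not contain the term.</p></div>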
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Conclusion</head><p>Although several risks and problems have to be faced, we are confident that the advantages of implementing a microservice system outweigh them. For both applications, SCAIView and BELIEF, several microservices are used and shared for the purposes of data retrieval, data persistence, and text mining. The latter are classical microservices, whereas the retrieval and persistence services are more general microservices. Additionally, the microservices in the data layer can also be traditional web services such as the terminology management or authentication systems. We benefit from a highly scalable and fault-tolerant environment for data processing. Furthermore, the system is flexible enough that microservices can easily be added to or removed from the processing pipeline. The continuous delivery process for externally developed software such as OLS or Keycloak is no longer an issue. An additional benefit is the safe and fast switching from one technology to another: TeMOwl and OLS can be used at the same time for multiple instances of SCAIView.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1.</head><label>1</label><figDesc>Fig. 1. The shared architecture for the semantic search engine (SCAIView) and the semi-automatic knowledge graph creation environment (BELIEF). It consists of three different layers: application, microservice, and data layer. The BEL network-related microservices are called BEL Commons Professional.</figDesc><graphic coords="3,229.26,115.83,248.99,192.26" type="bitmap" /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_0">https://www.scaiview.com/ (an academia version is freely available at http://academia.scaiview.com/academia/)</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_1">https://www.ncbi.nlm.nih.gov/pmc/</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Database resources of the National Center for Biotechnology Information</title>
		<author>
			<persName><surname>NCBI Resource Coordinators</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nucleic acids research</title>
		<imprint>
			<biblScope unit="volume">45</biblScope>
			<biblScope unit="page">D12</biblScope>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note>Database issue</note>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">The ontology lookup service: more data and better tools for controlled vocabulary queries</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">G</forename><surname>Côté</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Martens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Apweiler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Hermjakob</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nucleic acids research</title>
		<imprint>
			<biblScope unit="volume">36</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="W372" to="W376" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
	<note>suppl</note>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Document clustering using a graph covering with pseudostable sets</title>
		<author>
			<persName><forename type="first">J</forename><surname>Dörpinghaus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Schaaf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Fluck</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Jacobs</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Computer Science and Information Systems (FedCSIS), 2017 Federated Conference on</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="329" to="338" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Using drugs as molecular probes: A computational chemical biology approach in neurodegenerative diseases</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A E K</forename><surname>Emon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Karki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Younesi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hofmann-Apitius</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Alzheimer&apos;s Disease</title>
		<imprint>
			<biblScope unit="volume">56</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="677" to="686" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Integration of UIMA Text Mining Components into an Event-based Asynchronous Microservice Architecture</title>
		<author>
			<persName><forename type="first">S</forename><surname>Hodapp</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Madan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Fluck</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zimmermann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the LREC 2016 Workshop &quot;Cross-Platform Text Mining and Natural Language Processing Interoperability</title>
				<meeting>the LREC 2016 Workshop &quot;Cross-Platform Text Mining and Natural Language Processing Interoperability<address><addrLine>Portorož, Slovenia</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="19" to="23" />
		</imprint>
	</monogr>
	<note>European Language Resources Association (ELRA)</note>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Neuroimaging feature terminology: A controlled terminology for the annotation of brain imaging features</title>
		<author>
			<persName><forename type="first">A</forename><surname>Iyappan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Younesi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Redolfi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Vrooman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Khanna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">B</forename><surname>Frisoni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hofmann-Apitius</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Alzheimer&apos;s Disease</title>
		<imprint>
			<biblScope unit="volume">59</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="1153" to="1169" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">SDA: Towards a novel Knowledge Discovery Model for Information Systems</title>
		<author>
			<persName><forename type="first">M</forename><surname>Jacobs</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hodapp</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dörpinghaus</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 11th IADIS International Conference Information Systems</title>
				<meeting>the 11th IADIS International Conference Information Systems</meeting>
		<imprint>
			<publisher>IADIS</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="300" to="302" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">UIMA Ruta: Rapid development of rule-based information extraction applications</title>
		<author>
			<persName><forename type="first">P</forename><surname>Kluegl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Toepfer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">D</forename><surname>Beck</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Fette</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Puppe</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Natural Language Engineering</title>
		<imprint>
			<biblScope unit="volume">22</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="1" to="40" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">The BEL information extraction workflow (BELIEF): evaluation in the BioCreative V BEL and IAT track</title>
		<author>
			<persName><forename type="first">S</forename><surname>Madan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hodapp</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Senger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ansari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Szostak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hoeng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Peitsch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Fluck</surname></persName>
		</author>
		<idno type="DOI">10.1093/database/baw136</idno>
		<ptr target="http://database.oxfordjournals.org/lookup/doi/10.1093/database/baw136" />
	</analytic>
	<monogr>
		<title level="j">Database</title>
		<imprint>
			<biblScope unit="page">136</biblScope>
			<date type="published" when="2016-10">Oct 2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Construction of biological networks from unstructured information based on a semi-automated curation workflow</title>
		<author>
			<persName><forename type="first">J</forename><surname>Szostak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ansari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Madan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Fluck</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Talikka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Iskandar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>De León</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hofmann-Apitius</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">C</forename><surname>Peitsch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hoeng</surname></persName>
		</author>
		<idno type="DOI">10.1093/database/bav057</idno>
		<ptr target="https://doi.org/10.1093/database/bav057" />
	</analytic>
	<monogr>
		<title level="j">Database : the journal of biological databases and curation</title>
		<imprint>
			<biblScope unit="page">2015</biblScope>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">The FAIR Guiding Principles for scientific data management and stewardship</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">D</forename><surname>Wilkinson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Dumontier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">J</forename><surname>Aalbersberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Appleton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Axton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Baak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Blomberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">W</forename><surname>Boiten</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">B</forename><surname>Da Silva Santos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">E</forename><surname>Bourne</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Scientific data</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">Chemical structure reconstruction with ChemoCR</title>
		<author>
			<persName><forename type="first">M</forename><surname>Zimmermann</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2011">2011</date>
			<publisher>TREC</publisher>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
