<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SCAIView – A Semantic Search Engine for Biomedical Research Utilizing a Microservice Architecture</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jens Dorpinghaus</string-name>
          <email>jens.doerpinghaus@scai.fraunhofer.de</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jurgen Klein</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Johannes Darms</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sumit Madan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marc Jacobs</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fraunhofer Institute for Algorithms and Scientific Computing</institution>
          ,
          <addr-line>Schloss Birlinghoven, Sankt Augustin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Biological and medical researchers explore the mechanisms of living organisms and aim to gain a better understanding of the underlying fundamental biological processes of life. To tackle such complex tasks, they constantly need to gather and accumulate new knowledge by performing experiments and studying scientific literature. We present the novel semantic search engine "SCAIView" for knowledge discovery and retrieval and, additionally, discuss the most recent paradigm shifts in communication technologies, which lead to a completely new architecture that improves scalability, achieves better interoperability, and increases fault tolerance.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Biological and medical researchers are interested in exploring the mechanisms of
living organisms and gaining a better understanding of underlying fundamental
biological processes of life. To tackle such complex tasks, they constantly gather
and accumulate new knowledge by performing experiments and by studying
scientific literature that reports the results of further experiments performed by
other researchers. Existing solutions are mainly based on the methods of biomedical
text mining to extract key information from unstructured biomedical text (such
as publications, patents, and electronic health records).</p>
      <p>
        Especially in the field of biomedical sciences, we have a long history of
developing applications that solve the above-mentioned tasks. For instance,
SCAIView (https://www.scaiview.com/; an academia version is freely available at
http://academia.scaiview.com/academia/) is an information retrieval system that
allows semantic searches in large textual collections by combining free text
searches with the ontological representations of automatically recognized
biological entities (see Hodapp et al.
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]). SCAIView has been used in many recent research projects, for example
regarding neurodegenerative diseases [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] or brain imaging features [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Furthermore, it
has also been used for document classification and clustering [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Another important
real-world task is the creation of biological knowledge graphs, which is tackled by
the BELIEF environment [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. It assists researchers during the curation process
by providing relationships extracted by automatic text mining solutions and
represented in a human-readable form [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. At the core of both technologies, several
implementations of biomedical text mining methods are in place.
      </p>
      <p>In this poster we present the recent development of SCAIView and show how
SCAIView (as well as BELIEF) evolved, using the same core technologies, into an
interoperable software system.</p>
    </sec>
    <sec id="sec-2">
      <title>SCAIView architecture</title>
      <p>
        To keep up with the state-of-the-art technologies and to be prepared for
integration of novel and game-changing developments, we migrated the SCAIView
ecosystem from a large monolith to a microservice-based system. This allows us to
reuse parts for different purposes, and the data itself can be easily processed,
shared and accessed. Additionally, the new system also allows us to focus on
FAIR (Findable, Accessible, Interoperable, and Reusable) principles, introduced
in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], which are becoming a standard in the biological scientific community.
      </p>
      <p>
        The microservice infrastructure of SCAIView is an ecosystem of three main
services: Core, API, and Indexer (Figure 1), which communicate through the
message broker (Apache ActiveMQ). The Core fulfills various important tasks
to persist, retrieve, and process data. Besides further text mining microservices,
there are also specialized microservices such as BEL Commons Professional,
which allows users to validate text-mined biological entities and relationships and is
shared by the BELIEF and SCAIView ecosystems. SCAIView's user interface itself is
a web-based microservice application running on Apache Tomcat communicating
via REST-API calls with the backend. The visualization of the document corpus
includes document elements that are stored and represented as semantic digital
assets (SDA) (Jacobs et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]). The SDA represent various semantically-enriched
domain models, which can be binary data such as images or plain text such as natural
language. The corpus itself is pre-processed and stored in a document store.
      </p>
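      <p>The communication pattern described above can be illustrated with a minimal sketch. It is not the SCAIView implementation: the in-memory broker is a stand-in for Apache ActiveMQ, and the topic names (document.submitted, document.persisted), the class names, and the message layout are assumptions made purely for illustration.</p>
      <preformat>
```python
from collections import defaultdict
from typing import Callable, Dict, List


class InMemoryBroker:
    """Toy stand-in for a message broker such as Apache ActiveMQ (assumption)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        # Delivered synchronously here; a real broker queues messages
        # and delivers them asynchronously.
        for handler in list(self._subscribers[topic]):
            handler(message)


class Core:
    """Hypothetical Core service: persists documents and announces them."""

    def __init__(self, broker: InMemoryBroker) -> None:
        self.broker = broker
        self.documents: Dict[str, str] = {}
        broker.subscribe("document.submitted", self.on_submitted)

    def on_submitted(self, message: dict) -> None:
        self.documents[message["id"]] = message["text"]
        self.broker.publish("document.persisted", message)


class Indexer:
    """Hypothetical Indexer service: builds a token-to-document index."""

    def __init__(self, broker: InMemoryBroker) -> None:
        self.index = defaultdict(set)
        broker.subscribe("document.persisted", self.on_persisted)

    def on_persisted(self, message: dict) -> None:
        for token in message["text"].lower().split():
            self.index[token].add(message["id"])


def run_demo():
    broker = InMemoryBroker()
    core = Core(broker)
    indexer = Indexer(broker)
    # The API service would publish this message on behalf of a client.
    broker.publish("document.submitted", {"id": "doc-1", "text": "Tau protein aggregation"})
    return core, indexer
```
      </preformat>
      <p>Because the services in this sketch only know topic names and not each other, a further subscriber could be attached to document.persisted without changing Core or Indexer, which is the kind of decoupling a broker-based design aims for.</p>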
      <p>
        The Document Store is based on Apache Accumulo and Apache Solr. The
former is used to persist the raw results of the text mining pipelines. This allows us
to compare and validate old and new text mining components quickly during
development, which is necessary in a research setting. The latter contains SDAs
such as the document text, recognized semantic concepts, and further metadata
needed for fast retrieval. SCAIView can also handle multiple text
mining and knowledge discovery pipelines by communicating through the message
broker. Common steps are a DocumentDecomposer, a Lemmatizer, and
JProMiner for named entity recognition. Other text processing components, such
as UIMA Ruta-based components (see [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]) or ChemoCR (see [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]) can be used
on demand and be easily integrated into processing pipelines.
      </p>
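      <p>As a rough sketch of how such a pipeline can be composed from interchangeable steps, the following example chains a decomposer, a lemmatizer, and an entity recognizer. All functions are hypothetical toy versions: the dictionary lookup merely stands in for a named entity recognizer such as JProMiner, and the dictionary entries and the document layout are assumptions.</p>
      <preformat>
```python
from typing import Callable, Dict, List

Document = Dict[str, object]
Step = Callable[[Document], Document]

# Hypothetical dictionary; a real recognizer uses curated terminologies.
DICTIONARY = {"tau": "PROTEIN", "amyloid": "PROTEIN", "donepezil": "DRUG"}


def decompose(doc: Document) -> Document:
    # Split raw text into sentences on full stops (deliberately naive).
    sentences = [s.strip() for s in doc["text"].split(".") if s.strip()]
    return {**doc, "sentences": sentences}


def lemmatize(doc: Document) -> Document:
    # Toy lemmatizer: lowercase tokens and strip a trailing plural "s".
    lemmas: List[str] = []
    for sentence in doc["sentences"]:
        for token in sentence.lower().split():
            if token.endswith("s") and len(token) > 3:
                token = token[:-1]
            lemmas.append(token)
    return {**doc, "lemmas": lemmas}


def recognize_entities(doc: Document) -> Document:
    # Dictionary lookup as a placeholder for named entity recognition.
    entities = [(lemma, DICTIONARY[lemma]) for lemma in doc["lemmas"] if lemma in DICTIONARY]
    return {**doc, "entities": entities}


DEFAULT_STEPS: List[Step] = [decompose, lemmatize, recognize_entities]


def run_pipeline(text: str, steps: List[Step]) -> Document:
    # Each step receives the document produced by the previous step.
    doc: Document = {"text": text}
    for step in steps:
        doc = step(doc)
    return doc
```
      </preformat>
      <p>Since every step has the same signature, an additional component can be added on demand by appending another function to the list of steps, without touching the existing ones.</p>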
      <p>
        Search queries and knowledge discovery in SCAIView are linked to ontology
and terminology data. Semantic searches are a combination of free text search
and entities represented in ontologies or terminologies. For instance, SCAIView
includes the Alzheimer's Disease Ontology (ADO), BioMarker terminology, drug
names, the Hypothesis Finder, and many more. These resources are displayed
in a tree format and can be used to make detailed, faceted search queries and
to perform statistical analysis on the retrieved document corpus. Access to
these resources is provided by our internally hosted OLS service (Ontology Lookup
Service [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]) and the upcoming TeMOwl (Terminology Management based on
OWL) service.
      </p>
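      <p>The combination of free text and ontology-based search can be sketched as follows. This is a simplified illustration, not SCAIView's query engine: the small ontology tree, the sample documents, and the function names are invented for this example, and a query concept is assumed to match every document annotated with that concept or one of its descendants in the tree.</p>
      <preformat>
```python
# Hypothetical ontology fragment, stored as parent-to-children links.
ONTOLOGY_CHILDREN = {
    "disease": ["neurodegenerative disease"],
    "neurodegenerative disease": ["alzheimer's disease", "parkinson's disease"],
}

# Hypothetical corpus; "concepts" holds the entities recognized by text mining.
DOCUMENTS = [
    {"id": "d1", "text": "Biomarker levels in cerebrospinal fluid",
     "concepts": {"alzheimer's disease"}},
    {"id": "d2", "text": "Biomarker discovery for tremor",
     "concepts": {"parkinson's disease"}},
    {"id": "d3", "text": "Imaging features of the hippocampus",
     "concepts": {"alzheimer's disease"}},
]


def expand_concept(concept):
    """Return the concept together with all of its descendants in the tree."""
    expanded = {concept}
    for child in ONTOLOGY_CHILDREN.get(concept, []):
        expanded.update(expand_concept(child))
    return expanded


def semantic_search(free_text, concept, documents=DOCUMENTS):
    """Keep documents that contain every free-text term and a matching concept."""
    wanted = expand_concept(concept)
    terms = free_text.lower().split()
    hits = []
    for doc in documents:
        text = doc["text"].lower()
        has_terms = all(term in text for term in terms)
        has_concept = bool(doc["concepts"].intersection(wanted))
        if has_terms and has_concept:
            hits.append(doc)
    return hits


def facet_counts(hits):
    """Count how many retrieved documents mention each concept."""
    counts = {}
    for doc in hits:
        for concept in sorted(doc["concepts"]):
            counts[concept] = counts.get(concept, 0) + 1
    return counts
```
      </preformat>
      <p>In this sketch, searching for the free text "biomarker" together with the concept "neurodegenerative disease" retrieves the documents annotated with either child concept, and the facet counts give a simple per-concept statistic over the retrieved corpus.</p>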
      <p>
        In general, SCAIView is developed to handle any kind of document corpus but
currently we focus on the biomedical research area. Therefore, as input we use
databases such as PubMed 2017 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which contains around 27 million abstracts, and
PMC 2017 (https://www.ncbi.nlm.nih.gov/pmc/), which includes around 2 million biomedical full-text articles.
Following [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], the processing of large data sets is not only possible but also
very efficient, and the microservice infrastructure is highly scalable.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Conclusion</title>
      <p>Although several risks and problems have to be faced, we are confident that
the advantages of implementing a microservice system outweigh them. For
both applications, SCAIView as well as BELIEF, several microservices are used
and shared for the purposes of data retrieval, data persistence, and text mining.
The latter are classical microservices, whereas the retrieval and persistence
services are more general microservices. Additionally, the microservices in the data
layer can also be traditional web services such as the terminology management
or authentication systems. We benefit from a highly scalable and fault-tolerant
environment for data processing. Furthermore, the system is flexible enough to
easily add or remove microservices from the processing pipeline. The continuous
delivery process for externally developed software such as OLS or Keycloak is no
longer an issue. An additional benefit is the safe and fast switching from one
technology to another: TeMOwl and OLS can be used at the same time for
multiple instances of SCAIView.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Coordinators</surname>
            ,
            <given-names>N.R.</given-names>
          </string-name>
          :
          <article-title>Database resources of the National Center for Biotechnology Information</article-title>
          .
          <source>Nucleic acids research</source>
          45(Database issue),
          <source>D12</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. Côté, R.G.,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martens</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Apweiler</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hermjakob</surname>
          </string-name>
          , H.:
          <article-title>The ontology lookup service: more data and better tools for controlled vocabulary queries</article-title>
          .
          <source>Nucleic acids research 36(suppl 2)</source>
          ,
          <source>W372–W376</source>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. Dorpinghaus, J.,
          <string-name>
            <surname>Schaaf</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fluck</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jacobs</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Document clustering using a graph covering with pseudostable sets</article-title>
          .
          <source>In: Computer Science and Information Systems (FedCSIS)</source>
          , 2017 Federated Conference on. pp.
          <volume>329</volume>
          –
          <fpage>338</fpage>
          .
          <string-name>
            <surname>IEEE</surname>
          </string-name>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Emon</surname>
            ,
            <given-names>M.A.E.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karki</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Younesi</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hofmann-Apitius</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , et al.:
          <article-title>Using drugs as molecular probes: A computational chemical biology approach in neurodegenerative diseases</article-title>
          .
          <source>Journal of Alzheimer's Disease</source>
          <volume>56</volume>
          (
          <issue>2</issue>
          ),
          <volume>677</volume>
          –
          <fpage>686</fpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Hodapp</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Madan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fluck</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zimmermann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Integration of UIMA Text Mining Components into an Event-based Asynchronous Microservice Architecture</article-title>
          .
          <source>In: Proceedings of the LREC 2016 Workshop "Cross-Platform Text Mining and Natural Language Processing Interoperability"</source>
          . pp.
          <volume>19</volume>
          –
          <fpage>23</fpage>
          .
          <string-name>
            <surname>European Language Resources Association</surname>
          </string-name>
          (ELRA), Portorož, Slovenia (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Iyappan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Younesi</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Redol</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vrooman</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khanna</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frisoni</surname>
            ,
            <given-names>G.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hofmann-Apitius</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Neuroimaging feature terminology: A controlled terminology for the annotation of brain imaging features</article-title>
          .
          <source>Journal of Alzheimer's Disease</source>
          <volume>59</volume>
          (
          <issue>4</issue>
          ),
          <volume>1153</volume>
          –
          <fpage>1169</fpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Jacobs</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hodapp</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , Dorpinghaus, J.: SDA:
          <article-title>Towards a novel Knowledge Discovery Model for Information Systems</article-title>
          .
          <source>In: Proceedings of the 11th IADIS International Conference Information Systems 2018</source>
          . pp.
          <volume>300</volume>
          –
          <fpage>302</fpage>
          .
          <string-name>
            <surname>IADIS</surname>
          </string-name>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Kluegl</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toepfer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beck</surname>
            ,
            <given-names>P.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fette</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Puppe</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>UIMA Ruta: Rapid development of rule-based information extraction applications</article-title>
          .
          <source>Natural Language Engineering</source>
          <volume>22</volume>
          (
          <issue>1</issue>
          ),
          <volume>1</volume>
          –
          <fpage>40</fpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Madan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hodapp</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Senger</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ansari</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szostak</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoeng</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peitsch</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fluck</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>The BEL information extraction workflow (BELIEF): evaluation in the BioCreative V BEL and IAT track</article-title>
          .
          <source>Database</source>
          <year>2016</year>
          , baw136 (oct
          <year>2016</year>
          ). https://doi.org/10.1093/database/baw136, http://database.oxfordjournals.org/lookup/doi/10.1093/database/baw136
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Szostak</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ansari</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Madan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fluck</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Talikka</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iskandar</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Leon</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hofmann-Apitius</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peitsch</surname>
            ,
            <given-names>M.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoeng</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Construction of biological networks from unstructured information based on a semi-automated curation workflow</article-title>
          .
          <source>Database : the journal of biological databases and curation</source>
          <year>2015</year>
          (
          <year>2015</year>
          ). https://doi.org/10.1093/database/bav057
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Wilkinson</surname>
            ,
            <given-names>M.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dumontier</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aalbersberg</surname>
            ,
            <given-names>I.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Appleton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Axton</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baak</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blomberg</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boiten</surname>
            ,
            <given-names>J.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>da Silva Santos</surname>
            ,
            <given-names>L.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bourne</surname>
            ,
            <given-names>P.E.</given-names>
          </string-name>
          , et al.:
          <article-title>The FAIR guiding principles for scientific data management and stewardship</article-title>
          .
          <source>Scientific Data 3</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Zimmermann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Chemical structure reconstruction with chemoCR</article-title>
          .
          <source>In: TREC</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>