<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Search Ontology Generator (SONG): Ontology-based Specification and Generation of Search Queries</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Robin SEIDEL</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Wolfram BARTUSSEK</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Christoph GOLLER</institution>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Federal Institute for Drugs and Medical Devices (BfArM)</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Heinrich HERRE</institution>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>Institute for Medical Informatics, Statistics and Epidemiology (IMISE), University of</institution>
        </aff>
        <aff id="aff6">
          <label>6</label>
          <institution>Leipzig</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The observation of medical devices during Post-Market Surveillance (PMS) to identify safety-relevant incidents is not trivial. A wide range of sources has to be monitored in order to integrate all accessible data about the safety and performance of a medical device. PMS needs to be supported by a well-designed search strategy and by the ability of domain experts to create complex search queries. Ontologies can support the specification of search queries and can aid the preparation of the document corpus, which contains all relevant documents. In this paper we present the new version of the Search Ontology (SO2), the Excel-template-based specification of search queries, and the Search Ontology Generator (SONG), which generates very complex queries from the Excel-based specification. Based on our approach, a service-oriented architecture was designed that supports domain experts during PMS.</p>
      </abstract>
      <kwd-group>
        <kwd>novineon Healthcare Technology Partners GmbH</kwd>
        <kwd>Germany</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-2">
        <p>
          According to the provisions of the current European Medical Device Directive
93/42/EEC [
          <xref ref-type="bibr" rid="ref1 ref2">1,2</xref>
          ] and the new European Medical Device Regulation (coming into effect in May
2020) [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], each manufacturer of medical devices has to set up a comprehensive system
in order to identify, evaluate and integrate clinical data derived from the field application
of a medical device after market access during Post-Market Surveillance (PMS). Small
and medium-sized enterprises in the field of medical devices need workable
systems for post-market data retrieval in order to enhance their PMS strategies and to be
prepared for the growing requirements of the new European Medical Device Regulation.
        </p>
        <p>A wide range of both internal (own quality management and complaint system) and
external (scientific databases, medical congresses, internet-based knowledge &amp;
experiences, PMS by competent authorities) sources has to be monitored in order to
integrate all accessible data about a medical device’s safety and performance. Currently,
these detailed, continuous searches are still performed manually with a high input of time
and personnel resources, making PMS a daunting task.</p>
        <p>In practice, an employee has to type all search queries (e.g., “safety AND coronary
stent”) into the input fields of a large number of different databases, congress sites and
public search engines, obtaining a broad, unspecific hit list interspersed with only a few
relevant items. This strategy of data retrieval reaches its limits when several
search strings have to be applied in order to monitor a whole range of medical devices,
each characterized by a variety of decisive search questions.</p>
        <p>Additionally, each notable medical database uses its own syntax to specify
search queries. This incompatibility between databases and the number of different
search tasks, combined with the manual application to a multitude of databases, results in
low efficiency and a high potential for human error.</p>
        <p>Enabling domain experts to set up more complex queries in a simple manner allows the
topic of interest to be defined more specifically and avoids the problem of
retrieving irrelevant content.</p>
        <p>
          In the OntoVigilance project (predecessor project of OntoPMS) we developed the
Search Ontology (SO) [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], a promising approach for an ontology-based specification of
complex search queries. The modular architecture of the SO enables the re-use of
ontology parts in different use cases as well as a quick and easy adaptation and extension
of the ontology according to the specific requirements.
        </p>
        <p>The developed SO and its application were evaluated in the OntoVigilance project
by domain experts. The SO supports setting up a PMS strategy in order to retrieve
internet-based information/clinical data on the safety and performance of an exemplary
medical device, e.g., an endoscopic clipping system. The SO-based, continuous data
retrieval was compared successfully to a conventional periodic, manual data search.
Relevant content was identified by SO-based data retrieval in a more convenient and
comprehensive way.</p>
        <p>
          This previous study showed that the SO is a suitable method for modeling
complex queries, but that it should be optimized to meet all requirements of domain experts.
On the one hand, the structure of the SO can be simplified in order to improve the
usability by domain experts. On the other hand, certain extensions are necessary to model
all relevant query types. Based on these findings, we further developed and optimized
the SO in the OntoPMS project. The improved version of SO is called SO2 in this paper.
The SO2 is generic and can be used in any domain. For the application of the SO2 in a
particular domain, it has to be extended by a domain ontology (in this paper a domain
ontology for the PMS domain), so that the classes of the domain ontology are subclasses
of the SO2 classes. We call such domain ontologies Domain-specific Search Ontologies
(DSO). In addition, we developed an Excel template to specify the information required
to create a DSO, which significantly simplifies the ontology development by domain
experts. For the automatic generation of a DSO from the Excel-based specification, we
implemented the Search Ontology Generator (SONG). In contrast to the
OntoQueryBuilder (OQB) [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], the SONG generates the complete DSO, including all
specified queries in the correct query syntax, from the Excel template and provides it to
external tools (e.g., the search engine). In this way, the search engine can obtain the
complete DSO by accessing the SONG service, without further requests and without generating
queries at search time.
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Methods</title>
      <p>The focus of this paper is the improvement of the SO and the development of the SONG (see
Results) to support the search pipeline at various points:
        <list list-type="order">
          <list-item><p>Creating a corpus. The SONG provides special queries for building a corpus of relevant
documents for a domain of interest.</p></list-item>
          <list-item><p>Searching for relevant documents. The SONG generates the DSO, including complex
search queries modeled by domain experts. The DSO can be displayed on the GUI of the
search engine; the queries are selected by the user and are executed by the search engine.</p></list-item>
          <list-item><p>Classifying the search results. The SONG queries can also be used to classify the search
results.</p></list-item>
        </list>
      </p>
      <p>For a better understanding of our approach, this section introduces possible applications
of the SONG service (Figure 1).</p>
      <p>Figure 1. The DSO Developer specifies the DSO in Excel or optimizes the generated OWL version. The SONG Manager
uploads/downloads files and tests the service. The SONG Service is responsible for the generation of OWL
and JSON from Excel, as well as JSON from OWL, and offers different methods for possible applications (see
SONG section). The Searcher uses the Search Engine GUI, which represents the DSO, and can select desired
concepts or queries for execution by the search engine.</p>
      <sec id="sec-2-1">
        <title>2.1. Search Engine</title>
        <p>The SONG can be used with any Lucene-based search engine for the generation of
queries in the Lucene query syntax out of the Excel-based specification. In addition, new
expressive query operators were implemented in the OntoPMS project, which
significantly extend the Lucene query syntax.</p>
        <p>
          To identify risks or complications (for PMS) in unstructured documents, very
complex patterns have to be detected. Such patterns go beyond the standard capabilities
of state-of-the-art search engines like Elasticsearch [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Therefore, we extended these
capabilities by creating our own search plugin, providing the required functionalities and
improving search quality. The extension was realized as an Elasticsearch plugin and
contains, among others, the following additional features: improved tokenization,
lemmatization and word decomposition; built-in support for several normal forms / term
types; improved quality for ambiguous searches; named entity, date and measurement
recognition; additional search modes; NEAR operators.
        </p>
        <p>In particular, the search modes and new search operators are extensively used in our
DSO to produce more precise queries. The search modes correspond to the different
types of terms (exact[E], diacritics-normalized[D], lemmatized baseforms[B],
compounds-parts[C]) that we use in our index. For example, with the query
“MODE/E(SafeSet)” we only search for SafeSet written with two upper-case S characters. Words within a
NEAR/n query must not have a token distance greater than n. With NEAR/S and
NEAR/P, words must occur within one sentence or one paragraph, respectively. These new
NEAR operators can be combined and nested with other queries in an arbitrary way.</p>
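        <p>To make the NEAR/n constraint concrete, the following minimal Python sketch checks whether two words occur within a token distance of at most n. It is an illustration only: the whitespace tokenizer, the interpretation of token distance as the difference of token positions, and the function name within_token_distance are assumptions and do not describe how the Elasticsearch plugin is implemented.</p>

```python
def within_token_distance(text: str, word_a: str, word_b: str, n: int) -> bool:
    """Return True if word_a and word_b occur at most n token positions apart.

    Assumption: tokens are lower-cased, whitespace-separated words. The real
    plugin uses its own tokenization, lemmatization and term types.
    """
    tokens = text.lower().split()
    positions_a = [i for i, tok in enumerate(tokens) if tok == word_a.lower()]
    positions_b = [i for i, tok in enumerate(tokens) if tok == word_b.lower()]
    # Any pair of occurrences that is close enough satisfies the constraint.
    return any(n >= abs(i - j) for i in positions_a for j in positions_b)


example = "unexpected failure of the nitinol clip"
adjacent = within_token_distance(example, "nitinol", "clip", 1)
too_far = within_token_distance(example, "unexpected", "clip", 3)
```

        <p>In this example, "nitinol" and "clip" are adjacent and satisfy a distance of 1, whereas "unexpected" and "clip" are five positions apart and fail a distance of 3.</p>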
        <p>The search engine GUI accesses the SONG service and receives the DSO (getDSO).
The search concepts and multiple concept queries are displayed on the GUI as a tree with
checkable nodes. The user can select desired concepts or multiple concept queries by
checking the corresponding checkboxes. After pressing the submit button, selected
queries are transmitted to the search engine service and the received search results are
displayed. Optionally, the user can classify documents (e.g., new incident reports, see
Classification of search results) using the Document Classifier Service. The Document
Classifier Service sends all single concept queries one by one to the Search Engine Service
and assigns the search results to the corresponding concepts. The user can then click on
a concept in the tree and get the documents classified by the concept.</p>
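        <p>The classification step described above can be sketched as follows. This is a minimal illustration under stated assumptions: classify_documents, toy_search and the DOCUMENTS dictionary are hypothetical names, and toy_search is only a stand-in for the Search Engine Service that understands plain OR-queries.</p>

```python
from typing import Callable, Dict, List

# Hypothetical mini-corpus used only for this illustration.
DOCUMENTS = {
    "doc1": "unexpected failure of a nitinol clip",
    "doc2": "stent placement without incident",
}


def toy_search(query: str) -> List[str]:
    """Stand-in for the Search Engine Service: matches plain OR-queries only."""
    terms = [term.strip().lower() for term in query.split(" OR ")]
    return sorted(
        doc_id
        for doc_id, text in DOCUMENTS.items()
        if any(term in text.lower().split() for term in terms)
    )


def classify_documents(
    single_concept_queries: Dict[str, str],
    search: Callable[[str], List[str]],
) -> Dict[str, List[str]]:
    """Send each single concept query to the search function, one by one,
    and assign the returned documents to the corresponding concept."""
    classification = {}
    for concept, query in single_concept_queries.items():
        classification[concept] = search(query)
    return classification


result = classify_documents(
    {"Complication": "complication OR failure OR incident", "Nitinol": "nitinol"},
    toy_search,
)
```

        <p>Selecting a concept in the GUI tree would then correspond to looking up its entry in the resulting dictionary.</p>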
        <p>Additionally, the GUI supports the creation of new queries or query parts
(search terms, search concepts, etc.). For example, to add a new multiple concept
query to the ontology, the user selects the desired concepts and presses the
“create query” button. The GUI then sends this information to the SONG service (using
add methods like addQuery). The SONG then adds the new entities to the ontology
and transmits the updated DSO back to the search engine GUI.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Corpus Builder</title>
        <p>PMS requires information from several types of sources, including proprietary
manufacturer data and information from the web. Getting information from the web
requires knowledge of how to look for it, how to access it, and which parts to extract.
Additionally, following the links found on a page identified by a given URL quickly
turned out to be a potential trap, since the linked pages may be completely out of scope.
Therefore, we concluded that an ontology would help to define the scope.</p>
        <p>For data acquisition, we developed the Corpus Builder. Its input component, the
Prospector, metaphorically speaking, “roams” the internet in order to identify suitable
data to “feed” the OntoPMS Corpus. To achieve this, it uses a special corpus query,
which is part of the DSO. Currently, the Prospector delivers its documents to the NLP
pipeline, which analyses the contents to identify documents that are important to the
respective projects and rejects (i.e., blacklists) documents that should not be included in
the OntoPMS corpus. This processing is organized as a kind of control circuit. The “plant” of
the feedback loop is controlled by a seed list, produced by the Prospector and by the
feedback component. It does the crawling and gathering of new URLs by following
forward and backward links and reading the contents identified by the URLs. Then, the
output is checked by the “sensor”. The sensor is controlled by the corpus query and
allows a deep analysis of the content. Questionable content is then fed back to a splitter
component, which sorts out garbage (blacklist) and boosts domains with a high number
of documents we want to include (whitelist). URLs from those whitelisted domains
are then added to the seed list if they are not yet part of it. If the output contains unwanted
documents, we have to improve the corpus query. Hence, we have a semi-supervised
learning component, where the manual part of supervision is performed on the abstract level
of ontologies. This enables us to change the behavior of the Corpus Builder without
changing the software.</p>
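        <p>One pass of this feedback loop can be sketched roughly as follows. The sketch rests on several assumptions that are not taken from the Corpus Builder itself: the function name feedback_step, the is_relevant callback standing in for the sensor driven by the corpus query, the whitelist_threshold parameter, and the example URLs are all hypothetical.</p>

```python
from collections import Counter
from urllib.parse import urlparse


def feedback_step(seed_urls, fetched, is_relevant, whitelist_threshold=2):
    """One pass of the loop: sense, split, and extend the seed list.

    fetched maps each crawled URL to its text. is_relevant stands in for the
    "sensor" controlled by the corpus query (hypothetical placeholder).
    """
    blacklist = set()
    relevant_per_domain = Counter()
    for url, text in fetched.items():
        if is_relevant(text):
            relevant_per_domain[urlparse(url).netloc] += 1
        else:
            blacklist.add(url)  # garbage is sorted out
    # Domains with many wanted documents are boosted (whitelisted).
    whitelist = {
        domain
        for domain, count in relevant_per_domain.items()
        if count >= whitelist_threshold
    }
    new_seeds = list(seed_urls)
    for url in fetched:
        on_whitelist = urlparse(url).netloc in whitelist
        if on_whitelist and url not in new_seeds and url not in blacklist:
            new_seeds.append(url)
    return new_seeds, blacklist, whitelist


fetched_pages = {
    "https://a.example/1": "nitinol clip complication",
    "https://a.example/2": "clip failure report",
    "https://b.example/x": "cooking recipes",
}
seeds, rejected, boosted = feedback_step(
    ["https://a.example/1"],
    fetched_pages,
    is_relevant=lambda text: "clip" in text.split(),
)
```

        <p>Changing the behavior of the loop then only requires a different is_relevant function, which mirrors the idea that the corpus query, not the software, is adapted.</p>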
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Classification of search results</title>
        <p>
          From the regulatory perspective of the German competent authority for risk assessment
for medical devices, the Federal Institute for Drugs and Medical Devices (BfArM), there
is an increasing demand to have the risk assessment process of critical incidents
supported by an intelligent IT solution. With respect to the exponential increase in
reported incidents with medical devices in Germany, specific DSOs for different aspects
of the incident must be developed. These specific DSOs will allow the identification of
similar incidents and thereby an automatic recommendation of classification of new
incidents will become possible. The aspects currently in focus include DSOs for the
resulting health problem, device problem, root cause and components, all of which are
at present being developed on the basis of the FDA coding system [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] by the IMDRF
working group on Adverse Event Terminology and Coding [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. Using the SONG
approach allows domain experts to easily create and modify the corresponding search
concepts for each of the IMDRF codes. Our approach accelerates the risk classification,
which in turn makes it possible to create individual views of device-specific problems as well as to
monitor the performance of different manufacturers within certain device groups such
as hip implants, cardiac pacemakers or insulin pumps.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
      <sec id="sec-3-1">
        <title>3.1. Improved Search Ontology (SO2)</title>
        <p>This section presents the optimized Search Ontology (SO2), which contains some
improvements compared to the SO. One of the advantages of the new version is that the
SO2-based DSOs can be specified in a specially developed Excel template rather than
with an ontology editor. The Excel template allows only the specification of those query
types that were determined to be relevant in the above-mentioned study, while in
the previous version any combination and nesting of the query parts was allowed. In
addition, in the previous version the keywords for the search (search terms) had to be defined as instances of the
search term classes (with different labels) and linked to the search concepts using
property restrictions (based on the property described_by). In the new version, the search
terms are directly associated with search concepts as annotations. Both aspects simplify
the structure of the ontology. Other extensions include negated concepts and direct
storage of queries in the ontology.</p>
        <p>
          The SO2 models three types of entities: search concepts, search terms, and search
queries (Figure 2). The search concepts are concepts (in the sense of General Formal
Ontology, GFO [
          <xref ref-type="bibr" rid="ref8 ref9">8,9</xref>
          ]), whose descriptions or designations have to be found in texts. The
other two entities are symbolic structures (gfo:Symbolic_Structure) and serve to model
single keywords or phrases of the concept description as well as queries.
        </p>
        <p>Search Terms. The search terms are descriptions or designations of search concepts.
A distinction is made between simple and composite terms. The simple terms are either
single words (e.g., “clip”, “Nitinol”) or fixed (defined by the user) phrases (e.g.,
“endoscopic clipping system”). The composite terms are combinations (has_part) of
simple terms of two search concepts (has_terms_of_concept_as_part). They are defined
by the user by choosing the two concepts (e.g., Unexpected and Complication) and are
generated by the generator as an AND-connection of the OR-linked simple terms of the
selected concepts (e.g., "(unexpected OR unforeseeable OR unknown) AND
(complication OR failure OR incident)").</p>
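        <p>The generation of a composite term from the simple terms of two concepts can be illustrated with a short Python sketch. The helper names quote_phrase, or_group and composite_term are hypothetical and not part of the SONG; quoting multi-word phrases follows the usual Lucene phrase syntax.</p>

```python
def quote_phrase(term):
    """Wrap multi-word phrases in double quotes; single words stay as they are."""
    return '"' + term + '"' if " " in term else term


def or_group(terms):
    """Join the simple terms of one concept into a parenthesised OR-group."""
    return "(" + " OR ".join(quote_phrase(term) for term in terms) + ")"


def composite_term(terms_part_1, terms_part_2):
    """AND-connect the OR-linked simple terms of two selected concepts."""
    return or_group(terms_part_1) + " AND " + or_group(terms_part_2)


unexpected = ["unexpected", "unforeseeable", "unknown"]
complication = ["complication", "failure", "incident"]
query = composite_term(unexpected, complication)
```

        <p>For the concepts Unexpected and Complication, query holds the string (unexpected OR unforeseeable OR unknown) AND (complication OR failure OR incident).</p>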
        <p>Search Concepts. The search concepts are described or designated by search terms
(described_by, simple_term, composite_term). We distinguish between standard (e.g.,
Complication) and negated (e.g., No_Preclinical) concepts. While the terms of the
standard concepts (e.g., “complication”, “failure”, “incident”) have to be contained in
the resulting documents, the terms of the negated concepts (e.g., “animal”, “study”,
“preclinical”) have to be excluded/negated. Each concept is additionally associated with
a single concept query (see Search Queries), which is used for the search for descriptions
of the concept.</p>
        <p>Search Queries. The single concept query is an OR-connection of all terms (simple
and composite) for a standard concept or a negated OR-connection of all terms for a
negated concept. The query can additionally contain specified search operators and
brackets. The multiple concept query is an AND-connection of single concept queries of
selected concepts (has_query_of_concept_as_part). The query is specified by the user
through a selection of desired concepts and generated by the generator.</p>
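        <p>A minimal sketch of how single and multiple concept queries could be assembled is given below. Rendering the negation as a leading NOT is an assumption based on the Lucene query syntax; the function names are hypothetical, and search operators such as MODE or NEAR are left out.</p>

```python
def single_concept_query(terms, negated=False):
    """OR-connect all terms of a concept; prefix with NOT for negated concepts."""
    body = "(" + " OR ".join(terms) + ")"
    return "NOT " + body if negated else body


def multiple_concept_query(single_queries):
    """AND-connect the single concept queries of the selected concepts."""
    return " AND ".join(single_queries)


complication_query = single_concept_query(["complication", "failure", "incident"])
nitinol_query = single_concept_query(["Nitinol"])
no_preclinical_query = single_concept_query(
    ["animal", "study", "preclinical"], negated=True
)
full_query = multiple_concept_query(
    [complication_query, nitinol_query, no_preclinical_query]
)
```

        <p>Because each single concept query is stored with its concept, a multiple concept query only needs the list of selected concepts to be assembled.</p>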
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Specification of the DSO</title>
        <p>Market access of a medical device is granted after passing an extensive series of tests,
risk analyses and evaluations of clinical data on the medical device’s safety and efficacy.
Nevertheless, the behavior of a medical device over time in broad application can be
investigated a priori only to a limited extent. Thus, PMS strategies are set up in order
to summarize application data of, e.g., medical implants and thereby identify residual risks.
For example, reports on unfavorable interactions between implant material and the patient’s
tissue are identified and evaluated in order to a) control residual and/or unexpected risks,
b) identify vulnerable patient subpopulations and c) improve the respective medical
implant or material.</p>
        <p>For a better understanding, a practically relevant search query was specified (as part
of the DSO for the PMS domain) focusing on unexpected side effects of the metal alloy
Nitinol used for construction of endoscopic clipping systems. The search query was
constructed by defining concepts covering the different aspects of the PMS question like
“unexpected complication”, “type of medical device” (endoscopic clipping system) and
“used material” (Nitinol), as well as the associated search terms within the easy-to-use
Excel template. Figure 3 illustrates this part of the DSO with screenshots of the
relevant sheets.</p>
        <p>
          Our developed Excel template consists, on the one hand, of the predefined data
sheets (Negated_Concept, Composite_Term and Multiple_Concept_Query) and, on the
other hand, of the user-defined sheets (facet sheets) for the specification and
classification of the search concepts and simple terms. We consider the facets, similarly
to the Colon Classification [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], as multiple dimensions for the categorization of the
Universe of Subjects. From an ontological point of view, these facets are top-level
categories that have a defined subclass hierarchy. Our template represents these top-level
categories by different facet sheets, which are evolving through time and which are freely
definable by the domain expert. For the observation of medical products during PMS,
we introduced among others the following facets: Medical_Device, Medical_Area,
Medical_Problem, Incident, Material and Risk. In our example, the different facets of
the search are represented by the Excel sheets Material, Medical_Device and Incident.
In every such facet a subclass structure represents a categorization of the knowledge
within the appropriate area. For instance, we subdivided medical devices into clip, stent,
occluder and implant; moreover, these device categories can be subdivided into special
types, e.g., endoscopic clipping system, PFO occluder, PDA occluder, and so on. Next
to the nodes of the subclass hierarchy, the domain expert can enter simple terms; two
columns are used to separate English and German simple terms.
Several types of query operators, like the wildcard or boost (e.g., incident^5), can be
applied to simple terms in order to refine the search query.
        </p>
        <p>For excluding documents that contain descriptions of certain concepts (e.g.,
complications in preclinical tests) the Negated_Concept sheet is used. The concept is
specified (e.g., No_Preclinical) and described by simple terms to be excluded (e.g.,
“animal”, “study”, “preclinical”). The Composite_Term sheet is used for the
specification of terms based on other concepts. The composite terms for describing the
search concept Unexpected_Complication are combined from the simple terms of
Unexpected (part 1) and Complication (part 2). For the creation of complex (multiple
concept) queries, which are based on the conjunction of queries of multiple search
concepts (e.g., Unexpected_Complication, Nitinol, Endoscopic_Clipping_System and
No_Preclinical), the Multiple_Concept_Query sheet is used. The
new MODE or NEAR (e.g., NEAR/S) operators can be specified for both composite
terms and multiple concept queries.</p>
        <p>Out of these spreadsheet-based specifications, it is possible to generate the DSO in
different formats like OWL or JSON using the SONG.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Search Ontology Generator (SONG)</title>
        <p>The SONG consists of a web service that provides various methods for the external tools
and a simple web app (SONG Config App) that is used to upload/download the files as
well as for testing the service (Figure 1).</p>
        <p>The SONG service generates the ontology in OWL and JSON format out of the
Excel-based specification, allows adding new entities to the ontology (add methods like
addSimpleTerms or addQuery), and provides the generated DSO for external tools,
especially for search apps (getDSO). The OWL format is used for a possible optimization
of the generated ontology with an ontology editor or for an integration of external
ontologies, while JSON is utilized for communication with external tools. The SONG
manages the generation of OWL and JSON from Excel, as well as JSON from OWL.
After each file upload (Excel or OWL) or after a new entity is added, the
ontology (OWL and JSON) is regenerated.</p>
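        <p>The interplay of the add methods and getDSO can be illustrated with a small in-memory stand-in. This is a sketch under stated assumptions: the class SongServiceSketch, its method signatures and the JSON layout are hypothetical, the snake_case method names merely mirror getDSO, addSimpleTerms and addQuery, and only the JSON regeneration is shown, not the OWL generation.</p>

```python
import json


class SongServiceSketch:
    """In-memory stand-in for the SONG service (hypothetical structure)."""

    def __init__(self):
        self._concepts = {}  # concept name, mapped to its simple terms
        self._queries = {}   # query name, mapped to its concept names
        self._dso_json = "{}"
        self._regenerate()

    def _regenerate(self):
        # The real service regenerates OWL and JSON; only JSON is sketched.
        dso = {"concepts": self._concepts, "queries": self._queries}
        self._dso_json = json.dumps(dso, sort_keys=True)

    def add_simple_terms(self, concept, terms):
        """Mirror of addSimpleTerms: attach terms, then regenerate the DSO."""
        self._concepts.setdefault(concept, []).extend(terms)
        self._regenerate()

    def add_query(self, name, concepts):
        """Mirror of addQuery: store a multiple concept query, then regenerate."""
        self._queries[name] = list(concepts)
        self._regenerate()

    def get_dso(self):
        """Mirror of getDSO: return the pre-generated JSON document."""
        return self._dso_json


service = SongServiceSketch()
service.add_simple_terms("Nitinol", ["Nitinol"])
service.add_query("Nitinol_Query", ["Nitinol"])
dso = json.loads(service.get_dso())
```

        <p>Regenerating the document on every change keeps get_dso a simple read, which matches the idea that no queries have to be generated at search time.</p>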
        <p>During the generation of the ontology, the model presented in Figure 2 is not applied
one-to-one but is simplified. For example, simple terms and single concept queries
are defined as annotations of the search concept classes.</p>
        <p>Firstly, the SONG generates the class hierarchy. The search concept trees from the
user-defined sheets (facet sheets) are placed under Search_Concept and the negated
concept tree under Negated_Concept. Next, the simple terms are linked to their concepts
using the annotation property simple_term.</p>
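        <p>The first generation step can be pictured with the following sketch, which places the concepts of one facet sheet under Search_Concept and attaches their simple terms. The row layout (concept, parent, simple terms), the dictionary representation of classes, and the placement of Endoscopic_Clipping_System under Clip are assumptions made for illustration; the actual Excel columns and the OWL output are not reproduced here.</p>

```python
def build_class_hierarchy(facet, rows):
    """Place facet concepts under Search_Concept and attach simple terms.

    rows is a list of (concept, parent, simple_terms) tuples; a parent of None
    means the concept sits directly under the facet. This layout is assumed.
    """
    classes = {facet: {"subclass_of": "Search_Concept", "simple_term": []}}
    for concept, parent, simple_terms in rows:
        classes[concept] = {
            "subclass_of": parent or facet,
            # Simple terms are stored as annotation values of the concept.
            "simple_term": list(simple_terms),
        }
    return classes


device_rows = [
    ("Clip", None, ["clip"]),
    ("Endoscopic_Clipping_System", "Clip", ["endoscopic clipping system"]),
]
classes = build_class_hierarchy("Medical_Device", device_rows)
```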
        <p>The composite term classes are generated as subclasses of Composite_Term. The
annotation properties has_term_of_concept_as_part_1 and
has_term_of_concept_as_part_2 (in short: part_1 and part_2) are used to specify the two
search concepts whose simple terms have to be put together. In addition, the specified
MODE or NEAR operators are generated as annotations. Then, for each composite term
class, the corresponding term is generated as an AND-connection of the OR-linked
simple terms of the selected concepts, and is associated with the composite term class using
the annotation property query. Any query operators specified for composite
terms are rendered in the correct syntax. Then, the composite term classes are
referenced in the search concept classes by the annotation property composite_term.</p>
        <p>After that, the single concept queries of all search concepts are generated from
all their terms, as an OR-connection for standard concepts or a negated OR-connection for negated concepts,
and are associated with the respective concept using the annotation
property query. For negated concepts, standard concepts whose
terms have to be excluded can also be specified (excluded_concept).</p>
        <p>The multiple concept queries specified on the Multiple_Concept_Query sheet are
generated as subclasses of Multiple_Concept_Query. Then, the multiple concept query
classes are associated with the concepts whose single concept queries are to be combined
using the annotation property has_query_of_concept_as_part (in short: search_concept).
Then, the single concept queries are AND-linked and stored using the annotation
property query. Similar to composite terms, the possibly specified operators for queries
are taken into account during the generation process.</p>
        <p>Figure 4 illustrates some parts of the generated DSO. The upper part
shows the search concept Unexpected_Complication, which is described by the composite
term Unexpected__Complication (with two underscore characters). The lower left
corner shows the negated concept (No_Preclinical), which is used for the exclusion
of several simple terms. The lower right corner shows the complete (multiple concept) query,
which consists of different search concepts. The annotation property
query holds the generated query, which can be executed by a search engine.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Related work</title>
      <sec id="sec-4-1">
        <title>4.1. Ontology-based information retrieval</title>
        <p>
          Since finding meaningful information is difficult, various
ontology-based information retrieval techniques are available [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. In the wide world of
semantic searches the approach of this paper can be classified as Research Search [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ],
because we denoted search queries by concepts. Semantic searches are usually executed
not on plain documents but on ontologies, which requires expensive manual annotation
or natural language processing steps (NLP) for extracting semantic data out of the
documents. After that step the information of the documents is stored in a semantic
knowledge base [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] or in a semantically enhanced document index [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], on
which semantic searches can be applied by using semantic retrieval languages like
SPARQL [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] or SeRQL [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. The early TAMBIS project [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] provides a forward-looking
semantic search approach for accessing multiple bioinformatics databases, using a
complex biological concept model for query formulation. Instead of semantic
knowledge bases or structured data sources, the approach of this paper builds on
indexed documents, which can be retrieved by complex Boolean expressions that are
difficult to construct [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. Using ontologies as a navigation tree structure in the form of a
Concept-based Information Retrieval Interface (CIRI) seems to be more effective than a
direct interface (input field) [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ].
        </p>
        <p>
          GoPubMed [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] uses the Gene Ontology for search on PubMed. In contrast to our
approach, the user is not able to increase the precision of the search by simply developing
and using his own DSO, exactly tailored to his needs.
        </p>
        <p>
          Textpresso [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] is a text-mining system for scientific literature. It implements
categories of terms (an ontology) which can be used for a search on a database of articles.
Regular expressions have to be created for each category to match the corresponding
terms in the text and the documents have to be labeled according to the lexicon of the
ontology. The documents are then indexed with respect to labels and words. Our solution
does not use any information contained in the ontology for pre-processing or indexing
the documents. The ontology is constantly under development and is adapted by the
domain experts to meet their current needs. Our approach requires neither additional
pre-processing steps (e.g., labeling) nor re-indexing of the document collection when
the ontology changes.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Excel-based ontology development</title>
        <p>
          Since ontology engineering is difficult for non-ontologists, there is a need for a rapid and
collaborative ontology engineering methodology and easy to use tools [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. The
transformation of spreadsheets into OWL is already used in life science projects [
          <xref ref-type="bibr" rid="ref23 ref24">23,24</xref>
          ], and
tools and plugins enable the population of OWL ontologies from spreadsheet templates
[
          <xref ref-type="bibr" rid="ref25">25</xref>
          ]. The template we developed differs in that it is not intended for
general ontology development based on modelling certain OWL constructs. Instead,
our template is tailored exactly to the SO2-based development of DSOs in order to make
its use by domain experts as intuitive as possible.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and future work</title>
      <p>The specification of complex search queries is a recurring task in different domains. The
search for complications in the use of medical devices during Post-Market
Surveillance and the classification of incident reports are only two examples in this
area.</p>
      <p>We presented the improved Search Ontology (SO2), a promising
domain-independent approach to specifying complex search queries. Our solution allows advanced
search for relevant documents in different domains using suitable DSOs and supports an
automatic classification of search results. The second version of the Search Ontology
includes enhancements such as new search operators, negated concepts and
direct storage of generated queries in the ontology. For easier handling by
non-ontologists, we developed an Excel template, which facilitates the SO2-based specification
of DSOs without using ontology editors or knowing the query syntax. A
service-oriented architecture was introduced; at its core is the Search
Ontology Generator (SONG), which provides methods for access by search engines
as well as DSO administration methods. Through the enhancement of the SO, the
spreadsheet-based specification and the service-oriented architecture, we improved access to the
Search Ontology for domain experts and external tools.</p>
      <p>The service-oriented architecture of SONG is already designed to give external tools
access to the DSOs. Future work includes the further development of the
described software components, such as the search engine GUI and the Document Classifier Service.
Once a sufficiently extensive knowledge base in the form of DSOs has been defined,
ontology learning can be exploited to support semi-automatic query creation. In
addition, applications will be developed to adapt external terminologies/ontologies for
use in or as DSOs.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgement</title>
      <p>This work has been funded by the German Federal Ministry of Education and Research
(BMBF) in the “KMU-innovativ” funding program as part of the “OntoPMS” project
(reference number 01IS15056B).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <article-title>Council Directive 93/42/EEC of 14 June 1993 concerning medical devices</article-title>
          , OJ. L (
          <year>1993</year>
          )
          <fpage>1</fpage>
          -
          <lpage>43</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <article-title>Council Directive 90/385/EEC of 20 June 1990 on the approximation of the laws of the Member States relating to active implantable medical devices</article-title>
          , OJ. L (
          <year>1990</year>
          )
          <fpage>17</fpage>
          -
          <lpage>36</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <article-title>Regulation (EU) 2017/745 of the European Parliament and of the Council of 5 April 2017 on medical devices</article-title>
          , OJ. L (
          <year>2017</year>
          )
          <fpage>1</fpage>
          -
          <lpage>175</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Uciteli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Goller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Burek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Siemoleit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Faria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Galanzina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Weiland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Drechsler-Hake</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Bartussek</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Herre</surname>
          </string-name>
          ,
          <article-title>Search Ontology, a new approach towards Semantic Search</article-title>
          , in:
          <string-name>
            <given-names>E.</given-names>
            <surname>Plödereder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Grunske</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Schneider</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Ull</surname>
          </string-name>
          (Eds.),
          <source>FoRESEE: Future Search Engines 2014 - 44. Annual Meeting of the GI, Stuttgart - GI Edition Proceedings LNI</source>
          , Köllen, Bonn,
          <year>2014</year>
          : pp.
          <fpage>667</fpage>
          -
          <lpage>672</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5] Elasticsearch, (n.d.). https://www.elastic.co/products/elasticsearch.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <article-title>IMDRF Presentation: Adverse Event Terminology and Coding Working Group</article-title>
          , (
          <year>2017</year>
          ). http://www.imdrf.org/docs/imdrf/final/meetings/imdrf-meet-170314-canada-presentation-wg-aet.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <article-title>MEDWATCH Medical Device Reporting Code Instructions</article-title>
          , (n.d.). https://www.fda.gov/MedicalDevices/ucm106737.htm.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>H.</given-names>
            <surname>Herre</surname>
          </string-name>
          ,
          <article-title>General Formal Ontology (GFO): A Foundational Ontology for Conceptual Modelling</article-title>
          , in:
          <string-name>
            <given-names>R.</given-names>
            <surname>Poli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Healy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Kameas</surname>
          </string-name>
          (Eds.),
          <source>Theory and Applications of Ontology: Computer Applications</source>
          , Springer, Netherlands,
          <year>2010</year>
          : pp.
          <fpage>297</fpage>
          -
          <lpage>345</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>H.</given-names>
            <surname>Herre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Heller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Burek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hoehndorf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Loebe</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Michalek</surname>
          </string-name>
          ,
          <article-title>General Formal Ontology (GFO): A Foundational Ontology Integrating Objects and Processes. Part I: Basic Principles (Version 1.0)</article-title>
          , Research Group Ontologies in Medicine (Onto-Med), University of Leipzig,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.R.</given-names>
            <surname>Ranganathan</surname>
          </string-name>
          ,
          <source>Colon Classification, Ed. 7 (1971): A Preview</source>
          , Sarada Ranganathan Endowment for Library Science,
          <year>1969</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.K.</given-names>
            <surname>Sahu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.P.</given-names>
            <surname>Mahapatra</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.C.</given-names>
            <surname>Balabantaray</surname>
          </string-name>
          ,
          <article-title>Analytical study on intelligent information retrieval system using semantic network</article-title>
          ,
          <source>in: 2016 International Conference on Computing, Communication and Automation (ICCCA)</source>
          ,
          <year>2016</year>
          : pp.
          <fpage>704</fpage>
          -
          <lpage>710</lpage>
          . doi:10.1109/CCAA.2016.7813813.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>R.</given-names>
            <surname>Guha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>McCool</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <article-title>Semantic Search</article-title>
          ,
          <source>in: Proceedings of the 12th International Conference on World Wide Web, ACM</source>
          , New York, NY, USA,
          <year>2003</year>
          : pp.
          <fpage>700</fpage>
          -
          <lpage>709</lpage>
          . doi:10.1145/775152.775250.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>D.</given-names>
            <surname>Vallet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fernández</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Castells</surname>
          </string-name>
          ,
          <article-title>An Ontology-Based Information Retrieval Model</article-title>
          ,
          <source>in: The Semantic Web: Research and Applications</source>
          , Springer, Berlin, Heidelberg,
          <year>2005</year>
          : pp.
          <fpage>455</fpage>
          -
          <lpage>470</lpage>
          . https://link.springer.com/chapter/10.1007/11431053_31.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>G.B.</given-names>
            <surname>Marchisio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Koperski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tusk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.S.</given-names>
            <surname>Dhillon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Pochman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.E.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <article-title>Method and system for extending keyword searching to syntactically and semantically annotated data</article-title>
          ,
          <source>US7526425 B2</source>
          ,
          <year>2009</year>
          . http://www.google.ch/patents/US7526425.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>R.</given-names>
            <surname>Chauhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Goudar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sharma</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Chauhan</surname>
          </string-name>
          ,
          <article-title>Domain ontology based semantic search for efficient information retrieval through automatic query expansion</article-title>
          ,
          <source>in: 2013 International Conference on Intelligent Systems and Signal Processing (ISSP)</source>
          ,
          <year>2013</year>
          : pp.
          <fpage>397</fpage>
          -
          <lpage>402</lpage>
          . doi:10.1109/ISSP.2013.6526942.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Uren</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Motta</surname>
          </string-name>
          ,
          <article-title>SemSearch: A Search Engine for the Semantic Web</article-title>
          ,
          <source>in: Managing Knowledge in a World of Networks</source>
          , Springer, Berlin, Heidelberg,
          <year>2006</year>
          : pp.
          <fpage>238</fpage>
          -
          <lpage>245</lpage>
          . https://link.springer.com/chapter/10.1007/11891451_22.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>P.G.</given-names>
            <surname>Baker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Brass</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bechhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Goble</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Paton</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Stevens</surname>
          </string-name>
          ,
          <article-title>TAMBIS--Transparent Access to Multiple Bioinformatics Information Sources</article-title>
          ,
          <source>Proc Int Conf Intell Syst Mol Biol</source>
          .
          <volume>6</volume>
          (
          <year>1998</year>
          )
          <fpage>25</fpage>
          -
          <lpage>34</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>W.B.</given-names>
            <surname>Croft</surname>
          </string-name>
          ,
          <article-title>Boolean queries and term dependencies in probabilistic retrieval models</article-title>
          ,
          <source>J. Am. Soc. Inf. Sci</source>
          .
          <volume>37</volume>
          (
          <year>1986</year>
          )
          <fpage>71</fpage>
          -
          <lpage>77</lpage>
          . doi:10.1002/(SICI)1097-4571(198603)37:2&lt;71::AID-ASI3&gt;3.0.CO;2-4.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>S.</given-names>
            <surname>Suomela</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Kekäläinen</surname>
          </string-name>
          ,
          <article-title>Ontology as a Search-Tool: A Study of Real Users' Query Formulation With and Without Conceptual Support</article-title>
          ,
          <source>in: Advances in Information Retrieval</source>
          , Springer, Berlin, Heidelberg,
          <year>2005</year>
          : pp.
          <fpage>315</fpage>
          -
          <lpage>329</lpage>
          . https://link.springer.com/chapter/10.1007/978-3-540-31865-1_23.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>A.</given-names>
            <surname>Doms</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Schroeder</surname>
          </string-name>
          ,
          <article-title>GoPubMed: exploring PubMed with the Gene Ontology</article-title>
          ,
          <source>Nucleic Acids Res</source>
          .
          <volume>33</volume>
          (
          <year>2005</year>
          )
          <fpage>W783</fpage>
          -
          <lpage>786</lpage>
          . doi:10.1093/nar/gki470.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>H.-M.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.E.</given-names>
            <surname>Kenny</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.W.</given-names>
            <surname>Sternberg</surname>
          </string-name>
          ,
          <article-title>Textpresso: an ontology-based information retrieval and extraction system for biological literature</article-title>
          ,
          <source>PLoS Biol</source>
          .
          <volume>2</volume>
          (
          <year>2004</year>
          )
          e309. doi:10.1371/journal.pbio.0020309.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>A.</given-names>
            <surname>De Nicola</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Missikoff</surname>
          </string-name>
          ,
          <article-title>A Lightweight Methodology for Rapid Ontology Engineering</article-title>
          ,
          <source>Commun. ACM</source>
          .
          <volume>59</volume>
          (
          <year>2016</year>
          )
          <fpage>79</fpage>
          -
          <lpage>86</lpage>
          . doi:10.1145/2818359.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>K.</given-names>
            <surname>Tahar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schaaf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Jahn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kücherer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Paech</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Herre</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Winter</surname>
          </string-name>
          ,
          <article-title>An Approach to Support Collaborative Ontology Construction</article-title>
          ,
          <source>Stud Health Technol Inform</source>
          .
          <volume>228</volume>
          (
          <year>2016</year>
          )
          <fpage>369</fpage>
          -
          <lpage>373</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>A.</given-names>
            <surname>Blfgeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.D.</given-names>
            <surname>Warrender</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.M.U.</given-names>
            <surname>Hilkens</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Lord</surname>
          </string-name>
          ,
          <article-title>A document-centric approach for developing the tolAPC Ontology</article-title>
          ,
          <source>in: Proceedings of the 7th Workshop on Ontologies and Data in Life Sciences, CEUR Workshop Proceedings</source>
          , Halle (Saale), Germany,
          <year>2016</year>
          : pp. B.1-6.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>S.</given-names>
            <surname>Jupp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Horridge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Iannone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Klein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Owen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schanstra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wolstencroft</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Stevens</surname>
          </string-name>
          ,
          <article-title>Populous: a tool for building OWL ontologies from templates</article-title>
          ,
          <source>BMC Bioinformatics</source>
          .
          <volume>13</volume>
          (
          <year>2012</year>
          )
          S5. doi:10.1186/1471-2105-13-S1-S5.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>