<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Ontology Matching for Patent Classification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Christoph Quix</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sandra Geisler</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rihan Hai</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sanchit Alekh</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Databases and Information Systems, RWTH Aachen University</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Fraunhofer-Institute for Applied Information Technology FIT</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Interdisciplinary research and development projects in medical engineering benefit from well-selected collaboration partners. The process of finding such partners from often unfamiliar fields is difficult, but can be supported by an expert profile that is based on patent analysis and on classifying the patents into competence fields in medical engineering. Patent analysis and categorization are difficult and require the analysis of the semantic content. Hence, we propose a twofold approach using a large controlled vocabulary, a smaller competence field ontology, and an alignment between them to assign patents to a certain competence field. The approach has two parts: a Topic Map approach and a Publication approach. We evaluate these approaches and their components in several ways. Furthermore, we compare four different ways to assign a patent to a competence field and show that the semantic wealth of a large biomedical ontology is beneficial to the classification task.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Ontology matching has been an active research area for more than 10 years
[
        <xref ref-type="bibr" rid="ref17 ref18">17,18</xref>
        ]. Ontologies are used to describe a domain of interest by concepts and
relationships between them, and to provide a formal description of these
relationships. Thus, although the aim of ontology matching seems to be the matching
of classes and properties, its actual intention is usually to match elements of the
domains described by the ontologies. An example of such a ‘domain matching’
task is patent classification, in which patents should be assigned to a class in a
classification [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        While a classification scheme or taxonomy can be easily represented as an
ontology, representing the content of a patent as an ontology or describing the
patent with elements of an ontology is more challenging. Patents have their own
specific language and use a terminology that is different from that of a typical research
publication. Patents are classified using the International Patent Classification
(IPC) system; however, this is too general for a detailed patent analysis [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
On the other hand, patent data is available in a structured form (usually XML)
from patent offices, which simplifies the pre-processing and extraction of basic
information such as title, abstract, and authors. Furthermore, patents are often
available in multiple languages; at least the bibliographic information and the
abstract are available in English, which solves the problem of multi-lingual
documents.
      </p>
      <p>
        We are aiming at building a recommender system for research projects in
medical engineering (ME) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] in the context of the mi-Mappa project<sup>3</sup>. In ME,
researchers from several disciplines (e.g., biology, medicine, mechanical
engineering, computer science) work jointly on a research project. Furthermore, ME is a
highly innovative domain with short product cycles requiring a fast translation
of research results into applicable products [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. While a
publication list of a researcher provides a good basis for creating an author profile [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ],
a list of patents allows us to characterize the ability of a
researcher to develop inventions and market-ready products. Hence, we concentrate
mainly on the analysis of patents.
      </p>
      <p>To address the problem of patent terminology, we exploit explicit references
to scientific publications and their semantic annotations. In ME, most of the
publications appear in journals or conferences that are indexed by PubMed<sup>4</sup>.
PubMed uses MeSH<sup>5</sup>, a rich controlled vocabulary with a hierarchical structure,
to annotate the publications. Thus, to retrieve a MeSH annotation for a
patent, we look up the references to research articles in PubMed and retrieve the
corresponding MeSH terms.</p>
      <p>Using references to scientific publications is only one aspect of our approach
to patent classification. The overall approach, depicted in Figure 1, consists
of two complementary sub-approaches: the Topic Map Approach (TMA) and
the Publications Approach (PBA). Both approaches utilize two ontologies, a
competence field (CF) ontology and an ontology with comprehensive medical
knowledge (MeSH), and an alignment between them.</p>
      <p>
        For the Publication Approach, excerpts of publication databases, as well
as their associated MeSH terms, are imported into our Data Lake (DL) system
Constance [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The data lake can then be queried on the fly for publications cited
by the currently processed patent, as well as for the MeSH terms that are pertinent
to each of these publications. For the categorization of the input patent with
the TMA, the topic with the highest probability in the topic map (or multiple
topics if they have the same probability) is retrieved. Each term characterizing
the topic is compared with all concepts in the MeSH ontology, resulting in a set
of matching concepts.
      </p>
      <p>Thus, for both approaches, we have a list of related concepts from the MeSH
ontology. To establish a link to the competence field ontology, which we have
created to describe the innovation areas in medical engineering (see Section 2),
we use ontology matching.</p>
      <p>Several questions arise when we analyze the presented approach.
Creating an alignment between ontologies and using a huge medical ontology
in this context require a large amount of resources in terms of memory and CPU
power. Hence, we need to know whether the effort of using it is worthwhile.</p>
      <p><sup>3</sup> http://www.dbis.rwth-aachen.de/mi-Mappa</p>
      <p><sup>4</sup> https://www.ncbi.nlm.nih.gov/pubmed/</p>
      <p><sup>5</sup> Medical Subject Headings, https://www.nlm.nih.gov/mesh/</p>
    </sec>
    <sec id="sec-2">
      <title>-</title>
      <p>[Figure 1: Overview of the overall approach. Components shown: Patent, Topic Map, Data Lake System, Patent Categorization, Labeled Patent, MeSH Ontology, Competence Field Ontology, Alignment, Ontology Matching, Topic Map Approach, Publication Approach.]</p>
      <p>Furthermore, it is of interest whether the quality and size of the alignment between the ontologies
have an impact on the results. A particular problem is rating the quality of the
alignment without a reference alignment. To answer these questions, we present
the following contributions in this paper:</p>
      <p>We analyze and select medical ontologies to use them as a basis for the
creation of the CF ontology and as a single point of entry to identify the
semantics of patents and publications.</p>
      <p>We describe the process of designing the competence field ontology and rate
its quality based on established methodologies.</p>
      <p>We create different alignments between the CF ontology and the medical
ontology with different matcher configurations and compare their quality.</p>
      <p>We compare the results of four different approaches to categorize a patent:
(1) Topic Map Approach with direct comparison of terms with concepts of
the CF ontology (i.e., using no ontology matching techniques), (2)
Publication Approach, (3) Topic Map Approach, (4) combination of Topic Map
Approach and Publication Approach. Approaches (2) and (3) use the
alignment computed by ontology matching.</p>
      <p>
        The rest of this paper is structured as follows. In Section 2, we explain the
design of the CF ontology and the selection process of the utilized
medical ontologies (first results on these issues were reported in
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]). In Section 3, we describe the approaches to establish a link between patents
and competence fields. In Section 4, the four approaches to categorize patents into
competence fields are evaluated. Finally, we discuss related work in Section 5
and conclude the paper in Section 6.
      </p>
      <sec id="sec-2-1">
        <title>Modeling and Selection of Ontologies</title>
        <p>Our assumption is that a huge medical ontology (or a set of them) and mappings
to a smaller competence field ontology (CFO) will make it easier to classify
patents into competence fields. The idea is somewhat similar to a smart
multi-level filter. First, we retrieve terms describing the content of a patent (either from
the topic map or from the cited publications). These terms are compared to concept
names in a huge medical taxonomy using string similarity measures. The most
similar ones are selected, which results in a potentially long list of concepts.
Afterwards, we filter further and search for mappings from the concepts and
their predecessors to concepts of the smaller competence field ontology using
more intelligent matchers. This leads to scores that indicate the membership
confidence for the competence fields.</p>
        <p>
          To implement this approach, two things have to be done first: (1) we
have to model the competence field ontology and (2) we need to evaluate and
select comprehensive medical ontologies. For the design of ontologies, there exist
several acknowledged methodologies, such as METHONTOLOGY [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], TOVE,
or the work by Noy and McGuinness [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. The NeOn methodology [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] is a more
recent approach which combines ideas of the former methods. The methodology
describes nine scenarios for building ontologies and ontology networks [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ].
        </p>
        <p>
          To create the CFO, we started from the descriptions in [
          <xref ref-type="bibr" rid="ref15 ref3">15,3</xref>
          ] and also used an
extended description by ME domain experts. As the six competence fields are the
categories we want to assign to the patents, we use these (and only these) as first-level
concepts in the ontology. All further concepts are subconcepts of these.
This approach corresponds to the reusing and reengineering non-ontological
resources scenario of the NeOn methodology. To find subconcepts, we analyzed
the detailed description of the CFs by the domain experts. First, we extracted
a preliminary selection of 174 terms, which we used to create a first draft of the
ontology. Domain experts then commented on this draft using a custom web
front end for the review of ontologies.
        </p>
        <p>
          In parallel, we searched for one or more large biomedical taxonomies. We
need these taxonomies for two purposes. First, we want to extend the basic CFO
we created before with more terms to describe the competence fields in more
detail. Second, we need the large ontology as an entry point to find terms describing
the patents; with the alignment to the CFO, we can then determine the
corresponding competence fields. This corresponds to the sixth scenario of the NeOn
methodology, namely reusing, merging and reengineering ontological resources.
The first step in this scenario is the ontological resource reuse process, starting
with the Ontology Search [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. Hence, we searched for ontologies with domain-specific
search engines as described in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. We used the Bioportal<sup>6</sup> search
engine, the Ontology Lookup Service<sup>7</sup>, and the Ontobee<sup>8</sup> search engine with the
preliminary list of terms to obtain a broad overview. Afterwards, we carried out
the Ontology Assessment and Comparison steps [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. The most promising four
ontologies found are the National Cancer Institute Thesaurus (NCIT), the
Systematized Nomenclature of Medicine - Clinical Terms (SNOMEDCT), MeSH,
and the Robert Hoehndorf Version of MeSH (RHMeSH). To determine whether they
satisfy our needs, we performed a coverage analysis, where the coverage is the percentage
of the competence field terms present in each of the ontologies. No single
ontology covered all competence fields to a satisfying degree; some reached more
than 60% for one competence field but only about 20% for the other fields (e.g.,
NCIT covers ‘Imaging Techniques’ well, but not the other fields).
        </p>
        <p><sup>6</sup> http://bioportal.bioontology.org</p>
        <p><sup>7</sup> http://www.ebi.ac.uk/ontology-lookup</p>
        <p><sup>8</sup> http://www.ontobee.org</p>
        <p>[Figure 2: Coverage of the competence field terms when adding one ontology after another. Categories: Imaging Techniques, Prostheses &amp; Implants, Telemedicine, Operative &amp; Interventional Dev. and Sys., In-Vitro Diagnostics, Special Therapies &amp; Diagnosis Sys., Complete. Series: NCIT; NCIT + MeSH; NCIT + MeSH + SNOMEDCT; NCIT + MeSH + SNOMEDCT + RHMeSH.]</p>
        <p>Hence, we decided to analyze the coverage by adding one ontology after
another to see the gain of adding further ontologies. We used the most promising
ontologies identified before and started with the NCI Thesaurus. Figure 2 shows
the results.</p>
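        <p>The incremental coverage analysis can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the project: each ontology is reduced to a plain set of lower-cased concept labels, a competence field term counts as covered only on an exact label match, and the terms and labels below are hypothetical toy data.</p>
        <preformat><![CDATA[
```python
from typing import List, Set, Tuple


def coverage(terms: Set[str], labels: Set[str]) -> float:
    """Percentage of competence field terms that occur among the ontology labels."""
    if not terms:
        return 0.0
    return 100.0 * len(terms & labels) / len(terms)


def cumulative_coverage(
    terms: Set[str], ontologies: List[Tuple[str, Set[str]]]
) -> List[Tuple[str, float]]:
    """Add one ontology after another and report the coverage after each step."""
    seen: Set[str] = set()
    names: List[str] = []
    steps: List[Tuple[str, float]] = []
    for name, labels in ontologies:
        seen |= labels  # union of all labels added so far
        names.append(name)
        steps.append((" + ".join(names), coverage(terms, seen)))
    return steps


# Hypothetical toy data, for illustration only.
cf_terms = {"tomography", "ultrasonography", "stent", "telemetry"}
ontologies = [
    ("NCIT", {"tomography", "ultrasonography"}),
    ("MeSH", {"stent", "tomography"}),
]

for label, pct in cumulative_coverage(cf_terms, ontologies):
    print(f"{label}: {pct:.0f}%")
```
]]></preformat>
        <p>The gain of each added ontology is the difference between consecutive steps; in practice, a fuzzy label comparison would probably replace the exact set intersection assumed here.</p>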
        <p>Note that we gain about 10% coverage by using all ontologies. The
biggest gain is achieved by adding the MeSH ontology. Thus, we decided to use
the NCIT and the MeSH ontologies to extend the CFO, as this was a good
compromise between coverage and complexity. For the matching of the biomedical
ontology to the CFO, we first picked only one ontology to keep the
computational overhead during runtime low. If it does not give us satisfying results, we will
add more ontologies and also align them with the CFO. Another possibility would
be to use the UMLS, which is a superset of many medical ontologies, but it
is very large, which could lead to performance problems. For now, we selected
the Robert Hoehndorf MeSH<sup>9</sup> as it has good coverage and is available in the
OWL format.</p>
        <p>The next steps to develop the CFO are the ontology aligning and ontology
merging step and the ontological resource engineering process. We proceeded in
these steps as follows. We took the extracted terms, the concepts found so far
in the coverage analysis, and the detailed description of the innovation fields,
and carried out an extended search in the MeSH Browser<sup>10</sup> and the NCIT
Browser<sup>11</sup> for these and related concepts.</p>
        <p><sup>9</sup> https://bioportal.bioontology.org/ontologies/RH-MESH</p>
        <p><sup>10</sup> https://meshb.nlm.nih.gov/search</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>-</title>
      <p>
        We analyzed the hierarchical structure of
each of the found concepts and decided for each concept whether it is adopted into
the CFO. Where applicable, we also adopted the inheritance relationships of
concepts. We extended and restructured the CFO in cycles, i.e., according to [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] we
performed a re-conceptualization on different levels for the CFO and for the concepts
from the biomedical ontologies. For the upper levels of the CFO, we designed
categories that better fit our purpose of categorizing terms for medical
engineering. We used a mind mapping technique and a bottom-up approach,
as for example described by Noy and McGuinness [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], to refine the design. As
an example, the Imaging Techniques concepts and the concepts of the
Imaging_Technology concept (2nd level) are visualized in Figure 3.
      </p>
      <p>The ontology has been implemented in OWL using the NeOn toolkit<sup>12</sup>. We
also evaluated the CFO in tests of the complete patent categorization process.
We noticed that the initial results were not satisfying because some competence
fields were not represented well in the CFO. Hence, we did a frequency analysis
of the MeSH terms from the Publications Approach. We made a ranked list of
MeSH concepts based on how often they were searched for without
leading to matches in the CFO. Based on this list, we added more useful concepts
to the CFO (no trivial or misleading terms such as Human, but, for example, Gene
Expression Regulation). The current CFO consists of 529 concepts and can be
downloaded at http://dbis.rwth-aachen.de/cms/projects/mi-mappa/CFO.owl.</p>
      <sec id="sec-3-1">
        <title>Matching of Ontologies and Topic Maps</title>
        <p>As explained above, we use three different basic approaches and one
combined approach to classify patents. Figure 4 gives an overview of the different
approaches.</p>
        <p>#1: TMD (Topic Map with Direct Mapping): In this approach, we
match the terms extracted from the topic maps directly with the competence
field ontology. This can be seen as a baseline, as it does not use a semantically
rich ontology as an intermediate component, but only uses string matching to
match terms and ontology elements.</p>
        <p>#2: PBA (Publication-Based Approach): This approach uses the MeSH
terms attached to publications which are referenced by a patent. Then, we
use an alignment between the CFO and MeSH to compute a score for the
relationship between a patent and a competence field.</p>
        <p>#3: TMA (Topic Map Approach): Here, we also use topic mapping (as in
approach #1) to create initial clusters of patents and extract terms occurring
frequently in these clusters. These terms are then matched with the concepts
of the MeSH ontology. Using the same alignment as in the second approach,
a relationship to the CFO is established.</p>
        <p>#4: COM (Combined Approach of #2 &amp; #3): This is a combination of
PBA and TMA, with an emphasis on the results of PBA.</p>
        <p><sup>11</sup> https://ncit.nci.nih.gov/ncitbrowser/</p>
        <p><sup>12</sup> http://neon-toolkit.org</p>
        <p>As the approaches TMD and TMA are based on topics, we first briefly explain
this part before we present how we created the alignment between the CFO and
MeSH and describe the publication-based approach.</p>
        <p><bold>Topic Mapping</bold></p>
        <p>
          A basic set of patents is used to build a topic map. First, the corpus of
documents is preprocessed (stemming, removing stop words, etc.) and a
Document-Term Matrix (DTM) is created. The matrix is the input to a Latent Dirichlet
Allocation (LDA) algorithm with the Gibbs sampling algorithm for estimation and
variational expectation maximization [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. The LDA determines a fixed number
of topics, each of which is described by a fixed number of stemmed terms. Topics are assigned
to each patent in the basic patent set with a probability. The topic
map and the assignments are stored in a database.
        </p>
        <p>
          We evaluated different numbers of topics and different numbers of terms
extracted for each topic (e.g., 10, 30, 50, etc.). As the computational cost of the subsequent
steps increases with a higher number of topics and terms, we used 50 topics and
50 terms for our evaluation in Section 4. As the TMD approach matches the
terms directly with the CFO, no further processing of the extracted terms is
done in this case. We just compute a similarity using a normalized Longest
Common Subsequence [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] algorithm. In our tests, we found that a threshold
value of 0.5 for the string similarity provides the best compromise.
        </p>
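        <p>A minimal Python sketch of such a thresholded string comparison is given below. The normalization is an assumption made for illustration (length of the longest common subsequence divided by the length of the longer string); the cited algorithm may normalize differently. The helper names and the example concept names are hypothetical; only the threshold of 0.5 is taken from the text.</p>
        <preformat><![CDATA[
```python
from typing import Iterable, List, Tuple


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence via dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def normalized_lcs(a: str, b: str) -> float:
    """Similarity in [0, 1]; normalizing by the longer string is an assumption."""
    a, b = a.lower(), b.lower()
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


THRESHOLD = 0.5  # threshold value reported in the text


def matching_concepts(
    term: str, concept_names: Iterable[str], threshold: float = THRESHOLD
) -> List[Tuple[str, float]]:
    """Return all concept names whose similarity to the term reaches the threshold."""
    matches = []
    for name in concept_names:
        similarity = normalized_lcs(term, name)
        if similarity >= threshold:
            matches.append((name, similarity))
    return matches


# A stemmed topic term compared with two hypothetical concept names.
print(matching_concepts("imag", ["Imaging", "Prosthesis"]))
```
]]></preformat>
        <p>Because topic terms are stemmed, a subsequence-based measure tolerates the missing suffixes, which is one plausible reason for preferring it over exact matching.</p>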
        <p>For the categorization of the input patent with the TMA, the topic with the
highest probability in the topic map (or multiple topics if they have the same
probability) is retrieved. Each term characterizing the topic is compared with
all concepts in the medical ontology, resulting in a set of matching concepts. For
each of the concepts in the set, direct mappings and mappings of parent concepts
are collected from the alignment, and it is determined to which competence field
the matching concept in the CF ontology belongs. From the similarities, average
scores are calculated for each term and each competence field. Based on this, an
average score is calculated from all terms for the topic(s) of the patent. Hence,
for each patent we have a score for each of the competence fields, and we normalize
these such that all scores add up to 1.</p>
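        <p>The score aggregation described above can be sketched as follows. The data structures are assumptions chosen for illustration: the alignment is a dictionary from a MeSH concept to (CFO concept, similarity) pairs, <monospace>parents</monospace> lists the parent concepts of a MeSH concept, and <monospace>field_of</monospace> maps a CFO concept to its competence field. The sketch averages only over terms that reach a given field and ignores the string similarity of the term match; the actual implementation may differ in both respects. All concept names are hypothetical.</p>
        <preformat><![CDATA[
```python
from collections import defaultdict
from typing import Dict, List, Tuple

Alignment = Dict[str, List[Tuple[str, float]]]  # MeSH concept -> [(CFO concept, similarity)]


def field_scores(
    term_concepts: Dict[str, List[str]],  # topic term -> matching MeSH concepts
    parents: Dict[str, List[str]],        # MeSH concept -> parent concepts
    alignment: Alignment,
    field_of: Dict[str, str],             # CFO concept -> competence field
) -> Dict[str, float]:
    """Average mapping similarities per term and field, then over terms, then normalize."""
    per_field: Dict[str, List[float]] = defaultdict(list)
    for concepts in term_concepts.values():
        sims: Dict[str, List[float]] = defaultdict(list)
        for concept in concepts:
            # Direct mappings plus mappings of the parent concepts.
            for candidate in [concept, *parents.get(concept, [])]:
                for cfo_concept, similarity in alignment.get(candidate, []):
                    sims[field_of[cfo_concept]].append(similarity)
        for field, values in sims.items():
            per_field[field].append(sum(values) / len(values))
    averaged = {f: sum(v) / len(v) for f, v in per_field.items()}
    total = sum(averaged.values())
    return {f: s / total for f, s in averaged.items()} if total else {}


# Hypothetical toy data.
term_concepts = {"imag": ["Diagnostic Imaging"], "implant": ["Prostheses and Implants"]}
parents = {"Diagnostic Imaging": ["Diagnostic Techniques"]}
alignment = {
    "Diagnostic Imaging": [("Imaging_Technology", 0.9)],
    "Diagnostic Techniques": [("Imaging_Technology", 0.7)],
    "Prostheses and Implants": [("Implant", 0.6)],
}
field_of = {"Imaging_Technology": "Imaging Techniques", "Implant": "Prostheses and Implants"}

scores = field_scores(term_concepts, parents, alignment, field_of)
print(scores)
```
]]></preformat>
        <p>The same function applies to the publication-based variant if the MeSH terms of the cited publications are used as input instead of the concepts matched to topic terms.</p>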
        <p><bold>Ontology Matching</bold></p>
        <p>
          To rate how strongly a patent or publication is related to a certain competence
field, we need to match the describing terms, extracted either from publications
or from the topic map, to terms describing the competence fields. In preparation
for this step, we create an alignment between the selected MeSH ontology and the
CFO. The alignment consists of a set of mappings between the concepts of the
two ontologies, i.e., for each mapping we have a pair of concepts and a
similarity value. As we do not try to re-invent the wheel, we used
AgreementMakerLight [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] as it produced consistently good results in the recent OAEI campaigns
and also performs well for large biomedical ontologies. AgreementMakerLight is
able to combine different matchers to create an alignment. We used the string
matcher, the word matcher, the structural matcher, the lexical matcher, the
cardinality filter, and the coherence filter. As a similarity threshold, we used a value
of 0.6. The matchers have been combined in a hierarchical way and the default
settings for each matcher have been used.
        </p>
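        <p>To make the notion of an alignment concrete, the following Python sketch represents mappings as (source, target, similarity) triples and applies a similarity threshold followed by a simple greedy one-to-one cardinality filter. This is an illustrative stand-in and not the AgreementMakerLight implementation; the class, the function names, and the candidate mappings are hypothetical, and only the threshold of 0.6 is taken from the text.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Mapping:
    source: str        # concept of the MeSH ontology
    target: str        # concept of the CFO
    similarity: float  # value in [0, 1]


def threshold_filter(mappings: Iterable[Mapping], threshold: float = 0.6) -> List[Mapping]:
    """Keep only mappings whose similarity reaches the threshold."""
    return [m for m in mappings if m.similarity >= threshold]


def greedy_cardinality_filter(mappings: Iterable[Mapping]) -> List[Mapping]:
    """Keep at most one mapping per source and per target, best similarity first."""
    used_sources, used_targets, kept = set(), set(), []
    for m in sorted(mappings, key=lambda m: m.similarity, reverse=True):
        if m.source not in used_sources and m.target not in used_targets:
            kept.append(m)
            used_sources.add(m.source)
            used_targets.add(m.target)
    return kept


# Hypothetical candidate mappings produced by some matcher.
candidates = [
    Mapping("Tomography", "Imaging_Technology", 0.72),
    Mapping("Tomography", "Diagnostics", 0.65),
    Mapping("Telemetry", "Telemedicine", 0.55),
    Mapping("Stents", "Implant", 0.81),
]

relaxed = threshold_filter(candidates)        # threshold only
strict = greedy_cardinality_filter(relaxed)   # threshold plus cardinality filter
print(len(relaxed), len(strict))
```
]]></preformat>
        <p>In this toy example, omitting the cardinality filter retains one additional mapping, which illustrates how relaxed filter settings can enlarge an alignment.</p>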
        <p>Currently, we are also testing other settings and their impact on the quality
of the patent classification results. First experiments show that slightly relaxed filter
settings (e.g., not using a cardinality filter) increase the number of mappings
and thereby also improve the classification result.</p>
        <p><bold>Publication-based Approach</bold></p>
        <p>We queried the web service of EPMC<sup>13</sup> to retrieve the metadata of the
papers referenced in our patent dataset. To extract the references from the patent
data, we use a pattern-based approach similar to the FreeCite citation parser<sup>14</sup>.
Fortunately, the patent data is semi-structured such that the citations can be clearly
identified. Nevertheless, for a large fraction of the patents, we are not able to
retrieve MeSH terms from referenced publications (either because the referenced
publication does not appear in PubMed or because the citation is incorrect).</p>
        <p><sup>13</sup> European PubMed Central, https://europepmc.org/</p>
        <p>
          The retrieved metadata for each referenced publication is then stored in
our Data Lake system Constance [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] from which it is accessed during patent
processing.
        </p>
        <p>Subsequently, we use a process which is similar to the TMA. In both cases,
we have a list of MeSH terms as input. For each of the terms in the list, the
mappings are determined as before and average scores per competence field are
calculated and normalized for each patent.</p>
        <p><bold>Combined Approach</bold></p>
        <p>In the combined approach (COM), if both approaches TMA and PBA deliver
results, the results are combined and overall scores for each competence field
are determined. In all cases, we assign at most three competence fields to a
patent. In most cases, only one competence field is assigned to a patent, as the
other competence fields do not exceed a certain threshold. We take the
intersection of the competence fields computed by TMA and PBA. If it is not
empty, we take this result (because both approaches agree on it). If
the intersection is empty, we take the competence fields with the highest scores
from TMA and PBA.</p>
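        <p>The selection rule of the combined approach can be sketched as follows. The threshold value of 0.3 and the function names are assumptions for illustration, since the text does not state the threshold; the limit of three competence fields and the intersection-then-fallback logic follow the description above. Listing the PBA result first in the fallback is one simple way to reflect the emphasis on PBA.</p>
        <preformat><![CDATA[
```python
from typing import Dict, List


def select_fields(scores: Dict[str, float], threshold: float = 0.3, max_fields: int = 3) -> List[str]:
    """Competence fields whose score reaches the threshold, best first, at most max_fields."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [field for field, score in ranked if score >= threshold][:max_fields]


def combine(tma_scores: Dict[str, float], pba_scores: Dict[str, float],
            threshold: float = 0.3) -> List[str]:
    """Intersection of both results if non-empty, else the top field of each approach."""
    tma_fields = select_fields(tma_scores, threshold)
    pba_fields = select_fields(pba_scores, threshold)
    common = [field for field in pba_fields if field in tma_fields]
    if common:
        return common
    result: List[str] = []
    for scores in (pba_scores, tma_scores):  # PBA first
        if scores:
            best = max(scores, key=scores.get)
            if best not in result:
                result.append(best)
    return result


# Both approaches agree on field "A".
print(combine({"A": 0.6, "B": 0.4}, {"A": 0.5, "C": 0.5}))
# No agreement: fall back to the best field of each approach.
print(combine({"A": 0.7, "B": 0.3}, {"C": 0.8, "D": 0.2}))
```
]]></preformat>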
      </sec>
      <sec id="sec-3-2">
        <title>Evaluation</title>
        <p>
          In our experimental setup, we compare the aforementioned approaches. For the
analysis of patents, we need a comprehensive data basis with high data quality.
In the course of the mi-Mappa project, a subset of the PATSTAT database (2016
Spring edition, version 5.07) published by the European Patent Office (EPO) is
used. For our purposes, we selected patents issued by a German (DE) or British
(UK) authority after 2004 that are from the medical domain (CPC class A61)
and have an English abstract and title. This results in a set of 26,814
patents. For about 4,500 patents of this set, we are able to retrieve MeSH terms
for the referenced publications. From this set, we randomly selected 59 patents
and manually assigned them to competence fields to evaluate our approaches. A
more extensive expert evaluation is currently being set up. In addition, we
also plan to compare our approach with the results of our project partners, who apply a
supervised learning approach using Support Vector Machines [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
        <p>For TMA, we experimented with various configurations for the number of
topics and their associated terms. We observe that with a relatively small number
of topics and terms, e.g., 10 or 20, the terms are extremely broad and do
not provide meaningful matches with the MeSH ontology or the CFO. Therefore,
based on the results, we chose both the number of topics and the number of
terms to be 50 for our default test configuration.</p>
        <p><sup>14</sup> http://freecite.library.brown.edu/</p>
        <p>Fig. 5 summarizes the findings from our experiments for the aforementioned
approaches. All three of our proposed approaches #2, #3,
and #4 perform significantly better than the baseline approach #1: all three
evaluation measures, i.e., precision, recall, and f-measure, are worse for the
baseline approach. In contrast, when the MeSH ontology is used for matching the
ontology terms (#3), the precision and f-measure are 0.375 and 0.38, respectively,
which is more than double the corresponding values produced by
#1. The PBA performs even better, resulting in precision, recall, and f-measure
values of 0.46, 0.47, and 0.44, respectively. However, the combined approach #4
significantly outperforms all the others and results in precision, recall, and
f-measure values of 0.53, 0.55, and 0.53, respectively. Indeed, we found that in the
case of the TMD approach #1, there were a lot of erroneous matches, which
led to non-distinctive results for the CFO assignment. These results affirm the
superiority of techniques which use a comprehensive biomedical ontology and
ontology matching for patent classification tasks.</p>
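        <p>For reference, the following Python sketch shows one common way to compute precision, recall, and f-measure when each patent can be assigned several competence fields. It assumes that the measures are computed per patent from the sets of predicted and manually assigned fields and then averaged over all patents; the text does not state which averaging scheme was used, so this is an assumption, and the example data is hypothetical.</p>
        <preformat><![CDATA[
```python
from typing import List, Set, Tuple


def prf(predicted: Set[str], expected: Set[str]) -> Tuple[float, float, float]:
    """Precision, recall, and f-measure for the field sets of a single patent."""
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def macro_average(pairs: List[Tuple[Set[str], Set[str]]]) -> Tuple[float, float, float]:
    """Average the per-patent measures over all (predicted, expected) pairs."""
    results = [prf(predicted, expected) for predicted, expected in pairs]
    n = len(results)
    return (
        sum(r[0] for r in results) / n,
        sum(r[1] for r in results) / n,
        sum(r[2] for r in results) / n,
    )


# Hypothetical assignments for three patents: (predicted fields, expected fields).
pairs = [
    ({"Imaging"}, {"Imaging"}),
    ({"Imaging", "Telemedicine"}, {"Imaging"}),
    ({"Implants"}, {"Telemedicine"}),
]
precision, recall, f_measure = macro_average(pairs)
print(round(precision, 3), round(recall, 3), round(f_measure, 3))
```
]]></preformat>
        <p>Under per-patent averaging, the mean f-measure is not the harmonic mean of the mean precision and mean recall and can lie below both.</p>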
      </sec>
      <sec id="sec-3-3">
        <title>Related Work</title>
        <p>
          Only a few works apply ontology matching in the context of
patent analysis. Semantic similarities (based on ontology matching) and case-based
reasoning have been applied in the design of invention processes which use
patent analysis to study related works. Patent analysis using ontologies has been
applied especially for patent search [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. A patent search request can be
represented as an ontology or as a set of concepts of an existing ontology, which is
then matched with the ontologies representing the knowledge of patents [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
Another example is the PatExpert system, which uses a network of ontologies
and knowledge bases to enable patent search, classification, and clustering [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ].
Trappey et al. propose a system that calculates the conditional probability that,
given that a specific text chunk is present in the document, the chunk is mapped to
a specific concept of a given ontology [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. Patent similarity is then based on
the number of common matched concepts. This approach restricts the clustering
to the terms of the ontology, which might lead to missing important terms not
present in the ontology.
        </p>
      </sec>
      <sec id="sec-3-4">
        <title>Conclusion</title>
        <p>Patent analysis is a complex topic as patents use their own language and
terminology. Even for humans used to research publications, patents are difficult to
understand. Thus, typical approaches for classifying patents might fail.</p>
        <p>In this paper, we investigated an ontology-based approach to assign patents
to competence fields in medical engineering. We developed two different
approaches and a combined approach that are based on a large biomedical ontology, its
alignment to the competence field ontology designed by us, and other ontology
matching techniques. We have shown that these more elaborate approaches
outperform an approach that directly matches terms of patents with the
competence field ontology.</p>
        <p>However, the overall f-measure of about 55% for the combined approach is not
yet satisfying. One problem is the small set of patents for which we have assigned
competence fields that we can use as a ground truth. This set will be extended with
a larger expert evaluation in which patents will be classified by several experts.
Even humans might disagree on the assignment of a patent to a competence
field; therefore, we will have multiple expert opinions for one patent. We will
also work on fine-tuning and optimizing our approach. So far, we have focused on the
quality of the result and have not paid much attention to performance. Still,
we think that the area of patent classification is an interesting field which could
benefit more from the results in ontology matching.</p>
      </sec>
      <sec id="sec-3-5">
        <title>Acknowledgements</title>
        <p>This work has been supported by the Klaus Tschira Stiftung gGmbH in the
context of the mi-Mappa project (http://www.dbis.rwth-aachen.de/mi-Mappa/,
project no. 00.263.2015). We thank our project partners from the Institute of
Applied Medical Engineering at the Helmholtz Institute of RWTH Aachen
University &amp; Hospital, especially Dr. Robert Farkas, for the fruitful discussions of
the approach and for providing the patent data.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>D.</given-names>
            <surname>Bonino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ciaramella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Corno</surname>
          </string-name>
          .
          <article-title>Review of the state-of-the-art in patent information and forthcoming evolutions in intelligent patent informatics</article-title>
          .
          <source>World Patent Information</source>
          ,
          <volume>32</volume>
          (
          <issue>1</issue>
          ):
          <fpage>30</fpage>
          <lpage>38</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. BVMed.
          <source>Branchenbericht Medizintechnologien</source>
          <year>2015</year>
          . www.bvmed.de/branchenbericht,
          <year>June 2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. Deutsche Gesellschaft für Biomed.
          <source>Technik im VDE</source>
          .
          <article-title>Empfehlungen zur Verbesserung der Innovationsrahmenbedingungen für Hochtechnologie-Medizin</article-title>
          .
          <source>Tech. rep., VDE</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Fall</surname>
          </string-name>
          , A. Törcsvári, K. Benzineb,
          <string-name>
            <given-names>G.</given-names>
            <surname>Karetka</surname>
          </string-name>
          .
          <article-title>Automated categorization in the international patent classification</article-title>
          .
          <source>In ACM SIGIR Forum</source>
          , pp.
          <fpage>10</fpage>
          <lpage>25</lpage>
          .
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>D.</given-names>
            <surname>Faria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Pesquita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. S.</given-names>
            <surname>Balasubramani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Martins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cardoso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Curado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Couto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. F.</given-names>
            <surname>Cruz</surname>
          </string-name>
          .
          <article-title>OAEI 2016 results of AML</article-title>
          .
          <source>In Proc. 11th Intl. Workshop on Ontology Matching</source>
          , pp.
          <fpage>138</fpage>
          <lpage>145</lpage>
          .
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>M.</given-names>
            <surname>Fernández-López</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gómez-Pérez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Juristo</surname>
          </string-name>
          .
          <article-title>Methontology: from ontological art towards ontological engineering</article-title>
          .
          <source>In Proc. Symposium on Ontological Engineering of AAAI</source>
          .
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>S.</given-names>
            <surname>Geisler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Quix</surname>
          </string-name>
          .
          <article-title>An Ontology-based Collaboration Recommender System using Patents</article-title>
          .
          <source>In Proc. Intl. Conf. on Knowledge Engineering and Ontology Development (KEOD)</source>
          , pp.
          <fpage>389</fpage>
          <lpage>394</lpage>
          . Lisbon, Portugal,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>R.</given-names>
            <surname>Hai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Geisler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Quix</surname>
          </string-name>
          .
          <article-title>Constance: An Intelligent Data Lake System</article-title>
          . In F. Özcan, G. Koutrika, S. Madden (eds.),
          <source>Proc. Intl. Conf. on Management of Data (SIGMOD)</source>
          , pp.
          <fpage>2097</fpage>
          <lpage>2100</lpage>
          . ACM, San Francisco, CA, USA,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>N.</given-names>
            <surname>Hamadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bukowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Schmitz-Rode</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Farkas</surname>
          </string-name>
          .
          <article-title>Cooperative Patent Classification as a mean of validation for Support Vector Machine Learning: Case Study in Biomedical Emerging Fields of Technology</article-title>
          .
          <source>In 51. Jahrestagung der Biomedizinischen Technik (BMT)</source>
          .
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Hirschberg</surname>
          </string-name>
          .
          <article-title>Algorithms for the longest common subsequence problem</article-title>
          .
          <source>Journal of the ACM (JACM)</source>
          ,
          <volume>24</volume>
          (
          <issue>4</issue>
          ):
          <fpage>664</fpage>
          <lpage>675</lpage>
          ,
          <year>1977</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>K.</given-names>
            <surname>Hornik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Grün</surname>
          </string-name>
          .
          <article-title>topicmodels: An R package for fitting topic models</article-title>
          .
          <source>Journal of Statistical Software</source>
          ,
          <volume>40</volume>
          (
          <issue>13</issue>
          ):
          <fpage>1</fpage>
          <lpage>30</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>K.-K.</given-names>
            <surname>Lai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-J.</given-names>
            <surname>Wu</surname>
          </string-name>
          .
          <article-title>Using the patent co-citation approach to establish a new patent classification system</article-title>
          .
          <source>Information Processing &amp; Mgmt.</source>
          ,
          <volume>41</volume>
          (
          <issue>2</issue>
          ):
          <fpage>313</fpage>
          <lpage>330</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>N. F.</given-names>
            <surname>Noy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. L.</given-names>
            <surname>McGuinness</surname>
          </string-name>
          .
          <article-title>Ontology Development 101: A Guide to Creating Your First Ontology</article-title>
          . Tutorial, Stanford University,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>J.</given-names>
            <surname>Portenoy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>West</surname>
          </string-name>
          .
          <article-title>Visualizing Scholarly Publications and Citations to Enhance Author Profiles</article-title>
          .
          <source>In Proc. WWW</source>
          , pp.
          <fpage>1279</fpage>
          <lpage>1282</lpage>
          . Perth, Australia,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15. C. Schlötelburg, C. Weiß,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hahn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Becks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Mühlbacher</surname>
          </string-name>
          .
          <article-title>Identifizierung von Innovationshürden in der Medizintechnik</article-title>
          .
          <source>Tech. rep., Bundesministeriums für Bildung und Forschung</source>
          ,
          <year>October 2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>A.</given-names>
            <surname>Segev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kantola</surname>
          </string-name>
          .
          <article-title>Patent Search Decision Support Service</article-title>
          .
          <source>In 7th Intl. Conf. on Information Technology: New Generations (ITNG)</source>
          , pp.
          <fpage>568</fpage>
          <lpage>573</lpage>
          .
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>P.</given-names>
            <surname>Shvaiko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Euzenat</surname>
          </string-name>
          .
          <article-title>A Survey of Schema-Based Matching Approaches</article-title>
          .
          <source>Journal on Data Semantics, IV:146-171</source>
          ,
          <year>2005</year>
          . LNCS 3730.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>P.</given-names>
            <surname>Shvaiko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Euzenat</surname>
          </string-name>
          .
          <article-title>Ontology Matching: State of the Art and Future Challenges</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          ,
          <volume>25</volume>
          (
          <issue>1</issue>
          ):
          <fpage>158</fpage>
          <lpage>176</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Suárez-Figueroa</surname>
          </string-name>
          .
          <article-title>NeOn Methodology for building ontology networks: specification, scheduling and reuse</article-title>
          .
          <source>Ph.D. thesis</source>
          , Univ. Politecnica de Madrid,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Trappey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. V.</given-names>
            <surname>Trappey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.-C.</given-names>
            <surname>Hsu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. W.</given-names>
            <surname>Hsiao</surname>
          </string-name>
          .
          <article-title>A fuzzy ontological knowledge document clustering methodology</article-title>
          .
          <source>IEEE Trans. on Systems, Man, and Cybernetics, Part B</source>
          ,
          <volume>39</volume>
          (
          <issue>3</issue>
          ):
          <fpage>806</fpage>
          <lpage>814</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>L.</given-names>
            <surname>Wanner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Baeza-Yates</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Brügmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Codina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Diallo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Escorsa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Giereth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kompatsiaris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Papadopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Pianta</surname>
          </string-name>
          , et al.
          <article-title>Towards content-oriented patent document processing</article-title>
          .
          <source>World Patent Information</source>
          ,
          <volume>30</volume>
          (
          <issue>1</issue>
          ):
          <fpage>21</fpage>
          <lpage>33</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>