=Paper= {{Paper |id=Vol-2032/om2017_Tpaper4 |storemode=property |title=Ontology matching for patent classification |pdfUrl=https://ceur-ws.org/Vol-2032/om2017_Tpaper4.pdf |volume=Vol-2032 |authors=Christoph Quix,Sandra Geisler,Rihan Hai,Sanchit Alekh |dblpUrl=https://dblp.org/rec/conf/semweb/QuixGHA17 }} ==Ontology matching for patent classification== https://ceur-ws.org/Vol-2032/om2017_Tpaper4.pdf
    Ontology Matching for Patent Classification
             Christoph Quix1,2 , Sandra Geisler2 , Rihan Hai1 , Sanchit Alekh1
    1
            Databases and Information Systems, RWTH Aachen University, Germany
        2
             Fraunhofer-Institute for Applied Information Technology FIT, Germany
                               1
                                 lastname@dbis.rwth-aachen.de
                          2
                            firstname.lastname@fit.fraunhofer.de



            Abstract. Interdisciplinary research and development projects in medical
            engineering benefit from well-selected collaboration partners. The
            process of finding such partners from often unfamiliar fields is difficult,
            but can be supported by an expert profile that is based on patent analysis
            and on classifying the patents into competence fields in medical engineering.
            Patent analysis and categorization are difficult and require the analysis
            of the semantic content. Hence, we propose a twofold approach using
            a large controlled vocabulary, a smaller competence field ontology, and
            an alignment between them to assign patents to a certain competence
            field. The approach has two parts: a Topic Map approach and a Publication
            approach. We evaluate these approaches and their components in
            several ways. Furthermore, we compare four different ways to assign a
            patent to a competence field and show that the semantic wealth of a
            large biomedical ontology is beneficial to the classification task.


1   Introduction
Ontology matching has been an active research area for more than 10 years
[17,18]. Ontologies are used to describe a domain of interest by concepts and
relationships between them, and to provide a formal description of these
relationships. Thus, although the aim of ontology matching seems to be the matching
of classes and properties, usually its actual intention is to match elements of the
domains described by the ontologies. An example for such a `domain matching'
task is patent classification, in which patents should be assigned to a class in a
classification [4].
    While a classification scheme or taxonomy can be easily represented as an
ontology, representing the content of a patent as an ontology or describing the
patent with elements of an ontology is more challenging. Patents have their own
specific language and use a terminology that is different from a typical research
publication. Patents are classified using the International Patent Classification
(IPC) system; however, this is too general for a detailed patent analysis [12].
On the other hand, patent data is available in a structured form (usually XML)
from patent offices, which simplifies the pre-processing and extraction of basic
information such as title, abstract, and authors. Furthermore, patents are often
also available in multiple languages; at least the bibliographic information and
abstract are available in English, which solves the problem of multi-lingual
documents.
    We are aiming at building a recommender system for research projects in
medical engineering (ME) [7] in the context of the mi-Mappa project3 . In ME,
researchers from several disciplines (e.g., biology, medicine, mechanical engineering,
computer science) work jointly on a research project. Furthermore, ME is a
highly innovative domain with short product cycles requiring a fast translation
of research results into applicable products [2]. While on the one hand a publication
list of a researcher provides a good basis for creating an author profile [14],
on the other hand a list of patents makes it possible to characterize the ability of a
researcher to develop inventions and market-ready products. Hence, we concentrate
mainly on the analysis of patents.
    To address the problem of patent terminology, we exploit explicit references
to scientific publications and their semantic annotations. In ME, most of the
publications appear in journals or conferences that are indexed by PubMed4 .
PubMed uses MeSH5 , a rich controlled vocabulary with a hierarchical structure,
to annotate the publications. Thus, to retrieve a MeSH annotation for a patent,
we look up the references to research articles in PubMed and retrieve the
corresponding MeSH terms.
    Using references to scientific publications is only one aspect of our approach
for patent classification. The overall approach, depicted in Figure 1, consists
of two complementary sub-approaches: the Topic Map Approach (TMA) and
the Publications Approach (PBA). Both approaches utilize two ontologies, a
competence field (CF) ontology and an ontology with comprehensive medical
knowledge (MeSH), and an alignment between them.
    For the Publication Approach, excerpts of publication databases, as well
as their associated MeSH terms, are imported into our Data Lake (DL) system
Constance [8]. The data lake can then be queried on the fly for publications cited
by the currently processed patent, as well as for the MeSH terms that are pertinent
to each of these publications. For the categorization of the input patent with
the TMA, the topic with the highest probability in the topic map (or multiple
topics if they have the same probability) is retrieved. Each term characterizing
the topic is compared with all concepts in the MeSH ontology, resulting in a set
of matching concepts.
    Thus, for both approaches, we have a list of related concepts from the MeSH
ontology. To establish a link to the competence field ontology, which we have
created to describe the innovation areas in medical engineering (see Section 2),
we use ontology matching.
    Several questions arise when we analyze the presented approach.
Creating an alignment between ontologies and using a huge medical ontology
in this context require substantial resources in terms of memory and CPU
power. Hence, we need to know whether this effort is worthwhile. Furthermore,
3
  http://www.dbis.rwth-aachen.de/mi-Mappa
4
  https://www.ncbi.nlm.nih.gov/pubmed/
5
  Medical Subject Headings, https://www.nlm.nih.gov/mesh/

              Fig. 1. The Overall Architecture (Topic Map Approach: topic modeling over a
              patent set yields the topic map; Publication Approach: the patent database and
              publication databases feed the Data Lake System; patent categorization uses the
              MeSH ontology, the Competence Field Ontology, and the alignment between
              them, produced by ontology matching, and outputs labeled patents)

it is of interest whether the quality and size of the alignment between the ontologies
have an impact on the results. A particular problem is rating the quality of the
alignment without a reference alignment. To answer these questions, we present
the following contributions in this paper:

 – We analyze and select medical ontologies to use them as a basis for the
   creation of the CF ontology and as a single point of entry to identify the
   semantics of patents and publications.
 – We describe the process of designing the competence field ontology and rate
   its quality based on approved methodologies.
 – We create different alignments between the CF ontology and the medical
   ontology with different matcher configurations and compare their quality.
 – We compare the results of four different approaches to categorize a patent:
   (1) Topic Map Approach with direct comparison of terms with concepts of
   the CF ontology (i.e., using no ontology matching techniques), (2) Publication
   Approach, (3) Topic Map Approach, (4) combination of Topic Map
   Approach and Publication Approach. Approaches (2) and (3) use the alignment
   computed by ontology matching.

    The rest of this paper is structured as follows. In Section 2 we explain the
design of the CF ontology. Furthermore, the selection process of the utilized
medical ontologies is explained (first results on these issues were reported in
[7]). In Section 3 we describe the approaches to establish a link between patents
and competence fields. In Section 4 the four approaches to categorize patents into
competence fields are evaluated. Finally, we discuss related work in Section 5
and conclude the paper in Section 6.
2   Modeling and Selection of Ontologies
Our assumption is that a huge medical ontology (or a set of them) and mappings
to a smaller competence field ontology (CFO) will help to classify patents
into competence fields more easily. The idea is somewhat similar to a smart
multi-level filter. First, we retrieve terms describing the content of a patent (either from
the topic map or the cited publications). These terms are compared to concept
names in a huge medical taxonomy using string similarity measures. The most
similar ones are selected, which results in a potentially long list of concepts.
Afterwards, we filter further and search for mappings from the concepts and
their predecessors to concepts of the smaller competence field ontology using
more intelligent matchers. This leads to scores which quantify the membership
confidence for the competence fields.
    To implement this approach, two things have to be done first: (1) we
have to model the competence field ontology, and (2) we need to evaluate and
select comprehensive medical ontologies. For the design of ontologies there exist
several acknowledged methodologies, such as METHONTOLOGY [6], TOVE,
or the work by Noy and McGuinness [13]. The NeOn methodology [19] is a more
recent approach which combines ideas of the former methods. The methodology
describes nine scenarios for building ontologies and ontology networks [19].
    To create the CFO, we started from the descriptions in [15,3] and also used an
extended description by ME domain experts. As the six competence fields are the
categories we want to assign to the patents, we use these (and only these) as first-level
concepts in the ontology. All further concepts are subconcepts of these.
This approach corresponds to the reusing and reengineering non-ontological resources
scenario of the NeOn methodology. To find subconcepts, we analyzed
the detailed description of the CFs by the domain experts. First, we extracted
a preliminary selection of 174 terms, which we used to make a first draft of a
preliminary ontology on which domain experts commented using a custom web
front end for the review of ontologies.
    In parallel, we searched for one or multiple large biomedical taxonomies. We
need these taxonomies for two things. First, we want to extend the basic CFO
created before with more terms to describe the competence fields in more detail.
Second, we need the large ontology as an entry point to find terms describing
the patents; with the alignment to the CFO we can then determine the corresponding
competence fields. This corresponds to the sixth scenario of the NeOn
methodology, namely reusing, merging and reengineering ontological resources.
The first step in this scenario is the ontological resource reuse process, starting
with the Ontology Search [19]. Hence, we searched for ontologies with domain-specific
search engines as described in [7]. We used the Bioportal6 search engine,
the Ontology Lookup Service7 , and the Ontobee8 search engine with the
preliminary list of terms to get a broad overview. Afterwards, we carried out
6
  http://bioportal.bioontology.org
7
  http://www.ebi.ac.uk/ontology-lookup
8
  http://www.ontobee.org
             Fig. 2. Coverage based on Combination of Ontologies (per competence field: Imaging Techniques, Prostheses & Implants, Telemedicine, Operative & Interventional Devices and Systems, In-Vitro Diagnostics, Special Therapies & Diagnosis Systems, and Complete; series: NCIT, NCIT + MeSH, NCIT + MeSH + SNOMEDCT, NCIT + MeSH + SNOMEDCT + RHMeSH)




the Ontology Assessment and Comparison steps [19]. The four most promising
ontologies found are the National Cancer Institute Thesaurus (NCIT), the
Systematized Nomenclature of Medicine - Clinical Terms (SNOMEDCT), MeSH,
and the Robert Hoehndorf Version of MeSH (RHMeSH). To identify whether they
satisfy our needs, we did a coverage analysis, where the coverage is the percentage
of the competence field terms present in each of the ontologies. No single
ontology covered all competence fields to a satisfying degree; some reached more
than 60% for one competence field but only about 20% for the other fields (e.g.,
NCIT covers `Imaging Techniques' well, but not the other fields).
    Hence, we decided to analyze the coverage by adding one ontology after
another to see the gain of adding further ontologies. We used the most promising
ontologies identified before and started with the NCI Thesaurus. Figure 2 shows
the results.
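The coverage measure used here can be sketched as a simple term-lookup ratio; this is a minimal sketch assuming case-insensitive exact matching of CF terms against ontology concept labels (the exact matching criterion is not specified in the text):

```python
def coverage(cf_terms, ontology_labels):
    """Fraction of competence-field terms found among ontology concept labels."""
    labels = {label.lower() for label in ontology_labels}
    hits = sum(1 for term in cf_terms if term.lower() in labels)
    return hits / len(cf_terms) if cf_terms else 0.0

def combined_coverage(cf_terms, *ontologies):
    """Coverage over the union of several ontologies' label sets.

    Adding ontologies can only keep coverage equal or increase it.
    """
    merged = set().union(*({label.lower() for label in o} for o in ontologies))
    return coverage(cf_terms, merged)
```

This makes the behavior seen in Figure 2 explicit: each added ontology enlarges the label set, so the per-field coverage curve is monotonically non-decreasing.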
    It can be noted that we gain about 10% coverage using all ontologies. The
biggest gain is achieved by adding the MeSH ontology. Thus, we decided to use
the NCIT and MeSH ontologies to extend the CFO, as this was a good compromise
between coverage and complexity. For the matching of the biomedical
ontology to the CFO, we first picked only one ontology to keep the computational
overhead at runtime low. If it does not give us satisfying results, we will
add more ontologies and also align them with the CFO. One possibility would
also be to use the UMLS, which is a superset of many medical ontologies, but it
is very large, which could lead to performance problems. For now, we selected
the Robert Hoehndorf MeSH9 as it has good coverage and is available in
OWL format.
    The next steps to develop the CFO are the ontology aligning and ontology
merging steps and the ontological resource engineering process. We proceeded in
these steps as follows. We took the extracted terms, the concepts found so far
in the coverage analysis, and the detailed description of the innovation fields,
and carried out an extended search in the MeSH Browser10 and the NCIT
Browser11 for these and related concepts. We analyzed the hierarchical structure of
each of the found concepts and decided for each concept whether to adopt it into
the CFO. Where applicable, we also adopted the inheritance relationships of
concepts. We extended and restructured the CFO in cycles, i.e., according to [19] we
did a re-conceptualization on different levels for the CFO and for the concepts
from the biomedical ontologies. For the upper levels of the CFO we designed
categories which fit better to our purposes of categorizing terms for medical
engineering. We used a mind-mapping technique and a bottom-up approach,
as described for example by Noy and McGuinness [13], to refine the design. As
an example, the Imaging Techniques concept and the concepts of the concept
Imaging_Technology (2nd level) are visualized in Figure 3.
 9
   https://bioportal.bioontology.org/ontologies/RH-MESH
10
   https://meshb.nlm.nih.gov/search

                      Fig. 3. The Imaging Technique Concept
    The ontology has been implemented in OWL using the NeOn toolkit12 . We
also evaluated the CFO in tests of the complete patent categorization process.
We noticed that the initial results were not satisfying because some competence
fields were not represented well in the CFO. Hence, we did a frequency analysis
of the MeSH terms from the Publications Approach. We made a ranked list of
MeSH concepts based on how often they had been searched for without leading
to matches in the CFO. Based on this list, we added more useful concepts to
the CFO (no trivial, overly general terms such as Human, but, for example, Gene
Expression Regulation ). The current CFO consists of 529 concepts and can be
downloaded at http://dbis.rwth-aachen.de/cms/projects/mi-mappa/CFO.owl.

3      Matching of Ontologies and Topic Maps
As explained above, we use three different basic approaches and one combined
approach to classify patents. Figure 4 gives an overview of the different
approaches.
11
   https://ncit.nci.nih.gov/ncitbrowser/
12
   http://neon-toolkit.org

                Fig. 4. The different approaches used for Evaluation

#1: TMD (Topic Map with Direct Mapping): In this approach, we match
   the terms extracted from the topic maps directly with the competence
   field ontology. This can be seen as a baseline, as it does not use a semantically
   rich ontology as an intermediate component, but only uses string matching to
   match terms and ontology elements.
#2: PBA (Publication-Based Approach): This approach uses the MeSH
   terms attached to publications which are referenced by a patent. Then, we
   use an alignment between the CFO and MeSH to compute a score for the
   relationship between a patent and a competence field.
#3: TMA (Topic Map Approach): Here, we also use topic mapping (as in
   approach #1) to create initial clusters of patents and extract terms occurring
   frequently in these clusters. These terms are then matched with the concepts
   of the MeSH ontology. Using the same alignment as in the second approach,
   a relationship to the CFO is established.
#4: COM (Combined Approach of #2 & #3): This is a combination of
   PBA and TMA, with an emphasis on the results of PBA.

    As the approaches TMD and TMA are based on topics, we first briefly explain
this part, before we present how we created the alignment between the CFO and
MeSH, and describe the publication-based approach.

3.1   Topic Mapping

A basic set of patents is used to build a topic map. First, the corpus of documents
is preprocessed (stemming, removing stop words, etc.) and a Document-Term
Matrix (DTM) is created. The matrix is input to a Latent Dirichlet Allocation
(LDA) algorithm with the Gibbs sampling algorithm for estimation and
variational expectation maximization [11]. The LDA determines a fixed number
of topics, each described by a fixed number of stemmed terms. Topics are assigned
to each patent in the basic patent set with a probability. The topic map and
the assignments are stored in a database.
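The topic-map construction can be illustrated with a toy collapsed Gibbs sampler for LDA. The actual implementation uses the R package topicmodels [11]; this self-contained Python sketch, with hypothetical hyperparameters alpha and beta, only shows the mechanics of assigning topics to documents:

```python
import random
from collections import defaultdict

def gibbs_lda(docs, n_topics, n_iter=100, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenized documents."""
    rng = random.Random(seed)
    vocab_size = len({w for d in docs for w in d})
    # random initial topic assignment for every token
    z = [[rng.randrange(n_topics) for _ in d] for d in docs]
    ndk = [[0] * n_topics for _ in docs]               # doc-topic counts
    nkw = [defaultdict(int) for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                                # tokens per topic
    for di, d in enumerate(docs):
        for wi, w in enumerate(d):
            k = z[di][wi]
            ndk[di][k] += 1; nkw[k][w] += 1; nk[k] += 1
    for _ in range(n_iter):
        for di, d in enumerate(docs):
            for wi, w in enumerate(d):
                k = z[di][wi]
                ndk[di][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                # full conditional p(z = t | rest), up to a constant
                weights = [(ndk[di][t] + alpha) * (nkw[t][w] + beta)
                           / (nk[t] + vocab_size * beta) for t in range(n_topics)]
                r = rng.random() * sum(weights)
                new_k = n_topics - 1
                for t, wt in enumerate(weights):
                    r -= wt
                    if r <= 0:
                        new_k = t
                        break
                z[di][wi] = new_k
                ndk[di][new_k] += 1; nkw[new_k][w] += 1; nk[new_k] += 1
    # per-document topic distributions (each row sums to 1)
    theta = [[(ndk[di][t] + alpha) / (len(d) + n_topics * alpha)
              for t in range(n_topics)] for di, d in enumerate(docs)]
    return theta, nkw
```

Each patent's row of theta gives the topic probabilities from which the most probable topic(s) are later taken in the TMA, and each topic's word counts in nkw yield its characterizing terms.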
    We evaluated different numbers of topics and different numbers of terms
extracted for each topic (e.g., 10, 30, 50, etc.). As the computational effort of the
subsequent steps increases with a higher number of topics and terms, we used 50 topics
and 50 terms for our evaluation in Section 4. As the TMD approach matches the
terms directly with the CFO, no further processing of the extracted terms is
done in this case. We just do a similarity calculation using a normalized Longest
Common Subsequence [10] algorithm. In our tests, we found that a threshold
value of 0.5 for the string similarity provides the best compromise.
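The normalized LCS similarity used for the direct matching can be sketched as follows; normalizing by the longer string is one common choice, and since [10] covers only the LCS computation itself, the normalization and the lower-casing should be treated as assumptions:

```python
def lcs_length(a: str, b: str) -> int:
    """Classic dynamic-programming longest-common-subsequence length."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a, 1):
        for j, cb in enumerate(b, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if ca == cb else max(dp[i-1][j], dp[i][j-1])
    return dp[len(a)][len(b)]

def lcs_similarity(a: str, b: str) -> float:
    """LCS length normalized by the longer string, yielding a value in [0, 1]."""
    if not a or not b:
        return 0.0
    return lcs_length(a.lower(), b.lower()) / max(len(a), len(b))

def term_matches_concept(term: str, concept: str, threshold: float = 0.5) -> bool:
    """Apply the 0.5 similarity threshold mentioned above."""
    return lcs_similarity(term, concept) >= threshold
```

For example, a stemmed term like "tomograph" reaches similarity 0.9 against the concept label "tomography" and passes the threshold, while unrelated term pairs fall well below 0.5.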
    For the categorization of the input patent with the TMA, the topic with the
highest probability in the topic map (or multiple topics if they have the same
probability) is retrieved. Each term characterizing the topic is compared with
all concepts in the medical ontology, resulting in a set of matching concepts. For
each of the concepts in the set, direct mappings and mappings of parent concepts
are collected from the alignment, and it is determined to which competence field
the matching concept in the CF ontology belongs. From the similarities, average
scores are calculated for each term and each competence field. Based on this, an
average score is calculated from all terms for the topic(s) of the patent. Hence,
for each patent we have a score for each of the competence fields and normalize
these scores such that they add up to 1.
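The score aggregation just described can be sketched as follows. This is a simplified reading: per-term similarity contributions are first averaged per competence field, then averaged over all terms, and finally normalized per patent; all names are illustrative:

```python
from collections import defaultdict

def per_term_scores(term_cf_sims):
    """term_cf_sims: {term: {cf: [similarities]}} -> average score per term and CF."""
    return {term: {cf: sum(sims) / len(sims) for cf, sims in cf_map.items()}
            for term, cf_map in term_cf_sims.items()}

def patent_scores(term_cf_sims):
    """Average the per-term CF scores over all terms, then normalize to sum to 1."""
    averaged = per_term_scores(term_cf_sims)
    totals = defaultdict(float)
    for cf_scores in averaged.values():
        for cf, score in cf_scores.items():
            totals[cf] += score / len(averaged)
    z = sum(totals.values())
    return {cf: score / z for cf, score in totals.items()} if z else {}
```

The resulting distribution over competence fields is what the thresholding and field assignment in the later sections operate on.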

3.2     Ontology Matching
To rate how strongly a patent or publication is related to a certain competence
field, we need to match the describing terms, extracted either from publications
or from the topic map, to terms describing the competence fields. In preparation
for this step, we create an alignment between the selected MeSH ontology and the
CFO. The alignment consists of a set of mappings between the concepts of the
two ontologies; i.e., for each mapping we have a pair of concepts and a
similarity value. As we do not try to re-invent the wheel, we used
AgreementMakerLight [5], as it has consistently produced good results in recent OAEI campaigns
and also performs well for large biomedical ontologies. AgreementMakerLight is
able to combine different matchers to create an alignment. We used the string
matcher, the word matcher, the structural matcher, the lexical matcher, the
cardinality filter, and the coherence filter. As a similarity threshold we used a value
of 0.6. The matchers have been combined in a hierarchical way, and the default
settings for each matcher have been used.
    Currently, we are also testing other settings and their impact on the quality
of the patent classification results. First experiments show that slightly relaxed filter
settings (e.g., not using a cardinality filter) increase the number of mappings
and therefore also improve the classification result.
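The alignment can be thought of as a set of (source concept, target concept, similarity) triples; collecting the direct mappings and the mappings of ancestor concepts, as done when scoring a matched MeSH concept, might then look like this sketch (the data structures are illustrative and not AgreementMakerLight's API):

```python
def mappings_for(concept, parents, alignment, threshold=0.6):
    """Collect alignment mappings for a MeSH concept and all its ancestors.

    parents:   {concept: parent_concept} hierarchy (None at the root)
    alignment: {mesh_concept: [(cfo_concept, similarity), ...]}
    Returns a list of (mesh_concept, cfo_concept, similarity) triples whose
    similarity reaches the threshold.
    """
    found = []
    node = concept
    while node is not None:
        for target, sim in alignment.get(node, []):
            if sim >= threshold:
                found.append((node, target, sim))
        node = parents.get(node)
    return found
```

Walking up the hierarchy is what lets a specific MeSH concept without a direct CFO mapping still contribute a score through a mapped ancestor.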

3.3     Publication-based Approach
We queried the web service of EPMC13 to retrieve the metadata of the papers
referenced in our patent dataset. To extract the references from the patent
data, we use a pattern-based approach similar to the FreeCite citation parser14.
Fortunately, the patent data is semi-structured such that the citations can be clearly
identified. Nevertheless, for a large fraction of the patents, we are not able to
retrieve MeSH terms from referenced publications (either because the referenced
publication does not appear in PubMed or the citation is incorrect).
13
   European PubMed Central, https://europepmc.org/
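Retrieving MeSH terms for a cited paper via the EPMC REST search service might be sketched as below; the query syntax follows the public EPMC web service, but the exact JSON field names (resultList, meshHeadingList, descriptorName) are assumptions based on the core result format and should be verified against the service documentation:

```python
import urllib.parse

EPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"

def build_search_url(title: str) -> str:
    """Build an EPMC title search URL; resultType=core includes MeSH headings."""
    params = {"query": f'TITLE:"{title}"', "format": "json", "resultType": "core"}
    return EPMC_SEARCH + "?" + urllib.parse.urlencode(params)

def extract_mesh_terms(response: dict) -> list:
    """Pull MeSH descriptor names out of a parsed EPMC core-result payload."""
    terms = []
    for hit in response.get("resultList", {}).get("result", []):
        for heading in hit.get("meshHeadingList", {}).get("meshHeading", []):
            name = heading.get("descriptorName")
            if name:
                terms.append(name)
    return terms
```

The extracted descriptor names are exactly the list of MeSH terms per patent that the matching step in Section 3.2 consumes.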
    The retrieved metadata for each referenced publication is then stored in
our Data Lake system Constance [8] from which it is accessed during patent
processing.
    Subsequently, we use a process which is similar to the TMA. In both cases,
we have a list of MeSH terms as input. For each of the terms in the list, the
mappings are determined as before, and average scores per competence field are
calculated and normalized for each patent.

3.4      Combined Approach
In the combined approach (COM), if both TMA and PBA deliver results,
the results are combined and overall scores for each competence field
are determined. In all cases, we assign at most three competence fields to a
patent. In most cases, only one competence field is assigned to a patent, as the
other competence fields do not exceed a certain threshold. Thus, we take the
intersection of the competence fields computed by TMA and PBA. If it is not
empty, we take this result (because both approaches agree on it). If
the intersection is empty, we take the competence fields with the highest scores
from TMA and PBA.
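The combination rule can be sketched as follows. This is a simplified reading of the description above: fields above a threshold form each approach's candidate set, the intersection wins if non-empty, and the fallback ranking weights PBA higher to model the stated emphasis (the threshold and the weighting are illustrative assumptions):

```python
def combine(tma_scores, pba_scores, threshold=0.2, max_fields=3):
    """Combine normalized TMA and PBA competence-field scores for one patent."""
    tma_top = {cf for cf, s in tma_scores.items() if s >= threshold}
    pba_top = {cf for cf, s in pba_scores.items() if s >= threshold}
    common = tma_top & pba_top
    if common:
        # both approaches agree: keep the intersection, best combined score first
        ranked = sorted(common,
                        key=lambda cf: -(tma_scores[cf] + pba_scores[cf]))
    else:
        # fall back to the highest-scoring fields, emphasizing PBA
        ranked = sorted(pba_top | tma_top,
                        key=lambda cf: -(pba_scores.get(cf, 0.0)
                                         + 0.5 * tma_scores.get(cf, 0.0)))
    return ranked[:max_fields]
```

The cap at three fields mirrors the rule that at most three competence fields are assigned per patent.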

4      Evaluation
In our experimental setup, we compare the aforementioned approaches. For the
analysis of patents, we need a comprehensive data basis with high data quality.
In the course of the mi-Mappa project, a subset of the PATSTAT database (2016
Spring edition, version 5.07) published by the European Patent Office (EPO) is
used. For our purposes, we selected patents issued by a German (DE) or British
(UK) authority after 2004, which are from the medical domain (CPC class A61),
and which have an English abstract and title. This results in a set of 26,814
patents. For about 4,500 patents of this set, we are able to retrieve MeSH terms
for the referenced publications. From this set, we randomly selected 59 patents
for a manual assignment to competence fields to evaluate our approaches. A
more extensive expert evaluation is currently being set up. In addition, we plan
to compare our approach with the results of our project partners, who apply a
supervised learning approach using Support Vector Machines [9].
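Since each patent can receive up to three competence fields, the evaluation against the manually labeled gold standard is a multi-label comparison; a micro-averaged precision/recall/f-measure computation, as one plausible reading of the setup (the paper does not state the exact averaging scheme), looks like this:

```python
def micro_prf(predictions, gold):
    """Micro-averaged precision, recall, and F1.

    predictions, gold: {patent_id: set of competence fields}
    """
    tp = fp = fn = 0
    for pid, pred in predictions.items():
        truth = gold.get(pid, set())
        tp += len(pred & truth)   # correctly assigned fields
        fp += len(pred - truth)   # spuriously assigned fields
        fn += len(truth - pred)   # missed fields
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Micro-averaging pools the counts over all 59 gold-labeled patents before computing the ratios, so patents with several assigned fields weigh proportionally more.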
    For TMA, we experimented with various configurations for the number of
topics and their associated terms. We observed that with a relatively small number
of topics and terms, e.g., 10 or 20, the terms are extremely broad and do
not provide meaningful matches with the MeSH ontology or the CFO. Therefore,
based on these results, we chose both the number of topics and the number of
terms to be 50 for our default test configuration.
14
   http://freecite.library.brown.edu/

Fig. 5. Comparison of the precision, recall and f-measure for the different approaches
    Fig. 5 summarizes the findings from our experiments for the aforementioned
approaches. It is obvious that all three of our proposed approaches #2, #3,
and #4 perform significantly better than the baseline approach #1. All three
evaluation measures, i.e., precision, recall, and f-measure, are worse for the
baseline approach. In contrast, when the MeSH ontology is used for matching the
ontology terms (#3 ), the precision and f-score are 0.375 and 0.38, respectively,
which is more than double the corresponding values produced by
#1. The PBA performs even better, resulting in precision, recall, and f-score
values of 0.46, 0.47 and 0.44, respectively. However, the combined approach #4
significantly outperforms all the others and results in precision, recall, and f-measure
values of 0.53, 0.55 and 0.53, respectively. Indeed, we found that in the
case of the TMD approach #1, there were many erroneous matches, which
led to non-distinctive results for the CFO assignment. These results affirm the
superiority of techniques which use a comprehensive biomedical ontology and
ontology matching for patent classification tasks.

5   Related Work
There are only few works that apply ontology matching in the context of patent
analysis. Semantic similarities (based on ontology matching) and case-based
reasoning have been applied in the design of invention processes which use patent
analysis to study related works. Patent analysis using ontologies has been
applied especially for patent search [1]. A patent search request can be represented
as an ontology or as a set of concepts of an existing ontology, which is
then matched with the ontologies representing the knowledge of patents [16].
Another example is the PatExpert system, which uses a network of ontologies
and knowledge bases to enable patent search, classification, and clustering [21].
Trappey et al. propose a system that calculates the conditional probability that,
given a specific text chunk is present in the document, the chunk is mapped to
a specific concept of a given ontology [20]. Patent similarity is then based on
the number of common matched concepts. This approach restricts the clustering
to the terms of the ontology, which might lead to missing important terms not
present in the ontology.

6   Conclusion
Patent analysis is a complex topic, as patents use their own language and terminology.
Even for humans used to research publications, patents are difficult to
understand. Thus, typical approaches for classifying patents might fail.
    In this paper, we investigated an ontology-based approach to assign patents
to competence fields in medical engineering. We developed two different approaches
and a combined approach that are based on a large biomedical ontology, its
alignment to the competence field ontology designed by us, and other ontology
matching techniques. We have shown that these more elaborate approaches
outperform an approach that directly matches terms of patents with the competence
field ontology.
    However, the overall f-measure of about 55% for the combined approach is not
yet satisfying. One problem is the small set of patents with manually assigned
competence fields that we can use as a ground truth. This will be extended with
a larger expert evaluation in which patents will be classified by several experts.
Even humans might disagree on the assignment of a patent to a competence
field; therefore, we will have multiple expert opinions for one patent. We will
also work on fine-tuning and optimizing our approach. So far, we have focused on
the quality of the results and did not worry too much about performance. Still,
we think that patent classification is an interesting field which could
benefit more from the results in ontology matching.

Acknowledgements
This work has been supported by the Klaus Tschira Stiftung gGmbH in the
context of the mi-Mappa project (http://www.dbis.rwth-aachen.de/mi-Mappa/,
project no. 00.263.2015). We thank our project partners from the Institute of
Applied Medical Engineering at the Helmholtz Institute of RWTH Aachen
University & Hospital, especially Dr. Robert Farkas, for the fruitful discussions of
the approach and for providing the patent data.

References
 1. D. Bonino, A. Ciaramella, F. Corno. Review of the state-of-the-art in patent
    information and forthcoming evolutions in intelligent patent informatics. World
    Patent Information, 32(1):30–38, 2010.
 2. BVMed. Branchenbericht Medizintechnologien 2015. www.bvmed.de/branchenbericht,
    June 2015.
 3. Deutsche Gesellschaft für Biomed. Technik im VDE. Empfehlungen zur Verbesserung
    der Innovationsrahmenbedingungen für Hochtechnologie-Medizin. Tech. rep.,
    VDE, 2012.
 4. C. J. Fall, A. Törcsvári, K. Benzineb, G. Karetka. Automated categorization in
    the international patent classification. In ACM SIGIR Forum, pp. 10–25. 2003.
 5. D. Faria, C. Pesquita, B. S. Balasubramani, C. Martins, J. Cardoso, H. Curado,
    F. M. Couto, I. F. Cruz. OAEI 2016 results of AML. In Proc. 11th Intl. Workshop
    on Ontology Matching, pp. 138–145. 2016.
 6. M. Fernández-López, A. Gómez-Pérez, N. Juristo. Methontology: from ontological
    art towards ontological engineering. In Proc. Symposium on Ontological Engineering
    of AAAI. 1997.
 7. S. Geisler, R. Hai, C. Quix. An Ontology-based Collaboration Recommender System
    using Patents. In Proc. Intl. Conf. on Knowledge Engineering and Ontology
    Development (KEOD), pp. 389–394. Lisbon, Portugal, 2015.
 8. R. Hai, S. Geisler, C. Quix. Constance: An Intelligent Data Lake System. In
    F. Özcan, G. Koutrika, S. Madden (eds.), Proc. Intl. Conf. on Management of
    Data (SIGMOD), pp. 2097–2100. ACM, San Francisco, CA, USA, 2016.
 9. N. Hamadeh, M. Bukowski, T. Schmitz-Rode, R. Farkas. Cooperative Patent Classification
    as a mean of validation for Support Vector Machine Learning: Case Study
    in Biomedical Emerging Fields of Technology. In 51. Jahrestagung der Biomedizinischen
    Technik (BMT). 2017.
10. D. S. Hirschberg. Algorithms for the longest common subsequence problem. Journal
    of the ACM (JACM), 24(4):664–675, 1977.
11. K. Hornik, B. Grün. topicmodels: An R package for fitting topic models. Journal
    of Statistical Software, 40(13):1–30, 2011.
12. K.-K. Lai, S.-J. Wu. Using the patent co-citation approach to establish a new patent
    classification system. Information Processing & Mgmt., 41(2):313–330, 2005.
13. N. F. Noy, D. L. McGuinness. Ontology Development 101: A Guide to Creating
    Your First Ontology. Tutorial, Stanford University, 2001.
14. J. Portenoy, J. D. West. Visualizing Scholarly Publications and Citations to Enhance
    Author Profiles. In Proc. WWW, pp. 1279–1282. Perth, Australia, 2017.
15. C. Schlötelburg, C. Weiß, P. Hahn, T. Becks, A. C. Mühlbacher. Identifizierung
    von Innovationshürden in der Medizintechnik. Tech. rep., Bundesministerium für
    Bildung und Forschung, October 2008.
16. A. Segev, J. Kantola. Patent Search Decision Support Service. In 7th Intl. Conf.
    on Information Technology: New Generations (ITNG), pp. 568–573. 2010.
17. P. Shvaiko, J. Euzenat. A Survey of Schema-Based Matching Approaches. Journal
    on Data Semantics, IV:146–171, 2005. LNCS 3730.
18. P. Shvaiko, J. Euzenat. Ontology Matching: State of the Art and Future Challenges.
    IEEE Transactions on Knowledge and Data Engineering, 25(1):158–176, 2013.
19. M. C. Suárez-Figueroa. NeOn Methodology for building ontology networks: specification,
    scheduling and reuse. Ph.D. thesis, Univ. Politecnica de Madrid, 2010.
20. A. J. Trappey, C. V. Trappey, F.-C. Hsu, D. W. Hsiao. A fuzzy ontological knowledge
    document clustering methodology. IEEE Trans. on Systems, Man, and
    Cybernetics, Part B, 39(3):806–814, 2009.
21. L. Wanner, R. Baeza-Yates, S. Brügmann, J. Codina, B. Diallo, E. Escorsa, M. Giereth,
    Y. Kompatsiaris, S. Papadopoulos, E. Pianta, et al. Towards content-oriented
    patent document processing. World Patent Information, 30(1):21–33, 2008.