<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Learning from User Feedback for Ontology-based Information Extraction</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>German Aerospace Center, Institute of Data Science</institution>
          ,
          <addr-line>Malzerstra e 3, 07745 Jena</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Many engineering projects involve the integration of various hardware parts from different suppliers. In preparation, the parts that are best suited for the project requirements have to be selected. Information on these parts' characteristics is published in so-called data sheets, which are usually only available in textual form, e.g., as PDF files. To enable automated processing, these characteristics have to be extracted into a machine-interpretable format. Such a process requires a lot of manual intervention and is prone to errors. Domain ontologies, among other approaches, can be used to implement automated information extraction from the data sheets. However, ontologies rely solely on the experiences and perspectives of their creators at the time of creation. To automate the evolution of ontologies, we developed ConTrOn - Continuously Trained Ontology - that automatically extracts information from data sheets to augment an ontology created by domain experts. The evaluation results of ConTrOn show that the enriched ontology can help improve the information extraction from technical documents. Nonetheless, the extracted information should be reviewed by experts before it is used in the integration process. We want to provide an intuitive way of reviewing, in which the extracted information is highlighted on the data sheets. The experts will be able to accept, reject, or correct the extracted data via a graphical interface. This process of revision and correction can be leveraged by the system to improve itself: learning from its own mistakes and identifying common patterns to adapt in the next extraction iteration. This paper presents ideas on how to use machine learning based on user feedback to improve the information extraction process.</p>
      </abstract>
      <kwd-group>
        <kwd>Ontology-based information extraction</kwd>
        <kwd>Pattern recognition</kwd>
        <kwd>Machine learning</kwd>
        <kwd>Knowledge representation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The emergence of Industry 4.0<sup>1</sup> triggered an automation process in engineering
projects from development to production, sometimes including customer
feedback. Such digitalized processes demand the automatic exchange of
machine-interpretable data. Meanwhile, component parts are described in data
sheets provided in textual format, such as PDF files. This forces engineers to
manually extract the data required by engineering applications, which is not only
time- and energy-consuming, but also error-prone. Here, automated extraction of
this information can relieve engineers of such tedious tasks and enable them to focus
on the actual product design.</p>
      <p>
        To realize a machine-interpretable description of the parts' models, we represent
the description as ontologies. An ontology, as defined by Noy and McGuinness
[
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], is a machine-interpretable definition of basic concepts in a specific domain
and relations between them. Its prime use case is information sharing and exchange.
Since ontologies provide formal specifications of concepts, they can be used to
guide the information extraction process. However, most ontologies were created
based on a human's personal experience and perspective at some point in time
and thus can be biased or become outdated. Moreover, during the
ontology-based information extraction from domain-specific data sheets, new concepts and
relations that the ontologies do not cover might appear. Hence, to represent a
more complete view of the domain, ontologies constantly need to be augmented
with new concepts, relations, or labels for existing concepts. These enriched
ontologies in turn improve the information extraction process and allow the
discovery of more information from the unstructured text.
      </p>
      <p>
        We developed ConTrOn (Continuously Trained Ontology), a system that
automatically extends ontologies with information extracted from data sheets and
knowledge bases [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. Based on classes defined in an initial ontology, ConTrOn
extracts textual information from data sheets. Meanwhile, guided by ontology
classes, ConTrOn retrieves semantic knowledge from external data sources, i.e.
WordNet [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and Wikidata [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], to enrich the incomplete classes. The initial
ontology is then augmented with the concepts retrieved from those external
knowledge bases. The process can be executed as soon as new data sheets are
available to automatically enrich the ontology over time.
      </p>
      <p>According to the evaluation results from our first prototype, when compared
to keyword-based information extraction, ontologies provide more relevant
concepts, including subclasses and superclasses, and thus increase the amount of
discovered information. Nevertheless, the information automatically extracted
by our approach still requires human revision before it is archived in a database.
During the review process, a human can identify mistakes and correct them.
Patterns of mistakes and corrections can then be analyzed using Natural Language
Processing (NLP) and Machine Learning (ML) techniques. The results of this
analysis can form a model to improve the information extraction process further.</p>
      <p><sup>1</sup> https://www.plattform-i40.de/I40/Navigation/EN</p>
      <p>In this paper, we present our vision to improve ConTrOn with ML techniques
based on user feedback processes. The related work and techniques are
reviewed in the next section. In Section 3, we elaborate on ConTrOn's workflow,
and in Section 4, we present an approach to improve it. Finally, the conclusion of this paper and
ideas for future work are described in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>In this paper, we focus on the improvement of ConTrOn using ML and NLP
techniques. First, we review the existing work on Ontology-Based Information
Extraction (OBIE), which is one of ConTrOn's applications. Then, we elaborate
on promising approaches for learning key-value patterns from unstructured text.</p>
      <sec id="sec-2-1">
        <title>Ontology-Based Information Extraction</title>
        <p>
          Baclawski et al. [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] summarized the current research tracks that combine ML,
information extraction, and ontology techniques to solve complex problems,
such as OBIE. OBIE, as described by Wimalasuriya and Dou [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ], is a system
that processes unstructured or semi-structured text to extract certain types of
information guided by ontologies and present the output as instances of those
ontologies. The extracted information from an OBIE system is used not only to
populate and enrich ontologies, but also to improve NLP workflows.
        </p>
        <p>
          Maynard et al. [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] described NLP techniques for ontology population using
an OBIE. XONTO [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] is an OBIE system for the semantic extraction of data
from PDF documents guided by ontologies. In contrast, Dal and Maria [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]
suggested an ontology creation method using ML and external knowledge. They
extract concepts from documents using latent semantic analysis and clustering
techniques. Meanwhile, properties, axioms, and restrictions are retrieved from
WordNet.
        </p>
        <p>
          Barkschat [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] proposed an OBIE workflow that exploits technical data sheets
to populate ontologies using a classifier model and regular expressions. Likewise,
Smart-dog [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] extracts data from data sheets of spacecraft parts to populate an
ontology. It features ontology enrichment, but relies on domain experts.
Meanwhile, Rizvi et al. [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] included irrelevant terms and probably-relevant terms in
their ontology so that they can calculate the confidence score of the extracted
information.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Key-Value Patterns Extraction</title>
        <p>
          The dominant technique for extracting key-value pairs from unstructured text
is to use regular expressions. ReLIE [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] presented an automatic approach to learning
regular expressions from text in web pages and emails. However, it
requires a manually written regular expression to start the learning process. Fully
automatic generation of regular expressions is addressed by Brauer et al. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. They
used different features, namely word-level and character-level features, to form
regular expressions that are easily understandable and configurable by experts.
        </p>
        <p>
          DeepDive [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] is a knowledge-base construction system that performs
deep NLP to extract entities and relationships from web pages and ontologies.
The extraction of entities is done using the external knowledge base, Freebase
(later Wikidata). To extract relationships between two entities, an SQL script is
needed. However, the extraction of entities and corresponding numeric literals
is not addressed.
        </p>
        <p>
          Chakraborty et al. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] proposed unsupervised (graph-based) and supervised
(conditional random field-based) algorithms for extracting key-value pair data
from advertisements. The unstructured advertising text is similar to data sheets
in that both lack inherent grammar and a well-defined dictionary.
        </p>
        <p>
          Machine learning techniques have been used by many studies on text
processing such as XSYSTEM [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] and a study by Wang et al. [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]. XSYSTEM
extracts text patterns from structured text, i.e., text from databases. It is an
automated technique for extracting text patterns by incrementally learning from
different text features. Wang et al. focus on a text classification task by using
Deep Convolutional Neural Networks combined with NLP techniques.
        </p>
        <p>
          Recently, the combination of regular expressions and machine learning
approaches has been studied, e.g., by Locascio et al. [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] and Luo et al. [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. Locascio et
al. use a Recurrent Neural Network to generate regular expressions from text.
They also generate synthetic descriptions for the generated regular expressions.
However, the descriptions still require human effort to rephrase them into more
natural descriptions. Luo et al. address the question-answering task by using
regular expressions combined with neural networks. They did not specify the
source of the regular expressions, but their application is used to extract key-value
pairs from unstructured text.
        </p>
        <p>
          Another method to extract key-value patterns is to use Entity Matching
(EM). EM takes two collections of text as inputs, then matches the entities that
refer to the same concept, e.g., "Big Apple" and "New York". Mudgal et al. [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]
presented Deep Learning (DL) solutions for EM. Their results show that DL
solutions outperform state-of-the-art learning-based EM solutions like Magellan
[
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] on textual data at the cost of training time. Although DL solutions became
popular recently, they still depend on human supervision, at least in the training
phase, as Doan et al. [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] pointed out in their report.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>ConTrOn Overview</title>
      <p>ConTrOn offers a solution to extract information from data sheets guided by
ontologies. In the process, the used ontologies are continuously enriched with
information from external semantic knowledge bases, thus adapting the
foundation of the extraction process to unforeseen terminologies. Figure 1 gives an
overview of ConTrOn's architecture. The remainder of this section will give an
overview of its modules and their relations.</p>
      <p>[Figure 1: Overview of ConTrOn's architecture. Labels recovered from the figure: System Design Tools (e.g., CATIA) with API Plugin; ConTrOn, with the Domain Knowledge Extractor (DKE), Ontology Enricher (OE), Information Extractor (IE), and Key-Value Pattern Learner (KPL); PDF Data Sheets; Ontologies; External Semantic Knowledge Base (e.g., Wikidata); Lexical Database (e.g., WordNet); Parts Database; Domain Expert. Labeled data flows: Domain Representing Concepts, Class Names, Augmented Entities, Extraction Patterns, Extracted Values from Data Sheets, Highlight Detected Values, Highlighted PDF Data Sheets, Review Extracted Information, Corrected Extracted Information.]</p>
      <sec id="sec-3-21">
        <p>
          Domain Knowledge Extractor (DKE). The DKE extracts all terms from
all data sheets that might represent concepts and ranks them according to their
TF-IDF<sup>2</sup> score. Subsequently, the terms are mapped to concepts whenever
possible, employing WordNet [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] for disambiguation. Finally, high-ranked concepts
are considered domain-representing concepts and are returned alongside their
WordNet definitions.
        </p>
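        <p>The ranking step of the DKE can be illustrated with a minimal sketch. It assumes a plain TF-IDF variant (relative term frequency multiplied by log(N/df)), whitespace tokenization, and scoring each term by its highest per-document value; the function name rank_terms and the toy data sheets are hypothetical, and the actual DKE may tokenize and weight terms differently.</p>
        <preformat><![CDATA[
```python
import math
from collections import Counter


def rank_terms(documents: list[str]) -> list[tuple[str, float]]:
    """Rank terms by their highest TF-IDF score across all documents."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    best = {}
    for tokens in tokenized:
        if not tokens:
            continue  # skip empty documents to avoid division by zero
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)
            idf = math.log(n_docs / doc_freq[term])
            best[term] = max(best.get(term, 0.0), tf * idf)

    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


# Toy data sheets (made up for illustration).
sheets = [
    "star tracker mass 2 kg star catalog",
    "reaction wheel mass 5 kg",
    "sun sensor mass 1 kg",
]
ranking = rank_terms(sheets)
```
]]></preformat>
        <p>In this toy corpus, terms such as "mass" and "kg" occur in every sheet and therefore receive a score of zero, while "star" ranks first because it is frequent in one sheet and absent from the others.</p>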
        <p>Ontology Enricher (OE). Classes in the ontologies may lack a description,
relations to other concepts, and alternative names. The OE retrieves the missing
information from external, semantic knowledge bases like Wikidata. For this,
it will match entities from the local ontologies to their counterparts in those
knowledge bases.</p>
        <p>
          If multiple candidate entities are found, their descriptions, including the
terms extracted by the DKE, are represented as vectors using the Doc2Vec [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] algorithm.
Using a Vector Space Model (VSM) and cosine similarity, the OE then selects as the
match the candidate that is most similar to a vector representing the terms
extracted by the DKE.
        </p>
        <p><sup>2</sup> Term Frequency-Inverse Document Frequency</p>
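        <p>The candidate selection can be sketched as follows. To remain self-contained, this sketch replaces Doc2Vec with simple bag-of-words count vectors, so only the cosine-similarity selection mirrors the description above. The function names, candidate identifiers, and descriptions are made up for illustration and are not taken from Wikidata.</p>
        <preformat><![CDATA[
```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pick_candidate(dke_terms: list[str], candidates: dict[str, str]) -> str:
    """Return the ID of the candidate whose description best matches the DKE terms."""
    query = Counter(term.lower() for term in dke_terms)
    scores = {
        cand_id: cosine_similarity(query, Counter(description.lower().split()))
        for cand_id, description in candidates.items()
    }
    return max(scores, key=scores.get)


# Two made-up candidates for an ambiguous label.
candidates = {
    "candidate-1": "optical device that measures the position of a star for spacecraft attitude",
    "candidate-2": "person who follows the career of a film star",
}
best = pick_candidate(["spacecraft", "attitude", "star", "sensor"], candidates)
```
]]></preformat>
        <p>Here the first candidate shares three terms with the query vector and the second only one, so the first is selected.</p>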
        <p>If no matching entity is found, OE retrieves synonyms and relevant terms of
the original terms from WordNet. These new terms are then used to retrieve a
new set of candidates from Wikidata, and the entity selection process is repeated.</p>
        <p>Information Extractor (IE). Using labels, alternative labels, and
synonyms obtained from the DKE and OE as keys, the IE scans the data sheets for
associated values. Here, the assumption is that a value is most likely preceded by
the respective term, as in "temperature 40 C" or "Output data: MIL1553B".
If no value can be found for a term this way, sentence or list patterns are applied
to widen the search scope.</p>
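        <p>The assumption that a value directly follows its key can be sketched with a regular expression built from the known labels. This is a simplified illustration: it assumes that the key is optionally followed by a colon and that the value extends to the end of the line. The function name find_values is hypothetical, and the fallback to sentence or list patterns is not covered.</p>
        <preformat><![CDATA[
```python
import re


def find_values(text: str, keys: list[str]) -> dict[str, str]:
    """Map each key to the text that directly follows its first match on the same line."""
    found = {}
    for key in keys:
        # Key, optional colon, then the rest of the line as the value.
        pattern = re.compile(
            r"\b" + re.escape(key) + r"\b[ \t]*:?[ \t]*(?P<value>\S[^\n]*)",
            re.IGNORECASE,
        )
        match = pattern.search(text)
        if match:
            found[key] = match.group("value").strip()
    return found


sheet = "Temperature 40 C\nOutput data: MIL1553B\nMass\n"
values = find_values(sheet, ["temperature", "output data", "mass", "lifetime"])
```
]]></preformat>
        <p>Keys without a value on the same line ("mass") and keys that do not occur at all ("lifetime") are left out of the result, which is where a wider search scope would be needed.</p>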
        <p>After the scan, all discovered terms and values are highlighted within the data
sheet and are annotated with a reason for the highlighting, such as "The highlighted
text (Life span: 5 Years) is corresponding to the Lifetime property".</p>
        <p>
          This base system consisting of DKE, OE, and IE was previously implemented,
integrated, and evaluated in [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. The proposed addition of a Key-Value Pattern
Learner (KPL) will be described in the following section.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Key-Value Pattern Learner (KPL)</title>
      <p>Based on the evaluation result of the aforementioned modules, the IE process can
be improved further if we involve domain experts in providing feedback on the
extracted concepts and their values. These experts are presented with data sheets
including the highlighted pieces of extracted information as shown in Figure 2.
They are then able to accept, reject, or edit each occurrence individually.</p>
      <p>Consider the example of an annotated data sheet in Figure 2(a). Here,
ConTrOn identified the phrase "(&gt; 20,000)" as the value for the term "star catalog".
As this is incorrect, reviewers can intervene in one of several ways: the
annotation can be removed, or the annotation can be replaced by a fixed value like
the string "available" or a boolean "true". Furthermore, there is the option to
preserve the original value phrase as a remark to this entry.</p>
      <p>
        Some manufacturers also use different terms for an entity; for example, the property
"Mass" in Figure 2(b) is sometimes referred to as "Weight". In the space systems
domain, these two terms differ because weight depends on the gravitational field. However, we can
use ML techniques to solve the entity linking problem as suggested by Mudgal
et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>In Figure 2(c), ConTrOn missed highlighting a fact. Reviewers can now
manually add this entry by highlighting the respective phrases and annotating them
with the corresponding concepts.</p>
      <p>If reviewers adjust the extracted information in any way, the Key-Value
Pattern Learner (KPL) will analyze the change. Rejected or edited entries are
passed through a part-of-speech (POS) tagger to identify a syntactic pattern.
For the example of the rejected phrase "(&gt; 20,000)", this would return a pattern
of bracket+symbol+number+noun+bracket. This can be interpreted as: 1) text
that is surrounded by brackets should be considered a remark rather than a
value, and 2) a value for the keyword "star catalog" should not have a tag sequence that
contains number+noun. Both interpretations will be translated into two regular
expressions, which will be fed into the IE. The results obtained from the IE will then be
used to decide which regular expression, or the combination of both, yields
the most accurate result.</p>
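      <p>How a rejected phrase might be turned into a tag pattern can be sketched as follows. A small hand-written token classifier stands in for a real POS tagger here, its tag names are chosen to match the pattern notation used above, and the example phrase with a trailing noun is made up for illustration. Only the first interpretation (bracketed text is a remark) is implemented; the names tag_pattern and is_remark are hypothetical.</p>
      <preformat><![CDATA[
```python
import re

# Coarse token classes, checked in order; a stand-in for real POS tags.
TOKEN_CLASSES = [
    ("bracket", r"[()\[\]]"),
    ("number", r"\d+(?:[.,]\d+)*"),
    ("noun", r"[A-Za-z]+"),
    ("symbol", r"[^\sA-Za-z\d]"),
]
TOKENIZER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_CLASSES))


def tag_pattern(phrase: str) -> str:
    """Return the coarse tag sequence of a phrase, joined by '+'."""
    return "+".join(m.lastgroup for m in TOKENIZER.finditer(phrase))


def is_remark(phrase: str) -> bool:
    """Interpretation 1: a phrase enclosed in brackets is a remark, not a value."""
    tags = tag_pattern(phrase.strip()).split("+")
    return len(tags) >= 2 and tags[0] == "bracket" and tags[-1] == "bracket"


pattern = tag_pattern("(> 20,000 stars)")
```
]]></preformat>
      <p>A filter such as is_remark could be applied to candidate values before they are highlighted, so that bracketed phrases are kept as remarks instead of being proposed as values.</p>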
      <p>Similarly, for the example of missed information in Figure 2(c), the KPL would
learn the new term "Volume" and the pattern of its value. For the value, the POS
tagger identifies a pattern of number+"x"+number+"x"+number+noun. An
entity recognizer will then extract the unit of measurement ("cm"), while a regular
expression generator translates the POS pattern into a regular expression like
[\d]+\.?[\d]*\sx\s[\d]+\.?[\d]*\sx\s[\d]+\.?[\d]*\scm. The learned patterns
will be used by the IE to search for terms and their values.</p>
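      <p>A regular expression generator of this kind could be sketched as follows. The mapping from tags to regex fragments, the optional whitespace between tokens, and the function name pattern_to_regex are assumptions made for this illustration; the sample text is made up.</p>
      <preformat><![CDATA[
```python
import re

# Regex fragments for the coarse tags used in this sketch.
TAG_TO_REGEX = {
    "number": r"\d+(?:\.\d+)?",
    "x": r"x",
}


def pattern_to_regex(tags: list[str], unit: str) -> re.Pattern:
    """Build a regex from a tag sequence, allowing optional whitespace between tokens."""
    parts = [TAG_TO_REGEX[tag] for tag in tags]
    parts.append(re.escape(unit))  # the unit found by the entity recognizer
    return re.compile(r"\s*".join(parts))


volume = pattern_to_regex(["number", "x", "number", "x", "number"], "cm")
match = volume.search("Volume: 10 x 12.5 x 8 cm")
```
]]></preformat>
      <p>Allowing optional whitespace makes the generated expression tolerant of layout differences between data sheets, for instance "10x12x8cm" versus "10 x 12 x 8 cm".</p>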
      <p>However, such patterns cannot be generated based on only one data sheet. We
aim to train a model that takes similar key-value pairs over multiple data sheets
as input and is able to generate similar regular expressions that the system has
not encountered so far. These generated expressions are then applied to the existing
corpus to validate them and extract further knowledge. Again, the extracted
key-value pairs resulting from these automatically generated patterns have to
be validated by a human expert following the general workflow as presented in
Section 3.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>In this paper, we presented our vision to automatically improve the information
extraction from data sheets by learning from user feedback. We discussed the
additions needed for ConTrOn, a system to semi-automatically build a knowledge
base for engineering parts by parsing data sheets with the help of domain
ontologies. In an ever-changing field, ConTrOn continuously adapts these ontologies
based on user feedback by using external knowledge bases. Until a completely
automated yet sufficiently robust workflow is reached, we have to rely on
expert users to review the extraction results. Using both NLP and ML techniques,
these reviews themselves can be used to learn from past mistakes and, over time,
improve the extraction process.</p>
      <p>Our next step is to implement and evaluate the Key-Value Pattern Learner
module within the ConTrOn workflow. We expect this self-improving process to
decrease the number of extraction errors and thus lower the reviewing effort
needed. Although our approach is created as a part of ConTrOn, the basic ideas
are domain-independent and can therefore be re-used in other applications that
require automatic information extraction from unstructured text.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Baclawski</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bennett</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berg-Cross</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fritzsche</surname>
            ,
            <given-names>D. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schneider</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sharma</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sriram</surname>
            ,
            <given-names>R. D.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Westerinen</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <article-title>Ontology Summit 2017 communiqué – AI, learning, reasoning and ontologies</article-title>
          .
          <source>Applied Ontology</source>
          <volume>13</volume>
          (
          <year>2017</year>
          ),
          <fpage>3</fpage>
          –
          <lpage>18</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Barkschat</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <article-title>Semantic information extraction on domain specific data sheets</article-title>
          .
          <source>In ESWC</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Brauer</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rieger</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mocan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Barczynski</surname>
            ,
            <given-names>W. M.</given-names>
          </string-name>
          <article-title>Enabling information extraction by inference of regular expressions from sample entities</article-title>
          .
          <source>In CIKM</source>
          (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Chakraborty</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Subramanian</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Nyarko</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <article-title>Extraction of (key, value) pairs from unstructured ads</article-title>
          .
          <source>In AAAI Fall Symposia</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Dal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Maria</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <article-title>Simple method for ontology automatic extraction from documents</article-title>
          .
          <source>International Journal of Advanced Computer Science and Applications</source>
          <volume>3</volume>
          ,
          <issue>12</issue>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Doan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ardalan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ballard</surname>
            ,
            <given-names>J. R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Das</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Govind</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Konda</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mudgal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paulson</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , Paul Suganthan G. C., and
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <article-title>Human-in-the-loop challenges for entity matching: A midterm report</article-title>
          .
          <source>In HILDA@SIGMOD</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Fellbaum</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <article-title>WordNet: An electronic lexical database</article-title>
          . vol.
          <volume>76</volume>
          , JSTOR, p.
          <fpage>706</fpage>
          . Available at https://doi.org/10.2307/417141.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Ilyas</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>da Trindade</surname>
            ,
            <given-names>J. M. F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fernandez</surname>
            ,
            <given-names>R. C.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Madden</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <article-title>Extracting syntactic patterns from databases</article-title>
          .
          <source>CoRR abs/1710.11528</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Konda</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Das</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , Paul Suganthan G. C.,
          <string-name>
            <surname>Doan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ardalan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ballard</surname>
            ,
            <given-names>J. R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Panahi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Naughton</surname>
            ,
            <given-names>J. F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prasad</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krishnan</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deep</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Raghavendra</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <article-title>Magellan: Toward building entity matching management systems</article-title>
          .
          <source>PVLDB</source>
          <volume>9</volume>
          (
          <year>2016</year>
          ),
          <fpage>1197</fpage>
          –
          <lpage>1208</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q. V.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <article-title>Distributed representations of sentences and documents</article-title>
          .
          <source>CoRR abs/1405.4053</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krishnamurthy</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raghavan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vaithyanathan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Jagadish</surname>
            ,
            <given-names>H. V.</given-names>
          </string-name>
          <article-title>Regular expression learning for information extraction</article-title>
          .
          <source>In EMNLP</source>
          (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Locascio</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Narasimhan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>DeLeon</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kushman</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Barzilay</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <article-title>Neural generation of regular expressions from natural language with minimal domain knowledge</article-title>
          .
          <source>In EMNLP</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Luo</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Feng</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yan</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <article-title>Marrying up regular expressions with neural networks: A case study for spoken language understanding</article-title>
          .
          <source>In ACL</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Maynard</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          <article-title>NLP techniques for term extraction and ontology population</article-title>
          .
          <source>In Ontology Learning and Population</source>
          (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Mudgal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rekatsinas</surname>
            ,
            <given-names>T. I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Doan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Park</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krishnan</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deep</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arcaute</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Raghavendra</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <article-title>Deep learning for entity matching: A design space exploration</article-title>
          .
          <source>In SIGMOD Conference</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Murdaca</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berquand</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riccardi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soares</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gerene</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Brauer</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <article-title>Knowledge-based information extraction from datasheets of space parts</article-title>
          .
          <source>In 8th International Systems &amp; Concurrent Engineering for Space Applications Conference</source>
          (September
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Niu</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ré</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Shavlik</surname>
            ,
            <given-names>J. W.</given-names>
          </string-name>
          <article-title>DeepDive: Web-scale knowledgebase construction using statistical learning and inference</article-title>
          .
          <source>In VLDS</source>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Noy</surname>
            ,
            <given-names>N. F.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>McGuinness</surname>
            ,
            <given-names>D. L.</given-names>
          </string-name>
          <article-title>Ontology development 101: A guide to creating your first ontology</article-title>
          .
          <source>Tech. rep., March</source>
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Opasjumruskit</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Schindler</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <article-title>ConTrOn: Continuously trained ontology based on technical data sheets and Wikidata</article-title>
          . Available at http://arxiv.org/pdf/1906.06752.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Oro</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Ruffolo</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>XONTO: An ontology-based system for semantic information extraction from PDF documents</article-title>
          .
          <source>2008 20th IEEE International Conference on Tools with Artificial Intelligence</source>
          <volume>1</volume>
          (Nov.
          <year>2008</year>
          ),
          <fpage>118</fpage>
          –
          <lpage>125</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Rizvi</surname>
            ,
            <given-names>S. T. R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mercier</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agne</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Erkel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dengel</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Ahmed</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <article-title>Ontology-based information extraction from technical documents</article-title>
          .
          <source>In Proceedings of the 10th International Conference on Agents and Artificial Intelligence</source>
          (
          <year>2018</year>
          ), SCITEPRESS - Science and Technology Publications.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Vrandečić</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Krötzsch</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>Wikidata: A free collaborative knowledgebase</article-title>
          .
          <source>Commun. ACM</source>
          <volume>57</volume>
          ,
          <issue>10</issue>
          (Sept.
          <year>2014</year>
          ),
          <fpage>78</fpage>
          –
          <lpage>85</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Yan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <article-title>Combining knowledge with deep convolutional neural networks for short text classification</article-title>
          .
          <source>In IJCAI</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Wimalasuriya</surname>
            ,
            <given-names>D. C.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Dou</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <article-title>Ontology-based information extraction: An introduction and a survey of current approaches</article-title>
          .
          <source>Journal of Information Science</source>
          <volume>36</volume>
          (
          <year>2010</year>
          ),
          <fpage>306</fpage>
          –
          <lpage>323</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>