<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Open Information Extraction on German Wikipedia Texts</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Christian Klose</string-name>
          <email>christian.klose@fau.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhou Gui</string-name>
          <email>zhou.gui@fau.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andreas Harth</string-name>
          <email>andreas.harth@fau.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Friedrich-Alexander-Universität Erlangen-Nürnberg, Chair of Technical Information Systems</institution>
          ,
          <addr-line>Lange Gasse 20, 90403 Nürnberg, Germany</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <abstract>
        <p>Knowledge Graphs are becoming a fundamental building block for semantic search and voice assistants. This paper deals with automated Knowledge Graph Construction from unstructured data. The focus is predominantly on Open Information Extraction (Open IE), an unsupervised learning approach that attempts to extract triples from plain text independently of their domain. Hence, it is the first step towards automated Knowledge Graph Construction. Previous work mainly applied Open IE to English texts. In this paper, the focus is on German texts. Due to the lack of German Open Information Extraction datasets, a dataset is created on the basis of Wikipedia. Two Open Information Extraction systems for German are introduced. Finally, the performance of the systems is evaluated.</p>
      </abstract>
      <kwd-group>
        <kwd>Open Information Extraction</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Knowledge Graph Construction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In his vision of the Semantic Web [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], Tim Berners-Lee described a change from a Web of
documents made for and by people to a Web of information. According to his vision, information on
the Web should be manipulable not only by humans but also by machines. Most documents on
the World Wide Web consist to a large extent of text and are still difficult for machines to process
today. For this reason, the W3C has developed a universal language, the Resource Description
Framework (RDF), which makes information on the Web accessible to machines. Information
in RDF can be serialized in multiple formats. One common format is Turtle, a text
representation of an RDF graph that allows RDF triples to be stored in a compact and human-readable
form. A large collection of RDF graphs in a specific domain can constitute a Knowledge
Graph (KG). Virtual assistants in particular can make use of the facts, events and abstract concepts
stored in Knowledge Graphs to bring insights to people during semantic search or question
answering.
      </p>
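      <p>
        To make the Turtle representation concrete, the following minimal sketch (standard-library Python only; the namespace and the example triple are ours for illustration, not from the paper) emits one RDF triple in Turtle form:
      </p>
      <preformat><![CDATA[
```python
# Minimal sketch (standard library only): serialize (subject, predicate,
# object) triples in Turtle, the compact text format for RDF mentioned above.
# The example triple and namespace are illustrative, not taken from the paper.

def to_turtle(triples, prefix="ex", base="http://example.org/"):
    """Render triples of CURIE strings as one Turtle document."""
    lines = [f"@prefix {prefix}: <{base}> ."]
    for s, p, o in triples:
        lines.append(f"{s} {p} {o} .")
    return "\n".join(lines)

doc = to_turtle([("ex:Nuernberg", "ex:population", '"518370"')])
print(doc)
# @prefix ex: <http://example.org/> .
# ex:Nuernberg ex:population "518370" .
```
]]></preformat>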
      <p>
        In order to build Knowledge Graphs, knowledge can be extracted in the form of triples from
documents that are available in natural language. The transformation of text into a machine-readable
form is, therefore, a core task for building Knowledge Graphs. It can be broadly
summarized as the goal of Machine Reading [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In the field of AI, Machine Reading is a long-standing
goal and is discussed in the research community under the term Information Extraction
(IE). IE includes downstream tasks such as Named Entity Recognition (NER), Relation Extraction
(RE) or Entity Linking (EL). In recent years, an unsupervised approach to Relation Extraction,
namely Open Information Extraction, has shown promising results and is, therefore, the main
subject of this paper. The paper is structured as follows: In Section 2 the previous work is
outlined. Thereafter, in Section 3, the scientific approach applied for our research is described.
In Section 4 the results are presented and discussed. Finally, in Section 5 a conclusion is drawn.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Open Information Extraction is about extracting all possible triples from text, without knowing
the relations or entities occurring in it a priori. The first Open IE system was TextRunner [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], a learning-based system developed by a group of researchers at the University
of Washington. In the years that followed, other systems were introduced, each attempting to
improve on state-of-the-art results by overcoming identified weaknesses and flaws
of existing systems. In addition, other types than learning-based systems emerged, namely
rule-based, clause-based and systems making use of inter-proposition relationships [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Rule-based
systems entirely depend on hand-crafted rules or patterns. Systems that make use of this
approach are, for example, KrakeN [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] or Exemplar [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In order to improve the precision of the
systems described above, the idea of breaking down complex sentences into smaller components
(clauses) emerged. Two clause-based systems in particular are worth mentioning: ClausIE [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
and Stanford Open IE [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The system types mentioned so far share one common weakness:
none of them uses context and, therefore, correct extraction cannot be guaranteed.
Inter-proposition-based systems try to bridge this gap. Systems in this category include, for
example, RelNoun [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], OpenIE4 [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], NestIE [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and MinIE [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>
        Among the first to apply neural networks to Open IE were [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] with RnnOIE. The authors
formulated the problem as a sequence labeling task. Recently proposed models that follow a
sequence labeling approach are SenseOIE [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], SpanOIE [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and iRankOIE [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. A downside of
this approach, identified by [17], is that sequence labeling models cannot change the
sentence structure or use new auxiliary words in the extraction. [18] used a different neural
approach, called sequence generation, to develop CopyAttention and overcome that downside.
Furthermore, [19, 20, 17] describe end-to-end approaches using seq2seq models based on the
encoder-decoder framework, alleviating these downsides and the need for hand-crafted patterns.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Research Method</title>
      <sec id="sec-3-1">
        <title>3.1. Dataset Creation</title>
        <p>One of the main challenges within Open IE is to verify the quality of the extractions made by a
system. A solution to this is a dataset, that is, high-quality training and test data
in which triples are mapped to sentences. Currently, two approaches to automatically generating
training data are considered particularly useful among researchers: first, the infobox-matching
approach [21], where Wikipedia infobox values are linked to sentences in the corpus, and second,
the distantly supervised approach [22], where existing knowledge bases are used to heuristically
map triples to sentences.</p>
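        <p>
          To illustrate the matching idea, a deliberately simplified sketch (our own surface-string heuristic, not the paper's actual pipeline) that maps a triple to the sentences mentioning both its subject and object could look like this:
        </p>
        <preformat><![CDATA[
```python
# Minimal sketch of heuristic triple-to-sentence matching, the core idea
# behind distantly supervised dataset creation. Pure surface matching only;
# the real pipeline described in the paper adds cleaning and filtering.

def match_triples(triples, sentences):
    """Map each (subject, predicate, object) triple to the sentences that
    mention both its subject and its object."""
    mapping = {}
    for s, p, o in triples:
        hits = [sent for sent in sentences
                if s.lower() in sent.lower() and o.lower() in sent.lower()]
        if hits:
            mapping[(s, p, o)] = hits
    return mapping

sentences = [
    "Im Jahr 2019 zählte Nürnberg 518370 Bewohner.",
    "Die Lange Gasse liegt in der Altstadt.",
]
pairs = match_triples([("Nürnberg", "Bewohner", "518370")], sentences)
print(pairs)
```
]]></preformat>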
        <p>For the creation of the German Wikipedia dataset, the infobox-matching approach is used. In
total, 5 steps were executed to create a clean dataset: 1) finding and downloading
a Wikipedia dataset, 2) preprocessing and cleaning the text, 3) matching all infobox triples to
the corresponding page text, 4) matching the triples on a sentence level, and 5) filtering out noisy
training examples. After the last step, 6,453 triples mapped to 5,372 sentences,
containing 1,324 relation types, were derived. Furthermore, the average number of words used
in each part was calculated. On average, the subject has 2.0, the predicate 1.0 and the object 1.5
words. Last but not least, the average length of a sentence was computed and amounts to 22.8
words. The dataset and the code used to create it are published and freely accessible at
https://github.com/ChrisDelClea/WikiGerman4OIE.</p>
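        <p>
          The per-part averages reported above amount to a simple token count; a minimal sketch (with two invented example triples, not the actual dataset) is:
        </p>
        <preformat><![CDATA[
```python
# Minimal sketch: average word counts per triple part, as reported for the
# WikiGerman4OIE dataset. The two example triples are invented for
# illustration; the paper's numbers come from the full dataset.

def average_lengths(triples):
    """Return average whitespace-token counts for subject/predicate/object."""
    n = len(triples)
    sums = [0, 0, 0]
    for triple in triples:
        for i, part in enumerate(triple):
            sums[i] += len(part.split())
    return [round(total / n, 1) for total in sums]

triples = [
    ("Albert Einstein", "geboren in", "Ulm"),
    ("Nürnberg", "zählte", "518370 Bewohner"),
]
print(average_lengths(triples))  # [1.5, 1.5, 1.5]
```
]]></preformat>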
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Open IE Systems</title>
        <p>
          In total, two systems were implemented and used for our research. The first system is turCy,
which we implemented as a pipeline component of spaCy (a library for advanced natural
language processing: https://spacy.io) to leverage its POS Tagger and Dependency Parser (DP).
The second system, which we call NeuralGerOIE, uses an encoder-decoder seq2seq neural model.
        </p>
        <p>
          <bold>3.2.1. TurCy</bold>
        </p>
        <p>
          Research by [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] and [23] implies that, with a decent number of POS and DP patterns, a large
variety of triples can be extracted, independently of any other constraints. TurCy follows a
similar approach. In fact, it is a pattern-learning system for binary extractions that is assembled
from two essential components: the Pattern Builder and the Triple Extractor.
A pattern consists of nodes that represent the POS tags of a sentence; the relations between
the nodes reflect the dependency parse tree. A pattern, with respect to a sentence, can be used
to represent exactly one triple. The pattern itself consists of subpatterns. Each subpattern
represents a node in a tree and maps a word with left and right child nodes. For the sentence
"Im Jahr 2019 zählte Nürnberg 518370 Bewohner." ("In 2019, Nürnberg counted 518,370 inhabitants."),
the POS and dependency tree is shown in Figure 1.
        </p>
        <p>The Triple Extractor of turCy is, at its core, a recursive sub-tree search over the Pattern List.
The algorithm starts at the root node, traverses each path down to the leaves of the tree, and checks
whether a node and its edge match a sub-pattern of a given pattern. If all sub-patterns
match, the triple is assembled and stored with respect to the sentence. Note that a match implies
that at least one token for each of the three parts (subject, predicate, object) of a triple was found during
the recursive sub-tree search in the sentence. The full algorithm is, however, more
involved; readers who want to fully understand its working mechanism are encouraged to
consult the code. For this reason, and also to ensure full transparency of our research,
turCy has been packaged as a Python library and is released under an open-source license
(https://github.com/ChrisDelClea/turCy).</p>
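        <p>
          To convey the idea (a highly simplified reconstruction, not the actual implementation, which is in the turCy repository), matching a pattern against a dependency tree might be sketched as follows; the POS tags and dependency labels are plausible German-model examples, not taken from the paper:
        </p>
        <preformat><![CDATA[
```python
# Highly simplified sketch of turCy's idea: recursively check whether a
# dependency tree matches a pattern of (POS tag, dependency label) -> role
# sub-patterns, collecting tokens for subject/predicate/object.
# Illustrative only; the real implementation lives in the turCy repository.

class Node:
    def __init__(self, text, pos, dep, children=()):
        self.text, self.pos, self.dep = text, pos, dep
        self.children = list(children)

def match(node, pattern, triple):
    """pattern: {(pos, dep): role}; fills triple dict role -> token.
    Returns True once all three roles were found in the sub-tree."""
    role = pattern.get((node.pos, node.dep))
    if role is not None:
        triple[role] = node.text
    for child in node.children:
        match(child, pattern, triple)
    return all(r in triple for r in ("subject", "predicate", "object"))

# "Im Jahr 2019 zählte Nürnberg 518370 Bewohner." (simplified tree)
root = Node("zählte", "VVFIN", "ROOT", [
    Node("Nürnberg", "NE", "sb"),
    Node("Bewohner", "NN", "oa", [Node("518370", "CARD", "nk")]),
])
pattern = {("VVFIN", "ROOT"): "predicate", ("NE", "sb"): "subject",
           ("NN", "oa"): "object"}
triple = {}
if match(root, pattern, triple):
    print((triple["subject"], triple["predicate"], triple["object"]))
# ('Nürnberg', 'zählte', 'Bewohner')
```
]]></preformat>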
        <p>
          <bold>3.2.2. NeuralGerOIE</bold>
        </p>
        <p>The latest state-of-the-art Open IE systems use seq2seq neural networks. Therefore, the
second system follows the sequence generation approach and was created using the
Simple Transformers library (https://simpletransformers.ai/). For training the model, the
WikiGerman4OIE dataset (Section 3.1, https://github.com/ChrisDelClea/WikiGerman4OIE)
was utilized, together with a pre-trained BART model for German
(https://huggingface.co/Shahm/bart-german). It is important to mention that the model was
trained to output multiple extractions within the scope of one subject. The code of
NeuralGerOIE is available at https://github.com/ChrisDelClea/NeuralGermanOIE.</p>
        <p>We fine-tuned the model for 10 epochs using a batch size of 8. The maximum length of the
input sentence was set to 300, i.e. all words thereafter were truncated. A difference between the
seq2seq models discussed earlier and the approach described here is the type of separator used.
While [18] used start and end tags for each part of a triple (&lt;arg1&gt; Deep Learning &lt;/arg1&gt; &lt;rel&gt; is
a subfield of &lt;/rel&gt; &lt;arg2&gt; Machine Learning &lt;/arg2&gt;), we found that one token before each
part, along with a final end token, was sufficient. In addition, the model struggled to output the
same separator token multiple times within a sequence. Therefore, we added a number to each
separator token. Lastly, we noticed that the names of the separator tokens affected the quality
of the outputs. We proceeded with the following triple format: &lt;sub&gt; Deep Learning &lt;rel0&gt; is
a subfield of &lt;obj0&gt; Machine Learning &lt;end&gt;. In total, the fine-tuning took about 2 hours
25 minutes on an Nvidia Tesla V100 32GB GPU to complete. The results are discussed in the
subsequent section.</p>
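        <p>
          The target-sequence format described above can be sketched as follows (our reconstruction from the description in the text, not the actual training code):
        </p>
        <preformat><![CDATA[
```python
# Minimal sketch of the NeuralGerOIE target-sequence format: one separator
# token before each triple part plus a final end token, with numbered
# separators for multiple extractions per subject. Reconstructed from the
# paper's description for illustration; not the actual implementation.

def linearize(subject, extractions):
    """extractions: list of (relation, object) pairs for one subject."""
    parts = ["<sub>", subject]
    for i, (rel, obj) in enumerate(extractions):
        parts += ["<rel%d>" % i, rel, "<obj%d>" % i, obj]
    parts.append("<end>")
    return " ".join(parts)

target = linearize("Deep Learning", [("is a subfield of", "Machine Learning")])
print(target)
# <sub> Deep Learning <rel0> is a subfield of <obj0> Machine Learning <end>
```
]]></preformat>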
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>Each of the Open IE systems is evaluated against a gold dataset. High-quality
annotations are needed to obtain more accurate insights into the quality of
the extractions. Therefore, a subset of the WikiGerman4OIE dataset (Section 3.1) was annotated by two
of the authors. In total, the gold dataset consists of 47 sentences and 175 triples. On average,
a sentence contains 3.8 triples. This subset is the basis for the evaluation process. As
quantitative metrics, precision, recall and F1-score are computed using the token-based
evaluation method introduced by [24], slightly adjusted to fit the systems' outputs.
The evaluation process is as follows: First, the full WikiGerman4OIE dataset for turCy-large and
the gold dataset for turCy-small were used to create the patterns with the Pattern Builder. The
result was two pattern lists with sizes of 6,453 and 175 patterns, respectively, corresponding
to the number of triples in each dataset. Second, the 47 sentences were fed as the only input
into the Triple Extractor and the NeuralGerOIE prediction function. Lastly, precision, recall
and F1-score were calculated.</p>
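      <p>
        A minimal sketch of token-based scoring in the spirit of [24] (our simplification for illustration, not the exact benchmark code) is:
      </p>
      <preformat><![CDATA[
```python
# Minimal sketch of token-based precision/recall/F1 between a predicted and
# a gold triple part, in the spirit of the token-based evaluation of [24].
# A simplification for illustration, not the exact method used in the paper.

def token_scores(predicted, gold):
    """Precision/recall/F1 over whitespace-token sets."""
    pred_tokens, gold_tokens = set(predicted.split()), set(gold.split())
    overlap = len(pred_tokens & gold_tokens)
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f1 = token_scores("zählte 518370 Bewohner", "518370 Bewohner")
print(round(p, 2), round(r, 2), round(f1, 2))  # 0.67 1.0 0.8
```
]]></preformat>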
      <p>In general, we found that turCy-small achieves a better F1-score than NeuralGerOIE due to its
ability to extract many triples. At the same time, we noticed that NeuralGerOIE obtained a
very high precision for sentences annotated with a single triple.</p>
      <p>A comparison between turCy-small and turCy-large (the only difference is the number of
patterns and their dataset of origin) indicates that the quality of the automatically generated
dataset is lower than initially expected. The reason for this assumption is that, while turCy-large
contains a high number of patterns built from the automatically generated dataset, only 29
triples were extracted. TurCy-small, on the other hand, yielded significantly more extractions
with a lower number of patterns created from the gold dataset. This result is
counter-intuitive, as one would expect the number of extractions to be linearly correlated with
the number of patterns. Moreover, when comparing turCy and NeuralGerOIE, we found that
while turCy can only output words from the text in its extractions, the neural model can learn
a direct mapping between the words used in the text and the corresponding words in the
gold dataset.</p>
      <p>In addition to the quality analysis, we examined the run-time, as Open IE systems are intended
to process large amounts of text data at a rapid pace. In order to make a judgment about the
run-time, two metrics are taken into consideration: first, the number of sentences
processed per second (# sent./sec.); second, the number of triples yielded per second
(# triples/sec.).</p>
      <p>Furthermore, the ratio of these two metrics with respect to the number of stored patterns is of
interest. Table 2 shows that turCy-small is the best-performing system, followed by
NeuralGerOIE. As expected, the number of patterns has a major impact on the run-time, as can be
seen with turCy-large, leading to the conclusion that one of the main objectives of rule-based
Open IE systems is to keep the number of patterns as small as possible, but as large as necessary,
to maximize the number of extractions.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion &amp; Outlook</title>
      <p>In this paper the contribution is twofold. First, a dataset for Open Information Extraction based
on German Wikipedia texts was created and published. Second, two different approaches to
Open IE were implemented and evaluated. Several interesting research directions for future
work can be recommended. We firmly believe that there is still potential for improvement
in terms of dataset quality and quantity. For instance, in order to improve the quality, a
crowd-sourcing platform such as Amazon MTurk could be leveraged. In addition, the distantly
supervised approach to automated training data generation can be explored. Doing so
would also help to determine what impact the applied approach for automated training data
generation has on the quality of the extractions made by an Open IE system.</p>
      <p>Finally, the two Open IE systems can be further optimized to yield better results. For example,
for the rule-based system turCy, tree-pruning approaches can be explored to reduce the overall
number of patterns and, therefore, improve the extraction speed. Regarding NeuralGerOIE,
multi-subject extractions, different architectures and the utilization of more recent language
models might be considered.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This research paper was created within the scope of the project Software Campus 2.0 (FAU),
grant number 01IS17045. The project was funded by the German government, which we kindly
thank for its sponsorship.</p>
      <p>[16] Z. Jiang, P. Yin, G. Neubig, Improving open information extraction via iterative rank-aware
learning, arXiv preprint arXiv:1905.13413 (2019).
[17] K. Kolluru, S. Aggarwal, V. Rathore, S. Chakrabarti, et al., IMoJIE: Iterative memory-based
joint open information extraction, arXiv preprint arXiv:2005.08178 (2020).
[18] L. Cui, F. Wei, M. Zhou, Neural open information extraction, arXiv preprint
arXiv:1805.04270 (2018).
[19] P.-L. H. Cabot, R. Navigli, REBEL: Relation extraction by end-to-end language generation,
in: Findings of the Association for Computational Linguistics: EMNLP 2021, 2021, pp.
2370–2381.
[20] K. Kolluru, V. Adlakha, S. Aggarwal, S. Chakrabarti, et al., OpenIE6: Iterative grid labeling
and coordination analysis for open information extraction, arXiv preprint arXiv:2010.03147
(2020).
[21] F. Wu, D. S. Weld, Open information extraction using Wikipedia, in: Proceedings of the
48th Annual Meeting of the Association for Computational Linguistics, 2010, pp. 118–127.
[22] M. Mintz, S. Bills, R. Snow, D. Jurafsky, Distant supervision for relation extraction without
labeled data, in: Proceedings of the Joint Conference of the 47th Annual Meeting of the
ACL and the 4th International Joint Conference on Natural Language Processing of the
AFNLP, 2009, pp. 1003–1011. URL: https://www.aclweb.org/anthology/P09-1113.
[23] Mausam, M. Schmitz, S. Soderland, R. Bart, O. Etzioni, Open language learning for
information extraction, in: Proceedings of the 2012 Joint Conference on Empirical Methods
in Natural Language Processing and Computational Natural Language Learning, 2012, pp.
523–534. URL: https://www.aclweb.org/anthology/D12-1048.
[24] W. Lechelle, F. Gotti, P. Langlais, WiRe57: A fine-grained benchmark for open information
extraction, 2019, pp. 6–15.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Berners-Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hendler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Lassila</surname>
          </string-name>
          ,
          <article-title>The semantic web</article-title>
          ,
          <source>Scientific American</source>
          (
          <year>2001</year>
          )
          <fpage>34</fpage>
          -
          <lpage>43</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>O.</given-names>
            <surname>Etzioni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Banko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cafarella</surname>
          </string-name>
          ,
          <article-title>Machine reading</article-title>
          ,
          <year>2007</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Banko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cafarella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Soderland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Broadhead</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Etzioni</surname>
          </string-name>
          ,
          <article-title>Open information extraction from the web</article-title>
          ,
          <source>IJCAI International Joint Conference on Artificial Intelligence</source>
          (
          <year>2007</year>
          )
          <fpage>2670</fpage>
          -
          <lpage>2676</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>Niklaus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cetto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Freitas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Handschuh</surname>
          </string-name>
          ,
          <article-title>A survey on open information extraction</article-title>
          , arXiv preprint arXiv:1806.05599 (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Akbik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Löser</surname>
          </string-name>
          ,
          <article-title>KrakeN: N-ary facts in open information extraction</article-title>
          ,
          <source>in: Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (AKBC-WEKEX)</source>
          ,
          <year>2012</year>
          , pp.
          <fpage>52</fpage>
          -
          <lpage>56</lpage>
          . URL: https://www.aclweb.org/anthology/W12-3010.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>F.</given-names>
            <surname>Mesquita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schmidek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Barbosa</surname>
          </string-name>
          ,
          <article-title>Effectiveness and efficiency of open relation extraction</article-title>
          (
          <year>2013</year>
          )
          <fpage>447</fpage>
          -
          <lpage>457</lpage>
          . URL: https://www.aclweb.org/anthology/D13-1043.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>L.</given-names>
            <surname>Del Corro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Gemulla</surname>
          </string-name>
          ,
          <article-title>Clausie: clause-based open information extraction</article-title>
          ,
          <source>in: Proceedings of the 22nd international conference on World Wide Web</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>355</fpage>
          -
          <lpage>366</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>G.</given-names>
            <surname>Angeli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J. J.</given-names>
            <surname>Premkumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <article-title>Leveraging linguistic structure for open domain information extraction</article-title>
          ,
          <source>in: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing</source>
          (Volume
          <volume>1</volume>
          : Long Papers)
          ,
          <year>2015</year>
          , pp.
          <fpage>344</fpage>
          -
          <lpage>354</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>H.</given-names>
            <surname>Pal</surname>
          </string-name>
          , et al.,
          <article-title>Demonyms and compound relational nouns in nominal open ie</article-title>
          ,
          <source>in: Proceedings of the 5th Workshop on Automated Knowledge Base Construction</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>35</fpage>
          -
          <lpage>39</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Mausam</surname>
          </string-name>
          ,
          <article-title>Open information extraction systems and downstream applications</article-title>
          ,
          <source>in: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>4074</fpage>
          -
          <lpage>4077</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>N.</given-names>
            <surname>Bhutani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jagadish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Radev</surname>
          </string-name>
          ,
          <article-title>Nested propositions in open information extraction</article-title>
          ,
          <source>in: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>55</fpage>
          -
          <lpage>64</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>K.</given-names>
            <surname>Gashteovski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Gemulla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Del Corro</surname>
          </string-name>
          ,
          <article-title>Minie: minimizing facts in open information extraction</article-title>
          ,
          <source>Association for Computational Linguistics</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>G.</given-names>
            <surname>Stanovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Michael</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Dagan</surname>
          </string-name>
          ,
          <article-title>Supervised open information extraction</article-title>
          ,
          <source>in: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long Papers)
          ,
          <year>2018</year>
          , pp.
          <fpage>885</fpage>
          -
          <lpage>895</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Roy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <article-title>Supervising unsupervised open information extraction models</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>728</fpage>
          -
          <lpage>737</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <article-title>Span model for open information extraction on accurate corpus</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence</source>
          , volume
          <volume>34</volume>
          ,
          <year>2020</year>
          , pp.
          <fpage>9523</fpage>
          -
          <lpage>9530</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Yin</surname>
          </string-name>
          , G. Neubig,
          <article-title>Improving open information extraction via iterative rank-aware learning</article-title>
          , arXiv preprint arXiv:1905.13413 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>