<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Dependency Parser on Open Information Extraction for Portuguese Texts - DptOIE and DependentIE on IberLEF</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rafael Glauber</string-name>
          <email>rglauber@dcc.ufba.br</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniela Barreiro Claro</string-name>
          <email>dclaro@ufba.br</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Leandro Souza de Oliveira</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Formalisms and Semantic Applications Research Group (FORMAS) LASiD/DCC/IME Federal University of Bahia</institution>
          ,
          <addr-line>Salvador, Bahia</addr-line>
          ,
          <country country="BR">Brazil</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>442</fpage>
      <lpage>448</lpage>
      <abstract>
        <p>This paper describes the participation of the DependentIE and DptOIE systems in the Iberian Languages Evaluation Forum 2019 (IberLEF 2019). Our activities focused on the "General Open Relation Extraction" task, which targets relation extraction for Portuguese texts. We describe the choices adopted during the challenge, as well as the systems used and their results.</p>
      </abstract>
      <kwd-group>
        <kwd>Shared Task</kwd>
        <kwd>Open Information Extraction</kwd>
        <kwd>Relation Extraction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Extracting information from large repositories of texts across different domains is a
hard task for humans. As the quantity and diversity of textual content on the Web
grow, traditional IE tools achieve low coverage in this scenario [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In
the study reported in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], the authors proposed a new approach called Open
Information Extraction (Open IE), which extracts facts from a sentence in the
following triple format:
        <disp-formula id="eq-1">
          <label>(1)</label>
          <tex-math><![CDATA[\mathit{triple} = (\mathit{arg}_1,\ \mathit{rel},\ \mathit{arg}_2)]]></tex-math>
        </disp-formula>
where arg<sub>1</sub> and arg<sub>2</sub> are noun phrases in a sentence and rel establishes
a relationship between arg<sub>1</sub> and arg<sub>2</sub> through a verb phrase. Open IE systems
are useful in web-scale applications such as question answering and document filtering
systems [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The Iberian Languages Evaluation Forum (IberLEF 2019) organized
Portuguese named entity recognition (NER) and relation extraction (RE) tasks,
      </p>
    </sec>
    <sec id="sec-2">
      <title>R. Glauber et al.</title>
      <p>
        which included an Open IE task [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Participants in this task were asked to apply their
systems/methods to activities related to NER or RE in Portuguese sentences.
We applied two different Open IE systems to one task for the RE problem,
Task 3: General Open Relation Extraction.
      </p>
      <p>In this work, we describe our Open IE systems and their results, as well as
the choices made and the problems faced in performing this task. Our systems are based
on dependency analysis and handcrafted rules to extract facts from Portuguese
sentences. We participated with two of our systems: DependentIE<sup>1</sup> and DptOIE<sup>2</sup>.</p>
      <p>This paper is organized as follows: Section 2 describes the problem statement;
Section 3 presents our methods, DependentIE and DptOIE; Section 4 describes
our setup; Section 5 presents our evaluation; Section 6 presents our results;
and we conclude in Section 7.</p>
      <sec id="sec-2-1">
        <title>Problem Statement</title>
        <p>The organizers of IberLEF (Iberian Languages Evaluation Forum)
proposed a task that involves the automatic extraction of any relation descriptor
expressing any semantic relation between a pair of entities or concepts mentioned
in Portuguese sentences. In this task, the coordinators consider a relation
descriptor to be a text chunk that describes the explicit semantic relation occurring
between two entities or noun phrases (NPs) in a sentence.</p>
        <p>The task was divided into two different tests. In the first, participants
had to extract the relation descriptors between NP pairs from data provided by
the coordinators. This data was annotated with NP information, so
participants did not need to employ a NER system. In the second,
the data provided was not annotated with NP information: the goal
was to extract and classify the NPs from the test sentences, and then extract
the relation descriptors between pairs of NPs. We submitted our methods to
both Test 1 and Test 2 of Task 3.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Our methods</title>
        <p>
          We participated in IberLEF 2019 with two Open IE systems. The first,
DependentIE, is an Open IE system for Portuguese sentences [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Like
ArgOE [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] and ClausIE [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], it uses a dependency parser (DP) to identify
clauses<sup>3</sup> (useful parts of a sentence). In this work, a clause is one of the following
parts of a sentence: subject (S), direct and indirect objects (O), verb (V), adverb
(A), complement (C) and modifier (M). Our method extracts facts using clauses
based on the standard SV (Subject - Verb) pattern. The arguments are detected through
a depth-first search over the dependency structure of the sentence. DependentIE uses MaltParser as its
dependency parser.
          <fn id="fn1"><label>1</label><p>http://formas.ufba.br/dclaro/tools.html#dependentie</p></fn>
          <fn id="fn2"><label>2</label><p>http://formas.ufba.br/dclaro/tools.html#dptoie</p></fn>
          <fn id="fn3"><label>3</label><p>The clauses consist of a subject and a verb and their constituents, such as objects (direct and indirect), adverbs and others.</p></fn>
        </p>
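        <p>The idea of collecting arguments with a depth-first search over a dependency tree can be sketched as follows. This is only an illustration, not the DependentIE rule set: the <monospace>Token</monospace> type, the function names, the Universal Dependencies labels <monospace>nsubj</monospace> and <monospace>obj</monospace>, and the hand-annotated example sentence are all assumptions made for the sketch.</p>
        <preformat preformat-type="code">
```python
from typing import NamedTuple


class Token(NamedTuple):
    idx: int     # 1-based position in the sentence
    form: str    # surface word
    head: int    # index of the governing token (0 marks the root)
    deprel: str  # dependency label (hypothetical UD-style labels)


def subtree(tokens: list[Token], root: int) -> list[Token]:
    """Collect `root` and all of its descendants with a depth-first search."""
    found = []
    stack = [root]
    while stack:
        current = stack.pop()
        found.append(tokens[current - 1])
        # Push every direct dependent of the token just visited.
        stack.extend(t.idx for t in tokens if t.head == current)
    return sorted(found)  # sorting by idx restores sentence order


def extract_triple(tokens: list[Token]) -> tuple[str, str, str]:
    """Build (arg1, rel, arg2) from the root verb, its subject and its object."""
    verb = next(t for t in tokens if t.head == 0)
    subj = next(t for t in tokens if t.head == verb.idx and t.deprel == "nsubj")
    obj = next(t for t in tokens if t.head == verb.idx and t.deprel == "obj")

    def phrase(root: int) -> str:
        return " ".join(t.form for t in subtree(tokens, root))

    return (phrase(subj.idx), verb.form, phrase(obj.idx))


# Hand-annotated toy sentence: "A empresa comprou o terreno"
SENTENCE = [
    Token(1, "A", 2, "det"),
    Token(2, "empresa", 3, "nsubj"),
    Token(3, "comprou", 0, "root"),
    Token(4, "o", 5, "det"),
    Token(5, "terreno", 3, "obj"),
]

print(extract_triple(SENTENCE))
```
        </preformat>
        <p>A real system would need many more rules (for example, for coordination and subordinate clauses) than this single subject-verb-object pattern.</p>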
        <p>
          The second system, DptOIE, is an evolution of the DependentIE system [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. It
uses Stanford's dependency parser and specific rules for extracting facts from
Portuguese sentences, adapts the depth-first search used to explore the dependency
tree, and handles particular cases in sentences with coordinating conjunctions,
subordinate clauses, and appositives. Furthermore, DptOIE is open to other
dependency parsers, provided that sentences are supplied in CoNLL-U format following the Universal Dependencies
v2.1 Brazilian treebank conventions.
        </p>
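        <p>To illustrate what a CoNLL-U input looks like to a downstream extractor, the sketch below reads the ten tab-separated CoNLL-U columns into simple dictionaries. It is not part of DptOIE; the function name <monospace>read_conllu</monospace>, the dictionary keys, and the sample annotation are assumptions made for the example.</p>
        <preformat preformat-type="code">
```python
def read_conllu(text: str) -> list[list[dict]]:
    """Parse CoNLL-U text into a list of sentences, each a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        current.append({
            "id": int(cols[0]),
            "form": cols[1],
            "lemma": cols[2],
            "upos": cols[3],
            "head": int(cols[6]),
            "deprel": cols[7],
        })
    if current:
        sentences.append(current)
    return sentences


# Hypothetical annotation of "A empresa comprou o terreno".
ROWS = [
    ["1", "A", "o", "DET", "_", "_", "2", "det", "_", "_"],
    ["2", "empresa", "empresa", "NOUN", "_", "_", "3", "nsubj", "_", "_"],
    ["3", "comprou", "comprar", "VERB", "_", "_", "0", "root", "_", "_"],
    ["4", "o", "o", "DET", "_", "_", "5", "det", "_", "_"],
    ["5", "terreno", "terreno", "NOUN", "_", "_", "3", "obj", "_", "_"],
]
SAMPLE = (
    "# text = A empresa comprou o terreno\n"
    + "\n".join("\t".join(row) for row in ROWS)
    + "\n"
)

for token in read_conllu(SAMPLE)[0]:
    print(token["id"], token["form"], token["head"], token["deprel"])
```
        </preformat>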
      </sec>
      <sec id="sec-2-3">
        <title>Setup</title>
        <p>Both systems used to perform this task, DependentIE<sup>4</sup> and DptOIE<sup>5</sup>, are
available for download on the FORMAS website. Our systems generate an output file in
comma-separated values (CSV) format. For Test 1, each system extracted the
facts contained in the test sentences. Then, each pair of NPs contained in the test
file was compared with the arguments of the facts extracted by both systems. For
this comparison, the following characters were ignored: " , . ( ) [ ] ? !. Moreover, to avoid
minor divergences in the comparison of strings, we removed a set of stopwords<sup>6</sup>.
When a pair of arguments in a system's output file was similar to an
NP pair of the test file, the text fragment corresponding to the relationship was
selected as the result for Test 1.</p>
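        <p>The matching step described above can be sketched as follows. This is a minimal illustration under several assumptions: the stopword set is a tiny placeholder rather than the full list used by the authors, lowercasing is added for the example, "similar" is approximated by equality after normalization, and the names <monospace>normalize</monospace> and <monospace>relation_for_pair</monospace> are hypothetical.</p>
        <preformat preformat-type="code">
```python
from typing import Optional

# Characters ignored during the comparison, as listed in the text.
IGNORED_CHARS = '",.()[]?!'

# Tiny illustrative subset; the actual stopword list is much longer.
STOPWORDS = {"a", "o", "as", "os", "de", "da", "do", "em", "um", "uma"}


def normalize(text: str) -> str:
    """Drop ignored characters and stopwords (lowercasing is an assumption)."""
    table = str.maketrans("", "", IGNORED_CHARS)
    words = text.lower().translate(table).split()
    return " ".join(w for w in words if w not in STOPWORDS)


def relation_for_pair(np1: str, np2: str, facts: list) -> Optional[str]:
    """Return the relation of the first fact whose arguments match the NP pair."""
    key = (normalize(np1), normalize(np2))
    for arg1, rel, arg2 in facts:
        if (normalize(arg1), normalize(arg2)) == key:
            return rel
    return None


# Hypothetical extracted facts in (arg1, rel, arg2) form.
FACTS = [("A empresa", "comprou", "o terreno (em Salvador).")]

print(relation_for_pair("empresa", "terreno em Salvador", FACTS))
```
        </preformat>
        <p>Exact equality after normalization is the strictest reading of "similar"; a looser criterion, such as token overlap, would match more pairs at the cost of more false matches.</p>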
        <p>Test 2 follows the free form suggested by the Open IE task. After running
both systems on the set of test sentences, the next step was to convert our output
from CSV to the format required by IberLEF 2019.</p>
      </sec>
      <sec id="sec-2-4">
        <title>Evaluation</title>
        <p>
          Two scores were considered for the evaluation of Task 3: a completely correct
relations score and a partially correct relations score [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. A Completely Correct
Relation (CCR) occurs when all terms that make up the relation descriptor in
the key are equal to those of the relation descriptor in the system's output. The score
for each completely correct relation is 1, which represents a full hit. A Partially
Correct Relation (PCR) occurs when at least one of the terms in the
relation descriptor of the system's output corresponds to a term in the relation
descriptor of the key.
          <fn id="fn4"><label>4</label><p>http://formas.ufba.br/dclaro/tools.html#dependentie</p></fn>
          <fn id="fn5"><label>5</label><p>http://formas.ufba.br/dclaro/tools.html#dptoie</p></fn>
          <fn id="fn6"><label>6</label><p>List of stopwords at https://github.com/stopwords-iso/stopwords-pt/blob/master/stopwords-pt.txt</p></fn>
        </p>
        <p>
          Since Open Relation Extraction identifies all possible information, and the
sentences adopted in the evaluation of Test 2 are the same as those in the Test 1 and training
datasets, we performed four different evaluations to provide a full picture of the
performance of our systems:
          <list list-type="bullet">
            <list-item><p>Considering only the relationships in the Test 2 gold dataset;</p></list-item>
            <list-item><p>Considering the relationships in the Test 2 gold dataset and disregarding the relationships in the training dataset;</p></list-item>
            <list-item><p>Considering the relationships in the Test 2 and Test 1 gold datasets and disregarding the relationships in the training dataset;</p></list-item>
            <list-item><p>Considering the relationships in all three datasets.</p></list-item>
          </list>
        </p>
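        <p>The two matching criteria can be expressed as simple predicates over relation descriptors. The sketch below assumes that terms are whitespace-separated tokens compared case-sensitively; neither detail is stated here, and the function names are hypothetical. It covers only the match decisions, not how partial matches are weighted in the final score.</p>
        <preformat preformat-type="code">
```python
def is_completely_correct(key_rel: str, system_rel: str) -> bool:
    """CCR: every term of the key descriptor equals the system's descriptor."""
    return key_rel.split() == system_rel.split()


def is_partially_correct(key_rel: str, system_rel: str) -> bool:
    """PCR: at least one term of the system descriptor occurs in the key."""
    return not set(key_rel.split()).isdisjoint(system_rel.split())


# Hypothetical key descriptor and three candidate system outputs.
KEY = "foi fundada em"
for candidate in ["foi fundada em", "fundada por", "comprou"]:
    print(
        candidate,
        is_completely_correct(KEY, candidate),
        is_partially_correct(KEY, candidate),
    )
```
        </preformat>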
        <p>
          All datasets used are available at http://www.inf.pucrs.br/linatural/wordpress/iberlef-2019/. The details of the measures and datasets
are described in [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-5">
        <title>Results</title>
        <p>We organize the results of Task 3 according to the values obtained in Tests 1 and
2. Table 1 presents the results obtained by both systems under the exact
measures in Test 1. The values for all measures are low; still,
DptOIE has a slight advantage over DependentIE. Table 2 presents
the results for the partial measures. Although the values obtained
with the partial measures are better for both systems, Test 1 proved
challenging: the values were very low for both systems in every
experimental setup. Part of the errors came from the step that identifies the
entities contained in the arguments of a fact extracted by an Open IE system.
The arguments of the facts are NPs that contain other
fragments of the sentence, so even after removing stopwords, other filters should
be considered.</p>
        <p>Another critical aspect of Test 1 is that scoring with the partial measures
had little impact on the outcome. The increase in
Precision, Recall, and F-measure was small relative to the scale of the
values presented.</p>
        <p>Figures 1 and 2 present the results obtained for Test 2 in the four setups
proposed by the coordinators. In this test, DptOIE clearly achieved the best
performance. The difference is more significant
when comparing the values of the partial measures for both systems.
DptOIE presents higher values for Precision and Recall, as well as a better
balance between these measures, which yields higher F-measure values.</p>
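        <p>The effect of balance between Precision and Recall on the F-measure can be illustrated with a short sketch. It assumes the balanced F-measure (F1, the harmonic mean of Precision and Recall); the exact variant used by the task is defined in the task description, and the counts below are invented purely for illustration.</p>
        <preformat preformat-type="code">
```python
def precision_recall_f1(correct: int, extracted: int, expected: int):
    """Return (precision, recall, F1) from raw counts.

    correct   -- relations in the system output that match the key
    extracted -- all relations produced by the system
    expected  -- all relations in the key
    """
    precision = correct / extracted if extracted else 0.0
    recall = correct / expected if expected else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented counts: a balanced system versus a skewed one.
balanced = precision_recall_f1(correct=20, extracted=100, expected=100)
skewed = precision_recall_f1(correct=20, extracted=50, expected=400)
print("balanced:", balanced)
print("skewed:  ", skewed)
```
        </preformat>
        <p>With these invented counts, the skewed system has the higher arithmetic mean of Precision and Recall but the lower F1, because the harmonic mean penalizes imbalance between the two measures.</p>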
        <p>For Test 2, the partial scores have a significant impact on
the outcome. The results for DptOIE improve and, in addition,
the precision of DependentIE increases considerably. Although this type of
evaluation yields better results, one aspect should be considered:
the absence of some terms in the arguments of the facts extracted by the Open
IE systems may indicate invalid facts, and a fairer evaluation cannot
disregard this.</p>
      </sec>
      <sec id="sec-2-6">
        <title>Conclusions</title>
        <p>This paper described the participation of the DependentIE and DptOIE systems
in IberLEF 2019. Both systems were submitted to the "General Open
Relation Extraction" task in Test 1 and Test 2. In particular, the DptOIE
system presented the best results, which becomes more evident when the values
for Test 2 are analyzed.</p>
        <p>In general, the Open IE task exposes the trade-off of approaches that
prioritize greater coverage: while they discover a larger number of
relationships, the precision values obtained by Open IE systems are low. The results
obtained by the DependentIE and DptOIE systems confirm
this problem.</p>
      </sec>
      <sec id="sec-2-7">
        <title>Acknowledgement</title>
        <p>This study was financed in part by the Coordenação de Aperfeiçoamento de
Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <p>[Figure: Test 2 results under the exact measures, showing Precision, Recall, and F-measure for DependentIE and DptOIE. Panels: (a) Exact - Evaluation 1; (b) Exact - Evaluation 2; (c) Exact - Evaluation 3; (d) Exact - Evaluation 4.]</p>
      <p>[Figure: Test 2 results under the partial measures, showing Precision, Recall, and F-measure for DependentIE and DptOIE. Panels: (a) Partial - Evaluation 1; (c) Partial - Evaluation 3; (d) Partial - Evaluation 4.]</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Banko</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cafarella</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soderland</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Broadhead</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Etzioni</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Open information extraction from the web</article-title>
          .
          <source>In: Proceedings of IJCAI</source>
          , vol. 7, pp.
          <fpage>2670</fpage>
          –
          <lpage>2676</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Collovini</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Santos</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Consoli</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Terra</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vieira</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Quaresma</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Souza</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Claro</surname>
            ,
            <given-names>D.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glauber</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xavier</surname>
            ,
            <given-names>C.C.</given-names>
          </string-name>
          :
          <article-title>Portuguese named entity recognition and relation extraction tasks at IberLEF 2019</article-title>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Del Corro</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gemulla</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>ClausIE: clause-based open information extraction</article-title>
          .
          <source>In: Proceedings of WWW</source>
          . pp.
          <fpage>355</fpage>
          –
          <lpage>366</lpage>
          .
          <publisher-name>ACM</publisher-name>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Fader</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soderland</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Etzioni</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Identifying relations for open information extraction</article-title>
          .
          <source>In: Proceedings of EMNLP</source>
          . pp.
          <fpage>1535</fpage>
          –
          <lpage>1545</lpage>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Gamallo</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcia</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Multilingual open information extraction</article-title>
          .
          <source>In: Proceedings of EPIA</source>
          . pp.
          <fpage>711</fpage>
          –
          <lpage>722</lpage>
          . Springer (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Glauber</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Claro</surname>
            ,
            <given-names>D.B.</given-names>
          </string-name>
          :
          <article-title>A systematic mapping study on open information extraction</article-title>
          .
          <source>Expert Systems with Applications</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>de Oliveira</surname>
            ,
            <given-names>L.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Claro</surname>
            ,
            <given-names>D.B.</given-names>
          </string-name>
          :
          <article-title>DPTOIE: Um Método para Extração de Informação Aberta na Língua Portuguesa baseado em Análise de Dependência</article-title>
          .
          <source>Master's thesis</source>
          ,
          <source>Universidade Federal da Bahia</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>de Oliveira</surname>
            ,
            <given-names>L.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glauber</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Claro</surname>
            ,
            <given-names>D.B.</given-names>
          </string-name>
          :
          <article-title>DependentIE: An open information extraction system on Portuguese by a dependence analysis</article-title>
          .
          <source>In: Proceedings of ENIAC</source>
          . pp.
          <fpage>271</fpage>
          –
          <lpage>282</lpage>
          .
          <publisher-name>FC-UFU</publisher-name>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>