<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards a Pragmatic Open Information Extraction for Portuguese Text - ICEIS17, InferPortOIE and PragmaticOIE on IberLEF</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rafael Glauber</string-name>
          <email>rglauber@dcc.ufba.br</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniela Barreiro Claro</string-name>
          <email>dclaro@ufba.br</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cleiton Fernando Lima Sena</string-name>
          <email>cflsena2@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>FORMAS Research Group, LaSiD/DCC/UFBA Federal University of Bahia</institution>
          ,
          <country country="BR">Brazil</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>449</fpage>
      <lpage>456</lpage>
      <abstract>
        <p>This paper describes the participation of the FORMAS research group with the systems ICEIS17, InferPortOIE, and PragmaticOIE in the Iberian Languages Evaluation Forum 2019. Our activities have focused on the "General Open Relation Extraction" task of relation extraction for Portuguese texts. We present our choices on this challenge, as well as the performance of our systems and their results.</p>
      </abstract>
      <kwd-group>
        <kwd>Shared Task</kwd>
        <kwd>Open Information Extraction</kwd>
        <kwd>Relation Extraction</kwd>
        <kwd>Pragmatic Open IE</kwd>
        <kwd>Inference Extraction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Information Extraction (IE) emerged as a research area to identify relevant
patterns in large quantities of textual documents [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The tasks employed by IE
were carried out in specific, homogeneous, and previously established domains.
As a consequence, a first challenge was to scale traditional IE to the Web [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
However, this approach had drawbacks, such as low coverage of relations and the
need for human intervention to handle new relations. Open Information Extraction (Open IE)
emerged to extract information freely from text and to scale to the Web [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
While the quantity and diversity of textual content grow on the Web, the
traditional IE tools have low coverage in this scenario [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In the study conducted by
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], the authors proposed a new approach called Open Information Extraction
(Open IE) that extracts facts from a sentence in the following triple format:
\[ \mathit{triple} = (\mathit{arg1};\ \mathit{rel};\ \mathit{arg2}) \qquad (1) \]
      </p>
      <p>R. Glauber et al.</p>
      <p>
        where arg1 and arg2 are noun phrases in a sentence and rel establishes
a relationship between arg1 and arg2 through a verb phrase. Open IE
systems are useful in web-scale applications such as question answering and document
filtering systems [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The Iberian Languages Evaluation Forum (IberLEF 2019)
organized a Portuguese named entity recognition (NER) and relation extraction
(RE) task, which included an Open IE task [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Participants applied their
systems or methods to the NER or RE subtasks on Portuguese sentences. We
applied three different Open IE systems to the RE problem, specifically to
Task 3: General Open Relation Extraction.
      </p>
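      <p>As a minimal illustration of the triple format (arg1; rel; arg2) defined above, the following Python sketch represents one extraction as a small immutable record. The class name Triple, the example sentence, and its extraction are hypothetical choices made for illustration and are not taken from the evaluated systems.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One Open IE extraction in the form (arg1; rel; arg2)."""
    arg1: str  # noun phrase on the left of the relation
    rel: str   # verb phrase linking the two arguments
    arg2: str  # noun phrase on the right of the relation

    def as_text(self) -> str:
        # Render the triple with the same separators used in the text.
        return f"({self.arg1}; {self.rel}; {self.arg2})"


# Hypothetical sentence: "Salvador é a capital da Bahia."
example = Triple(arg1="Salvador", rel="é a capital da", arg2="Bahia")
print(example.as_text())
```
]]></preformat>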
      <p>We describe our Open IE systems and their results, as well as the choices
made and the problems faced in performing this task. Our systems were implemented based
on machine learning, inference, and handcrafted rules to extract facts from
Portuguese sentences. We participated with three of our systems: ICEIS17,
InferPortOIE, and PragmaticOIE.</p>
      <p>This paper is organized as follows: Section 2 describes the problem statement;
Section 3 presents our methods ICEIS17, InferPortOIE, and PragmaticOIE;
Section 4 describes our setup, and Section 5 presents our evaluation. Section 6
presents the results, and we conclude in Section 7.</p>
    </sec>
    <sec id="sec-2">
      <title>Problem Statement</title>
      <p>The organizers of IberLEF (Iberian Languages Evaluation Forum)
proposed a task that involves the automatic extraction of any relation descriptor
expressing any semantic relation between a pair of entities or concepts mentioned
in Portuguese sentences. In this task, the coordinators consider a relation
descriptor to be a text chunk that describes the explicit semantic relation occurring
between two entities or noun phrases in a sentence.</p>
      <p>The task was divided into two different tests. In the first test, participants
extracted the relation descriptors between NP pairs from data provided by the
coordinators. This data was annotated with NP information and, as a consequence,
participants did not need to employ a NER system. In the second test, the
data provided was not annotated with NP information. The goal was to extract
and classify the NPs from the test sentences, and then extract
the relation descriptors between pairs of NPs. We submitted our methods to
both Test 1 and Test 2 of Task 3.</p>
    </sec>
    <sec id="sec-3">
      <title>Our methods</title>
      <p>We participated in Task 3 with three systems: ICEIS17, InferPortOIE, and PragmaticOIE.</p>
      <sec id="sec-3-1">
        <title>ICEIS17</title>
        <p>
          Our method called ICEIS17 [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] modified the approach described in [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] and refined it
through an inference approach. Within the ICEIS17 method, we are interested in
new facts arising from inference, especially the identification of transitive and
symmetric relations. We divided our method into four steps: Syntactic Constraint,
Inference Classifier, Transitivity Constraint, and Symmetric Constraint [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>InferPortOIE</title>
        <p>
          InferPortOIE [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] takes advantage of the structure of writing, especially asyndetic
coordination sentences. In addition, the methodology of ReVerb [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] was adapted
to the Portuguese language [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. InferPortOIE proposes two new rules that
generalize both inference by transitivity and inference by symmetry, thus increasing the
number of extractions in a sentence. A new specific rule for symmetric
reasoning is proposed based on a list of symmetric verbs reported in [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. We divided
InferPortOIE into six steps: Pre-processing, Syntactic Constraint, Treatment of
Particular Cases, Inference Detection, Transitivity Constraint, and Symmetric
Constraint [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
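        <p>The inference by symmetry and by transitivity mentioned above can be sketched as follows. This is only an illustrative approximation and not the InferPortOIE implementation: the function names, the tiny relation sets SYMMETRIC_RELS and TRANSITIVE_RELS, the exact string matching, and the example triples are all hypothetical. The actual rules are described in [<xref ref-type="bibr" rid="ref8">8</xref>].</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (arg1, rel, arg2)

# Hypothetical relation lists, used only for this illustration.
SYMMETRIC_RELS = {"conversou com"}   # "talked with"
TRANSITIVE_RELS = {"é parte de"}     # "is part of"


def infer_symmetric(facts: Set[Triple]) -> Set[Triple]:
    """For a symmetric rel, (a, rel, b) also yields (b, rel, a)."""
    return {(b, rel, a) for (a, rel, b) in facts if rel in SYMMETRIC_RELS}


def infer_transitive(facts: Set[Triple]) -> Set[Triple]:
    """For a transitive rel, (a, rel, b) and (b, rel, c) yield (a, rel, c).

    Single pass only: this does not compute the full transitive closure.
    """
    inferred = set()
    for (a, rel1, b) in facts:
        for (b2, rel2, c) in facts:
            if rel1 == rel2 and rel1 in TRANSITIVE_RELS and b == b2 and a != c:
                inferred.add((a, rel1, c))
    return inferred


facts = {
    ("Ana", "conversou com", "Rui"),
    ("Pelourinho", "é parte de", "Salvador"),
    ("Salvador", "é parte de", "Bahia"),
}
# Keep only the facts that were not already extracted explicitly.
new_facts = (infer_symmetric(facts) | infer_transitive(facts)) - facts
print(sorted(new_facts))
```
]]></preformat>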
      </sec>
      <sec id="sec-3-3">
        <title>PragmaticOIE</title>
        <p>
          Our PragmaticOIE method achieves a first pragmatic level, which
copes with inferential, contextual, and intentional aspects. The inferential
module in PragmaticOIE is inherited from our previous work [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] and
guarantees a semantic interpretation [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. The new contextual layer of our
PragmaticOIE system enhanced the method proposed by [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] and broadened it through the use
of subordinating conjunctions, adverbs, prepositions, and adversative coordination
sentences. Finally, the new intentional approach incorporated into
PragmaticOIE can extract implicit facts from a sentence through verbs in the conditional
tense.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Setup</title>
      <p>All systems employed to perform Task 3, namely ICEIS17
(http://formas.ufba.br/dclaro/tools.html#sgs_iceis), InferPortOIE
(http://formas.ufba.br/dclaro/tools.html#inferportoie), and PragmaticOIE
(http://formas.ufba.br/dclaro/tools.html#pragmaticoie), are available for download
on the FORMAS website. Our systems generate an output file in comma-separated values
(CSV) format. For Test 1, each system extracted the facts contained in the test
sentences. Then, each NP pair contained in the test file was compared with the
arguments of the facts extracted by all systems. For the comparison between the
arguments of the extracted facts and the NPs of the test file, the following
characters were ignored: " , . ( ) [ ] ? !. Moreover, to avoid minor divergences in
the comparison of strings, we removed a set of stopwords (list available at
https://github.com/stopwords-iso/stopwords-pt/blob/master/stopwords-pt.txt).
The text fragment corresponding to the relationship is chosen as the result for
Test 1 when the pair of arguments in the output file is similar to an NP pair in
the test file.</p>
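      <p>The argument matching used for Test 1 can be sketched as below. This is a hedged illustration rather than the script used in the submission: the helper names, the tiny STOPWORDS subset (the full list is the stopwords-pt file referenced above), the lowercasing step, the reading of the first ignored character as a double quote, and the use of exact equality after normalization as the similarity criterion are all assumptions.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import Optional

# Characters ignored in the comparison (the double quote is an assumption).
IGNORED_CHARS = '",.()[]?!'
# Illustrative subset only; the full stopword list is not reproduced here.
STOPWORDS = {"a", "o", "de", "da", "do", "em"}


def normalize(text: str) -> str:
    """Drop ignored characters and stopwords, then rejoin the tokens."""
    cleaned = text.translate(str.maketrans("", "", IGNORED_CHARS))
    tokens = [t for t in cleaned.lower().split() if t not in STOPWORDS]
    return " ".join(tokens)


def relation_for_pair(np_pair, extractions) -> Optional[str]:
    """Return the rel of the first extraction whose arguments match np_pair.

    Matching is assumed to be exact equality of the normalized strings.
    """
    target = (normalize(np_pair[0]), normalize(np_pair[1]))
    for arg1, rel, arg2 in extractions:
        if (normalize(arg1), normalize(arg2)) == target:
            return rel
    return None


# Hypothetical system output and NP pair from a test file.
extractions = [("A empresa", "comprou", "o banco (BB)")]
print(relation_for_pair(("empresa", "banco BB"), extractions))
```
]]></preformat>
      <p>Under these assumptions, an NP pair with no matching extraction simply yields no relation descriptor for that pair.</p>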
      <p>Test 2 follows the free form suggested by the Open IE task. After running
all systems on the test sentences, the next step was to convert our output
format from CSV to the format required by IberLEF 2019.</p>
    </sec>
    <sec id="sec-5">
      <title>Evaluation</title>
      <p>
        Two scores were considered for the evaluation of Task 3: a completely correct
relations score and a partially correct relations score [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. A Completely Correct
Relation (CCR) occurs when all terms that make up the relation descriptor in
the key are equal to those of the relation descriptor in the system's output. The score
for each completely correct relation is 1, which represents a full hit. A Partially
Correct Relation (PCR) occurs when at least one of the terms in the
relation descriptor of the system's output corresponds to a term in the relation
descriptor of the key.
      </p>
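      <p>The two matching criteria can be expressed compactly as below. This sketch covers only the definitions given above and assumes whitespace tokenization and case-sensitive comparison; the function names are hypothetical, and the official scoring procedure, including how partial matches are weighted, is described in [<xref ref-type="bibr" rid="ref2">2</xref>].</p>
      <preformat preformat-type="code"><![CDATA[
```python
def terms(descriptor: str) -> list:
    """Split a relation descriptor into terms (assumed: whitespace tokens)."""
    return descriptor.split()


def is_completely_correct(key: str, output: str) -> bool:
    """CCR: the terms of the key equal the terms of the output, in order."""
    return terms(key) == terms(output)


def is_partially_correct(key: str, output: str) -> bool:
    """PCR: at least one term of the output also occurs in the key."""
    return not set(terms(key)).isdisjoint(terms(output))


# Hypothetical key descriptor and system output.
key = "é a capital de"
output = "é capital"
print(is_completely_correct(key, output), is_partially_correct(key, output))
```
]]></preformat>
      <p>As written here, a completely correct relation also satisfies the partial criterion, so a scorer would normally test for a complete match first.</p>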
      <sec id="sec-5-1">
        <title>Test 1 Evaluation</title>
        <p>The extractions of the systems were matched against the relationships in the Test 1
golden dataset, and the metrics exact Precision (EP), exact Recall (ER), partial
Precision (PP), and partial Recall (PR) were calculated. The exact and partial
F-measures are identified by EF and PF, respectively.</p>
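        <p>For reference, precision, recall, and F-measure can be computed from match counts as in the sketch below. It uses the standard definitions (precision as the accumulated score over the system's relations, recall as the accumulated score over the key relations, and F-measure as their harmonic mean). Whether the official scorer follows exactly these formulas is an assumption, and the function and parameter names are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def precision_recall_f(score: float, n_system: int, n_key: int):
    """Return (precision, recall, F-measure) from an accumulated match score.

    score    -- sum of per-relation scores (1 per completely correct relation)
    n_system -- number of relations output by the system
    n_key    -- number of relations in the key (golden dataset)
    """
    precision = score / n_system if n_system else 0.0
    recall = score / n_key if n_key else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical counts: 2 exact hits, 50 system relations, 100 key relations.
ep, er, ef = precision_recall_f(2, 50, 100)
print(ep, er, ef)
```
]]></preformat>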
      </sec>
      <sec id="sec-5-2">
        <title>Test 2 Evaluation</title>
        <p>Since Open Relation Extraction recognizes all possible information, and the
sentences adopted in Test 2 are the same as in Test 1, we performed four different evaluations
to provide a full panorama of the performance of our systems:
(1) considering only the relationships in the Test 2 golden dataset;
(2) considering the relationships in the Test 2 golden dataset and disregarding the
relationships in the training dataset;
(3) considering the relationships in the Test 2 golden and Test 1 golden datasets and
disregarding the relationships in the training dataset;
(4) considering the relationships in all three datasets.</p>
        <p>
          All datasets used are available at http://www.inf.pucrs.br/linatural/wordpress/iberlef-2019/.
The details of the performed measures and datasets
are described in [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Results</title>
      <p>We organized the results of Task 3 considering the values obtained in both
Tests 1 and 2. Table 1 exhibits the results achieved by all systems considering
the exact measure in Test 1. The values for all measures are low. It is
noteworthy that ICEIS17 has a slight advantage over the other
systems. Both InferPortOIE and PragmaticOIE obtained null values for
the exact score.</p>
      <sec id="sec-6-1">
        <title>Exact scores - Evaluations 1 and 2</title>
        <p>[Figure: bar charts of the exact scores per system. Bars: ICEIS17-EP, ICEIS17-ER, ICEIS17-EF, INFERPORTOIE-EP, INFERPORTOIE-ER, INFERPORTOIE-EF, PRAGMATICOIE-EP, PRAGMATICOIE-ER, PRAGMATICOIE-EF. Panels: (a) Exact - Evaluation 1; (b) Exact - Evaluation 2.]</p>
      </sec>
      <sec id="sec-6-2">
        <title>Exact scores - Evaluation 3</title>
        <p>[Figure: panel (c) Exact - Evaluation 3, with the same bars as panels (a) and (b).]</p>
      </sec>
      <sec id="sec-6-3">
        <title>Exact scores - Evaluation 4</title>
        <p>[Figure: panel (d) Exact - Evaluation 4, with the same bars as panels (a) and (b).]</p>
        <p>Open Information Extraction" task through Test 1 and Test 2. In particular, the
ICEIS17 system presented the best results (especially when we isolate precision).
This becomes more evident when the values for Test 2 are analyzed.</p>
        <p>The approach used in the evaluated systems demonstrated low performance
on the proposed task. While Open IE systems prioritize the identification of a
high number of facts in sentences, our methods, which rely on shallow analyzers,
have shown little efficiency.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgement</title>
      <p>This study was financed in part by the Coordenação de Aperfeiçoamento de
Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.</p>
      <sec id="sec-7-1">
        <title>Partial scores - Evaluations 1 and 2</title>
        <p>[Figure: bar charts of the partial scores per system. Bars: ICEIS17-PP, ICEIS17-PR, ICEIS17-PF, INFERPORTOIE-PP, INFERPORTOIE-PR, INFERPORTOIE-PF, PRAGMATICOIE-PP, PRAGMATICOIE-PR, PRAGMATICOIE-PF. Panels: (a) Partial - Evaluation 1; (b) Partial - Evaluation 2.]</p>
      </sec>
      <sec id="sec-7-2">
        <title>Partial scores - Evaluation 3</title>
        <p>[Figure: panel (c) Partial - Evaluation 3, with the same bars as panels (a) and (b).]</p>
      </sec>
      <sec id="sec-7-3">
        <title>Partial scores - Evaluation 4</title>
        <p>[Figure: panel (d) Partial - Evaluation 4, with the same bars as panels (a) and (b). Legend: Precision, Recall, F-measure.]</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Banko</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cafarella</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soderland</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Broadhead</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Etzioni</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Open information extraction from the web</article-title>
          .
          <source>In: Proceedings of IJCAI. vol. 7</source>
          , pp.
          <fpage>2670</fpage>
          -
          <lpage>2676</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Collovini</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Santos</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Consoli</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Terra</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vieira</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Quaresma</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Souza</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Claro</surname>
            ,
            <given-names>D.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glauber</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xavier</surname>
            ,
            <given-names>C.C.</given-names>
          </string-name>
          :
          <article-title>Portuguese named entity recognition and relation extraction tasks at IberLEF 2019</article-title>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Fader</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soderland</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Etzioni</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Identifying relations for open information extraction</article-title>
          .
          <source>In: Proceedings of EMNLP</source>
          . pp.
          <fpage>1535</fpage>
          -
          <lpage>1545</lpage>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Glauber</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Claro</surname>
            ,
            <given-names>D.B.</given-names>
          </string-name>
          :
          <article-title>A systematic mapping study on open information extraction</article-title>
          .
          <source>Expert Systems with Applications</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Godoy</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Os verbos recíprocos no PB: interface sintaxe-semântica lexical</article-title>
          .
          <source>Ph.D. thesis</source>
          , Dissertação (Mestrado em Estudos Linguísticos), Faculdade de Letras, UFMG, Belo Horizonte
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Mausam</surname>
          </string-name>
          ,
          <string-name>
            <surname>Schmitz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bart</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soderland</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Etzioni</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Open language learning for information extraction</article-title>
          .
          <source>In: Proceedings of EMNLP-CoNLL</source>
          . pp.
          <fpage>523</fpage>
          -
          <lpage>534</lpage>
          .
          <publisher-name>ACL</publisher-name>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Sena</surname>
            ,
            <given-names>C.F.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Claro</surname>
            ,
            <given-names>D.B.</given-names>
          </string-name>
          :
          <article-title>Extração de relações pragmáticas em documentos do português do Brasil</article-title>
          . Master's thesis
          ,
          <source>Universidade Federal da Bahia</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Sena</surname>
            ,
            <given-names>C.F.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Claro</surname>
            ,
            <given-names>D.B.</given-names>
          </string-name>
          :
          <article-title>InferPortOIE: A Portuguese open information extraction system with inferences</article-title>
          .
          <source>Natural Language Engineering</source>
          <volume>25</volume>
          (
          <issue>2</issue>
          ),
          <fpage>287</fpage>
          -
          <lpage>306</lpage>
          (
          <year>2019</year>
          ). https://doi.org/10.1017/S135132491800044X
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Sena</surname>
            ,
            <given-names>C.F.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glauber</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Claro</surname>
            ,
            <given-names>D.B.</given-names>
          </string-name>
          :
          <article-title>Inference approach to enhance a Portuguese open information extraction</article-title>
          .
          <source>In: Proceedings of the 19th International Conference on Enterprise Information Systems - Volume 1: ICEIS</source>
          . pp.
          <fpage>442</fpage>
          -
          <lpage>451</lpage>
          . INSTICC,
          <publisher-name>SciTePress</publisher-name>
          (
          <year>2017</year>
          ). https://doi.org/10.5220/0006338204420451
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Soderland</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Learning information extraction rules for semi-structured and free text</article-title>
          .
          <source>Machine Learning</source>
          <volume>34</volume>
          (
          <issue>1-3</issue>
          ),
          <fpage>233</fpage>
          -
          <lpage>272</lpage>
          (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>