<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of FACT at IberLEF 2019</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Aiala Rosa</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Irene Castellon</string-name>
          <email>icastellon@ub.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luis Chiruzzo</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hortensia Curell</string-name>
          <email>Hortensia.Curell@uab.cat</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mathias Etcheverry</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ana Fernandez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gloria Vazquez</string-name>
          <email>gvazquez@dal.udl.cat</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dina Wonsever</string-name>
          <email>wonseverg@fing.edu.uy</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universidad Autonoma de Barcelona</institution>
          ,
          <addr-line>España</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universidad de Barcelona</institution>
          ,
          <addr-line>España</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Universidad de la Republica</institution>
          ,
          <country country="UY">Uruguay</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>105</fpage>
      <lpage>110</lpage>
      <permissions>
        <copyright-statement>Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). IberLEF 2019, 24 September 2019, Bilbao, Spain.</copyright-statement>
      </permissions>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>In this paper we describe the FACT shared task (Factuality Annotation and
Classification Task), included in the First Iberian Languages Evaluation Forum
(IberLEF).</p>
      <p>
        Factuality is understood, following [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], as the category that determines the
factual status of events, that is, whether or not events are presented as certain. In
order to analyze event references in texts, it is crucial to determine whether they
are presented as having taken place or as potential or unaccomplished events.
This information can be used for different applications such as Question Answering,
Information Extraction, or Incremental Timeline Construction.
      </p>
      <p>
        Despite its centrality for Natural Language Understanding, this task has
been under-researched, with the work by [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] as a reference for English and [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] for
Spanish. For Italian, a task similar to FACT has been proposed in the past [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
The bottleneck for advancing on this task has usually been the lack of annotated
resources, together with its inherent difficulty. Currently PLN-InCo and GRIAL
both have ongoing research projects on this topic, which are producing and will
produce such annotated resources. This makes the proposal of this task even
more interesting.
      </p>
      <p>
        The main objective of this task is to advance the study of the factuality
of the events mentioned in texts, seeking to contrast different approaches. To
accomplish this task, a corpus annotated with factuality information is available,
allowing experimentation with supervised machine learning techniques.
A number of categories have been proposed to classify different modes of
(non)accomplishment of events. For Spanish factuality, [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] proposes a six-value scheme:
Accomplished, Not Accomplished, Scheduled Future, Denied Future, Possible,
and Undefined. The first four categories represent a high degree of certainty,
but only the Accomplished and Not Accomplished categories represent events that
actually did or did not happen. On the other hand, the Possible and Undefined categories
are used for events whose occurrence is uncertain (Possible for uncertain future
events and Undefined for uncertain past events).
      </p>
      <p>
        Even though this scheme provides a detailed model for factuality, the
categories are too fine-grained and some of them are underrepresented in texts,
making automatic recognition difficult. For this reason, a simplified scheme has
been used for a corpus annotation task, reducing the categories to three values:
Accomplished, Not Accomplished, and Undefined [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. This corpus is made up of
Uruguayan texts and contains 2,080 events (1,392 Accomplished events, 121 Not
Accomplished events, and 567 Undefined events).
      </p>
    </sec>
    <sec id="sec-2">
      <title>Task Description</title>
      <p>In this task facts are not verified with regard to the real world; they are only assessed with
respect to how they are presented by the source (in this case the writer), that is,
the commitment of the source to the truth-value of the event. In this sense, the
task could be conceived as a core procedure for other tasks such as fact-checking
and fake-news detection, making it possible, in future tasks, to compare what
is narrated in the text (fact tagging) to what is happening in the world
(fact-checking and fake-news detection).</p>
      <p>We established three possible categories:
– Facts: current and past situations in the world that are presented as real.
– Counterfacts: current and past situations that the writer presents as not
having happened.
– Possibilities, future situations, predictions, hypotheses and other options:
situations presented as uncertain, since the writer does not commit openly
to the truth-value, either because they have not happened yet or because the
author does not know.</p>
      <p>And their respective tags:
– F: Factual
– CF: Counterfactual
– U: Undefined</p>
      <p>The participating systems had to automatically propose a factuality tag for
each event in the text. Since event identification is not within the scope of this task,
the events are already annotated in the texts. The structure of the tags used in
the annotation is the following:
&lt;event factuality="F"&gt;verb&lt;/event&gt;
For example, in a sentence such as:
El fin de semana &lt;event factuality=""&gt;llegó&lt;/event&gt; a Uruguay el
segundo avión de la aerolínea.
(The second plane of the airline arrived in Uruguay on the weekend.)</p>
      <sec id="sec-2-1">
        <title>The system's output should be:</title>
        <p>El fin de semana &lt;event factuality="F"&gt;llegó&lt;/event&gt; a Uruguay el
segundo avión de la aerolínea.</p>
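        <p>As an illustration of the annotation format, the following minimal Python sketch (not part of the official task material) shows one possible way to read the pre-annotated events and write predicted labels back into the same format; the regular expression and helper names are assumptions made here for clarity.</p>
        <preformat>
# Hypothetical helpers for the &lt;event factuality="..."&gt;verb&lt;/event&gt; format.
import re

EVENT_RE = re.compile(r'&lt;event factuality="([^"]*)"&gt;(.*?)&lt;/event&gt;')

def read_events(text):
    """Return (label, verb) pairs for every annotated event in a text."""
    return EVENT_RE.findall(text)

def write_predictions(text, labels):
    """Fill the factuality attribute of each event with a predicted label."""
    it = iter(labels)
    def repl(match):
        return '&lt;event factuality="{}"&gt;{}&lt;/event&gt;'.format(next(it), match.group(2))
    return EVENT_RE.sub(repl, text)

sentence = 'El fin de semana &lt;event factuality=""&gt;llegó&lt;/event&gt; a Uruguay.'
print(read_events(sentence))               # [('', 'llegó')]
print(write_predictions(sentence, ["F"]))
        </preformat>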
        <p>The performance of this task was measured against the evaluation corpus
using these metrics:
– Precision, Recall and F1 score for each category.
– Macro-F1.
– Global accuracy.</p>
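        <p>A minimal sketch of how these metrics can be computed, assuming scikit-learn is available and that gold and pred are lists with one F / CF / U label per event (this is illustrative code, not the official evaluation script):</p>
        <preformat>
# Illustrative evaluation sketch; not the official FACT scorer.
from sklearn.metrics import accuracy_score, f1_score, precision_recall_fscore_support

def evaluate(gold, pred, labels=("F", "CF", "U")):
    # Per-category precision, recall and F1.
    p, r, f1, _ = precision_recall_fscore_support(
        gold, pred, labels=list(labels), zero_division=0)
    for lab, pi, ri, fi in zip(labels, p, r, f1):
        print(f"{lab}: P={pi:.3f} R={ri:.3f} F1={fi:.3f}")
    # Macro-F1 (main score) and global accuracy.
    print("Macro-F1:", f1_score(gold, pred, labels=list(labels),
                                average="macro", zero_division=0))
    print("Accuracy:", accuracy_score(gold, pred))
        </preformat>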
        <p>The main score for evaluating the submissions is Macro-F1.</p>
      </sec>
      <sec id="sec-2-2">
        <sec id="sec-2-2-1">
          <title>Corpus</title>
          <p>Starting from the Uruguayan corpus with 2,080 events mentioned above, prior
to the start of this shared task, an annotation process was carried out in order
to extend the corpus and to include texts from Spain and more documents from
Uruguay. An annotation guideline was provided in order to explain the meaning
of the tags and the scope of the annotation.</p>
          <p>The resulting corpus contains Spanish texts with more than 5,000 verbal
events classified as F (Fact), CF (Counterfact), or U (Undefined). The corpus was
divided into two subcorpora: the training corpus (80%) and the testing corpus
(20%). The texts belong to the journalistic register and most of them are from
the political sections of Spanish and Uruguayan newspapers. An excerpt of
the corpus is shown below:</p>
          <p>Y otras generaciones que &lt;event factuality="F"&gt;han&lt;/event&gt;
&lt;event factuality="F"&gt;vivido&lt;/event&gt; más relajadamente no
&lt;event factuality="CF"&gt;están&lt;/event&gt; &lt;event factuality="CF"&gt;
viendo&lt;/event&gt; la importancia de &lt;event factuality="U"&gt;luchar
&lt;/event&gt; para &lt;event factuality="U"&gt;mantener &lt;/event&gt; esa
libertad de expresión...</p>
          <p>(And other generations that have lived in a more relaxed way are not seeing the
importance of fighting to maintain that freedom of expression.)</p>
          <p>The distribution of categories and the sizes of the train and test corpora are shown in
Table 1.</p>
          <p>As can be seen, the categories are highly unbalanced in the corpus, which can
make the recognition of the least represented class (counterfactual events) difficult.</p>
          <p>[Table 1: number of events per category (F, CF, U) in the train and test corpora; columns Category, Train, Test, Total.]</p>
          <p>There were five participating teams; one of them (garain) did not send us any
description, so it is not included in this section. The systems presented by the
remaining four teams are described below:</p>
          <p>
            1) Amrita CEN (Premjith, Soman and Prabaharan [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ]) proposes a system
based on word embeddings using a Random Forest classifier. Taking into account
the differences in the number of appearances of the different factuality labels in the
corpus, the implementation assigns a higher weight to the minority label (CF)
and a lower one to the more frequent labels in order to improve the prediction
of the less frequent categories.
          </p>
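          <p>A hedged sketch of the approach described above (a class-weighted Random Forest over averaged word embeddings); the embedding lookup, the context window and the concrete weight values are assumptions made here for illustration:</p>
          <preformat>
# Illustrative sketch in the spirit of Amrita CEN; the weights are not the team's.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def context_vector(tokens, embeddings, dim=300):
    """Average the word embeddings of an event's context window."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

# Heavier weight on the minority label CF, lighter weights on frequent labels.
clf = RandomForestClassifier(
    n_estimators=200,
    class_weight={"F": 1.0, "U": 2.0, "CF": 10.0},  # illustrative weights only
    random_state=0,
)
# X_train: one averaged-embedding vector per event; y_train: F / CF / U labels.
# clf.fit(X_train, y_train); predictions = clf.predict(X_test)
          </preformat>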
          <p>
            2) jimblair (Mao and Zhong [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ]) proposes the use of BERT [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ], a
multilayer bidirectional transformer encoder. For this task, a BERT-Base multilingual
cased model was chosen. For the training process, the corpus was divided into two
parts, Uruguayan texts and Spanish texts. Two models were trained independently,
each predicting the categories for its own subcorpus. In the
last step, the outputs for the Spanish and Uruguayan subcorpora are combined
in order to create the final annotation.
          </p>
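          <p>A minimal sketch of a BERT-based classifier along the lines described for jimblair, using the multilingual cased model; framing each event's sentence as a sequence-classification input is an assumption made here, and the actual system may encode events differently:</p>
          <preformat>
# Illustrative sketch only; fine-tuning on the FACT training corpus is still required.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-multilingual-cased"
LABELS = ["F", "CF", "U"]

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=3)

inputs = tokenizer("El fin de semana llegó a Uruguay el segundo avión de la aerolínea.",
                   return_tensors="pt")
logits = model(**inputs).logits
print(LABELS[int(logits.argmax(dim=-1))])
          </preformat>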
          <p>
            3) Aspie96 (Giudice [
            <xref ref-type="bibr" rid="ref2">2</xref>
            ]) is a character-level convolutional recurrent
neural network which makes no use of pre-trained features (such as word
embeddings), nor of additional knowledge or intuition about the task, but takes
advantage of tokenization to classify individual words within the text. Each
word is represented as a fixed-size list of vectors, each of which represents an
individual character, with left and right context characters added that may not
belong to the word. An event flag is added to indicate whether the word is
an event or not. In a final step a dense layer is applied to get, for each word, its
classification into one of the three classes.
          </p>
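          <p>The character-window representation can be pictured with the following sketch; the alphabet, window size and padding are assumptions, and the actual Aspie96 architecture (convolutional recurrent layers plus a dense output layer) is not reproduced here:</p>
          <preformat>
# Illustrative encoding of one word as a fixed-size list of character vectors
# with an event flag, loosely following the description above.
import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyzáéíóúüñ .,;:"
WINDOW = 15  # characters kept per word, including left/right context

def encode_word(sentence, start, end, is_event):
    window = sentence[max(0, start - 3):end + 3].lower()[:WINDOW].ljust(WINDOW)
    vecs = np.zeros((WINDOW, len(ALPHABET) + 1))
    for i, ch in enumerate(window):
        if ch in ALPHABET:
            vecs[i, ALPHABET.index(ch)] = 1.0
        vecs[i, -1] = float(is_event)  # event flag appended to each character vector
    return vecs
          </preformat>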
          <p>4) macro128 (Pastorini) uses SentencePiece
(https://github.com/google/sentencepiece), a language-independent,
character-based tokenizer, in a pre-processing phase. It is used on the training corpus
to generate tokens related to the task and then to tokenize the validation /
testing corpus. The pre-trained BERT language model is used, with one end layer
that classifies each token among the three possible categories (F, CF or U). As
not all words were initially classified, each token is randomly assigned a category.
The size of the input layer (responsible for generating embeddings) is reduced to
make it compatible with the number of tokens generated in the pre-processing
stage. The model was first trained without the classification layer, until
convergence, for a maximum of 100 epochs (using early stopping). The entire model
was then trained to convergence for a maximum of 100 epochs, using the F1 measure
for early stopping.</p>
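          <p>The SentencePiece pre-processing step can be sketched as follows; the file names and vocabulary size are assumptions made here for illustration:</p>
          <preformat>
# Train a language-independent subword model on the training corpus and use it
# to tokenize new text (illustrative; not the macro128 configuration).
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="fact_train.txt", model_prefix="fact_sp", vocab_size=8000)

sp = spm.SentencePieceProcessor(model_file="fact_sp.model")
print(sp.encode("El fin de semana llegó a Uruguay.", out_type=str))
          </preformat>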
        </sec>
        <sec id="sec-2-2-2">
          <title>Global Results</title>
          <p>
            The best results were obtained by Amrita CEN, whose approach is based on
Random Forest and word embeddings. The remaining approaches are based on
different Deep Learning models: jimblair and macro128 apply variants of the BERT
model, and Aspie96 uses a recurrent CNN. Previous work reached higher Macro-F1
(80%) and Accuracy (87%) with a sequential version of the SVM model, trained
on the previous version of the corpus [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ]. These previous results are not entirely
representative given that the corpus used for evaluation was significantly smaller
than the one used in FACT; in particular, the CF class had only 16 occurrences.
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Conclusions</title>
      <p>We presented the first edition of the FACT shared task, which was an
important opportunity to work on the extension and revision of an existing factuality
corpus, and to perform experiments on factuality recognition.</p>
      <p>Three models based on Deep Learning were presented, but the best results
were reached by a system based on Random Forest and word embeddings.</p>
      <p>Some research directions we would like to pursue in the future include using
the more complex six-value annotation scheme, and adding another subtask
for recognizing events (verb and noun events) together with their factuality value,
instead of having the events pre-annotated in the corpus.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers), pp.
          <fpage>4171</fpage>
          –
          <lpage>4186</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Giudice</surname>
          </string-name>
          , V.:
          <article-title>Aspie96 at FACT</article-title>
          (IberLEF
          <year>2019</year>
          )
          <article-title>: Factuality Classification in Spanish Texts with Character-Level Convolutional RNN and Tokenization</article-title>
          .
          <source>In: Proceedings of the Iberian Languages Evaluation Forum (IberLEF</source>
          <year>2019</year>
          ).
          <source>CEUR Workshop Proceedings</source>
          , CEUR-WS, Bilbao, Spain (Sep
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Mao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Factuality Classification using the Pre-trained Language Representation Model BERT</article-title>
          .
          <source>In: Proceedings of the Iberian Languages Evaluation Forum (IberLEF</source>
          <year>2019</year>
          ).
          <source>CEUR Workshop Proceedings</source>
          , CEUR-WS, Bilbao, Spain (Sep
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Minard</surname>
            ,
            <given-names>A.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Speranza</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Caselli</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>The EVALITA 2016 Event Factuality Annotation Task (FactA)</article-title>
          .
          <source>In: Proceedings CLiC-it 2016 and EVALITA</source>
          <year>2016</year>
          . CEUR Workshop Proceedings, CEUR-WS, Napoli, Italy (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Premjith</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soman</surname>
            ,
            <given-names>K.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Poornachandran</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Amrita CEN@FACT: Factuality Identification in Spanish Text</article-title>
          .
          <source>In: Proceedings of the Iberian Languages Evaluation Forum (IberLEF</source>
          <year>2019</year>
          ).
          <source>CEUR Workshop Proceedings</source>
          , CEUR-WS, Bilbao, Spain (Sep
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Saurí</surname>
          </string-name>
          , R.:
          <article-title>A Factuality Profiler for Eventualities in Text</article-title>
          . Brandeis University (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Saurí</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pustejovsky</surname>
          </string-name>
          , J.:
          <article-title>FactBank: a corpus annotated with event factuality</article-title>
          .
          <source>Language resources and evaluation 43(3)</source>
          ,
          <volume>227</volume>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Wonsever</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Malcuori</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosa Furman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Factividad de los eventos referidos en textos</article-title>
          .
          <source>Reportes Tecnicos 09-12</source>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Wonsever</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosa</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Malcuori</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Factuality annotation and learning in Spanish texts</article-title>
          . In: LREC (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>