<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.3115/V1/P15-1017</article-id>
      <title-group>
        <article-title>Leveraging Pretrained and Large Language Models for Inference</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gustavo Miguel Flores</string-name>
          <email>gustavo.flores@eurecom.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Youssra Rebboud</string-name>
          <email>youssra.rebboud@eurecom.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pasquale Lisena</string-name>
          <email>pasquale.lisena@eurecom.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Raphaël Troncy</string-name>
          <email>raphael.troncy@eurecom.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Event Relation Extraction, Event Knowledge Graphs, Causal Event Relations, Web Platform,</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>EURECOM</institution>
          ,
          <addr-line>Sophia Antipolis, Biot</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>1</volume>
      <fpage>26</fpage>
      <lpage>31</lpage>
      <abstract>
        <p>Event relation extraction is crucial for understanding the temporal sequence and interconnections between events. To demonstrate this, we developed a Streamlit-based application that showcases our event relation extraction system, capable of identifying semantically accurate relations such as Direct-cause, Enable, Intend, and Prevent. The system features an API that simplifies inference and displays results in a user-friendly manner. Users can input text, such as a sentence, and the application highlights the extracted events and their corresponding relationships. The backend runs a series of pre-trained language models trained on datasets focused on events and their semantic relations. The app allows users to switch between various models, including Hugging Face's RoBERTa, REBEL, and large language models such as Zephyr. The demo is available at https://demo.kflow.eurecom.fr/.</p>
      </abstract>
      <kwd-group>
        <kwd>Event Relation Extraction</kwd>
        <kwd>Event Knowledge Graphs</kwd>
        <kwd>Causal Event Relations</kwd>
        <kwd>Web Platform</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Understanding the flow of events and their interconnections is crucial for tasks such as narrative
comprehension, historical analysis, and machine learning applications. The way information is
represented significantly impacts the contextual knowledge models can access, often encoded
through relational triplets. While knowledge about entities is important, understanding the context
surrounding those entities, especially events, is equally vital. Events are instances that occur in time
and space, inherently existing within a web of causal relationships that can provide critical insights
into their nature and consequences [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Given the significance of event knowledge, researchers have developed various methods to
represent events and their relationships. Event Relation Extraction (ERE) is the task of identifying
and predicting relationships between events in text, enabling a deeper understanding of their
progression and impact [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>However, with the proliferation of different machine learning
models, architectures, and tuning parameters, evaluating their performance on this task remains
challenging.</p>
      <p>This research aims to address this challenge by developing a comprehensive Event Relation
Extraction pipeline. The pipeline allows users to experiment with various models and datasets,
enabling them to input sentences, extract events, and visualize the relationships between them.
For everyday users, this pipeline enhances the understanding of event flows in textual data. For
researchers, it offers a qualitative evaluation tool that allows them to analyze and compare model
performance on event relation extraction tasks, identifying potential strengths and weaknesses. To
make this accessible, we developed a user-friendly Streamlit web application (https://demo.kflow.eurecom.fr/)
that visually presents the extracted events and their relations. The application supports
multiple pre-trained models and datasets, providing an interactive platform for generating and
analyzing inferences.</p>
      <p>
        The structure of this demo paper is as follows: in Section 2 we cover the related work in
event and relation extraction; Section 3 explains the pipeline architecture, detailing the tasks
performed, how inferences are generated, and how users can interact with the system; finally,
Section 4 highlights observations, potential improvements, and future development directions.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>
        The study of event relations has historically focused on temporal relationships, where researchers
aimed to represent the temporal order of events [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Subsequently, attention shifted towards
causal relations between events, which sought to understand the influence of one event on
another. In these causal relationships, the cause is typically regarded as the subject, while the effect
is viewed as the object. In our work, we aim to move beyond basic causality and focus on extracting
fine-grained causal relationships between events. These nuanced event relations, initially
introduced by [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], were accompanied by the creation of the first dataset specifically designed to
capture such detailed event relations.
      </p>
      <p>Event Relation Extraction (ERE) is generally divided into two main subtasks: (1) identifying the
type of relation, and (2) extracting the corresponding spans of the subject and object from the
sentence. Early work in this domain was carried out by the Linguistic Data Consortium (LDC)
through the Automatic Content Extraction (ACE) program [4], which focused on texts from various
domains such as newswire, broadcast news, conversational speech, weblogs, Usenet, and
telephone conversations. The primary objective of ACE was to develop information extraction
techniques that could facilitate the automatic processing of human language in textual form.</p>
      <p>
        In recent years, neural models have gained prominence in event extraction tasks. With
advancements in deep learning, researchers have explored the use of Convolutional Neural
Networks (CNNs) [5], Recurrent Neural Networks (RNNs) [6], and, more recently,
transformer-based models [7]. Pretrained language models (PLMs) have become a focal point in event
extraction studies due to their ability to learn general-purpose representations from raw text,
which aids in extracting relevant event relations [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. BERT, in particular, has demonstrated strong
performance in this area, as highlighted in a study [7] showing that BERT could achieve
state-of-the-art results without the need for task-specific architectures or external resources [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Large
Language Models (LLMs) have also demonstrated strong performance in relation extraction tasks. In
the work of [8], the Flan-T5 model [9] significantly outperformed previous baselines on the
CoNLL04 dataset [10], underscoring the potential of LLMs for event relation extraction.
      </p>
      <p>
        For precise event relations, such as Direct-cause, Enable, Intend, and Prevent, [11] proposed an
approach to augment the dataset introduced by [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] using GPT. They then employed BERT [12]
to perform event relation extraction tasks. While their method achieved good performance on the
relation classification subtask, it showed limitations in the quality of event extraction.
      </p>
      <p>
        In this work, we aim to provide an API based on an event relation extraction pipeline that
leverages various pre-trained language models (PLMs) and large language models (LLMs) instead of
relying solely on BERT [12]. Although detailed performance results cannot be shared here, as they
are under review in another study, we offer insights into the models' performance and provide access
to the code and a link to the API.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Platform and API for Event Relation Extraction</title>
      <sec id="sec-3-1">
        <title>3.1. Event Relation Extraction from Text</title>
        <p>
          In our pipeline, the goal is to perform event relation extraction from textual data, focusing on
four semantically precise event relations: Direct-Cause, Enable, Intend, and Prevent [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. These
relations are categorized under the broader supertype of Cause.
        </p>
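        <p>The following minimal sketch represents this small taxonomy as a Python data structure. It is purely illustrative; the pipeline itself only needs the relation labels as strings.</p>
        <preformat>
# Illustrative sketch of the relation taxonomy (not part of the pipeline code).
from enum import Enum

class CausalRelation(Enum):
    DIRECT_CAUSE = "Direct-Cause"
    ENABLE = "Enable"
    INTEND = "Intend"
    PREVENT = "Prevent"

SUPERTYPE = "Cause"  # all four fine-grained relations specialise the broader Cause relation
        </preformat>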
        <p>[Figure 1: Overview of the pipeline. The input text is passed to the RD module (output: 0/1; model: RoBERTa); the filtered sentences go to the RC module (output: relation type; models: RoBERTa, LangChain LLMs, REBEL) and then to the EE module (output: (subject, object); models: RoBERTa, LangChain LLMs, REBEL), all served through the Streamlit application framework and its UI, backed by REBEL, LangChain, and Hugging Face (RoBERTa).]</p>
        <p>The pipeline performs three tasks: Relation Detection (RD), Relation Classification (RC), and Event
Extraction (EE). Dividing the task into three subtasks could enable testing a broader combination of
models for each task, allowing evaluation of strengths and weaknesses for each subtask independently.
In the RD phase, the model filters out sentences that do not contain a causal event relation; this task is not
optional. The sentences containing a causal relation pass to the RC module, which determines which of the
four relation types is expressed in the sentence. Finally, once the relation type is decided, the EE module extracts the
subject and the object of the event relation in the given sentence. Figure 1 illustrates the pipeline modules.</p>
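        <p>As an illustration, the following sketch shows how the three subtasks could be chained on a single sentence. The function names (detect_relation, classify_relation, extract_events, run_pipeline) are hypothetical placeholders and do not reflect the actual pipeline code.</p>
        <preformat>
# Minimal sketch of the RD -> RC -> EE chaining described above.
def run_pipeline(sentence: str, rd_model, rc_model, ee_model):
    """Run relation detection, classification and event extraction on one sentence."""
    # RD: keep only sentences that contain a causal event relation (output 0/1)
    if not rd_model.detect_relation(sentence):
        return None
    # RC: decide which of the four relations the sentence expresses
    relation_type = rc_model.classify_relation(sentence)   # e.g. "Direct-Cause"
    # EE: extract the subject and object spans of that relation
    subject, obj = ee_model.extract_events(sentence, relation_type)
    return {"relation": relation_type, "subject": subject, "object": obj}
        </preformat>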
        <p>The most integral component of the pipeline is the set of ERE models. The pipeline runs only one model for
each task (RC, RD, and EE), chosen from the available options. At present, the models included are:
• the BERT family of models by Hugging Face, for RC, RD, and EE [13];
• REBEL, for RD and EE [14];
• the large language models (LLMs) available through the LangChain library (https://Python.langchain.com/), for RD and EE.</p>
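        <p>For the Hugging Face models, the RC step amounts to standard sequence classification. The sketch below uses the public transformers API; the checkpoint path and label order are assumptions for illustration and are not published artifacts of this work.</p>
        <preformat>
# Sketch: using a Hugging Face sequence-classification model for the RC step.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

LABELS = ["Direct-Cause", "Enable", "Intend", "Prevent"]  # assumed label order
MODEL_DIR = "path/to/finetuned-roberta-rc"                # placeholder checkpoint path

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_DIR)

def classify_relation(sentence: str) -> str:
    """Return the predicted relation type for a sentence already flagged as causal."""
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]
        </preformat>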
        <p>The available LLMs are Zephyr [15], DPO [16], UNA [17], SOLAR [18], and GPT4 [19].
Both the BERT family models and REBEL were trained using a combination of two datasets, the
Event Relations Dataset from [11] and the CausalNewsCorpus [20], which together comprise 5613
example sentences annotated with the four relations Direct-Cause, Enable, Intend, and Prevent,
together with the subject and object of each relation. For the LLMs, the same prompt template
(https://github.com/ANR-kFLOW/Relation_extraction/blob/main/LLMs_as_Relation_Classifiors_and_Event_Extractors/prompt_template.yml)
was used for every model. The chosen LLMs ranked among the top
performers on the Hugging Face Open LLM Leaderboard at the time of writing, excelling across
various benchmarks, including the Massive Multitask Language Understanding benchmark (MMLU) [21].</p>
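        <p>The sketch below illustrates how such a prompt template could be filled before being sent to an LLM. The YAML key and template fields are assumptions for illustration; the actual template is the prompt_template.yml file linked above.</p>
        <preformat>
# Sketch: filling a prompt template for the LLM-based steps.
import yaml

# The key "relation_classification" and the fields below are assumed for illustration.
with open("prompt_template.yml") as f:
    template = yaml.safe_load(f)["relation_classification"]

prompt = template.format(
    sentence="The drought caused widespread crop failure.",
    relations="Direct-Cause, Enable, Intend, Prevent",
)
# `prompt` is then sent to the selected LLM (e.g. Zephyr or GPT4) through LangChain.
        </preformat>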
        <p>The RoBERTa model performed well in the relation detection task, achieving an average
F1-score of 0.86. In contrast, the REBEL model excelled in both the relation classification and
event extraction tasks, with F1-scores of 0.975 and 0.829, respectively, showcasing its overall
effectiveness. The detailed performance results of these models are currently under review for
another conference and cannot be disclosed at this time. However, the code and data for
this work are available at https://github.com/ANR-kFLOW/Relation_extraction/tree/main. Figure
3 shows an example of an accurate and an inaccurate prediction produced using RoBERTa as a
filter (RD) and REBEL for both RC and EE.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Relation Extraction Pipeline</title>
        <p>The pipeline is written in Python, and the specifications for the inferences can be passed through the
command line, a configuration file, or the user interface developed for the pipeline.</p>
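        <p>As a sketch of the command-line entry point, the arguments could be declared as follows; the flag names and defaults are illustrative and do not correspond to the pipeline's actual interface.</p>
        <preformat>
# Sketch: passing inference specifications on the command line (illustrative flags only).
import argparse

parser = argparse.ArgumentParser(description="Event relation extraction inference")
parser.add_argument("--rd-model", default="roberta-rd", help="model used for relation detection")
parser.add_argument("--rc-model", default="rebel-rc", help="model used for relation classification")
parser.add_argument("--ee-model", default="rebel-ee", help="model used for event extraction")
parser.add_argument("--config", help="optional YAML configuration file overriding the flags")
args = parser.parse_args()
        </preformat>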
        <p>After training, the pretrained models can be made available to the pipeline by saving them
in a common folder, making them available for the inference stage.</p>
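        <p>A minimal sketch of how such a shared folder could be scanned at inference time is shown below; the folder layout (one sub-folder per task, each holding checkpoints) is an assumption for illustration.</p>
        <preformat>
# Sketch: discovering trained checkpoints saved in a shared models folder.
from pathlib import Path

MODELS_DIR = Path("models")  # assumed layout: one sub-folder per task (rd, rc, ee)

def available_checkpoints(task: str) -> list[str]:
    """List the saved checkpoints for a task such as 'rd', 'rc' or 'ee'."""
    return sorted(p.name for p in (MODELS_DIR / task).iterdir() if p.is_dir())
        </preformat>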
        <p>The information that the pipeline can receive from users is: the path to a pretrained model for a given
task, the choice to skip performing one of the tasks, and the user's OpenAI key (if GPT4 is used for
any of the tasks). The pipeline has a default configuration that is run if the user does not provide
any instructions. If the user's instructions do not cover all arguments, the pipeline
will fill in the missing arguments with the default values.</p>
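        <p>A minimal sketch of this default-filling behaviour follows; the configuration keys are assumed for illustration and are not the pipeline's actual schema.</p>
        <preformat>
# Sketch: completing a partial user configuration with default values.
DEFAULTS = {
    "rd_model": "roberta-rd",
    "rc_model": "rebel-rc",
    "ee_model": "rebel-ee",
    "skip_tasks": [],
    "openai_key": None,   # only needed when GPT4 is selected
}

def resolve_config(user_config: dict) -> dict:
    """Return the user configuration with any missing argument taken from the defaults."""
    return {**DEFAULTS, **{k: v for k, v in user_config.items() if v is not None}}
        </preformat>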
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Streamlit Platform Architecture</title>
        <p>This application provides users with a curated demonstration of the capabilities of the
models. It is developed using Streamlit, which acts as both the web application and
the web API, as shown in Figure 2. The Streamlit application receives input from the user and passes it
along to the Python pipeline via a configuration file. The user can write their own text or use an input preset. After the
user chooses the model to use for each task, the inference running in the back end is
produced.</p>
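        <p>The sketch below illustrates this interaction flow with the public Streamlit API. The model names and the run_pipeline() helper are placeholders, not the demo's exact code.</p>
        <preformat>
# Sketch: a minimal Streamlit front end for the pipeline (illustrative only).
import streamlit as st

def run_pipeline(text: str, rc_model: str, ee_model: str) -> dict:
    # Placeholder for the call into the Python pipeline via its configuration file.
    return {"relation": "Direct-Cause", "subject": "The heavy rain", "object": "the river to flood"}

text = st.text_area("Input text", "The heavy rain caused the river to flood.")
rc_model = st.selectbox("Relation classification model", ["RoBERTa", "REBEL", "Zephyr"])
ee_model = st.selectbox("Event extraction model", ["RoBERTa", "REBEL", "Zephyr"])

if st.button("Run inference"):
    result = run_pipeline(text, rc_model=rc_model, ee_model=ee_model)
    st.write(result)
        </preformat>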
        <p>The output returned to the user includes the original text used to produce the inference,
with the subject and object of the extracted event relation highlighted. Next to the highlights are labels
indicating which part of the span it is (subject or object), together with the event relation type.</p>
        <p>There are two different versions of how the highlighting is formatted: one for spans that do not
overlap, and one for spans that overlap. For spans that do not overlap, the
highlights are color-coded according to the classification of the event relation, and there are labels at the end
of each highlight. Where spans can overlap one another, the spans are
encased in color-coded brackets. The color of the bracket indicates which part of the span
it contains (subject or object). The labels are placed at the closing bracket to avoid cluttering
the sentence. The classification of the relation can be cause, intend, prevent, enable, or
other, where other refers to the case when the model producing the classification gives a nonstandard response.
Some models, such as RoBERTa [13], identify multiple event relations in a given sentence. In that case,
a sentence with multiple event relations is displayed once per detected event
relation, so that only one span is displayed at a time, for visual clarity.
Figure 4 shows a screenshot of the Streamlit demo.</p>
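        <p>A rough sketch of the bracket-based formatting used when spans may overlap is given below; the bracket markers and label placement are illustrative, not the demo's exact rendering.</p>
        <preformat>
# Sketch: bracket-encasing subject and object spans with labels (illustrative only).
def bracket_spans(sentence: str, subject: str, obj: str, relation: str) -> str:
    annotated = sentence.replace(subject, f"[{subject}](subject)")
    annotated = annotated.replace(obj, f"[{obj}](object, {relation})")
    return annotated

print(bracket_spans("The drought caused crop failure.", "The drought", "crop failure", "Direct-Cause"))
# -> [The drought](subject) caused [crop failure](object, Direct-Cause).
        </preformat>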
      <sec id="sec-3-1">
        <title>4. Conclusion and Future Work</title>
      <p>In this work we have constructed an API for event relation extraction based on a set of
pretrained language models (BERT, RoBERTa, and REBEL) together with a few LLMs such as GPT4
and Zephyr. The API was created to help streamline the process of performing inferences on
textual input from a given user, and to aid the process of comparing ERE models to one another.
The API is accessible at https://demo.kflow.eurecom.fr/.</p>
      <p>In the future, the platform will allow a user to compare multiple models in an A/B testing
fashion: the user compares the inferences generated by the models side
by side and records their evaluation of how one model compares to another. First, an
automatic test will apply widely adopted metrics, e.g. precision, recall and F1-score, on a
predefined ground truth to evaluate the performance of the models. These comparisons and
evaluations will be saved so that users can later use these metrics to determine the
best performing models. The best three models for a given task will then be selected for human evaluation
through a UI. A further addition to the pipeline could be functionality for training
the models through the pipeline itself.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Acknowledgements</title>
      <p>This work has been partially supported by the French National Research Agency (ANR) within the
kFLOW project (Grant no. ANR-21-CE23-0028).</p>
      </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Rebboud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lisena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Troncy</surname>
          </string-name>
          , Beyond Causality:
          <article-title>Representing Event Relations in Knowledge Graphs</article-title>
          , in: Knowledge Engineering and Knowledge Management: 23rd International Conference, EKAW 2022, Bolzano, Italy, September 26-29, 2022, Proceedings, Springer-Verlag, Berlin, Heidelberg,
          <year>2022</year>
          , pp.
          <fpage>121</fpage>
          -
          <lpage>135</lpage>
          . doi:10.1007/978-3-031-17105-5_9.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Cheng</surname>
          </string-name>
          , ProtoEM:
          <article-title>A prototype-enhanced matching framework for event relation extraction</article-title>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2309.12892. arXiv:2309.12892.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zuo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <article-title>Extracting events and their relations from texts: A survey on recent research progress and challenges</article-title>
          ,
          <source>AI Open</source>
          1 (
          <year>2020</year>
          )
          <fpage>22</fpage>
          -
          <lpage>39</lpage>
          . URL: https://www.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>