Leveraging LLMs for Event Extraction in Italian Documents: a Roadmap for Future Research

Federica Rollo*, Giovanni Bonisoli and Laura Po

"Enzo Ferrari" Engineering Department, University of Modena and Reggio Emilia, MO 41121 Italy

Abstract
Event extraction is a task of significant interest in the field of Natural Language Processing (NLP) and plays a vital role in various applications, such as information retrieval and document summarization. Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language understanding and generation. In this paper, we present a roadmap for the application of LLMs to event extraction from Italian documents, aiming to address the gap in research and resources for event extraction in non-English languages. We first discuss the challenges of event extraction and the current state-of-the-art approaches based on LLMs. Next, we present potential Italian datasets suitable for adapting linguistic models to the domain of event extraction. Furthermore, we outline future research directions and potential areas for improvement in this evolving field.

Keywords
event extraction, Large Language Model, Italian language

1. Introduction

The recent development of Large Language Models (LLMs) holds significant promise for advancing several natural language-based tasks, including event extraction from lengthy text. LLMs such as GPT models [1] have demonstrated remarkable capabilities in understanding and generating natural language text. The application of LLMs to event extraction offers several advantages. Firstly, these models can process vast amounts of text data, enabling comprehensive analysis of events described in natural language. Secondly, LLMs can capture complex linguistic structures and contextual nuances typical of different kinds of documents, enhancing the accuracy of extracted event details. The continuous learning ability of LLMs allows them to adapt to different writing styles and language conventions.

However, challenges persist in leveraging LLMs for event extraction in languages other than English, particularly in languages with limited available resources such as Italian. Fine-tuning requires curated datasets that accurately represent the diversity of language and scenarios, and the annotation of different event-related data.

Despite these challenges, the potential of LLMs to revolutionize event extraction is substantial. For instance, Question Answering (QA) models can facilitate rapid and efficient access to relevant information by automatically identifying text spans containing the desired answers to specific questions, while other models can be provided with detailed instructions to extract specific data from the text. Integrating these models into NLP pipelines can streamline the process of real-time event analysis, allowing for timely and efficient extraction of event-related information from textual data.

This paper explores the role of LLMs in advancing event extraction from lengthy text. In particular, we focus on the Italian language and explore the resources available for adapting and evaluating LLMs for event extraction on Italian documents. Finally, we define possible future directions for research in this dynamic field.

2. Event Extraction

2.1. Task formulation

Event extraction aims at identifying and categorizing events described within a text, including the recognition of the entities involved in the event (such as individuals, organizations, or locations), and the extraction of temporal references and any other elements that are relevant to the event. This task has gained significant popularity in recent years due to its broad applicability and practical utility in various real-world scenarios.
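Concretely, the output of this task can be thought of as a structured event record grouping the event type, the trigger, and the role fillers extracted from the text. The sketch below is our own illustration of one possible representation (the field names and the air-crash example are hypothetical, not an annotation schema from the literature):

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    """A structured record for one extracted event (illustrative schema)."""
    event_type: str                             # e.g. "air_crash", "theft"
    trigger: str                                # word/phrase signaling the event
    roles: dict = field(default_factory=dict)   # role name -> list of text spans

# A hypothetical record for a news report about an air crash:
event = Event(
    event_type="air_crash",
    trigger="crashed",
    roles={
        "date": ["May 14th"],
        "location": ["near the airport"],
        "participant": ["Flight 345"],
    },
)
```

Lists are used as role values because, as discussed later, a single role (e.g., casualties and losses) may be filled by several non-contiguous spans of the same document.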
Figure 1 shows an example of the results of event extraction from a document describing an air crash. In addition to the identification of the event type, different event roles have been annotated, e.g., the date of the event occurrence and the aircraft agency.

Ital-IA 2024: 4th National Conference on Artificial Intelligence, organized by CINI, May 29-30, 2024, Naples, Italy
* Corresponding author.
federica.rollo@unimore.it (F. Rollo); giovanni.bonisoli@unimore.it (G. Bonisoli); laura.po@unimore.it (L. Po)
0000-0002-3834-3629 (F. Rollo); 0000-0001-8538-8347 (G. Bonisoli); 0000-0002-3345-176X (L. Po)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

Figure 1: Example of event extraction.

2.2. Challenges

Due to the complexity of natural language, event extraction poses several challenges that require sophisticated techniques to address effectively.

The first challenge consists of detecting multiple events described in the same document and understanding which references belong to each event. Natural language often contains ambiguous expressions that can refer to multiple events or entities. This ambiguity, along with the use of coreference, further complicates the task of accurately extracting event data from text, since resolving ambiguity requires contextual understanding and disambiguation techniques.

Identifying the relevant elements of each event requires distinguishing event triggers (words or phrases that indicate the occurrence of an event) from background information and noise. Another complexity is given by the variability in language usage, writing styles, syntactic structures, and document length. Indeed, event extraction can be performed on short texts like tweets, longer documents such as news articles, and lengthy documents such as investigative reports or government documents. All these factors require the use of techniques able to accommodate these variations to achieve accurate and reliable results across diverse text types and genres.

Two of the key aspects of events are time and space, i.e., when the event took place and where. The recognition and standardization of temporal and spatial expressions can be complex, since temporal references can be expressed in various formats (such as dates, times, or parts of the day). In addition, a document describing an event can refer to the location of the event at different levels of granularity, for example indicating the name of the city, specifying the address, and/or describing the type of place, like an apartment, a shop, or a park. During event extraction, the references to all these locations should be identified.

2.3. Large Language Models based approaches

Several approaches have been proposed for event extraction in recent surveys, from traditional methods which rely on linguistic rules for pattern identification within the text to more advanced solutions such as machine learning and deep learning algorithms able to learn patterns after training on annotated data, and the use of pre-trained language models [2, 3]. LLM-based approaches have emerged as a promising avenue for event extraction in recent years. These models leverage the power of machine learning and deep learning algorithms, as they are pre-trained on vast amounts of text data and then fine-tuned for specific tasks. By encoding contextual information and capturing semantic relationships within the text, LLMs seem promising in identifying and extracting events from various sources.

We identified three main approaches based on the use of LLMs that could reach good performance in event extraction: sequence labeling models, extractive Question Answering (QA) models, and instruction-tuned models.

Sequence Labeling models  In sequence labeling, each token in a sequence is assigned a label based on its role or category within the context of the sequence. Sequence labeling models can be used to identify the text spans reporting relevant information within a text. Therefore, sequence labeling is widely employed for several classical NLP tasks like part-of-speech (POS) tagging, named entity recognition (NER), and text chunking.

Sequence labeling models are suitable for the scenario of event extraction, where they can identify and classify the parts of text reporting information about events. Indeed, some works in the literature have already treated event extraction as a sequence labeling or NER problem [4, 5], also for the Italian language [6].

Extractive Question Answering  The goal of extractive QA models is to understand an input question in natural language and extract the answer as a span from an input text. QA models can facilitate rapid and efficient access to event-related information by automatically identifying text spans containing the desired answers to specific questions. For instance, the question "When did the event take place?" (Q1) can be formulated to retrieve the date of the event.

The results of these models depend significantly on the quality of the input documents, as well as on the structure of the questions provided to the models. Prior knowledge about the kind of event described in the document allows the formulation of ad hoc questions. For instance, considering the document in Figure 1, the question "When did the air crash take place?" (Q2) should provide more accurate answers than Q1. In addition, questions can be enriched with other details about the event after a partial process of event extraction. For example, the question "When did the Flight 345 crash?" (Q3) contains the reference to the flight number and should help the QA models to select the correct context for the extraction of the date.

Within QA models, a distinction arises between Single-Span QA (SQA) and Multi-Span QA (MQA). While the former identifies a single text segment for each question, the latter locates answers even when they are distributed across non-consecutive text segments, potentially located far apart within a document. Given the prevalence of such scenarios, especially in complex inquiries and detailed documents, the limitations of SQA models are evident. An example is the annotation of "casualties and losses" in Figure 1. The recent surge in MQA model development [7, 8, 9] underscores a notable interest.

In the current state of the art, the only Italian dataset properly designed for training QA models is SQuAD-it [10], derived from the automatic translation of the English SQuAD dataset and consisting of a list of question-answer pairs. However, this dataset can be used only for SQA; therefore, it is unsuitable for complex tasks like event extraction, which requires the ability to retrieve multiple spans for one question.

Instruction-Tuned models  Among LLMs, Auto-Regressive models such as the GPT [1] or Llama [11] series stand out. These models leverage advanced deep learning techniques to predict the subsequent word based on an input text. This prediction process is repeated multiple times, with each predicted word being added to the original text. By training on vast amounts of text data, Auto-Regressive LLMs effectively capture complex patterns and structures in language, leading them to generate full and coherent text which is contextually relevant to the input text.

Research in recent years has led to the development of instruction tuning [12] to bridge the gap between the next-word prediction objective of LLMs and the users' objective of having their instructions followed helpfully and safely. Instruction tuning involves fine-tuning Auto-Regressive LLMs with input-output pairs, where the input denotes the human instructions and the output denotes the desired output that follows the instruction. The results of this process are Instruction-Tuned LLMs, designed specifically to provide appropriate results based on instruction inputs. This ability also extends as cross-task generalization, leading Instruction-Tuned LLMs to better performance on novel tasks.

Instruction-Tuned LLMs can be employed to solve a wide range of NLP tasks through various techniques of prompt engineering [13], i.e., the process of designing task-specific instructions to guide model output. Therefore, the utilization of these models can also yield benefits for event extraction.

Currently, there are several Instruction-Tuned LLMs capable of understanding and generating text. For these models, Italian represents a minority percentage of the training data compared to languages more widely used on the web, such as English. Among them, there are proprietary models like GPT-3.5 and GPT-4 from OpenAI and Gemini from Google, and open-source families of LLMs like Mistral [14] and Mixtral [15] from Mistral AI and Llama [11] and Llama 2 [16] from Meta. From this last family, Llamantino [17] has been derived through a language adaptation process to the Italian language.
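As an illustration of the sequence-labeling view of event extraction discussed above, the sketch below decodes BIO labels (a toy tagset of our own: B-/I- prefixes plus hypothetical event-role names) back into role spans. The sentence and labels are invented for illustration; real systems would obtain the labels from a fine-tuned token classifier.

```python
def decode_bio(tokens, labels):
    """Group BIO-labeled tokens into (role, span_text) pairs."""
    spans, current_role, current_tokens = [], None, []
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):                       # a new span starts
            if current_role:
                spans.append((current_role, " ".join(current_tokens)))
            current_role, current_tokens = label[2:], [token]
        elif label.startswith("I-") and current_role == label[2:]:
            current_tokens.append(token)                 # continue the open span
        else:                                            # "O" closes any open span
            if current_role:
                spans.append((current_role, " ".join(current_tokens)))
            current_role, current_tokens = None, []
    if current_role:                                     # flush a span ending the sentence
        spans.append((current_role, " ".join(current_tokens)))
    return spans

# Toy Italian sentence: "Il volo 345 è precipitato il 14 maggio"
tokens = ["Il", "volo", "345", "è", "precipitato", "il", "14", "maggio"]
labels = ["O", "B-AGENT", "I-AGENT", "O", "B-TRIGGER", "O", "B-DATE", "I-DATE"]
print(decode_bio(tokens, labels))
# [('AGENT', 'volo 345'), ('TRIGGER', 'precipitato'), ('DATE', '14 maggio')]
```

Note that the decoded output naturally yields multiple spans per sentence, which is one reason sequence labeling sidesteps the single-span limitation of SQA models.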
3. Italian datasets

Currently, there are few Italian datasets suitable for event extraction. Some of them provide a comprehensive annotation of event-related data, while in other cases only one type of information (e.g., the temporal references) is annotated.

3.1. EVENTI

The EVENTI1 corpus was built in 2014 for the evaluation of Temporal Information Processing systems in the EVENTI evaluation exercise [18] at the EVALITA workshop. The corpus consists of three datasets: the Main task training data (274 documents) and test data (92 documents) of contemporary news articles, and the Pilot task test data (10 documents) of historical news articles. The annotation guidelines involve the use of four tags to annotate different elements within news texts: the EVENT tag is used to annotate all the mentions of events, including verbs, nouns, prepositional phrases, and adjectives; the TIMEX3 tag is used for temporal expressions; the SIGNAL tag identifies textual items which encode a relation between EVENTs, TIMEX3s, or both; the TLINK tag is used for temporal dependencies between EVENTs and/or temporal expressions.

3.2. NewsReader MEANTIME

The NewsReader MEANTIME (Multilingual Event ANd TIME) corpus is a multilingual semantically annotated corpus of 480 Wikinews articles in four languages: English, Italian, Spanish, and Dutch [19]. The corpus was released in 2016 and derives from the NewsReader Project2 [20], which aims at extracting information about what happened to whom, when, and where, processing a large volume of financial and economic data. The corpus is enriched with annotations that span multiple levels, including entities, entity mentions, events, temporal information, semantic roles, and intra-document and cross-document event and entity coreference.

3.3. De Gasperi

The De Gasperi corpus [21] is a collection of historical documents by Alcide De Gasperi, the first Prime Minister of the Italian Republic. The corpus was released in 2019 and includes 2,762 documents published between 1901 and 1954, originally delivered in oral or written form. In addition to the raw text, a set of metadata and additional semi-automatically annotated information is available. The corpus contains different kinds of documents, like the daily press written by De Gasperi when he worked as a journalist for newspapers in Trentino, and speeches in institutional venues when he was a Member of the Italian Parliament. In each document, references to persons and places are annotated.

3.4. DICE

DICE3 [22] is a collection of 10,395 Italian news articles describing crime events that happened in the Modena province between 2011 and 2021. The news articles were extracted from one of the most popular local newspapers, "Gazzetta di Modena", following the approach described in [23]. Thanks to an agreement between the University of Modena and Reggio Emilia and the Gazzetta di Modena, DICE was released online in 2023, free to redistribute and transform without legal copyright issues under an Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.

Along with the title, the text, and the publication date of each news article, which are crawled from the newspaper's webpage, several annotations are available on the data. The crime event category (e.g., theft, robbery) is assigned to each news article using text categorization approaches based on word embeddings [24, 25]. The news articles underwent automated NLP processes to extract temporal references, entities, and corresponding DBpedia resources. Duplicates are annotated to identify news articles referring to the same crime event. The theft-related news articles are annotated manually following a sophisticated annotation schema to identify stolen items (What), crime locations (Where), and references to authors and victims and their sociodemographic characteristics (Who). The annotation provided in the dataset is multi-span, since it involves identifying and linking multiple text spans within the document.

3.5. EventNet-ITA

EventNet-ITA4 [26] is an Italian corpus for Frame Parsing applied to events, released in 2024. Semantic Frame Parsing is a task which aims at identifying semantic frames within textual data. A semantic frame [27] is a cognitive structure that organizes and represents knowledge about a concept or situation. It consists of a set of interconnected elements such as roles, attributes, and relations, which collectively define the meaning and typical features of that concept or situation. Frames help humans understand and interpret language by providing a mental framework for comprehending and categorizing information.

EventNet-ITA is built upon the idea of enabling frame parsing for event extraction. It is composed of 53,854 sentences manually annotated with 205 semantic frames of events and covers different domains, like conflictual, social, communication, legal, geopolitical, economic, and biographical events.

1 https://sites.google.com/site/eventievalita2014/data-tools
2 http://www.newsreader-project.eu/
3 https://github.com/federicarollo/Italian-Crime-News
4 https://huggingface.co/datasets/mrovera/eventnet-ita

4. Future directions

Automated information extraction from documents continues to captivate the scientific community due to its manifold advantages, facilitating improved information accessibility across various domains. By leveraging LLMs and exploiting annotated datasets, researchers can develop robust event extraction systems capable of achieving high accuracy and efficiency across a wide range of text sources. As the field continues to advance, further research into LLMs and their applications in event extraction is expected to drive continued innovation and progress in this area.

Future directions will focus on three key aspects:

• Definition of an Italian benchmark: while we have identified five Italian datasets suitable for event extraction, further efforts are needed to expand their annotation and support comprehensive event extraction tasks.
This entails defining a standardized benchmark for evaluating event extraction systems. Such a benchmark would serve as a common evaluation dataset, enabling comparisons between different approaches and fostering the development of more accurate and reliable event extraction models.

• Evaluation of LLMs on the benchmark: despite the limited literature on Italian event extraction, our preliminary evaluation of three BERT-based QA models on the DICE dataset revealed promising results [22]. However, challenges persist, particularly related to the size and quality of the annotated data. Once the benchmark is defined, future efforts will focus on evaluating and comparing the various approaches outlined in Section 2.3. The evaluation will include the recent Minerva models, which represent the first family of LLMs trained from scratch on Italian documents, developed by Sapienza NLP.

• Creation of a synthetic annotated dataset: since manual annotation is a time- and resource-consuming process, new strategies will be studied to automate the annotation process. Employing LLMs for data augmentation (i.e., to expand the annotated dataset) is now the most promising approach, especially focusing on text generation models. Given a list of desired annotations, i.e., the spans to extract from the text (like "May 14th" as the date of the event), the LLM is asked to create a document containing that span with the expected role in the described event (like "create a document describing an event that occurred on May 14th"). This methodology allows for obtaining a synthetic dataset that is also already annotated. Furthermore, this approach offers control over text generation and ensures fairness in dataset composition, ultimately contributing to the development of balanced and unbiased datasets essential for training accurate and equitable AI models.
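A strategy like the synthetic-annotation idea above can be prototyped as follows. This is a minimal sketch under our own assumptions: `generate` stands in for a call to any instruction-tuned LLM (here replaced by a hypothetical stub), and the prompt wording is illustrative, not a tested template. The desired spans double as silver-standard annotations, with a verbatim-containment check to discard generations that ignore the instruction:

```python
def build_prompt(annotations):
    """Compose a generation instruction from desired role -> span pairs."""
    requirements = "; ".join(
        f'mention "{span}" as the {role} of the event'
        for role, span in annotations.items()
    )
    return ("Write a short Italian news article describing a single event. "
            f"The article must {requirements}.")

def make_silver_example(annotations, generate):
    """Pair the generated text with the annotations it was asked to contain."""
    text = generate(build_prompt(annotations))
    # Keep only examples where every requested span actually appears verbatim;
    # otherwise the silver annotation would not be grounded in the text.
    if all(span in text for span in annotations.values()):
        return {"text": text, "annotations": annotations}
    return None  # discard and regenerate otherwise

# Hypothetical stand-in for an LLM call:
fake_llm = lambda prompt: "Un aereo è precipitato il 14 maggio vicino a Modena."
example = make_silver_example({"date": "14 maggio"}, fake_llm)
```

The containment check is the key design choice: since generative models may paraphrase the requested span, only generations that reproduce it verbatim yield a usable annotated example.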
References

[1] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, D. Amodei, Language models are few-shot learners, in: H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, H. Lin (Eds.), Advances in Neural Information Processing Systems, volume 33, Curran Associates, Inc., 2020, pp. 1877–1901.
[2] G. Frisoni, G. Moro, A. Carbonaro, A survey on event extraction for natural language understanding: Riding the biomedical literature wave, IEEE Access 9 (2021) 160721–160757. doi:10.1109/ACCESS.2021.3130956.
[3] W. Xiang, B. Wang, A survey of event extraction from text, IEEE Access 7 (2019) 173111–173137. doi:10.1109/ACCESS.2019.2956831.
[4] A. Ramponi, R. van der Goot, R. Lombardo, B. Plank, Biomedical event extraction as sequence labeling, in: B. Webber, T. Cohn, Y. He, Y. Liu (Eds.), Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, Online, 2020, pp. 5357–5367. doi:10.18653/v1/2020.emnlp-main.431.
[5] S. Pongpaichet, B. Sukosit, C. Duangtanawat, J. Jamjongdamrongkit, C. Mahacharoensuk, K. Matangkarat, P. Singhajan, T. Noraset, S. Tuarob, Camelon: A system for crime metadata extraction and spatiotemporal visualization from online news articles, IEEE Access 12 (2024) 22778–22802. doi:10.1109/ACCESS.2024.3363879.
[6] N. Viani, T. A. Miller, D. Dligach, S. Bethard, C. Napolitano, S. G. Priori, R. Bellazzi, L. Sacchi, G. K. Savova, Recurrent neural network architectures for event extraction from Italian medical reports, in: A. ten Teije, C. Popow, J. H. Holmes, L. Sacchi (Eds.), Artificial Intelligence in Medicine, Springer International Publishing, Cham, 2017, pp. 198–202.
[7] H. Li, M. Tomko, M. Vasardani, T. Baldwin, MultiSpanQA: A dataset for multi-span question answering, in: M. Carpuat, M. de Marneffe, I. V. M. Ruíz (Eds.), Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2022, Seattle, WA, United States, July 10-15, 2022, Association for Computational Linguistics, 2022, pp. 1250–1260. doi:10.18653/V1/2022.NAACL-MAIN.90.
[8] E. Segal, A. Efrat, M. Shoham, A. Globerson, J. Berant, A simple and effective model for answering multi-span questions, 2020, pp. 3074–3080.
[9] M. Zhu, A. Ahuja, D. Juan, W. Wei, C. K. Reddy, Question answering with long multiple-span answers, in: T. Cohn, Y. He, Y. Liu (Eds.), Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16-20 November 2020, volume EMNLP 2020 of Findings of ACL, Association for Computational Linguistics, 2020, pp. 3840–3849. doi:10.18653/V1/2020.FINDINGS-EMNLP.342.
[10] D. Croce, A. Zelenanska, R. Basili, Neural learning for question answering in Italian, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 11298 LNAI (2018) 389–402. doi:10.1007/978-3-030-03840-3_29.
[11] H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, A. Rodriguez, A. Joulin, E. Grave, G. Lample, Llama: Open and efficient foundation language models, 2023. arXiv:2302.13971.
[12] S. Zhang, L. Dong, X. Li, S. Zhang, X. Sun, S. Wang, J. Li, R. Hu, T. Zhang, F. Wu, G. Wang, Instruction tuning for large language models: A survey, 2024. arXiv:2308.10792.
[13] P. Sahoo, A. K. Singh, S. Saha, V. Jain, S. Mondal, A. Chadha, A systematic survey of prompt engineering in large language models: Techniques and applications, 2024. arXiv:2402.07927.
[14] A. Q. Jiang, A. Sablayrolles, A. Mensch, C. Bamford, D. S. Chaplot, D. de las Casas, F. Bressand, G. Lengyel, G. Lample, L. Saulnier, L. R. Lavaud, M.-A. Lachaux, P. Stock, T. L. Scao, T. Lavril, T. Wang, T. Lacroix, W. E. Sayed, Mistral 7B, 2023. arXiv:2310.06825.
[15] A. Q. Jiang, A. Sablayrolles, A. Roux, A. Mensch, B. Savary, C. Bamford, D. S. Chaplot, D. de las Casas, E. B. Hanna, F. Bressand, G. Lengyel, G. Bour, G. Lample, L. R. Lavaud, L. Saulnier, M.-A. Lachaux, P. Stock, S. Subramanian, S. Yang, S. Antoniak, T. L. Scao, T. Gervet, T. Lavril, T. Wang, T. Lacroix, W. E. Sayed, Mixtral of experts, 2024. arXiv:2401.04088.
[16] H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y. Babaei, N. Bashlykov, S. Batra, P. Bhargava, S. Bhosale, D. Bikel, L. Blecher, C. C. Ferrer, M. Chen, G. Cucurull, D. Esiobu, J. Fernandes, J. Fu, W. Fu, B. Fuller, C. Gao, V. Goswami, N. Goyal, A. Hartshorn, S. Hosseini, R. Hou, H. Inan, M. Kardas, V. Kerkez, M. Khabsa, I. Kloumann, A. Korenev, P. S. Koura, M.-A. Lachaux, T. Lavril, J. Lee, D. Liskovich, Y. Lu, Y. Mao, X. Martinet, T. Mihaylov, P. Mishra, I. Molybog, Y. Nie, A. Poulton, J. Reizenstein, R. Rungta, K. Saladi, A. Schelten, R. Silva, E. M. Smith, R. Subramanian, X. E. Tan, B. Tang, R. Taylor, A. Williams, J. X. Kuan, P. Xu, Z. Yan, I. Zarov, Y. Zhang, A. Fan, M. Kambadur, S. Narang, A. Rodriguez, R. Stojnic, S. Edunov, T. Scialom, Llama 2: Open foundation and fine-tuned chat models, 2023. arXiv:2307.09288.
[17] P. Basile, E. Musacchio, M. Polignano, L. Siciliani, G. Fiameni, G. Semeraro, Llamantino: Llama 2 models for effective text generation in Italian language, 2023. arXiv:2312.09993.
[18] T. Caselli, R. Sprugnoli, M. Speranza, M. Monachini, EVENTI: Evaluation of events and temporal information at Evalita 2014, 2014. doi:10.12871/clicit201425.
[19] A.-L. Minard, M. Speranza, R. Urizar, B. Altuna, M. van Erp, A. Schoen, C. van Son, MEANTIME, the NewsReader multilingual event and time corpus, in: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), European Language Resources Association (ELRA), Portorož, Slovenia, 2016, pp. 4417–4422.
[20] P. Vossen, R. Agerri, I. Aldabe, A. Cybulska, M. van Erp, A. Fokkens, E. Laparra, A.-L. Minard, A. P. Aprosio, G. Rigau, et al., NewsReader: Using knowledge resources in a cross-lingual reading machine to generate more knowledge from massive streams of news, Knowledge-Based Systems 110 (2016) 60–85.
[21] S. Tonelli, R. Sprugnoli, G. Moretti, Prendo la parola in questo consesso mondiale: A multi-genre 20th century corpus in the political domain, volume 2481, 2019.
[22] G. Bonisoli, M. P. di Buono, L. Po, F. Rollo, DICE: a dataset of Italian crime event news, in: H. Chen, W. E. Duh, H. Huang, M. P. Kato, J. Mothe, B. Poblete (Eds.), Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2023, Taipei, Taiwan, July 23-27, 2023, ACM, 2023, pp. 2985–2995. doi:10.1145/3539618.3591904.
[23] F. Rollo, L. Po, Crime event localization and deduplication, in: J. Z. Pan, V. Tamma, C. d'Amato, K. Janowicz, B. Fu, A. Polleres, O. Seneviratne, L. Kagal (Eds.), The Semantic Web – ISWC 2020, Springer International Publishing, Cham, 2020, pp. 361–377.
[24] F. Rollo, G. Bonisoli, L. Po, A comparative analysis of word embeddings techniques for Italian news categorization, IEEE Access 12 (2024) 25536–25552. doi:10.1109/ACCESS.2024.3367246.
[25] F. Rollo, G. Bonisoli, L. Po, Supervised and unsupervised categorization of an imbalanced Italian crime news dataset, Lecture Notes in Business Information Processing 442 LNBIP (2022) 117–139. doi:10.1007/978-3-030-98997-2_6.
[26] M. Rovera, EventNet-ITA: Italian frame parsing for events, in: Y. Bizzoni, S. Degaetano-Ortlieb, A. Kazantseva, S. Szpakowicz (Eds.), Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024), Association for Computational Linguistics, St. Julians, Malta, 2024, pp. 77–90.
[27] C. J. Fillmore, C. F. Baker, Frame semantics for text understanding, in: Proceedings of WordNet and Other Lexical Resources Workshop, NAACL, volume 6, 2001.