<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Explainable LLM-powered RAG To Tackle Tasks In The Unstructured-structured Data Spectrum</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Darío Garigliotti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Bergen</institution>
          ,
          <country country="NO">Norway</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In the context of multiple spaces of research and application in text and information processing dominated by Large Language Models (LLMs), Retrieval-augmented Generation (RAG) provides a general framework with which to integrate external, explicit knowledge into the vast parametric knowledge of LLMs. In this paper, we present a crosspoint of tasks of diverse nature, maturity, and level of cognitive challenge for an intelligent system, which nevertheless, through their analogies, share a suitability for being addressed by a similar RAG approach. Based on observations from several of our recent works, we reflect on the RAG framework, in particular on methods where the LLM is prompted with strategies to explain its generation output, across these tasks whose components range from unstructured to structured data.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Awaiting its due time to be judged as a landmark, the paradigm of Generative AI today represents
an ever-consolidating leap forward in the solutions that machine learning techniques enable, and with
it new creative challenges to be solved. Playing on a rather sweet ambiguity –no more than what is
needed–, a new generation (or era) of technology seems to be emerging thanks to a new generation
level (or skill set), the one recently brought to mainstream attention by powerful Large Language
Models (LLMs) [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. The rise to dominance of representation –or deep–
learning, across multiple fields within areas such as image and text processing, resulted in its time from
the combination of fundamental theoretical models in artificial neural networks, higher computational
power enabled by advances in dedicated hardware, and the availability of large amounts of data with
which to effectively train those models on those infrastructures. A similar synergy brings together the
key factors driving the successful capabilities of LLMs: transformer-based neural models trained
auto-regressively over massive volumes of text scraped from web pages, plus multi-task supervision
over several datasets deemed relevant to linguistic and cognitive tasks –such as machine
translation, summarization, and conversational search, to name a few–, complemented with
techniques like fine-tuning to continuously improve underperforming cases, and Reinforcement
Learning from Human Feedback for alignment [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. This paper describes our work in assessing the
capabilities of LLMs on a series of research problems in overlapping areas such as Natural Language
Processing (NLP), Information Extraction (IE) and Retrieval (IR), and Knowledge Representation (KR).
The problems meet on a common ground where their analogies enable a similar kind of
approach to address them and to compare the corresponding outcomes.
      </p>
      <p>
        The billions of learnable parameters that these neural architectures are equipped with implicitly
encode the knowledge that their models capture. Their higher-abstraction and wider-context levels of
information belong to the core of the seemingly emergent abilities of LLMs increasingly associated with
various notions of intelligence [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. However, for many tasks that vary sufficiently in their underlying
data distributions –as is the case when shifting information domains or capturing new behaviours
in the expected machine-learned outcomes–, this implicit parametric knowledge is not enough. In these
scenarios, additional knowledge provided explicitly in the input, i.e. by augmenting the prompt to the
LLM, gives its generation process access to this external context [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. A prevalent umbrella of
approaches within this general strategy is Retrieval-augmented Generation (RAG) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. This framework
is, at a high level, a three-stage pipeline. Its first stage, retrieval, obtains ranked
knowledge items to serve as relevant contexts, typically textual excerpts identified with some
retrieval method. The second stage, augmentation, integrates the contexts into a well-engineered prompt
as the input for the LLM to trigger, in the last stage, the generation of the kind of expected
output at which these models excel. In this work, we use RAG as the general approach to address an
ensemble of research problems. Characteristic aspects of each of these problems allow for bringing them
together here by considering their analogies. We have approached each of these problems separately,
instantiating them via a similar strategy based on the RAG framework. The objective of this paper is
then to present the similarities in the respective problem definitions, and to remark on our observations
when comparing their experimental results as well.
      </p>
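      <p>
        The three-stage pipeline described above can be sketched in a few lines of code. This is a minimal illustration with a toy token-overlap retriever and a stubbed generation call; the corpus, scoring, and function names are our own placeholders, not the actual systems evaluated in this work.
      </p>

```python
# Minimal sketch of the three RAG stages: retrieval, augmentation, generation.
def retrieve(query, corpus, k=3):
    """Stage 1: rank passages by token overlap with the query (toy scoring)."""
    q_tokens = set(query.lower().split())
    ranked = sorted(corpus, key=lambda p: -len(q_tokens & set(p.lower().split())))
    return ranked[:k]

def augment(query, contexts):
    """Stage 2: integrate the retrieved contexts into an engineered prompt."""
    ctx = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(contexts))
    return (f"Answer the question using only these contexts:\n{ctx}\n\n"
            f"Question: {query}\nAnswer:")

def generate(prompt):
    """Stage 3: invoke an LLM; stubbed here for illustration."""
    return "<LLM output for a prompt of %d characters>" % len(prompt)

corpus = [
    "Bergen is a city on the west coast of Norway.",
    "RAG combines retrieval with text generation.",
    "Oslo is the capital of Norway.",
]
contexts = retrieve("Where is Bergen located?", corpus, k=2)
answer = generate(augment("Where is Bergen located?", contexts))
```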
    </sec>
    <sec id="sec-2">
      <title>2. Experimental Space: Tasks, Datasets and Settings</title>
      <p>Although it is possible to put these tasks in correspondence under a common approach given some
key analogies, the research problems that we have studied present unique aspects that make their
ensemble diverse. Each task emerges in a particular context or domain, at an earlier or later stage of
consolidation in the research space, and is valued by distinct users whose expectations of the
intelligent abilities of the assessed models vary too. Their common ground of lending
themselves to being expressed in natural language allows for approaching all of them via language
models, in particular Large Language Models. Their dynamic character –by which new items of interest
continuously emerge– calls for a mechanism to challenge the still-limited knowledge skills of LLMs.
This, alongside the requirement to often overcome privacy constraints in the manipulation of data, fits
well with the common umbrella of RAG applied here. We determined a handful of parameters in
the experimental setting of the general RAG umbrella, so that each fixed combination of values for
the parameters, i.e. a configuration, is a RAG-based method. Some of the recurrent parameters we
experiment with are:
• The retrieval method, and the ranking length, used during the first stage of RAG;
• The order of the retrieved items when integrated into the prompt at the augmentation phase –as
some artefacts in LLMs tend to memorize, in this case, the order–;
• The set of examples provided for few-shot learning, also during prompting, to illustrate in particular
the format in which the LLM should produce its expected output;
• The actual LLM finally invoked during generation.</p>
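      <p>
        The notion of a configuration can be made concrete as the Cartesian product of parameter values. The sketch below enumerates illustrative value grids; the parameter names and values are examples for exposition, not the exact grids of our experiments.
      </p>

```python
# Each fixed combination of experimental parameter values (a configuration)
# defines one RAG-based method. Names and values here are illustrative.
from itertools import product

param_grid = {
    "retriever": ["BM25", "dense"],
    "ranking_length": [3, 5, 10],
    "context_order": ["ranked", "reversed"],
    "few_shot_examples": [0, 2],
    "llm": ["model-A", "model-B"],
}

configurations = [
    dict(zip(param_grid, values))
    for values in product(*param_grid.values())
]
# Each entry in `configurations` is one RAG-based method to evaluate.
```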
      <sec id="sec-2-1">
        <title>2.1. Self-supported Question Answering</title>
        <p>
          Question Answering (QA) is the first of the research problems in our work. Having long been at the
heart of developments in NLP, QA is the basic instantiation of the intent of information seeking
in natural language. Given how ubiquitous the need for search is, QA lives as a fundamental task in
related areas such as Information Retrieval and Databases, and it is the primitive in a challenge like
the Turing Test that has come to define, for many, what intelligent behaviour is. Here we consider our
work on assessing several LLMs, from open to closed commercial models, on performing QA in the
context of proprietary data [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. This type of data abounds in many organizations that intend to apply
LLM-powered business intelligence systems to their vast collections. Yet developers are typically
legally restricted from sharing this data given how valuable it becomes, so inputting entire documents
to an external state-of-the-art LLM is out of the question. We experiment with a test collection built from
corporate news articles that were published after the cut-off date for all the LLMs of interest, emulating
the scenario where these documents participate during prompting as external knowledge not yet
present, e.g., during LLM training or fine-tuning. Passages from these documents are retrieved for
the question, and incorporated during prompt augmentation as contexts to generate the final answer.
        </p>
        <p>
          Our setting actually addresses Self-Supported Question Answering (SQA), as the LLM is also requested
to cite the passages that support the correctness of the generated answer, and answer and citation are
evaluated. This evaluation can become a challenge in itself, as relevant benchmarking in literature may
present discrepancies in the way that the typical retrieval evaluation possibly propagates through the
entire RAG assessment while the LLM has access only to a subset of the universe of evidence [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
        </p>
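        <p>
          A minimal sketch of how an SQA prompt can request citations, and how the cited passage identifiers can be parsed from the generation for separate evaluation of answer and citation. The prompt wording and the bracketed-id convention are illustrative assumptions, not our exact setup.
        </p>

```python
# Sketch of Self-Supported QA: the prompt asks for an answer plus citations
# of the supporting passages, cited as numeric ids in square brackets.
import re

def sqa_prompt(question, passages):
    """Ask for an answer and for citations of the supporting passage ids."""
    ctx = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (f"Contexts:\n{ctx}\n\nQuestion: {question}\n"
            "Answer the question and cite the supporting passages as [id].")

def parse_citations(generated_answer):
    """Extract cited passage ids like [1], [3] from the generated answer."""
    return sorted({int(m) for m in re.findall(r"\[(\d+)\]", generated_answer)})

prompt = sqa_prompt("Where is Bergen located?",
                    ["Bergen lies on Norway's west coast.",
                     "Oslo is the capital of Norway."])
cited = parse_citations("Bergen is in Norway [1].")
```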
        <p>
          In order to study how to extend the abilities of this framework to support explainable generations,
we equip it at prompting stage with simple mechanisms that elicit interpretations from the model
itself [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Specifically, we extend the prompt with an additional question in the augmentation phase. In
our experiments, a couple of additional questions are used: (i) one that directly requests the model
to explain the reasons behind its generation output, and (ii) one that counterfactually proposes
to the LLM an alternative scenario where changes –irrelevant to the correct answer– are made to a
given experimental parameter. We find some approach configurations exhibiting improvements, while
in others the model generates differently from when it is not prompted with a request for
explainability.
        </p>
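        <p>
          The two prompt extensions can be sketched as functions that append one additional question to an already-augmented prompt. The wordings below are illustrative, not the exact prompts used in our experiments.
        </p>

```python
# Sketch of the interpretation-elicitation extensions to an augmented prompt.

def with_self_explanation(prompt):
    """(i) Directly request the model to explain its generation output."""
    return prompt + "\nAlso explain the reasons behind your answer."

def with_counterfactual(prompt, changed_parameter):
    """(ii) Propose an answer-irrelevant change to an experimental parameter."""
    return (prompt
            + f"\nWould your answer change if {changed_parameter} were different?"
            + " Explain why or why not.")

base = "Contexts: [...]\nQuestion: Where is Bergen located?"
p1 = with_self_explanation(base)
p2 = with_counterfactual(base, "the order of the retrieved contexts")
```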
      </sec>
      <sec id="sec-2-2">
        <title>2.2. SDG Evidence Identification and Target Detection</title>
        <p>
          The second task we consider in this paper is the one with the least volume of research development, as it
deals with the recency of (i) digitalization in the area of environmental impact assessment (EIA) and of
(ii) the Sustainable Development Goals (SDG) framework, by which a large proportion of the efforts
within EIA are and will be guided. Our task is actually a tandem of dual problems: Evidence Identification
(EI) of textual excerpts from EIA reports where a given SDG target is addressed, and, symmetrically,
Target Detection (TD), which consists in identifying relevant SDG targets addressed in a given textual
excerpt [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. The tandem captures a recurrent interplay between EI and TD in EIA practice. This
task pushes the limits of the apparent cognitive abilities of LLMs given that it concerns more than
information extraction and requires a comprehensive understanding of the complex space of factors
affecting an environment and their relation to an SDG target. Our instantiation of RAG here uses
a target as a query to retrieve its excerpts for the EI task, and the other way around for the TD task,
where SDG targets are indexed as a collection for retrieval. This domain of information is understood
to be much less available digitally, and hence less likely to have been incorporated into the training
routines of the LLMs of interest. Privacy requirements often apply to EIA reports too.
        </p>
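        <p>
          The duality of EI and TD can be illustrated with a single retrieval function used with query and collection swapped. The targets, excerpts, and toy overlap scoring below are illustrative placeholders, not the actual EIA data or retrieval methods.
        </p>

```python
# Dual EI/TD instantiation: the same ranker, with query and collection swapped.

def rank(query, collection, k=2):
    """Toy token-overlap ranking standing in for a real retrieval method."""
    q = set(query.lower().split())
    return sorted(collection, key=lambda d: -len(q & set(d.lower().split())))[:k]

sdg_targets = [
    "7.2: increase the share of renewable energy",
    "15.5: reduce the degradation of natural habitats",
]
report_excerpts = [
    "The wind farm will increase renewable energy production in the region.",
    "Construction may degrade natural habitats near the river.",
]

# Evidence Identification: a target is the query, excerpts are the collection.
evidence = rank(sdg_targets[0], report_excerpts, k=1)
# Target Detection: an excerpt is the query, targets are the collection.
targets = rank(report_excerpts[1], sdg_targets, k=1)
```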
        <p>
          In this space of tasks, we also experiment with the RAG framework modified with similar strategies
of interpretation elicitation as used for the SQA task [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Query Target Type Identification</title>
        <p>
          The last of our three tasks is Query Target Type Identification (TTI) [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. This problem has its roots
in web search, where users were increasingly interested in obtaining focused, direct-access units of
information, entities, beyond the “ten blue links” pointing to relevant documents; the Entity Retrieval
task [12] and the whole Entity-oriented Search field [13] were born. The semantic class, or type, of
an entity is known to improve the effectiveness of an entity retriever [14]. TTI is the problem of
predicting, for a user query, the type information of its expected relevant entities. This research problem
relates unstructured, keyword queries with types as structured entries in the ontology associated with
the knowledge base storing the uniquely identified entities. Here, the types themselves retrieved for
the query are assessed by the LLM after being added to the dedicated prompt [15]. Specific parameters
that we assess in this problem include the usage of the textual type descriptor and/or that of actual
relevant entities in the augmented prompt.
        </p>
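        <p>
          The augmentation step for TTI can be sketched as follows: retrieved candidate types are added to the prompt, optionally with their textual descriptors and/or known relevant entities. All type names, descriptors, and entities below are illustrative assumptions.
        </p>

```python
# Sketch of a TTI augmentation prompt with optional descriptors and entities.

def tti_prompt(query, candidate_types, use_descriptors=True, example_entities=None):
    """Build a prompt asking the LLM to assess retrieved candidate types."""
    lines = [f"Query: {query}", "Candidate target types:"]
    for name, descriptor in candidate_types:
        lines.append(f"- {name}" + (f": {descriptor}" if use_descriptors else ""))
    if example_entities:
        lines.append("Known relevant entities: " + ", ".join(example_entities))
    lines.append("Which candidate type best matches the query's target entities?")
    return "\n".join(lines)

prompt = tti_prompt(
    "norwegian fjord cities",
    [("City", "a large human settlement"), ("Mountain", "a large landform")],
    example_entities=["Bergen", "Ålesund"],
)
```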
      <p>Our interest in explainable generation is geared, in this task, towards example-based explainability.
We have adapted the RAG methods to augment the prompt further by incorporating known relevant
entities for the input query. It is then possible to compare the final RAG performance with the respective
performance without entity examples, as a proxy to justify why the method generates a given answer in
the presence or absence of these examples [16].</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Conclusion</title>
        <p>After approaching these tasks with similar RAG-based methods, our results shed light on
common observations across the assessed parameters as well as on more distinctive aspects in the
characterization of each research problem. In particular, we have described strategies for equipping
the common approach with explainability across this space of tasks in the unstructured-structured data
spectrum.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Acknowledgments</title>
      <p>This work was funded by the Norwegian Research Council grant 329745 Machine Teaching for
Explainable AI.</p>
    </sec>
    <sec id="sec-4">
      <title>References</title>
      <p>[11] (cont.) Proceedings of the 40th International ACM SIGIR Conference on Research and
Development in Information Retrieval, SIGIR ’17, 2017, pp. 845–848.
[12] K. Balog, M. Bron, M. de Rijke, Query modeling for entity search based on terms, categories, and
examples, ACM Trans. Inf. Syst. 29 (2011) 1–31.
[13] K. Balog, Entity-Oriented Search, volume 39 of The Information Retrieval Series, Springer, 2018.
[14] D. Garigliotti, F. Hasibi, K. Balog, Identifying and exploiting target entity type information for ad
hoc entity retrieval, Information Retrieval Journal 22 (2019) 285–323.
[15] D. Garigliotti, Retrieval-Augmented Generation for Query Target Type Identification, in:
Retrieval-Augmented Generation Enabled by Knowledge Graphs, co-located with ISWC 2024, CEUR-WS.org,
2024.
[16] D. Garigliotti, Entity Examples for Explainable Query Target Type Identification with LLMs, in:
Intelligent Data Engineering and Automated Learning – IDEAL 2024, Springer Nature Switzerland,
2024.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Child</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Luan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Amodei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <article-title>Language models are unsupervised multitask learners</article-title>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>H.</given-names>
            <surname>Touvron</surname>
          </string-name>
          et al.,
          <article-title>Llama 2: Open Foundation and Fine-Tuned Chat Models</article-title>
          ,
          <source>ArXiv abs/2307.09288</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Elazar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhagia</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Magnusson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ravichander</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schwenk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Suhr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Walsh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Groeneveld</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Soldaini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hajishirzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. A.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dodge</surname>
          </string-name>
          ,
          <article-title>What's in my big data?</article-title>
          ,
          <year>2024</year>
          . arXiv:2310.20707.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Asai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gardner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hajishirzi</surname>
          </string-name>
          ,
          <article-title>Evidentiality-guided generation for knowledge-intensive NLP tasks</article-title>
          , in:
          <string-name>
            <given-names>M.</given-names>
            <surname>Carpuat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-C.</given-names>
            <surname>de Marneffe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. V.</given-names>
            <surname>Meza Ruiz</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Association for Computational Linguistics, Seattle, United States,
          <year>2022</year>
          , pp.
          <fpage>2226</fpage>
          -
          <lpage>2243</lpage>
          . URL: https://aclanthology.org/2022.naacl-main.162. doi:10.18653/v1/2022.naacl-main.162.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Piktus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Petroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Karpukhin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Küttler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          , W.-t. Yih,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rocktäschel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kiela</surname>
          </string-name>
          ,
          <article-title>Retrieval-augmented generation for knowledge-intensive nlp tasks</article-title>
          ,
          <source>in: Advances in Neural Information Processing Systems</source>
          , volume
          <volume>33</volume>
          , Curran Associates, Inc.,
          <year>2020</year>
          , pp.
          <fpage>9459</fpage>
          -
          <lpage>9474</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Garigliotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Johansen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. V.</given-names>
            <surname>Kallestad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-E.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ferri</surname>
          </string-name>
          ,
          <article-title>EquinorQA: Large Language Models for Question Answering over proprietary data</article-title>
          ,
          <source>in: ECAI 2024 - 27th European Conference on Artificial Intelligence - Including 13th Conference on Prestigious Applications of Intelligent Systems (PAIS 2024)</source>
          , IOS Press,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Garigliotti</surname>
          </string-name>
          ,
          <article-title>On the Relevant Set of Contexts for Evaluating Retrieval-Augmented Generation Systems</article-title>
          ,
          <source>in: Retrieval-Augmented Generation Enabled by Knowledge Graphs, co-located with ISWC 2024</source>
          , CEUR-WS.org,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Garigliotti</surname>
          </string-name>
          ,
          <article-title>Explaining LLM-based Question Answering via the self-interpretations of a model</article-title>
          ,
          <source>in: Advances in Interpretable Machine Learning and Artificial Intelligence, co-located with ECML PKDD 2024</source>
          , Springer Nature Switzerland,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>Garigliotti</surname>
          </string-name>
          ,
          <article-title>SDG target detection in environmental reports using retrieval-augmented generation with LLMs</article-title>
          , in:
          <string-name>
            <given-names>D.</given-names>
            <surname>Stammbach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Schimanski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Dutia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bingler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Christiaen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kushwaha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Muccione</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Vaghefi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Leippold</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 1st Workshop on Natural Language Processing Meets Climate Change (ClimateNLP 2024)</source>
          , Association for Computational Linguistics, Bangkok, Thailand,
          <year>2024</year>
          , pp.
          <fpage>241</fpage>
          -
          <lpage>250</lpage>
          . URL: https://aclanthology.org/2024.climatenlp-1.19.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>D.</given-names>
            <surname>Garigliotti</surname>
          </string-name>
          ,
          <article-title>Self-Explanatory Retrieval-Augmented Generation for SDG Evidence Identification</article-title>
          , in: Advances in Conceptual Modeling, Springer Nature Switzerland,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D.</given-names>
            <surname>Garigliotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hasibi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Balog</surname>
          </string-name>
          ,
          <article-title>Target type identification for entity-bearing queries</article-title>
          , in:
          <source>Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’17</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>845</fpage>
          -
          <lpage>848</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>