<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>International Journal of Advanced Computer Science and Applications</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1109/TKDE.2020.3014166</article-id>
      <title-group>
        <article-title>Advancements and Challenges in Generative AI: Architectures, Applications, and Ethical Implications</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Flora Amato</string-name>
          <email>flora.amato@unina.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Domenico Benfenati</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Egidia Cirillo</string-name>
          <email>egidia.cirillo@unina.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giovanni Maria De Filippis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mattia Fonisto</string-name>
          <email>mattia.fonisto@unina.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antonio Galli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefano Marrone</string-name>
          <email>stefano.marrone@unina.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lidia Marassi</string-name>
          <email>lidia.marassi@unina.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vincenzo Moscato</string-name>
          <email>vincenzo.moscato@unina.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Narendra Patwardhan</string-name>
          <email>narendra.patwardhan@unina.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alberto Moccardi</string-name>
          <email>alberto.moccardi@unina.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antonio Elia Pascarella</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antonio M. Rinaldi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cristiano Russo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carlo Sansone</string-name>
          <email>carlo.sansone@unina.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cristian Tommasino</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Electrical Engineering and Information Technology (DIETI), University of Naples Federico II</institution>
          ,
          <addr-line>Via Claudio 21, 80125 Naples</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Interdepartmental Center for Research on Management and Innovation in Healthcare (CIRMIS), University of Naples Federico II</institution>
          ,
          <addr-line>Naples</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <volume>7</volume>
      <issue>2015</issue>
      <fpage>2420</fpage>
      <lpage>2422</lpage>
      <abstract>
        <p>This paper presents the architecture, classification, and major applications of Generative AI interfaces, specifically chatbots. It details how Generative AI interfaces work under various Generative AI approaches and describes their architectures and operation. The generative model is built with advanced machine learning techniques to produce dynamic, contextually relevant responses automatically, whereas the retrieval-based model depends on a predefined response library. The paper also discusses the use of Generative AI to populate Multimedia Knowledge Graphs (KGs), presenting technologies based on semantic analysis, deep learning, and NoSQL to integrate and retrieve data more effectively. The social and ethical challenges that come with the deployment of generative models are critically reviewed. These discussions highlight the balance that must be maintained between technological progress and necessity, motivating the call for ethical responsibility in developing AI. The paper presents a comprehensive review of state-of-the-art Generative AI, with special focus on the promises and pitfalls of Generative AI research in both natural language processing and knowledge management.</p>
      </abstract>
      <kwd-group>
        <kwd>artificial intelligence</kwd>
        <kwd>Generative AI</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>A chatbot, also known as a conversational agent, is an
artificial intelligence (AI) software that can simulate a
conversation (or a chat) with a user through text or voice
interfaces [1]. Chatbots can use natural language
processing (NLP) and machine learning algorithms to understand
user inputs and generate appropriate responses, allowing
them to provide assistance, automate tasks, and perform
other functions without the need for human intervention.</p>
      <sec id="sec-1-1">
        <p>The term "chatbot", short for "chatterbot", was originally coined by Michael Mauldin in 1994 to describe these conversational programs in his attempt to develop a system capable of passing the Turing test [2].</p>
        <p>This work aims to explore the various techniques, approaches, and technologies that have been used to develop chatbots since the late 1990s; furthermore, we provide insights into the most common applications and use cases.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Architecture and Classification of Generative AI Interfaces</title>
      <sec id="sec-2-1">
        <p>As a modern approach to the architecture of Generative AI interfaces, we follow [3, 4, 5] and divide the intelligent interface structure proposed in the state of the art into four parts: the interface, the multimedia processor, the multimodal input analysis, and the response generator. In detail:</p>
        <list list-type="order">
          <list-item><p>The interface is responsible for managing the interaction between the chatbot and users, which involves receiving inputs in various forms, such as text or audio, and returning appropriate responses.</p></list-item>
          <list-item><p>The multimedia processor (optional) may be required to preprocess voice or video signals, convert them into text, or recognize the user’s tone to facilitate response generation.</p></list-item>
          <list-item><p>The multimodal input analysis unit handles classification and data pre-treatment, often using natural language understanding (NLU) techniques such as semantic parsing, slot filling, and intent identification.</p></list-item>
          <list-item><p>The response generator either associates a proper response with the given pre-processed input from a stored dataset or, using modern machine learning techniques, maps the normalized input to the output using a pre-trained model.</p></list-item>
        </list>
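<p>As an illustration only, the four components above can be sketched as a minimal pipeline; all function names and the toy intent logic are our own, not taken from any of the surveyed systems.</p>

```python
# Minimal sketch of the four-part interface architecture (illustrative only).

def multimedia_processor(raw_input):
    # Optional stage: a real system would transcribe audio/video to text;
    # here it is a pass-through for text input.
    return raw_input

def input_analysis(text):
    # Toy NLU: normalize the input and tag a coarse intent.
    normalized = text.lower().strip()
    intent = "greeting" if "hello" in normalized else "other"
    return {"text": normalized, "intent": intent}

def response_generator(analysis):
    # Retrieval-style lookup: map the detected intent to a stored response.
    responses = {"greeting": "Hi! How can I help?", "other": "Could you rephrase?"}
    return responses[analysis["intent"]]

def interface(user_input):
    # The interface chains the components and returns the reply to the user.
    return response_generator(input_analysis(multimedia_processor(user_input)))
```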
        <p>The response generator is the core component of a
chatbot where the actual question-and-answer process
takes place, and it can be considered as the "brain" of the
system. Based on the architecture of the response
generator, chatbot systems can be classified into two main
categories: retrieval-based chatbots, which select their
responses from a pre-defined set of possible outcomes,
and generative-based chatbots, which use ML
techniques to dynamically generate answers [6].</p>
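<p>The retrieval-based side of this split can be sketched in a few lines; the patterns and canned replies below are invented for illustration.</p>

```python
import re

# Toy retrieval-based responder: returns the canned reply whose pattern
# matches the user input (illustrative patterns, not from the paper).
PATTERNS = [
    (re.compile(r"\b(hi|hello)\b", re.I), "Hello! How can I help you?"),
    (re.compile(r"\bopening hours\b", re.I), "We are open 9am-5pm, Monday to Friday."),
]
FALLBACK = "Sorry, I did not understand that."

def retrieve_response(user_input):
    for pattern, response in PATTERNS:
        if pattern.search(user_input):
            return response
    return FALLBACK
```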
        <sec id="sec-2-1-1">
          <title>2.1. Retrieval-based chatbots</title>
          <p>The goal of retrieval-based chatbots is to "understand" the user input and choose the most suitable responses from a knowledge dataset. There are four sub-categories of retrieval-based chatbots, which can be distinguished based on the architecture of their knowledge dataset and retrieval techniques. These categories are template-based, corpus-based, intent-based, and RL-based [5].</p>
          <p>Template-based chatbots. Template-based chatbots select responses from a set of possible candidates by comparing the user input to certain query patterns.</p>
          <p>Corpus-based chatbots. Although template-based chatbots have shown effectiveness in certain cases, their fundamental architecture necessitates scanning through all potential outputs for each input until the appropriate response is located. As a result, this approach can be slow and unsuitable for applications with a large knowledge dataset.</p>
          <p>Intent-based chatbots. Intent-based chatbots utilize machine learning techniques to establish a connection between user inputs and pre-defined outputs. Typically, relevant data is collected and stored to establish associations between user intents (i.e., the conceptual meaning behind a user’s request) and appropriate responses. Next, a pre-trained model leverages this information to link normalized user inputs with the most probable user intent [7].</p>
          <p>RL-based chatbots. RL-based chatbots adopt reinforcement learning for response generation. Reinforcement learning itself is mainly based on the Markov decision process, i.e. a 4-tuple (S, A, P, R) where:</p>
          <list list-type="bullet">
            <list-item><p>S = (s_1, s_2, ..., s_n) is a set of states, called the state space;</p></list-item>
            <list-item><p>A = (a_1, a_2, ..., a_m) is a set of actions, called the action space;</p></list-item>
            <list-item><p>P_a(s, s′) = Pr(S_{t+1} = s′ given S_t = s, A_t = a) is the probability that action a, in the state s at step t, will lead to state s′ at step t + 1;</p></list-item>
            <list-item><p>R_a(s, s′) is the reward received after transitioning from state s to state s′ when action a is performed.</p></list-item>
          </list>
          <p>The goal of a Markov decision process is to find a function π(s) (generally called the policy) that associates, with every state s, the action π(s) = a which maximizes the overall reward, i.e. the following expectation value:</p>
          <p>V = E[ Σ_{t=0}^{∞} γ^t R_{π(s_t)}(s_t, s_{t+1}) ]   (1)</p>
          <p>where γ is a coefficient (the discount factor) between 0 and 1 [8]. In RL-based chatbots, each state s corresponds to a specific turn in the conversation and is usually represented by an embedded vector. After the chatbot is trained, it is able to select the most appropriate response (action) a to ensure that the conversation remains relevant and coherent [9].</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Generative-based chatbots</title>
        <p>Generative-based chatbots have the advantage of being able to generate responses dynamically, which can lead to more natural and flexible conversations with users. Generative chatbots can generate novel responses, which means that they are not limited to pre-defined responses like retrieval-based chatbots. This flexibility allows them to provide more personalized and relevant responses. Depending on the machine learning architecture used, we will discuss RNN-based chatbots and Transformer-based chatbots.</p>
      </sec>
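<p>The discounted-return objective of Eq. (1) can be made concrete with a small policy-evaluation sketch; the two-state chain, the reward of 1 per step, and all names are a made-up example, not from the paper.</p>

```python
# Policy evaluation for a deterministic MDP: repeatedly apply
# V(s) = R(s, s') + gamma * V(s'), with s' = next_state[s][policy[s]],
# which converges to the discounted return of Eq. (1) for a fixed policy.
def evaluate_policy(next_state, reward, policy, gamma, n_states, sweeps=200):
    V = [0.0] * n_states
    for _ in range(sweeps):
        V = [
            reward[s][next_state[s][policy[s]]]
            + gamma * V[next_state[s][policy[s]]]
            for s in range(n_states)
        ]
    return V

# Two states that hand the conversation back and forth, reward 1 per step:
# the discounted return is 1 / (1 - gamma) = 2 for gamma = 0.5.
values = evaluate_policy(
    next_state=[[1], [0]],    # next_state[s][a]
    reward=[[1, 1], [1, 1]],  # reward[s][s']
    policy=[0, 0],            # action chosen in each state
    gamma=0.5,
    n_states=2,
)
```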
      <sec id="sec-2-3">
        <p>RNN-based chatbots. One commonly used method for developing generation-based chatbots involves the use of two interconnected neural networks known as recurrent neural networks (RNNs). The first network, called the encoder, is trained to associate an input sentence with an intermediate vector called the context vector. The second network, called the decoder, takes the context vector as input and is trained to generate an output sentence, either by generating actual words or by using tokens. This approach is commonly referred to as "sequence-to-sequence" or Seq2Seq [6, 10]. As RNN-based chatbot responses are dynamically generated through machine learning models, they may be less precise and more uncertain than those of retrieval-based chatbots. For this reason, RNN-based chatbots are less commonly used in task- or knowledge-oriented scenarios and are instead more frequently used in entertainment and mental-health-related activities [5].</p>
        <p>Transformer-based chatbots. A Transformer is a recent type of neural network architecture used for NLU and chatbots. First introduced in [11], it is also used in other tasks such as language translation and text summarization. Transformers are based on the self-attention mechanism, which allows the model to learn which parts of the input sequence to attend to at each step of processing, based on the relevance of the other parts of the sequence to the current position. This is done through a process called scaled dot-product attention, where the model learns a set of weights to compute a weighted sum of the input sequence representations.</p>
        <p>An important language model based on the Transformer architecture is the Generative Pre-trained Transformer (GPT), which was developed by OpenAI in 2020 [12]. GPT serves as the underlying architecture for the ChatGPT chatbot, which has gained widespread recognition for its ability to provide detailed and articulate responses across a variety of domains [13].</p>
      </sec>
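<p>Scaled dot-product attention, as described above, fits in a few lines of dependency-free Python; this is a single-head sketch without the learned projection matrices a real Transformer applies to Q, K, and V.</p>

```python
import math

# Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
# Q, K, V are lists of equal-length float vectors (one per position).
def scaled_dot_product_attention(Q, K, V):
    d_k = len(K[0])
    outputs = []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        # Numerically stable softmax over the scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))]
        )
    return outputs
```

<p>With a query aligned to the first key, most of the attention mass falls on the first value vector.</p>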
    </sec>
    <sec id="sec-4">
      <title>3. Multiquery Retrieval Augmented Generation</title>
      <sec id="sec-4-1">
        <p>This methodological section delves into the implications of leveraging Generative Artificial Intelligence (AI) to streamline and revolutionize complex decision-making processes, augmenting the power of cutting-edge technologies and enhancing the classical Retrieval-Augmented Generation (RAG) models. Through a meticulous exploration of a multi-query and human-centred RAG application design, access to and understanding of sophisticated AI capabilities is guaranteed, bridging the gap between technical expertise and practical application. The culmination of this inquiry is a concise and robust architectural flow proposal, laying the groundwork for the seamless integration of multiquery RAG solutions into decision-making processes and offering further insights that extend beyond the confines of this study and pave the way for future advancements in the field.</p>
        <p>At the current forefront of Generative Artificial Intelligence (Gen-AI), streamlining complex decision-making processes by providing tools that are accessible and comprehensible to all users is vitally important. The core of this section is to propose an alternative to the classical RAG, introduced by Lewis et al. in 2020 [14], enhancing its capabilities with a multiquery approach and presenting a concise and solid architectural flow along with the main evaluation metrics.</p>
        <sec id="sec-4-1-1">
          <title>3.1. Methodology</title>
          <p>Question Generation Chain. The multiquery-RAG system distinguishes itself through its ability to generate multiple variations of the original user query, in a human-like fashion, through a specialized question generation chain that produces a fixed number of alternative queries capturing distinct viewpoints and nuances associated with the original question. This diversification of the query set, if correctly fine-tuned, plays a pivotal role in surmounting the limitations of distance-based similarity searches in vector databases, ensuring a more comprehensive and more efficient document retrieval process than the classical retrieval process.</p>
          <p>Answer Generation Chain. Following the retrieval of information (documents), the system proceeds to generate answers by synthesizing and formulating responses using the data extracted from the documents and leveraging large language models (LLMs). Contextualizing and elaborating on this information ensures that the responses are both accurate and easily understandable for non-experts, facilitating broader accessibility and utilization of the information among a wider audience.</p>
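<p>The two chains can be sketched as follows; the variant generator and the retriever/LLM callables are stand-ins to show the flow, not a specific library API.</p>

```python
# Sketch of a multiquery RAG flow: expand the query, retrieve per variant,
# deduplicate the union of contexts, then synthesize one answer.

def generate_variants(question, n=3):
    # Stand-in for the question generation chain: in practice an LLM
    # produces n paraphrases of the original question.
    return [f"{question} (alternative phrasing {i + 1})" for i in range(n)]

def multiquery_rag(question, retrieve, answer_llm, n_variants=3):
    queries = [question] + generate_variants(question, n_variants)
    contexts = []
    for q in queries:
        for doc in retrieve(q):      # retrieve documents for each variant
            if doc not in contexts:  # keep the union, without duplicates
                contexts.append(doc)
    return answer_llm(question, contexts)  # answer generation chain
```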
        </sec>
        <sec id="sec-4-1-2">
          <title>3.2. Evaluation Criteria</title>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <p>This section outlines the principal metrics [15] that are integral for evaluating a Retrieval-Augmented Generation (RAG) system, measuring different aspects of the system’s performance, as presented in Figure 1.</p>
        <p>[Figure 1: RAG evaluation criteria.]</p>
        <p>Context Precision. This metric evaluates the signal-to-noise ratio within the retrieved contexts, measuring how many of the retrieved documents are actually relevant with respect to the user’s query.</p>
      </sec>
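<p>As a simplified, order-insensitive illustration (evaluation frameworks such as RAGAS compute rank-aware variants), the two context metrics can be expressed as:</p>

```python
# Simplified set-based versions of the two context metrics.

def context_precision(retrieved, relevant):
    # Fraction of the retrieved contexts that are actually relevant
    # to the user's query (the signal-to-noise ratio of retrieval).
    if not retrieved:
        return 0.0
    hits = sum(1 for doc in retrieved if doc in relevant)
    return hits / len(retrieved)

def context_recall(retrieved, ground_truth):
    # Fraction of the ground-truth contexts that were retrieved,
    # i.e. whether everything needed to answer is available.
    if not ground_truth:
        return 0.0
    found = sum(1 for doc in ground_truth if doc in retrieved)
    return found / len(ground_truth)
```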
      <sec id="sec-4-4">
        <p>Context Recall. This metric assesses whether all necessary information required to answer the query has been retrieved, ensuring that the system’s knowledge base covers all aspects needed to formulate a comprehensive and accurate response; it relies on a comparison between the retrieved contexts and the ground truths.</p>
        <p>Faithfulness. This metric quantifies the factual accuracy of the answers generated by the RAG system. It involves counting the number of correct factual statements made in the generated answers based on the retrieved contexts and comparing this count to the total number of statements in the answers.</p>
        <p>Answer Relevancy. This metric measures how well the generated answers address the user’s queries. For example, if a query asks for multiple pieces of information, the relevancy score reflects how completely the response addresses all elements of the query.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Multimedia Knowledge Graph population using Generative AI</title>
      <p>Knowledge Graphs (KGs) serve as potent repositories, adeptly organizing, connecting, and extracting insights from many data sources, embodying contemporary knowledge management principles in semantic web applications [16]. Despite their invaluable utility, realizing the full potential of KGs necessitates a systematic population with relevant information, a task fraught with challenges, mainly when data is scarce [17].</p>
      <p>Recent advancements, however, offer promising solutions. [18] and [19] present novel frameworks integrating semantic analysis, deep learning, and NoSQL technologies to extract entities from knowledge corpora, bridging the gap between textual and multimedia sources. Their approaches mark significant strides in enriching KGs with diverse data types, fostering more comprehensive knowledge representation and analysis.</p>
      <p>Meanwhile, Chen et al. [20] propose a generative approach to KG population, leveraging machine learning to establish relationships and reduce human intervention in the curation process. Training models to learn underlying data distributions and generate triplets regardless of entity pair co-occurrence in textual corpora paves the way for more efficient and scalable KG construction. This innovative approach streamlines the population process and broadens the scope of knowledge capture, enabling KGs to encapsulate a wider array of interconnected concepts and relationships.</p>
      <p>Manual curation, though traditional, is labor-intensive and impractical in the face of expanding data landscapes [21]. To address this, a data-centric architecture harnessing generative deep-learning models emerges, automating KG creation, particularly for multimedia instances. By synthesizing multimedia data, irrespective of absolute data scarcity, a dynamic, infinitely expandable pool of instances is ensured, underpinning model training and inference with a multimedia knowledge graph that evolves alongside data trends.</p>
      <p>Different knowledge graph population approaches with generative AI are based on standard steps. The first is grabbing information from curated textual sources. It is possible to enrich it by using Linked Open Data (LOD) and to base the image generation on the enhanced textual description, making the text as complete as possible. The next step combines the previously obtained textual statements and produces a representative multimedia instance of the input text via a generative text-image synthesis model. The last step consists of using a focused crawler, which allows a check on the quality of the generated image, exploiting different metrics useful to measure the degree of similarity of the generated image with respect to its textual description and to real images crawled from the web. If the image from the previous step exhibits metric values that surpass a threshold determined through experimental evaluation, it can be stored in the node of the multimedia knowledge base.</p>
      <p>In image generation for knowledge graph population, text-image synthesis models are developed to bridge the semantic gap between textual descriptions and corresponding visual representations. These models leverage cutting-edge generative strategies to produce high-quality images aligned with the provided textual prompts. The application of text-to-image models has improved greatly in recent years, migrating from Generative Adversarial Networks (GANs) to Latent Diffusion Models, such as Stable Diffusion [22]. A latent diffusion model refines a latent representation by applying diffusion steps in the latent space, gradually reducing noise and revealing the desired image. This iterative process involves adding noise and updating the latent code. The model implements a decoder network to reconstruct the image from the refined latent code.</p>
      <p>The evaluation phase of the quality of multimedia instances for the KG node is important. The evaluation process of text-to-image synthesis models involves assessing their accuracy in converting text inputs into synthetic images. Some quantitative metrics are used to assess not only the quality of the image with respect to the text but also the degree of realism of a generated image by comparing it to real images: Cosine Similarity, which compares the feature vectors by calculating the cosine between them; FID (Fréchet Inception Distance) [23], a numerical value that quantifies the similarity between the statistical distributions of real and generated images by computing the Fréchet distance between the two distributions; and CLIP score [24], a metric that captures the relationship between images and text, used to evaluate the model’s ability to rank images based on their relevance to a given textual description and vice versa.</p>
    </sec>
    <sec id="sec-6">
      <title>5. Ethical and social challenges</title>
      <p>The recent advances in generative AI are revolutionizing many sectors thanks to the ability to create original content based on patterns learned from training data. Models such as those based on transformer architectures have already demonstrated significant success in various fields, including natural language processing, computer vision, and reinforcement learning. However, despite the advantages offered by generative models, their development and deployment raise concerns regarding ethical and environmental implications. Firstly, these models require massive computational resources and consume a large amount of energy during both training and execution. This raises concerns about the environmental impact of AI, especially considering the urgent need to reduce carbon emissions to address climate change. Additionally, there are ethical concerns regarding the use and management of training data. Since these models can generate original content, there is a risk that they may perpetuate biases or discriminations present in the training data, raising questions about fairness, privacy, and data security in the era of AI [25].</p>
      <p>The Hominis project, conducted at the University of Naples Federico II in collaboration with industrial partners (DeepKapha), aims to advance toward sustainable and programmable AI solutions [26]. The project focuses on creating a concrete sustainable generative model, addressing crucial issues related to data collection, key model components, and essential additions. One of the main goals of the project is to improve model efficiency without compromising performance, using techniques such as attention and linear layer optimization within the Transformer architecture. Hominis also aims to ensure the sanitization of public data and to develop data collection strategies that capture a wide range of multifaceted data. Additionally, the project involves developing tools for the community to analyze, curate, and critique datasets while ensuring fairness, privacy, and legality. The proposed methodologies, such as Universal Tokenization, Retrieval-Augmented Generation (RAG), the use of diffusion to improve model controllability, and the use of the muTransfer technique to optimize hyperparameters and reduce the carbon footprint associated with training, all aim to improve the efficiency, sustainability, and fairness of AI models. In particular, the approach of unifying data through Universal Tokenization can help better manage data diversity, while RAG can improve model relevance and accuracy, ensuring greater fairness in outcomes. Furthermore, the use of diffusion to improve model controllability helps ensure that AI outputs are transparent and understandable. Today, attention to sustainable, adaptable, and responsible AI is crucial to ensure that the benefits of artificial intelligence are evenly distributed and that negative impacts, such as the carbon footprint associated with model training, are minimized. In an era where sustainable and responsible AI is essential for our future, projects like Hominis represent a step in the right direction, helping ensure that the benefits of AI are accessible to all while minimizing negative impacts on the environment and society.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work was partially supported by PNRR MUR Project PE0000013-FAIR. The FAIR project is committed to promoting an advanced vision of Artificial Intelligence, driving research and development in this crucial field while constantly keeping ethical, legal, and sustainability considerations in mind.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>