<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>A Modular LLM-based Dialog System for Accessible Exploration of Finite State Automata</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Stefano Vittorio Porta</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pier Felice Balestrucci</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Oliverio</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Anselma</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Mazzei</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science Department, University of Turin</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>In the field of assistive technologies, making complex visual content such as graphs or conceptual maps accessible to visually impaired users remains a significant challenge. This work proposes a modular dialog system that leverages a combination of neural Natural Language Understanding (NLU) and Retrieval-Augmented Generation (RAG) to translate graphical structures into meaningful text-based interactions. The NLU module combines a fine-tuned BERT classifier for intent recognition with a spaCy-based Named Entity Recognition (NER) model to extract user intents and parameters. Moreover, the RAG pipeline retrieves relevant subgraphs and contextual information from a knowledge base, reranking and summarizing them via a language model. We evaluate the system across multiple specific tasks, achieving over 92% F1 in intent classification and NER, and demonstrate that even open-weight models, like DeepSeek-r1 or LLaMA-3.1, can offer competitive performance compared to GPT-4o in specific domains. Our approach enhances accessibility while maintaining modularity, interpretability, and performance on par with modern LLM architectures.</p>
      </abstract>
      <kwd-group>
        <kwd>Dialogue Systems</kwd>
        <kwd>Retrieval-Augmented Generation</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>Education</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        tures combine LLMs with modular architectures, such to handle ambiguous or context-dependent queries,
esas Retrieval-Augmented Generation (RAG) systems [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], pecially in domains that involve structured or graphical
which integrate the text generation capabilities of LLMs information.
with an information retrieval module for selecting and To overcome these limitations, modern DSs
increaspresenting the most relevant information to the user. ingly adopt neural NLU methods. Intent classification is
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], we introduced AIML+, a novel framework based commonly modeled as a supervised classification task,
on AIML, specifically developed for building DSs to as- where transformer-based models such as BERT have
sist visually impaired students in navigating graphical demonstrated state-of-the-art performance [14, 15].
structures. The use of AIML was motivated by the need Early NER systems relied on hand-crafted rules and
to provide accurate responses to users, although it also domain-specific features, which required significant
hurevealed limitations in terms of NLU. Building on this, man efort and expertise [ 16]. Recent advances
leverand with the goal of creating a reliable system suitable for age distributed representations, context encoders, and
critical domains such as education, this paper extends our tag decoders, achieving state-of-the-art results with less
previous work by integrating LLMs into rule-based DSs, manual feature engineering [17, 18]
resulting in a RAG pipeline. This work aims to improve In parallel, RAG has emerged as a prominent approach
the often brittle NLU of traditional rule-based approaches to enabling language models to ground their responses
and to reduce hallucinations in NLG. in external knowledge. Although initially developed for
      </p>
      <p>Specifically, our proposal employs a hybrid architec- open-domain QA and document-based tasks, its use in
ture that combines: (i) an NLU module based on intent structured or symbolic domains, such as graphs, is
gainclassifier and NER to interpret user utterances; (ii) a rule- ing attention, particularly in educational or assistive
setbased information retrieval module to extract relevant tings [19, 20]. However, these systems often focus on
information; and (iii) an LLM-based NLG module to gen- general factual retrieval and rarely address the
accessibilerate the system response. ity needs of users navigating inherently visual content.</p>
      <p>
        The paper is structured as follows. Section 2 reviews This work builds upon the NoVAGraphS project,
related work in the field of accessible technologies and which first proposed transforming non-visual access to
dialog systems. Section 3 presents the proposed method- graphical content into a dialog-based paradigm via
handology. Section 5 focuses on the performance of the NLU crafted AIML conversational systems [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. We build on
pipeline, Section 6 explains the Dialog Manager and Re- this work by introducing a neural NLU pipeline and a
trieval Layer logic, while Section 7 evaluates the genera- RAG component specifically tailored to the retrieval and
tion module through both human and automatic assess- generation of descriptions from symbolic graph
strucments. We conclude with a discussion of our findings tures.
and future directions in Section 8.3
      </p>
    </sec>
    <sec id="sec-2">
      <title>3. Methodology</title>
    </sec>
    <sec id="sec-3">
      <title>2. Related Work</title>
      <sec id="sec-3-1">
        <title>We propose a modular dialog system based on</title>
        <p>
          Accessible technologies have explored various strategies transformer-based components used for both NLU and
to convey graphical information to VIP, including haptic NLG (see Figure 1). To build the NLU module, we
exfeedback (e.g., vibrations and touch cues) [
          <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
          ], soni- tended an existing resource [21] by applying both
autoifcation (data-to-sound mappings) [ 10, 11], and textual matic data augmentation and manual annotation. In this
descriptions [12, 13]. While efective in specific contexts, way, we have been able to train models for both the tasks
these approaches often lack flexibility, interactivity, and of (1) Intent Classification and (2) Named-Entity
Recognigeneralizability—particularly when dealing with com- tion. The output of the NLU module is then passed to the
plex or symbolic visual content. To address these limita- dialog management module, a rule-based system
respontions, DSs have been proposed as a more dynamic and sible for retrieving the specific information requested by
user-adaptive interface for mediating access to graphical the user, referred to as retrieved evidence in this paper.
structures. The retrieved evidence originates from structured
knowl
        </p>
        <p>
          Early DSs often relied on hand-crafted rules to parse edge bases that, in the experimentation described below,
user input and generate responses. AIML [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], for instance, consists of a specific diagram. The NLG module employs
encodes pattern-response pairs via XML, enabling de- a prompt built by the Dialog Manager to generate from
terministic rule-based dialogs. Although accessible and LLMs natural and contextually relevant responses by
interpretable, these systems lack the robustness required leveraging both the current user intent and the retrieved
evidence.
        </p>
        <p>Given our task-based approach, we focus on dialogs
3All code and experimental results are publicly available at https: about Finite State Automata (FSA) as a specific case study.
//github.com/stefa168/tesi_tln.</p>
        <p>User Input
Is there a state called 's9' in
the automaton?
System Output
There is no node called "s9"
in the automaton.</p>
        <p>NEURAL-NLU
Intent Classifier
Named-Entity
Recognizer
NLG-LLM</p>
        <p>Intent + NE
Intent = state.existence
Entities = [(NODE, 's9')]
Prompt with
User Input +
Retrieved
Evidence</p>
        <p>Query
A.exists_node(s9)
Query Signature
node s9 existence
Retrieved
Evidence
false</p>
        <p>Automata</p>
        <p>KB
Dialog Manager</p>
        <p>Retrieval Layer</p>
        <p>FSA are mathematical models of computation typically
taught in computer science degree programs which are
often represented as structured graphs. They are formally
defined as a quintuple consisting of: (1) a finite set of
states , (2) a finite set of input symbols Σ , (3) a transition
function  :  × Σ →  that maps each state and input
symbol to a new state, (4) a start state 0 ∈ , and (5) a
set of accepting (or final) states  ⊆ .</p>
      </sec>
    </sec>
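<p>As a concrete illustration of the quintuple definition, the following minimal Python sketch encodes a two-state automaton over {0, 1}; the specific automaton is our own example, not one from the paper's corpus:</p>

```python
# Minimal FSA following the quintuple definition (Q, Sigma, delta, q0, F).
# The concrete automaton (accepts words with an even number of 1s) is illustrative only.
Q = {"s0", "s1"}                 # (1) finite set of states
Sigma = {"0", "1"}               # (2) finite set of input symbols
delta = {                        # (3) transition function: (state, symbol) -> state
    ("s0", "0"): "s0", ("s0", "1"): "s1",
    ("s1", "0"): "s1", ("s1", "1"): "s0",
}
q0 = "s0"                        # (4) start state
F = {"s0"}                       # (5) set of accepting (final) states

def accepts(word: str) -> bool:
    """Run the automaton on `word` and report whether it ends in an accepting state."""
    state = q0
    for symbol in word:
        if symbol not in Sigma:
            raise ValueError(f"symbol {symbol!r} not in alphabet")
        state = delta[(state, symbol)]
    return state in F
```

<p>Here accepts("1010") evaluates to True, since the word contains an even number of 1s.</p>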
    <sec id="sec-4">
      <title>4. Data Collection and Annotation</title>
      <sec id="sec-4-1">
        <title>Corpus</title>
        <p>To develop the NLU module, we built upon an existing resource, the NoVAGraphS corpus [21]. The corpus consists
of 32 human–computer conversations focused on the
domain of FSA, comprising a total of 706 dialog turns. Since
our work focuses on understanding user input, we
exclusively use the 353 human utterances from the dataset.</p>
        <p>Based on this corpus, we extended the dataset through
data augmentation techniques by using a mix of
commercial and open-weight LLMs, including GPT-4o, GPT-o1,
and GPT-o3.mini, as well as two locally run models,
Llama3.1 and DeepSeek R1, generating paraphrases of
the original utterances.4 To ensure data quality, we
manually reviewed the synthetic utterances to verify their
correctness. In addition, we also included 100 random
off-topic questions extracted from the SQuAD 2.0 dataset
[22, 23], selected to represent out-of-domain input5.</p>
        <p>The final dataset contains 1,080 user utterances. All
utterances, both original and synthetic, were
manually annotated by one of the authors—proficient in
English—for both intent and entity information.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Notes</title>
        <p>4https://openai.com/index/hello-gpt-4o/, https://openai.com/index/openai-o3-mini/, IntroducingOpenAIo1, https://ollama.com/library/deepseek-r1:8b, https://huggingface.co/meta-llama/Llama-3.1-8B</p>
        <p>5https://huggingface.co/datasets/rajpurkar/squad_v2</p>
      </sec>
      <sec id="sec-4-3">
        <title>Intents</title>
        <p>We used a hierarchical labeling annotation to
better capture the specific topic of each user utterance.
The resulting dataset consists of two levels of classes:
main intents and sub-intents. Specifically, we defined 7
main intents representing the general topic of the
question (Table 1). For four of these main intents (AUTOMATON,
TRANSITION, STATE, and GRAMMAR) an additional
annotation level, called sub-intent, was introduced. This
second level includes a total of 32 sub-intents (Table 2),
which specify the question’s more fine-grained topic
depending on the main intent category.</p>
      </sec>
      <sec id="sec-4-4">
        <title>Entities</title>
        <p>An entity is encoded as [init-char,fin-char,type].6 We defined three entity types:</p>
        <p>• INPUT: for text fragments containing inputs or sequences of symbols. For example, in the sentence “Does it only accept 1s and 0s?” there are two entities of type INPUT: [20,21,"input"], [27,28,"input"];</p>
        <p>• NODE: for text fragments containing nodes or states of the automaton. For example, in the sentence “Is there a transition between q2 and q0?” there are two entities of type NODE: [30,32,"node"], [37,39,"node"];</p>
        <p>• LANGUAGE: for text fragments containing information about the language accepted by the automaton. For example, in the sentence “Does the automaton accept strings over the alphabet {0,1}?” there is one entity of type LANGUAGE: [53,58,"language"].</p>
        <p>6https://github.com/doccano/doccano</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Neural NLU</title>
      <p>The first module of our architecture handles NLU through a two-step pipeline: (i) Intent Classification and (ii) Named-Entity Recognition. The goal is to extract a structured representation of the user’s utterance by identifying the intent and the entities in the user input. For example:</p>
      <p>Input: “Is there a state called s9 in the automaton?”
Output: { Intent = state.existence, Entities = [(NODE, ‘s9’)] }</p>
      <p>To build the NLU module, we trained two models for Intent Classification and Named-Entity Recognition using the corpus described in Section 4, and we evaluated them against the AIML system we proposed in [24].</p>
      <sec id="sec-5-1">
        <title>Intent Classification</title>
        <p>For intent classification, we fine-tuned a BERT-base-uncased model7 for both main and sub-intent classification. The dataset was split into 60% training, 20% development, and 20% testing. We fine-tuned with the following hyper-parameters: 20 epochs, LR 2×10−5, linear warm-up 10%, batch 16. Training was logged with Weights &amp; Biases. Our approach significantly outperforms the AIML baseline, achieving a macro-F1 score of 0.92 on main intents and 0.86 on sub-intents. This marks a substantial improvement over AIML, which scores only 0.33 and 0.20, respectively (see Table 3). Figure 2 compares the confusion matrices for both systems, showing that BERT produces far fewer off-topic errors and handles ambiguous utterances more robustly.</p>
        <p>[Figure 2: Confusion matrices for (a) the AIML baseline and (b) the BERT model.]</p>
        <p>Table 3: Performance on main and sub-intent classification and NER for the fine-tuned BERT model and the AIML baseline (↑ higher is better).
Model | Main Intent F1 ↑ | Sub-intent F1 ↑ | NER F1 ↑
BERT (ours) | 0.92 | 0.86 | 0.92
AIML baseline | 0.33 | 0.20 | -</p>
      </sec>
      <sec id="sec-5-2">
        <title>Named Entity Recognition</title>
        <p>NER is handled using a simplified spaCy v3 pipeline that exclusively employs the NER component on top of a blank model,8 fine-tuned on our annotated dataset with the same data split (60/20/20). The pipeline is based on the transformer architecture [25] and identifies domain-specific entities such as states, transitions and input strings. It achieves an F1-score of 0.92 on the test set (see Table 3).</p>
        <p>7https://huggingface.co/google-bert/bert-base-uncased
8https://spacy.io/usage/v3</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Dialog Manager and Retrieval Layer</title>
      <p>The Dialog Manager is responsible for orchestrating the interaction flow by interpreting the NLU output and coordinating the appropriate system response. This involves analyzing the classified intent and any associated entities, and invoking the corresponding function from the Retrieval Layer.</p>
      <p>The Retrieval Layer is activated whenever the recognized intent is relevant to the domain, thus neither START nor OFF TOPIC. Indeed, START typically triggers a welcome message, while OFF TOPIC handles inputs outside the system’s scope. Since these cases do not require access to the automaton’s knowledge, retrieval is skipped.</p>
      <p>For domain-specific intents (e.g., checking the existence of a state), the Dialog Manager uses a rule-based system that maps intent–entity pairs to specific queries. This design ensures transparency and precise control over system behavior. For instance, when the intent is state.existence and the entity is a node identifier like ‘s9’, the Dialog Manager calls the function exists_node(‘s9’). This function queries the underlying automaton representation to determine whether the specified node exists. The automaton is stored in a Knowledge Base (KB) constructed using the NetworkX Python library,9 which allows efficient graph manipulation. The automaton’s structure is serialized in DOT format, a standard for graph description, and visualized using Graphviz.10</p>
      <p>The Retrieval Layer then returns a structured output (e.g. false, if the node is not found), which is passed to the NLG module for the generation of the final response.</p>
      <p>9https://networkx.org/ 10https://graphviz.org/</p>
    </sec>
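<p>The character-offset encoding of entities can be decoded back to text fragments with a few lines of Python. This is a sketch under the assumption of half-open [start, end) slice offsets; the corpus's exact offset convention may differ, and the example annotation below is ours:</p>

```python
def decode_entities(text: str, spans: list[tuple[int, int, str]]) -> list[tuple[str, str]]:
    """Map [init-char, fin-char, type] annotations back to the annotated
    text fragments, assuming half-open [start, end) character offsets."""
    return [(etype, text[start:end]) for start, end, etype in spans]

sentence = "Is there a state called s9 in the automaton?"
# 's9' occupies characters 24-25, i.e. the slice [24:26] (illustrative annotation)
print(decode_entities(sentence, [(24, 26, "node")]))  # [('node', 's9')]
```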
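<p>The dispatch logic described above can be sketched as follows. The function exists_node and the state.existence intent mirror the paper's running example; the concrete automaton, the handler table, and the surrounding function names are our own assumptions:</p>

```python
import networkx as nx

# Automaton KB as a NetworkX directed graph (illustrative two-state automaton).
A = nx.DiGraph()
A.add_edges_from([
    ("q0", "q1", {"symbol": "1"}), ("q1", "q0", {"symbol": "1"}),
    ("q0", "q0", {"symbol": "0"}), ("q1", "q1", {"symbol": "0"}),
])

def exists_node(node: str) -> bool:
    """Query from the paper's example: does the given state exist in the automaton?"""
    return A.has_node(node)

# Rule-based mapping from intent-entity pairs to retrieval functions.
HANDLERS = {"state.existence": lambda ents: exists_node(ents["NODE"])}
# START and OFF TOPIC never touch the KB, so retrieval is skipped for them.
NO_RETRIEVAL = {"start", "off_topic"}

def retrieve(intent: str, entities: dict):
    """Return the retrieved evidence for a domain intent, or None when skipped."""
    if intent in NO_RETRIEVAL:
        return None  # handled by a welcome / out-of-scope message instead
    return HANDLERS[intent](entities)

print(retrieve("state.existence", {"NODE": "s9"}))  # False -> the retrieved evidence
```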
    <sec id="sec-7">
      <title>7. LLM-based NLG</title>
      <p>
        a-Judge), applying the same error taxonomy. The
annotator pool included 8 students from the Department of
For the NLG module, we adopt a prompting strategy Computer Science, 2 with an engineering background,
based on LLMs that uses both the user input and the and 2 from the Departments of History and Biology. The
output of the Dialog Manager to generate contextually average age was 28, with a range from 21 to 68 years.
relevant and accurate responses. This technique is widely Each annotator evaluated a subset of the responses, with
adopted in RAG systems [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], as it enables the model to overlapping assignments to ensure that all 75 generated
ground its answers in retrieved evidence, reducing hallu- answers were reviewed by multiple judges.
cinations and increasing factual accuracy. Our prompt
template drives the model to act as a domain-specific ex- Table 4
pert — in this case, for finite state automata — instructing Average percentage of answers containing at least one
lait to use only the retrieved data without introducing ex- beled error, computed by aggregating the four error categories
traneous information or explicit references to the source. (INCORRECT, NOT-CHECKABLE, MISLEADING, OTHER). Lower
This approach helps maintain concise, focused answers values indicate better performance.
that avoid potential confusion or unverifiable content.
      </p>
      <sec id="sec-7-1">
        <title>Generator</title>
      </sec>
      <sec id="sec-7-2">
        <title>Human error ↓</title>
      </sec>
      <sec id="sec-7-3">
        <title>GPT-4.5 error ↓</title>
      </sec>
      <sec id="sec-7-4">
        <title>GPT-o3-mini GPT-4o</title>
      </sec>
      <sec id="sec-7-5">
        <title>DeepSeek-r1-8B</title>
      </sec>
      <sec id="sec-7-6">
        <title>Gemma2-9B LLaMA3.1-8B</title>
        <p>LLaMA3.1-8B.11</p>
        <p>To assess the quality of the generated answers, we con- • CLARITY: whether the response is
understandducted a human evaluation using the FactGenie platform able and well-structured;
[26]. A group of 12 volunteer annotators labeled each • USEFULNESS: whether the response is helpful
generation according to four error categories defined by and provides relevant information;
the taxonomy in Kasner and Dusek [27]. In particular: • OVERALL APPRECIATION: whether the response
INCORRECT indicates that the text contradicts the data; is perceived as satisfactory or positively received
NOT-CHECKABLE means the information cannot be veri- by the annotator;
ifed; MISLEADING refers to text that is deceptive given • FACTUAL ACCURACY: whether the response is
the context or omits crucial information; and OTHER in- entirely correct and free from factual errors.
cludes problematic cases that do not fit into the other
categories. In addition to human annotation, we also The same group of 12 human annotators performed
performed automatic labeling using GPT-4.512 (LLM-as- labeling according to these dimensions.
Table 5 shows that GPT-o3-mini receives the most
11https://openai.com/index/hello-gpt-4o/, https://openai.com/index/ favorable user judgments across all dimensions. Among
openai-o3-mini/, https://ollama.com/library/deepseek-r1:8b, open-weight models, DeepSeek-r1-8B is the most
poshttps://huggingface.co/google/gemma-2-9b, https://huggingface. itively rated, while LLaMA3.1-8B and Gemma2-9B
re12chott/pmse:/t/ao-plleanmaia./cLolmam/ina-d3e.x1/-i8nBtroducing-gpt-4-5/ ceive consistently lower preferences from annotators.</p>
      </sec>
    </sec>
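<p>The prompting strategy described in this section can be sketched as a simple template function; the wording of the instructions is our illustrative stand-in for the actual prompt template:</p>

```python
def build_prompt(user_input: str, retrieved_evidence) -> str:
    """Combine the user input and the Dialog Manager's retrieved evidence
    into an NLG prompt that casts the LLM as an FSA domain expert and
    restricts it to the retrieved data (illustrative wording)."""
    return (
        "You are an expert on finite state automata.\n"
        "Answer the user's question using ONLY the retrieved data below.\n"
        "Do not add extra information or mention the data source.\n\n"
        f"Retrieved data: {retrieved_evidence}\n"
        f"Question: {user_input}\n"
        "Answer:"
    )

prompt = build_prompt("Is there a state called 's9' in the automaton?", False)
```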
    <sec id="sec-8">
      <title>8. Conclusions</title>
      <p>This work presents a significant advancement over
previous systems aimed at the exploration of graphical
structures, by proposing a hybrid modular architecture that
integrates NLU and NLG techniques based on
Transformers and LLMs. The implemented DS addresses several
key limitations of rule-based DSs, such as rigid pattern
matching, limited context handling, and difficulties in
interacting with external data sources.</p>
      <p>Compared to AIML, our system stands out for its
greater expressive flexibility and its ability to adapt to
complex conversational flows, thanks to a more
articulated dialog management mechanism. The
introduction of a neural classifier for intent recognition, along
with a spaCy-based NER module, has substantially
improved the robustness of natural language understanding,
achieving F1 scores above 90% for both Intent
Classification and NER. Moreover, the RAG component has
significantly reduced hallucinations and ambiguity in
generation, providing contextually accurate responses
that are well-grounded in structured data.</p>
      <p>The results demonstrate that a hybrid and modular
approach can ensure accessibility, reliability, and
control—fundamental features for the adoption of DSs in
educational and assistive contexts. Our framework
therefore represents a concrete step toward more interpretable,
adaptable, and user-centered intelligent DSs. In future
works we plan to evaluate the complete system with blind
people.</p>
    </sec>
    <sec id="sec-9">
      <title>9. Limitations</title>
      <sec id="sec-9-1">
        <title>Evaluation with Target Users</title>
        <p>While the system shows strengths in modularity, accuracy, and integration of LLMs, a significant limitation persists: its accessibility has yet to be validated with learners.</p>
        <p>Although designed with accessibility in mind, the system’s real-world effectiveness and usability—especially for visually impaired individuals interacting with graphical content—remain untested. Conducting a structured evaluation with these target users is crucial to determine its pedagogical impact and practical usability.</p>
        <p>agrams, in: Proceedings of the 21st International ACM SIGACCESS Conference on Computers and Accessibility, ASSETS ’19, Association for Computing Machinery, New York, NY, USA, 2019, pp. 312–317. doi:10.1145/3308561.3353811.</p>
        <p>[10] D. Ahmetovic, C. Bernareggi, J. a. Guerreiro, S. Mascetti, A. Capietto, Audiofunctions.web: Multimodal exploration of mathematical function graphs, in: Proceedings of the 16th International Web for All Conference, W4A ’19, Association for Computing Machinery, New York, NY, USA, 2019, pp. 1–10. doi:10.1145/3315002.3317560.</p>
        <p>[11] J. Su, A. Rosenzweig, A. Goel, E. de Lara, K. N. Truong, Timbremap: enabling the visually-impaired to use maps on touch-enabled devices, in: M. de Sá, L. Carriço, N. Correia (Eds.), Proceedings of the 12th Conference on Human-Computer Interaction with Mobile Devices and Services, Mobile HCI 2010, Lisbon, Portugal, September 7-10, 2010, ACM International Conference Proceeding Series, ACM, 2010, pp. 17–26. doi:10.1145/1851600.1851606.</p>
        <p>[12] V. Sorge, M. Lee, S. Wilkinson, End-to-end solution for accessible chemical diagrams, in: Proceedings of the 12th International Web for All Conference, W4A ’15, Association for Computing Machinery, New York, NY, USA, 2015. doi:10.1145/2745555.2746667.</p>
        <p>[13] S. Chockthanyawat, E. Chuangsuwanich, A. Suchato, P. Punyabukkana, Towards automatic diagram description for the blind, in: i-CREATe. The International Convention on Rehabilitation Engineering and Assistive Technology, 2017, pp. 1–4. doi:10.13140/RG.2.2.11969.04961.</p>
        <p>[14] Z. Zhang, Z. Zhang, H. Chen, Z. Zhang, A joint learning framework with BERT for spoken language understanding, IEEE Access 7 (2019) 168849–168858. doi:10.1109/ACCESS.2019.2954766.</p>
        <p>[15] M. Roman, A. Shahid, S. Khan, A. Koubâa, L. Yu, Citation intent classification using word embedding, IEEE Access 9 (2021) 9982–9995. doi:10.1109/ACCESS.2021.3050547.</p>
        <p>[16] D. Nadeau, S. Sekine, A survey of named entity recognition and classification, Lingvisticae Investigationes 30 (2007) 3–26. doi:10.1075/LI.30.1.03NAD.</p>
        <p>[17] J. Li, A. Sun, J. Han, C. Li, A survey on deep learning for named entity recognition, IEEE Transactions on Knowledge and Data Engineering 34 (2018) 50–70. doi:10.1109/TKDE.2020.2981314.</p>
        <p>[18] P. Liu, Y. Guo, F. Wang, G. Li, Chinese named entity recognition: The state of the art, Neurocomputing 473 (2021) 37–53. doi:10.1016/j.neucom.2021.10.101.</p>
        <p>[19] B.-S. Posedaru, F.-V. Pantelimon, M.-N. Dulgheru, T.-M. Georgescu, Artificial intelligence text processing using retrieval-augmented generation: Applications in business and education fields, Proceedings of the International Conference on Business Excellence 18 (2024) 209–222. doi:10.2478/picbe-2024-0018.</p>
        <p>[20] F. Miladi, V. Psyché, D. Lemire, Leveraging GPT-4 for accuracy in education: A comparative study on retrieval-augmented generation in MOOCs (2024) 427–434. doi:10.1007/978-3-031-64315-6_40.</p>
        <p>[21] E. Di Nuovo, M. Sanguinetti, P. F. Balestrucci, L. Anselma, C. Bernareggi, A. Mazzei, Educational dialogue systems for visually impaired students: Introducing a task-oriented user-agent corpus, in: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), ELRA and ICCL, Torino, Italia, 2024, pp. 5507–5519.</p>
        <p>[22] P. Rajpurkar, J. Zhang, K. Lopyrev, P. Liang, SQuAD: 100,000+ questions for machine comprehension of text, in: J. Su, K. Duh, X. Carreras (Eds.), Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Austin, Texas, 2016, pp. 2383–2392. URL: https://aclanthology.org/D16-1264. doi:10.18653/v1/D16-1264. arXiv:1606.05250.</p>
        <p>[23] P. Rajpurkar, R. Jia, P. Liang, Know what you don’t know: Unanswerable questions for SQuAD, in: I. Gurevych, Y. Miyao (Eds.), Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Association for Computational Linguistics, Melbourne, Australia, 2018, pp. 784–789. URL: https://aclanthology.org/P18-2124. doi:10.18653/v1/P18-2124. arXiv:1806.03822.</p>
        <p>[24] P. F. Balestrucci, E. Di Nuovo, M. Sanguinetti, L. Anselma, C. Bernareggi, A. Mazzei, An educational dialogue system for visually impaired people, IEEE Access (2024).</p>
        <p>[25] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, Advances in Neural Information Processing Systems 30 (2017).</p>
        <p>[26] Z. Kasner, O. Platek, P. Schmidtova, S. Balloccu, O. Dusek, factgenie: A framework for span-based evaluation of generated texts, in: S. Mahamood, N. L. Minh, D. Ippolito (Eds.), Proceedings of the 17th International Natural Language Generation Conference: System Demonstrations, Association for Computational Linguistics, Tokyo, Japan, 2024, pp. 13–15. URL: https://aclanthology.org/2024.inlg-demos.5/. doi:10.18653/v1/2024.inlg-demos.5.</p>
        <p>Declaration on Generative AI
During the preparation of this work, the author(s) used ChatGPT (OpenAI) in order to: Grammar
and spelling check. After using these tool(s)/service(s), the author(s) reviewed and edited the content
as needed and take(s) full responsibility for the publication’s content.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Power</surname>
          </string-name>
          ,
          <source>The ALT Text: Accessible Learning with Technology</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P. F.</given-names>
            <surname>Balestrucci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Anselma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bernareggi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mazzei</surname>
          </string-name>
          ,
          <article-title>Building a spoken dialogue system for supporting blind people in accessing mathematical expressions</article-title>
          , in: F. Boschetti,
          <string-name>
            <given-names>G. E.</given-names>
            <surname>Lebani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Magnini</surname>
          </string-name>
          , N. Novielli (Eds.),
          <source>Proceedings of the 9th Italian Conference on Computational Linguistics</source>
          (CLiC-it
          <year>2023</year>
          ), CEUR Workshop Proceedings, Venice, Italy,
          <year>2023</year>
          , pp.
          <fpage>70</fpage>
          -
          <lpage>77</lpage>
          . URL: https://aclanthology.org/2023.clicit-1.10/
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Wallace</surname>
          </string-name>
          ,
          <source>The Elements of AIML Style, ALICE A.I Foundation</source>
          ,
          <year>2001</year>
          . Available at https://files.ifi.uzh.ch/cl/hess/classes/seminare/chatbots/style.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Oliverio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Piroi</surname>
          </string-name>
          , D. De Giorgi,
          <string-name>
            <given-names>P. F.</given-names>
            <surname>Balestrucci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Manolino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mazzei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Anselma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bernareggi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Serio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Sabena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Armano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Coriasco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Capietto</surname>
          </string-name>
          , Novagraphs:
          <article-title>Towards an accessible educational-oriented dialogue system</article-title>
          ,
          <source>in: Proceedings of the Second International Workshop on Artificial Intelligence Systems in Education co-located with the 23rd International Conference of the Italian Association for Artificial Intelligence (AIxIA 2024)</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D. G.</given-names>
            <surname>Bobrow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Kaplan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Norman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Thompson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Winograd</surname>
          </string-name>
          ,
          <article-title>Gus, a frame-driven dialog system</article-title>
          ,
          <source>Artificial Intelligence</source>
          <volume>8</volume>
          (
          <year>1977</year>
          )
          <fpage>155</fpage>
          -
          <lpage>173</lpage>
          . URL: https://www.sciencedirect.com/science/article/pii/0004370277900182. doi:10.1016/0004-3702(77)90018-2.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Abusitta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Q.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. C.</given-names>
            <surname>Fung</surname>
          </string-name>
          ,
          <article-title>Survey on explainable AI: Techniques, challenges and open issues</article-title>
          ,
          <source>Expert Systems with Applications</source>
          <volume>255</volume>
          (
          <year>2024</year>
          )
          <fpage>124710</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Retrieval-augmented generation for large language models: A survey</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2312.10997. arXiv:2312.10997.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C.</given-names>
            <surname>Bernareggi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Comaschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Dalto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mussio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. Parasiliti</given-names>
            <surname>Provenza</surname>
          </string-name>
          ,
          <article-title>Multimodal exploration and manipulation of graph structures</article-title>
          ,
          <source>in: Proceedings of the 11th International Conference on Computers Helping People with Special Needs</source>
          ,
          <source>ICCHP '08</source>
          , Springer-Verlag, Berlin, Heidelberg,
          <year>2008</year>
          , pp.
          <fpage>934</fpage>
          -
          <lpage>937</lpage>
          . doi:10.1007/978-3-540-70540-6_140.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>C.</given-names>
            <surname>Bernareggi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ahmetovic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mascetti</surname>
          </string-name>
          , muGraph:
          <source>Haptic Exploration and Editing of 3D Chemical Di</source>
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Kasner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Dusek</surname>
          </string-name>
          ,
          <article-title>Beyond traditional benchmarks: Analyzing behaviors of open LLMs on data-to-text generation</article-title>
          , in:
          <string-name>
            <given-names>L.-W.</given-names>
            <surname>Ku</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Martins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Srikumar</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          , Association for Computational Linguistics, Bangkok, Thailand,
          <year>2024</year>
          , pp.
          <fpage>12045</fpage>
          -
          <lpage>12072</lpage>
          . URL: https://aclanthology.org/2024.acl-long.651/. doi:10.18653/v1/2024.acl-long.651.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>