<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CILC</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Concept2Text: an explainable multilingual rewriting of concepts into natural language⋆</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Flavio Bertini</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Dal Palù</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco Fabiano</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Formisano</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Federica Zaglio</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, New Mexico State University</institution>
          ,
          <addr-line>Las Cruces, New Mexico</addr-line>
          ,
          <country country="US">United States</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Mathematical, Physical and Computer Sciences, University of Parma</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Department of Mathematics</institution>
          ,
          <addr-line>Computer Science and Physics</addr-line>
          ,
          <institution>University of Udine</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>39</volume>
      <fpage>26</fpage>
      <lpage>28</lpage>
      <abstract>
        <p>Automated and explainable data interpretation hinges on two critical steps: (i) identifying emerging properties from data and representing them into abstract concepts, and (ii) translating such concepts into natural language. While Large Language Models have recently demonstrated impressive capabilities in generating natural language, their trustworthiness remains dificult to ascertain. The deployment of an explainable pipeline enables its application in high-risk activities, such as decision making. Addressing this demanding requirement is facilitated by the fertile ground of knowledge representation and automated reasoning research. Building upon previous work that explored the first step, we focus on the second step, named Concept2Text. The design of an explainable translation naturally lends itself to a logic-based model, once again highlighting the contribution of declarative programming to achieving explainability in AI. This paper explores a Prolog/CLP-based rewriting system designed to interpret concepts expressed in terms of classes and relations derived from a generic ontology, generating text in natural language. Its key features encompass hierarchical tree rewritings, modular multilingual generation, support for equivalent variants across semantic, grammar, and lexical levels, and a transparent rule-based system. We present the architecture and illustrate a simple working example that allows the generation of hundreds of diferent and equivalent rewritings relative to the input concept.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Concept-to-text</kwd>
        <kwd>Explainable AI</kwd>
        <kwd>Natural Language</kwd>
        <kwd>Prolog</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>the demand for system explainability is heightened in contexts deemed to carry unacceptable or high
risks because they pose a threat to people or afect safety or fundamental rights, such as real-time and
remote biometric identification systems and medical devices.</p>
      <p>The demand for explainable AI (xAI) applications propels us beyond the “black-box” approach
characteristic of traditional sub-symbolic AI. Unlike these systems, which often rely on machine
learning or deep learning, the evolving landscape of AI necessitates elucidating the reasoning behind
its decisions, particularly in high-risk scenarios, such as automatically describing an electrocardiogram
in a medical report.</p>
      <p>
        Our research centers on the process of transforming raw information (e.g., data series) into natural
language descriptions. Designing an explainable framework entails an initial step for interpreting
and abstracting features (Data2Concept), followed by translating concepts into natural language
(Concept2Text). The first step, as outlined in [
        <xref ref-type="bibr" rid="ref4 ref6 ref7">3, 4</xref>
        ], involves identifying desired patterns in raw data and
representing them as high-level descriptions or concepts. For instance, a time series might exhibit a
consistent increase in values over time, which can serve as a logical fact for subsequent processing.
These concepts can be enriched with additional knowledge, such as information about the time domain,
measured quantities, and contextual factors.
      </p>
      <p>In our previous work, we solely demonstrated a proof of concept for the second step. We presented
a basic generator of natural language expressions, serving as a foundational precursor to the system
elaborated upon in this paper. Subsequent extensions of that work have already led to the deployment
of domain-specific applications. For instance, [ 5] delineates the utilization of Logic Programming in
crafting an explainable decision-making support system tailored for learning analytics in academia.</p>
      <p>This paper presents the design of a general and explainable Concept2Text pipeline. Its key features
include:
• Explainability: The system is grounded in Logic Programming (LP) features, particularly
focusing on rewriting rules and Constraint Satisfaction Problems, ensuring transparency for
editing and scaling.
• Modularity: The system’s architecture allows for seamless expansion to accommodate various
domain-specific concepts and languages.
• Tree Rewriting: Concepts are represented as trees and undergo a series of stages, culminating in
the translation of the concept into natural language. These stages progress from the conceptual
to the grammatical, ultimately reaching the syntax level.
• Variants: To facilitate the generation of diverse semantic-equivalent sentences, each rewriting
permits the creation of multiple versions that can be selected. Each stage of rewriting determines
various levels of equivalence, ranging from conceptual and structural to grammatical and lexical.
• Multi-Language Support: The modular architecture of the system enables the creation of
multiple language rule sets, allowing users to select diferent languages without afecting the
overall structure. In this paper, we illustrate an example involving English and Italian.</p>
      <p>The paper is structured as follows. Section 2 provides a concise overview of related work on the
topic. Section 3 presents the design of the system, while Section 4 demonstrates some practical results.
Finally, Section 5 ofers concluding remarks.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <sec id="sec-2-1">
        <title>2.1. Ontologies</title>
        <p>
          In information science, ontologies serve as organizational frameworks for structuring knowledge within
specific domains. They define facts, properties, and their interrelationships using simple representational
primitives, such as classes, attributes, and relations among class members [
          <xref ref-type="bibr" rid="ref5">6</xref>
          ]. Primarily, they elucidate
the relationships between concepts relevant to a given discourse domain. Ontologies can encapsulate a
broad spectrum of human knowledge, spanning from art and science to technology and medicine.
        </p>
        <p>In practical terms, ontologies employ specific languages to specify concepts and relations, abstracting
away implementation details. The Web Ontology Language (OWL) is a prominent example, designed
to enable applications to process information and concepts autonomously, without direct human
intervention [7]. Leveraging ontologies can greatly benefit computational reasoning for textual content
generation by enabling computers to grasp word meanings and assemble them into complex sentences,
akin to human language processing [8].</p>
        <p>
          In the medical realm, ontologies play a pivotal role in unlocking and facilitating computational
reasoning, particularly in precision medicine and explainable AI [
          <xref ref-type="bibr" rid="ref2">9</xref>
          ]. Biomedical and health sciences
extensively employ ontologies to represent knowledge across diverse areas, including diseases [10],
gene products [11], phenotypic abnormalities [12], clinical trials [13], vaccine information [14], and the
entire human anatomy [15].
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Sub-symbolic text models</title>
        <p>In recent times, large language models (LLMs) have surged in popularity, fueled by their remarkable
ability to manipulate, summarize, generate, and predict textual content, after training on text corpora,
in a manner reminiscent of human language [16]. LLMs, notably based on transformer-based neural
network architectures, are characterized by hundreds of billions (or more) of parameters and trained on
vast text datasets. Positioned within the realm of generative AI, they excel in generating content based
on their training data.</p>
        <p>Despite their impressive performance across various text processing tasks, LLMs harbor certain
limitations, including susceptibility to misinterpreting instructions, producing potentially biased content,
and generating factually incorrect information [17]. These limitations underscore a lack of control over
the accuracy and consistency of the concepts synthesized within generated text, leading to undesirable
outcomes such as the generation of fake news and plagiarism [18]. Such challenges align with the
characterization of LLMs as typical "black box" systems, as described in the EU AI Act, wherein
understanding and interpreting their inner workings are inherently challenging.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Symbolic text models</title>
        <p>The utilization of Logic Programming techniques (particularly approaches based on Prolog, ASP, and
CLP [19, 20]) in the extraction of concepts from raw data in compliance with xAI standards, and their
subsequent translation into natural language expressions, remains a relatively unexplored area in the
literature. For instance, Pereira proposed models of grammars and graph-based structures leveraging
Prolog’s unification [21].</p>
        <p>The converse problem, which involves processing natural language text provided as input, has gained
significant attention from the Logic Programming (LP) community [ 22]. Definite Clause Grammars
(DCGs), a well-known tool introduced in the 1980s primarily for parsing natural and artificial languages
using explicit grammar rules and Prolog syntactic sugar, have been instrumental in this endeavor
[23]. DCGs are bidirectional in nature, ofering a methodology not only for parsing but also for
generating text from a controlled context-free grammar that adheres to Backus-Naur Form (BNF)
rules. Additionally, they facilitate the modeling of multiple variants of the grammar tree. Furthermore,
examples of formalization of language structure, although not rule-based or operational, can be found
in the literature [24].</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Model</title>
      <p>We now focus into the design choices of the system. For the sake of simplicity, let us assume that
a concept is articulated in terms of a set of general classes and the relations among them. While a
graph could serve as a suitable structure to host such knowledge, with classes represented as nodes
and relations as (hyper-)edges, we opt in this paper to operate with a spanning tree that captures the
most pertinent details for description. Even if for now we manually generate the tree, in future work,
[class(student),
[rel(attribute),attribute(plural)],
[rel(attributive_spec),class(course)]
rel(attribute)</p>
      <p>rel(attributive_spec)
attribute(plural)
class(course)
we intend to explore methods for automatic selection and regulation of the amount of information
loss in this process. As an illustrative example, let us encode the concept of students of a course. Here,
we encode the concepts of student and course using a predicate class/1, while utilizing the rel/1
predicate to denote relevant relations attributed to them. For instance, we may wish to specify that
student is plural and establish a relationship between students and a specific subset associated with
those attending a course. Syntactically, we structure a Prolog nested list [Root,Child_1,. . . ,Child_n],
where children may contain further nested lists. Figure 1 illustrates the encoding (on the left) and the
corresponding tree visualization (on the right).</p>
      <p>The core idea revolves around processing a tree that describes a concept, as depicted above, to
derive an equivalent grammar tree containing a well-formed sentence. We have devised a uniform
rulebased rewriting system to drive this process. During rewriting, trees undergo structural and semantic
modifications. Diferent sets of rules govern distinct stages in the transformation of the input tree.
Although we initially considered using DCGs, we encountered two primary limitations: Firstly, DCG
rules naturally model the structure of a specific tree rather than the rewriting relationships of a subtree
into a new subtree. Although adaptable for tree rewriting, the head of DCG rules does not support a
general tree shape. Secondly, in cases involving multiple alternative outputs of a rule (variants), explicit
control over the selection process is necessary, as opposed to allowing the SLD resolution to explore all
possibilities. Consequently, we opted to model a native tree-to-tree rewriting system instead of adapting
DCGs for this purpose. Notably, our approach eliminates the need to generate rules for determining
the well-formedness of a sentence in a specific language, as a DCG would. Instead, the rewriting rules
are designed to ensure that the resulting trees ultimately conform to the grammar rules. Moreover,
our approach ensures the lack of over-generation and under-generation of sentences w.r.t. the implicit
grammar modeled as the set of rewriting rules.</p>
      <p>Let us explore how tree rewriting is conceptualized (further details in Section 3.1.1). When making
changes to a tree structure, we need to pinpoint specific conditions that trigger these modifications,
typically based on the presence of certain subtrees and their relationships embedded in the larger
structure. We expect a rule to encompass the lowest node that is an ancestor of all relevant subtrees,
including locations that are afected by the rule rewriting (potentially elsewhere in the tree).</p>
      <p>Each rule is responsible for constructing a new subtree that replaces the previous content hanging
from that node. While some cases involve straightforward substitutions of one subtree with another,
more complex scenarios can entail assembling intricate structures by combining existing components,
rearranging their structure, and introducing new elements.</p>
      <p>This level of generality enables the modeling of typical semantic equivalent rewritings as well as
grammatical transformations (e.g., active vs. passive voice, word to pronoun substitution, etc.).</p>
      <p>Concept rewriting into text is governed by tree rewriting rules, implemented through a fixed-point
algorithm. We have identified diferent stages to structure the process. Although the fixed-point
rewriting mechanism is common across all stages, we prefer to tailor rules and introduce barriers to fix-points.
This approach ofers several advantages, including the ability to accommodate language-independent
stages alongside those requiring language-specific rules. This modular setup also contributes to a more
organized system.</p>
      <p>The concept-to-text rewriting process can be broken down into several stages, each addressing
specific objectives. The overall construction of the final tree benefits from the iterative tree rewriting
performed at each stage’s fix-points. Depending on the rules activated at each stage, we achieve diferent
objectives as outlined below:
1. Equivalent Concepts: Concepts represented as relations among classes can be rewritten into
equivalent forms. The output remains a concept-based tree containing more detailed descriptions
or equivalent concepts captured by specific class-dependent properties and ontological relations.
This stage ensures semantic variants of the concept, leading to the furthest but still equivalent
ifnal text.
2. Concept2Structure: This stage transforms the tree of concepts and their relations into a
prototype of grammar structure. It constructs the components of a sentence (noun and verbal subtrees,
as well as complements) and introduces certain structural relationships (e.g., potential information
forwarding across subtrees). Classes and relations retain their ontology descriptions.
3. Structure2Grammar: While maintaining the overall structure, this stage translates each class
and relation into grammar lexemes and/or other simple grammatical forms.
4. Coordination: This stage ensures that subtrees are coordinated, as necessary, to match gender
and number for nouns, verbs, etc.
5. Inflection and Sorting: Responsible for producing the correct inflections for nouns and
adjectives, as well as conjugations for verbs. Additionally, it computes the correct word order for words
within the same phrase through the resolution of language-dependent Constraint Satisfaction
Problem (CSP).
6. Syntax: Applies local rules to consecutive words to ensure syntactic properties are met (e.g.,
contractions, ellipsis, etc.).</p>
      <p>In the following we provide some details about key aspects of implementing this model.</p>
      <sec id="sec-3-1">
        <title>3.1. Implementation</title>
        <sec id="sec-3-1-1">
          <title>3.1.1. Tree rewriting</title>
          <p>Our objective is to devise rules flexible enough to handle common language properties, such as subtree
swapping and restructuring. The sequence of transformations outlined in the preceding section has been
realized using Prolog. Here, we provide a brief overview of the main components of this implementation.</p>
          <p>Each transformation in the process takes a tree as input and produces a list of trees as output
(representing possible variants according to a rule). A tree is represented as a Prolog list [RootInfo|Children],
where Children is the list of child trees.</p>
          <p>The phases of the process operate as a pipeline (as discussed earlier), with each phase rewriting the
output of the previous one by applying a set of rewriting rules. These rule sets are unique and tailored
to the specific requirements of each phase. However, all rules are uniformly described using Prolog
clauses that define the predicate rule(Lang,Type,Name,Tree,RewTree), where:
∙ Lang: Specifies the target language of the translation, which remains consistent throughout the
process. Currently, the accepted values for Lang are limited to English and Italian.
∙ Type: Indicates a specific phase of the rewriting process, as described in Section 3. Acceptable
values for Type include the atoms: equiv_concept, concept2structure, structure2grammar,
coordination, inflection and syntax.
∙ Name: Identifies the applied rule, distinguishing a specific rewriting among those possible in the
phase Type.
∙ Tree: Represents the tree to be rewritten by the clause.
∙ RewTree: The rewritten version of Tree.</p>
          <p>Each phase repeatedly applies its rules to the input Tree, until a fix-point is reached (i.e., no more
rules of Type are applicable). This is implemented by the following Prolog clause:
where, the predicate rewrite/6, occurring in line 2, implements a Breadth-First Search (BFS)-based
rewriting of InputTree:
6 rewrite(Lang, Type, InputTree, OutputTree, RuleTree, Changed)
:7 mk_empty_queue(EmptyQ), % EmptyQ is an empty queue
8 enqueue(InputTree-OutputTree-RuleTree, EmptyQ, QofTrees),
9 bfs_rewrite(Lang,Type, QofTrees, ’unchanged’, Changed).
11 bfs_rewrite(_Lang, _Type, QofTrees, Changed, Changed)
:12 is_empty_queue(QofTrees). % no more sub-trees
13 bfs_rewrite(Lang, Type, QofTrees, ChangetTilNow, Changed)
:14 dequeue(Tree-RewTree-RuleTree, QofTrees, QofTrees1),
15 multiple_rewrite_tree(Lang, Type, Tree, NewTree, Applied),
16 ((Applied == ’nop’) -&gt; (TempChanged=ChangetTilNow) ; (TempChanged=’changed’)),
17 (isLeafTree(NewTree) % if it is leaf build a leaf for RuleTree
18 -&gt; (NewTree=RewTree, singletonTree(Applied, RuleTree), QofTrees2=QofTrees1)
19 ; (getChildrenTrees(NewTree, Root, Children), % Gets the sub-trees and
20 length(Children, Num), % their number
21 length(Ls, Num), length(Rs, Num), % Creates two lists of Num fresh variables
22 getChildrenTrees(RewTree, Root, Ls), % Constraints the structure of the
23 getChildrenTrees(RuleTree, Applied, Rs), % terms RewTree and RuleTree
24 enqueueTriples(Children, Ls, Rs, QofTrees1, QofTrees2)) ),
25 bfs_rewrite(Lang,Type, QofTrees2, TempChanged, Changed).</p>
          <p>BFS traversal is implemented in the standard manner, utilizing a queue that stores, for each unvisited
element, a Prolog term of the form Tree-RewTree-RuleTree. Here, Tree represents the subtree to
be visited (and possibly rewritten), while RewTree and RuleTree are initially variables that will be
instantiated during the traversal.</p>
          <p>Specifically, RewTree will be instantiated to the rewritten version of Tree, while RuleTree will be a
term isomorphic in shape as RewTree, describing the applied rule(s) for each node of RewTree.</p>
          <p>It is important to note that the information gathered in RewTree plays a vital role in ensuring the
explainability of the approach. RewTree serves as a description of the justification for each rewrite
performed. This is achieved by recording the Name argument found in the definition of the clause(s) of
rule/5 (as seen earlier) used in the rewriting process. Currently, this information comprises a rule ID,
but richer knowledge can also be easily managed if needed.</p>
          <p>Each dequeued subtree (line 14) undergoes rewriting by multiple_rewrite_tree/5 (line 15), which
iteratively applies rules of Type to Tree until a fix-point is reached. The resulting tree NewTree and
Applied, which gathers the names/justifications of all applied rules, reflect this step. The Prolog atom
’nop’ is used if no rules are applied.</p>
          <p>In line 24, the predicate enqueueTriples/5 enqueues a term Child-L-R for each child tree of NewTree.
Here, L and R are fresh variables (created in line 21 as members of lists with as many variables as the
number of subtrees to be enqueued), to be instantiated in the subsequent part of the BFS visit.</p>
          <p>At the conclusion of the visit, OutputTree will be instantiated to the result of the rewriting phase.
Both OutputTree and the last argument of fixpoint_rewrite/5 (i.e., [Rewriting|Rewritings]) will
have an isomorphic structure (as mentioned, lists representing trees). Therefore, the explanation for
each node in OutputTree can be found in the corresponding node of [Rewriting|Rewritings].</p>
          <p>The final stage (syntax rewriting) is executed by a simplified version of the aforementioned program.
The tree is flattened, and leaves are extracted in order to form a straightforward list of words comprising
the sentence. This list of words is then rewritten until a fixpoint is reached, using a set of rules that
only inspect pairs of consecutive words.</p>
          <p>Here are some additional details about rule/5. Each rule generates a list of trees as alternative
variants. When applying a rule, we must select one variant from this list. We’ve chosen a random
selection strategy that takes into account previous choices. This approach has proven particularly
efective in preventing the repetition of the same structure in diferent parts of the final sentence. We
keep track of the choice history for each rule using simple assertions. In cases where the same rule is
ifred multiple times, we ensure that the last choice is avoided if possible.</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>3.1.2. Language independent rules</title>
          <p>The first stage, responsible for handling equivalent concepts, is language-independent. While classes and
relations must be named according to a specific language (english in the paper), this naming convention
does not afect the generation of a specific language, as they will be converted later according to
language-dependent rules.</p>
          <p>Now, let us introduce another working example to illustrate some key features contained in the
stages described above. We’ll model the concept of an interval that specifies the use of the year class as
the unit of measure (uom):
&gt; [class(interval),[rel(attribute),attribute(uom(class(year)))],...]
If a common knowledge ontology is accessible, we could utilize the information that
is_a(year,time), implying that the class interval can be further specified as an interval of time.
Moreover, additional semantic knowledge about equivalences could inform the rewriting rules that an
interval of time is equivalent to the class period.</p>
          <p>Implementing rules that trigger whenever common knowledge adds some information is
straightforward. In this case, a sequence of rewritings could be:
&gt; [class(interval),[rel(attributive_spec),attribute(class(time))],...]
&gt; [class(period),...]
where attributive_spec represents the specification attribute provided by the class time.</p>
          <p>Let also discuss some potential equivalences that can be drawn for an interval that deals with a range
of numbers 1 and 2, e.g.:
&gt; [class(interval),[rel(attribute),attribute(range(V1,V2))],...]</p>
          <p>Focusing on the treatment of the class interval, we can propose various alternative versions. These
include: (i) presenting the interval as a simple measure of a range, (ii) using it as a temporal complement
to introduce the interval (e.g., during the interval ...), and (iii) employing a more refined version with
the addition of a relative subordinate (e.g., during the interval that spans ...). Furthermore, the actual
measure itself (the range between two numbers) can be expressed using diferent prepositions (from ...
to, between ..., starting from ... until). It is evident that the combinatorial explosion of such rewritings
gives rise to a rich set of alternative options even at the concept stage.</p>
          <p>Figure 2 illustrates a simplified example demonstrating the application of two rewriting rules described
above. It is noteworthy how the classes are matched and rewritten: from (a) to (b), the class interval is
replaced with the left subtree, introducing the concept of measure; from (b) to (c), the class measure,
denoting a range, is rewritten into a nesting of two complements (source and goal) with measures of
simple numeric quantities. Let us show a simplified snippet of the first rule:
1 rule(_Lang,equiv_concept,equiv_interval,[Root|C],
2 [[Root|C2], ... ] ]):- %%% list of equivalent trees
3 %%% firing condition
4 member([class(interval)|C1],C),
5 member([rel(attribute),attribute(range(V1,V2))],C1)
6 member_non_var([rel(attribute),attribute(uom(class(Uom)))],C1),
7 %%% prepare new tree structure
8 %%% (depending whether isa relation in known)
9 ( common_knowledge_isa(Uom,Isa),!,
10 Int=[class(interval),[rel(attribute),attribute(singular)],
11 [rel(attributive_spec),[class(Isa),singular]]];
12 Int=[class(interval),[rel(attribute),attribute(singular)]]),
singular
(b) Rewriting of concept interval
(a) Input concept</p>
          <p>rel(definite_time)
(c) Rewriting of concept measure with range</p>
          <p>Let us assume a predicate replace/4 that takes the input list, the element to be replaced, the
replacement list (enclosed by an additional list, in case multiple elements need to be inserted), and
returns the output list.</p>
          <p>For the second rule applied in the example, a simplified code snippet could be:
1 rule(_Lang,equiv_class,measure_range, [Root|C],
2 [[Root|C4], ... ]):- %%% list of equivalent trees
3 member([class(measure)|C1],C),
4 member([rel(attribute),attribute(range(N1,N2))],C1),
5 (El=[rel(attribute),attribute(uom(class(U)))],
6 member(El,C1),!,Uom=[El]; %% there is a UoM specified
7 Uom=[]),
8 replace(C,[rel(attribute),attribute(range(N1,N2))],[[]],C2),
replace(C2,El,[[]],C3),
%%% replace subtree at measure class with single measures
replace(C3,[class(measure)|C1],[[rel(source_compl),
[class(measure),[rel(attribute),attribute(number(N1))],
[rel(goal_compl),[class(misura),[rel(attribute),attribute(number(N2))]|Uom]
]|Uom]]],C4),
13</p>
          <p>...</p>
          <p>The second stage (concept to structure) is essentially language independent. It involves creating
internal nodes to house information about their subtrees. Specifically, classes and relations denote
specific grammar subtrees referred to as phrases. These phrases can include nouns, verbs, propositions,
relative phrases, etc., identified through the analysis of each subtree and the presence of various
relations.</p>
          <p>During the rewriting process, concepts are structurally adapted into phrases within the tree. For
instance, a subject may have a verb relation as one child, and an object may be associated with the verb
as its child. This three node branch at the concept level is flattened, resulting in three ordered siblings
(subject, verb, object).</p>
          <p>Internal nodes are enriched with explicit descriptions of their subtrees. This information is encoded
using a predicate info/4, which specifies the type of phrase (using standard linguistic terminology
such as np, vp, pp, rp for noun, verbal, propositional, relative phrases), the subtype (e.g., subject, object),
and the gender and number attributes that apply to the subtree.</p>
          <p>The fourth stage is language independent as it is responsible for linking (via unification) the gender
and number variables contained in the info/4 nodes. Some links are predetermined by default (e.g.,
between subject and verb), but in other cases, a previous rewriting may have required an explicit
coordination (e.g., a relative subordinate phrase must match the gender and number of the associated
noun, as soon as that subtree is create in a next stage). At this stage, all coordinations are enforced
essentially by ensuring unification where necessary.</p>
          <p>The remaining stages are language dependent and will be discussed in the next section.</p>
        </sec>
        <sec id="sec-3-1-3">
          <title>3.1.3. Language dependent rules</title>
          <p>The overall structure allows us to focus on language-dependent rules for specific stages:
structure2grammar, inflection, and syntax. One advantage of this model is that we can easily plug in
sets of rules without modifying the system.</p>
          <p>The structure2grammar stage is responsible for translating classes and relations into their counterparts
associated with the target language. Typically, an explicit mapping of classes to grammar lexemes is
required. While we are exploring automated tools to streamline this phase, for small domain-specific
applications, manual crafting is an option. At this stage, synonyms can be suggested as variants. Classes
and relations can generate a list of lexemes, and the tree structure can be modified accordingly. In
general, each object is encapsulated by a parent node that provides its type (e.g., noun, adjective, number,
verb, preposition, etc.). This tagging allows a simple reasoning when determining the correct order of
elements within a phrase.</p>
          <p>Let us now provide some details of the inflection stage. Here, the rewriting process governs the
selection of appropriate inflections for nouns, adjectives, and verbs, considering their gender and
number. This is typically accomplished by consulting a dictionary that explicitly associates lexemes
with words. When dealing with verbs, additional considerations come into play to determine the correct
tense. Auxiliary verbs are generated according to the rules of the target language.</p>
          <p>After the inflection process, another crucial step is required: arranging the words in the correct
order within each phrase. Languages have various rules governing the order of words in a phrase,
and attempting to cover all possible cases would be impractical due to the combinatorial explosion of
possibilities.</p>
          <p>To address this challenge, we have devised a set of rules that describe local and partial orderings
among subsets of words within the phrase. By combining these partial orders, we can derive the correct
total order of words. These orderings may depend on word types and specific words themselves. For
example, the fact that in the English language an adjective appears before a noun, is rendered by
imposing a constraint on their relative positions in the noun phrase.</p>
          <p>We represent this network of sorting constraints as a Constraint Satisfaction Problem (CSP). While
solving the CSP itself is straightforward, developing accurate order constraints requires careful tuning.
Thus, the flexibility of a CSP allows to support the updates of the constraints considered.</p>
          <p>The final stage addresses the enforcement of writing rules for adjacent words. Every language
possesses distinct rules governing word combinations, typically controlled through local pattern matching
attr(range(2022,2024))
attr(uom(class(year)))
attr(plural)
class(teaching)
at the word or character level. For instance, this stage handles scenarios where two words are merged
into contractions or a single letter is removed to indicate ellipsis.</p>
          <p>Operating on the leaves of the tree, which at this stage contain the individual words of the sentence,
this stage ensures adherence to language-specific rules. By conducting a Depth-First Search (DFS)
traversal of the tree, the correct order for assembling the text is recovered.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>Let us consider a complete sentence that embodies the typical elements of a concept description
generated by a Data2Concept framework. This sentence includes temporal characterization over a
range in time (class years), describing a phenomenon that has increased over time (by a range of
values). It also specifies the unit of measurement (integer units), employs past tense, and allows for the
potential use of a relative subordinate clause to specify the interval of time. This showcases how several
descriptions can be generated by simply changing classes and relations within this prototype concept.</p>
      <p>In the following listing, we provide the concept tree represented in the form of a list. Lines 1–3
describe the interval with attributes for the numerical range and the unit of measure (years). Line 4
defines the subject (class student, plural), and lines 5–6 specify that the students are associated with a
course. Line 7 adds information that the course pertains to computational logics. Another attribute of
the course, described in line 8, specifies that an increase occurred. Its attributes include modulation
through an adverb (intensity of 60% on a linear scale where adverbs are selected according to the
intensity) and the actual measure, providing a numerical range and the unit of measure (class unit).
1 [class(interval),
2 [rel(attribute),attr(range(2022,2024))],
3 [rel(attribute),attr(uom(class(year)))]],
4 [class(student),plural,
5 [rel(attributive_spec),
6 [class(course),singular,
7 [rel(explicative_spec),class(computational_logics)]]],
8 [rel(increase),
9 [rel(attribute),attr(modulate(lightly_remarkably,60,100))],
10 [class(measure),
11 [rel(attribute),attr(range_from_to(87,125))],
12 [rel(attribute),attr(uom(class(unit)))]]]]</p>
      <p>Figure 3 depicts the corresponding tree representation of the list.</p>
      <p>The set of rules we developed for generating basic variants resulted in a satisfactory combinatorial
expansion of rewritings. Specifically, when testing the concept described above, we obtained more than
500 distinct sentences for both English and Italian languages. In Table 1, we present examples of output
text for English rules (1–3) and for Italian (4–6). It is noteworthy how the rewriting of equivalence
and grammar structures is capable of providing remarkable diferences, while preserving semantics
perfectly, thanks to the strict transitivity of the applied equivalences. The resulting text appears natural,
although some additional synonyms could be included to enrich and diversify certain words (e.g.,
student, significantly, class).</p>
      <p>1. During the interval of time that has spanned from 2022 until the year 2024 students of the
class of computational logics have significantly increased from 87 to 125 units.
2. There has been a pronounced growth of students of the class of computational logics starting
from 87 until 125 units in the interval of time between the years 2022 and 2024.
3. From the year 2022 and during the next 2 years students of the class of computational logics
have significantly incremented starting from 87 until 125 units.
4. C’è stato un incremento deciso di studenti del corso di logica computazionale da 87 fino a
125 unità tra gli anni 2022 e 2024.
5. Durante l’intervallo di tempo che è intercorso dal 2022 all’anno 2024 gli studenti del corso di
logica computazionale sono decisamente incrementati a partire da 87 fino a 125 unità.
6. Nel periodo tra gli anni 2022 e 2024 gli studenti del corso di logica computazionale sono
decisamente cresciuti a partire da 87 fino a 125 unità.</p>
      <p>In Figures 4, 5, and 6, we present a sequence of tree rewritings for each stage, beginning with the
input illustrated in Figure 3. The rewritten trees are depicted after each stage has reached its fixed point.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>We introduced a Prolog-based rewriting system that converts concepts, represented as trees, into
natural language. The system’s modularity allows for easy adaptation to various domain-specific
applications and output languages. Moreover, the system adheres to explainable AI standards by
ofering transparency and verifiability. Initial findings demonstrate its efectiveness when combined
with an xAI concept extractor, thereby forming a comprehensive data-to-text explainable pipeline.</p>
      <p>This work opens diferent lines of research for further exploration. Developing a comprehensive rule
set for accurately translating classes into suitable grammar synonyms is a complex task, as the most
appropriate choices depend heavily on context. We aim to devise automatic methods to retrieve such
preferences and stylistic usages.</p>
      <p>While we currently focus on a tree-like set of relations associated to a concept, other approaches
may yield a general graph. Converting this graph into a spanning tree, or alternatively, synthesizing
the information in a controlled manner, could facilitate the creation of guided concept summaries.</p>
      <p>
        The entire pipeline is versatile and applicable across various domains, particularly in scenarios where
reports are generated based on data analytics. We intend to extend this methodology to automated
analysis of ECGs and other medical data, financial data, and more broadly, to produce trustworthy
Business Intelligence (explainable automated reporting). Furthermore, we aim to leverage our findings
to enable real-time and interactive dialogue, facilitating robot-guided sessions for cognitive impairment
rehabilitation.
[2] European Commission, Regulation of the European Parliament and of the Council. Laying down
harmonised rules on Artificial Intelligence (Artificial Intelligence Act), 2021. https://eur-lex.europa.
eu/legal-content/EN/TXT/HTML/?uri=CELEX:52021PC0206.
[
        <xref ref-type="bibr" rid="ref4 ref6">3</xref>
        ] A. Dal Palù, A. Dovier, A. Formisano, Towards explainable data-to-text generation, in: A. Dovier,
A. Formisano (Eds.), Proc. of the 38th Italian Conference on Computational Logic, CILC 2023,
Udine, Italy, June 21-23, 2023, volume 3428 of CEUR Workshop Proceedings, CEUR-WS.org, 2023.
[
        <xref ref-type="bibr" rid="ref7">4</xref>
        ] A. Dal Palù, A. Dovier, A. Formisano, An xAI approach for data-to-text processing with ASP, in:
E. Pontelli, et al. (Eds.), Proc. of the 39th International Conference on Logic Programming, ICLP 2023,
Imperial College London, UK, 9th July 2023 - 15th July 2023, volume 385 of Electronic Proceedings
in Theoretical Computer Science, EPTCS, 2023, pp. 353–366. doi:10.4204/EPTCS.385.38.
[5] F. Bertini, A. Dal Palù, A. Formisano, A. Pintus, S. Rainieri, L. Salvarani, Students’ careers and AI:
a decision-making support system for academia, in: Proceedings of the 2023 Italia Intelligenza
Artificiale - Thematic Workshops, (Ital-IA 2023), Pisa, Italy, May 29-30, 2023, volume 3486 of CEUR
Workshop Proceedings, CEUR-WS.org, 2023, pp. 272–277.
[
        <xref ref-type="bibr" rid="ref5">6</xref>
        ] S. Staab, R. Studer, Handbook on ontologies, Springer Science &amp; Business Media, 2013.
[7] D. L. McGuinness, F. Van Harmelen, et al., OWL web ontology language overview, W3C
recommendation 10 (2004) 2004.
[8] R. Speer, J. Chin, C. Havasi, Conceptnet 5.5: An open multilingual graph of general knowledge, in:
      </p>
      <p>
        Proceedings of the AAAI conference on artificial intelligence, volume 31, 2017.
[
        <xref ref-type="bibr" rid="ref2">9</xref>
        ] M. A. Haendel, C. G. Chute, P. N. Robinson, Classification, ontology, and precision medicine, New
      </p>
      <p>England Journal of Medicine 379 (2018) 1452–1462.
[10] L. M. Schriml, et al., Human disease ontology 2018 update: classification, content and workflow
expansion, Nucleic acids research 47 (2019) D955–D962.
[11] M. Ashburner, et al., Gene ontology: tool for the unification of biology, Nature genetics 25 (2000)
25–29.
[12] S. Köhler, et al., The human phenotype ontology in 2021, Nucleic acids research 49 (2021)</p>
      <p>D1207–D1217.
[13] B. Smith, M. Ashburner, C. Rosse, J. Bard, W. Bug, W. Ceusters, L. J. Goldberg, K. Eilbeck, A. Ireland,
C. J. Mungall, et al., The OBO foundry: coordinated evolution of ontologies to support biomedical
data integration, Nature biotechnology 25 (2007) 1251–1255.
[14] Y. Lin, Y. He, Ontology representation and analysis of vaccine formulation and administration and
their efects on vaccine immune responses, Journal of biomedical semantics 3 (2012) 1–15.
[15] N. F. Noy, M. A. Musen, J. L. Mejino Jr, C. Rosse, Pushing the envelope: challenges in a frame-based
representation of human anatomy, Data &amp; Knowledge Engineering 48 (2004) 335–359.
[16] W. X. Zhao, K. Zhou, J. Li, T. Tang, X. Wang, Y. Hou, Y. Min, B. Zhang, J. Zhang, Z. Dong, et al., A
survey of large language models, arXiv preprint arXiv:2303.18223 (2023).
[17] Y. Wang, W. Zhong, L. Li, F. Mi, X. Zeng, W. Huang, L. Shang, X. Jiang, Q. Liu, Aligning large
language models with human: A survey, arXiv preprint arXiv:2307.12966 (2023).
[18] Z. Ji, N. Lee, R. Frieske, T. Yu, D. Su, Y. Xu, E. Ishii, Y. J. Bang, A. Madotto, P. Fung, Survey of
hallucination in natural language generation, ACM Computing Surveys 55 (2023) 1–38.
[19] A. Dovier, A. Formisano, E. Pontelli, Parallel answer set programming, in: Y. Hamadi, L. Sais
(Eds.), Handbook of Parallel Constraint Reasoning, Springer, 2018, pp. 237–282. doi:10.1007/
978-3-319-63516-3_7.
[20] A. Dovier, A. Formisano, G. Gupta, M. V. Hermenegildo, E. Pontelli, R. Rocha, Parallel
logic programming: A sequel, Theory Pract. Log. Program. 22 (2022) 905–973. doi:10.1017/
S1471068422000059.
[21] F. C. N. Pereira, Grammars and logics of partial information, in: Proc. 4th ICLP, 1987, pp. 989–1013.
[22] F. C. N. Pereira, S. M. Shieber, Prolog and natural-language analysis, Microtome Publishing, 2002.
[23] Y. Matsumoto, H. Tanaka, H. Hirakawa, H. Miyoshi, H. Yasukawa, BUP: a bottom-up parser
embedded in prolog, New Generation Computing 1 (1983) 145–158.
[24] M. den Dikken, The Cambridge handbook of generative syntax, Cambridge University Press, 2013.</p>
      <p>r
m e
u 7 to b p
_n 8 m
sab_ )()91 _ngu
g ,v l
) a
8 n
,))()12v rebun_m p ,)())61v5 ,,)((71vvp1 ja_gd ttiacupoom
(20 g ,(v1 f(on em
,lv ce in an
cpom repp from sep_v _g ilcgo
rce_u tiliac rep f
o p p o
,s tr x
p a ,e
p _ p
( o p
ifon n (fon</p>
      <p>i
)
)
4
)) (1
1 ,v
) ,()v1 ()31 reb
)9 10 ,)v um s
( ( 2 n
,)v ,cv (1 _g
,)))(56v ,,)((pv7v8 itseeu_vp ,i(fnponv eanm lssa
,(cv (fno ittrb _g c
sep in ,ap
e_ (p p
itv fon rep fo
a i
c p
li re
p p
x
e
,
p r
p e
( b
o
f um p
n 0
i 0 n
1 _</p>
      <p>g
te e t
)()2v ))(4 laudo 60 an_gm tsedun
,()1 ,)3v g_m
v (
,,()v0 ,,npvv re laekb fo
(s (o b ra
fno ifn um em
i n r
_ _
g t
h
g
i
l
e
m
a
n
_
g s
t
r
a
_
t h
e t
d w
in ro</p>
      <p>g
b
) r
) e
(4 v_ eb
,v g
)
3
(
v
,
,pv tec
v j
( b
fo o
n _
i o
n
_
g
e
r
e
h
t
_
g
j
d tx
a_g en
)
p
,
)
6 e
,(1v am rae
) _n y
5
1 g
)) ,(v
(2 p
,v (n r
) o e
1 f b
( n
v i m
, u
e n 2
m _
i s
t b
_ a
d _
e g
u
n
i
t
n p
co re g
, n
p p i
p r
( u
o d
) f
) n
2 i
(
v
,
)
1
(
v
,
p j
_ con and
p
j
n
o
c
,
p
p
( r
fo e
n b
i
um 4
_n 02
s 2
b
a
_
g
)
,)s ts
4 la
) 1 _
) ( r
(2 ,)v eb
,)v1 (31 um r
( v n a
v , e
,l p y
p (n
m fo e
co in m
_ a t
e n r
c _ a
r g _
u o
o n
,s
p p
p e
(
fo rp om e t
in rf an_m itnu rao_
g n</p>
      <p>e
,))p ,))p anm itun
,))p1 ,l(11v ,()11v g_
,()v1 pom ,(12v reb
0 c p
(1 e_ (n um 5
,v rc fo n 2
p u n s_ 1
(on ,so i ab
f p _
in (p g
o
f
n
i
p
e
r
p o
t</p>
      <p>t
r r
e a
b _
um on l</p>
      <p>a
_n 87 ) n
,)s sb ,)p tio
,()v3 ag_ ,)s ,)(v9 ja_d tapu
lpom ,)(7v ,(8pv g com
rcec_ repp from sec_p if(non aem c
u e n i
,i(fsppoon tra_no ,ilticaepvx rep fg_ log
p p o
p
(
o
f
n
,)()s3 ,,,()())v4pv5p ,,())secp_v5p ,,,()())svp6v7 iaen_m lssac
,cev f(on itev f(on g
sp in ub in
_ i
e tr
v t
it ,a
a p
ilc p (p ep f
,exp rep ifno rp o
p
p
(
o
f
n
i 0 e t
10 am edn
n u
_ t
g s
)
)
,(v2 ,))s e
,))(0v1 ,,(vpv3 tladou 06 f
( n m o
,(sv f(o g_
fo in
n
i
e
l
e b</p>
      <p>a
am rk
n a
_
g m
e
r
_
t
h
tr ig
a l
_
t
e
d
n
i h
t
w
o
r
g
b
r
) e
,)s v_ e
3 g b
(
v
,
v
,
p t
v c
( e
o j
f b
n o
i _
o
n
_
g
e
r
e
h
t
_
g
m
o
c_ p
l re to
a s
o p ic
g
, n og
p l
(p )
)p fo ,p
,,)f(6 in ,)s ,)(5m lan
,,l)sm ,(fvpon um 78 ,scep_m ,(fvpon jad ittaoup
pom in n iteav in com
c c
e_ li
rcu rep om ,exp rep fo
,so p rf pp p
p (
(p fo
fo in
n
i ) )
,p ,s
,m ,)m
ec (4 ss
,)s ,)p se_p ,npv n lca
,cm ,)m tiv f(o
e 3 u n
p ( b i t
isev_ ,(npv ,titra trea_ teh
t o p d
a f p
licxp in if(on repp
e
,pp fo
( p
fo re s
n p t
) i n
) n e
2 d
,)(v ,)s h tsu
(1 ,m t
,v ,v ow
) p n r
(0 (n g f
,v fo o
(s in
fo d
n e
i j c
d n
a u
o
n
o
r
p</p>
      <p>s
v a
h</p>
      <p>t
j</p>
      <p>x
d e
a</p>
      <p>n</p>
      <p>g
p n
e i
r r
p u</p>
      <p>d</p>
      <p>4
m 2
u 0
n 2
r
a
n e</p>
      <p>y
s
t
n i
n
u
s
c
i
n g
o
l
jd ito
a a
t
u
p
m
o
c
s
s
n l
a
c
s
t
n
n e
d
u
t
s
h
t
n w
o
r
g
d
e
j c
d n
a u
o
n
o
r
p</p>
      <p>s
v a
h</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A. B.</given-names>
            <surname>Arrieta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Díaz-Rodríguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Del</given-names>
            <surname>Ser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bennetot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tabik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barbado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>García</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gil-López</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Molina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Benjamins</surname>
          </string-name>
          , et al.,
          <article-title>Explainable artificial intelligence (xAI): Concepts, taxonomies, opportunities and challenges toward responsible ai</article-title>
          ,
          <source>Information fusion 58</source>
          (
          <year>2020</year>
          )
          <fpage>82</fpage>
          -
          <lpage>115</lpage>
          .
          <article-title>e am rae _n y ) g ) 1 4 ( ,v r ) e 0 b 4 ( m v u</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          )) ,)
          <article-title>9 s_n 2</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <article-title>(83 ,(3v ab_</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <article-title>(3 fo</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>6 n</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>3 i</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <article-title>i um 4 _n 02 s 2 b a _ g</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>