
10.3233/FAIA240896

Argumentation-based Explainable Recommender System with ARES

Riccardo Felici

riccardofelici7@gmail.com 2

Emanuele De Angelis

emanuele.deangelis@iasi.cnr.it 0

Alessio Ferrato

alessio.ferrato@uniroma3.it 2

Maurizio Proietti

maurizio.proietti@iasi.cnr.it 0

Giuseppe Sansonetti

Francesca Toni

f.toni@imperial.ac.uk 1

0 CNR-IASI, Rome, Italy
1 Imperial, London, UK
2 Roma Tre University, Rome, Italy


Traditional recommender systems lack transparency, limiting user trust. This paper presents ARES (ARgumentation-based Explainable recommender System), which offers traceable recommendations with explicit reasoning paths. For explainability, ARES relies upon ABALearn, a system that learns Assumption-Based Argumentation (ABA) frameworks from positive and negative examples, given background knowledge. Argumentative explanations are reformulated into natural language via a Large Language Model, anchored to the ABA logic to prevent hallucinations. The system uses an iterative learning mechanism, guided by ABALearn and facilitated by an interactive chatbot, to dynamically adapt user profiles.

Keywords: Explainable Recommender Systems, Assumption-based Argumentation, Traceable and Iterative Learning

1. Introduction

Recommender Systems (RSs) are widely used and highly effective tools for guiding users through vast amounts of information and products, offering personalized suggestions across various domains. However, RSs are often black boxes: they provide suggestions without explaining why, which limits the trust of users who do not understand the reasons behind a suggestion [1]. This work seeks to address this critical transparency gap by developing an Explainable Recommender System (XRS) [2], built upon Assumption-based Argumentation (ABA) frameworks [3, 4].

We propose ARES (ARgumentation-based Explainable recommender System), whose core contribution is using ABA frameworks to provide traceable recommendations, in which every step of the reasoning process, including rules and assumptions leading to a recommendation, is fully explicit and verifiable. This intrinsic traceability supports a complete reconstruction of the entire reasoning process, from the initial background knowledge to the final recommendation, ensuring a very high degree of transparency. The argumentative explanations are then reformulated in natural language using an advanced linguistic model, making them more understandable to the user. Unlike many LLM-based XRSs that might generate plausible but unfaithful explanations, ARES ensures that natural language generation is directly based on the rigorous logic of ABA, significantly reducing the presence of hallucinations [5] and guaranteeing explanations aligned with the actual formal reasoning.

ARES features an iterative learning mechanism, which allows the user profiles to be dynamically updated based on preferences and feedback, making suggestions increasingly accurate. The learning process is driven by ABALearn [6, 7], an automated logic-based learning system designed to infer ABA frameworks from positive and negative examples, and background knowledge. The interactive chatbot enables the learnt ABA frameworks to evolve continuously, integrating new user feedback and preferences without requiring a full retraining from scratch. This represents a clear advantage over traditional one-shot learning approaches, which lack such flexibility.

We validate our method in the complex and subjective field of perfumery, showing that ARES can deliver recommendations that are transparent, personalized, and capable of adapting over time, while achieving performance comparable to or competitive with standard baselines.

Related work In the context of RS, various approaches have been developed to enhance explainability and user trust. Xian et al. [8] introduced CAFE, a CoArse-to-FinE neural symbolic reasoning framework that generates user profiles as coarse sketches of behaviors, guiding a path-finding process to derive reasoning paths for recommendations as fine-grained predictions. This method emphasizes the importance of incorporating symbolic reasoning into RS to improve interpretability. Similarly, Tan et al. [9] proposed CountER, a counterfactual explainable recommendation model that utilizes counterfactual reasoning from causal inference to generate minimal changes on item aspects, creating a counterfactual item where the recommendation decision is reversed. This approach aids in providing clear explanations by highlighting what would need to change for a different recommendation outcome.

Several argumentation-based RS have been proposed in the literature to date. Among these, Rago et al. [10] use quantitative tripolar argumentation frameworks, rather than ABA frameworks, generated automatically from data without any need for knowledge to be manually incorporated, and Rago et al. [11] draw recommendations for a variety of products from their textual reviews, but with quantitative bipolar argumentation rather than ABA, and without integrating user profiles. Furthermore, Briguez et al. [12] use a further form of argumentation (Defeasible Logic Programming) to formulate the conditions under which a movie should be recommended to a given user, but without any learning from examples/background knowledge. To the best of our knowledge, the proposed methodology is the first to use learnt ABA frameworks towards explainable recommendations.

Paper Structure In Section 2 we present background on ABA frameworks and ABALearn. In Section 3 we present our argumentative approach used for the development and implementation of the ARES recommender system. In Section 4 we describe the implementation of iterative learning with ABALearn via a chatbot interface. In Section 5 we present the results of the experimental evaluation, before concluding and discussing future work in Section 6.

2. Background

2.1. Assumption-Based Argumentation Frameworks

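An ABA framework [3, 4] is a tuple $\langle \mathcal{L}, \mathcal{R}, \mathcal{A}, \overline{\,\cdot\,} \rangle$, where:
∙ $\mathcal{L}$ is a set of sentences;
∙ $\mathcal{R}$ is a set of inference rules of the form $s_0 \leftarrow s_1, \ldots, s_n$, with $s_i \in \mathcal{L}$, for $i = 0, \ldots, n$;
∙ $\mathcal{A} \subseteq \mathcal{L}$ is a non-empty set of assumptions;
∙ $\overline{\,\cdot\,} : \mathcal{A} \to \mathcal{L}$ is a contrary function, mapping each assumption to its contrary in $\mathcal{L}$.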

In a rule $s_0 \leftarrow s_1, \ldots, s_n$, the sentence $s_0$ is the head of the rule and $s_1, \ldots, s_n$ is the body of the rule. If the body of a rule is empty we call it a fact. In this work, we focus on flat ABA frameworks, where assumptions in heads of rules are disallowed. In general, elements of $\mathcal{L}$ can be any sentences, but in this paper we restrict to ABA frameworks where $\mathcal{L}$ is a finite set of ground atoms. However, in the spirit of logic programming, we use schemata to write sentences, rules, assumptions and contraries, using variables that range over a given universe of constants.

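Example 1 (Dream Fragrance). To illustrate the fundamental concepts of ABA, let us consider a simple scenario, aimed at determining the liking of a perfume. We define an ABA framework $F = \langle \mathcal{L}, \mathcal{R}, \mathcal{A}, \overline{\,\cdot\,} \rangle$ as follows, where $X$ and $Y$ range over a suitable universe to describe perfumes:
∙ $\mathcal{L} = \{\mathit{like}(X),\ \mathit{ingredient\_of}(X, Y),\ \mathit{floral}(Y),\ \alpha(Y),\ c\_\alpha(Y)\}$;
∙ $\mathcal{R} = \{\mathit{like}(X) \leftarrow \mathit{ingredient\_of}(X, Y), \mathit{floral}(Y), \alpha(Y),\ \ c\_\alpha([\ldots]) \leftarrow,\ \ \mathit{ingredient\_of}(\mathit{dream\_fragrance}, [\ldots]) \leftarrow,\ \ \mathit{ingredient\_of}([\ldots], [\ldots]) \leftarrow,\ \ \mathit{floral}([\ldots]) \leftarrow,\ \ \mathit{floral}([\ldots]) \leftarrow\}$;
∙ $\mathcal{A} = \{\alpha(Y)\}$;
∙ $\overline{\alpha(Y)} = c\_\alpha(Y)$.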

Intuitively, a perfume $X$ is liked if it contains a floral ingredient $Y$, unless $Y$ is $[\ldots]$, and a particular fragrance ($\mathit{dream\_fragrance}$) contains $[\ldots]$ and $[\ldots]$ ingredients, both floral.

We often write facts as rules with equalities in the body, e.g., in the earlier example, we may write $\mathit{floral}([\ldots]) \leftarrow$ as $\mathit{floral}(X) \leftarrow X = [\ldots]$.

Given an ABA framework, an argument for a claim $s \in \mathcal{L}$ is a deduction of $s$ constructed from a finite set of assumptions $A \subseteq \mathcal{A}$ by applications of rules in $\mathcal{R}$ [4]. $A$ constitutes the support for the argument.

The acceptability of arguments depends on their ability to defend from possible “attacks”: an argument $a_1$ attacks an argument $a_2$ if the claim of $a_1$ is the contrary of an assumption in the support of $a_2$. In this paper, the notion of acceptability we focus on (for flat ABA frameworks) is given in terms of stable extensions [3, 4], which determine accepted (and rejected) arguments and their associated claims as follows. A set $\Delta$ of arguments is a stable extension if (i) no argument in $\Delta$ attacks any argument in $\Delta$ (i.e. $\Delta$ is conflict-free) and (ii) every argument not in $\Delta$ is attacked by an argument in $\Delta$ (i.e. $\Delta$ “attacks” all arguments it does not contain, thus pre-emptively “defending” itself against attacks). We say that an ABA framework is satisfiable if it admits at least one stable extension, and unsatisfiable otherwise. We also say that a sentence is (credulously) accepted in a stable extension $\Delta$ of an ABA framework if it is the claim of an argument in $\Delta$.
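As an illustration of the stable-extension semantics for flat frameworks, the following minimal sketch enumerates stable extensions by brute force. It works at the level of assumption sets rather than arguments: a set of assumptions is kept if its deductive closure contains the contrary of none of its members and the contrary of every other assumption. The ground atoms, the constants `p`, `x`, `y`, and the function names are hypothetical and chosen only for illustration; this is not how ABALearn computes extensions.

```python
from itertools import combinations

def closure(rules, assumptions):
    """Forward-chain ground rules (head, body) starting from a set of assumptions."""
    derived = set(assumptions)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in derived and all(atom in derived for atom in body):
                derived.add(head)
                changed = True
    return derived

def stable_extensions(rules, assumptions, contrary):
    """Enumerate assumption sets that are conflict-free and attack every other assumption."""
    found = []
    ordered = sorted(assumptions)
    for size in range(len(ordered) + 1):
        for subset in combinations(ordered, size):
            chosen = set(subset)
            claims = closure(rules, chosen)
            conflict_free = all(contrary[a] not in claims for a in chosen)
            attacks_rest = all(contrary[a] in claims for a in assumptions - chosen)
            if conflict_free and attacks_rest:
                found.append((chosen, claims))
    return found

# Hypothetical ground framework: perfume p contains floral ingredients x and y,
# and only the assumption on y has a derivable contrary.
rules = [
    ("like(p)", ["ingredient_of(p,x)", "floral(x)", "alpha(x)"]),
    ("like(p)", ["ingredient_of(p,y)", "floral(y)", "alpha(y)"]),
    ("ingredient_of(p,x)", []), ("ingredient_of(p,y)", []),
    ("floral(x)", []), ("floral(y)", []),
    ("c_alpha(y)", []),
]
assumptions = {"alpha(x)", "alpha(y)"}
contrary = {"alpha(x)": "c_alpha(x)", "alpha(y)": "c_alpha(y)"}

extensions = stable_extensions(rules, assumptions, contrary)
```

The enumeration is exponential in the number of assumptions, so it is only meant to make the definition concrete on toy inputs.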

Example 2 (Dream Fragrance, Cont.). Given $F$ in Example 1, there is an argument $a_1$ for claim $\mathit{like}(\mathit{dream\_fragrance})$ with support $\{\alpha([\ldots])\}$, and another argument $a_2$ for the same claim with support $\{\alpha([\ldots])\}$. $a_1$ is not attacked by any other argument of $F$. In contrast, $a_2$ is attacked by the argument $a_3$ for $c\_\alpha([\ldots])$ with support the empty set of assumptions. $a_1$ and $a_3$ belong to the unique stable extension of $F$ (and thus $\mathit{like}(\mathit{dream\_fragrance})$ is accepted), and $a_2$ does not.

An ABA argument can be represented and visualized as an argument tree in a “hierarchical” manner. An argument tree is a finite tree whose root node corresponds to the claim, the internal nodes represent the atoms derived by applying rules in the intermediate steps, and the leaves are the assumptions in the support of the argument. The structure of these trees is particularly interesting because it allows us to visualize and trace the reasoning paths that the inferential process has taken to deduce the claim. For an argument tree to be considered well-formed, it must be finite and acyclic [13].
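To make the notion of a reasoning path concrete, the sketch below represents an argument tree as a nested `(label, children)` pair and lists its root-to-leaf paths. The representation, the function name and the atoms are assumptions made for illustration; in this simplified encoding facts are shown as leaves alongside assumptions.

```python
def reasoning_paths(tree):
    """Return every root-to-leaf path of an argument tree given as (label, children)."""
    label, children = tree
    if not children:
        return [[label]]
    return [[label] + path for child in children for path in reasoning_paths(child)]

# Hypothetical argument tree for the claim like(p): two facts and one assumption.
tree = ("like(p)", [
    ("ingredient_of(p,x)", []),
    ("sweet(x)", []),
    ("alpha_1(p,x)", []),
])
paths = reasoning_paths(tree)
```

Each returned list starts at the claim and ends at a leaf, which is the shape a downstream explanation step would need to verbalise.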

2.2. Learning ABA Frameworks

Learning ABA frameworks aims at generating understandable rules for decision-making, thus helping to promote transparency and interpretability, which are crucial aspects for overcoming the black box nature of many traditional machine learning models. In ABALearn [6, 14, 15] the learning process takes as input: (1) a background knowledge, in the form of a satisfiable ABA framework $F = \langle \mathcal{L}, \mathcal{R}, \mathcal{A}, \overline{\,\cdot\,} \rangle$, (2) positive examples $\mathcal{E}^+ \subseteq \mathcal{L}$, and (3) negative examples $\mathcal{E}^- \subseteq \mathcal{L}$, and derives an ABA framework $F' = \langle \mathcal{L}', \mathcal{R}', \mathcal{A}', \overline{\,\cdot\,}' \rangle$, with $\mathcal{R} \subseteq \mathcal{R}'$, $\mathcal{A} \subseteq \mathcal{A}'$, $\overline{\,\cdot\,} \subseteq \overline{\,\cdot\,}'$, such that (i) $F'$ admits a stable extension $\Delta$, (ii) all positive examples are accepted in $\Delta$, and (iii) no negative example is accepted in $\Delta$.

ABALearn learns ABA frameworks automatically by making use of transformation rules, including: (1) rote learning, which, given a positive example $p(t)$, introduces a new rule $p(X) \leftarrow X = t$; (2) folding, which, given rules $H \leftarrow B_1, B_2$ and $K \leftarrow B_1$, derives the new rule $H \leftarrow K, B_2$; (3) assumption introduction, which, given rule $H \leftarrow B$, introduces an assumption $\alpha$, with contrary $c\_\alpha$, and derives the new rule $H \leftarrow B, \alpha$; and (4) fact subsumption, which deletes any fact of the form $p(t) \leftarrow$ (or $p(X) \leftarrow X = t$) if there is an accepted argument with claim $p(t)$ in the ABA framework $\langle \mathcal{L}, \mathcal{R} \setminus \{p(t) \leftarrow\}, \mathcal{A}, \overline{\,\cdot\,} \rangle$.

The ABALearn algorithm follows an iterative strategy based on four steps:
1. Generating initial rules. This step applies rote learning to learn facts from positive examples.
2. Generalising facts. This step selects a fact obtained by rote learning and applies fact subsumption. If the fact is not subsumed, it applies folding with the goal of generating a new, more general, rule that makes no explicit references to the constants occurring in the ABA framework.
3. Introducing new assumptions. This step applies assumption introduction to any rule obtained by step (2) if it supports an argument for a negative example.
4. Learning facts for contraries. This step applies rote learning to derive facts for the contraries of the new assumptions introduced by step (3).

The ABALearn strategy consists in following the above steps according to the pattern: (1); (2; 3; 4)*.

Example 3 (Learning ABA frameworks). We now show how the first two rules of $\mathcal{R}$ in Example 1 can be learnt from the facts in $\mathcal{R}$. We assume that the examples are: $\mathcal{E}^+ = \{\mathit{like}(\mathit{dream\_fragrance})\}$ and $\mathcal{E}^- = \{\mathit{like}([\ldots])\}$. By rote learning, we get:¹

1. $\mathit{like}(X) \leftarrow X = \mathit{dream\_fragrance}$

By repeatedly folding, we get:

2. $\mathit{like}(X) \leftarrow \mathit{ingredient\_of}(X, Y), \mathit{floral}(Y)$

Now, we have derived an ABA framework with a single stable extension, where the positive example $\mathit{like}(\mathit{dream\_fragrance})$ is accepted, but also the negative example $\mathit{like}([\ldots])$ is accepted. To avoid the acceptance of the negative example, by assumption introduction, we add the assumption $\alpha(Y)$ to the body of rule 2 and, by rote learning, we add the fact $c\_\alpha(Y) \leftarrow Y = [\ldots]$ for the contrary of $\alpha(Y)$, thereby getting the set $\mathcal{R}$ of rules shown in Example 1.²
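Two of the transformation rules, rote learning and assumption introduction, are purely syntactic and can be sketched on rules stored as a head plus a tuple of body atoms. The `Rule` type, the function names, the `alpha_<index>` naming scheme and the constants `p` and `y` are assumptions made for this illustration; folding and fact subsumption are omitted because they need unification and acceptance checks, and none of this mirrors the actual ABALearn code.

```python
from typing import NamedTuple

class Rule(NamedTuple):
    head: str
    body: tuple

def rote_learning(predicate, constant):
    """Turn an example predicate(constant) into the fact predicate(X) <- X=constant."""
    return Rule(f"{predicate}(X)", (f"X={constant}",))

def assumption_introduction(rule, index, variables):
    """Append a fresh assumption to the rule body; return the new rule, the assumption and its contrary."""
    assumption = f"alpha_{index}({variables})"
    contrary = f"c_alpha_{index}({variables})"
    return Rule(rule.head, rule.body + (assumption,)), assumption, contrary

# Step 1: a fact learnt from a hypothetical positive example like(p).
fact = rote_learning("like", "p")

# Step 3: make a (hypothetically folded) general rule defeasible.
general = Rule("like(X)", ("ingredient_of(X,Y)", "floral(Y)"))
defeasible, assumption, contrary = assumption_introduction(general, 1, "Y")

# Step 4: a fact for the contrary, excluding a hypothetical ingredient y.
exception = Rule(contrary, ("Y=y",))
```

Keeping the contrary as a separate fact is what lets later exceptions be added without touching the general rule.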

When considering the stable extension semantics, ABALearn is implemented in ASP-ABAlearn (available at https://github.com/ABALearn/aba_asp) using the SWI-Prolog [16] system and the Clingo [17] Answer Set Programming (ASP) solver. The central idea of ASP is to solve a given computational problem by specifying it as a set of rules, called an ASP program, whose models, called answer sets, represent solutions to the computational problem. In particular, by translating an ABA framework into an ASP program, ABALearn can take advantage of the mapping between stable extensions and answer sets, thereby reducing some reasoning tasks required by rote learning and fact subsumption to computing answer sets of the ASP encoding. Indeed, by inspecting the answer sets of the ASP encoding of an ABA framework, we can learn facts to (i) accept positive examples (at step 1), and (ii) attack assumptions to reject negative examples (at step 4), as well as remove facts (at step 2) whose corresponding claims already belong to the stable extension.
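One common way to realise such a mapping for a ground flat framework is to keep the rules unchanged and add, for each assumption, a rule that derives it unless its contrary holds. The sketch below only generates the program text under that assumption; whether ASP-ABAlearn uses exactly this encoding is not stated here, and the function name and atoms are hypothetical.

```python
def to_asp(rules, contrary):
    """Render ground ABA rules as ASP text and add 'a :- not c.' for each assumption a with contrary c."""
    lines = []
    for head, body in rules:
        lines.append(f"{head}." if not body else f"{head} :- {', '.join(body)}.")
    for assumption in sorted(contrary):
        lines.append(f"{assumption} :- not {contrary[assumption]}.")
    return "\n".join(lines)

# Hypothetical ground rules: one defeasible rule, one fact, one fact for a contrary.
rules = [
    ("like(p)", ["floral(y)", "alpha(y)"]),
    ("floral(y)", []),
    ("c_alpha(y)", []),
]
program = to_asp(rules, {"alpha(y)": "c_alpha(y)"})
print(program)
```

The resulting text could be passed to an ASP solver such as Clingo; in this toy program the contrary is a fact, so the assumption is defeated and `like(p)` is not derived.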

Although the above mentioned mapping would allow us to recast our method for learning (flat) ABA frameworks as a method for learning ASP programs, we believe that working with the ABA representation gives us several advantages. First of all, it reflects more naturally the argumentative approach to the learning process, by which the learnt rules can be considered to be defeasible, and hence they can be modified when conflicting conclusions or exceptions arise. This aspect is especially significant when ABA frameworks are learnt incrementally [18]. Furthermore, the adoption of this formalism allows the direct use of tools for learning ABA frameworks that have been recently developed [6].

3. ARES

The ARES architecture in Figure 1a is designed to support an iterative, learning-based process. The system has been developed for providing recommendations in the perfume domain, but its structure is very general and could be adapted to a wide range of different domains. It consists of several interconnected modules that manage the different stages of the recommendation process:
• User Profiling Module: Manages the collection of user information and its preparation. It captures explicit and implicit preferences, converting them into positive and negative examples (ℰ +, ℰ − ), and background knowledge (BK) rules in the ABALearn syntax.
• ABALearn Module: Represents the core of ARES. It receives as input the examples (ℰ +, ℰ − ) and the background knowledge BK. Its task is to learn and continuously update the ABA framework that constitutes the user preference profile.
• Recommender Module: Uses the learnt ABA framework to generate personalized recommendations. It also receives specific queries describing the desired features in the perfume sought.
• Explanation Module: Generates natural language explanations, using a Large Language Model, and argument trees for the recommendations produced. Input includes recommended item details and logic evidence derived from the answer set via the inferential process.

¹ We label rules with identifiers for ease of reference.
² No applications of fact subsumption have been necessary in this example.

User Profiling The initial phase of ARES is devoted to acquiring information from the user and constructing her/his personalized profile, represented as an ABA framework. Explicit preferences, such as liked or disliked perfumes indicated directly by the user, are translated into positive and negative examples that will be used to train the ABA framework. Implicit preferences result from the selection of evocative images related to olfactory scenarios or ingredient groups. These selections are used to generate rules that reflect assumptions about the user’s preferences. These rules, which model the user’s implicit preferences, constitute the dynamic component of the ARES background knowledge, which evolves with interaction. For example, if the user likes an image associated with a woody scenario, the system generates a rule stating that the user likes scents containing woody ingredients. To ensure the robustness and flexibility of the learnt rules, dummy items may be introduced among the positive examples: the presence of these items prevents the invalidation of the rules in later stages of learning, in the case where these rules are attacked and cover no real examples. At the same time, fragrance domain information, such as ingredients, olfactory scenarios, and designers, is extracted from a dataset and converted to ground facts in the ABA framework. These facts constitute the static component of the background knowledge, that is, the set of facts describing the domain.

Figure 1: (a) Recommender System; (b) Chatbot.

Example 4 (ABA representation of the user profile). We show below an ABA framework representing the profile of a user who likes sweet ingredients. This preference is supported by the first rule and a positive dummy example like(p1). The rule is defeasible, as it contains an assumption alpha_1(A,B) for whose contrary c_alpha_1(A,B) ABALearn can learn rules, if needed to capture exceptions. The user’s preferences are completed by a positive example like(meltine) and a negative example like(velvette). We use the Prolog-like syntax accepted by the ABALearn system with ‘:-’ to indicate ‘←’. The ABA framework representation also includes declarations of assumptions and their contraries.

% Rules from the user profile for dummy example
like(A) :- ingredient_of(A,B), sweet(B), alpha_1(A,B).
sweet(A) :- A=t1.
ingredient_of(A, B) :- A=p1, B=t1.
% Rules from the perfume dataset for non-dummy examples
oriental(A) :- A=vanilla.
ingredient_of(A, B) :- A=meltine, B=vanilla.
sweet(A) :- A=sugar.
ingredient_of(A, B) :- A=velvette, B=sugar.
% Assumptions and contraries
assumption(alpha_1(A,B)).
contrary(alpha_1(A,B),c_alpha_1(A,B)) :- assumption(alpha_1(A,B)).
% Positive examples: [like(p1),like(meltine)]
% Negative examples: [like(velvette)]

Examples and background knowledge are given as input to ABALearn, which produces an ABA framework that represents the user profile, and is capable of supporting arguments for or against liking certain perfumes. In particular, the learnt framework accepts all positive examples, does not accept any negative example, and is also able to decide the acceptance of unseen items.

Example 5 (Learnt ABA framework). From the ABA framework, positive and negative examples in Example 4, ABALearn generates a rule for the contrary of the assumption alpha_1(A,B). This rule captures an exception to the rule representing a preference for sweet scents, and excludes the ingredient sugar to avoid accepting the negative example like(velvette). A rule for like is also generated to cover the positive example like(meltine), stating the preference for oriental ingredients. The output is:

c_alpha_1(A,B) :- ingredient_of(A,B), B=sugar.
like(A) :- ingredient_of(A,B), oriental(B).

The learning process is inherently dynamic: when new information or feedback is collected from the user, it is used to update the examples or background knowledge, triggering a re-execution of ABALearn. However, as we will see in detail in Section 4, by rendering learnt rules defeasible, ABALearn is able to modify existing rules and add exceptions without having to rerun the training from scratch. This mechanism allows refining the ABA framework and adapting the user profile dynamically over time.

Recommender The recommendation generation phase uses the learnt ABA framework to identify and suggest (unseen) perfumes that the user may like. The process begins by capturing the user-specified recommendation goal, such as seasonality or desired scent intensity. A preliminary filtering step is then applied to reduce the pool of candidate perfumes. This filter uses the characteristics desired by the user and compares them with the attributes in the dataset to create an initial ranking. This ranking is based on how consistent each perfume is with the user’s specified preferences and objective scent attributes; it does not consider the ABA framework user profile. For each candidate perfume that has passed preliminary filtering, its attributes are transformed into ground ABA facts. The user’s ABA framework and the candidate perfume facts are translated into an ASP program, and Clingo determines which perfumes specified by the user’s goal belong to the answer set of that program, and hence are claims accepted in a stable extension of the ABA framework. An analysis of the answer set produced by Clingo makes it possible to identify which specific rules of the ABA framework were used to derive the recommended claims from the facts. The number of rules used contributes to a similarity score, indicating the degree of compatibility between the perfume and the user profile. This score is combined with the preliminary rank to obtain a final rank, computed for item $i$ as:

$$\mathrm{Ranking}(i) = 0.5 \cdot \mathrm{Arg}(i) + 0.5 \cdot \mathrm{Heur}(i) \qquad (1)$$

where $\mathrm{Arg}(i)$ represents the argumentative contribution, which is quantified by the number of supporting rules derived by Clingo, and $\mathrm{Heur}(i)$ incorporates a combination of heuristic parameters relating the relevance of $i$ to the request. These parameters include ratings, matching features, genre compatibility and differences in intensity.
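The equal-weight combination in Equation (1) can be sketched as follows. The text does not say how the two terms are scaled, so this sketch assumes both lie in [0, 1] and normalises the rule count by the largest count among the candidates; the function names, the item names and the scores are hypothetical.

```python
def normalise(value, maximum):
    """Scale a non-negative count into [0, 1]; an all-zero pool maps to 0."""
    return value / maximum if maximum else 0.0

def final_rank(arg_score, heur_score):
    """Equal-weight combination of the argumentative and heuristic terms, as in Equation (1)."""
    return 0.5 * arg_score + 0.5 * heur_score

def rank_candidates(used_rules, heur_scores):
    """Sort candidates by final rank, highest first."""
    top = max(used_rules.values(), default=0)
    scores = {
        item: final_rank(normalise(count, top), heur_scores[item])
        for item, count in used_rules.items()
    }
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)

# Hypothetical candidates: number of supporting rules and heuristic score in [0, 1].
ranking = rank_candidates({"item_a": 2, "item_b": 1}, {"item_a": 0.6, "item_b": 0.9})
```

With these made-up numbers the item backed by more rules ranks first even though its heuristic score is lower, which is the trade-off the equal weights encode.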

Although the similarity formula presented here has been calibrated specifically for perfumery, the modularity of the ARES architecture allows alternative, more generic similarity metrics to be integrated. This modular approach ensures that the system’s high-level structure remains unchanged, enabling ARES to adapt effectively to different scenarios and domains. While the current choice aims at achieving an optimal balance between generality and accuracy for the specific use case, it also paves the way for future extensions toward broader applicability.

Explanation The recommendations generated are not simply suggestions, but are supported by an explicit formal reasoning process, where each step is verifiable. This is made possible by the system’s ability to construct argument trees and reasoning paths, defined as the specific paths within the argument tree, which link facts in the background knowledge (such as user preferences and item features) to the final recommendation claim. This transparency makes it possible to determine which features or rules influenced the decision. To translate these reasoning paths into user-understandable explanations, the system makes use of the Gemini 2.0 Flash model. The LLM receives the extracted reasoning paths as input, along with other formal argumentation evidence, and, guided by a carefully formulated prompt, generates a natural language explanation. This prompt is designed to constrain the LLM to produce consistent descriptions that are faithful to the argumentative process, ensuring that the explanation is firmly linked to the actual data and inferences, thus preventing the generation of hallucinations.

This approach differs from some recent trends in Explainable AI, such as Chain of Thought (CoT) prompting [19]. Although CoT prompting aims to explicate the intermediate steps of a model’s reasoning, recent studies [20] have shown that LLMs can generate CoTs that are plausible but not always sound or faithful to the actual process of constructing an answer, thus creating an illusion of reasoning [21]. In our system, however, the faithfulness of the reasoning chain is guaranteed: the reasoning paths are not an arbitrary textual construction, but are derived directly from facts and rules of the ABA framework. Thus, the generated explanation is verifiable and corresponds faithfully to the actual inferential process, offering robust and reliable explainability.

Example 6 (Recommendation, reasoning paths and explanation). Reasoning paths are extracted from a single argument tree generated by the recommendation for the item amberlush. With reference to Examples 4 and 5, Figure 2 shows a path that supports the rule for sweet ingredients and a path for oriental ingredients. The prompt uses the argument tree and the recommended item details to generate a natural language explanation.

4. Iterative Learning

The ARES chatbot architecture in Figure 1b serves as an advanced conversational interface for user interaction. Its main purpose is to facilitate the collection of information and feedback from the user in natural language, overcoming the limitations of traditional structured interfaces. It allows the user to rate and ask questions about perfumes, and provide comments indicating liking or disliking. Information gathered through the chatbot is used to dynamically update the user’s profile and refine future recommendations. This architecture is built by emulating an agentic style [22], where the system adapts its behavior based on the user’s request.

• NLP Parsing: Receives user textual prompts as input and uses Natural Language Processing (NLP) techniques to parse the text, identify expressed communicative intent, and extract relevant entities (item name, class and sentiment). The implementation leverages the capabilities of an LLM to classify user requests [23].
• Operational Routines: Based on the output of the NLP Parsing Module, the message is routed to one of the specialized operational routines. Each routine is designed to perform specific operations in response to particular categories of user requests.
• Routine Integration Module: This module represents the point of convergence of the various operational routines. It is responsible for aggregating and standardizing the outputs generated by the individual specialized routines, ensuring a consistent, processable format for integration at later stages of the learning cycle. The new structured information is then passed on to the User Profiling and ABALearn modules.

The interaction through the chatbot enables the iterative learning of the ABA framework representing the user’s profile. Information and feedback acquired through the conversational interface is formalized in the format needed by the ABALearn engine, with feedback management routines playing a key role in this formalization. When the user expresses a positive sentiment toward a specific feature, the system generates a new rule within the user’s ABA framework. Similarly to the case of the user profiling module, dummy items are introduced to serve as explicit positive examples. The handling of negative sentiment, on the other hand, is more complex and is addressed through an integrated approach with ABALearn: information about disliked items is formalized through the creation of a dummy item with the negative feature, which is added to the set of negative examples provided in input to ABALearn. This mechanism allows the learning engine to identify existing rules that enforce liking of the dummy item and automatically generate the necessary contrary to prevent such derivation. Finally, feedback related to specific perfumes results in a direct update of the lists of positive and negative examples in the user profile, allowing the user to change her/his mind about previously expressed preferences.

Once all the collected and formalized information has updated the inputs, the learning process is re-executed. This iterative cycle, unlike static models, allows the ABA framework to evolve dynamically, incorporating new knowledge from more recent user interactions, with the goal of progressively improving the accuracy and relevance of future recommendations. This feature of ARES can be seen as a realisation of contestable learning [24, 18], which gives users the ability to interact with the system and question its decisions or recommendations. In other words, users can question the rules that lead to the acceptance of an undesired claim and, vice versa, allow the system to learn new rules that lead to the acceptance of a desired claim. The redress of the system after contestation is obtained by making the previously learnt rules defeasible, and then learning new rules, without re-learning from scratch.

Example 7 (Contestation: Ingredient with negative sentiment). After the amberlush recommendation received in Example 6, the user may contest the system by marking amber as an undesirable feature. To redress the ABA framework after this contestation, a new dummy item p2 is created, together with rules specifying that p2 has the amber ingredient and that amber is an oriental type of scent. Then, ABALearn identifies any rule that can be used for entailing like(p2). This is the previously learnt rule like(A) ← ingredient_of(A,B), oriental(B) (see Example 5). Now, by applying the assumption introduction transformation, ABALearn adds a new assumption alpha_2(A,B), thus rendering the rule defeasible and, by rote learning, also adds a rule for the contrary c_alpha_2(A,B) of the assumption, thus introducing an exception to the general rule:

% Background knowledge for dummy item p2
ingredient_of(A, B) :- A=p2, B=amber.
oriental(B) :- B=amber.
% Learnt rules
like(A) :- ingredient_of(A,B), oriental(B), alpha_2(A,B).
c_alpha_2(A, B) :- ingredient_of(A, B), B=amber.
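The input side of Example 7, turning a contested ingredient into a dummy item and a new negative example, could be sketched as below. The function name, its parameters and the `p<id>` naming of dummy items are assumptions that follow the pattern of the listing above; the sketch only prepares text for ABALearn and performs no learning.

```python
def contest_ingredient(ingredient, category, dummy_id, negatives):
    """Build background rules for a dummy item carrying the disliked ingredient
    and return them with the extended list of negative examples."""
    dummy = f"p{dummy_id}"
    background = [
        f"ingredient_of(A, B) :- A={dummy}, B={ingredient}.",
        f"{category}(B) :- B={ingredient}.",
    ]
    return background, negatives + [f"like({dummy})"]

# The user marks amber (an oriental ingredient) as undesirable.
background, negatives = contest_ingredient("amber", "oriental", 2, ["like(velvette)"])
```

The original list of negative examples is left unmodified, so an earlier profile state remains available if the user later changes her/his mind.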

5. Evaluation

The ARES evaluation process was conducted to quantify the effectiveness and performance of the proposed model by comparing it with established methodologies. The evaluation methodology employed standard quantitative metrics and a cross-validation protocol based on the Leave-One-Out Cross-Validation technique [25]. This technique involves temporarily removing a single item from each user’s profile for use as a test item, training the system on the reduced profile and evaluating its ability to correctly predict the omitted item. For comparison, several Collaborative Filtering algorithms were used, including KNN (User-based Nearest-Neighbor), SVD (Singular Value Decomposition), NMF (Non-negative Matrix Factorization) and CoClustering.

The evaluation metrics adopted include Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), which measure the accuracy of numerical predictions, and Precision@n, Recall@n, and FMeasure@n, which assess the quality of recommendations in terms of relevance and completeness.
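For reference, these metrics follow their standard definitions, sketched below. The function names and the toy inputs are illustrative only and are not taken from the evaluation code or data of the paper.

```python
import math

def mae(predicted, actual):
    """Mean Absolute Error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Square Error between predicted and actual ratings."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def precision_recall_f_at_n(ranked, relevant, n):
    """Precision@n, Recall@n and F-measure@n for a ranked list and a set of relevant items."""
    hits = sum(1 for item in ranked[:n] if item in relevant)
    precision = hits / n if n else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

# Toy data: one of the top-2 recommendations is among three relevant items.
p_at_2, r_at_2, f_at_2 = precision_recall_f_at_n(["a", "b", "c", "d"], {"a", "c", "e"}, 2)
```

MAE and RMSE score rating predictions, while the @n metrics score only the top of the ranked list, which is why a system can do well on one family and worse on the other.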

The experimental results in Table 1 show that the ARES approach performs in line with state-of-the-art algorithms in terms of MAE and RMSE, indicating good predictive accuracy. However, a lower recall was observed for ARES than for the collaborative algorithms, suggesting that in highly subjective domains such as perfumery, collective preferences may offer added value over purely content-based analysis. Overall, our experiments show that our approach does not sacrifice too much performance with respect to state-of-the-art systems, while providing transparency and explainability. A comparison with other explainable RS, e.g., CAFE [8] and CountER [9], is left for future work.

Besides explainability, a distinctive feature of ARES is its support for contestability (see Section 4). The effects of contestability are evaluated through an iterative simulation that compared ARES with the KNN algorithm. This simulation allowed us to observe the trend of the mean absolute error over several iterations for a single user, with five simulations. The graph in Figure 3 associated with this test revealed that ARES converges to a lower absolute error, demonstrating remarkable adaptive ability and consistency in the representation of preferences. In contrast, the KNN model showed less stable behavior, with the absolute error fluctuating across iterations without clear convergence. This result emphasizes how our approach is particularly suitable in scenarios where user preferences evolve over time, thus confirming its validity from a dynamic perspective and its ability to adapt effectively.

6. Conclusions

We explored the application of ABALearn to the development of explainable recommender systems, addressing the problem of opacity inherent in traditional models. We introduced ARES (ARgumentation-based Explainable recommender System), a system that generates traceable and understandable recommendations based on an explicit reasoning process. Our architecture integrates a formal representation via ABA frameworks, which model the user’s preference profile through rules, assumptions and contraries, and an LLM to translate argumentative explanations into natural language. Linking explanations to formal derivations in ABA ensures their faithfulness to the inferential process and prevents the generation of hallucinations.

A distinguishing feature of ARES is its iterative and adaptive learning mechanism, enabled by a conversational chatbot interface. This dynamic interaction enables the system to continuously update the user profile based on explicit and implicit feedback, allowing the ABA framework to evolve and refine recommendations over time.

Future Work. We envisage several directions for the future development of ARES. A first direction concerns the implementation of a hybrid model that combines the ARES content-based approach with collaborative filtering. Instead of considering only the interaction history, we could exploit the similarity between users, based on the similarity between the ABA frameworks generated for each of them, e.g., by comparing extensions (or answer sets of the corresponding ASP representations). This would enable us to take advantage of the personalisation inherent in content-based user profiles and the ability of collaborative filtering to detect common patterns among users with similar logical preference structures.
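One possible way to compare users through their ABA frameworks is sketched below. The text proposes comparing extensions but does not fix a similarity measure, so this sketch makes several assumptions: each user's extension is represented as a plain set of accepted claims (encoded here as strings), and Jaccard similarity is used as the measure. The names `jaccard`, `most_similar_users` and the example claims are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets of accepted claims (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def most_similar_users(
    target: str, extensions: Dict[str, Set[str]], k: int = 1
) -> List[Tuple[str, float]]:
    """Return the k users whose extensions are most similar to the target's."""
    scores = [
        (other, jaccard(extensions[target], ext))
        for other, ext in extensions.items()
        if other != target
    ]
    # Highest similarity first.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Hypothetical extensions: one set of accepted claims per user.
extensions = {
    "u1": {"likes(floral)", "likes(citrus)"},
    "u2": {"likes(floral)", "likes(woody)"},
    "u3": {"likes(citrus)", "likes(floral)", "likes(musk)"},
}
neighbours = most_similar_users("u1", extensions, k=1)
print(neighbours)
```

The neighbours found this way could then feed a standard neighbourhood-based collaborative step. Other measures, for instance ones that compare the learned rules rather than the accepted claims, would be equally plausible choices.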

A second line of development focuses on presenting logical reasoning as a Chain of Thought in natural language. Currently, explanations are generated by the LLM from the extracted reasoning paths. We could further explore the automatic textual generation of these reasoning paths, making them more understandable and narrative for the user. The ultimate goal is to provide a chain of reasoning that is not only plausible, but whose soundness and faithfulness to the inference process are ensured by direct derivation from the rules of the ABA framework, unlike approaches that generate an illusion of reasoning.

Finally, ARES currently focuses on the perfumery domain, but its design principles and architecture are general. We plan to implement other instances of the system in different domains. It is reasonable to expect that in domains where the features of the items to be recommended are more objective (e.g., technological or financial products), the benefits of ARES may be even more pronounced.

Acknowledgments

We acknowledge support from the Royal Society, UK (IEC\R2\222045). Toni was funded by the ERC (grant agreement No. 101020934) and by J.P. Morgan and the RAEng, UK, under the Research Chairs Fellowships scheme (RCSRF2021\11\45). De Angelis and Proietti were supported by the MUR PRIN 2022 Project DOMAIN funded by the EU-NextGenerationEU (2022TSYYKJ, CUP B53D23013220006, PNRR, M4.C2.1.1), the PNRR MUR project PE0000013-FAIR (CUP B53C22003630006), and the INdAM - GNCS Project Argomentazione Computazionale per apprendimento automatico e modellazione di sistemi intelligenti (CUP E53C24001950001). De Angelis and Proietti are members of the INdAM-GNCS research group.

Declaration on Generative AI

The authors have not employed any Generative AI tools.

[1] Said, On explaining recommendations with large language models: a review, Frontiers in Big Data 7 (2025). doi: 10.3389/fdata.2024.1505284.

[2] Zhang, Chen, Explainable recommendation: A survey and new perspectives, Foundations and Trends® in Information Retrieval 14 (2020) 1-101. doi: 10.1561/1500000066.

[3] Bondarenko, Dung, Kowalski, Toni, An abstract, argumentation-theoretic approach to default reasoning, Artificial Intelligence 93 (1997) 63-101.