
model for fair and explainable recommendation in the loan domain

Giandomenico Cornacchia

giandomenico.cornacchia@poliba.it

Fedelucio Narducci

fedelucio.narducci@poliba.it

Azzurra Ragone

azzurra.ragone@it.ey.com

Politecnico di Bari - Via E. Orabona 4, Bari (I-70125), Italy

3rd Edition of Knowledge-aware and Conversational Recommender [...] Environments (ComplexRec) Joint Workshop @ RecSys 2021

2021. Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

Recommender systems have been widely used in the Financial Services domain and can play a crucial role in personal loan comparison platforms. However, the use of AI in this domain has brought to light many opportunities as well as new ethical and legal risks. Customers can trust the suggestions of these systems only if the recommendation process is Interpretable, Understandable, and Fair for the end-user. Since products offered within the banking sector are usually of an intangible nature, customer trust perception is crucial to maintain a long-standing relationship and ensure customer loyalty. To this end, in this paper, we propose a model for generating natural language and counterfactual explanations for a loan recommender system with the aim of providing fairer and more transparent suggestions.

Keywords: Trustworthy AI, Financial Services, Loan recommender systems, Fairness, Explainability, Human-centered computing, Conversational systems

1. Introduction

As stated by the World Economic Forum's Global Future Council on Artificial Intelligence for Humanity: "Artificial Intelligence (AI) is the engine of the Fourth Industrial Revolution. It holds the promise of solving some of [...] reeling from lockdowns but requires thoughtful design, development, and deployment to mitigate potential risks"1. AI systems are becoming more and more pervasive and, most of the time, users interact with such systems without even knowing that life-changing decisions like mortgage grants, job offers, and patient screenings are in the hands of AI-based systems [1]. Moreover, such AI decisions may sometimes turn out to be arbitrary, inconsistent, or discriminatory, which cannot be allowed in highly regulated environments such as Financial Services. As these applications have become key enablers and more deeply embedded in processes, financial services organizations need to cope with the inherent risks of AI applications. This is true both from a compliance point of view (regulatory and ethical norms), and because the lack of trust is the most significant barrier to AI adoption and acceptance by users. In fact, AI systems often amplify social and ethical issues such as gender and demographic discrimination [2, 3], and they lack interpretability and explainability.

1 https://www.weforum.org/communities/gfc-on-artificial-intelligence-for-humanity

As sales activities of financial products require expert knowledge, recommender systems can offer significant benefits to financial services, supporting the client in choosing the best option among the many financial products offered by different banks. However, compared to conventional recommender systems, their application in financial domains is a challenging task: there is the need to adhere to the regulation, follow specific fairness criteria, and provide, at the same time, an explanation of the decisions (black-box approaches are not allowed).

In this paper, we focus on the case of loan recommendation. In this domain, the recommendation problem is modeled as finding the right product of the lender company for the borrower, which, at the same time, satisfies their financial needs and will be likely to be paid back by the borrower.

In the last years, several online platforms for personal loan comparison2 have emerged to help individual borrowers analyze different loans proposed by third-party lenders and suggest the best option. These platforms simplify the process of shopping for a personal loan, showing the users all the loans they are pre-approved for, so they can compare offers and make a conscious choice. In order to recommend the best loan for the user, on one side, these platforms usually ask several questions to profile the client, like personal information (e.g., address, date of birth, Tax ID number), basic financial information (e.g., rent/mortgage payment, other major bills), requested loan amount and ideal term length. On the other side, to fill out the list of the best loans, the platforms have to evaluate several lenders, looking at key factors like interest rates, fees, loan amounts, and term lengths offered, customer service, and how fast you can get your funds.

2 To cite a few: https://www.creditkarma.com/, https://borrowell.com/, www.nerdwallet.com, www.meilleurtaux.com/, https://www.habito.com/, https://www.bankbazaar.com/

In this paper, we propose an approach to model a personal loan recommender system that complies with the present European regulation (Section 2), guarantees fairness criteria (Section 3), provides a meaningful explanation of the decision of the algorithm (Section 4), and is able to provide a user-based explanation. In particular, Section 4 focuses the attention on defining a general model for generating natural language explanations in the aforementioned context of loan recommendations. In our opinion, this explanation model can be easily integrated in a conversational recommender system able to interact with the user by exchanging natural language messages. Furthermore, we enhance the power of explanations by also providing a counterfactual analysis and explanation (Section 5). In this way, we can provide more insightful explanations to make the interaction with the client more efficient, compliant with regulations, and, at the same time, reinforce customer trust in the system.

2. Regulation compliance

AI-based systems are increasingly attracting the attention of regulatory agencies and society at large, as they can cause harm, although unintentionally. Indeed, as reported by the Ethics guidelines for trustworthy AI from the European Commission's High-Level Expert Group on AI: "The development, deployment, and use of any AI solution should adhere to some fundamental ethical principles such as respect for human autonomy, prevention of harm, fairness, and explainability" [4]. Moreover, in the EU the GDPR sets out the right to explanation: users have the right to ask for an explanation about an algorithmic decision made about them. In the UK, the Financial Conduct Authority (FCA) requires firms to explain why a more expensive mortgage has been chosen if a cheaper option is available. The G20 has adopted the OECD AI Principles3 for a trustworthy AI, where it is underlined that users should not only understand AI outcomes but also be able to challenge them.

On 21 April 2021, the European Commission presented the "Proposal for a Regulation laying down harmonized rules on artificial intelligence"4, a proposed law that could enter into force in the second half of 2022 in a transitional period. This proposal remarks on the importance of monitoring the deployed AI systems based on a scale of risk. The risk-based approach splits AI systems into four different categories (unacceptable risk, high risk, limited risk, and minimal risk) depending on the risk of the use case. AI systems intended to be used to evaluate the creditworthiness of natural persons or establish their credit score are placed in the high-risk category.

3 https://oecd.ai/ai-principles

4 https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A52021PC0206

Furthermore, any application of artificial intelligence must be designed with responsibility and compliance to standards required by law. In the financial sector, this is not an easy task to solve. On one side, it is required to show how an outcome has been reached and whether it was fair and unbiased. On the other, not all the rationales behind a decision can be disclosed, to prevent users from gaming the system.

Generally speaking, every time a risk review of an AI system is performed, it is required to show how an outcome has been reached and whether it was fair and unbiased. This is not a one-time effort and should involve the contribution of different stakeholders: data scientists, business people, audit and compliance functions, ethicists, to name a few.

In the following, we will show how to cope with these requirements.

3. Fairness

The regulations of financial services do not start with the recent laws on artificial intelligence. Rather, the latter are a derivation of the steps taken by governments on financial and social regulations between the 1960s and 1980s. Indeed, governments have addressed discrimination against unprivileged groups as regulatory compliance requirements since the 1960s [5], [6], [7]. In the USA, the Fair Housing Act (FHA) and the Equal Credit Opportunity Act (ECOA), which protect consumers by prohibiting unfair and discriminatory practices, have focused on ensuring a quality of service that is independent of sensitive characteristics such as gender, race, age, disability, etc., avoiding discrimination against minorities.

These principles can be condensed into the definition of fairness, where fairness, according to Mehrabi et al. [8], can be seen as "the absence of any prejudice or favoritism toward an individual or a group based on their inherent or acquired characteristics". Contextualising it in the use of an AI system in financial services, the system should allocate opportunities, resources, or information fairly, thus avoiding social or historical biases. However, this definition of fairness is independent of the technical concepts that arise when using any classifier, and that is why the definitions of fairness are different and various.

Since those norms were not set to prevent discrimination in non-human decision making (as in the case of ML algorithms), the "Ethics guidelines for a Trustworthy AI" [4] and "The White Paper" [9] were released to give guidelines for the ethical and safe use of AI. Some key requirements are "equity, diversity and not-discrimination", enclosed in the concept of fairness. More recently, with the "Proposal for a Regulation laying down harmonized rules on artificial intelligence", credit scoring applications, including loan recommender systems, are classified in the high-risk domain. Before deploying any AI system, the Financial Institution has to pass different conformity steps, and one of these concerns fairness.

In our analysis, we refer to personal loan recommender systems that suggest to each customer a personalized list of potential loan products based on their profile. We use this case study because, for personal loans, the concept of equal opportunity is crucial, and it very often lies in the hands of ML algorithms, with a high risk that they discriminate without either the financial institution or the client being aware of it.

As these automated decision-making systems are increasingly used, they must guarantee these principles of fairness. In the case under consideration, the recommender system that suggests different offers based on the characteristics of the credit requested and the user's profile must ensure that each offer has been processed through fair algorithms on the provider side.

Going deeper with this analysis, the concept of fairness in provider-side algorithms of a personal loan recommendation could be linked to one or more of these three statistical criteria [10]: (i) Independence [11], (ii) Separation [12], and (iii) Sufficiency [3]. The (i) Independence criterion guarantees that the fraction of customers classified as good-risk is the same in each sensitive group. Therefore, if gender is considered as sensitive, both men and women should have the same percentage of good-risk classifications. The (ii) Separation criterion is related to the concept of misclassification. Accordingly, the classification errors should be the same in both sensitive and non-sensitive groups. Finally, the (iii) Sufficiency criterion states that the probability that an individual classified as good-risk actually belongs to the good-risk class should be the same for both sensitive groups. If the algorithm shows a gender bias, for example, a woman who belongs to the good-risk class could be classified in the bad-risk class.
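To make the three criteria more tangible, the following minimal sketch computes, for each sensitive group, the rates they compare. It is an illustration rather than the evaluation procedure of this paper: the function name `group_rates`, the toy labels, and the choice of measuring sufficiency through the positive predictive value are assumptions made here.

```python
from collections import defaultdict

def group_rates(y_true, y_pred, group):
    """Per-group rates behind the three criteria (1 = good risk, 0 = bad risk)."""
    counts = defaultdict(lambda: {"n": 0, "pred_pos": 0, "pos": 0,
                                  "neg": 0, "tp": 0, "fp": 0})
    for y, p, g in zip(y_true, y_pred, group):
        c = counts[g]
        c["n"] += 1
        c["pred_pos"] += p          # classified as good risk
        c["pos"] += y               # truly good risk
        c["neg"] += 1 - y           # truly bad risk
        c["tp"] += y * p            # good risk, classified good
        c["fp"] += (1 - y) * p      # bad risk, classified good
    rates = {}
    for g, c in counts.items():
        rates[g] = {
            # Independence: share of the group classified as good risk
            "acceptance_rate": c["pred_pos"] / c["n"],
            # Separation: error behaviour conditional on the true class
            "tpr": c["tp"] / c["pos"] if c["pos"] else None,
            "fpr": c["fp"] / c["neg"] if c["neg"] else None,
            # Sufficiency (assumed here as predictive parity):
            # share of truly good risks among those classified good
            "ppv": c["tp"] / c["pred_pos"] if c["pred_pos"] else None,
        }
    return rates

# Hypothetical toy data: four women followed by four men.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
gender = ["F"] * 4 + ["M"] * 4

rates = group_rates(y_true, y_pred, gender)
for g, r in rates.items():
    print(g, r)
```

A criterion is satisfied when the corresponding rate is (approximately) equal across groups; in the toy data above none of them is, since the hypothetical classifier accepts men far more often than women.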

Once the concept of fairness has been defined and the dimensions it is based on have been described, the next question is: how can the customer be sure that the characteristics of the recommended loans have been generated by fair provider-side algorithms? In the next section we introduce another important requirement of the loan recommendation platform: the explanation. The platform and the loan provider should be able to explain the outcome to the customer, guaranteeing that the outcome is achieved under fairness constraints. Nowadays, this step is often left out: AI systems already suggest loans to customers without giving the rationale behind the decision. However, following a black-box approach could lead to severe reputational damage for financial institutions, as in the case of Apple and Goldman Sachs [13].

4. Explainability

For many years, research on ML and, more generally, AI algorithms has been focused on improving accuracy metrics such as precision, recall, etc. Recently, new laws and regulations [14] have introduced the need for those algorithms to show explanation capabilities, in particular in a sensitive domain such as the financial one [15].

ML algorithms belong to two main classes: interpretable and uninterpretable. More specifically, the former implement a white-box model design, the latter a black-box one. From this perspective, Sharma et al. [16] distinguish between model-agnostic and model-specific explanations.

Model-agnostic methods provide an explanation that is not dependent on the ML model adopted and are generally used for black-box models. A surrogate model is thus implemented with the aim of simulating the behavior of the original algorithm.
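As a concrete illustration of a model-agnostic attribution, the sketch below computes Shapley values, the cooperative-game quantity that SHAP [17] approximates, first exactly (by enumerating every feature ordering) and then by sampling random orderings. It is not the SHAP library's algorithm: the permutation-sampling estimator, the function names, and the toy `denial_score` game are assumptions chosen only to show why sampling is needed when the number of features grows.

```python
import itertools
import random

def _add_marginals(value, order, phi):
    """Accumulate each feature's marginal contribution along one ordering."""
    coalition = set()
    for feature in order:
        before = value(frozenset(coalition))
        coalition.add(feature)
        phi[feature] += value(frozenset(coalition)) - before

def exact_shapley(value, features):
    """Exact Shapley values: average over all n! orderings (impractical for large n)."""
    phi = {f: 0.0 for f in features}
    orderings = list(itertools.permutations(features))
    for order in orderings:
        _add_marginals(value, order, phi)
    return {f: total / len(orderings) for f, total in phi.items()}

def sampled_shapley(value, features, n_samples=2000, seed=0):
    """Monte Carlo estimate: average over a random subset of orderings."""
    rng = random.Random(seed)
    phi = {f: 0.0 for f in features}
    for _ in range(n_samples):
        order = list(features)
        rng.shuffle(order)
        _add_marginals(value, order, phi)
    return {f: total / n_samples for f, total in phi.items()}

def denial_score(coalition):
    """Hypothetical game: how strongly the known features push towards denial."""
    score = 0.0
    if "credit_amount" in coalition:
        score += 0.5
    if "duration" in coalition:
        score += 0.2
    if "income" in coalition:
        score += 0.1
    if {"credit_amount", "income"} <= coalition:
        score += 0.2  # interaction: the amount is judged relative to income
    return score

features = ["credit_amount", "duration", "income"]
exact = exact_shapley(denial_score, features)
approx = sampled_shapley(denial_score, features)
print(sorted(exact.items(), key=lambda kv: kv[1], reverse=True))
print(approx)
```

In this toy game the interaction term is split evenly between the two features involved, and the attributions add up to the score of the full feature set, which is the property that makes the resulting ranking usable as an explanation.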

Several methods have been proposed to explain black-box models. In this paper we focus on SHAP [17]. SHAP is inspired by cooperative game theory and is based on Shapley values. Each feature is considered a player that contributes differently to the outcome (i.e., the algorithm decision). Following the original theory, we would have to compute all the possible combinations with the other sets of features. This choice is impractical and, above all, computationally inefficient. Therefore, SHAP does not compute all the possible combinations between all the features but evaluates only a random set of combinations for efficiency reasons. SHAP provides a list of the features ranked from the one that contributed the most to the outcome to the one that contributed the least. However, the explanation provided by this method is probably not clear for a customer who does not have experience with how an algorithm works. For this reason, if we want to improve the user's trust and, in general, the user experience with the system, we need to make the explanation more understandable. In that direction, we believe that an effective solution could be to transform the output produced by software like SHAP into a natural language sentence. Figure 1 represents our proposed workflow for generating an explanation and a counterfactual explanation in order to also recommend corrective actions to the user. For the sake of simplicity, here we show the pipeline focusing on a single decision taken by the ML algorithm of a given lender. Naturally, the loan recommender will receive this information from all the lender services invoked.

Let us suppose that the user asks for a personal loan through the following message: "I would like to borrow 16,000€ to buy a car, and I would like to pay back over 24 months". Then the platform will ask the user to provide personal information such as age, income, etc., to be sent to the lender services. Once the different proposals have been received from the lender platforms, a list is ranked according to one or more criteria (e.g., rate, decision, etc.) and proposed to the user. Let us assume that each algorithm respects fairness criteria, with regulatory bodies' labels as proof of compliance with those criteria. Each proposal (i.e., accepted or denied) is provided with a feature-based SHAP explanation that shows how the ML algorithm has produced that result. Next, those SHAP values are transformed into a natural language explanation such as: "The credit amount is too high based on the salary and the duration is too long."

A further interesting contribution in this direction is provided by a counterfactual analysis obtained by a feature perturbation step (see Section 5.1). This explanation shows how to modify the loan request for getting the loan accepted [18]. For example, the system can add: Reduce the credit amount to 10,000€, shorten the duration to 18 months, ..., and the loan request will probably be accepted.

But how can we generate this kind of natural language explanation? In the next section, we propose a template-based formal model able to transform the SHAP values into a natural language sentence.

5. A model for generating NL explanation

The model we designed for generating Natural Language explanations is inspired by Musto et al. [19]. The SHAP explanation consists of a set of couples <feature, score> (e.g., <income, 0.8>).

Let us consider the example in Figure 1: The credit amount is too high based on the salary and the duration is too long. In that case the template for the explanation is: <feature> <verb> <adverb> <adjective> <motivation> followed by a new set of <feature> <verb> <adverb> <adjective> without motivation. The problem is to properly fill each slot and compose the whole explanation.

In the above-mentioned example, the number of features taken into account for generating the explanation is three: the credit amount, the salary, and the duration, each of which is associated with adverbs and/or adjectives (e.g., too high, too long, etc.). The number of features used for generating the explanation can be set as desired. However, since the explanation has to be as useful as possible, too many features can, in some cases, reduce its effectiveness and efficiency.

In our model, the generation of the natural language explanation exploits a set of rewriting rules using the Backus-Naur Form (BNF), as described in the following. Even though these templates and rules can also be exploited in other domains, the terminal symbols (e.g., the credit amount, the duration, long, short, etc.) are specific for a loan application.

<explanation> ::= <sentence> | <explanation> <conjunction> <sentence>
<sentence> ::= <feature> <verb> <adverb> <adjective>
<sentence> ::= <sentence> <motivation>
<motivation> ::= <motivation> <conjunction> <motivation>
<motivation> ::= <adverbial phrase> <feature>
<adverbial phrase> ::= 'based on' | (etc.)
<adverb> ::= 'too' | 'so' | 'few' | 'almost' | 'enough' | (etc.)

These rewriting rules can be applied for generating, for example, the explanation The credit amount is too high based on the salary and the duration is too long.

A further problem is the choice of adverbs and adjectives. For the adverbs, we defined a matching between value intervals and the intensity of the adverb. As an example, if the SHAP value of a feature is 0.8 (the highest interval)5, the corresponding <adverb> will be 'too', emphasizing how this feature has a strong impact on the loan application decision. Obviously, the association between the <feature> and the type of <adjective> is not arbitrary, but depends on the type of <feature> considered. Therefore, for each feature we defined a vocabulary of compatible adjectives.

5 Please remember that the SHAP values are between 0 and 1.

5.1. Counterfactual explanation

In the previous subsection, we have described how a loan recommendation platform can generate the explanation for each decision given by a provider.

To make our explanation more effective, we propose to the user some indications useful for revising her request and getting the loan application accepted. This is obtained through a counterfactual explanation.

The counterfactual explanation consists of a set of corrective actions to the characteristics of the requested loan, based on the results of a counterfactual analysis. Providing a counterfactual explanation is an opportunity for the loan provider that results in an additional service to enhance customer satisfaction and make the customer aware of his or her chances of getting a loan. This service will result in a Responsible and Trustworthy use of AI systems towards customers.

The counterfactual analysis performs a perturbation on the feature space of the customer's loan application. The perturbation will generate a new sample that will be considered as a new application. Subsequently, the counterfactual analysis will detect the new sample nearest to the original one that will be accepted by the ML algorithm. The result of this analysis will consist in detecting the change in the characteristics of the customer's loan and recommending corrective actions.

The approach we adopted for generating the counterfactual explanation is the same described in the previous section, namely a set of BNF rewriting rules.

Following the previous example, a counterfactual explanation can be: "Reduce the credit amount to 10,000€, shorten the duration to 18 months.". The BNF template is:

<counterfactualexplanation> ::= <sentence> | <counterfactualexplanation> <conjunction> <sentence>
<sentence> ::= <action> <feature> <value>
<action> ::= 'reduce' | 'expand' | 'shorten' | (etc.)
<feature> ::= 'the credit amount' | 'the duration' | (etc.)
<value> ::= '10,000€' | '18 months' | (etc.)
<conjunction> ::= 'and' | 'but' | ',' | (etc.)

The counterfactual explanation has a small set of rules: it includes a feature, the corrective action, and optionally the desirable new feature value. Since the counterfactual analysis works by perturbing all the features of a given instance, the recommended actions should impact the minimum set of features that allows the algorithm decision to change.

The action is chosen according to the relation between the old and the new feature value. For example, if the old value for the feature duration was 24 and the new value after the perturbation is 18, the verb (action) chosen will be reduce. Regarding the values, if the new value is equal to the original one, the respective feature will not be included in the explanation since there is no corrective action to be taken; otherwise the new perturbed value will be shown in the explanation.

6. Conclusion and future research directions

This work proposes a model to generate natural language explanations for ML decisions in the context of loan recommendation platforms. In the first part of the paper, we analyzed which fairness metrics can be used for evaluating the ML model. Next, for improving the system transparency, financial platforms must understand the causality of the learned representations and explain their decisions through visualization tools or natural language. Shapley values could help understand which features influence the outcome; however, they are not very human-friendly. For this reason, a model for generating NL explanations from Shapley values has been proposed. Another contribution is the definition of a counterfactual explanation based on the result of a counterfactual analysis. This results in a set of corrective actions to be performed by the user.

The defined model finds a straightforward application in a conversational recommender system scenario. The user expresses her request in natural language, and the platform compares the different offers and provides an explanation for each of them. The user can thus ask for help on how to modify her request for getting the loan. Eventually, the platform, thanks to the counterfactual analysis and explanation, can provide a set of actions for getting the application accepted. However, the conversational system should prevent users from discovering the complete set of decision criteria, thus avoiding adverse actions from unfair users.
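The explanation flow described above, from feature scores to a natural language sentence and then to corrective actions, can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the proposed model: the adjective and action vocabularies, the score thresholds, the action 'extend', the grid of perturbed candidates, and the lender rule `accepts` are all hypothetical, and the motivation feature is passed explicitly for simplicity.

```python
import itertools

# Hypothetical vocabularies; only a few terminal symbols are listed in the text.
ADJECTIVE = {"the credit amount": "high", "the duration": "long"}
ACTION = {"the credit amount": ("reduce", "expand"),
          "the duration": ("shorten", "extend")}  # (decrease, increase)
# Assumed score intervals (lower bound, adverb); scores are taken in [0, 1].
ADVERB_INTERVALS = [(0.7, "too"), (0.0, "so")]

def adverb_for(score):
    """Map a feature score to the adverb of the first interval it falls in."""
    for lower, adverb in ADVERB_INTERVALS:
        if score >= lower:
            return adverb
    return ADVERB_INTERVALS[-1][1]

def explain(scores, motivation=None, top_k=2):
    """<feature> <verb> <adverb> <adjective> [<motivation>], joined by 'and'."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    sentences = []
    for i, (feature, score) in enumerate(ranked):
        sentence = f"{feature} is {adverb_for(score)} {ADJECTIVE[feature]}"
        if i == 0 and motivation:
            sentence += f" based on {motivation}"  # only the first sentence is motivated
        sentences.append(sentence)
    text = " and ".join(sentences)
    return text[0].upper() + text[1:] + "."

def nearest_accepted(request, candidates, accepts):
    """Accepted perturbed sample with the smallest relative change, or None."""
    def distance(c):
        return sum(abs(c[f] - request[f]) / request[f] for f in request)
    accepted = [c for c in candidates if accepts(c)]
    return min(accepted, key=distance) if accepted else None

def counterfactual_explanation(old, new, value_format):
    """<action> <feature> <value> for every feature whose value changed."""
    clauses = []
    for feature, old_value in old.items():
        if new[feature] == old_value:
            continue  # unchanged feature: no corrective action to report
        decrease, increase = ACTION[feature]
        action = decrease if new[feature] < old_value else increase
        clauses.append(f"{action} {feature} to {value_format[feature].format(new[feature])}")
    text = ", ".join(clauses)
    return text[0].upper() + text[1:] + "." if text else ""

request = {"the credit amount": 16000, "the duration": 24}
scores = {"the credit amount": 0.8, "the duration": 0.75}
explanation = explain(scores, motivation="the salary")

# Hypothetical perturbation grid and lender decision rule.
candidates = [{"the credit amount": a, "the duration": d}
              for a, d in itertools.product([16000, 13000, 10000, 7000], [24, 18, 12])]
accepts = lambda c: c["the credit amount"] <= 10000 and c["the duration"] <= 18
best = nearest_accepted(request, candidates, accepts)
actions = counterfactual_explanation(
    request, best, {"the credit amount": "{:,}€", "the duration": "{} months"})
print(explanation)
print(actions)
```

With these toy inputs the two printed sentences reproduce the running example of the paper. In a real deployment the scores would come from the lender's explainer, and the candidate search would have to respect the constraint that not all decision criteria can be disclosed.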

In future work, first of all, the whole pipeline and the conversational environment will be implemented (e.g., intent recognizer, entity recognizer, sentiment analyzer, NL generator, etc.). Then, extensive experimental evaluations and user studies have to be carried out for assessing the effectiveness of the model both in terms of the capability of generating NL explanations and in terms of improved user experience.

References

[1] Barocas, Hardt, Narayanan, Fairness and Machine Learning, fairmlbook.org, 2019.

[2] Cohen, Z. C. Lipton, Mansour, Efficient candidate screening under multiple tests and implications for fairness, in: FORC, volume 156 of LIPIcs, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020, pp. 1:1-1:20.

[3] Chouldechova, Fair prediction with disparate impact: A study of bias in recidivism prediction instruments, Big data 5 (2017) 153-163.

[4] High-Level Expert Group on AI, Ethics guidelines for trustworthy AI, Report, European Commission, Brussels, 2019.

[5] Federal Reserve Board, The truth in lending act, 1968.

[6] Congress of the United States, Fair housing act, 1968.

[7] Federal Trade Commission, Equal credit opportunity act, 1974.

[8] Mehrabi, Morstatter, Saxena, Lerman, Galstyan, A survey on bias and fairness in machine learning, 2019. arXiv:1908.09635.

[9] White Paper on Artificial Intelligence: Public consultation towards a European approach for excellence and trust, CONSULTATION RESULTS, European Commission, Brussels, 2020. URL: https://wayback.archive-it.org/12090/20210726215107/https://ec.europa.eu/digital-single-market/en/news/white-paper-artificial-intelligence-public-consultation-towards-european-approach-excellence.

[10] Kozodoi, Jacob, Lessmann, Fairness in credit scoring: Assessment, implementation and profit implications, arXiv preprint arXiv:2103.01907 (2021).

[11] Dwork, Hardt, Pitassi, Reingold, Zemel, Fairness through awareness, in: ITCS, 2012, pp. 214-226.

[12] Hardt, Price, Srebro, Equality of opportunity in supervised learning, in: NIPS, 2016, pp. 3315-3323.

[13] R. P. Bartlett, Morse, Wallace, Stanton, Algorithmic discrimination and input accountability under the civil rights acts, Available at SSRN 3674665 (2020).

[14] K. Croxson, P. Bracke, C. Jung, Explaining why the computer says 'no', FCA 5 (2019) 31.

[15] N. Bussmann, P. Giudici, D. Marinelli, J. Papenbrock, Explainable machine learning in credit risk management, Computational Economics 57 (2021).

[16] R. Sharma, C. Schommer, N. Vivarelli, Building up explainability in multi-layer perceptrons for credit risk modeling, in: DSAA, IEEE, 2020, pp. 761–762.

[17] S. M. Lundberg, S. Lee, A unified approach to interpreting model predictions, in: NIPS, 2017, pp. 4765–4774.

[18] I. Stepin, J. M. Alonso, A. Catala, M. Pereira-Fariña, A survey of contrastive and counterfactual explanation generation methods for explainable artificial intelligence, IEEE Access 9 (2021) 11974–12001.

[19] C. Musto, F. Narducci, P. Lops, M. De Gemmis, G. Semeraro, Explod: A framework for explaining recommendations based on the linked open data cloud, in: Proceedings of the 10th ACM Conference on Recommender Systems, RecSys '16, Association for Computing Machinery, New York, NY, USA, 2016, p. 151–154. URL: https://doi.org/10.1145/2959100.2959173. doi:10.1145/2959100.2959173.