=Paper=
{{Paper
|id=Vol-2960/paper12
|storemode=property
|title=A General Model for Fair and Explainable Recommendation in the Loan Domain (Short paper)
|pdfUrl=https://ceur-ws.org/Vol-2960/paper12.pdf
|volume=Vol-2960
|authors=Giandomenico Cornacchia,Fedelucio Narducci,Azzurra Ragone
|dblpUrl=https://dblp.org/rec/conf/recsys/CornacchiaNR21
}}
==A General Model for Fair and Explainable Recommendation in the Loan Domain (Short paper)==
Giandomenico Cornacchia¹, Fedelucio Narducci¹ and Azzurra Ragone²

¹ Politecnico di Bari – Via E. Orabona 4, Bari (I-70125), Italy
² EY Business and Technology solution – Via Oberdan 40, Bari (I-70125), Italy

Presented at the 3rd Edition of Knowledge-aware and Conversational Recommender Systems (KaRS) & 5th Edition of Recommendation in Complex Environments (ComplexRec) Joint Workshop @ RecSys 2021, September 27 – October 1, 2021, Amsterdam, Netherlands.

===Abstract===
Recommender systems have been widely used in the Financial Services domain and can play a crucial role in personal loan comparison platforms. However, the use of AI in this domain has brought to light many opportunities as well as new ethical and legal risks. Customers can trust the suggestions of these systems only if the recommendation process is interpretable, understandable, and fair for the end user. Since products offered within the banking sector are usually of an intangible nature, the customer's perception of trust is crucial to maintain a long-standing relationship and ensure customer loyalty. To this end, in this paper we propose a model for generating natural language and counterfactual explanations for a loan recommender system, with the aim of providing fairer and more transparent suggestions.

===Keywords===
Trustworthy AI, Financial Services, Loan recommender systems, Fairness, Explainability, Human-centered computing, Conversational systems

===1. Introduction===
As stated by the World Economic Forum's Global Future Council on Artificial Intelligence for Humanity: "Artificial Intelligence (AI) is the engine of the Fourth Industrial Revolution. It holds the promise of solving some of society's most pressing issues, including repowering economies reeling from lockdowns, but requires thoughtful design, development, and deployment to mitigate potential risks" (https://www.weforum.org/communities/gfc-on-artificial-intelligence-for-humanity). These risks are related to the fact that AI applications are becoming more and more pervasive and, most of the time, users interact with such systems without even knowing that life-changing decisions like mortgage grants, job offers, and patient screenings are in the hands of AI-based systems [1]. Moreover, such AI decisions may sometimes be arbitrary, inconsistent, or discriminatory, which cannot be allowed in highly regulated environments such as Financial Services. As these applications have become key enablers and are more deeply embedded in processes, financial services organizations need to cope with AI applications' inherent risks.
This is true both from a compliance point of view (regulatory and ethical norms) and because the lack of trust is the most significant barrier to AI adoption and acceptance by users. In fact, AI systems often amplify social and ethical issues such as gender and demographic discrimination [2, 3], and they lack interpretability and explainability.

Since the sale of financial products requires expert knowledge, recommender systems can offer significant benefits to financial services by supporting the client in choosing the best option among the many financial products offered by different banks. However, compared to conventional recommender systems, their application in financial domains is a challenging task: there is the need to adhere to regulation, follow specific fairness criteria, and provide, at the same time, an explanation of the decisions (black-box approaches are not allowed).

In this paper, we focus on the case of loan recommendation. In this domain, the recommendation problem is modeled as finding the right product of the lender company for the borrower which, at the same time, satisfies their financial needs and is likely to be paid back by the borrower.

In the last years, several online platforms for personal loan comparison (to cite a few: https://www.creditkarma.com/, https://borrowell.com/, www.nerdwallet.com, www.meilleurtaux.com/, https://www.habito.com/, https://www.bankbazaar.com/) have emerged to help individual borrowers analyze the different loans proposed by third-party lenders and suggest the best option. These platforms simplify the process of shopping for a personal loan by showing users all the loans they are pre-approved for, so they can compare offers and make a conscious choice. In order to recommend the best loan for the user, on one side these platforms usually ask several questions to profile the client, covering personal information (e.g., address, date of birth, Tax ID number), basic financial information (e.g., rent/mortgage payment, other major bills), the requested loan amount, and the ideal term length. On the other side, to fill out the list of the best loans, the platforms have to evaluate several lenders, looking at key factors like interest rates, fees, loan amounts and term lengths offered, customer service, and how fast the funds can be obtained.

In this paper, we propose an approach to model a personal loan recommender system that complies with the present European regulation (Section 2), guarantees fairness criteria (Section 3), provides a meaningful explanation of the decision of the algorithm (Section 4), and is able to provide a user-based explanation. In particular, Section 4 focuses on defining a general model for generating natural language explanations in the aforementioned context of loan recommendations.
In our opinion, this explanation model can be easily integrated in a conversational recommender system able to interact with the user by exchanging natural language messages. Furthermore, we enhance the power of explanations by also providing a counterfactual analysis and explanation (Section 5). In this way, we can provide more insightful explanations that make the interaction with the client more efficient and compliant with regulations and, at the same time, reinforce customer trust in the system.

===2. Regulation compliance===
AI-based systems are increasingly attracting the attention of regulatory agencies and society at large, as they can cause, although unintentionally, harm. Indeed, as reported in the Ethics guidelines for trustworthy AI from the European Commission's High-Level Expert Group on AI: "The development, deployment, and use of any AI solution should adhere to some fundamental ethical principles such as respect for human autonomy, prevention of harm, fairness, and explainability" [4]. Moreover, in the EU the GDPR sets out the right to explanation: users have the right to ask for an explanation about an algorithmic decision made about them. In the UK, the Financial Conduct Authority (FCA) requires firms to explain why a more expensive mortgage has been chosen if a cheaper option is available. The G20 has adopted the OECD AI Principles (https://oecd.ai/ai-principles) for a trustworthy AI, where it is underlined that users should not only understand AI outcomes but also be able to challenge them.

On 21 April 2021, the European Commission presented the "Proposal for a Regulation laying down harmonized rules on artificial intelligence" (https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A52021PC0206), a proposed law that could enter into force in the second half of 2022 with a transitional period. This proposal remarks on the importance of monitoring the deployed AI systems based on a scale of risk. The risk-based approach splits AI systems into four categories, unacceptable risk, high risk, limited risk, and minimal risk, depending on the risk of the use case. AI systems intended to be used to evaluate the creditworthiness of natural persons or to establish their credit score are placed in the high-risk category.

Furthermore, any application of artificial intelligence must be designed with responsibility and in compliance with the standards required by law. In the financial sector, this is not an easy task. On one side, it is required to show how an outcome has been reached and whether it was fair and unbiased. On the other, not all the rationales behind a decision can be disclosed, to prevent users from gaming the system.

Generally speaking, every time a risk review of an AI system is performed, it is required to show how an outcome has been reached and whether it was fair and unbiased. This is not a one-time effort and should involve the contribution of different stakeholders: data scientists, business people, audit and compliance functions, and ethicists, to name a few. In the following, we will show how to cope with these requirements.

===3. Fairness===
The regulation of financial services does not start with the recent laws on artificial intelligence. Rather, the latter are a derivation of the steps taken by governments on financial and social regulation between the 1960s and 1980s. Indeed, governments have addressed discrimination against unprivileged groups as a regulatory compliance requirement since the 1960s [5, 6, 7]. In the USA, the Fair Housing Act (FHA) and the Equal Credit Opportunity Act (ECOA), which protect consumers by prohibiting unfair and discriminatory practices, have focused on ensuring a quality of service that is independent of sensitive characteristics such as gender, race, age, and disability, avoiding discrimination against minorities.

These principles can be condensed into the definition of fairness which, according to Mehrabi et al. [8], can be seen as "the absence of any prejudice or favoritism toward an individual or a group based on their inherent or acquired characteristics". Contextualising it in the use of an AI system in financial services, the system should allocate opportunities, resources, or information fairly, thus avoiding social or historical biases. However, this definition of fairness is independent of the technical concepts that arise when using any classifier, and that is why the definitions of fairness are different and various.
Since those norms were not set to prevent discrimination in non-human decision making (as in the case of ML algorithms), the "Ethics guidelines for a Trustworthy AI" [4] and "The White Paper" [9] were released to give guidelines for an ethical and safe use of AI. Some critical key requirements are "equity, diversity and non-discrimination", enclosed in the concept of fairness. More recently, with the "Proposal for a Regulation laying down harmonized rules on artificial intelligence", credit scoring applications, including loan recommender systems, have been classified in the high-risk domain. Before deploying any AI system, the financial institution has to pass different conformity steps, and one of these concerns fairness.

In our analysis, we refer to personal loan recommender systems that suggest to each customer a personalized list of potential loan products based on their profile. We use this case study since, for personal loans, the concept of equal opportunity is crucial, and it lies very often in the hands of ML algorithms, with a high risk that they discriminate without the awareness of either the financial institution or the client.

As these automated decision-making systems are increasingly used, they must guarantee these principles of fairness. In the case under consideration, the recommender system that suggests different offers based on the characteristics of the credit requested and the user's profile must ensure that each offer has been processed through fair algorithms on the provider side.

Going deeper with this analysis, the concept of fairness in the provider-side algorithms of a personal loan recommendation could be linked to one or more of these three statistical criteria [10]: (i) Independence [11], (ii) Separation [12], and (iii) Sufficiency [3]. Independence guarantees that the fraction of customers classified as good-risk is the same in each sensitive group. Therefore, if gender is considered sensitive, both men and women should have the same percentage of good-risk classifications. The Separation criterion is related to the concept of misclassification: the classification errors must be the same in both sensitive and non-sensitive groups. Finally, the Sufficiency criterion states that the probability that an individual belonging to the good-risk class is classified as good-risk must be the same for both sensitive groups. In this case, if the algorithm shows a gender bias, for example, a woman that belongs to the good-risk class could be classified in the bad-risk class.
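To make the three criteria concrete, the sketch below (our illustration, not part of the original paper) checks them on the output of a binary good-risk/bad-risk classifier with NumPy. The array names and the toy data are assumptions; each function returns a gap that is zero when the criterion holds exactly.

```python
import numpy as np

def independence_gap(y_pred, sensitive):
    # Independence: the good-risk rate P(y_pred = 1 | group) should be
    # equal across sensitive groups; return the largest difference.
    rates = [y_pred[sensitive == g].mean() for g in np.unique(sensitive)]
    return max(rates) - min(rates)

def separation_gap(y_true, y_pred, sensitive):
    # Separation: error rates conditioned on the true label should be
    # equal across sensitive groups (equalized odds).
    gaps = []
    for label in (0, 1):
        mask = y_true == label
        rates = [y_pred[mask & (sensitive == g)].mean()
                 for g in np.unique(sensitive)]
        gaps.append(max(rates) - min(rates))
    return max(gaps)

def sufficiency_gap(y_true, y_pred, sensitive):
    # Sufficiency: among applicants predicted good-risk, the fraction
    # that truly is good-risk should be equal across groups.
    mask = y_pred == 1
    rates = [y_true[mask & (sensitive == g)].mean()
             for g in np.unique(sensitive)]
    return max(rates) - min(rates)

# Toy decisions for eight applicants; gender is the sensitive attribute.
y_true = np.array([1, 1, 0, 1, 1, 0, 0, 1])
y_pred = np.array([1, 0, 0, 1, 1, 1, 0, 1])
gender = np.array(["F", "F", "F", "F", "M", "M", "M", "M"])

print(independence_gap(y_pred, gender))
print(separation_gap(y_true, y_pred, gender))
print(sufficiency_gap(y_true, y_pred, gender))
```

In practice, an auditor would require these gaps to stay below an agreed threshold rather than to be exactly zero.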
Once the concept of fairness and the dimensions it is based on have been defined, the next question is: how can the customer be sure that the characteristics of the recommended loans have been generated by fair provider-side algorithms? In the next section we introduce another important requirement of the loan recommendation platform: the explanation. The platform and the loan provider should be able to explain the outcome to the customer, guaranteeing that the outcome has been achieved under fairness constraints. Nowadays, this step is often left out, as AI systems already suggest loans to customers but without giving in response the rationale behind the decision. However, following a black-box approach could lead to severe reputation damage for financial institutions, as in the case of Apple and Goldman Sachs [13].

===4. Explainability===
For many years, research on ML and, more generally, AI algorithms has focused on improving accuracy metrics such as precision, recall, etc. Recently, new laws and regulations [14] have introduced the need for those algorithms to show explanation capabilities, in particular in a sensitive domain such as the financial one [15].

ML algorithms belong to two main classes: interpretable and uninterpretable. More specifically, the former implement a white-box model design, the latter a black-box one. In this perspective, Sharma et al. [16] distinguish between model-agnostic and model-specific explanations. Model-agnostic methods provide an explanation that does not depend on the ML model adopted and are generally used for black-box models. A surrogate model is thus implemented with the aim of simulating the behavior of the original algorithm.

Several methods have been proposed to explain black-box models. In this paper we focus on SHAP [17]. SHAP is inspired by cooperative game theory and is based on the Shapley values. Each feature is considered a player that contributes differently to the outcome (i.e., the algorithm decision). Considering the original theory, we would have to compute all the possible combinations of each feature with the other sets of features. This choice is, first of all, impractical but, above all, computationally inefficient. Therefore, SHAP does not compute all the possible combinations between all the features but evaluates only a random set of combinations to meet efficiency constraints. SHAP provides a list of the features ranked from the one that contributed the most to the outcome to the one that contributed the least. However, the explanation provided by this method is probably not so clear for a customer who has no experience of how an algorithm works. For this reason, if we want to improve the user's trust and, in general, the user experience with the system, we need to make the explanation more understandable. In that direction, we believe that an effective solution could be to transform the output produced by software like SHAP into a natural language sentence.
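As a minimal sketch of the SHAP step just described (ours, with invented feature names and a synthetic lender model, not the paper's code), the following snippet approximates the Shapley values of one loan application with shap's KernelExplainer, which samples feature coalitions instead of enumerating all of them, and prints the features ranked by the strength of their contribution.

```python
import numpy as np
import shap                                   # pip install shap
from sklearn.ensemble import RandomForestClassifier

# Illustrative feature space for a loan application (assumed names).
FEATURES = ["credit_amount", "salary", "duration_months", "age"]

# Toy training data standing in for a lender's historical decisions.
rng = np.random.default_rng(0)
X = rng.uniform([1000, 15000, 6, 18], [50000, 80000, 72, 70], size=(500, 4))
# Synthetic acceptance rule: the amount is small relative to the salary
# and the duration is not too long.
y = ((X[:, 0] / X[:, 1] < 0.5) & (X[:, 2] < 48)).astype(int)

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# KernelExplainer approximates Shapley values from a random sample of
# feature coalitions instead of all 2^n combinations.
background = shap.sample(X, 50, random_state=0)
explainer = shap.KernelExplainer(
    lambda z: model.predict_proba(z)[:, 1], background)

application = np.array([[30000.0, 25000.0, 60.0, 35.0]])   # one loan request
contributions = explainer.shap_values(application)[0]

# Rank features from most to least influential on the decision.
for name, value in sorted(zip(FEATURES, contributions),
                          key=lambda t: -abs(t[1])):
    print(f"{name}: {value:+.3f}")
```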
Figure 1 represents our proposed workflow for generating an explanation and a counterfactual explanation, in order to also recommend corrective actions to the user. For the sake of simplicity, here we show the pipeline focusing on a single decision taken by the ML algorithm of a given lender. Naturally, the loan recommender will receive this information from all the lender services invoked.

[Figure 1: Workflow for generating explanation and counterfactual explanation for loan application]

Let us suppose that the user asks for a personal loan through the platform with a message like "[...] buy a car, and I would like to pay back over 24 months". Then the platform will ask the user to provide personal information such as age, income, etc., to be sent to the lender services. Once the different proposals have been received from the lender platforms, a list is ranked according to one or more criteria (e.g., rate, decision, etc.) and proposed to the user. Let us assume that each algorithm respects the fairness criteria, with regulatory bodies' labels as proof of compliance with those criteria. Each proposal (i.e., accepted or denied) is provided with a feature-based SHAP explanation that shows how the ML algorithm has produced that result. Next, those SHAP values are transformed into a natural language explanation like: "The credit amount is too high based on the salary and the duration is too long.". A further interesting contribution in this direction is provided by a counterfactual analysis obtained through a feature perturbation step (see Section 5.1). This explanation shows how to modify the loan request to get the loan accepted [18]. For example, the system can add: "Reduce the credit amount to 10,000€, shorten the duration to 18 months, ..., and the loan request will probably be accepted".

But how can we generate this kind of natural language explanation? In the next section, we propose a template-based formal model able to transform the SHAP values into a natural language explanation.

===5. A model for generating NL explanations===
The model we designed for generating Natural Language explanations is inspired by Musto et al. [19]. The principal insight is that our natural language explanation can be generated by exploiting a template composed of some slots that can be filled with features, adverbs, and adjectives according to the output produced by SHAP. We recall that the SHAP output consists of a set of couples (e.g., ⟨feature, SHAP value⟩).

Let us consider the example in Figure 1: "The credit amount is too high based on the salary and the duration is too long." In that case, the template for the explanation is a ⟨feature⟩ ⟨verb⟩ ⟨adverb⟩ ⟨adjective⟩ group with its motivation, followed by a new ⟨feature⟩ ⟨verb⟩ ⟨adverb⟩ ⟨adjective⟩ group without motivation. The problem is to properly fill each slot and compose the whole explanation.

In the above-mentioned example, the features taken into account for generating the explanation are three, the credit amount, the salary, and the duration, each of which is associated with adverbs and/or adjectives (e.g., too high, too long, etc.). The number of features used for generating the explanation can be set as desired. However, since the explanation has to be as useful as possible, too many features can, in some cases, reduce its effectiveness and efficiency.

In our model, the generation of the natural language explanation exploits a set of rewriting rules expressed in Backus-Naur Form (BNF), as described in the following. Even though these templates and rules can be exploited also in other domains, the terminal symbols (e.g., the credit amount, the duration, long, short, etc.) are specific to a loan application.

 ⟨explanation⟩ ::= ⟨clause⟩ | ⟨clause⟩ ⟨conjunction⟩ ⟨explanation⟩
 ⟨clause⟩ ::= ⟨statement⟩ | ⟨statement⟩ ⟨motivation⟩
 ⟨statement⟩ ::= ⟨feature⟩ ⟨verb⟩ ⟨adverb⟩ ⟨adjective⟩
 ⟨motivation⟩ ::= ⟨connector⟩ ⟨feature⟩
 ⟨connector⟩ ::= 'based on' | (etc.)
 ⟨adverb⟩ ::= 'too' | 'so' | 'few' | 'almost' | 'enough' | (etc.)
 ⟨adjective⟩ ::= 'high' | 'long' | 'short' | 'little' | (etc.)
 ⟨conjunction⟩ ::= 'and' | 'but' | ',' | (etc.)
 ⟨feature⟩ ::= 'the credit amount' | 'the duration' | 'the salary' | (etc.)
 ⟨verb⟩ ::= 'is' | 'are' | 'has' | 'have' | 'is not' | (etc.)

These rewriting rules can be applied for generating, for example, the explanation "The credit amount is too high based on the salary and the duration is too long."
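A possible rendering of these rewriting rules in code is sketched below; it is our illustration, not the authors' implementation. The vocabularies mirror the terminal symbols of the grammar, the ⟨connector⟩ ('based on') clause is omitted for brevity, and the interval-to-adverb mapping uses invented boundaries (the actual mapping is discussed next).

```python
# Per-feature vocabulary of compatible adjectives, keyed by the sign of
# the SHAP contribution (does the feature push toward rejection or not).
ADJECTIVES = {
    "the credit amount": {"+": "high", "-": "low"},
    "the duration":      {"+": "long", "-": "short"},
    "the salary":        {"+": "high", "-": "low"},
}

def adverb(strength: float) -> str:
    """Map the magnitude of a SHAP value (assumed in [0, 1]) to an
    adverb; the interval boundaries here are illustrative."""
    if strength > 0.7:
        return "too"
    if strength > 0.4:
        return "so"
    return "almost"

def explain(shap_pairs, top_k=2):
    """shap_pairs: list of (feature, signed SHAP value), already ranked
    by importance. Each pair fills one <feature> <verb> <adverb>
    <adjective> group; groups are joined by the <conjunction> 'and'."""
    clauses = []
    for feature, value in shap_pairs[:top_k]:
        sign = "+" if value >= 0 else "-"
        clauses.append(
            f"{feature} is {adverb(abs(value))} {ADJECTIVES[feature][sign]}")
    return (" and ".join(clauses) + ".").capitalize()

print(explain([("the credit amount", 0.8), ("the duration", 0.6)]))
# -> "The credit amount is too high and the duration is so long."
```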
A further problem is the choice of adverbs and adjectives. For the adverbs, we defined a mapping between value intervals and the intensity of the adverb. As an example, if the SHAP value of a feature is 0.8 (the highest interval; recall that the SHAP values are between 0 and 1), the corresponding ⟨adverb⟩ will be 'too', emphasizing that this feature has a strong impact on the loan application decision. Obviously, the association between the ⟨adverb⟩ and the type of ⟨adjective⟩ is not arbitrary, but depends on the type of ⟨feature⟩ considered. Therefore, for each feature we defined a vocabulary of compatible adjectives.

===5.1. Counterfactual explanation===
In the previous subsection, we described how a loan recommendation platform can generate an explanation for each decision given by a provider. To make our explanation more effective, we propose to the user some indications useful for revising her request and getting the loan application accepted. This is obtained through a counterfactual explanation.

The counterfactual explanation consists of a set of corrective actions on the characteristics of the requested loan, based on the results of a counterfactual analysis. Providing a counterfactual explanation is an opportunity for the loan provider, since it results in an additional service that enhances customer satisfaction and makes the customer aware of his or her chances of getting a loan. This service results in a responsible and trustworthy use of AI systems towards customers.

The counterfactual analysis performs a perturbation on the feature space of the customer's loan application. The perturbation generates a new sample that will be considered as a new application. Subsequently, the counterfactual analysis detects the new sample nearest to the original one that will be accepted by the ML algorithm. The result of this analysis consists in detecting the change in the loan's characteristics and recommending corrective actions to the customer.

The approach we adopted for generating the counterfactual explanation is the same described in the previous section, namely a set of BNF rewriting rules. Following the previous example, a counterfactual explanation can be: "Reduce the credit amount to 10,000€, shorten the duration to 18 months.". The BNF template is:

 ⟨counterfactual⟩ ::= ⟨corrective-action⟩ | ⟨corrective-action⟩ ⟨conjunction⟩ ⟨counterfactual⟩
 ⟨corrective-action⟩ ::= ⟨action⟩ ⟨feature⟩ | ⟨action⟩ ⟨feature⟩ 'to' ⟨new-value⟩
 ⟨action⟩ ::= 'reduce' | 'expand' | 'shorten' | (etc.)
 ⟨feature⟩ ::= 'the credit amount' | 'the duration' | (etc.)
 ⟨new-value⟩ ::= '10,000€' | '18 months' | (etc.)
 ⟨conjunction⟩ ::= 'and' | 'but' | ',' | (etc.)

The counterfactual explanation has a smaller set of rules; in fact, it includes a feature, the corrective action, and optionally the desirable new feature value. Since the counterfactual analysis works by perturbing all the features of a given instance, the recommended actions should impact the minimum set of features that allows changing the algorithm's decision.

The action is chosen according to the relation between the old and the new feature value. For example, if the old value of the feature duration was 24 and the new value after the perturbation is 18, the verb (action) chosen will be 'reduce'. Regarding the values, if the new value is equal to the original one, the respective feature will not be included in the explanation, since there is no corrective action to be done; otherwise, the new perturbed value will be shown in the explanation.
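The following sketch (ours, under simplifying assumptions) illustrates the perturbation step: it randomly perturbs the rejected application, keeps the perturbed samples that the model accepts, selects the nearest one, and emits a corrective action for every feature whose value actually changed, as prescribed above. The noise scheme and distance metric are arbitrary choices, and `model` stands for a fitted classifier such as the toy one in the SHAP sketch.

```python
import numpy as np

def counterfactual_actions(model, x, feature_names,
                           n_samples=5000, scale=0.3, rng=None):
    """Perturb a rejected application x, find the nearest perturbed
    sample the model accepts, and return corrective actions for the
    features whose value actually changed (unchanged features are
    omitted, since no corrective action is needed for them)."""
    rng = rng or np.random.default_rng(0)
    # Multiplicative noise around the original request (an assumption;
    # in practice only actionable features would be perturbed).
    candidates = x * rng.uniform(1 - scale, 1 + scale,
                                 size=(n_samples, x.size))
    accepted = candidates[model.predict(candidates) == 1]
    if accepted.size == 0:
        return []  # no accepted sample found within the perturbation range
    # Nearest accepted sample, measured in relative feature space.
    nearest = accepted[np.argmin(
        np.linalg.norm((accepted - x) / x, axis=1))]
    actions = []
    for name, old, new in zip(feature_names, x, nearest):
        if np.isclose(old, new, rtol=0.01):
            continue  # value unchanged: left out of the explanation
        verb = "reduce" if new < old else "expand"
        actions.append(f"{verb} {name} to {new:.0f}")
    return actions

# Usage with the toy model and application from the SHAP sketch above:
# print(counterfactual_actions(
#     model, np.array([30000.0, 25000.0, 60.0, 35.0]),
#     ["the credit amount", "the salary", "the duration", "the age"]))
```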
===6. Conclusion and future research directions===
This work proposes a model to generate natural language explanations for ML decisions in the context of loan recommendation platforms. In the first part of the paper, we analyzed which fairness metrics can be used for evaluating the ML model. Next, we observed that, to improve system transparency, financial platforms must understand the causality of the learned representations and explain their decisions through visualization tools or natural language. Shapley values can help understand which features influence the outcome; however, they are not very human-friendly. For this reason, a model for generating NL explanations from Shapley values has been proposed. Another contribution is the definition of a counterfactual explanation based on the result of a counterfactual analysis. This results in a set of corrective actions to be performed by the user.

The defined model finds a straightforward application in the scenario of a conversational recommender system. The user expresses her request in natural language, and the platform compares the different offers and provides an explanation for each of them. The user can thus ask for help on how to modify her request for getting the loan. Eventually, the platform, thanks to the counterfactual analysis and explanation, can provide a set of corrective actions. At the same time, the conversational system should prevent the user from discovering the complete set of decision criteria, thus avoiding adverse actions from unfair users.

In future work, first of all, the whole pipeline and the conversational environment will be implemented (e.g., intent recognizer, entity recognizer, sentiment analyzer, NL generator, etc.). Then, extensive experimental evaluations and user studies have to be carried out to assess the effectiveness of the model, both in terms of the capability of generating NL explanations and in terms of improved user experience.

===References===
[1] S. Barocas, M. Hardt, A. Narayanan, Fairness and Machine Learning, fairmlbook.org, 2019.
[2] L. Cohen, Z. C. Lipton, Y. Mansour, Efficient candidate screening under multiple tests and implications for fairness, in: FORC, volume 156 of LIPIcs, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020, pp. 1:1–1:20.
[3] A. Chouldechova, Fair prediction with disparate impact: A study of bias in recidivism prediction instruments, Big Data 5 (2017) 153–163.
[4] High-Level Expert Group on AI, Ethics guidelines for trustworthy AI, Report, European Commission, Brussels, 2019.
[5] Federal Reserve Board, The Truth in Lending Act, 1968.
[6] Congress of the United States, Fair Housing Act, 1968.
[7] Federal Trade Commission, Equal Credit Opportunity Act, 1974.
[8] N. Mehrabi, F. Morstatter, N. Saxena, K. Lerman, A. Galstyan, A survey on bias and fairness in machine learning, 2019. arXiv:1908.09635.
[9] White Paper on Artificial Intelligence: Public consultation towards a European approach for excellence and trust, Consultation results, European Commission, Brussels, 2020. URL: https://wayback.archive-it.org/12090/20210726215107/https://ec.europa.eu/digital-single-market/en/news/white-paper-artificial-intelligence-public-consultation-towards-european-approach-excellence.
[10] N. Kozodoi, J. Jacob, S. Lessmann, Fairness in credit scoring: Assessment, implementation and profit implications, arXiv preprint arXiv:2103.01907 (2021).
[11] C. Dwork, M. Hardt, T. Pitassi, O. Reingold, R. Zemel, Fairness through awareness, in: ITCS, 2012, pp. 214–226.
[12] M. Hardt, E. Price, N. Srebro, Equality of opportunity in supervised learning, in: NIPS, 2016, pp. 3315–3323.
[13] R. P. Bartlett, A. Morse, N. Wallace, R. Stanton, Algorithmic discrimination and input accountability under the civil rights acts, Available at SSRN 3674665 (2020).
[14] K. Croxson, P. Bracke, C. Jung, Explaining why the computer says 'no', FCA 5 (2019) 31.
[15] N. Bussmann, P. Giudici, D. Marinelli, J. Papenbrock, Explainable machine learning in credit risk management, Computational Economics 57 (2021).
[16] R. Sharma, C. Schommer, N. Vivarelli, Building up explainability in multi-layer perceptrons for credit risk modeling, in: DSAA, IEEE, 2020, pp. 761–762.
[17] S. M. Lundberg, S. Lee, A unified approach to interpreting model predictions, in: NIPS, 2017, pp. 4765–4774.
[18] I. Stepin, J. M. Alonso, A. Catala, M. Pereira-Fariña, A survey of contrastive and counterfactual explanation generation methods for explainable artificial intelligence, IEEE Access 9 (2021) 11974–12001.
[19] C. Musto, F. Narducci, P. Lops, M. De Gemmis, G. Semeraro, ExpLOD: A framework for explaining recommendations based on the linked open data cloud, in: Proceedings of the 10th ACM Conference on Recommender Systems, RecSys '16, Association for Computing Machinery, New York, NY, USA, 2016, pp. 151–154. doi:10.1145/2959100.2959173.