Explainability for Misinformation in Financial Statements

Sushodhan Vaishampayan¹, Akshada Shinde¹, Aditi Pawde¹, Sachin Pawar¹, Manoj Apte¹ and Girish Keshav Palshikar¹

¹ TCS Research, Pune, India

Abstract
Anomaly detection techniques find application in various domains, but they fail to explain why a particular data point is anomalous from a domain perspective. In this paper, we attempt to explain the anomalousness of a point, which in our case is a company having misinformation in its financial statements. We propose 3 novel methods and experiment with a publicly available real dataset of financial statements of 4091 companies listed on the Indian stock market. We also propose a novel method for evaluating the significance of generated explanations in the absence of ground truth. We show that our method Explanation using Maximal Isolation (EMI) generates precise and statistically significant explanations as compared to baseline methods.

Keywords
Anomaly Detection, Explainability, Financial Audit, Misinformation

1. Introduction

Anomaly detection (AD) is considered a crucial task in various applications. It helps us to identify scenarios which could lead to possible failure of a system, as well as to obtain novel insights about it. The field covers application domains like fraud detection, intrusion detection, fault detection, failure detection, etc.

To give user-understandable meaning to the results of AD, attempts are being made to develop methods that can explain the working of AD techniques. The area of research which deals with developing explanations for (mostly complex) models¹ is referred to as eXplainable AI (XAI). However, these methods provide explanations describing why different AI (in our case AD) models produce certain kinds of predictions.
Many times, the users of an application are unable to understand why a particular instance is termed anomalous from the domain perspective. For example, in intrusion detection, a sudden rise in CPU and memory usage could be termed anomalous. However, only by careful analysis of other parameters like network flow, traffic congestion, etc. can the anomaly be attributed either to an intrusion or to a computationally expensive process execution. Similarly, in fraudulent Financial Statements (FS) detection, if a company is suspected of being fraudulent, auditors of its FS would prefer to know which fields from the company filings make that company suspect. Such justifications or explanations help in performing further investigations to determine whether the company is really fraudulent or it is just a false alarm, which would save the company's reputation. Such additional knowledge helps in understanding the anomalous nature of the instance from the domain's point of view.

Another research area that serves the purpose of generating explanations for the anomalousness of a point is Outlying Aspect Mining (OAM). Given a point, the goal of OAM techniques is to discover the aspects of the data in which the point becomes an outlier or interesting. XAI aims at providing explanations in varied forms such as weighted or non-weighted subsets of features, sets of rules, pictorial representations and natural language [1]. OAM restricts itself to producing explanations as a set of features in the form of a subspace. XAI explains the learning of an underlying detector, and thus the explanation can change if the detector is changed. OAM gives a holistic view of the interestingness of a point and is detector agnostic.

In this paper, we attempt to provide an explanation for a company that is suspected of having misinformation in its financial filings. "Misinformation" in FS is any information falsely mentioned, e.g. overestimation of assets, underestimation of liabilities, etc.
In our previous work [2], we attempted to detect misinformation in FS. We take this work ahead by providing explanations for the reported companies, and illustrate the technique by performing experiments on a real dataset.

The contributions of this paper are as follows:
• 3 novel methods for explanation generation.
• A novel evaluation method for generated explanations in the absence of ground truth.

CIKM'22: Advances in Interpretable Machine Learning and Artificial Intelligence (AIMLAI), October 17–21, 2022, Atlanta, Georgia
∗ Corresponding author. † These authors contributed equally.
sushodhan.sv@tcs.com (S. Vaishampayan); sakshada.shinde@tcs.com (A. Shinde); pawde.aditi@tcs.com (A. Pawde); sachin7.pe@tcs.com (S. Pawar); manoj.apte@tcs.com (M. Apte); gk.palshikar@tcs.com (G. K. Palshikar)
ORCID: 0000-0002-0877-7063 (S. Vaishampayan)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.
¹ In our case the model is always an AD model. XAI is a vast field mostly used for explaining the learning of supervised models.

2. Related Work

The most basic form of explanation for an outlier is the subspace in which the point is highly discriminated from other points. Outlying aspects [3] are identified either by selecting the top k subspaces with the highest measure of anomalous behavior, called Score and Search, or by selecting a small relevant subspace, aligned with the traditional feature selection problem of classification, called Feature Selection [4].
The authors of [3] used a distance-based outlying degree (OD) and a framework of dynamic subspace search, called HOS-miner, to determine the subspace in which a query object is an outlier. A heuristic search framework called OAMiner, developed in [5], searches the subspaces effectively: it ranks all subspaces based on a kernel density estimate of the query object in each subspace. The authors of [6] propose density Z-score and iPath as dimensionally unbiased measures for determining outlying aspects, along with a beam search algorithm to tackle the challenge of searching through an exponentially large number of subspaces. OARank, a hybrid framework developed in [7], leverages the efficiency of feature selection approaches and the effectiveness and versatility of score-and-search based methods: in the first stage, features are ranked according to their potential to make the query point outlying, and in the second stage score-and-search is performed on a smaller subset of the top-ranked k ≪ m features, where m is the total number of features.

Local Outliers with Graph Projection (LOGP) [8] defines a set of objective functions that learn the local discriminating subspace for a point in the transformed form of a graph. The outlying score of a point is computed as the statistical distance of the point to its neighboring points in the transformed subspace. The authors of [9] proposed a novel criterion that measures the probability density function (pdf) associated with an attribute value of an outlier with respect to the pdf associated with the same attribute values of other instances; the lower the pdf, the more likely the instance is an outlier. Anomaly Contribution Explainer (ACE) [10] and ACE-KL give the contributions of each feature as a vector of real numbers. ACE approximates the neighborhood of an outlier by generating neighboring points and then fits a linear regression model to those neighbors with a modified loss function. An additional regularizer introduced in the ACE-KL model maximizes the KL divergence between a uniform distribution and the calculated distribution of contributions. The authors of [11] propose sequential feature explanations (SFE), obtained by solving an optimization problem, wherein features are presented to the user one at a time until a confident judgment can be made about the anomaly. The Explainer [12] provides an explanation in the form of a disjunction of rules learnt by the decision trees of a random forest for a given anomalous point. Given a set of outliers and a corresponding feature set, LOOKOUT [13] produces an optimal number of 2-D focus-plots, based on a budget provided by the user, such that some of the anomalies have maximum anomaly score and are visually incriminated in the plots; the authors propose an approximation algorithm to solve the NP-hard problem of generating the optimal number of plots.

None of the above methods, including [14] and [15], perform qualitative evaluation of the explanation in the absence of ground truth. Some of the methods are model dependent, so the quality of the generated explanations depends on the accuracy of the model. Our method EiForest uses the iForest as a data structure and extracts other novel features from it, as opposed to using only the path length as the scoring mechanism of a subspace as in iPath [6]; using only the path length limits the correctness of the explanations to the accuracy of the iForest algorithm. The rule set produced by our EMI method gives a subspace of the m-dimensional space in which the anomalous point is most isolated, and involves no learning, as opposed to Explainer [12], in which rules are in disjunctive form and decision trees are trained on imbalanced data.

Table 1
Summary feature vector for EiForest
Name | Description
f1 | Average depth of the trees
f2 | Average size of the leaf containing z
f3 | |p_v|
f4 | Average % drop in the partition after split
f5 | The level at which v is present in p_v on average
f6 | Number of short paths (less than the maximum tree depth)
f7 | Average % drop in the partition after split for short paths
f8 | The level at which v is present in short paths on average

Algorithm 1: EMD
input: D, V, z, k0, c; s.t. 1 ≤ k0 ≤ |V|; c = 1.0
output: E_set s.t. for each φ ∈ E_set, φ ⊆ V
begin
  E_set = ∅
  for k = k0 down to 1 do
    foreach φ ∈ 2^V with |φ| = k do
      foreach x ∈ D do
        d_x = R_{V\φ}(x) − R_V(x)
      μ, σ = mean and std. dev. of {d_x | x ∈ D}
      if d_z > 0 and d_z > μ + c·σ then
        E_set = E_set ∪ {φ}
  return E_set
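Algorithm 1 can be sketched in plain Python. This is a minimal, illustrative sketch rather than the authors' implementation: it replaces the full Mahalanobis distance with a standardized Euclidean distance (a diagonal-covariance approximation), and the names (`emd`, `distance_ranks`) are ours.

```python
import itertools
import statistics

def ranks_desc(scores):
    # Rank 1 = largest score, i.e., farthest from the mean
    # (a lower rank number means a more anomalous point).
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks

def distance_ranks(D, feats):
    # Standardized Euclidean distance from the feature-wise mean,
    # a diagonal-covariance stand-in for the Mahalanobis distance.
    mu = {v: statistics.mean(x[v] for x in D) for v in feats}
    sd = {v: statistics.pstdev(x[v] for x in D) or 1.0 for v in feats}
    dist = [sum(((x[v] - mu[v]) / sd[v]) ** 2 for v in feats) ** 0.5 for x in D]
    return ranks_desc(dist)

def emd(D, V, z_idx, k0=2, c=1.0):
    # Candidate explanations: feature subsets phi (|phi| <= k0) whose
    # removal shifts z's rank by more than mu + c*sigma of all shifts.
    E_set = []
    base = distance_ranks(D, V)
    for k in range(1, k0 + 1):
        for phi in itertools.combinations(V, k):
            rest = [v for v in V if v not in phi]
            if not rest:
                continue  # phi must be a proper subset of V
            reduced = distance_ranks(D, rest)
            diffs = [reduced[i] - base[i] for i in range(len(D))]
            mu, sigma = statistics.mean(diffs), statistics.pstdev(diffs)
            d_z = diffs[z_idx]
            if d_z > 0 and d_z > mu + c * sigma:
                E_set.append(set(phi))
    return E_set
```

Removing a feature that drives the anomaly pushes z towards the mean, so its rank number grows and the rank difference R_{V\φ}(z) − R_V(z) becomes large and positive; features irrelevant to the anomaly leave the rank essentially unchanged.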
3. Problem definition

We have an m-dimensional dataset D = {x1, x2, .., xn} where each xi ∈ R^m, and V = {v1, v2, .., vm} denotes the feature set. Let us consider an anomalous instance z ∈ D, identified by some technique unknown to us. The objective is to generate an explanation E of what makes the point anomalous. E could be a set of features, i.e. E ⊆ V, or a set of rules. As mentioned earlier, D is a dataset of n companies, where each company is represented as an 18-dimensional feature vector, and z is an anomalous company that is suspected of having misinformation in its FS.

4. Proposed methods

4.1. Explanation using Mahalanobis Distance (EMD)

We sort all the points in D in descending order of their Mahalanobis distance from the mean vector of D. R_V(x), defined as the Mahalanobis rank, is the rank of the point x ∈ D in this sorted list. For any proper subset A ⊂ V of features, the function R_{V\A}(x) is defined similarly, except that the Mahalanobis distance for points in D is computed after removing the values of all features in A from every point in D. Note that a lower (smaller) rank indicates that the point is far from the mean vector in terms of Mahalanobis distance.

Potentially, the explanation E ⊆ V can be any set from the power set 2^V. Algorithm EMD produces a set of candidate explanations E_set for z such that for each set φ ∈ E_set, the rank difference is greater than a predefined threshold of μ + c·σ, where μ and σ are the mean and standard deviation of all rank differences; this is what explains why z is anomalous. We restrict the size of a candidate set φ to k0. If no such subset is found, the algorithm returns the empty set.

We compute the belief of an explanation φ ∈ E_set using the standard deviation σ of the difference in R_V and R_{V\φ} over all instances: Bel(z, φ) = (R_{V\φ}(z) − R_V(z)) / σ. Bel is nothing but the number of standard deviations the rank difference R_{V\φ}(z) − R_V(z) lies away from the mean of all rank differences; in other terms, it is the Mahalanobis distance of the rank difference for z from the mean of all rank differences. Each set φ and its respective belief value is given as input to the Dempster-Shafer evidence combination method [16]. The output set with the highest belief given by this method is considered the valid E.

4.2. Explanation using iForest (EiForest)

iForest [17] recursively partitions the data by randomly selecting features and split values. Data instances which get isolated in earlier splits are considered anomalies. We exploit this randomization concept with the help of iForest. We construct a forest of T trees. Let P_z be the set of T paths that lead to z. For a given instance z, we find the set of features V_P ⊆ V that appear on at least one path in P_z leading to the isolation of z. For each variable v ∈ V_P, we construct an 8-dimensional summary feature vector F_v^z using the paths leading to z and containing v (refer to Table 1 for a detailed description). We construct the set of summary vectors F_v for all points and all variables in the dataset. We then compute the Mahalanobis distance π(v) from the mean of F_v for each v ∈ V. Once we have the distances for all v ∈ V, the top k variables, sorted in decreasing order of distance, are selected as the explanation E.

4.3. Explanation using Maximal Isolation (EMI)

We propose a method based on Integer Linear Programming (ILP) that isolates an anomalous point to the maximum possible extent. The explanation E generated by EMI is a conjunction of a specified number L of conditions. These conditions, when applied as filters on the entire dataset, minimize the number of points other than the anomalous point which satisfy all L conditions. Given the set of features V and an anomalous point z to be explained, the explanation has the form AND(v (≤ | ≥) z_v); v ∈ E, where z_v is the value of z for feature v and E ⊂ V, |E| = L. These L conditions can be considered an explanation for the anomalous nature of the point z, because they describe in what way z differs from the rest of the points in the dataset. Table 3 describes the ILP formulation in detail. Constraints C3, C4, C5 and C6 enforce that y[j] becomes 1 if and only if the j-th point breaks at least one condition used in the explanation. The objective function maximizes the number of such points; effectively, it minimizes the number of other points which satisfy all the conditions in the explanation along with z, the anomalous point to be explained.
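For a small number of features and conditions, the EMI objective can be checked by exhaustive search instead of an ILP solver. The sketch below is our illustrative stand-in for the formulation of Table 3, not the paper's implementation: it enumerates all combinations of L conditions of the form v ≤ z_v or v ≥ z_v and keeps the combination leaving the fewest other points satisfying all conditions.

```python
import itertools

def emi_bruteforce(D, z, V, L=2):
    # Exhaustive stand-in for the ILP of Table 3: pick L conditions
    # (v <= z[v] or v >= z[v]) minimizing the number of points other
    # than z that satisfy all of them (the "survivors").
    conds = [(v, op) for v in V for op in ('<=', '>=')]
    best, best_count = None, None
    for combo in itertools.combinations(conds, L):
        if len({v for v, _ in combo}) < L:
            continue  # constraint C2: each feature used at most once
        survivors = 0
        for x in D:
            if x is z:
                continue
            if all((x[v] <= z[v]) if op == '<=' else (x[v] >= z[v])
                   for v, op in combo):
                survivors += 1
        if best_count is None or survivors < best_count:
            best, best_count = combo, survivors
    return best, best_count
```

A survivor count of 0 means the L conditions isolate z completely, mirroring the ILP objective of maximizing the number of points that break at least one condition. The ILP is needed when enumerating all condition combinations becomes too expensive.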
5. Experiments

5.1. Dataset

In this paper, we use a dataset similar to the one used in [2]. FS and other financial documents such as annual results, financial ratios, capital structure, annual reports and audit reports for about 8000 Indian listed companies are available² for 10 years. We web-scraped the FS of 4091 companies which were operating in the year 2014 and extracted 18 variables from their balance sheets and income statements. Refer to Table 2 for their summary statistics (values are in units of Rupees 10 million).

² https://www.moneycontrol.com/

Table 2
Variables along with summary statistics
Notation | Name | Mean | St. Dev.
v1 | Trade Receivables | 128.71 | 713.41
v2 | Total Current Assets | 607.54 | 4023.13
v3 | Total Non-Current Assets | 1004.4 | 7889.83
v4 | Total Assets | 3477.89 | 37621.78
v5 | Fixed Assets | 542.43 | 4975.47
v6 | Inventories | 157.16 | 1466.18
v7 | Total Current Liabilities | 509.89 | 3367.49
v8 | Cash And Cash Equivalents | 99.46 | 1008.52
v9 | Total Non-Current Liabilities | 471.15 | 4309.78
v10 | Total Shareholders Funds | 628.39 | 5014.18
v11 | Total Liabilities | 981.04 | 6869.38
v12 | Total Operating Revenues | 1071.59 | 11585.01
v13 | Total Revenue | 1102.55 | 11752.28
v14 | Profit/Loss Before Tax | 89.85 | 1019.65
v15 | Revenue From Operations [Net] | 1049.7 | 11188.61
v16 | Total Expenses | 1013.97 | 11216.54
v17 | Depreciation And Amortisation Expenses | 35.37 | 313.28
v18 | Net Cash Flow From Operating Activities | 115.06 | 1414.2

Table 3
ILP formulation for generating explanations
Parameters:
• m: number of features in the dataset; n: number of points in the dataset.
• z: the anomalous point to be explained.
• L: maximum number of features to be included in the explanation.
• M1: n × m matrix indicating whether other points have higher values than the anomalous point; M1[j, i] = 1 only if the i-th feature of the j-th point is greater than z[i], 0 otherwise.
• M2: n × m matrix indicating whether other points have lower values than the anomalous point; M2[j, i] = 1 only if the i-th feature of the j-th point is less than z[i], 0 otherwise.
Variables:
• x1: binary array of length m such that x1[i] = 1 implies the i-th feature is included in the explanation as v_i ≤ z[i].
• x2: binary array of length m such that x2[i] = 1 implies the i-th feature is included in the explanation as v_i ≥ z[i].
• y: array of length n such that y[j] = 1 only if ∃i ((M1[j, i] = 1 ∧ x1[i] = 1) ∨ (M2[j, i] = 1 ∧ x2[i] = 1)), and y[j] = 0 otherwise; i.e., y[j] is 1 only if the j-th point breaks at least one condition used in the explanation. (y need not be an integer variable.)
Objective:
• Maximize Σ_j y[j] — maximize the number of other points which violate at least one condition used in the explanation.
Constraints:
• C1: Σ_{i=1}^{m} (x1[i] + x2[i]) ≤ L (the number of variables chosen in the final explanation can be at most L).
• C2: x1[i] + x2[i] ≤ 1, ∀i s.t. 1 ≤ i ≤ m (a variable should not be repeated among the L variables used in the explanation).
• C3: y[j] ≥ x1[i]·M1[j, i], ∀i, j s.t. 1 ≤ i ≤ m, 1 ≤ j ≤ n (y[j] must be at least 1 if M1[j, i] is 1 for any feature i included in the explanation).
• C4: y[j] ≥ x2[i]·M2[j, i], ∀i, j s.t. 1 ≤ i ≤ m, 1 ≤ j ≤ n (y[j] must be at least 1 if M2[j, i] is 1 for any feature i included in the explanation).
• C5: y[j] ≤ Σ_{i=1}^{m} (x1[i]·M1[j, i] + x2[i]·M2[j, i]), ∀j s.t. 1 ≤ j ≤ n (y[j] should remain 0 for points with no 1 among the selected variables in M1[j] and M2[j]).
• C6: y[j] ≤ 1, ∀j s.t. 1 ≤ j ≤ n (y[j] should be at most 1).

5.2. Baseline methods

We compare our methods with SHAP [18] and LIME [19], which are widely used in the explainability literature for classification and regression tasks. To generate explanations for the anomaly detection task, we created a labeled dataset of 282 companies, among which 49 companies having a 'qualified audit opinion' were identified as anomalous and marked with class label '1'. The other companies were labeled with class label '0'. We then trained a Random Forest classifier on this labeled dataset and generated explanations for the anomalous instances. We chose 10 qualified companies as query points and generated explanations using all the methods.

Parameter settings: Parameter values for the EMD algorithm are set as c = 1.0 and k0 = 3. For EiForest, we set T = 1000 and retain the top 5 features (k = 5). For EMI, we first experiment with L = 2; if the point is not sufficiently isolated, we experiment with L = 3. For SHAP and LIME, we retain the top 5 features having non-negative weight, to maintain uniformity across the results.

5.3. Evaluation using ground truth

We extracted audit reports for the 4091 companies mentioned in Section 5.1. Companies which receive adverse comments from auditors are labeled as anomalous³. The variables which are mentioned in the auditor comments for those companies and are also among the 18 variables were extracted manually. These extracted variables act as the ground truth or gold standard. Refer to Table 4 for the generated explanations along with the ground truth; variables that are part of the ground truth are highlighted.

To judge the accuracy of a generated explanation, we compute precision P, recall R and F1 measure for each explanation against the manually extracted ground truth. Results of this evaluation are presented in Table 5. The choice of selecting the top 5 features for SHAP, LIME and EiForest affects the precision values; however, what the optimal length of an explanation should be is disputable. It can be observed that SHAP and LIME detect at least 1 ground-truth variable for most of the companies (8 out of 10 for both), and EMI yields a precision of 0.33 or above for 6 out of 10 companies. SHAP and LIME have the highest recall; however, the average P and F1 are highest for the EMI method. A few points worth mentioning: a company can be suspected of having misinformation for multiple reasons, and not all reasons can be captured in the given set of 18 variables. Also, we manually extracted variables from the audit reports based on our knowledge of the domain; additional domain supervision could improve the ground truth. Each explanation generation method can discover different aspects of misinformation; hence, considering an ensemble of all results is also possible.

³ Annotated ground truth data can be made available on request.
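The set-overlap scoring used for Table 5 reduces to standard precision/recall over variable sets; a minimal sketch (the function name `prf` is ours):

```python
def prf(explanation, ground_truth):
    # Precision, recall and F1 of an explanation's variable set
    # against the manually extracted ground-truth variables.
    e, g = set(explanation), set(ground_truth)
    tp = len(e & g)
    p = tp / len(e) if e else 0.0
    r = tp / len(g) if g else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

For example, Winsome Diamond's EMI explanation uses variables {v5, v14} against ground truth {v14, v1}, giving P = R = F1 = 0.5, which matches row 1 of Table 5.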
Table 4
Explanations generated by all the methods (ground-truth variables are highlighted in the original)
Sr no. | Company | Ground truth | SHAP | LIME | EMD | EiForest | EMI
1 | Winsome Diamond | {v14, v1} | {v11, v16, v14, v5, v7} | {v16, v5, v11, v7, v14} | {v16, v18, v6} | {v3, v7, v9, v11, v13} | {v5 ≤ 47.27 ∧ v14 ≤ −256.33}
2 | Ashapura Mine | {v14, v10} | {v11, v4, v10, v5, v16} | {v11, v10, v16, v5, v7} | {v1, v14, v18} | {v11, v10, v17, v6, v15} | {v10 ≤ −144.3 ∧ v14 ≥ 141.27}
3 | Western Ministi | {v7, v11, v14, v10} | {v14, v4, v16, v2, v3} | {v10, v9, v12, v15, v4} | All | {v3, v14, v9, v1, v18} | {v2 ≤ 0.0 ∧ v18 ≥ 115.06}
4 | Oudh Sugar Mill | {v14, v10} | {v14, v16, v11, v5, v4} | {v14, v7, v11, v5, v16} | {v1, v18, v8} | {v6, v7, v9, v11, v12} | {v2 ≤ 1056.51 ∧ v6 ≥ 951.19}
5 | Sarda Papers | {v17, v5, v4, v14} | {v9, v12, v4, v15, v5} | {v12, v15, v5, v4, v6} | NA | {v8, v4, v15, v14, v7} | {v10 ≤ 0.01 ∧ v11 ≤ 4.1 ∧ v4 ≥ 4.11}
6 | Nicco Uco Fin | {v11, v14, v10} | {v11, v14, v12, v16, v5} | {v11, v10, v16, v7, v14} | {v11, v18, v6} | {v7, v14, v13, v9, v12} | {v10 ≤ −524.1 ∧ v11 ≤ 537.41}
7 | Atlanta | {v10, v3, v14} | {v11, v5, v16, v4, v7} | {v11, v16, v5, v7, v4} | {v1, v14, v8} | {v7, v16, v6, v4, v2} | {v13 ≤ 314.28 ∧ v12 ≥ 312.1}
8 | Samtel Color | {v11, v6, v2} | {v11, v14, v12, v16, v10} | {v11, v10, v16, v14, v5} | {v1, v18, v8} | {v7, v11, v10, v14, v16} | {v10 ≤ −550.14 ∧ v11 ≤ 810.76}
9 | Aruna Hotels | {v2, v7, v6} | {v14, v16, v11, v5, v12} | {v16, v5, v14, v18, v4} | {v10, v17, v3} | {v6, v9, v2, v17, v13} | {v4 ≤ 131.98 ∧ v5 ≥ 119.92}
10 | CFL Capital | {v11} | {v10, v11, v14, v5, v12} | {v11, v10, v14, v9, v16} | {v11, v18, v6} | {v7, v14, v12, v13, v18} | {v10 ≤ −496.27 ∧ v11 ≤ 506.19}

Table 5
Precision, Recall and F1 measure for all methods for 10 companies (each cell: P / R / F1)
Sr no. | Company | SHAP | LIME | EMD | EiForest | EMI
1 | Winsome Diamond | 0.20 / 0.50 / 0.29 | 0.20 / 0.50 / 0.29 | 0.00 / 0.00 / 0.00 | 0.00 / 0.00 / 0.00 | 0.50 / 0.50 / 0.50
2 | Ashapura Mine | 0.20 / 0.50 / 0.29 | 0.20 / 0.50 / 0.29 | 0.33 / 0.50 / 0.40 | 0.20 / 0.50 / 0.29 | 1.00 / 1.00 / 1.00
3 | Western Ministi | 0.20 / 0.25 / 0.22 | 0.20 / 0.25 / 0.22 | 0.22 / 1.00 / 0.36 | 0.20 / 0.25 / 0.22 | 0.00 / 0.00 / 0.00
4 | Oudh Sugar Mill | 0.20 / 0.50 / 0.29 | 0.20 / 0.50 / 0.29 | 0.00 / 0.00 / 0.00 | 0.00 / 0.00 / 0.00 | 0.00 / 0.00 / 0.00
5 | Sarda Papers | 0.40 / 0.50 / 0.44 | 0.40 / 0.50 / 0.44 | 0.00 / 0.00 / 0.00 | 0.40 / 0.50 / 0.44 | 0.33 / 0.25 / 0.29
6 | Nicco Uco Fin | 0.40 / 0.67 / 0.50 | 0.60 / 1.00 / 0.75 | 0.33 / 0.33 / 0.33 | 0.20 / 0.33 / 0.25 | 1.00 / 0.67 / 0.80
7 | Atlanta | 0.00 / 0.00 / 0.00 | 0.00 / 0.00 / 0.00 | 0.33 / 0.33 / 0.33 | 0.00 / 0.00 / 0.00 | 0.00 / 0.00 / 0.00
8 | Samtel Color | 0.20 / 0.33 / 0.25 | 0.20 / 0.33 / 0.25 | 0.00 / 0.00 / 0.00 | 0.20 / 0.33 / 0.25 | 0.50 / 0.33 / 0.40
9 | Aruna Hotels | 0.00 / 0.00 / 0.00 | 0.00 / 0.00 / 0.00 | 0.00 / 0.00 / 0.00 | 0.33 / 0.33 / 0.33 | 0.00 / 0.00 / 0.00
10 | CFL Capital | 0.20 / 1.00 / 0.33 | 0.20 / 1.00 / 0.33 | 0.33 / 1.00 / 0.50 | 0.00 / 0.00 / 0.00 | 0.50 / 1.00 / 0.67
Average | | 0.20 / 0.43 / 0.26 | 0.22 / 0.46 / 0.29 | 0.16 / 0.32 / 0.19 | 0.15 / 0.23 / 0.18 | 0.38 / 0.38 / 0.37

5.4. Evaluation in the absence of ground truth

We propose a novel method to evaluate the quality of the generated explanations in the absence of ground truth. The intuition behind this method is that the anomalousness of a company should depend significantly on the variables given in the explanation; a better explanation would therefore contain the variables with the largest effect on the anomaly score of the company.

For a given point x in the dataset, we define Δ_{E,A}(x) as the difference in anomaly score between x and x′, where x′ is the perturbed version of x and E is the explanation. The anomaly score is obtained using an anomaly detection technique A such that the higher the score, the more anomalous the point. From the original point x, we replace the values of the variables in E by their corresponding median values to get x′. Therefore, Δ_{E,A}(x) = A(x) − A(x′). For example, for Winsome Diamond, if the original anomaly score using anomaly detection technique A is 0.8, and the score obtained after perturbing variables v5 and v14 (the explanation provided by EMI) is 0.6, then Δ_{{v5,v14},A}(Winsome Diamond) = 0.8 − 0.6 = 0.2. In our experiments we used the autoencoder based anomaly detector from the PyOD package [20]; practically, any anomaly detection technique can be used. Depending on how well E explains z, Δ_{E,A}(z) can be positive, negative or even zero. A positive value indicates that z′ is more 'normal' than z, a negative value indicates the other way round, and zero implies that there is no change in the nature of the point. To determine whether the difference Δ_{E,A}(z) is statistically significant or not, we use the following two methods.
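The median-perturbation score difference Δ_{E,A} can be sketched with any scoring function A. Here we use a toy distance-from-median scorer in the usage example as a stand-in for the PyOD autoencoder used in the paper; all names (`perturb_to_median`, `delta`) are ours.

```python
import statistics

def perturb_to_median(x, E, D):
    # x': a copy of x with every explanation variable in E replaced
    # by its dataset-wide median.
    x2 = dict(x)
    for v in E:
        x2[v] = statistics.median(p[v] for p in D)
    return x2

def delta(x, E, D, score):
    # Delta_{E,A}(x) = A(x) - A(x'): the drop in anomaly score after
    # neutralizing the explanation variables.
    return score(x) - score(perturb_to_median(x, E, D))
```

If E really names the variables driving the anomaly, `delta` is large and positive; perturbing variables unrelated to the anomaly leaves the score essentially unchanged.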
5.4.1. Method A: Comparison with "normal" companies

In this method, we judge the effect of variable perturbation on other companies. We randomly choose 30 companies C = {x | x ≠ z} and compute Δ_{E,A}(x) for all of them by perturbing the variables in E. Note that here we check the E given by some method for the anomalous company z, e.g. {v5, v14} for Winsome Diamond; so we perturb the values of {v5, v14} for these 30 companies and obtain the set of score differences SD_N = {Δ_{E,A}(x) | x ∈ C}; |SD_N| = 30. The statistical significance of Δ_{E,A}(z) with respect to SD_N is determined using a one-sided, one-sample t-test with the following null and alternate hypotheses:

H0: mean of SD_N = Δ_{E,A}(z)
H1: mean of SD_N < Δ_{E,A}(z)

If the p-value is less than the significance level α = 0.05, the null hypothesis is rejected and Δ_{E,A}(z) is accepted as statistically significant; hence E is a good explanation for the anomalousness of the selected company. In Table 6, we mark 1 as the first value of each tuple wherever the obtained explanation is found to be significant with respect to this method.

Table 6
Results for evaluation with methods A and B (each tuple: significant under Method A, Method B)
Sr no. | Company | SHAP | LIME | EMD | EiForest | EMI
1 | Winsome Diamond | (1,1) | (1,1) | (0,0) | (1,1) | (1,0)
2 | Ashapura Mine | (0,0) | (0,0) | (0,0) | (0,0) | (0,1)
3 | Western Ministi | (0,1) | (1,1) | (0,0) | (0,0) | (0,0)
4 | Oudh Sugar Mill | (0,0) | (0,0) | (0,0) | (1,1) | (1,1)
5 | Sarda Papers | (0,0) | (1,0) | (0,0) | (0,0) | (0,0)
6 | Nicco Uco Fin | (0,0) | (0,1) | (0,0) | (0,0) | (0,1)
7 | Atlanta | (0,0) | (0,0) | (0,0) | (0,0) | (0,0)
8 | Samtel Color | (1,1) | (1,1) | (0,0) | (1,1) | (1,1)
9 | Aruna Hotels | (0,1) | (0,1) | (0,0) | (0,1) | (0,0)
10 | CFL Capital | (0,1) | (0,1) | (0,0) | (0,0) | (1,1)
Total | | (2,5) | (4,6) | (0,0) | (3,4) | (4,5)
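Method A's one-sided, one-sample t-test can be sketched with the standard library alone if the critical value is hard-coded; the constant below is the usual t critical value for α = 0.05 with 29 degrees of freedom (n = 30), and the function name is ours:

```python
import math
import statistics

T_CRIT = 1.699  # one-sided t critical value, alpha = 0.05, df = 29

def significant(sd_n, delta_z):
    # H0: mean(SD_N) = Delta_{E,A}(z)  vs  H1: mean(SD_N) < Delta_{E,A}(z).
    # Reject H0 (i.e., call the explanation significant) when the
    # t statistic falls below the lower-tail critical value.
    n = len(sd_n)
    t = (statistics.mean(sd_n) - delta_z) / (statistics.stdev(sd_n) / math.sqrt(n))
    return t < -T_CRIT
```

In practice a library routine (e.g. a one-sample t-test with a one-sided alternative) would report the exact p-value to compare against α; the hard-coded critical value keeps this sketch dependency-free.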
5.4.2. Method B: Comparison with other subsets of variables

In this method, for the given company z, we randomly choose a set of variables W of size |E| from the power set 2^V and compute Δ_{W,A}(z) — e.g. randomly choosing a variable set of size 2 such that none of the variables in the set is v5 or v14 for Winsome Diamond. We repeat this process 30 times and obtain the score difference values SD_W = {Δ_{W,A}(z) | W ∈ 2^V, |W| = |E|, W ∩ E = ∅}. Finally, we check the statistical significance of Δ_{E,A}(z) w.r.t. SD_W as described in Method A above. In Table 6, we mark 1 as the second value of each tuple wherever the obtained explanation is found to be significant with respect to this method.

Evaluation results: Table 6 shows that the number of companies for which EMI produces statistically significant explanations is on par with the best of the baselines, even though the explanation is short and takes the form of a conjunction of conditions.

6. Conclusions and future work

Explainability has various notions in the machine learning literature. In this paper, we aim at providing explanations for companies that have misinformation in their FS, so that auditors can perform further investigations. We propose 3 novel methods, viz. the Mahalanobis distance based EMD, the iForest based EiForest, and the ILP based EMI. We extracted 18 financial variables from the FS of 4091 Indian listed companies and generated explanations for companies whose FS had misinformation as per our knowledge. For illustration purposes, we chose 10 companies and evaluated the quality of the generated explanations. We observe that the EMI method generates comparatively precise and statistically significant explanations; moreover, EMI gives output in the form of a conjunction of conditions, which is more desirable.

Going ahead, we plan to widen the scope by experimenting with more variables and companies. Finally, we aim at capturing domain knowledge and generating explanations in a more user-friendly format.

References

[1] V. Y. Tchaghe, G. Smits, O. Pivert, Anomaly explanation: A review, DKE (2021).
[2] A. Shinde, S. Vaishampayan, M. Apte, G. K. Palshikar, Unsupervised detection of misinformation in financial statements 35 (2022).
[3] J. Zhang, M. Lou, T. W. Ling, H. Wang, HOS-miner: A system for detecting outlying subspaces of high-dimensional data, in: VLDB, 2004, pp. 1265–1268.
[4] D. Samariya, J. Ma, S. Aryal, A comprehensive survey on outlying aspect mining methods, arXiv preprint arXiv:2005.02637 (2020).
[5] L. Duan, G. Tang, J. Pei, J. Bailey, A. Campbell, C. Tang, Mining outlying aspects on numeric data, DMKD 29 (2015) 1116–1151.
[6] N. X. Vinh, J. Chan, S. Romano, J. Bailey, C. Leckie, K. Ramamohanarao, J. Pei, Discovering outlying aspects in large datasets, DMKD 30 (2016) 1520–1555.
[7] N. X. Vinh, J. Chan, J. Bailey, C. Leckie, K. Ramamohanarao, J. Pei, Scalable outlying-inlying aspects discovery via feature ranking, in: PAKDD, Springer, 2015, pp. 422–434.
[8] X. H. Dang, I. Assent, R. T. Ng, A. Zimek, E. Schubert, Discriminative features for identifying and interpreting outliers, in: International Conference on Data Engineering, IEEE, 2014, pp. 88–99.
[9] F. Angiulli, F. Fassetti, G. Manco, L. Palopoli, Outlying property detection with numerical attributes, DMKD 31 (2017) 134–163.
[10] X. Zhang, M. Marwah, I.-t. Lee, M. Arlitt, D. Goldwasser, ACE – an anomaly contribution explainer for cyber-security applications, in: International Conference on Big Data, IEEE, 2019, pp. 1991–2000.
[11] M. A. Siddiqui, A. Fern, T. G. Dietterich, W.-K. Wong, Sequential feature explanations for anomaly detection, ACM TKDD 13 (2019) 1–22.
[12] M. Kopp, T. Pevný, M. Holeňa, Anomaly explanation with random forests, Expert Systems with Applications 149 (2020) 113187.
[13] N. Gupta, D. Eswaran, N. Shah, L. Akoglu, C. Faloutsos, Beyond outlier detection: LookOut for pictorial explanation, in: ECML-PKDD, 2018, pp. 122–138.
[14] W. Samek, A. Binder, G. Montavon, S. Lapuschkin, K.-R. Müller, Evaluating the visualization of what a deep neural network has learned, IEEE Transactions on Neural Networks and Learning Systems 28 (2016) 2660–2673.
[15] J. A. Oramas Mogrovejo, K. Wang, T. Tuytelaars, Visual explanation by interpretation: Improving visual feedback capabilities of deep neural networks, in: ICLR, OpenReview, 2019.
[16] K. Sentz, S. Ferson, Combination of evidence in Dempster-Shafer theory (2002).
[17] F. T. Liu, K. M. Ting, Z.-H. Zhou, Isolation forest, in: 8th ICDM, IEEE, 2008, pp. 413–422.
[18] H. W. Kuhn, A. W. Tucker, Contributions to the Theory of Games, 28, Princeton University Press, 1953.
[19] M. T. Ribeiro, S. Singh, C. Guestrin, "Why should I trust you?" Explaining the predictions of any classifier, in: ACM SIGKDD, 2016, pp. 1135–1144.
[20] Y. Zhao, Z. Nasrullah, Z. Li, PyOD: A Python toolbox for scalable outlier detection, JMLR 20 (2019) 1–7.