Ontology-Driven Method for Ranking Unexpected Rules

Mohamed Said Hamani (saidhamani@hotmail.com), Mohamed Boudiaf-M'sila University, Algeria

Ramdane Maamri (rmaamri@yahoo.fr), Mentouri-Constantine University, Algeria

Keywords: data mining, ontology, unexpectedness, association rules, domain knowledge, subjective measures, semantic distance

Several rule discovery algorithms have the disadvantage of discovering too many patterns, many of which are obvious, useless, or not very interesting to the user. In this paper we propose a new approach for ranking patterns according to their unexpectedness, using a semantic distance calculated from prior background knowledge represented by a domain ontology organized as a DAG (Directed Acyclic Graph) hierarchy.

Introduction

Knowledge discovery in databases (data mining) has been defined in [6] as the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns from data. Association rule algorithms [1] are rule-discovery methods that discover patterns in the form of IF-THEN rules. It has been noticed that most data mining algorithms generate a large number of rules that are valid but obvious or not very interesting to the user [23,22,30,13]. The huge number of rules makes it difficult for the user to identify those that are of interest. To address this issue, most approaches to knowledge discovery use objective measures of interestingness, such as confidence and support [1], to evaluate the discovered rules. These objective measures capture the statistical strength of a pattern. The interestingness of a rule, however, is essentially subjective [23,30,13,11]. Subjective measures of interestingness, such as unexpectedness [16,31,4], assume that the interestingness of a pattern depends on the decision-maker and not solely on the statistical strength of the pattern. Although objective measures are useful, they are insufficient for determining the interestingness of rules. One way to approach this problem is to focus on discovering unexpected patterns [29,30,13,14,19,20], where the unexpectedness of discovered patterns is usually defined relative to a system of prior expectations. In this paper we define a degree of unexpectedness based on the semantic distance of the rule vocabulary, relative to prior knowledge represented by an ontology. An ontology represents knowledge through the relationships between concepts. It is organized as a DAG (Directed Acyclic Graph) hierarchy. We propose a new approach for ranking the most interesting rules according to the conceptual distance (the distance between the antecedent and the consequent of the rule) relative to the hierarchy.
Highly related concepts are grouped together in the hierarchy. The farther apart two concepts are, the less related they are to each other. The less related the concepts that take part in the definition of a rule, the more surprising, and therefore the more interesting, the rule is. With such a ranking, a user can check a few rules at the top of the list to extract the most pertinent ones.

Method Presentation

Data mining is the process of discovering patterns in data. Data mining methods have the drawback of generating a very large number of rules that are not of interest to the user. The use of objective measures of interestingness, such as confidence and support, is a step toward interestingness. Objective measures of interestingness are data driven; they measure the statistical strength of the rule and do not exploit domain knowledge or the intuition of the decision maker. Besides objective measures, our approach exploits domain knowledge represented by an ontology organized as a DAG hierarchy. The nodes of the hierarchy represent the rule vocabulary. For a rule like (x AND y → z), x, y and z are nodes in the hierarchy. The semantic distance between the antecedent (x AND y) and the consequent (z) of the rule is a measure of interestingness. The greater the distance, the more unexpected, and therefore the more interesting, the rule is. Based on this measure, a ranking algorithm helps in selecting the rules of interest to the user.

Semantic distance

Two main categories of algorithms for computing the semantic distance between terms organized in a hierarchical structure have been proposed in the literature [9]: distance-based approaches and information content-based approaches. The general idea behind the distance-based algorithms [24,12,32] is to find the shortest path between two terms, measured as a number of edges. Information content-based approaches [10,24] are inspired by the perception that pairs of words which share many common contexts are semantically related. We use distance-based approaches in this paper. In an IS-A semantic network, the simplest way of determining the distance between two elemental concept nodes, A and B, is the shortest path that links A and B, i.e. the minimum number of edges that separate A and B, or the sum of the weights of the arcs along the shortest path between A and B [24].

In the hierarchy of Figure 1, with a weight of 1 on every edge, the edge distances between nodes of the graph are:
Dist(Apple, Kiwi) = 2
Dist(Carrots, Pepper) = 2
Dist(Apple, Meat) = 4
Dist(Fruit, Red Meat) = 4
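The distances listed here can be reproduced with a small shortest-path sketch. Figure 1 is not available in this text, so the hierarchy below is an assumed reconstruction: the intermediate levels and the node name "Poultry" are guesses chosen only so that the quoted distances come out right. The helper names `build_graph` and `dist` are illustrative, and Dijkstra's algorithm is used so that non-unit arc weights could be supplied as well.

```python
import heapq

# Assumed reconstruction of the Figure 1 hierarchy (child -> parent).
# "Poultry" and the exact intermediate levels are guesses, not taken from the paper.
PARENT = {
    "Vegetable-dishes": "Food", "Meat": "Food",
    "Fruit": "Vegetable-dishes", "Vegetables": "Vegetable-dishes",
    "Red Meat": "Meat", "Poultry": "Meat",
    "Apple": "Fruit", "Kiwi": "Fruit",
    "Carrots": "Vegetables", "Pepper": "Vegetables",
    "Beef": "Red Meat", "Mutton": "Red Meat",
    "Turkey": "Poultry", "Chicken": "Poultry",
}

def build_graph(parent, weight=None):
    """Undirected adjacency list; every arc has weight 1 unless overridden."""
    weight = weight or {}
    graph = {}
    for child, par in parent.items():
        w = weight.get((child, par), 1)
        graph.setdefault(child, []).append((par, w))
        graph.setdefault(par, []).append((child, w))
    return graph

def dist(graph, source, target):
    """Dijkstra: minimum total arc weight between two concepts."""
    best = {source: 0}
    queue = [(0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == target:
            return d
        if d > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, w in graph[node]:
            nd = d + w
            if nd < best.get(nxt, float("inf")):
                best[nxt] = nd
                heapq.heappush(queue, (nd, nxt))
    return float("inf")

GRAPH = build_graph(PARENT)
print(dist(GRAPH, "Apple", "Kiwi"))      # 2
print(dist(GRAPH, "Carrots", "Pepper"))  # 2
print(dist(GRAPH, "Apple", "Meat"))      # 4
print(dist(GRAPH, "Fruit", "Red Meat"))  # 4
```

With unit weights this reduces to counting edges; passing a `weight` mapping keyed by `(child, parent)` gives the weighted variant.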

Ontology

Prior knowledge of a domain or a process in the field of data mining can help to select the appropriate information (preprocessing), reduce the hypothesis space (processing), represent results in a more comprehensible way, and improve the process (post-processing) [5]. An ontology expresses the domain knowledge, which includes semantic links between domain individuals described as inter-concept relations or roles [7].

$$H(X, Y) = \max\big(h(X, Y),\, h(Y, X)\big), \quad \text{where} \quad h(X, Y) = \max_{X_i \in X} \, \min_{Y_j \in Y} \lVert X_i - Y_j \rVert$$

The function h(X,Y) is called the directed Hausdorff 'distance' from X to Y (this function is not symmetric and thus is not a true distance). It identifies the point X_i ∈ X that is farthest from any point of Y, and measures the distance from X_i to its nearest neighbor in Y. The Hausdorff distance, H(X,Y), measures the degree of mismatch between two sets, as it reflects the distance of the point of X that is farthest from any point of Y and vice versa [8]. This expression measures the semantic distance between groups X_1 ∧ ... ∧ X_k and Y_1 ∧ ... ∧ Y_m of concepts, which contain k atomic concepts X_i and m atomic concepts Y_j respectively.
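A minimal sketch of the directed and symmetric Hausdorff distances over sets of concepts may help make the asymmetry concrete. The function names and the lookup helper `leaf_dist` are illustrative, and only two pairwise distances (taken from Table 1) are hard-coded.

```python
def directed_hausdorff(xs, ys, dist):
    """h(X, Y): the largest distance from a member of X to its nearest member of Y."""
    return max(min(dist(x, y) for y in ys) for x in xs)

def hausdorff(xs, ys, dist):
    """H(X, Y) = max(h(X, Y), h(Y, X)); symmetric by construction."""
    return max(directed_hausdorff(xs, ys, dist), directed_hausdorff(ys, xs, dist))

# Two pairwise distances copied from Table 1; the lookup helper is illustrative.
PAIRS = {
    frozenset({"Apple", "Kiwi"}): 2,
    frozenset({"Carrots", "Kiwi"}): 4,
}

def leaf_dist(a, b):
    return 0 if a == b else PAIRS[frozenset({a, b})]

X = ["Apple", "Carrots"]
Y = ["Kiwi"]
print(directed_hausdorff(X, Y, leaf_dist))  # 4: Carrots is 4 away from its nearest Y
print(directed_hausdorff(Y, X, leaf_dist))  # 2: Kiwi is 2 away from Apple
print(hausdorff(X, Y, leaf_dist))           # 4
```

The two directed values differ, which is why the maximum of both directions is taken.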

Rules ranking

In this section we introduce an algorithm to rank rules according to their degree of unexpectedness based on background knowledge. The rules we consider are of the form "body → head", where "body" and "head" are conjunctions of concepts in the vocabulary of the ontology. We assume that other techniques carry out the task of pattern discovery and eliminate the patterns that do not satisfy objective criteria. With such a ranking, a user can check just a few patterns at the top of the list to confirm rule pertinence.

Indices: i ∈ [1,k]; j ∈ [1,m]
Body = X_1 ∧ ... ∧ X_k
Head = Y_1 ∧ ... ∧ Y_m
For i = 1 to ND
begin
    For j = 1 to ND
        Distance(X_i, X_j) = shortest path between X_i and X_j;
End
For i = 1 to N
begin
    DU[i] = Distance(X_1 ∧ ... ∧ X_k, Y_1 ∧ ... ∧ Y_m) / 2D
End
Sort the rules by descending degree of unexpectedness DU.
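The ranking steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the hierarchy is an assumed tree-shaped reconstruction of Figure 1 (the node name "Poultry" is a guess), each concept has a single parent although the paper allows a general DAG, all edges have weight 1, and the function names are hypothetical.

```python
# Assumed reconstruction of the Figure 1 hierarchy (child -> parent); a tree, not a general DAG.
PARENT = {
    "Vegetable-dishes": "Food", "Meat": "Food",
    "Fruit": "Vegetable-dishes", "Vegetables": "Vegetable-dishes",
    "Red Meat": "Meat", "Poultry": "Meat",
    "Apple": "Fruit", "Kiwi": "Fruit",
    "Carrots": "Vegetables", "Pepper": "Vegetables",
    "Beef": "Red Meat", "Mutton": "Red Meat",
    "Turkey": "Poultry", "Chicken": "Poultry",
}

def path_to_root(node):
    """List of concepts from node up to the top concept."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path

def distance(a, b):
    """Edge count between two concepts through their lowest common ancestor."""
    up_a, up_b = path_to_root(a), path_to_root(b)
    for steps_a, node in enumerate(up_a):
        if node in up_b:
            return steps_a + up_b.index(node)
    raise ValueError("concepts are not in the same hierarchy")

def hausdorff(body, head):
    """Distance between two conjunctions of concepts: max(h(X, Y), h(Y, X))."""
    def h(xs, ys):
        return max(min(distance(x, y) for y in ys) for x in xs)
    return max(h(body, head), h(head, body))

def rank_rules(rules):
    """Return (DU, label) pairs sorted by descending degree of unexpectedness."""
    depth = max(len(path_to_root(n)) - 1 for n in PARENT)  # D
    scored = [(hausdorff(body, head) / (2 * depth), label)
              for label, body, head in rules]
    return sorted(scored, reverse=True)

RULES = [
    ("a", ["Apple"], ["Kiwi"]),
    ("b", ["Apple"], ["Carrots"]),
    ("c", ["Pepper", "Carrots"], ["Turkey", "Chicken"]),
]
RANKED = rank_rules(RULES)
for du, label in RANKED:
    print(f"({label}) DU = {du:.2f}")
# (c) DU = 1.00
# (b) DU = 0.67
# (a) DU = 0.33
```

Dividing by 2D normalizes the score to [0, 1], since 2D is the longest possible leaf-to-leaf path in a hierarchy of depth D.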

Example

In this section we present results from applying our method to the hierarchy of Figure 1 with a set of association rules R = {Apple → Kiwi; Apple → Carrots; Pepper, Carrots → Turkey, Chicken} resulting from a data mining process.

Nodes distance Computation

The number of graph nodes in Figure 1 is ND=16 and the depth of the graph is D=3. The semantic distance (the minimum number of edges that separate 2 nodes) between the nodes of the Figure 1 graph is presented in Table 1, where every cell gives the distance between the node of its row and the node of its column.

We have presented only the leaves of the hierarchy in Table 1 because all the rules in R are expressed using leaf concepts of the hierarchy.

Degree of unexpectedness computation

The maximum depth of the hierarchy in Figure 1 is D = 3. The degree of unexpectedness for a given rule X → Y is calculated using our expression DU(X → Y) = Distance(X, Y)/2D, and the resulting computation is presented in Table 2. Based on descending degree of unexpectedness, the order of the rules is (c), (b), (a), as shown in Table 2. From a decision system point of view, rule (c) belongs to a higher level (Food) than rule (b), which belongs to the level (vegetable-dishes). Rule (a) belongs to a lower level (Fruit). The higher we move up in the hierarchy, the more important the decision and the broader the vision of the decision maker, and therefore the more interesting the discovered rule. Rule (c) is the result of crossing the domains (vegetable-dishes, Meat), which are farther apart than the domains (Vegetables, Fruits) of rule (b). Rule (a) concerns the domain (Fruit) only and is therefore the least interesting.

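For a rule X → Y with X = X_1 ∧ ... ∧ X_k and Y = Y_1 ∧ ... ∧ Y_m, the distance is:

$$\mathrm{Distance}(X, Y) = \mathrm{Distance}(X_1 \wedge \dots \wedge X_k,\; Y_1 \wedge \dots \wedge Y_m) = \max\big(h(X, Y),\, h(Y, X)\big)$$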
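As a worked instance of this expression, rule (c) of the example (X = Pepper ∧ Carrots, Y = Turkey ∧ Chicken) can be expanded step by step. The pairwise leaf distances d(·,·) are those reported in Table 1, D = 3, and the symbol d is introduced here only for readability.

```latex
\begin{align*}
h(X, Y) &= \max\Big(\min\big(d(\text{Pepper},\text{Turkey}),\, d(\text{Pepper},\text{Chicken})\big),\;
                    \min\big(d(\text{Carrots},\text{Turkey}),\, d(\text{Carrots},\text{Chicken})\big)\Big) \\
        &= \max\big(\min(6, 6),\, \min(6, 6)\big) = 6 \\
h(Y, X) &= \max\big(\min(6, 6),\, \min(6, 6)\big) = 6 \\
\mathrm{Distance}(X, Y) &= \max(6, 6) = 6 \\
DU(X \rightarrow Y) &= \frac{\mathrm{Distance}(X, Y)}{2D} = \frac{6}{2 \cdot 3} = 1
\end{align*}
```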

Experiments

The experiments were performed using a census income database with 48,842 records [3] and an implementation of our algorithm. To generate the association rules, we used an implementation of the Apriori algorithm [2] with a minimum support value of 0.2 and a confidence value of 0.2. The generated rule set contains 2225 rules. To perform the experiments, we created a taxonomy of 81 weighted concepts based on the data set under study, as shown in Table 3.

We conducted two tests. The first used a weight of 1 for all concepts, with results presented in Figure 2. The second used different weights at the atomic concept level (see Table 3 for the weights), with results presented in Figure 3. Figure 2 and Figure 3 show the first two lines extracted for each distance value in each test.

Fig. 3. Second Test Ranking Results

Looking at the results, we notice:

1. Best results are those for the highest weight (Figure 3 with the 'Bachelors' concept).
2. Best results from both tests are cross-level concepts (higher subsumer like 'Personal', 'Education', 'Work' or 'census-income') and not those within the same concept level.
3. Low results from both tests (last 2 lines) are within the same concept level, like 'Personal'.

Our approach is based on the hierarchy in Table 3, which guides the resulting rules. The maximum hierarchy depth is 3 and it is the same as the minimum depth; this hierarchy distributes the load equally between its different branches. The first test was conducted with a weight of 1 for all concepts; in this case all concepts have the same degree of interest to the user. The rule-ranking algorithm picks the rules with the higher subsumer concept. The common subsumer for rules (1), (2) and (3) of Figure 2 is the top concept 'census-income', whereas the common subsumer for rules (4) and (5) is the concept 'Work'. Rule (1) concerns 'sex' and 'occupation', whereas rules (2) and (3) are about education and occupation. The last 2 rules, (4) and (5), express the relation between 'occupation' and 'salary-class'. We believe a rule like (1), (2) or (3) is more interesting, because it gives us information relating 'Education' and 'Personal' information, and it involves a higher-level (strategic) decision maker than a rule concerning 'occupation' and 'salary', which may concern payroll, for instance. The second test was conducted with the weight of the 'bachelors' concept set to 7 (among other concept settings; see Table 3). The user in this case puts more emphasis on this concept by setting its weight to a high value, or because it is really that important in the domain of study. The rule-ranking algorithm picks the rules with higher weight. The common subsumer for rules (1) and (2) of Figure 3 is the concept 'census-income', but in this case with the 'Bachelors' concept as a member of the rule. In this case the user is focusing the study on people with 'bachelors' education and their relation to 'Personal' information or 'Work'. The common subsumer for the last 2 rules of Figure 3 is the concept 'Personal'. These rules express the relation between the 'sex', 'age' and 'marital-status' concepts.
Even though interestingness is subjective (what is interesting to one user may not be of the same degree of interest to another), we believe that the higher we move up in the hierarchy, the more important the decision and the broader and more strategic the vision of the decision maker; therefore the discovered rule is more interesting. Our approach follows this vision.

Table 3. Experiment Taxonomy

Related Works

Unexpectedness of patterns has been studied in [29,30,13,14,19,20] and defined in comparison with user beliefs. A rule is considered interesting if it affects the user's levels of conviction. Unexpectedness is defined in probabilistic terms in [29,30], while in [13] it is defined as a distance based on a syntactic comparison between a rule and a conviction. Similarity and distance are defined syntactically, based on the structure of the rules and convictions. A rule and a conviction are distant if their consequents are similar but their antecedents are distant, or vice versa. In [21] the focus is on discovering minimal unexpected patterns directly, rather than using post-processing approaches, such as filtering, to determine the minimal unexpected patterns from the set of all discovered patterns. In [18] unexpectedness is defined from the point of view of a logical contradiction between a rule and a conviction: a pattern that contradicts prior knowledge is unexpected. It is based on the contradiction between the consequent of the rule and the consequent of the belief. Given a rule A→B and a belief X→Y, if B AND Y is false while A AND X is true for a broad group of data, the rule is unexpected. In [15], the subjective interestingness (unexpectedness) of a discovered pattern is characterized by asking the user to specify a set of patterns according to his/her previous knowledge or intuitive feelings. This specified set of patterns is then used by a fuzzy matching algorithm to match and rank the discovered patterns. The work in [26,27,28] takes a different approach to the discovery of interesting patterns by eliminating non-interesting association rules. Rather than having users define their entire knowledge of a domain, it asks them to identify several non-interesting rules generated by the Apriori algorithm.
The authors of [25] use a genetic algorithm to dynamically maintain and search populations of rule sets for the most interesting rules, rather than acting as a post-processor. The rules identified by the genetic algorithm compared favorably with the rules selected by the domain expert [17]. Most research on unexpectedness makes a syntactic or semantic comparison between a rule and a belief. Our definition of unexpectedness is based on the structure of the background knowledge (hierarchy) underlying the terms (vocabulary) of the rule. We take a different approach from the preceding work, which is a filtering process based on beliefs, expressed as rules, that the user has to enter. We propose a ranking process in which the knowledge is expressed not as rules but as a hierarchy of concepts (an ontology). Ontologies enable knowledge sharing. Sharing vastly increases the potential for knowledge reuse and therefore allows our approach to obtain knowledge for free simply by using domain ontologies that are already available, such as "ONTODerm" for dermatology, "BIO-ONT" for biomedicine, and "ASFA, OneFish, FIGIS, AGROVOC" for food.

Conclusion

In this paper we proposed a new approach to estimate the degree of unexpectedness of a rule with respect to an ontology and to rank patterns according to their unexpectedness, defined on the basis of ontological distance. The proposed ranking algorithm uses an ontology to calculate the distance between the antecedent and the consequent of each rule, on which the ranking is based. The greater the conceptual distance, the higher the degree of interest the rule represents. This work contributes to the post-analysis stage by helping the user identify the most interesting patterns.

In the future, we plan to incorporate a semantic distance threshold into the algorithm that computes frequent itemsets, and to exploit ontology relations other than "IS-A". We are also validating our approach on fuzzy ontologies to take into account vague and imprecise information.

Fig. 1. Hierarchy example

Algorithm
Input: Ontology, set of rules
Output: Ordered set of rules
R: Set of rules, R = {R_i / R_i = body → head} where i ∈ [1,N]
ND: Number of nodes
N: Number of rules
D: Maximum depth of the hierarchy
DU: Array of size N representing the degree of unexpectedness
X_i, Y_j: Atomic concepts

For the set of rules R = {(a), (b), (c)} where:
(a) Apple → Kiwi
(b) Apple → Carrots
(c) Pepper, Carrots → Turkey, Chicken
The detailed distance computation for rules (a), (b), (c) is:
(a) Dist(Apple, Kiwi) = 2
(b) Dist(Apple, Carrots) = 4
(c) Dist(Pepper ∧ Carrots, Turkey ∧ Chicken) = max(h(Pepper ∧ Carrots, Turkey ∧ Chicken), h(Turkey ∧ Chicken, Pepper ∧ Carrots))
    h(Pepper ∧ Carrots, Turkey ∧ Chicken) = 6
    h(Turkey ∧ Chicken, Pepper ∧ Carrots) = 6
    Dist(Pepper ∧ Carrots, Turkey ∧ Chicken) = 6

Fig. 2. First Test Ranking Results

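Table 1. Graph nodes distance

| Nodes   | Apple | Kiwi | Carrots | Pepper | Beef | Mutton | Turkey | Chicken |
|---------|-------|------|---------|--------|------|--------|--------|---------|
| Apple   | 0     | 2    | 4       | 4      | 6    | 6      | 6      | 6       |
| Kiwi    | 2     | 0    | 4       | 4      | 6    | 6      | 6      | 6       |
| Carrots | 4     | 4    | 0       | 2      | 6    | 6      | 6      | 6       |
| Pepper  | 4     | 4    | 2       | 0      | 6    | 6      | 6      | 6       |
| Beef    | 6     | 6    | 6       | 6      | 0    | 2      | 4      | 4       |
| Mutton  | 6     | 6    | 6       | 6      | 2    | 0      | 4      | 4       |
| Turkey  | 6     | 6    | 6       | 6      | 4    | 4      | 0      | 2       |
| Chicken | 6     | 6    | 6       | 6      | 4    | 4      | 2      | 0       |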


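Table 2. Rules degree of unexpectedness

| Label | Rule                              | Distance | Degree of unexpectedness |
|-------|-----------------------------------|----------|--------------------------|
| (a)   | Apple → Kiwi                      | 2        | 2/6 = 0.33               |
| (b)   | Apple → Carrots                   | 4        | 4/6 = 0.66               |
| (c)   | Pepper, Carrots → Turkey, Chicken | 6        | 6/6 = 1.00               |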

References

[1] R. Agrawal, T. Imielinski, A. Swami. Database mining: A performance perspective. IEEE Transactions on Knowledge and Data Engineering, 5(6), December 1993.
[2] Christian Borgelt. http://www.borgelt.net/software.html
[3] Census income. ftp://ftp.ics.uci.edu/pub/machine-learning-databases/census-income/
[4] Didier Dubois. Book review: 'Fuzzy set theory and its applications' (2nd edition) by H. J. Zimmermann, Kluwer Academic Publ., Dordrecht, 1991. Diffusion scientifique, 48.
[5] Zahra Farzanyar, Mohammadreza Kangavari, Sattar Hashemi. A new algorithm for mining fuzzy association rules in the large databases based on ontology. ICDM Workshops, IEEE Computer Society, 2006.
[6] Usama M. Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth. From data mining to knowledge discovery: An overview. Advances in Knowledge Discovery and Data Mining, 1996.
[7] T. R. Gruber. Towards principles for the design of ontologies used for knowledge sharing. In N. Guarino, R. Poli (eds.), Formal Ontology in Conceptual Analysis and Knowledge Representation. Kluwer Academic Publishers, Deventer, The Netherlands, 1993.
[8] Daniel P. Huttenlocher, Gregory A. Klanderman, William J. Rucklidge. Comparing images using the Hausdorff distance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15, 1993.
[9] Jay J. Jiang, David W. Conrath. Semantic similarity based on corpus statistics and lexical taxonomy. CoRR, cmp-lg/9709008, 1997. Informal publication.
[10] Jay J. Jiang, David W. Conrath. Semantic similarity based on corpus statistics and lexical taxonomy. September 20, 1997. Comment: 15 pages. Postscript only.
[11] Mika Klemettinen, Heikki Mannila, Pirjo Ronkainen, Hannu Toivonen, A. Inkeri Verkamo. Finding interesting rules from large sets of discovered association rules. In Nabil R. Adam, Bharat K. Bhargava, Yelena Yesha (eds.), Third International Conference on Information and Knowledge Management (CIKM'94). ACM Press, November 1994.
[12] Claudia Leacock, Martin Chodorow. Combining local context and WordNet similarity for word sense identification. In Christiane Fellbaum (ed.), WordNet: An Electronic Lexical Database. The MIT Press, Cambridge, Massachusetts, 1998.
[13] Bing Liu, Wynne Hsu. Post-analysis of learned rules. Proceedings of the Thirteenth National Conference on Artificial Intelligence and the Eighth Innovative Applications of Artificial Intelligence Conference. AAAI Press / MIT Press, Menlo Park, 1996.
[14] Bing Liu, Wynne Hsu, Shu Chen. Using general impressions to analyze discovered classification rules. In David Heckerman, Heikki Mannila, Daryl Pregibon, Ramasamy Uthurusamy (eds.), Proceedings of the Third International Conference on Knowledge Discovery and Data Mining (KDD-97). AAAI Press, 1997, p. 31.
[15] Bing Liu, Wynne Hsu, Lai-Fun Mun, Hing-Yan Lee. Finding interesting patterns using user expectations. IEEE Trans. Knowl. Data Eng., 11(6), 1999.
[16] John A. Major, John J. Mangano. Selecting among rules induced from a hurricane database. J. Intell. Inf. Syst., 4(1), 1995.
[17] Kenneth McGarry. A survey of interestingness measures for knowledge discovery. Knowledge Eng. Review, 20(1), 2005.
[18] B. Padmanabhan, A. Tuzhilin. On the discovery of unexpected rules in data mining applications. Procs. of the Workshop on Information Technology and Systems (WITS '97), 1997.
[19] Balaji Padmanabhan, Alexander Tuzhilin. A belief-driven method for discovering unexpected patterns. KDD, 1998.
[20] Balaji Padmanabhan, Alexander Tuzhilin. Unexpectedness as a measure of interestingness in knowledge discovery. January 09, 1999.
[21] Balaji Padmanabhan, Alexander Tuzhilin. On characterization and discovery of minimal unexpected patterns in rule discovery. IEEE Trans. Knowl. Data Eng., 18(2), 2006.
[22] Gregory Piatetsky-Shapiro. Discovery, analysis, and presentation of strong rules. Knowledge Discovery in Databases. AAAI/MIT Press, 1991.
[23] Gregory Piatetsky-Shapiro, Christopher J. Matheus. The interestingness of deviations. KDD Workshop, 1994.
[24] R. Rada, H. Mili, E. Bicknell, M. Blettner. Development and application of a metric on semantic nets. IEEE Transactions on Systems, Man, and Cybernetics, 19(1), January-February 1989.
[25] Wesley Romão, Alex Alves Freitas, Itana Maria de Souza Gimenes. Discovering interesting knowledge from a science and technology database with a genetic algorithm. Appl. Soft Comput., 4(2), 2004.
[26] Sigal Sahar. Interestingness via what is not interesting. KDD, 1999.
[27] Sigal Sahar. Interestingness preprocessing. ICDM, 2001.
[28] Sigal Sahar. On incorporating subjective interestingness into the mining process. ICDM, 2002.
[29] Abraham Silberschatz, Alexander Tuzhilin. On subjective measures of interestingness in knowledge discovery. KDD, 1995.
[30] Abraham Silberschatz, Alexander Tuzhilin. What makes patterns interesting in knowledge discovery systems. IEEE Transactions on Knowledge and Data Engineering, 8(6), 1996.
[31] Ramasamy Uthurusamy, Usama M. Fayyad, W. Scott Spangler. Learning useful rules from inconclusive data. Knowledge Discovery in Databases, 1991.
[32] Zhibiao Wu, Martha Palmer. Verb semantics and lexical selection. Annual Meeting of the Association for Computational Linguistics, New Mexico State University, Las Cruces, New Mexico, 1994.