<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Ontology-Driven Method for Ranking Unexpected Rules</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mohamed Said Hamani</string-name>
          <email>saidhamani@hotmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ramdane Maamri</string-name>
          <email>rmaamri@yahoo.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Mentouri-Constantine University</institution>
          ,
          <country country="DZ">Algeria</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Mohamed Boudiaf-M'sila University</institution>
          ,
          <country country="DZ">Algeria</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Many rule discovery algorithms share the disadvantage of producing too many patterns, some of which are obvious, useless, or of little interest to the user. In this paper we propose a new approach for ranking patterns according to their unexpectedness, using a semantic distance computed from prior background knowledge represented by a domain ontology organized as a DAG (Directed Acyclic Graph) hierarchy.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <sec id="sec-1-1">
        <title>Semantic distance</title>
<p>Two main categories of algorithms for computing the semantic distance between terms organized in a hierarchical structure have been proposed in the literature [9]: distance-based approaches and information content-based approaches. The general idea behind the distance-based algorithms [24, 12, 32] is to find the shortest path between two terms, measured in number of edges. Information content-based approaches [10, 24] are inspired by the observation that pairs of words sharing many common contexts tend to be semantically related. We use distance-based approaches in this paper. In an IS-A semantic network, the simplest way to determine the distance between two elemental concept nodes A and B is the shortest path that links A and B, i.e. the minimum number of edges that separate A and B, or the sum of the weights of the arcs along the shortest path between A and B [24].</p>
<p>In the hierarchy of Figure 1, with all edge weights equal to 1, the edge distances between nodes of the graph are:
Dist(Apple, Kiwi) = 2; Dist(Carrots, Pepper) = 2; Dist(Apple, Meat) = 4; Dist(Fruit, Red Meat) = 4.</p>
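<p>As an illustration, the shortest-path distance can be computed with a breadth-first search. The sketch below is based on an assumed partial reconstruction of the Figure 1 hierarchy, using only the concepts named in the worked examples (all other nodes of the figure are omitted):</p>

```python
from collections import deque

# Assumed reconstruction of part of the Figure 1 IS-A hierarchy
# (child -> parent links, edge weight = 1).
IS_A = {
    "Apple": "Fruit", "Kiwi": "Fruit",
    "Carrots": "Vegetables", "Pepper": "Vegetables",
    "Fruit": "Vegetable-dishes", "Vegetables": "Vegetable-dishes",
    "Turkey": "White Meat", "Chicken": "White Meat",
    "Red Meat": "Meat", "White Meat": "Meat",
    "Vegetable-dishes": "Food", "Meat": "Food",
}

def neighbors(node):
    """Treat the IS-A links as undirected edges."""
    nbrs = [IS_A[node]] if node in IS_A else []
    nbrs += [child for child, parent in IS_A.items() if parent == node]
    return nbrs

def dist(a, b):
    """Minimum number of edges separating two concepts (BFS)."""
    queue, seen = deque([(a, 0)]), {a}
    while queue:
        node, d = queue.popleft()
        if node == b:
            return d
        for nbr in neighbors(node):
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, d + 1))
    raise ValueError("concepts are not connected")

print(dist("Apple", "Kiwi"))      # 2
print(dist("Apple", "Meat"))      # 4
print(dist("Fruit", "Red Meat"))  # 4
```

Under this reconstruction the BFS reproduces the four distances listed above.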
      </sec>
      <sec id="sec-1-2">
        <title>Ontology</title>
<p>Prior knowledge of a domain or a process in the field of data mining can help to select the appropriate information (preprocessing), to decrease the hypothesis space (processing), to represent results in a more comprehensible way, and to improve the process (post-processing) [5]. An ontology expresses the domain knowledge, including semantic links between domain individuals described as inter-concept relations or roles [7].
For a given rule R: X → Y, where X = X1 ∧ … ∧ Xk, Y = Y1 ∧ … ∧ Ym, and D is the maximum depth of the hierarchy, we define the degree of unexpectedness (DU) of a rule R as DU(R) = Distance(X, Y) / 2D.
To compute the distance between groups of concepts, we choose to use the Hausdorff distance:
Distance(X, Y) = max(h(X, Y), h(Y, X)), where</p>
<p>h(X, Y) = max_{Xi ∈ X} min_{Yj ∈ Y} ‖Xi − Yj‖</p>
<p>The function h(X, Y) is called the directed Hausdorff 'distance' from X to Y (this function is not symmetric and thus is not a true distance). It identifies the point Xi ∈ X that is farthest from any point of Y, and measures the distance from Xi to its nearest neighbor in Y. The Hausdorff distance H(X, Y) measures the degree of mismatch between two sets, as it reflects the distance of the point of X that is farthest from any point of Y, and vice versa [8]. This expression measures the semantic distance between the groups of concepts X1 ∧ … ∧ Xk and Y1 ∧ … ∧ Ym, which contain k atomic concepts Xi and m atomic concepts Yj respectively.</p>
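<p>A minimal sketch of the group distance, taking as given inputs the pairwise concept distances from the worked example of Section 3 (pairs not listed there are omitted):</p>

```python
# Pairwise concept distances copied from the worked example in Section 3;
# a real implementation would derive them from the ontology.
D = {
    ("Pepper", "Turkey"): 6, ("Pepper", "Chicken"): 6,
    ("Carrots", "Turkey"): 6, ("Carrots", "Chicken"): 6,
}

def d(a, b):
    """Symmetric lookup of the pairwise concept distance."""
    return D.get((a, b), D.get((b, a)))

def h(A, B):
    """Directed Hausdorff distance: the point of A farthest from its
    nearest neighbor in B."""
    return max(min(d(a, b) for b in B) for a in A)

def hausdorff(A, B):
    """Symmetric Hausdorff distance H(A, B) = max(h(A, B), h(B, A))."""
    return max(h(A, B), h(B, A))

print(hausdorff({"Pepper", "Carrots"}, {"Turkey", "Chicken"}))  # 6
```

The asymmetry of h matters in general; here both directed distances happen to equal 6, so H is 6 as well.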
      </sec>
      <sec id="sec-1-3">
        <title>Rules ranking</title>
<p>In this section we introduce an algorithm that ranks rules according to their degree of unexpectedness with respect to background knowledge. The rules we consider are of the form "body → head", where "body" and "head" are conjunctions of concepts from the vocabulary of the ontology. We assume that other techniques carry out the task of pattern discovery and eliminate the patterns that do not satisfy objective criteria. With such a ranking, a user can simply check the few patterns at the top of the list to confirm rule pertinence.</p>
      </sec>
      <sec id="sec-1-4">
        <title>Algorithm</title>
<p>Input: Ontology, set of rules
Output: Ordered set of rules
R: set of rules, R = {Ri | Ri = body → head} where i ∈ [1, N]
ND: number of nodes
N: number of rules
D: maximum depth of the hierarchy
DU: array of size N holding the degrees of unexpectedness
Xi, Yj: atomic concepts; i ∈ [1, k]; j ∈ [1, m]
Body = X1 ∧ … ∧ Xk
Head = Y1 ∧ … ∧ Ym
For i = 1 to ND
begin
  For j = 1 to ND
    Distance(Xi, Xj) = shortest path between Xi and Xj;
end
For i = 1 to N
begin
  DU[i] = Distance(X1 ∧ … ∧ Xk, Y1 ∧ … ∧ Ym) / 2D
end
Sort DU in descending order of degree of unexpectedness.</p>
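<p>The algorithm above can be sketched end to end in Python. The hierarchy is an assumed reconstruction of the part of Figure 1 used by the example rules, and each rule is a (body, head) pair of concept sets:</p>

```python
from collections import deque

# Assumed fragment of the Figure 1 IS-A hierarchy (child -> parent).
IS_A = {
    "Apple": "Fruit", "Kiwi": "Fruit",
    "Carrots": "Vegetables", "Pepper": "Vegetables",
    "Fruit": "Vegetable-dishes", "Vegetables": "Vegetable-dishes",
    "Turkey": "White Meat", "Chicken": "White Meat",
    "Vegetable-dishes": "Food", "White Meat": "Meat", "Meat": "Food",
}
D_MAX = 3  # maximum depth of the hierarchy

def dist(a, b):
    """Shortest path (in edges) between two concepts, via BFS."""
    queue, seen = deque([(a, 0)]), {a}
    while queue:
        node, d = queue.popleft()
        if node == b:
            return d
        nbrs = [IS_A[node]] if node in IS_A else []
        nbrs += [c for c, p in IS_A.items() if p == node]
        for nbr in nbrs:
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, d + 1))
    raise ValueError("concepts are not connected")

def group_distance(X, Y):
    """Hausdorff distance between two sets of concepts."""
    h = lambda A, B: max(min(dist(a, b) for b in B) for a in A)
    return max(h(X, Y), h(Y, X))

def degree_of_unexpectedness(body, head):
    return group_distance(body, head) / (2 * D_MAX)

rules = [("a", {"Apple"}, {"Kiwi"}),
         ("b", {"Apple"}, {"Carrots"}),
         ("c", {"Pepper", "Carrots"}, {"Turkey", "Chicken"})]
ranked = sorted(rules,
                key=lambda r: degree_of_unexpectedness(r[1], r[2]),
                reverse=True)
print([name for name, _, _ in ranked])  # ['c', 'b', 'a']
```

With these assumptions the ranking matches the order derived by hand in Section 3.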
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Example</title>
<p>In this section we present results from applying our method to the hierarchy of Figure 1 with a set of association rules R = {Apple → Kiwi; Apple → Carrots; Pepper, Carrots → Turkey, Chicken} resulting from a data mining process.</p>
      <sec id="sec-2-1">
<title>Node distance computation</title>
<p>The number of graph nodes in Figure 1 is ND = 16 and the depth of the graph is D = 3. The semantic distances (the minimum number of edges separating two nodes) between the nodes of the Figure 1 graph are presented in Table 1, where each cell gives the distance between the node of its row and the node of its column.</p>
<p>We present only the leaves of the hierarchy in Table 1, because all the rules in R are expressed using leaf concepts of the hierarchy.</p>
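<p>The leaf-to-leaf part of Table 1 can be reproduced programmatically by running a breadth-first search from every leaf; a sketch under the same assumed partial reconstruction of Figure 1:</p>

```python
from collections import deque

# Assumed child -> parent links of the Figure 1 fragment.
IS_A = {
    "Apple": "Fruit", "Kiwi": "Fruit",
    "Carrots": "Vegetables", "Pepper": "Vegetables",
    "Fruit": "Vegetable-dishes", "Vegetables": "Vegetable-dishes",
    "Turkey": "White Meat", "Chicken": "White Meat",
    "Vegetable-dishes": "Food", "White Meat": "Meat", "Meat": "Food",
}
LEAVES = ["Apple", "Kiwi", "Carrots", "Pepper", "Turkey", "Chicken"]

def distances_from(src):
    """BFS over the undirected IS-A links; hop counts from src."""
    dist, queue = {src: 0}, deque([src])
    while queue:
        node = queue.popleft()
        nbrs = [IS_A[node]] if node in IS_A else []
        nbrs += [c for c, p in IS_A.items() if p == node]
        for nbr in nbrs:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

# Leaf-to-leaf distance table: the analogue of Table 1.
table = {a: {b: distances_from(a)[b] for b in LEAVES} for a in LEAVES}
print(table["Apple"]["Carrots"])  # 4
```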
      </sec>
      <sec id="sec-2-2">
        <title>Degree of unexpectedness computation</title>
        <p>The maximum depth of the hierarchy in Figure 1 is D=3.</p>
<p>For a given rule X → Y where X = X1 ∧ … ∧ Xk and Y = Y1 ∧ … ∧ Ym:
Distance(X, Y) = Distance(X1 ∧ … ∧ Xk, Y1 ∧ … ∧ Ym) = max(h(X, Y), h(Y, X)).
For the set of rules R = {(a), (b), (c)} where:
(a) Apple → Kiwi
(b) Apple → Carrots
(c) Pepper, Carrots → Turkey, Chicken
the detailed distance computation for the rules (a), (b), (c) is:
(a) Dist(Apple, Kiwi) = 2
(b) Dist(Apple, Carrots) = 4
(c) Dist(Pepper ∧ Carrots, Turkey ∧ Chicken) = max(h(Pepper ∧ Carrots, Turkey ∧ Chicken), h(Turkey ∧ Chicken, Pepper ∧ Carrots))
h(Pepper ∧ Carrots, Turkey ∧ Chicken) = 6
h(Turkey ∧ Chicken, Pepper ∧ Carrots) = 6
(c) Dist(Pepper ∧ Carrots, Turkey ∧ Chicken) = 6
The degree of unexpectedness for a given rule X → Y is calculated using our expression DU(X → Y) = Distance(X, Y) / 2D, and the resulting computation is presented in Table 2.
Based on descending degree of unexpectedness, the order of the rules is (c), (b), (a), as shown in Table 2. From a decision-system point of view, rule (c) belongs to a higher level (Food) than rule (b), which belongs to the level (vegetable-dishes); rule (a) belongs to a lower level (Fruit). The higher we move in the hierarchy, the more important the decision, the broader the vision of the decision maker, and therefore the more interesting the discovered rule. Rule (c) is the crossing result of the domains (vegetable-dishes, Meat), which are farther apart than the domains (Vegetables, Fruit) of rule (b). Rule (a) concerns the domain (Fruit) only and is therefore the least interesting.</p>
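<p>The computation above can be checked with a few lines of arithmetic (D = 3, so the normalizing factor is 2D = 6):</p>

```python
D = 3  # maximum depth of the Figure 1 hierarchy

# Group distances taken from the computation above.
distance = {"a": 2, "b": 4, "c": 6}

# Degree of unexpectedness DU = Distance / 2D, then rank descending.
du = {rule: d / (2 * D) for rule, d in distance.items()}
ranking = sorted(du, key=du.get, reverse=True)
print(ranking)                              # ['c', 'b', 'a']
print([round(du[r], 2) for r in ranking])   # [1.0, 0.67, 0.33]
```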
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experiments</title>
<p>The experiments were performed on a census income database of 48,842 records [3] with an implementation of our algorithm. To generate the association rules, we used an implementation of the Apriori algorithm [2] with a minimum support of 0.2 and a minimum confidence of 0.2. The generated rule set contains 2,225 rules. In order to perform the experiments, we created a taxonomy of 81 weighted concepts based on the data set under study, as shown in Table 3.</p>
<p>We conducted two tests: the first with a weight equal to one for all concepts, with results presented in Figure 2, and the second with different weights at the atomic concept level (see Table 3 for the weights), with results presented in Figure 3. Figures 2 and 3 show the first two extracted rules within each distance value for each test.</p>
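<p>The text does not spell out exactly how the concept weights of Table 3 enter the distance. One plausible reading, sketched below on a hypothetical fragment of the census taxonomy, is an edge-weighted shortest path (Dijkstra's algorithm), where the edge leading to a weighted concept carries that concept's weight:</p>

```python
import heapq

# Hypothetical fragment of the census taxonomy as (node, node, weight)
# edges; 'Bachelors' carries weight 7 as in the second test, all other
# edges weight 1. The weighting scheme itself is an assumption.
EDGES = [
    ("census-income", "Education", 1), ("census-income", "Personal", 1),
    ("Education", "Bachelors", 7), ("Education", "HS-grad", 1),
    ("Personal", "sex", 1), ("Personal", "age", 1),
]

def weighted_distance(src, dst):
    """Edge-weighted shortest path via Dijkstra's algorithm."""
    graph = {}
    for a, b, w in EDGES:
        graph.setdefault(a, []).append((b, w))
        graph.setdefault(b, []).append((a, w))
    heap, settled = [(0, src)], {}
    while heap:
        cost, node = heapq.heappop(heap)
        if node in settled:
            continue
        settled[node] = cost
        for nbr, w in graph[node]:
            if nbr not in settled:
                heapq.heappush(heap, (cost + w, nbr))
    return settled[dst]

# The heavily weighted concept sits "farther away", so rules involving
# it get a larger distance and hence a larger degree of unexpectedness.
print(weighted_distance("Bachelors", "sex"))  # 10
print(weighted_distance("HS-grad", "sex"))    # 4
```

Under this assumed scheme, rules mentioning the highly weighted 'Bachelors' concept would rank above otherwise similar rules, consistent with observation 1 below.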
<p>Looking at the results, we notice:
1. The best results are those with the highest weight (Figure 3, with the 'Bachelors' concept).
2. The best results from both tests involve cross-level concepts (a higher subsumer such as 'Personal', 'Education', 'Work' or 'census-income') rather than concepts within the same level.
3. The lowest results from both tests (last two lines) are within the same concept level, such as 'Personal'.</p>
<p>Our approach is based on the hierarchy in Table 3, which guides the resulting rules. The maximum hierarchy depth is 3 and equals the minimum depth; this hierarchy distributes the load equally between its different branches. The first test was conducted with a weight equal to 1 for all concepts; in this case all</p>
      <p>
concepts have the same degree of interest to the user. The ranking algorithm then picks the rules with the higher subsuming concept. The common subsumer for rules (1), (2) and (3) of Figure 2 is the top concept 'census-income', whereas the common subsumer for rules (4) and (5) is the concept 'Work'. Rule (1) concerns 'sex' and 'occupation', while rules (2) and (3) are about education and occupation. The last two rules, (4) and (5), express the relation between 'occupation' and 'salary-class'. We believe a rule like (1), (2) or (3) is more interesting, because it relates 'Education' and 'Personal' information and involves a higher-level (strategic) decision maker than a rule concerning 'occupation' and 'salary', which may concern payroll, for instance.
      </p>
      <p>The second test was conducted with the weight of the 'Bachelors' concept equal to 7 (for the other concept settings, see Table 3). The user in this case puts more emphasis on this concept, either by setting its weight to a high value or because it really is that important in the domain of study. The ranking algorithm picks the rules with the higher weight. The common subsumer for rules (1) and (2) of Figure 3 is the concept 'census-income', but in this case with the 'Bachelors' concept as a member of the rule. Here the user is focusing the study on people with a 'Bachelors' education and their relation to 'Personal' information or 'Work'. The common subsumer for the last two rules of Figure 3 is the concept 'Personal'; these rules express the relation between the 'sex', 'age' and 'marital-status' concepts. Even though interestingness is subjective (what is interesting to one user may not be of the same degree of interest to another), we believe that the higher we move in the hierarchy, the more important the decision and the broader and more strategic the vision of the decision maker; therefore the discovered rule is more interesting. Our approach follows this vision.</p>
    </sec>
    <sec id="sec-4">
      <title>Related Works</title>
<p>Unexpectedness of patterns has been studied in [29, 30, 13, 14, 19, 20] and defined in comparison with user beliefs: a rule is considered interesting if it affects the levels of conviction of the user. Unexpectedness is defined in probabilistic terms in [29, 30], while in [13] it is defined as a distance based on a syntactic comparison between a rule and a conviction; similarity and distance are defined syntactically from the structure of rules and convictions. A rule and a conviction are distant if the consequent of the rule and the conviction are similar but the antecedents are distant, or vice versa. In [21] the focus is on discovering minimal unexpected patterns directly, rather than using a post-processing approach such as filtering to determine the minimal unexpected patterns from the set of all discovered patterns. In [18] unexpectedness is defined from the point of view of a logical contradiction between a rule and a conviction: a pattern that contradicts prior knowledge is unexpected. It is based on the contradiction between the consequent of the rule and the consequent of the belief: given a rule A → B and a belief X → Y, if B AND Y is false while A AND X is true for a broad group of data, the rule is unexpected. In [15], the subjective interestingness (unexpectedness) of a discovered pattern is characterized by asking the user to specify a set of patterns according to his or her previous knowledge or intuitive feelings; this specified set of patterns is then used by a fuzzy matching algorithm to match and rank the discovered patterns. [26, 27, 28] take a different approach to the discovery of interesting patterns by eliminating non-interesting association rules: rather than having the users define their entire knowledge of a domain, they are asked to identify several non-interesting rules generated by the Apriori algorithm. [25] uses a genetic algorithm to dynamically maintain and search populations of rule sets for the most interesting rules rather than acting as a post-processor; the rules identified by the genetic algorithm compared favorably with the rules selected by the domain expert [17]. Most research on unexpectedness makes a syntactic or semantic comparison between a rule and a belief. Our definition of unexpectedness is based instead on the structure of the background knowledge (hierarchy) underlying the terms (vocabulary) of the rule. We take a different approach from all the preceding work: the preceding work is a filtering process based on beliefs, expressed as rules, that the user has to enter, whereas we propose a ranking process in which the knowledge is expressed not as rules but as a hierarchy of concepts, an ontology. Ontologies enable knowledge sharing, which vastly increases the potential for knowledge reuse and therefore allows our approach to obtain knowledge for free from domain ontologies already available, such as "ONTODerm" for dermatology, "BIO-ONT" for biomedicine, and "ASFA, OneFish, FIGIS, AGROVOC" for food.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion and future work</title>
<p>In this paper we proposed a new approach that estimates the degree of unexpectedness of a rule with respect to an ontology and ranks patterns according to their unexpectedness, defined on the basis of ontological distance. The proposed ranking algorithm uses an ontology to calculate the distance between the antecedent and the consequent of a rule, on which the ranking is based. The higher the conceptual distance, the higher the degree of interest the rule represents. This work constitutes a contribution to the post-analysis stage, helping the user identify the most interesting patterns.</p>
      <p>In the future, we plan to incorporate a semantic distance threshold into the computation of frequent itemsets, and to exploit ontology relations other than "IS-A". We are also validating our approach on fuzzy ontologies in order to take vague and imprecise information into account.</p>
      <p>[19] Balaji Padmanabhan and Alexander Tuzhilin. A belief-driven method for discovering unexpected patterns. In KDD, pages 94-100, 1998.</p>
      <p>[20] Balaji Padmanabhan and Alexander Tuzhilin. Unexpectedness as a measure of interestingness in knowledge discovery, January 09 1999.</p>
      <p>[21] Balaji Padmanabhan and Alexander Tuzhilin. On characterization and discovery of minimal unexpected patterns in rule discovery. IEEE Trans. Knowl. Data Eng., 18(2):202-216, 2006.</p>
      <p>[22] Gregory Piatetsky-Shapiro. Discovery, analysis, and presentation of strong rules. In Knowledge Discovery in Databases, pages 231-233. AAAI/MIT Press, 1991.</p>
      <p>[23] Gregory Piatetsky-Shapiro and Christopher J. Matheus. The interestingness of deviations. In KDD Workshop, pages 25-36, 1994.</p>
      <p>[24] R. Rada, H. Mili, E. Bicknell, and M. Blettner. Development and application of a metric on semantic nets. IEEE Transactions on Systems, Man, and Cybernetics, 19(1):17-30, January-February 1989.</p>
      <p>[25] Wesley Romão, Alex Alves Freitas, and Itana Maria de Souza Gimenes. Discovering interesting knowledge from a science and technology database with a genetic algorithm. Appl. Soft Comput., 4(2):121-137, 2004.</p>
      <p>[26] Sigal Sahar. Interestingness via what is not interesting. In KDD, pages 332-336, 1999.</p>
      <p>[27] Sigal Sahar. Interestingness preprocessing. In ICDM, pages 489-496, 2001.</p>
      <p>[28] Sigal Sahar. On incorporating subjective interestingness into the mining process. In ICDM, pages 681-684, 2002.</p>
      <p>[29] Abraham Silberschatz and Alexander Tuzhilin. On subjective measures of interestingness in knowledge discovery. In KDD, pages 275-281, 1995.</p>
      <p>[30] Abraham Silberschatz and Alexander Tuzhilin. What makes patterns interesting in knowledge discovery systems. IEEE Transactions on Knowledge and Data Engineering, 8(6):970-974, 1996.</p>
      <p>[31] Ramasamy Uthurusamy, Usama M. Fayyad, and W. Scott Spangler. Learning useful rules from inconclusive data. In Knowledge Discovery in Databases, pages 141-158, 1991.</p>
      <p>[32] Zhibiao Wu and Martha Palmer. Verb semantics and lexical selection. In 32nd Annual Meeting of the Association for Computational Linguistics, pages 133-138, New Mexico State University, Las Cruces, New Mexico, 1994.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Imielinski</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Swami</surname>
          </string-name>
          .
          <article-title>Database mining: A performance perspective</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          ,
          <volume>5</volume>
          (
          <issue>6</issue>
          ):
          <volume>914</volume>
          {
          <fpage>925</fpage>
          ,
          <year>December 1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Christian</given-names>
            <surname>Borgelt</surname>
          </string-name>
          . http://www.borgelt.net/software.html.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <article-title>[3] census income</article-title>
          . ftp://ftp.ics.uci.edu/pub/machine-learning-databases/census-income/.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Didier</given-names>
            <surname>Dubois</surname>
          </string-name>
          .
          <article-title>Book review: 'fuzzy set theory and its applications' (2nd edition) by H</article-title>
          .
          <source>J. Zimmermann. Diffusion scientifique</source>
          ,
          <year>1991</year>
          . BUSEFAL, Kluwer Academic Publ. Dordrecht, V.
          <volume>48</volume>
          , p.
          <fpage>169</fpage>
          -
          <lpage>170</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Zahra</given-names>
            <surname>Farzanyar</surname>
          </string-name>
          , Mohammadreza Kangavari, and
          <string-name>
            <given-names>Sattar</given-names>
            <surname>Hashemi</surname>
          </string-name>
          .
          <article-title>A new algorithm for mining fuzzy association rules in the large databases based on ontology</article-title>
          .
          <source>In ICDM Workshops</source>
          , pages
          <volume>65</volume>
          {
          <fpage>69</fpage>
          . IEEE Computer Society,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Usama</surname>
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Fayyad</surname>
            , Gregory Piatetsky-Shapiro,
            <given-names>and Padhraic</given-names>
          </string-name>
          <string-name>
            <surname>Smyth</surname>
          </string-name>
          .
          <article-title>From data mining to knowledge discovery: An overview</article-title>
          .
          <source>In Advances in Knowledge Discovery and Data Mining</source>
          , pages
          <volume>1</volume>
          {
          <fpage>34</fpage>
          .
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>T. R.</given-names>
            <surname>Gruber</surname>
          </string-name>
          .
          <article-title>Towards principles for the design of ontologies used for knowledge sharing</article-title>
          . In N. Guarino and R. Poli, editors,
          <source>Formal Ontology in Conceptual Analysis and Knowledge Representation</source>
          , Deventer, The Netherlands,
          <year>1993</year>
          . Kluwer Academic Publishers.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Daniel</surname>
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Huttenlocher</surname>
          </string-name>
          , Gregory A.
          <string-name>
            <surname>Klanderman</surname>
            , and
            <given-names>William J.</given-names>
          </string-name>
          <string-name>
            <surname>Rucklidge</surname>
          </string-name>
          .
          <article-title>Comparing images using the Hausdorff distance</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          ,
          <volume>15</volume>
          :
          <fpage>850</fpage>
          {
          <fpage>863</fpage>
          ,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Jay</surname>
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Jiang</surname>
            and
            <given-names>David W.</given-names>
          </string-name>
          <string-name>
            <surname>Conrath</surname>
          </string-name>
          .
          <article-title>Semantic similarity based on corpus statistics and lexical taxonomy</article-title>
          .
          <source>CoRR, cmp-lg/9709008</source>
          ,
          <year>1997</year>
          . informal publication.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Jay</surname>
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Jiang</surname>
            and
            <given-names>David W.</given-names>
          </string-name>
          <string-name>
            <surname>Conrath</surname>
          </string-name>
          .
          <article-title>Semantic similarity based on corpus statistics and lexical taxonomy</article-title>
          .
          <source>September</source>
          <volume>20</volume>
          1997.
          <article-title>Comment: 15 pages, Postscript only</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Mika</surname>
            <given-names>Klemettinen</given-names>
          </string-name>
          , Heikki Mannila, Pirjo Ronkainen,
          <article-title>Hannu Toivonen, and</article-title>
          <string-name>
            <given-names>A. Inkeri</given-names>
            <surname>Verkamo</surname>
          </string-name>
          .
          <article-title>Finding interesting rules from large sets of discovered association rules</article-title>
          . In Nabil R. Adam,
          <string-name>
            <surname>Bharat K. Bhargava</surname>
          </string-name>
          , and Yelena Yesha, editors,
          <source>Third International Conference on Information and Knowledge Management (CIKM'94)</source>
          , pages
          <fpage>401</fpage>
          {
          <fpage>407</fpage>
          . ACM Press,
          <year>November 1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Claudia</given-names>
            <surname>Leacock</surname>
          </string-name>
          and
          <string-name>
            <given-names>Martin</given-names>
            <surname>Chodorow</surname>
          </string-name>
          .
          <article-title>Combining local context and WordNet similarity for word sense identification</article-title>
          . In Christaine Fellbaum, editor,
          <source>WordNet: An Electronic Lexical Database</source>
          , pages
          <volume>265</volume>
          {
          <fpage>283</fpage>
          . The MIT Press, Cambridge, Massachusetts,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Bing</given-names>
            <surname>Liu</surname>
          </string-name>
          and
          <string-name>
            <given-names>Wynne</given-names>
            <surname>Hsu</surname>
          </string-name>
          .
          <article-title>Post-analysis of learned rules</article-title>
          .
          <source>In Proceedings of the Thirteenth National Conference on Artificial Intelligence and the Eighth Innovative Applications of Artificial Intelligence Conference</source>
          , pages
          <volume>828</volume>
          {
          <fpage>834</fpage>
          ,
          <string-name>
            <surname>Menlo</surname>
            <given-names>Park</given-names>
          </string-name>
          ,
          <year>1996</year>
          . AAAI Press / MIT Press.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Bing</surname>
            <given-names>Liu</given-names>
          </string-name>
          , Wynne Hsu, and
          <string-name>
            <given-names>Shu</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <article-title>Using general impressions to analyze discovered classification rules</article-title>
          . In David Heckerman,
          <string-name>
            <given-names>Heikki</given-names>
            <surname>Mannila</surname>
          </string-name>
          , Daryl Pregibon, and Ramasamy Uthurusamy, editors,
          <source>Proceedings of the Third International Conference on Knowledge Discovery and Data Mining (KDD-97)</source>
          , page 31. AAAI Press,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Bing</surname>
            <given-names>Liu</given-names>
          </string-name>
          , Wynne Hsu,
          <string-name>
            <surname>Lai-Fun Mun</surname>
          </string-name>
          , and
          <string-name>
            <surname>Hing-Yan Lee</surname>
          </string-name>
          .
          <article-title>Finding interesting patterns using user expectations</article-title>
          .
          <source>IEEE Trans. Knowl. Data Eng</source>
          ,
          <volume>11</volume>
          (
          <issue>6</issue>
          ):
          <volume>817</volume>
          {
          <fpage>832</fpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>John</surname>
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Major</surname>
            and
            <given-names>John J. Mangano.</given-names>
          </string-name>
          <article-title>Selecting among rules induced from a hurricane database</article-title>
          .
          <source>J. Intell. Inf. Syst</source>
          ,
          <volume>4</volume>
          (
          <issue>1</issue>
          ):
          <volume>39</volume>
          {
          <fpage>52</fpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Kenneth</given-names>
            <surname>McGarry</surname>
          </string-name>
          .
          <article-title>A survey of interestingness measures for knowledge discovery</article-title>
          .
          <source>Knowledge Eng. Review</source>
          ,
          <volume>20</volume>
          (
          <issue>1</issue>
          ):
          <volume>39</volume>
          {
          <fpage>61</fpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>B.</given-names>
            <surname>Padmanabhan</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Tuzhilin</surname>
          </string-name>
          .
          <article-title>On the discovery of unexpected rules in data mining applications. In On the Discovery of Unexpected Rules in Data Mining Applications</article-title>
          .
          <source>In Procs. of the Workshop on Information Technology and Systems (WITS '97)</source>
          , pages pp.
          <volume>81</volume>
          {
          <issue>90</issue>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>