
New Approach to Mining Fuzzy Association Rule with Linguistic Threshold Based on Hedge Algebras

Le Anh Phuong

Tran Dinh Khang

Nguyen Vinh Trung

Department of Computer Science, Hue University of Education, Hue University; Information Technology Center, Hue University of Education, Hue University; SoICT, Hanoi University of Science and Technology

The authors of [2-5] have studied and presented methods for quantifying linguistic variables and linguistic thresholds by fuzzy sets. Chien-Hua Wang and Chin-Tzong Pang proposed an algorithm for mining fuzzy association rules [2]. In this paper, we extend the algorithm proposed in [2] to numerical data and linguistic variables by using hedge algebras.

Keywords: fuzzy association rules, linguistic threshold, hedge algebra

1 Introduction

Mining association rules is one of the important tasks in the field of data mining.

Many authors have presented methods and algorithms for mining association rules with numerical support and confidence values. In practice, however, these values are often expressed in natural language. Moreover, the importance of each item is evaluated not only by its quantity and frequency of occurrence in the transactions but also by the qualitative evaluation that administrators give to the item in natural language. Hedge algebras meet the requirement of computing directly on linguistic values (without fuzzification, using direct calculation based on a qualitative semantic function and flexible calculation). Thus, it is necessary to establish a method for mining association rules with hedge algebras, in which the input is a qualitative transaction database together with a qualitative evaluation table of the items in that database, and the support and confidence values are also given in natural language.

2.1 Association rules

Let $I = \{I_1, I_2, \ldots, I_m\}$ be a set of items. Let $D$, the task-relevant data, be a set of database transactions where each transaction $T$ is a set of items such that $T \subseteq I$. Each transaction is associated with an identifier, called TID.

Definition 1. An association rule has the form $X \to Y$, where $X \subseteq I$, $Y \subseteq I$, and $X \cap Y = \emptyset$.

Definition 2. The support of the association rule $X \to Y$ is the probability that $X \cup Y$ exists in a transaction in the database $D$:

$$support(X \to Y) = \frac{|X \cap Y|}{N}$$

Definition 3. The confidence of the association rule $X \to Y$ is the probability that $X \cup Y$ exists given that a transaction contains $X$, i.e.

$$confidence(X \to Y) = \frac{support(X \cup Y)}{support(X)} = \frac{|X \cap Y|}{|X|}$$

where $|X|$ is the number of transactions that include $X$; $|X \cap Y|$ is the number of transactions that include both $X$ and $Y$; and $N$ is the total number of transactions in the database.
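Definitions 2 and 3 can be illustrated with a minimal Python sketch. The function names and the four toy transactions are assumptions made for this illustration only; they do not come from the paper.

```python
from typing import FrozenSet, Iterable, List


def count_containing(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    wanted = frozenset(itemset)
    return sum(1 for t in transactions if wanted <= t)


def support(transactions: List[FrozenSet[str]], x: Iterable[str], y: Iterable[str]) -> float:
    """Share of all transactions that contain both X and Y (Definition 2)."""
    both = frozenset(x) | frozenset(y)
    return count_containing(transactions, both) / len(transactions)


def confidence(transactions: List[FrozenSet[str]], x: Iterable[str], y: Iterable[str]) -> float:
    """Share of the transactions containing X that also contain Y (Definition 3)."""
    both = frozenset(x) | frozenset(y)
    n_x = count_containing(transactions, x)
    return count_containing(transactions, both) / n_x if n_x else 0.0


# Hypothetical toy database with N = 4 transactions.
transactions = [frozenset(t) for t in (["A", "B"], ["A", "B", "C"], ["A", "C"], ["B"])]
print(support(transactions, {"A"}, {"B"}))     # 2 of 4 transactions contain A and B
print(confidence(transactions, {"A"}, {"B"}))  # 2 of the 3 transactions with A contain B
```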

Mining the association rules of a database means finding all rules whose support and confidence are greater than the thresholds minsup and minconf specified by the user.

2.2 Hedge algebras (HA)

Let $\mathcal{X}$ be a linguistic variable and $X$ be the set of its terms, called the term-domain of $\mathcal{X}$. For example, if $\mathcal{X}$ is the rotation speed of an electrical motor and the linguistic hedges used to describe its speed are Very, More, Possibly and Little, denoted for short by V, M, P and L respectively, then $X = \{fast, V fast, M fast, LP fast, L fast, P fast, L slow, slow, P slow, V slow, \ldots\} \cup \{0, W, 1\}$ is a term-domain of $\mathcal{X}$.

It can be considered as an abstract algebra $AX = (X, C, H, \le)$, where $H$ is a set of linguistic hedges, which can be regarded as one-argument operations, $\le$ is called a semantics-based ordering relation on $X$, and $\{W, 0, 1\}$ is a set of constants in $X$, with fast and slow being the primary terms of $X$ and $W$, $0$, $1$ being additional elements of $X$ interpreted as the neutral, the least and the greatest ones, respectively. Denote by $hx$ the result of applying a hedge $h \in H$ to $x \in X$, and by $H(x)$ the set of all $u \in X$ generated algebraically from $x$ by using hedges in $H$, i.e. $H(x) = \{u : u = h_n \ldots h_1 x,\ h_1, \ldots, h_n \in H\}$.
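As an illustration of how $H(x)$ is generated, the following Python sketch enumerates the terms obtained from a primary term by prefixing hedge strings. It is only a sketch: the function name is hypothetical, the enumeration is cut off at a maximum number of hedges (the algebraic set itself is unbounded), and the primary term is included as the case with no hedge, which is an assumption of this example.

```python
from itertools import product
from typing import List, Sequence


def generate_terms(primary: str, hedges: Sequence[str], max_hedges: int) -> List[str]:
    """Terms h_n...h_1 x for n = 0..max_hedges, written like 'LP fast'."""
    terms = []
    for n in range(max_hedges + 1):
        # Every ordered string of n hedges applied to the primary term.
        for combo in product(hedges, repeat=n):
            prefix = "".join(combo)
            terms.append(f"{prefix} {primary}" if prefix else primary)
    return terms


# V, M, P, L stand for Very, More, Possibly, Little as in the text.
terms = generate_terms("fast", ["V", "M", "P", "L"], max_hedges=2)
print(len(terms))  # 1 primary term + 4 one-hedge terms + 16 two-hedge terms
print(terms[:6])
```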

It is natural to transform the fuzzy sets defined on a real interval $[a, b]$, which represent the meaning of the terms in a term-domain $X$, into $[a, b]$ or, for normalization, into $[0, 1]$. This defines a mapping of the term-domain $X$ into $[0, 1]$, called in the algebraic approach a semantically quantifying mapping. We keep these mappings in mind to define a notion of fuzziness measure. Consider a mapping $f$ from $X$ into $[0, 1]$ that preserves the ordering relation on $X$. Then the "size" of the set $H(x)$, for $x \in X$, can be measured by the diameter of $f(H(x)) \subseteq [0, 1]$, and this diameter is taken as the fuzziness measure of the term $x$. With this model of fuzziness measure in mind, we adopt the following definition:

Let $AX = (X, C, H, \le)$ be a linear HA, and let $fm: X \to [0, 1]$ be a fuzziness measure of the terms in $X$.

Definition 4. For each $x \in X$, the length of $x$ is denoted by $|x|$ and defined as follows:
1) if $x = c^+$ or $x = c^-$ then $|x| = 1$;
2) if $x = hx'$ then $|x| = 1 + |x'|$, for all $h \in H$.

Proposition 1. The fuzziness measure $fm$ and the fuzziness measure of a hedge $h$, denoted by $\mu(h)$, $\forall h \in H$, have the following properties:
1) $fm(hx) = \mu(h) \times fm(x)$, $\forall x \in X$;
2) $fm(c^+) + fm(c^-) = 1$;
3) $\sum_{-q \le i \le p,\ i \ne 0} fm(h_i c) = fm(c)$, $c \in \{c^+, c^-\}$;
4) $\sum_{-q \le i \le p,\ i \ne 0} fm(h_i x) = fm(x)$;
5) $\sum_{-q \le i \le -1} \mu(h_i) = \alpha$, $\sum_{1 \le j \le p} \mu(h_j) = \beta$, with $\alpha + \beta = 1$.

3 Algorithm

Step 1: Identify minsup and minconf from the pre-defined linguistic thresholds.

+ Calculate the fuzziness measure of the variable X: $fm(X)$;
+ Identify the fuzziness interval of X: $I(X) = [a, b]$;
+ Calculate the average fuzzy value of the variable X: $gt(X) = \frac{a + b}{2}$.

Step 2: Handling the qualitative table: a set of m items with their importance evaluated by d managers.

+ Calculate the fuzziness measure of the linguistic variables;
+ Calculate the average of the fuzziness intervals of the qualitative terms for each item:

$$kdt{\sim}tb(j) = \frac{1}{d} \times \sum_{i=1}^{d} \left(a(j)_i, b(j)_i\right), \quad \text{which has the form } [a_j, b_j];$$

+ Calculate the average fuzzy value for each item:

$$gtdt{\sim}tb(j) = \frac{a_j + b_j}{2}, \qquad (3)$$

where $a_j$ and $b_j$ are the endpoints of $kdt{\sim}tb(j) = [a_j, b_j]$.
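Step 2 can be sketched in Python as an endpoint-wise average of the managers' intervals followed by a midpoint. The function names and the three ratings below are hypothetical; the intervals used for the ratings are those that the worked example in Section 4 assigns to "Very Important" and "Ordinary".

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def average_interval(intervals: List[Interval]) -> Interval:
    """Endpoint-wise mean of the d managers' intervals (the kdt~tb value)."""
    if not intervals:
        raise ValueError("at least one evaluation is required")
    d = len(intervals)
    a = sum(low for low, _ in intervals) / d
    b = sum(high for _, high in intervals) / d
    return (a, b)


def midpoint(interval: Interval) -> float:
    """Average fuzzy value (a + b) / 2 of an interval (the gtdt~tb value)."""
    return (interval[0] + interval[1]) / 2


# Hypothetical ratings of one item by d = 3 managers:
# Very Important, Ordinary, Very Important.
ratings = [(0.88, 1.0), (0.25, 0.75), (0.88, 1.0)]
kdt_tb = average_interval(ratings)
gtdt_tb = midpoint(kdt_tb)
print(kdt_tb, gtdt_tb)
```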

Step 3: Handling n quantitative transactions.

+ Treat the quantitative values $A_j$ ($j = 1, \ldots, m$) as variables $X$ in an HA ($X \in \mathcal{X}$), determined as follows: $\mathcal{X}_{sl} = (X_{sl}, G_{sl}, H_{sl}, \le)$, with $G_{sl} = \{High, Low\}$ (High = H, Low = L); $c^+ = \{H\}$; $c^- = \{L\}$; $H^+_{sl} = \{Very, More\}$; $H^-_{sl} = \{Less, Possibly\}$ (with Very > More; Less > Possibly).
  - Select: Dom(sl); fm(High); fm(Low); fm(Very); fm(More); fm(Less); fm(Possibly);
  - Identify the fuzziness interval $I(X)$ of each $X \in \mathcal{X}$;
  - Transform the quantitative value of each item into $[0, 1]$;
  - Map each $A_j \in [0, 1]$ to the fuzziness interval $I(X)$ that contains it.
+ Compute the statistics of the fuzzy partitions in $D^{\sim}$;
+ Find the largest fuzzy partition as the representative of each item j:

$$max\_count_j = \max_i(count_{ji}), \quad i = 1, \ldots, K. \qquad (4)$$

Step 4: Calculate the fuzzy support of each item ($j = 1, \ldots, m$) as:

$$sup(j) = \frac{gtdt{\sim}tb(j) \times max\_count_j}{N}, \qquad (5)$$

where $gtdt{\sim}tb(j)$ is the qualitative value (calculated by formula (3) in step 2); $max\_count_j$ is the quantitative value (calculated by formula (4) in step 3); and $N$ is the total number of transactions, $N = |D|$.
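Steps 3 and 4 can be sketched in Python as follows. This is a sketch under stated assumptions, not the paper's implementation: the partition boundaries are those of the worked example in Section 4, quantities are scaled by dividing by a hypothetical domain maximum, a boundary value is assigned to the lower partition, and each transaction simply counts once for its partition. The worked example reports non-integer counts, so the paper's own counting is presumably weighted. All names and the sample quantities are hypothetical.

```python
from bisect import bisect_left
from collections import Counter
from typing import List, Tuple

# Upper bounds of the consecutive partitions of [0, 1] and their labels.
BOUNDS = [0.075, 0.15, 0.225, 0.3, 0.475, 0.65, 0.825, 1.0]
LABELS = ["VL", "ML", "PL", "LL", "LH", "PH", "MH", "VH"]


def label(value: float) -> str:
    """Label of the partition containing a value already scaled to [0, 1]."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("value must lie in [0, 1]")
    return LABELS[bisect_left(BOUNDS, value)]


def representative_partition(quantities: List[float], domain_max: float) -> Tuple[str, int]:
    """Most frequent partition of one item and its (crisp) count."""
    labels = [label(q / domain_max) for q in quantities]
    return Counter(labels).most_common(1)[0]


def fuzzy_support(gtdt: float, max_count: float, n: int) -> float:
    """Qualitative value times quantitative count, divided by N."""
    return gtdt * max_count / n


# Hypothetical quantities of one item in six transactions, domain [0, 10].
quantities = [6, 4, 7, 5, 6, 2]
name, max_count = representative_partition(quantities, domain_max=10)
print(name, max_count)
print(fuzzy_support(0.8, max_count, len(quantities)))
```

With the figures of the worked example (qualitative value 0.8 as the midpoint of [0.6, 1], count 3.04, N = 6), `fuzzy_support` gives about 0.41.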

Step 5: Filter the items in $D^{\sim}$, keeping the frequent items that satisfy the minimum support: $sup(item) \ge minsup$.

Step 6: Establish Fuzzy FP-tree: establish Header table; establish FP-tree

Step 7: Calculate the fuzzy qualitative value of each n-itemset ($K \ge n \ge 2$).

+ Find all frequent itemsets (denoted by n-itemset) from the FP-tree;
+ Calculate the qualitative value of each n-itemset.

Step 8: Calculate the fuzzy support of each n-itemset.

+ Using formula (5) of step 4: $sup(n\text{-itemset}) = \frac{gtdt{\sim}tb(n\text{-itemset}) \times count(n\text{-itemset})}{N}$.

Step 9: Export the rules, calculate the confidence and check it against minconf, using the following substeps:

+ Form the candidate association rules from the result of step 8: for each n-itemset with items $(A_1, A_2, \ldots, A_n)$, $n = 2, \ldots, M$: $A_1 \wedge \ldots \wedge A_{i-1} \wedge A_{i+1} \wedge \ldots \wedge A_n \to A_i$ $(i = 1, \ldots, M)$;
+ Calculate the fuzzy confidence value of each possible fuzzy association rule as:

$$conf(A \to B) = \frac{sup(A \cup B)}{sup(A)}; \qquad (6)$$

+ Select the fuzzy association rules that satisfy the minimum confidence.
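Step 9 can be sketched in Python as follows, assuming the fuzzy supports of the itemsets and of their antecedents are already available in a dictionary. The function name and the single-item supports are hypothetical; the 2-itemset support (32%) and the threshold (73.75%) are taken from the worked example in Section 4.

```python
from typing import Dict, FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], str, float]  # (antecedent, consequent, confidence)


def generate_rules(itemset_support: Dict[FrozenSet[str], float], min_conf: float) -> List[Rule]:
    """Rules (itemset minus A_i) -> A_i whose confidence reaches min_conf."""
    rules = []
    for itemset, sup_all in itemset_support.items():
        if len(itemset) < 2:
            continue
        for consequent in sorted(itemset):
            antecedent = itemset - {consequent}
            sup_antecedent = itemset_support.get(antecedent)
            if not sup_antecedent:
                continue  # support of the antecedent is unknown or zero
            conf = sup_all / sup_antecedent  # formula (6)
            if conf >= min_conf:
                rules.append((antecedent, consequent, conf))
    return rules


# Single-item supports are hypothetical; the pair support is from the example.
supports = {
    frozenset({"B.PH"}): 0.41,
    frozenset({"E.PH"}): 0.40,
    frozenset({"B.PH", "E.PH"}): 0.32,
}
for antecedent, consequent, conf in generate_rules(supports, min_conf=0.7375):
    print(sorted(antecedent), "->", consequent, round(conf, 3))
```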

When HA is used for the fuzzy transaction database and for quantifying linguistic terms, we view each element of the HA as a fuzzy region. The process of creating fuzzy regions based on the structure of the HA is therefore simple, intuitive and more efficient.

4 An example

In this section, an example is given to illustrate the proposed algorithm.

Input: includes the following three data:

1. The data set includes six quantitative transactions, as shown in Table 2.

2. The importance of the items is evaluated by three managers, as shown in Table 3.

3. A pre-defined linguistic minimum support value min_s and linguistic minimum confidence value min_c.

Output: A set of fuzzy association rules.

Method: Includes the following 9 general steps:

Step 1: Identify minsup and minconf from the pre-defined linguistic thresholds.

Identify the parameters of the HA: $\mathcal{X} = (X, G, H, \le)$, with $G = \{Low, High\}$; $c^+ = High$ (denoted by H); $c^- = Low$ (denoted by L); $H^+ = \{Very, More\}$, $H^- = \{Less, Possibly\}$ (with Very > More; Less > Possibly); $fm(Low) = 0.3$; $fm(High) = 0.7$; $fm(Very) = fm(More) = fm(Less) = fm(Possibly) = 0.25$. Identify the fuzziness measure and the fuzziness interval of each term:

For the terms with $c^-$ = "Low":
+ $fm(VL) = 0.25 \times 0.3 = 0.075 \Rightarrow I(VL) = [0, 0.075] \Rightarrow I(VL)_{TB} = 3.75\%$
+ $fm(ML) = 0.25 \times 0.3 = 0.075 \Rightarrow I(ML) = [0.075, 0.15] \Rightarrow I(ML)_{TB} = 11.25\%$
+ $fm(PL) = 0.25 \times 0.3 = 0.075 \Rightarrow I(PL) = [0.15, 0.225] \Rightarrow I(PL)_{TB} = 18.75\%$
+ $fm(LL) = 0.25 \times 0.3 = 0.075 \Rightarrow I(LL) = [0.225, 0.3] \Rightarrow I(LL)_{TB} = 26.25\%$

Similarly, for the terms with $c^+$ = "High":
+ $fm(LH) = 0.25 \times 0.7 = 0.175 \Rightarrow I(LH) = [0.3, 0.475] \Rightarrow I(LH)_{TB} = 38.75\%$
+ $fm(PH) = 0.25 \times 0.7 = 0.175 \Rightarrow I(PH) = [0.475, 0.65] \Rightarrow I(PH)_{TB} = 56.25\%$
+ $fm(MH) = 0.25 \times 0.7 = 0.175 \Rightarrow I(MH) = [0.65, 0.825] \Rightarrow I(MH)_{TB} = 73.75\%$
+ $fm(VH) = 0.25 \times 0.7 = 0.175 \Rightarrow I(VH) = [0.825, 1] \Rightarrow I(VH)_{TB} = 91.25\%$

- Select the minimum support with the linguistic threshold "Less Low" (denoted by LL);
- Select the minimum confidence with the linguistic threshold "More High" (denoted by MH):

minsup = minsup(LL) = 26.25%; minconf = minconf(MH) = 73.75%.

Step 2: Handling the qualitative table: a set of m items with their importance evaluated by 3 managers.

Identify the parameters of the HA. Denote: I: Important; uI: UnImportant; O: Ordinary; VI: Very Important; VuI: Very UnImportant.

$\mathcal{X}_{qt} = (X_{qt}, G_{qt}, H_{qt}, \le)$, with $G_{qt} = \{Important, UnImportant\}$; $c^+ = Important$; $c^- = UnImportant$; $H^+_{qt} = \{Very, More\}$; $H^-_{qt} = \{Less, Possibly\}$ (with Very > More; Less > Possibly).

Let $W_{qt} = 0.5$; $fm(I) = 0.4$; $fm(uI) = 0.6$; $fm(Very) = 0.3$; $fm(More) = 0.2$; $fm(Less) = 0.3$; $fm(Possibly) = 0.2$.

We then have: $fm(VI) = 0.3 \times 0.4 = 0.12 \Rightarrow I(VI) = [0.88, 1]$; $fm(VuI) = 0.3 \times 0.6 = 0.18 \Rightarrow I(VuI) = [0, 0.18]$; $fm(O) = 0.5 \Rightarrow I(O) = [0.25, 0.75]$.
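The interval computations of Steps 1 and 2 follow one pattern: the range covered by a primary term is split into consecutive sub-intervals of width $\mu(h) \times fm(c)$. The Python sketch below reproduces them under an assumption read off the listed intervals: along $[0, 1]$ the hedges are ordered Very, More, Possibly, Less for the negative primary term and in the reverse order for the positive one. The function names and the labels MuI, PuI, LuI, LI, PI and MI are hypothetical; the interval of "Ordinary" is not covered by this sketch.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]


def partition(start: float, fm_c: float, hedges: List[Tuple[str, float]]) -> Dict[str, Interval]:
    """Split [start, start + fm_c] into consecutive intervals of width mu * fm_c."""
    intervals = {}
    left = start
    for name, mu in hedges:
        right = left + mu * fm_c  # fm(hc) = mu(h) * fm(c)
        intervals[name] = (left, right)
        left = right
    return intervals


def midpoint(interval: Interval) -> float:
    """Average value of an interval, as used for the linguistic thresholds."""
    return (interval[0] + interval[1]) / 2


# Step 1: fm(Low) = 0.3, fm(High) = 0.7, every hedge has measure 0.25.
low = partition(0.0, 0.3, [("VL", 0.25), ("ML", 0.25), ("PL", 0.25), ("LL", 0.25)])
high = partition(0.3, 0.7, [("LH", 0.25), ("PH", 0.25), ("MH", 0.25), ("VH", 0.25)])
minsup = midpoint(low["LL"])    # approximately 0.2625
minconf = midpoint(high["MH"])  # approximately 0.7375

# Step 2: fm(uI) = 0.6, fm(I) = 0.4, hedge measures 0.3, 0.2, 0.2, 0.3.
unimportant = partition(0.0, 0.6, [("VuI", 0.3), ("MuI", 0.2), ("PuI", 0.2), ("LuI", 0.3)])
important = partition(0.6, 0.4, [("LI", 0.3), ("PI", 0.2), ("MI", 0.2), ("VI", 0.3)])

print(round(minsup, 4), round(minconf, 4))
print([round(v, 2) for v in important["VI"]], [round(v, 2) for v in unimportant["VuI"]])
```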

Table 3 is converted into Table 4, where $kdt{\sim}tb$ is the average fuzziness interval of the qualitative terms and $gtdt{\sim}tb$ is the average fuzzy value.

+ Fuzziness interval of the support: $([0.6, 1] \times 3.04)/6 = [0.304, 0.51]$;
+ Fuzzy value of the support: $(0.304 + 0.51)/2 = 0.41 = 41\%$.

Step 5: Filter the items in $D^{\sim}$, keeping the frequent items that satisfy the minimum support: $sup(item) \ge minsup$.

If $sup(item) < minsup$ (with minsup = 26.25%, the result of Step 1), then remove the item from Table 8.

Step 6: Establish the fuzzy FP-tree: see Figure 1.

Step 7: Calculate the fuzzy qualitative value of each n-itemset.

Substep 7.1: Find all frequent itemsets (denoted by n-itemset) from the FP-tree (see Table 11).

| 2-item | 3-item |
|---|---|
| F.PH, B.PH: 1.52 | F.PH, B.PH, E.PH: 1.52 |
| F.PH, E.PH: 1.52 | |
| B.PH, E.PH: 2.28 | |

| Itemset | kdt∼tb | gtdt∼tb |
|---|---|---|
| F.PH, B.PH | (0.693, 0.92) | 81% |
| F.PH, E.PH | (0.6, 0.92) | 76% |
| B.PH, E.PH | (0.693, 1) | 85% |
| F.PH, B.PH, E.PH | (0.693, 0.92) | 81% |
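The fuzzy support of each itemset can be sketched from the two tables above by applying the support computation of the algorithm (qualitative value times count, divided by N, with N = 6 transactions). The sketch assumes that the value listed next to each itemset in Table 11 is its count and that the rounded gtdt∼tb percentages are used; the variable and function names are illustrative.

```python
N = 6            # number of transactions in the example
MINSUP = 0.2625  # "Less Low" threshold from Step 1

# Values listed next to each itemset in Table 11, read here as counts.
counts = {
    ("F.PH", "B.PH"): 1.52,
    ("F.PH", "E.PH"): 1.52,
    ("B.PH", "E.PH"): 2.28,
    ("F.PH", "B.PH", "E.PH"): 1.52,
}
# Average fuzzy values (gtdt~tb) of the same itemsets.
gtdt = {
    ("F.PH", "B.PH"): 0.81,
    ("F.PH", "E.PH"): 0.76,
    ("B.PH", "E.PH"): 0.85,
    ("F.PH", "B.PH", "E.PH"): 0.81,
}


def fuzzy_support(gtdt_value: float, count: float, n: int) -> float:
    """Qualitative value times count, divided by the number of transactions."""
    return gtdt_value * count / n


supports = {k: fuzzy_support(gtdt[k], counts[k], N) for k in counts}
selected = [k for k, s in supports.items() if s >= MINSUP]

for itemset, s in supports.items():
    status = "selected" if s >= MINSUP else "unselected"
    print(", ".join(itemset), f"{s:.0%}", status)
```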

As a result, we have 2 rules:

| Itemset | Support | Minsup = 26.25% |
|---|---|---|
| F.PH, E.PH | 19% | unselected |
| F.PH, B.PH | 21% | unselected |
| E.PH, B.PH | 32% | selected |
| F.PH, B.PH, E.PH | 21% | unselected |

5 Conclusion

This paper extends the fuzzy association rule mining approach studied by Chien-Hua Wang and Chin-Tzong Pang [2], using hedge algebras instead of fuzzy sets. The optimization of the parameters of the quantitative semantics to fit various problems will be discussed in our next papers.

References

1. Tran Thai Son and Nguyen Anh Tuan, Improve efficiency of fuzzy association rule using hedge algebra approach, Journal of Computer Science and Cybernetics, v. 30, n. 4, 397-408, 2014

2. Chien-Hua Wang and Chin-Tzong Pang, Finding Fuzzy Association Rules Using FWFP-Growth with Linguistic Supports and Confidences, World Academy of Science, Engineering and Technology, 29, 1133-1141, 2009

3. Chien-Hua Wang, Chin-Tzong Pang and Sheng-Hsing Liu, Mining association rules uses fuzzy weighted FP-growth, Soft Computing and Intelligent Systems (SCIS) and 13th International Symposium on Advanced Intelligent Systems (ISIS), 2012 Joint 6th International Conference on, 13498461, 983-988, 2012

4. Tzung-Pei Hong, Chun-Wei Lin and Wen-Hsiang, Linguistic data mining with fuzzy FP-trees, Expert Systems with Applications 37, 4560-4567, 2010

5. Tzung-Pei Hong, Minh-Jer Chiang and Shyue-Liang Wang, Data Mining with Linguistic Thresholds, Int. J. Contemp. Math. Sciences, vol. 7, n. 35, 1711-1725, 2012