<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">A Formal Concept Analysis Approach to Discover Association Rules from Data</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Mondher</forename><surname>Maddouri</surname></persName>
							<email>mondher.maddouri@fst.rnu.tn</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Mathematics and Computer Sciences</orgName>
								<orgName type="institution">National Institute of Applied Sciences and Technology of Tunis -INSAT University of Carthago</orgName>
								<address>
									<addrLine>Centre Urbain Nord</addrLine>
									<postBox>B.P. 676</postBox>
									<postCode>1080</postCode>
									<settlement>TUNIS CEDEX</settlement>
									<country key="TN">TUNISIE</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Department of Mathematics &amp; Computer Sciences</orgName>
								<orgName type="institution">National Institute of Applied Sciences &amp; Technology of Tunis -INSAT University of Carthago</orgName>
								<address>
									<addrLine>Centre Urbain Nord</addrLine>
									<postBox>B.P. 676</postBox>
									<postCode>1080</postCode>
									<settlement>TUNIS CEDEX</settlement>
									<country key="TN">TUNISIE</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">A Formal Concept Analysis Approach to Discover Association Rules from Data</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">944D725658D4A2FFE2B8A71BA0758556</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-23T20:17+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The discovery of association rules is a non-supervised data-mining task. Its hardest step is the search for the frequent itemsets embedded in large amounts of data. Based on the theory of Formal Concept Analysis, we suggest that the notion of formal concept generalizes the notion of itemset, since it captures both the itemset (as the intent) and its support (as the cardinality of the extent). Accordingly, we propose a new approach that mines interesting item-sets as the optimal concepts covering a binary table (a concept coverage). This approach uses a new quality criterion for a rule: the gain, which generalizes the support criterion.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>The volume of stored data increases rapidly. In the past, it doubled every three centuries; nowadays, it doubles every 72 days. The analysis of huge volumes of data by computers has led to the emerging field of Data-Mining and Knowledge-Discovery. This field includes techniques to create knowledge from data sets, serving classification, clustering, prediction, estimation and optimization tasks <ref type="bibr" target="#b0">[1]</ref>.</p><p>The notion of "concept" as a couple of intent and extent (serving to represent nuggets of knowledge) was introduced and formalized many decades ago <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b2">3,</ref><ref type="bibr" target="#b3">4]</ref>. Recently, its use in Data-Mining and Knowledge-Discovery has been growing <ref type="bibr" target="#b4">[5]</ref>. Our ConceptMiner project (a development environment in MS-Visual C++) aims to design new heuristic algorithms for Data-Mining based on Formal Concept Analysis.</p><p>In this paper, we study a clustering problem that we solve by the induction of association rules. In fact, the experts (biologists) want to know which sub-sequences of proteins appear together in Toll-Like Receptor (TLR) sequences, or even in similar TLR sequences <ref type="bibr" target="#b6">[8]</ref>.</p><p>In section 2, we review the basics of association rules and outline the well-known techniques. In section 3, we introduce the basic definitions of Formal Concept Analysis that we use later. In section 4, we give two heuristic algorithms to calculate the optimal item-sets from rows of data. Illustrative examples are given throughout the paper. Finally, we present our concluding remarks and the future perspectives of this work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Basics in Association Rules</head><p>Discovering association rules is a non-supervised data-mining task that aims to induce symbolic conditional rules from a data file <ref type="bibr" target="#b0">[1]</ref>. A conditional rule has the syntax "IF conditions THEN results". The combination of conditions leads to a descriptive rule like "IF A AND NOT B THEN C AND D". It can also lead to a predictive rule including the time dimension, like "IF condition at time_1 THEN result at time_2", for instance "IF he_buys(TV, today) THEN he_buys(recorder, in_one_year)".</p><p>Association rules are used in a wide range of applications, due to their comprehensible format. Mainly, they are used to analyze supermarket baskets, which helps in optimal stock management, merchandising and marketing <ref type="bibr" target="#b0">[1]</ref>.</p><p>During the last few years, a wide variety of algorithms have been proposed. The most popular is certainly the iterative one, Apriori <ref type="bibr" target="#b7">[9]</ref>. It consists of three steps. First, it considers all the attributes/properties describing the data set as the initial item-sets. Then, it combines pairs of item-sets to obtain larger item-sets. Finally, it searches for the rules that can be derived from each item-set.</p><p>MaxEclat and MaxClique <ref type="bibr" target="#b8">[10]</ref> use graph theory to decompose the lattice into sub-lattices in order to mine maximal frequent item-sets. MaxMiner <ref type="bibr" target="#b9">[11]</ref> explores the maximal frequent item-sets with a look-ahead pruning strategy. PincerSearch <ref type="bibr" target="#b10">[12]</ref> follows a bottom-up and a top-down search of the lattice simultaneously while looking for the maximal frequent patterns. 
GenMax <ref type="bibr" target="#b11">[13]</ref> uses a backtracking search to enumerate all maximal frequent item-sets, progressively eliminating non-maximal ones. Mafia <ref type="bibr" target="#b12">[14]</ref> works depth-first and uses pruning strategies to mine maximal frequent item-sets. DepthProject <ref type="bibr" target="#b13">[15]</ref> uses dynamic re-ordering to reduce the search space. Salleb et al. <ref type="bibr" target="#b14">[16]</ref> use a Boolean-algebra approach that loads the database into main memory as a binary decision diagram, and searches for maximal frequent item-sets by iteratively computing the product of the vectors representing the items. The reader can find further descriptions of these algorithms in <ref type="bibr" target="#b11">[13,</ref><ref type="bibr" target="#b14">16]</ref>.</p><p>Generally, the number of association rules induced by these algorithms grows exponentially with the number of data rows (or attributes). It can reach hundreds of thousands of rules from only a few thousand data rows <ref type="bibr" target="#b15">[17]</ref>, so it becomes very hard to read and interpret them. The induced rules can be classified into three categories:</p><p>-Useful rules: rules that a human expert can understand and use.</p><p>-Trivial rules: rules that state the obvious; they are valid but never used.</p><p>-Weak rules: rules that are neither acceptable to the expert nor understandable. To remedy this problem, many methods have been proposed <ref type="bibr" target="#b15">[17]</ref>. The basic one consists of calculating a minimal set of rules from which the others can be re-derived. This can be done while calculating the rules <ref type="bibr" target="#b16">[18]</ref>, or afterwards using the concept lattice <ref type="bibr" target="#b17">[19,</ref><ref type="bibr" target="#b18">20]</ref>. 
Another method keeps only the rules whose condition or result part matches a particular group of terms <ref type="bibr" target="#b19">[21]</ref>. Agrawal et al. <ref type="bibr" target="#b7">[9]</ref> maintain a partial order relation between the rules using two measures, the support and the confidence. Consider the rule X → Y, where X (the conditions) and Y (the results) are sets of atomic propositions:</p><p>-Support (X → Y): the percentage of examples satisfying both X and Y.</p><p>-Confidence (X → Y): the number of examples satisfying both X and Y, divided by the number of examples satisfying the condition X alone.</p><p>The authors introduce two thresholds defined by the end-user:</p><p>-MinSup: the minimal value of the rule support that can be accepted.</p><p>-MinConf: the minimal value of the rule confidence that can be accepted.</p><p>The method discards all the item-sets with a support less than MinSup, and all the rules with a confidence less than MinConf.</p><p>To filter the association rules more efficiently, some authors <ref type="bibr" target="#b20">[22]</ref> introduce the conviction measure, and <ref type="bibr" target="#b15">[17]</ref> proposes five further measures such as the benefit (interest) and the satisfaction.</p><p>In this paper, we introduce a new measure: the gain. It combines the support of the rule with the length of the item-set. This is done thanks to the theory of Formal Concept Analysis <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4]</ref>, since we suggest that an item-set is completely represented by a formal concept, as a couple of intent (the classic item-set) and extent (its support).</p></div>
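To make the two measures concrete, the following minimal sketch (illustrative only; the `support` and `confidence` helpers and the toy baskets are our own, not the authors' code) computes them over transactions represented as Python sets:

```python
# Illustrative sketch of the support and confidence measures of a rule
# X -> Y over a list of transactions (each transaction is a set of items).
# The helper names and the toy data are hypothetical.

def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)

def confidence(transactions, x, y):
    """Support of X union Y, divided by the support of X alone."""
    return support(transactions, x | y) / support(transactions, x)

baskets = [{"TV", "recorder"}, {"TV"}, {"TV", "recorder"}, {"recorder"}]
print(support(baskets, {"TV", "recorder"}))       # 0.5
print(confidence(baskets, {"TV"}, {"recorder"}))  # 0.666...
```

With MinSup=0.6, for example, the item-set {TV, recorder} (support 0.5) would be discarded.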
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Basics in Formal Concept Analysis</head><p>Let O be a set of objects (or examples), P a set of properties (or attributes) and R a binary relation defined between O and P <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4]</ref>. The set of objects which have all the properties in B:</p><formula xml:id="formula_0">B↓={o∈O | oRp for all p∈B} (2)</formula><p>The couple of operators (↑, ↓) is a Galois connection.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Definition 3 [3]:</head><formula xml:id="formula_1">A formal concept of the context (O, P, R) is a pair (A, B) with A⊆O, B⊆P, A↑=B and B↓=A. <label>(3)</label></formula><p>We call A the extent and B the intent of the concept (A, B).</p></div>
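Definitions 2 and 3 can be sketched in a few lines on the formal context of Figure 1 (an illustrative sketch; the names `CONTEXT`, `up`, `down` and `is_concept` are our own):

```python
# Derivation operators of Definition 2 and the concept test of
# Definition 3, on the context of Figure 1 (objects o1..o5,
# properties A..D).  Names are hypothetical.

CONTEXT = {
    "o1": {"A", "B"}, "o2": {"A", "B"}, "o3": {"B", "C"},
    "o4": {"B", "C", "D"}, "o5": {"C", "D"},
}
PROPS = {"A", "B", "C", "D"}

def up(objects):
    """A↑: the properties common to all objects in `objects`."""
    common = set(PROPS)
    for o in objects:
        common &= CONTEXT[o]
    return common

def down(props):
    """B↓: the objects having all properties in `props`."""
    return {o for o, ps in CONTEXT.items() if props <= ps}

def is_concept(extent, intent):
    """(A, B) is a formal concept iff A↑ = B and B↓ = A."""
    return up(extent) == intent and down(intent) == extent

print(sorted(up({"o1", "o2"})))              # ['A', 'B']
print(is_concept({"o3", "o4"}, {"B", "C"}))  # True
```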
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Definition 4 [3]:</head><p>The set of all concepts of the context (O, P, R) is denoted by B(O, P, R). An ordering relation (&lt;&lt;) is easily defined on this set of concepts by:</p><formula xml:id="formula_3">(A1, B1) &lt;&lt; (A2, B2) ⇔ A1⊆A2 ⇔ B2⊆B1 <label>(4)</label></formula><p>[Figure 1.a, the formal context: o1 and o2 have {A, B}; o3 has {B, C}; o4 has {B, C, D}; o5 has {C, D}. Figure 1.b, the corresponding lattice of eight formal concepts FC1-FC8.]</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Basic Theorem for Concept Lattices [3]:</head><p>(B(O, P, R), &lt;&lt;) is a complete lattice, called the concept lattice (or Galois lattice) of (O, P, R), for which the infimum and supremum are given by:</p><formula xml:id="formula_5">Sup i∈I (A i , B i ) = ((∪ i∈I A i )↑↓, ∩ i∈I B i ) <label>(5)</label> Inf i∈I (A i , B i ) = (∩ i∈I A i , (∪ i∈I B i )↓↑) <label>(6)</label></formula><p>A coverage of the context (O, P, R) can be formed by the three concepts {FC4, FC5, FC6}: FC4 = ({o1, o2}, {A, B}), FC5 = ({o3, o4}, {B, C}) and FC6 = ({o4, o5}, {C, D}). The whole lattice also constitutes a concept coverage.</p></div>
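For a context as small as Figure 1, the whole concept lattice can be enumerated naively by closing every subset of properties (a sketch for illustration only; this scan is exponential in |P|, which practical miners avoid):

```python
from itertools import combinations

# Naive enumeration of all formal concepts of the Figure 1 context:
# close every property subset B into the concept (B↓, B↓↑).
CONTEXT = {
    "o1": {"A", "B"}, "o2": {"A", "B"}, "o3": {"B", "C"},
    "o4": {"B", "C", "D"}, "o5": {"C", "D"},
}
PROPS = ("A", "B", "C", "D")

def down(props):
    return {o for o, ps in CONTEXT.items() if set(props) <= ps}

def up(objects):
    common = set(PROPS)
    for o in objects:
        common &= CONTEXT[o]
    return common

concepts = set()
for r in range(len(PROPS) + 1):
    for subset in combinations(PROPS, r):
        extent = down(subset)
        concepts.add((frozenset(extent), frozenset(up(extent))))

print(len(concepts))  # 8 concepts, as in the lattice of Figure 1.b
```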
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Discovery of Optimal Item-sets</head><p>The most computationally expensive step in discovering association rules is the calculation of the frequent item-sets <ref type="bibr" target="#b13">[15]</ref>. Generally, this step applies, iteratively, heuristics that compute candidate item-sets: at iteration i, the item-sets of iteration i-1 are combined. After that, the support threshold (MinSup) is used to discard the non-frequent candidates, and the item-sets of iteration i-1 are discarded as well. We keep the remaining item-sets of the last iteration n (where n is the number of properties in the formal context).</p><p>In this step, we think that many approaches use, implicitly, two basic characteristics of association rules:</p><p>-General rules: we should ignore the item-sets with high support and low cardinality. General rules are weak, since they are valid for numerous examples (or observations).</p><p>-Specific rules: we should ignore the item-sets with high cardinality and low support. Specific rules are weak, since they are valid only for a few examples (or observations).</p><p>The two characteristics are defined through the cardinalities of two special sets:</p><p>-Support: the cardinality of the set of examples verifying the rule. In Formal Concept Analysis, this is the extent of a formal concept.</p><p>-Cardinality of the item-set: the number of properties in the item-set. In Formal Concept Analysis, this is the intent of a formal concept.</p><p>Consequently, we think that an association rule should not be represented by the item-set (the intent) alone.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>An association rule is completely and explicitly represented by a formal concept (as a couple of intent and extent).</head><p>Likewise, we think that the selection of "good" item-sets should not be based on the support alone (the cardinality of the extent). A powerful selection should be based on the cardinalities of both the extent and the intent of its formal concept.</p><p>To formalize the new criteria, we give the following definitions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Definition 7:</head><p>Let FC i = (A i , B i ) be a formal concept. We define:</p><p>-Length of a concept FC i : the number of properties in the intent B i of the concept.</p><p>-Width of a concept FC i : the number of examples in the extent A i of the concept.</p><p>-Gain of a concept: a function of the width and the length, given by:</p><formula xml:id="formula_6">Gain(FC i )=width(FC i )*length(FC i )-[width(FC i )+length(FC i )]<label>(7)</label></formula><p>The gain function cannot be maximized when one of the two parameters is low and the other is high; it becomes maximal only when both parameters are high (many examples and many properties). Hence, we are sure that the rule is neither a general one nor a specific one. Note also that the width is the support of the item-set (or of the corresponding association rule).</p></div>
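As a quick illustration of (7) on the concepts of Figure 1 (the `gain` helper is our own naming):

```python
# Gain of Definition 7: width * length - (width + length).
def gain(extent, intent):
    w, l = len(extent), len(intent)
    return w * l - (w + l)

# FC4 = ({o1, o2}, {A, B}): balanced concept, gain = 2*2 - (2+2)
print(gain({"o1", "o2"}, {"A", "B"}))         # 0
# A general concept ({o1..o4}, {B}): gain = 4*1 - (4+1)
print(gain({"o1", "o2", "o3", "o4"}, {"B"}))  # -1
```

The balanced concept scores higher than the general one, as the definition intends.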
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Definition 8 [4]:</head><p>A formal concept FC i = (A i , B i ) containing a couple (o, p) is said to be optimal if it maximizes the gain function.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Definition 9 [4]:</head><p>A coverage CV={ FC 1 , FC 2 , ..., FC k } of a context (O, P, R) is optimal if it is composed of optimal concepts.</p><p>Example: Figure <ref type="figure" target="#fig_3">2</ref> (graphic 2.b) represents the optimal concept FC5 containing the couple (o3, B) and having the gain value 0. Figure <ref type="figure" target="#fig_3">2</ref> (graphic 2.c) represents the non-optimal concept FC2 containing the couple (o3, B) and having the gain value -1.</p><p>The optimal coverage of the context (O, P, R) is formed by three optimal concepts {FC4, FC5, FC6}: FC4 = ({o1, o2}, {A, B}), FC5 = ({o3, o4}, {B, C}) and FC6 = ({o4, o5}, {C, D}).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Heuristic Searching for Optimal Concept</head><p>The pseudo-concept containing the couple (o, p), introduced in definition 5, is the union of all the concepts containing (o, p). The importance of the notion of pseudo-concept, denoted PCF, is that it can be calculated in a simple manner as the restriction of the relation R to the set of objects/examples described by p, i.e. {p}↓, and to the set of properties describing the object o, i.e. {o}↑, where (↑, ↓) is the Galois connection of the context (O, P, R).</p><p>Once we obtain the pseudo-concept PCF, two cases are considered:</p><p>-Case 1: PCF forms a formal concept. This means that no zero entry is found in the relation/matrix representing PCF. Then PCF is the optimal concept that we are looking for, and the algorithm stops.</p><p>-Case 2: PCF is not a formal concept. This means, simply, that some zero entries can be found in the relation/matrix representing PCF. In this case, we look for more restricted pseudo-concepts within PCF: we consider the pseudo-concepts containing couples of the form (X, p) or (o, Y), which certainly contain the couple (o, p). The heuristic that we use here is that the optimal concept is included in the optimal pseudo-concept. So, we calculate the gain value of the different pseudo-concepts containing couples of the form (X, p) or (o, Y), and we choose the pseudo-concept having the greatest value of the gain function as the new PCF. This procedure (case 2) is repeated until we meet case 1, i.e. until PCF becomes a formal concept.</p><p>To calculate the gain of a pseudo-concept, we introduce the general form of the previous function.</p><p>Definition 10: Let PCF i = (A i , B i , R i ) be a pseudo-concept, where R i is the restriction of the binary relation R to the subsets A i and B i . We define:</p><p>-Length of a pseudo-concept PCF i : the number of properties in B i .</p><p>-Width of a pseudo-concept PCF i : the number of examples in A i .</p><p>-Size of a pseudo-concept PCF i : the number of couples (entries equal to 1) in the pseudo-concept. When PCF i is a formal concept, we have:</p><formula xml:id="formula_7">Size(PCF i ) = Length(PCF i ) × Width(PCF i )<label>(8)</label></formula><p>-Gain of a pseudo-concept: a function of the width, the length and the size, given by (<ref type="formula">9</ref>); when PCF i is a formal concept, it reduces to (7):</p><formula xml:id="formula_8">Gain(PCF i ) = [Size(PCF i ) / (Length(PCF i ) × Width(PCF i ))] × Size(PCF i ) − [Length(PCF i ) + Width(PCF i )]<label>(9)</label></formula></div>
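A small sketch checks that the pseudo-concept gain reduces to (7) when the pseudo-concept is a formal concept, i.e. when Size = Length × Width (the helper names are our own, and we read (9) as [Size/(Length × Width)] × Size − (Length + Width)):

```python
# Pseudo-concept gain, read as (9): a density-weighted size minus the
# (length + width) term of (7).  When the pseudo-concept is a formal
# concept, size == length * width and this reduces to formula (7).

def pc_gain(length, width, size):
    return (size / (length * width)) * size - (length + width)

def concept_gain(length, width):   # formula (7)
    return length * width - (length + width)

# Formal-concept case: a full 2x2 block of ones
print(pc_gain(2, 2, 4), concept_gain(2, 2))  # 0.0 0
# Non-concept case: a 2x2 block with one zero entry scores lower
print(pc_gain(2, 2, 3))                      # -1.75
```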
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Algorithm for Optimal Coverage</head><p>The problem of covering a binary relation by a set of optimal concepts is the problem of covering a binary matrix by a number of its complete sub-matrices, where a complete sub-matrix is a matrix whose entries are all equal to '1'. This problem is known to be NP-hard. Here, we use a polynomial heuristic solution.</p><p>Let R be the binary relation to cover. The heuristic solution consists of dividing R into p packages (subsets): P1, ..., Pp. Each package represents one or more couples. The idea is to construct the optimal coverage of R step by step. In the first step, we cover the relation R 1 =P 1 by CV 1 . In the k-th step, let R k-1 =P 1 ∪ ... ∪ P k-1 and let CV k-1 be its optimal coverage. The task is to construct the optimal coverage CV k of R k =R k-1 ∪ P k using only the initial coverage CV k-1 and the package P k . In the p-th step, we obtain a set of concepts covering the relation R.</p><p>Example: Let R be the relation of figure <ref type="figure" target="#fig_1">1</ref>. R is partitioned into five packages: P1={o1}x{A, B}, P2={o2}x{A, B}, P3={o3}x{B, C}, P4={o4}x{B, C, D} and P5={o5}x{C, D}. Initially, R is an empty relation, and in each step we add a package. Figure <ref type="figure" target="#fig_5">3</ref> presents the incremental step of adding P5.</p><p>In this step, R4 encloses the four rows P1, ..., P4. The initial optimal coverage CV4 encloses the formal concepts FC4=({o1, o2}, {A, B}), FC5=({o3, o4}, {B, C}) and FC7=({o4}, {B, C, D}) (graphic 3.a). The package P5 encloses only two couples. The pseudo-concept containing the couple (o5, C) is the union of the two formal concepts FC3 and FC6. So, the optimal concept containing (o5, C) and (o5, D) is the concept FC6, and the concept FC7 becomes redundant. Thus, the final coverage of R contains the concepts FC4, FC5 and FC6.</p><p>In conclusion, from the data set of figure <ref type="figure" target="#fig_1">1</ref>, we discover three item-sets: {A, B}, {B, C} and {C, D}. These item-sets will be processed later by the Apriori algorithm, to calculate the corresponding association rules.</p></div>
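The covering loop can be sketched as a simplified greedy procedure on the Figure 1 context (our own simplification, not the paper's implementation: packages are ignored and candidate concepts are generated only from property pairs):

```python
# Simplified greedy concept coverage of the Figure 1 context: pick an
# uncovered (object, property) couple, generate candidate concepts
# around it, keep the one with the best gain, repeat.  This is a toy
# variant of the paper's algorithm; names and the pair-based candidate
# generation are our own.

CONTEXT = {
    "o1": {"A", "B"}, "o2": {"A", "B"}, "o3": {"B", "C"},
    "o4": {"B", "C", "D"}, "o5": {"C", "D"},
}

def down(props):
    return {o for o, ps in CONTEXT.items() if props <= ps}

def concept_of(props):
    """Close a property set into the formal concept it generates."""
    extent = down(props)
    intent = set.intersection(*(CONTEXT[o] for o in extent)) if extent else set("ABCD")
    return frozenset(extent), frozenset(intent)

def gain(c):
    w, l = len(c[0]), len(c[1])
    return w * l - (w + l)

uncovered = {(o, p) for o, ps in CONTEXT.items() for p in ps}
coverage = []
while uncovered:
    o, p = min(uncovered)                               # deterministic pick
    candidates = [concept_of({p, q}) for q in sorted(CONTEXT[o])]
    best = max(candidates, key=gain)
    coverage.append(best)
    uncovered -= {(x, y) for x in best[0] for y in best[1]}

print(len(coverage))  # 3
```

On this toy context the greedy loop recovers exactly the coverage {FC4, FC5, FC6} of the example above.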
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Experiments</head><p>The experiments were done on a Pentium II PC (233 MHz, 144 MB of RAM) running MS-Windows 98. The analyzed data sets are organized in two groups. The first group consists of five data sets representing biological sequences coding macromolecules (table <ref type="table" target="#tab_0">1</ref>). Its first two data sets are proprietary sets for the TLR macromolecules; the other three are taken from the UCI machine-learning repository. The second group consists of classic tables of data (table <ref type="table" target="#tab_1">2</ref>). Its first two data sets are also taken from the UCI machine-learning repository; the other sets come from a proprietary database.</p><p>To evaluate the performance of the proposed method (IAR: Incremental induction of Association Rules), we compare it to the existing Apriori approach <ref type="bibr" target="#b7">[9,</ref><ref type="bibr" target="#b21">23]</ref>. The Apriori method was programmed in C++ and integrated into our software package. We used the two methods to analyze the ten data sets, with two sets of parameters. The first one, MinSup=0.6 and MinConf=0.5, is used for the biological data sets (table <ref type="table" target="#tab_0">1</ref>). Fig. <ref type="figure">4</ref>: Comparing the CPU time</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">Comparing the CPU time</head><p>Figure <ref type="figure">4</ref> presents the CPU time, measured in seconds, for the two methods. We remark that the two methods keep the same behavior over all the data sets. The Apriori method takes the most time, since it is based on an exhaustive approach that tests all the possible combinations. The proposed method IAR always takes less CPU time than Apriori.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">Measuring the similarity of rules</head><p>Figure <ref type="figure" target="#fig_6">5</ref> presents the percentage of rules obtained by IAR in comparison with Apriori. We remark that this percentage is always less than half (0.5). We remark also that Apriori always produces many more rules than IAR.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Conclusion</head><p>In this paper, we presented a new Formal Concept Analysis approach to search for optimal item-sets. We gave two heuristic algorithms to build a coverage of formal concepts and to create optimal item-sets from data records. The association rules can then be induced from the item-sets in the same manner as in Apriori <ref type="bibr" target="#b7">[9]</ref>. We argue that a formal concept generalizes the notion of item-set, since it is more powerful and more explicit. In fact, an item-set represents only the intent of a formal concept, and the support of an item-set is the cardinality of the concept's extent. Hence, the formal concept encloses more information than the item-set itself.</p><p>The first experimentation with the proposed method was done on a variety of data sets having different formats (structured and unstructured). We compared the proposed method with a well-known one, Apriori <ref type="bibr" target="#b7">[9]</ref>. We measured their CPU time, the number of association rules, and the percentage of similar rules. We remarked that Apriori gives a great number of rules and takes a long time. The proposed method IAR is faster than Apriori and gives a small number of rules, which have the advantage of being the rules common to both methods.</p><p>Future work will focus on the quality of the created association rules. In fact, we plan to study the significance of the discovered rules for human experts. For this reason, we propose to present the rules concerning the TLR macromolecules to our biologists in order to get their points of view. 
Accordingly, we plan to study quality measures other than the support and the confidence <ref type="bibr" target="#b15">[17,</ref><ref type="bibr" target="#b20">22]</ref>, in order to identify the measure that gives expert-acceptable rules.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Definition 1 [3]</head><label></label><figDesc>: A formal context (O, P, R) consists of two sets O and P and a relation R between O and P. The elements of O are called the objects and the elements of P are called the properties of the context. In order to express that an object o is in relation R with a property p, we write oRp or (o, p)∈R and read it as "the object o has the property p". Definition 2 [3]: For a set A⊆O of objects and a set B⊆P of properties, we define: The set of properties common to the objects in A: A↑={p∈P | oRp for all o∈A} (1)</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. An example of formal context (O, P, R) and its corresponding Concept Lattice.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head></head><label></label><figDesc>Example: Figure 1 (graphic 1.a) illustrates the notion of formal context (O, P, R). This context includes five objects {o1, o2, o3, o4, o5} described by four properties {A, B, C, D}. The concept lattice of this context is drawn in Figure 1 (graphic 1.b); it contains eight formal concepts. Definition 5 [4]: Let (o, p) be a couple in the context (O, P, R). The pseudo-concept PC containing the couple (o, p) is the union of all the formal concepts containing (o, p). Definition 6 [4]: A coverage of a context (O, P, R) is defined as a set of formal concepts CV={RE 1 , RE 2 , ..., RE n } in (O, P, R), such that any couple (o, p) in the context (O, P, R) is included in at least one concept of CV.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Fig. 2 .</head><label>2</label><figDesc>Fig. 2. An example of pseudo-concept, optimal concept and non-optimal concept containing the couple (o3, B). Example: To illustrate the notions of pseudo-concept and coverage, we consider the same formal context (O, P, R) from figure 1. Figure 2 (graphic 2.a) represents the pseudo-concept containing the couple (o3, B). It is the union of the concepts FC2 and FC5 (see figure 1).</figDesc><graphic coords="5,49.79,139.61,360.03,124.38" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Algorithm</head><label></label><figDesc>Algorithm for building the coverage:
Begin
  Let R be partitioned into p packages P 1 , ..., P p . Let CV 0 := ∅.
  FOR k=1 to p DO
    Sort the couples of P k by the pertinence of their pseudo-concepts.
    WHILE (P k ≠ ∅) DO
      - Select a couple (a, b) in P k in the sorted order of the gain function (9).
      - Search PC: the pseudo-concept containing (a, b) within R k = CV k-1 ∪ P k .
      - Search FC: the optimal concept containing (a, b) within PC.
      CV k := (CV k-1 - {r ∈ CV k-1 | r ⊆ FC}) ∪ {FC}  (delete all redundant concepts from CV k )
      P k := P k - {(X, Y) ∈ P k | (X, Y) ∈ FC}
    END WHILE
  END FOR
End.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Fig. 3 .</head><label>3</label><figDesc>Fig. 3. Illustration of the algorithm for concept coverage.</figDesc><graphic coords="8,62.25,399.33,320.69,130.01" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Fig. 5 :</head><label>5</label><figDesc>Fig. 5: Similarity between the knowledge bases of the two methods</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Properties of the biological data sets.</figDesc><table><row><cell>Biological Data Sets</cell><cell>Identifier</cell><cell>Rows</cell><cell>Columns</cell></row><row><cell>2 Far a way TLR Families</cell><cell>DB1</cell><cell>763</cell><cell>16</cell></row><row><cell>6 Near TLR Families</cell><cell>DB2</cell><cell>619</cell><cell>14</cell></row><row><cell>Promoter gene</cell><cell>DB3</cell><cell>106</cell><cell>11</cell></row><row><cell>Junction gene 3'</cell><cell>DB4</cell><cell>1150</cell><cell>19</cell></row><row><cell>Junction gene 5'</cell><cell>DB5</cell><cell>500</cell><cell>10</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 :</head><label>2</label><figDesc>Properties of the data sets having standard format.</figDesc><table><row><cell>Standard Data Sets</cell><cell>Identifier</cell><cell>Rows</cell><cell>Columns</cell></row><row><cell>Tic Tac Toe</cell><cell>DB6</cell><cell>958</cell><cell>10</cell></row><row><cell>Mushroom</cell><cell>DB7</cell><cell>1191</cell><cell>23</cell></row><row><cell>T20i4d1k</cell><cell>DB8</cell><cell>1000</cell><cell>9</cell></row><row><cell>T10I4D</cell><cell>DB9</cell><cell>5000</cell><cell>9</cell></row><row><cell>T20I6D</cell><cell>DB10</cell><cell>1000</cell><cell>9</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head></head><label></label><figDesc>The second one with MinSup=0.35 and MinConf=0.75 is used for the standard data sets (table2).</figDesc><table><row><cell>CPU time (sec)</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell>CPU time (sec)</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell></cell><cell>DB1</cell><cell>DB2</cell><cell>DB3</cell><cell>DB4</cell><cell>DB5</cell><cell></cell><cell>DB6</cell><cell>DB7</cell><cell>DB8</cell><cell>DB9</cell><cell>DB10</cell></row><row><cell>IAR</cell><cell>3930</cell><cell>304</cell><cell>9</cell><cell>18225</cell><cell>459</cell><cell>IAR</cell><cell>11480</cell><cell>32825</cell><cell>9587</cell><cell>71898</cell><cell>11983</cell></row><row><cell>Apriori</cell><cell>12451</cell><cell>916</cell><cell>12</cell><cell>22135</cell><cell>875</cell><cell>Apriori</cell><cell>17929</cell><cell>51268</cell><cell>14973</cell><cell>112295</cell><cell>18715</cell></row><row><cell></cell><cell cols="4">MinSup=0.6 and MinConf=0.5</cell><cell></cell><cell></cell><cell cols="4">MinSup=0.35 and MinConf=0.75</cell><cell></cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_0">Mondher Maddouri</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">Data mining</title>
		<author>
			<persName><forename type="first">P</forename><surname>Adriaans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Zantinge</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1996">1996</date>
			<publisher>Addison Wesley Longman</publisher>
			<pubPlace>England</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Restructuring Lattice Theory: an Approach Based on Hierarchies of Concepts</title>
		<author>
			<persName><forename type="first">R</forename><surname>Wille</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. Symposium on Ordered Sets</title>
				<meeting>Symposium on Ordered Sets<address><addrLine>Dordrecht-Boston</addrLine></address></meeting>
		<imprint>
			<publisher>Reidel</publisher>
			<date type="published" when="1982">1982</date>
			<biblScope unit="page" from="445" to="470" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">Formal Concept Analysis: Mathematical Foundations</title>
		<author>
			<persName><forename type="first">B</forename><surname>Ganter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Wille</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1999">1999</date>
			<publisher>Springer</publisher>
			<pubPlace>Heidelberg-Berlin-New York</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Optimal rectangular decomposition of a finite binary relation</title>
		<author>
			<persName><forename type="first">A</forename><surname>Jaoua</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Beaulieu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Belkhiter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Desharnais</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Reguig</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Sixth International Conference on Discrete Mathematics</title>
				<meeting><address><addrLine>Vancouver-Canada</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1992-06">June 1992</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Why Can Concept Lattices Support Knowledge Discovery in Databases?</title>
		<author>
			<persName><forename type="first">R</forename><surname>Wille</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">First Workshop on Formal Concept Analysis Advances for KDD</title>
				<imprint>
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">Contribution to Concept Learning : an Incremental approach for Inducing Production Rules from examples</title>
		<author>
			<persName><forename type="first">M</forename><surname>Maddouri</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2000-05">May 2000</date>
		</imprint>
		<respStmt>
			<orgName>University of Tunis II</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">PhD dissertation</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">A Data Mining Approach based on Machine Learning Techniques to Classify Biological Sequences</title>
		<author>
			<persName><forename type="first">M</forename><surname>Maddouri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Elloumi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Knowledge-Based Systems</title>
		<imprint>
			<date type="published" when="2002-03">March 2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Mining association rules between sets of items in large databases</title>
		<author>
			<persName><forename type="first">R</forename><surname>Agrawal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Imielinski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Swami</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ACM SIGMOD Int. Conf. on Management of Data</title>
				<imprint>
			<date type="published" when="1993">1993</date>
			<biblScope unit="page" from="207" to="216" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Scalable algorithms for association mining</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Zaki</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Int. J. IEEE Transactions on Knowledge and Data Engineering</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="372" to="390" />
			<date type="published" when="2000-06">May/June 2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Efficiently mining long patterns from databases</title>
		<author>
			<persName><forename type="first">R</forename><surname>Bayardo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ACM SIGMOD Int. Conf. on Management of Data</title>
				<imprint>
			<date type="published" when="1998">1998</date>
			<biblScope unit="page" from="85" to="93" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Pincer-search: a new algorithm for discovering the maximum frequent set</title>
		<author>
			<persName><forename type="first">D</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Kedem</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Int. Conf. on Extending Database Technology</title>
				<imprint>
			<date type="published" when="1998-03">March 1998</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Efficiently mining maximal frequent itemsets</title>
		<author>
			<persName><forename type="first">K</forename><surname>Gouda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Zaki</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE Int. Conf. on Data Mining</title>
				<imprint>
			<date type="published" when="2001-11">November 2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">MAFIA: a maximal frequent itemset algorithm for transactional databases</title>
		<author>
			<persName><forename type="first">D</forename><surname>Burdick</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Calimlim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">E</forename><surname>Gehrke</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Int. Conf. on Data Engineering</title>
				<imprint>
			<date type="published" when="2001-04">April 2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Depth first generation of long patterns</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">C</forename><surname>Agarwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">C</forename><surname>Aggarwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">V V</forename><surname>Prasad</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Int. Conf. on Knowledge Discovery and Data Mining</title>
				<imprint>
			<date type="published" when="2000-08">August 2000</date>
			<biblScope unit="page" from="108" to="118" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Mining maximal frequent item-sets by a Boolean based approach</title>
		<author>
			<persName><forename type="first">A</forename><surname>Salleb</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Maazouzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Vrain</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">European Conf. on Artificial Intelligence</title>
				<meeting><address><addrLine>Lyon France</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2002-07">July 2002</date>
			<biblScope unit="page" from="285" to="289" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Text mining by combining association rules and statistics</title>
		<author>
			<persName><forename type="first">H</forename><surname>Cherfi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Toussaint</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Int. Workshop on Text Mining CIFT-02</title>
				<meeting><address><addrLine>Hammamet -Tunisia</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2002">October 21-23, 2002</date>
			<biblScope unit="page" from="67" to="80" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Familles minimales d&apos;implications informatives résultant d&apos;un tableau de données binaires</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">L</forename><surname>Guigues</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Duquenne</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Mathématiques, Informatique et Sciences Humaines</title>
		<imprint>
			<biblScope unit="volume">95</biblScope>
			<biblScope unit="page" from="5" to="18" />
			<date type="published" when="1986">1986</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Intelligent structuring and reducing of association rules with formal concept analysis</title>
		<author>
			<persName><forename type="first">G</forename><surname>Stumme</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Taouil</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bastide</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Pasquier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Lakhal</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. KI 2001, Lecture Notes in Artificial Intelligence</title>
				<imprint>
			<publisher>Springer-Verlag</publisher>
			<date type="published" when="2001">2001</date>
			<biblScope unit="volume">2174</biblScope>
			<biblScope unit="page" from="335" to="350" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Building and interpreting term dependencies using association rules extracted from Galois lattices</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Toussaint</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Simon</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Int. Conf. on Content-Based Multimedia Information Access</title>
				<meeting><address><addrLine>Paris France</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2000">2000</date>
			<biblScope unit="page" from="1686" to="1693" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Finding interesting rules from large sets of discovered association rules</title>
		<author>
			<persName><forename type="first">M</forename><surname>Klemettinen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Mannila</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Ronkainen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Toivonen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">I</forename><surname>Verkamo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Int. Conf. on Information and Knowledge Management</title>
				<meeting><address><addrLine>Gaithersburg USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1994">1994</date>
			<biblScope unit="page" from="401" to="407" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Mining the most interesting rules</title>
		<author>
			<persName><forename type="first">R</forename><surname>Bayardo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Agrawal</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining</title>
				<imprint>
			<date type="published" when="1999">1999</date>
			<biblScope unit="page" from="145" to="154" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">A</forename><surname>Zighed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Auray</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Duru</surname></persName>
		</author>
		<title level="m">SIPINA: méthode et logiciel</title>
				<meeting><address><addrLine>Lyon-France</addrLine></address></meeting>
		<imprint>
			<publisher>Alexandre Lacassagne</publisher>
			<date type="published" when="1992">1992</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Supervised and Unsupervised Discretization of Continuous Features</title>
		<author>
			<persName><forename type="first">J</forename><surname>Dougherty</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Kohavi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sahami</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Int. Conf. on Machine Learning</title>
				<imprint>
			<date type="published" when="1995">1995</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
